  • Language
    C++
  • License
    Other
  • Created almost 11 years ago
  • Updated almost 8 years ago

Repository Details

A C++ implementation of MapReduce without a distributed filesystem

MapReduce Lite is a C++ implementation of the MapReduce programming paradigm.

Pros

First of all, MapReduce Lite is Lite!

  • It does not rely on a distributed filesystem -- it can simply use the local filesystem;
  • It does not have a dynamic task scheduling system -- the map/reduce tasks are scheduled before the parallel job starts;
  • There is zero deployment / configuration cost -- just statically link your program against the MapReduce Lite library and run it.
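Without a DFS or a run-time scheduler, every worker has to know its share of the work before the job starts, and every mapper has to agree on which reducer owns each key. The sketch below shows one way that could look. `AssignShards`, `ReducerFor` and `Fnv1a` are hypothetical names, not MapReduce Lite's API, and round-robin shard assignment plus FNV-1a key hashing are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical: a fixed 64-bit FNV-1a hash, so that separate worker
// processes always compute the same value for the same key.
std::uint64_t Fnv1a(const std::string& s) {
  std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;  // FNV prime
  }
  return h;
}

// Hypothetical: decide before the job starts which input shards each map
// worker reads (round-robin), so no scheduler is needed at run time.
std::vector<std::vector<std::string>> AssignShards(
    const std::vector<std::string>& shards, std::size_t num_mappers) {
  std::vector<std::vector<std::string>> plan(num_mappers);
  for (std::size_t i = 0; i < shards.size(); ++i) {
    plan[i % num_mappers].push_back(shards[i]);
  }
  return plan;
}

// Hypothetical: route an intermediate key to a fixed reduce worker.
// Every mapper gets the same answer, so no coordinator is required.
std::size_t ReducerFor(const std::string& key, std::size_t num_reducers) {
  return static_cast<std::size_t>(Fnv1a(key) % num_reducers);
}
```

An explicit hash is used instead of `std::hash` because the standard only guarantees `std::hash` is consistent within a single program execution, while mappers here run as separate processes.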

In addition to the functionality described in Google's famous MapReduce paper, which MapReduce Lite calls batch reduction mode, MapReduce Lite also provides an incremental reduction mode that performs the shuffle phase in memory, without disk access. In this mode, MapReduce Lite programs run much faster than those on rigid implementations like Hadoop.
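As a rough illustration of the difference: in batch mode all values for a key are collected (and sorted) before the reducer runs once per key, whereas an incremental reducer folds each value into a running per-key result as soon as it arrives. The sketch below is a hypothetical word-count accumulator, not MapReduce Lite's incremental reducer interface. It assumes the reduce operation is associative and commutative (true for summing counts), which is what allows values to be combined in whatever order they arrive.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration of incremental reduction for word count:
// each (key, value) pair updates an in-memory partial result immediately,
// so intermediate pairs are never sorted or written to disk.
class IncrementalWordCount {
 public:
  // Called once per incoming intermediate pair.
  void Consume(const std::string& key, int value) { partial_[key] += value; }

  // Called after all map output has been consumed; returns the final
  // (key, sum) pairs in key order.
  std::vector<std::pair<std::string, int>> Finish() const {
    return std::vector<std::pair<std::string, int>>(partial_.begin(),
                                                    partial_.end());
  }

 private:
  std::map<std::string, int> partial_;  // key -> running sum
};
```

The trade-off is that the partial results for all keys a reducer owns must fit in memory.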

Cons

As a lite implementation, MapReduce Lite does not support fault recovery. Adding it, however, is arguably not too difficult, provided we do not require backup workers or global counters and can use a distributed filesystem (DFS).

Applications

At Tencent, we have been using MapReduce Lite with Tencent's DFS to run jobs such as search engine log processing, click model training for search and ads, and distributed language model training.

A Sample

// Assumes the MapReduce Lite headers, which declare Mapper, BatchReducer,
// ReduceInputIterator, the REGISTER_* macros, SplitStringUsing and LOG,
// are already included.
#include <sstream>
#include <string>
#include <vector>

using mapreduce_lite::Mapper;
using mapreduce_lite::BatchReducer;
using mapreduce_lite::ReduceInputIterator;

// Emits (word, "1") for every space-separated word in the input value.
class WordCountMapper : public Mapper {
 public:
  void Map(const std::string& key, const std::string& value) {
    std::vector<std::string> words;
    SplitStringUsing(value, " ", &words);
    for (size_t i = 0; i < words.size(); ++i) {
      Output(words[i], "1");
    }
  }
};
REGISTER_MAPPER(WordCountMapper);

// Sums all the counts emitted for one word.
class WordCountBatchReducer : public BatchReducer {
 public:
  void Reduce(const std::string& key, ReduceInputIterator* values) {
    int sum = 0;
    LOG(INFO) << "key:[" << key << "]";
    for (; !values->Done(); values->Next()) {
      std::istringstream parser(values->value());
      int count = 0;
      parser >> count;
      sum += count;
    }
    std::ostringstream formatter;
    formatter << key << " " << sum;
    Output(key, formatter.str());
  }
};
REGISTER_BATCH_REDUCER(WordCountBatchReducer);
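To see what the sample computes without building against the library, the self-contained sketch below simulates the same word count in a single process. `MapLine` plays the role of `WordCountMapper`, and `BatchWordCount` groups every value by key before summing, as batch reduction does; both names are hypothetical. Splitting with `std::istringstream` treats any run of whitespace as a separator, which may differ slightly from `SplitStringUsing`.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Map phase: split a line on whitespace and emit (word, "1").
std::vector<std::pair<std::string, std::string>> MapLine(const std::string& line) {
  std::vector<std::pair<std::string, std::string>> out;
  std::istringstream in(line);
  std::string word;
  while (in >> word) out.emplace_back(word, "1");
  return out;
}

// Shuffle + batch reduce: collect all values under their key (std::map keeps
// keys sorted), then reduce each group exactly once by summing its counts.
std::map<std::string, int> BatchWordCount(const std::vector<std::string>& lines) {
  std::map<std::string, std::vector<std::string>> groups;
  for (const auto& line : lines) {
    for (const auto& kv : MapLine(line)) groups[kv.first].push_back(kv.second);
  }

  std::map<std::string, int> result;
  for (const auto& group : groups) {
    int sum = 0;
    for (const auto& v : group.second) sum += std::stoi(v);
    result[group.first] = sum;
  }
  return result;
}
```

For example, `BatchWordCount({"the cat", "the  dog"})` yields `the` mapped to 2 and `cat` and `dog` mapped to 1 each.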

Install

Please refer to the HowToInstall document.

Updates

  1. 2013-10-4: MapReduce Lite supports Mac OS X and FreeBSD in addition to Linux. You can build your MapReduce Lite programs using GCC or Clang.
