Question
Use MapReduce to count the frequency of each word.
Example
chunk1: "Google Bye GoodBye Hadoop code"
chunk2: "lintcode code Bye"
Expected MapReduce result:
Bye: 2
GoodBye: 1
Google: 1
Hadoop: 1
code: 2
lintcode: 1
Solution
This exercises the basic map and reduce operations of MapReduce.
class WordCount:

    # @param {str} line a text, for example "Bye Bye see you next"
    def mapper(self, _, line):
        # Write your code here
        # Please use 'yield key, value'
        # This is really just plain word-frequency counting, but because
        # we use yield, the emitted pairs are collected into a buffer.
        for word in line.split():
            yield word, 1

    # @param key is from mapper
    # @param values is a set of value with the same key
    def reducer(self, key, values):
        # Write your code here
        # Please use 'yield key, value'
        # values is a group of numbers: how many times key appeared
        # in the different mappers or chunks.
        yield key, sum(values)
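
To check the solution against the example without a real MapReduce runtime, the map, shuffle, and reduce phases can be imitated locally. The sketch below is only an approximation: `run_local` is a hypothetical helper, not part of LintCode or Hadoop, and it assumes the framework groups all values by key before calling `reducer` and emits keys in sorted order.

```python
from collections import defaultdict


class WordCount:
    def mapper(self, _, line):
        # Emit (word, 1) for every whitespace-separated word in the line.
        for word in line.split():
            yield word, 1

    def reducer(self, key, values):
        # Add up all the 1s emitted for this word.
        yield key, sum(values)


def run_local(job, chunks):
    """Hypothetical stand-in for the framework: map, shuffle, then reduce."""
    grouped = defaultdict(list)

    # Map phase: run the mapper on each chunk and group values by key
    # (this grouping plays the role of the shuffle step).
    for chunk in chunks:
        for key, value in job.mapper(None, chunk):
            grouped[key].append(value)

    # Reduce phase: call the reducer once per key, in sorted key order.
    result = {}
    for key in sorted(grouped):
        for out_key, out_value in job.reducer(key, grouped[key]):
            result[out_key] = out_value
    return result


chunks = ["Google Bye GoodBye Hadoop code", "lintcode code Bye"]
counts = run_local(WordCount(), chunks)
for word, count in counts.items():
    print(f"{word}: {count}")
```

With the two chunks from the example, this prints the same six lines as the expected result. In a real cluster the mappers run in parallel on separate chunks, and the grouping happens in the framework's shuffle step rather than in a single dictionary.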