Install the Java development extensions in VS Code
Install the Extension Pack for Java, which bundles the extensions needed for Java development.
Install Maven
brew install maven
Create the WordCount project
In the VS Code Command Palette, run Maven: Create Maven Project and create the wordcount project from the maven-archetype-quickstart archetype.
Add the hadoop-core and hadoop-common dependencies to pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.henryme</groupId>
    <artifactId>wordcount</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>wordcount</name>
    <url>http://maven.apache.org</url>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.3.4</version>
        </dependency>
    </dependencies>
</project>
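Implementation
A MapReduce job can be split into two processing phases: the map phase and the reduce phase. Each phase takes key-value pairs as input and produces key-value pairs as output.
For example, for the string a a b, we expect WordCount to produce the final result: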
a 2
b 1
The output of the map function is:
a 1
a 1
b 1
The reduce phase then sums the values that share the same key to produce the final result.
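The two phases can be sketched in plain Java without any Hadoop classes. This is only an illustration of the data flow for the a a b example; the class name WordCountSketch and its map and reduce helpers are hypothetical and are not part of the project.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java illustration of the map and reduce phases (hypothetical helper, no Hadoop involved).
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Reduce phase: sum the values that share a key.
    // A TreeMap keeps the keys sorted, so the printed order is deterministic.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapped = map("a a b");
        System.out.println(mapped);         // [a=1, a=1, b=1]
        System.out.println(reduce(mapped)); // {a=2, b=1}
    }
}
```

In a real job the framework groups the pairs by key between the two phases; here the grouping is folded into the merge call for brevity.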
Mapper class implementation
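// Imports used by this fragment (place them at the top of WordCount.java):
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Nested inside the WordCount class.
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    // Reused across calls to avoid allocating a new object per token.
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    protected void map(Object key, Text value, Mapper<Object, Text, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        // Split the input line on whitespace and emit (word, 1) for each token.
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}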
Reducer class implementation
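// Imports used by this fragment (place them at the top of WordCount.java):
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Nested inside the WordCount class.
public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Reducer<Text, IntWritable, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        // Sum every count emitted for this key.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}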
WordCount implementation
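// Imports used by this fragment (place them at the top of WordCount.java):
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public static void main(String[] args) throws Exception {
    if (args.length != 2) {
        System.err.println("Usage: WordCount <input path> <output path>");
        System.exit(-1);
    }
    Job job = new Job();
    // Lets Hadoop locate the jar containing this class when the job is submitted to a cluster.
    job.setJarByClass(WordCount.class);
    job.setJobName("word count");
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    // Types of the final (reducer) output key and value.
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Block until the job finishes; exit with 0 on success and 1 on failure.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}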
Running
To run from VS Code, add the following configuration to launch.json:
{
    "type": "java",
    "name": "Launch WordCount",
    "request": "launch",
    "mainClass": "com.henryme.WordCount",
    "projectName": "wordcount",
    "args": [
        "~/Desktop/input",
        "~/Desktop/output"
    ]
}
WordCount writes its results to the file output/part-r-00000.
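With the default text output format, each line of the result file holds a key and a value separated by a tab. A minimal sketch for reading such lines back into a map is shown here; the class name OutputParserSketch and the sample lines are assumptions for illustration, and in practice the lines would come from the part-r-00000 file (for example via Files.readAllLines).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that parses "<word>\t<count>" lines such as those in part-r-00000.
class OutputParserSketch {

    static Map<String, Integer> parse(List<String> lines) {
        // LinkedHashMap preserves the order in which the lines appear.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // Skip blank or malformed lines.
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines matching the expected result for the input "a a b".
        List<String> sample = Arrays.asList("a\t2", "b\t1");
        System.out.println(parse(sample)); // {a=2, b=1}
    }
}
```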
Problems and solutions
Running the job fails with the error: Type 'org/apache/hadoop/metrics2/lib/DefaultMetricsSystem' (current frame, stack[2]) is not assignab
This is caused by the order of the hadoop-core and hadoop-common dependencies. Swapping the two in pom.xml fixes it.
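When two jars on the classpath may both provide the same class, one way to see which copy is actually loaded is to print the class's code source. The sketch below is a generic diagnostic, not something from the original project: the class name ClassOriginSketch and the originOf helper are hypothetical, and it assumes it is run with the project's classpath so that the Hadoop class can be found.

```java
import java.security.CodeSource;

// Hypothetical diagnostic: report which jar or directory a class was loaded from.
class ClassOriginSketch {

    static String originOf(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, ClassOriginSketch.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null) {
                return "(no code source, e.g. a core JDK class)";
            }
            return String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // The class named in the error message above.
        System.out.println(originOf("org.apache.hadoop.metrics2.lib.DefaultMetricsSystem"));
    }
}
```

Running it before and after swapping the dependency order should show whether the printed location changes.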