Learning Hadoop Together: Partial Sorting in Hadoop with a Custom Partitioner - summer哥 - cnblogs
Source: https://www.cnblogs.com/airnew/p/9574309.html
Sorting is needed in many business scenarios. This article shows how to implement partial sorting in Hadoop with a custom Partitioner class. As before, the sorting code is implemented in both Java and Python.
1. Partial sorting
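Partial sorting means that each output file is sorted on its own, independently of the other files. Many business scenarios only need partial sorting rather than a global sort. For example, a fruit e-commerce site wants to rank fruit sales for each month: we can split the reduce output into 12 files, one for each month from January to December. Each file is sorted by sales from high to low, and the ranking for January has nothing to do with the rankings for the other months.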
The raw data has three fields: the fruit name, the sales month (in yyyymm form), and the sales volume:
Apple 201701 20
Pear 201701 30
Banana 201701 40
Orange 201701 90
Apple 201702 50
Pear 201702 60
Banana 201702 20
Orange 201702 10
Apple 201703 230
Pear 201703 302
Banana 201703 140
Orange 201703 290
Apple 201704 30
Pear 201704 102
Banana 201704 240
Orange 201704 190
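Partial sorting produces 12 files, one per reduce task; with the sample data above only the files for months 1 to 4 have content. Within each file, sales are sorted from high to low. The file for 201703, for example, contains: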
Pear 302
Orange 290
Apple 230
Banana 140
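To check the expected output without a cluster, the sketch below simulates partial sorting in plain Java: it groups the records by year-month (standing in for the partitioner) and sorts each group by sales, highest first, independently of the other groups. The class and method names (`PartialSortDemo`, `partialSort`) are hypothetical, and an in-memory `TreeMap` stands in for the shuffle; this is a model of the idea, not the MapReduce job itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of partial sorting (no Hadoop involved).
class PartialSortDemo {

    // One parsed record: fruit name and sales volume.
    static final class Sale {
        final String fruit;
        final long sales;

        Sale(String fruit, long sales) {
            this.fruit = fruit;
            this.sales = sales;
        }
    }

    // Groups "fruit yyyymm sales" lines by yyyymm (the "partition"), then sorts
    // each group by sales in descending order, independently of the other groups.
    static Map<String, List<String>> partialSort(List<String> lines) {
        Map<String, List<Sale>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            groups.computeIfAbsent(f[1], k -> new ArrayList<>())
                  .add(new Sale(f[0], Long.parseLong(f[2])));
        }
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<Sale>> e : groups.entrySet()) {
            List<Sale> group = e.getValue();
            group.sort(Comparator.comparingLong((Sale s) -> s.sales).reversed());
            List<String> rows = new ArrayList<>();
            for (Sale s : group) {
                rows.add(s.fruit + " " + s.sales);
            }
            result.put(e.getKey(), rows);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList(
            "Apple 201701 20", "Pear 201701 30", "Banana 201701 40", "Orange 201701 90",
            "Apple 201702 50", "Pear 201702 60", "Banana 201702 20", "Orange 201702 10",
            "Apple 201703 230", "Pear 201703 302", "Banana 201703 140", "Orange 201703 290",
            "Apple 201704 30", "Pear 201704 102", "Banana 201704 240", "Orange 201704 190");
        // Print each group as if it were a separate output file
        for (Map.Entry<String, List<String>> e : partialSort(data).entrySet()) {
            System.out.println("== " + e.getKey() + " ==");
            e.getValue().forEach(System.out::println);
        }
    }
}
```

For 201703 this yields Pear 302, Orange 290, Apple 230, Banana 140, matching the listing above.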
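Implementation approach:
1. Write a custom Partitioner class. A year has 12 months, so 12 partitions are needed; the MapReduce driver class must also specify the Partitioner class and the number of partitions (that is, the number of reduce tasks).
2. In the map function, use the year-month as the key and turn the value into the form "Apple_20".
3. In the reduce function, compare the sales of each fruit and sort them from high to low.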
The Java code is as follows. Map class:
```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PartSortMap extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString(); // one input line, e.g. "Apple 201701 30"
        String[] str = line.split(" ");
        // The year-month is the key, because partitions are assigned by key;
        // the value is "<fruit>_<sales>", e.g. "Apple_30", which the reducer splits on "_"
        context.write(new Text(str[1]), new Text(str[0] + "_" + str[2]));
    }
}
```
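The mapper and reducer must agree on the value format "Apple_20" from step 2, because the reducer splits each value on `_`. The plain-Java sketch below isolates that string handling so it can be checked without Hadoop; the class `MapLogicDemo` and its methods are hypothetical helpers, not part of the job.

```java
// Hypothetical helpers mirroring the string handling of the map and reduce
// functions, without any Hadoop types.
class MapLogicDemo {

    // "Apple 201701 20" -> {"201701", "Apple_20"}: year-month as key, "<fruit>_<sales>" as value
    static String[] mapRecord(String line) {
        String[] str = line.trim().split(" ");
        if (str.length != 3) {
            throw new IllegalArgumentException("expected 'fruit yyyymm sales' but got: " + line);
        }
        return new String[] {str[1], str[0] + "_" + str[2]};
    }

    // "Apple_20" -> {"Apple", "20"}; splits on the LAST '_' so the sales part stays intact
    static String[] splitValue(String value) {
        int i = value.lastIndexOf('_');
        return new String[] {value.substring(0, i), value.substring(i + 1)};
    }

    public static void main(String[] args) {
        String[] kv = mapRecord("Apple 201701 20");
        System.out.println(kv[0] + "\t" + kv[1]);
        String[] parts = splitValue(kv[1]);
        System.out.println(parts[0] + " sold " + parts[1]);
    }
}
```

Splitting on the last `_` is a small deviation from a plain `split("_")`: it keeps working if a fruit name itself contains an underscore.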
Custom Partitioner class:
```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class PartParttition extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        String yearMonth = key.toString();                    // e.g. "201701"
        int month = Integer.parseInt(yearMonth.substring(4)); // 1..12
        if (month >= 1 && month <= 12) {
            // Equivalent to one branch per month: January -> 1 % n, ..., December -> 12 % n
            return month % numPartitions;
        }
        return 0; // any unexpected month value goes to partition 0
    }
}
```
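With 12 reduce tasks, the partitioner maps each month to `month % 12`, so December (12 % 12 = 0) goes to the first reducer: January lands in part-r-00001 and December in part-r-00000. If you would rather have January in part-r-00000, one option is `(month - 1) % numPartitions`. The plain-Java sketch below (hypothetical class and method names, no Hadoop dependency) prints both mappings side by side.

```java
// Compares the month % n mapping with a hypothetical zero-based alternative ((month - 1) % n).
class PartitionMappingDemo {

    // "201712" -> 12
    static int monthOf(String yearMonth) {
        return Integer.parseInt(yearMonth.substring(4));
    }

    // Rule used in the article: months 1..12 -> month % n, anything else -> 0
    static int articlePartition(String yearMonth, int n) {
        int m = monthOf(yearMonth);
        return (m >= 1 && m <= 12) ? m % n : 0;
    }

    // Alternative: January -> 0, ..., December -> 11 (with n = 12)
    static int zeroBasedPartition(String yearMonth, int n) {
        int m = monthOf(yearMonth);
        return (m >= 1 && m <= 12) ? (m - 1) % n : 0;
    }

    public static void main(String[] args) {
        for (int m = 1; m <= 12; m++) {
            String ym = String.format("2017%02d", m);
            System.out.printf("%s -> part-r-%05d (month %% n), part-r-%05d ((month - 1) %% n)%n",
                    ym, articlePartition(ym, 12), zeroBasedPartition(ym, 12));
        }
    }
}
```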
Reduce class:
```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class PartSortReduce extends Reducer<Text, Text, Text, Text> {

    static class FruitSales implements Comparable<FruitSales> {
        private String name; // fruit name
        private long sales;  // sales volume (whole numbers in this data set)

        public void setName(String name) { this.name = name; }
        public String getName() { return this.name; }
        public void setSales(long sales) { this.sales = sales; }
        public long getSales() { return this.sales; }

        @Override
        public int compareTo(FruitSales o) {
            // Reversed comparison so that higher sales sort first
            return Long.compare(o.getSales(), this.getSales());
        }
    }

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        List<FruitSales> fruitList = new ArrayList<FruitSales>();
        for (Text value : values) {
            String[] str = value.toString().split("_"); // "Apple_20" -> ["Apple", "20"]
            FruitSales f = new FruitSales();
            f.setName(str[0]);
            f.setSales(Long.parseLong(str[1]));
            fruitList.add(f);
        }
        Collections.sort(fruitList); // highest sales first
        for (FruitSales f : fruitList) {
            context.write(new Text(f.getName()), new Text(String.valueOf(f.getSales())));
        }
    }
}
```
Driver class:
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class PartSortMain {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Read the runtime arguments, usually passed in from a shell script
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Both an input path and an output path are required");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "PartSort app");
        job.setJarByClass(PartSortMain.class);

        // Input path on HDFS, passed in from the script
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        // Output path; MapReduce writes its results to files under this directory
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        job.setPartitionerClass(PartParttition.class); // custom Partitioner class
        job.setNumReduceTasks(12);                     // one reduce task per partition

        job.setMapperClass(PartSortMap.class);         // class implementing map()
        job.setReducerClass(PartSortReduce.class);     // class implementing reduce()

        // Output key/value types (also used for the map output, since they are not set separately)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
運(yùn)行后會(huì)在hdfs中生成12個(gè)文件,如下圖所示:
查看其中的一個(gè)文件會(huì)看到如下的內(nèi)容:
可以看到是按照銷量從高到低排序屡江。
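Implementing partial sorting in Python
Python runs MapReduce through Hadoop Streaming. Unlike the Java version, you cannot write a custom Partitioner class in Python, but the job script can specify which field is used for partitioning and which fields are used for sorting.
The figure below shows how the data changes during partial sorting: the raw data passes through the map function, then the reduce function, and each final file ends up sorted by sales from high to low:
In the figure, step one happens in the map function: the year-month in the second column of the raw data is reduced to just the month, which is used for partitioning; the sales become the key and the fruit name the value. More precisely, the key is 99999 minus the sales, so that the framework's ascending sort on the key yields sales in descending order. Step two is the data after MapReduce's sort, just before it reaches the reduce function. Step three happens in the reduce function: the map output's key is converted back to the sales figure and becomes the reduce output's value, while the map output's value (the fruit name) becomes the reduce output's key.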
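The inversion in step one is what turns Hadoop's ascending key sort into a descending sales order. The sketch below models it in Java; the class `InvertedKeyDemo` is hypothetical, and Java's `String` ordering stands in for the byte-wise comparison of the text key, which agrees with it for ASCII digits. It also exposes a limitation of the trick as written: once sales reach 90000, 99999 minus the sales has fewer than five digits and the text ordering breaks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Models the inverted-key trick used by map_sort.py and reduce_sort.py.
class InvertedKeyDemo {

    static final long BASE = 99999; // same constant as in the Python scripts

    static String encode(long sales) {
        return String.valueOf(BASE - sales);
    }

    static long decode(String key) {
        return BASE - Long.parseLong(key);
    }

    // Encodes each sale as "<BASE - sales>\t<fruit>", sorts the lines as text
    // (ascending, as the shuffle would), then decodes them back to "<fruit> <sales>".
    static List<String> sortBySalesDesc(String[] fruits, long[] sales) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < fruits.length; i++) {
            lines.add(encode(sales[i]) + "\t" + fruits[i]);
        }
        Collections.sort(lines);
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split("\t");
            out.add(f[1] + " " + decode(f[0]));
        }
        return out;
    }

    public static void main(String[] args) {
        // March 2017 from the sample data
        System.out.println(sortBySalesDesc(
                new String[] {"Apple", "Pear", "Banana", "Orange"},
                new long[] {230, 302, 140, 290}));
        // 90000 encodes to "9999", which sorts after "99979" as text: wrong order
        System.out.println(sortBySalesDesc(new String[] {"A", "B"}, new long[] {20, 90000}));
    }
}
```

The first call prints `[Pear 302, Orange 290, Apple 230, Banana 140]`; the second prints `[A 20, B 90000]`, showing the breakage. Padding the encoded value to a fixed width (for example `"%06d" %` formatting in the Python mapper) keeps the text order correct for larger values.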
The code is as follows:
map_sort.py
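```python
#!/usr/bin/python
import sys

base_number = 99999  # inverting sales makes an ascending key sort yield descending sales

for line in sys.stdin:
    if not line.strip():
        continue  # skip blank lines
    ss = line.strip().split(' ')  # e.g. "Apple 201701 20"
    fruit = ss[0]
    yearmm = ss[1]
    sales = ss[2]
    new_key = base_number - int(sales)
    mm = yearmm[4:6]
    # Output: month \t inverted sales \t fruit
    print("%s\t%s\t%s" % (int(mm), new_key, fruit))
```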
reduce_sort.py
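```python
#!/usr/bin/python
import sys

base_number = 99999  # must match the value used in map_sort.py

for line in sys.stdin:
    # Key fields (month, inverted sales) followed by the value (fruit)
    month, inv_sales, fruit = line.strip().split('\t')
    sales = base_number - int(inv_sales)  # undo the mapper's inversion
    print('\t'.join([fruit, str(sales)]))
```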
The job script is as follows:
run.sh
```shell
set -e -x
HADOOP_CMD="/usr/local/src/hadoop-2.6.1/bin/hadoop"
STREAM_JAR_PATH="/usr/local/src/hadoop-2.6.1/share/hadoop/tools/lib/hadoop-streaming-2.6.1.jar"
INPUT_FILE_PATH_A="/data/fruit.txt"
OUTPUT_SORT_PATH="/output_sort"

# Delete the previous output directory; the job fails if it already exists
$HADOOP_CMD fs -rm -r -f $OUTPUT_SORT_PATH

$HADOOP_CMD jar $STREAM_JAR_PATH \
    -input $INPUT_FILE_PATH_A \
    -output $OUTPUT_SORT_PATH \
    -mapper "python map_sort.py" \
    -reducer "python reduce_sort.py" \
    -file ./map_sort.py \
    -file ./reduce_sort.py \
    -jobconf mapred.reduce.tasks=12 \
    -jobconf stream.num.map.output.key.fields=2 \
    -jobconf num.key.fields.for.partition=1 \
    -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
```
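`-jobconf stream.num.map.output.key.fields=2` makes the first two tab-separated fields of each map output line (the month and the inverted sales) the key, and the rest (the fruit name) the value. The framework sorts records by this key, so within a partition they are ordered by the second field, the inverted sales.
`-jobconf num.key.fields.for.partition=1` makes the partitioner use only the first key field, the month, to choose a partition.
`-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner` selects the KeyFieldBasedPartitioner class that ships with Hadoop to perform the partitioning.
All three settings are required to implement partitioning with Streaming.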
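Unlike the Java partitioner, KeyFieldBasedPartitioner picks a reducer by hashing the partition field, so 12 reducers do not guarantee one month per file. The sketch below is a rough model under an assumption you should verify against the KeyFieldBasedPartitioner source of your Hadoop version: that the hash is a 31-based polynomial over the field's bytes starting from 0, masked to non-negative and taken modulo the reducer count. For ASCII text that equals `String.hashCode()`. The class `HashPartitionDemo` is hypothetical.

```java
// Rough model (assumption, see above) of hash partitioning on the month field
// that map_sort.py prints as "1".."12" without leading zeros.
class HashPartitionDemo {

    static int partitionOf(String field, int numReduceTasks) {
        return (field.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (int m = 1; m <= 12; m++) {
            System.out.println("month " + m + " -> reducer " + partitionOf(String.valueOf(m), 12));
        }
    }
}
```

Under this model, months 10, 11 and 12 share reducers 7, 8 and 9 with months 7, 8 and 9, while reducers 0, 10 and 11 receive nothing. The sample data only covers months 1 to 4, so the effect does not show up there, but with a full year each shared file would mix two months, and reduce_sort.py drops the month column. If exactly one month per file matters, the Java version's explicit Partitioner is the more predictable choice.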
Summary:
Partial sorting in Hadoop is achieved mainly through partitioning.
The Java version implements partitioning with a custom Partitioner class, while Streaming uses the KeyFieldBasedPartitioner class, with the partitioner and its key fields specified in the job script.