Don't Keep Talking About Hadoop, Your Data Just Isn't That Big - 极客头条 - CSDN.NET
http://geek.csdn.net/news/detail/2780
I recommend Scalding rather than Hive or Pig, because you can write cascading Hadoop jobs in Scala while the low-level MapReduce details stay hidden.
twitter/scalding: A Scala API for Cascading
https://github.com/twitter/scalding
Scalding is a Scala library that makes it easy to specify Hadoop MapReduce jobs. Scalding is built on top of Cascading, a Java library that abstracts away low-level Hadoop details. Scalding is comparable to Pig, but offers tight integration with Scala, bringing advantages of Scala to your MapReduce jobs.
package com.twitter.scalding.examples
import com.twitter.scalding._
import com.twitter.scalding.source.TypedText
class WordCountJob(args: Args) extends Job(args) {
TypedPipe.from(TextLine(args("input")))
.flatMap { line => tokenize(line) }
.groupBy { word => word } // use each word for a key
.size // in each group, get the size
.write(TypedText.tsv(String, Long))
// Split a piece of text into individual words.
def tokenize(text: String): Array[String] = {
// Lowercase each word and remove punctuation.
text.toLowerCase.replaceAll("[^a-zA-Z0-9\s]", "").split("\s+")
}
}
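A minimal way to try the job locally, as a sketch: this assumes the repository's scripts/scald.rb helper script and uses placeholder input/output file names.

# run the word count on the local machine rather than a Hadoop cluster (file names are placeholders)
scripts/scald.rb --local WordCountJob.scala --input someInputfile.txt --output ./someOutputFile.tsv

The same job can be submitted to a cluster by swapping --local for --hdfs, which is the point of the abstraction: the Scala pipeline above does not change, only where it runs.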