Stop going on about Hadoop, your data just isn't big enough - Geek Headlines - CSDN.NET
twitter/scalding: A Scala API for Cascading
Scalding is a Scala library that makes it easy to specify Hadoop MapReduce jobs. Scalding is built on top of Cascading, a Java library that abstracts away low-level Hadoop details. Scalding is comparable to Pig, but offers tight integration with Scala, bringing advantages of Scala to your MapReduce jobs.
package com.twitter.scalding.examples
import com.twitter.scalding._
import com.twitter.scalding.source.TypedText
class WordCountJob(args: Args) extends Job(args) {
  // Read the input file line by line; the path comes from the --input argument.
  TypedPipe.from(TextLine(args("input")))
    .flatMap { line => tokenize(line) }
    .groupBy { word => word } // use each word for a key
    .size // in each group, get the size
    .write(TypedText.tsv[(String, Long)](args("output")))

  // Split a piece of text into individual words.
  def tokenize(text: String): Array[String] = {
    // Lowercase each word and remove punctuation.
    text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+")
  }
}
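
The tokenize, group, and count steps of the job above can be tried without Hadoop or Scalding by running them over plain Scala collections. The following is only a minimal sketch under that assumption: `WordCountSketch` and `countWords` are hypothetical names that are not part of Scalding, and the empty-token filter is an addition that the original job does not have.

```scala
object WordCountSketch {
  // Same tokenizer as in the job: lowercase, strip punctuation, split on whitespace.
  def tokenize(text: String): Array[String] =
    text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+")

  // Hypothetical local stand-in for flatMap + groupBy + size on an in-memory Seq.
  def countWords(lines: Seq[String]): Map[String, Long] =
    lines
      .flatMap(line => tokenize(line).toSeq)
      .filter(_.nonEmpty) // addition: split yields an empty token for blank lines
      .groupBy(word => word) // use each word for a key
      .map { case (word, occurrences) => word -> occurrences.size.toLong }

  def main(args: Array[String]): Unit = {
    val counts = countWords(Seq("Hello, world!", "hello Scalding"))
    // Print tab-separated (word, count) pairs, sorted by word.
    counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word\t$n") }
  }
}
```

Because everything here lives in memory, this only illustrates what the pipeline computes; the Scalding job expresses the same steps so that Cascading can run them as MapReduce stages over data that does not fit on one machine.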