Kafka Streams Introduction (Part 1)
Recently at work I have frequently needed to aggregate large volumes of heartbeat data and then compute different kinds of intermediate data sets from those aggregates for downstream processing.
My original approach was to write a number of jobs myself and coordinate their progress with ZooKeeper and the like. The problem with that model is that it takes a lot of code to guarantee data is not consumed twice, and when the program hits an exception some data can still be lost.
So I am looking for a mainstream stream-processing solution, one that also supports batch processing of historical data.
I compared Spark, Storm, and Kafka Streams. I have no hands-on big-data experience at all, but my impression is that the first two are considerably more mature, while Kafka Streams is newer and has relatively few resources available.
However, the first two are framework-level solutions. Taking Spark as an example, you generally need to deploy a dedicated Spark cluster (unless your organization already has one you can use), which we do not have here, and the hardware requirements for building one are high.
Kafka Streams, by contrast, is just a library: its only dependency is Kafka itself and its hardware footprint is small, so I decided to study it first and, if it proves feasible, put it into production.
Below is an excerpt of the simplest getting-started example (Confluent's WordCount example).
I will continue to flesh this out in later posts.
package io.confluent.examples.streams;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;
import java.util.Arrays;
import java.util.Properties;
import java.util.regex.Pattern;
/**
* Demonstrates, using the high-level KStream DSL, how to implement the WordCount program that
* computes a simple word occurrence histogram from an input text. This example uses lambda
* expressions and thus works with Java 8+ only.
* <p>
* In this example, the input stream reads from a topic named "TextLinesTopic", where the values of
* messages represent lines of text; and the histogram output is written to topic
* "WordsWithCountsTopic", where each record is an updated count of a single word, i.e. {@code word (String) -> currentCount (Long)}.
* <p>
* Note: Before running this example you must 1) create the source topic (e.g. via {@code kafka-topics --create ...}),
* then 2) start this example and 3) write some data to the source topic (e.g. via {@code kafka-console-producer}).
* Otherwise you won't see any data arriving in the output topic.
* <p>
* <br>
* HOW TO RUN THIS EXAMPLE
* <p>
* 1) Start ZooKeeper and Kafka. Please refer to the Kafka/Confluent QuickStart guide.
* <p>
* 2) Create the input and output topics used by this example.
* <pre>
* {@code
* $ bin/kafka-topics --create --topic TextLinesTopic \
* --zookeeper localhost:2181 --partitions 1 --replication-factor 1
* $ bin/kafka-topics --create --topic WordsWithCountsTopic \
* --zookeeper localhost:2181 --partitions 1 --replication-factor 1
* }</pre>
* Note: The above commands are for the Confluent Platform. For Apache Kafka it should be {@code bin/kafka-topics.sh ...}.
* <p>
* 3) Start this example application either in your IDE or on the command line.
* <p>
* If via the command line, please refer to the Packaging instructions for the Confluent Kafka Streams examples.
* Once packaged you can then run:
* <pre>
* {@code
* $ java -cp target/streams-examples-3.3.0-standalone.jar io.confluent.examples.streams.WordCountLambdaExample
* }</pre>
* 4) Write some input data to the source topic "TextLinesTopic" (e.g. via {@code kafka-console-producer}).
* The already running example application (step 3) will automatically process this input data and write the
* results to the output topic "WordsWithCountsTopic".
* <pre>
* {@code
* # Start the console producer. You can then enter input data by writing some line of text, followed by ENTER:
* #
* # hello kafka streams<ENTER>
* # all streams lead to kafka<ENTER>
* # join kafka summit<ENTER>
* #
* # Every line you enter will become the value of a single Kafka message.
* $ bin/kafka-console-producer --broker-list localhost:9092 --topic TextLinesTopic
* }</pre>
* 5) Inspect the resulting data in the output topic, e.g. via {@code kafka-console-consumer}.
* <pre>
* {@code
* $ bin/kafka-console-consumer --topic WordsWithCountsTopic --from-beginning \
* --new-consumer --bootstrap-server localhost:9092 \
* --property print.key=true \
* --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
* }</pre>
* You should see output data similar to below. Please note that the exact output
* sequence will depend on how fast you type the above sentences. If you type them
* slowly, you are likely to get each count update, e.g., kafka 1, kafka 2, kafka 3.
* If you type them quickly, you are likely to get fewer count updates, e.g., just kafka 3.
* This is because the commit interval is set to 10 seconds. Anything typed within
* that interval will be compacted in memory.
* <pre>
* {@code
* hello 1
* kafka 1
* streams 1
* all 1
* streams 2
* lead 1
* to 1
* join 1
* kafka 3
* summit 1
* }</pre>
* 6) Once you're done with your experiments, you can stop this example via {@code Ctrl-C}. If needed,
* also stop the Kafka broker ({@code Ctrl-C}), and only then stop the ZooKeeper instance ({@code Ctrl-C}).
*/
public class WordCountLambdaExample {

  public static void main(final String[] args) throws Exception {
    final String bootstrapServers = args.length > 0 ? args[0] : "localhost:9092";
    final Properties streamsConfiguration = new Properties();
    // Give the Streams application a unique name. The name must be unique in the Kafka cluster
    // against which the application is run.
    streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-lambda-example");
    streamsConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG, "wordcount-lambda-example-client");
    // Where to find Kafka broker(s).
    streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    // Specify default (de)serializers for record keys and for record values.
    streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    // Records should be flushed every 10 seconds. This is less than the default
    // in order to keep this example interactive.
    streamsConfiguration.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10 * 1000);
    // For illustrative purposes we disable record caches.
    streamsConfiguration.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);

    // Set up serializers and deserializers, which we will use for overriding the default serdes
    // specified above.
    final Serde<String> stringSerde = Serdes.String();
    final Serde<Long> longSerde = Serdes.Long();

    // In the subsequent lines we define the processing topology of the Streams application.
    final KStreamBuilder builder = new KStreamBuilder();

    // Construct a `KStream` from the input topic "TextLinesTopic", where message values
    // represent lines of text (for the sake of this example, we ignore whatever may be stored
    // in the message keys).
    //
    // Note: We could also just call `builder.stream("TextLinesTopic")` if we wanted to leverage
    // the default serdes specified in the Streams configuration above, because these defaults
    // match what's in the actual topic. However we explicitly set the deserializers in the
    // call to `stream()` below in order to show how that's done, too.
    final KStream<String, String> textLines = builder.stream(stringSerde, stringSerde, "TextLinesTopic");

    final Pattern pattern = Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);

    final KTable<String, Long> wordCounts = textLines
      // Split each text line, by whitespace, into words. The text lines are the record
      // values, i.e. we can ignore whatever data is in the record keys and thus invoke
      // `flatMapValues()` instead of the more generic `flatMap()`.
      .flatMapValues(value -> Arrays.asList(pattern.split(value.toLowerCase())))
      // Count the occurrences of each word (record key).
      //
      // This will change the stream type from `KStream<String, String>` to `KTable<String, Long>`
      // (word -> count). In the `count` operation we must provide a name for the resulting KTable,
      // which will be used to name e.g. its associated state store and changelog topic.
      //
      // Note: no need to specify explicit serdes because the resulting key and value types match our default serde settings.
      .groupBy((key, word) -> word)
      .count("Counts");

    // Write the `KTable<String, Long>` to the output topic.
    wordCounts.to(stringSerde, longSerde, "WordsWithCountsTopic");

    // Now that we have finished the definition of the processing topology we can actually run
    // it via `start()`. The Streams application as a whole can be launched just like any
    // normal Java application that has a `main()` method.
    final KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);

    // Always (and unconditionally) clean local state prior to starting the processing topology.
    // We opt for this unconditional call here because this will make it easier for you to play around with the example
    // when resetting the application for doing a re-run (via the Application Reset Tool,
    // http://docs.confluent.io/current/streams/developer-guide.html#application-reset-tool).
    //
    // The drawback of cleaning up local state prior is that your app must rebuild its local state from scratch, which
    // will take time and will require reading all the state-relevant data from the Kafka cluster over the network.
    // Thus in a production scenario you typically do not want to clean up always as we do here but rather only when it
    // is truly needed, i.e., only under certain conditions (e.g., the presence of a command line flag for your app).
    // See `ApplicationResetExample.java` for a production-like example.
    streams.cleanUp();
    streams.start();

    // Add shutdown hook to respond to SIGTERM and gracefully close Kafka Streams.
    Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
  }

}
The logic here is, roughly speaking, to continuously read the lines of text entered by users from the Kafka input topic TextLinesTopic:
final KStream<String, String> textLines = builder.stream(stringSerde, stringSerde, "TextLinesTopic");
Then each input line is split into individual words with a regular expression, and flatMapValues turns the line stream into a stream of words:
flatMapValues(value -> Arrays.asList(pattern.split(value.toLowerCase())))
Next, the records are grouped by word:
groupBy((key, word) -> word)
Finally, count materializes the grouped KStream into a KTable of per-word counts (backed by the state store named "Counts"), and the result can later be queried, for example via KSQL.
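Besides being written to the output topic, the local state store named in count("Counts") can also be read from inside the running application through Kafka Streams' interactive queries (available since Kafka 0.10.1). A minimal sketch, assuming the KafkaStreams instance from the example above has already been started and reached the RUNNING state (the helper class and method names are only for illustration):

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public final class CountsStoreQuery {

  // Print the current count for a single word by querying the local "Counts" store.
  public static void printCount(final KafkaStreams streams, final String word) {
    // Obtain a read-only view of the key-value store backing the KTable created by count("Counts").
    final ReadOnlyKeyValueStore<String, Long> counts =
      streams.store("Counts", QueryableStoreTypes.<String, Long>keyValueStore());
    // get() returns null if the word has not been counted yet.
    final Long count = counts.get(word);
    System.out.println(word + " -> " + (count == null ? 0L : count));
  }
}

For example, after entering the sample sentences from the Javadoc, CountsStoreQuery.printCount(streams, "kafka") should print kafka -> 3.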
One thing to remember: the Streams application only starts processing once it has been started:
streams.start();
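To push some test data into TextLinesTopic from Java instead of using kafka-console-producer (step 4 in the Javadoc above), a minimal producer sketch; the class name TextLinesProducer and the localhost:9092 broker address are only assumptions for illustration:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public final class TextLinesProducer {

  public static void main(final String[] args) {
    final Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

    // Each line of text becomes the value of one Kafka message, just like with the console producer.
    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      producer.send(new ProducerRecord<>("TextLinesTopic", "hello kafka streams"));
      producer.send(new ProducerRecord<>("TextLinesTopic", "all streams lead to kafka"));
      producer.send(new ProducerRecord<>("TextLinesTopic", "join kafka summit"));
      producer.flush();
    }
  }
}

The resulting word counts then appear on WordsWithCountsTopic, as described in step 5 of the Javadoc.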