Chapter 1 Introduction to Apache Flink: Quick start setup

Quick start setup

Now that we understand the details of Flink's architecture and its process model, it's time to get started with a quick setup and try things out on our own. Flink works on both Windows and Linux machines.
The very first thing we need to do is download Flink's binaries, which are available from the Flink download page at http://flink.apache.org/downloads.html.
On the download page, you will see multiple options, as shown in the following screenshot:

[Screenshot: Flink download page]

You don't need Hadoop installed in order to install Flink. However, if you want Flink to connect to Hadoop, you need to download the binary that is compatible with your Hadoop version.

As I have Hadoop 2.7.0 installed, I am going to download the Flink binary that is compatible with Hadoop 2.7.0 and built on Scala 2.11. Here is the direct download link:
http://www-us.apache.org/dist/flink/flink-1.1.4/flink-1.1.4-bin-hadoop27-scala_2.11.tgz
(Old releases such as 1.1.4 have since been moved off the mirrors; they are kept in the Apache archive at https://archive.apache.org/dist/flink/.)

Prerequisites

Flink needs Java to be installed first, so before you start, please make sure Java is installed. I have JDK 1.8 installed on my machine:

D:\>java -version
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
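You may want to script this prerequisite check, for example before preparing several machines. The following is a minimal Python sketch of one way to do it. It assumes Python 3.7+ and that java is on the PATH. The minimum version of 8 only mirrors the JDK 1.8 used in this chapter; it is an assumption, not a documented Flink requirement. Note that java -version prints its banner on standard error, not standard output.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` output.

    Legacy versions look like "1.8.0_92" (major 8); newer ones like "11.0.2" or "17".
    """
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("no version string found in java -version output")
    parts = match.group(1).split(".")
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])                      # "1.8.0_92" -> 8
    leading_digits = re.match(r"\d+", parts[0])   # "17-ea" -> "17"
    if leading_digits is None:
        raise ValueError(f"unrecognised version: {match.group(1)}")
    return int(leading_digits.group(0))


def java_is_ok(minimum: int = 8) -> bool:
    """Run `java -version` and check that the major version is at least `minimum`."""
    try:
        # java writes its version banner to stderr, not stdout
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return False                              # java is not on the PATH
    return parse_java_major(result.stderr) >= minimum
```

Calling java_is_ok() on a machine with the JDK shown above should return True.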

Installing on Windows

Installing Flink is very easy: just extract the compressed file to the desired location.
Once it is extracted, go to the folder and execute start-local.bat:

>cd flink-1.1.4
>bin\start-local.bat

You will see that the local instance of Flink has started. You can also check the web UI at http://localhost:8081/.
You can stop the Flink process by pressing Ctrl+C.

[Screenshot: Flink web UI of the local instance at http://localhost:8081/]

Installing on Linux

As on Windows, installing Flink on Linux machines is very easy. We need to download the binary, place it in a specific folder, extract it, and start Flink:

$ sudo tar -xzf flink-1.1.4-bin-hadoop27-scala_2.11.tgz
$ cd flink-1.1.4
$ bin/start-local.sh

As on Windows, please make sure Java is installed on the machine.
Now we are all set to submit a Flink job. To stop the local Flink instance on Linux, execute the following command:

$ bin/stop-local.sh

Cluster setup

Setting up a Flink cluster is very simple as well, and anyone who has installed a Hadoop cluster before will find these steps familiar. To set up the cluster, let's assume we have four Linux machines, each with a moderate configuration; machines with at least two cores and 4 GB of RAM are a good starting point. The very first thing we need to do is choose the cluster design. As we have four machines, we will use one machine as the Job Manager and the other three machines as the Task Managers:

[Diagram: cluster design with one Job Manager and three Task Managers]

SSH configurations

In order to set up the cluster, we first need passwordless SSH connections from the Job Manager machine to the Task Manager machines. Run the following command on the Job Manager machine, as the user that will run Flink (here flinkuser), to create an SSH key pair:

ssh-keygen -t rsa

This generates the private and public keys (id_rsa and id_rsa.pub) in the /home/flinkuser/.ssh folder. Now copy the public key to each Task Manager machine and perform the following steps on the Task Manager to allow passwordless connections from the Job Manager:

sudo mkdir -p /home/flinkuser/.ssh
sudo touch /home/flinkuser/.ssh/authorized_keys
# run from the directory that contains the copied id_rsa.pub
sudo sh -c "cat id_rsa.pub >> /home/flinkuser/.ssh/authorized_keys"

Make sure the keys have restricted access by executing the following commands:

sudo chmod 700 /home/flinkuser/.ssh
sudo chmod 600 /home/flinkuser/.ssh/authorized_keys
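The Task Manager side of this setup (create .ssh, append the key, restrict the permissions) can also be expressed as a small Python sketch. This is hypothetical helper code, not part of Flink, and install_public_key is a made-up name. It assumes it runs as the target user (for example flinkuser), so no sudo is needed. It skips the key if it is already present, so the step can be re-run safely.

```python
import os
from pathlib import Path


def install_public_key(pub_key: str, ssh_dir: Path) -> bool:
    """Append pub_key to ssh_dir/authorized_keys with permissions sshd accepts.

    Returns True if the key was added and False if it was already present.
    """
    ssh_dir.mkdir(parents=True, exist_ok=True)
    os.chmod(ssh_dir, 0o700)                    # equivalent of: chmod 700 ~/.ssh
    auth_file = ssh_dir / "authorized_keys"
    auth_file.touch(exist_ok=True)
    os.chmod(auth_file, 0o600)                  # equivalent of: chmod 600 authorized_keys

    key = pub_key.strip()
    content = auth_file.read_text()
    if key in (line.strip() for line in content.splitlines()):
        return False                            # already authorized, nothing to do
    # start on a fresh line if the file does not end with a newline
    separator = "" if not content or content.endswith("\n") else "\n"
    with auth_file.open("a") as handle:
        handle.write(separator + key + "\n")
    return True
```

For example, after copying id_rsa.pub into the current directory: install_public_key(Path("id_rsa.pub").read_text(), Path.home() / ".ssh").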

Now you can test the passwordless SSH connection from the Job Manager machine. Run these commands as flinkuser, the user that owns the key; sudo ssh would use root's keys instead:

ssh <task-manager-1>
ssh <task-manager-2>
ssh <task-manager-3>

If you are using cloud service instances for the installation, please make sure that root login over SSH is enabled. To do this, log in to each machine, open the file /etc/ssh/sshd_config, and set PermitRootLogin yes. Once you save the file, restart the SSH service by executing the following command:

sudo service sshd restart
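The sshd_config change can be sketched as a pure text transformation in Python. That makes it easy to review the result before writing the file back as root. This is an illustrative helper with a made-up name. It relies on sshd using the first value it reads for a keyword, so it keeps exactly one active PermitRootLogin line. It does not handle Match blocks.

```python
import re

# Matches an active or commented-out PermitRootLogin line (keywords are case-insensitive).
_PERMIT_ROOT = re.compile(r"\s*#?\s*PermitRootLogin\b", re.IGNORECASE)


def set_permit_root_login(config_text: str, value: str = "yes") -> str:
    """Return sshd_config text with exactly one active 'PermitRootLogin <value>' line.

    sshd uses the first value it reads for a keyword, so the first matching line
    is replaced and any later ones are dropped. Match blocks are not handled.
    """
    out, replaced = [], False
    for line in config_text.splitlines():
        if _PERMIT_ROOT.match(line):
            if not replaced:
                out.append(f"PermitRootLogin {value}")
                replaced = True
            continue                      # drop duplicates after the first one
        out.append(line)
    if not replaced:
        out.insert(0, f"PermitRootLogin {value}")
    return "\n".join(out) + "\n"
```

To apply it, read /etc/ssh/sshd_config, keep a backup, write back the returned text as root, and then restart sshd as shown above.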

Java installation

Next we need to install Java on each machine. The following commands install Java on Red Hat/CentOS-based Linux machines:

wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u92-b14/jdk-8u92-linux-x64.rpm
sudo rpm -ivh jdk-8u92-linux-x64.rpm

Next we need to set up the JAVA_HOME environment variable so that Java can be accessed from everywhere. Create a java.sh file:

sudo vi /etc/profile.d/java.sh

Add the following content to it and save it:

#!/bin/bash
JAVA_HOME=/usr/java/jdk1.8.0_92
PATH=$JAVA_HOME/bin:$PATH
export PATH JAVA_HOME 
export CLASSPATH=.

Make the file executable and source it:

sudo chmod +x /etc/profile.d/java.sh 
source /etc/profile.d/java.sh

You can now check if Java is installed properly:

$ java -version
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
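A quick way to confirm that JAVA_HOME points at a real JDK on each machine is sketched below in Python. check_java_home is a hypothetical helper. It only checks that bin/java exists under the given directory, which defaults to the JAVA_HOME environment variable. That is the same layout as the /usr/java/jdk1.8.0_92 directory used above.

```python
import os
from pathlib import Path
from typing import Optional


def check_java_home(java_home: Optional[str] = None) -> Path:
    """Return the path of the java launcher under JAVA_HOME, or raise RuntimeError.

    Defaults to the JAVA_HOME environment variable set by /etc/profile.d/java.sh.
    """
    home = java_home if java_home is not None else os.environ.get("JAVA_HOME")
    if not home:
        raise RuntimeError("JAVA_HOME is not set")
    launcher = Path(home) / "bin" / ("java.exe" if os.name == "nt" else "java")
    if not launcher.is_file():
        raise RuntimeError(f"{launcher} not found; check the JAVA_HOME value")
    return launcher
```

After sourcing java.sh, check_java_home() should return /usr/java/jdk1.8.0_92/bin/java.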

Repeat these installation steps on the Job Manager and all Task Manager machines.

Flink installation

Once SSH and Java are set up, we need to download the Flink binaries and extract them into a specific folder. Note that the installation directory must be the same on all nodes.
So let's get started:

cd /usr/local
sudo wget http://www-eu.apache.org/dist/flink/flink-1.1.4/flink-1.1.4-bin-hadoop27-scala_2.11.tgz
sudo tar -xzf flink-1.1.4-bin-hadoop27-scala_2.11.tgz

Now that the Flink package is ready, we need to make a few configuration changes.

Configurations

Flink's configuration is simple: we only need to tune a few parameters. Most of the settings are the same for the Job Manager node and the Task Manager nodes, and they all live in the conf/flink-conf.yaml file.
The following is a configuration file for a Job Manager node:

jobmanager.rpc.address: localhost 
jobmanager.rpc.port: 6123 
jobmanager.heap.mb: 256
taskmanager.heap.mb: 512
taskmanager.numberOfTaskSlots: 1

You may want to change the memory settings for the Job Manager and Task Managers based on the resources of your nodes. For the Task Managers, jobmanager.rpc.address must be set to the correct Job Manager hostname or IP address.

So for all Task Managers, the configuration file should look like the following:

jobmanager.rpc.address: <jobmanager-ip-or-host>
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 256
taskmanager.heap.mb: 512
taskmanager.numberOfTaskSlots: 1

We also need to add the Java location to this file so that Flink knows exactly where to look for the Java binaries. In flink-conf.yaml this is done with the env.java.home key:

env.java.home: /usr/java/jdk1.8.0_92
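Every node needs nearly the same flink-conf.yaml, so it can help to generate the file. The Python sketch below renders only the keys used in this section. Two assumptions are worth stating. First, to our understanding Flink 1.x reads this file line by line as "key: value", so the space after the colon matters. Second, the sketch writes the same jobmanager.rpc.address on every node, including the Job Manager; this is how the Flink standalone-cluster documentation describes the setup, and with localhost the Job Manager may only listen on the loopback interface. render_flink_conf and its parameter names are hypothetical.

```python
from typing import Optional


def render_flink_conf(jobmanager_host: str,
                      java_home: Optional[str] = None,
                      jm_heap_mb: int = 256,
                      tm_heap_mb: int = 512,
                      slots: int = 1,
                      rpc_port: int = 6123) -> str:
    """Render the flink-conf.yaml keys used in this section, one 'key: value' per line."""
    entries = [
        ("jobmanager.rpc.address", jobmanager_host),
        ("jobmanager.rpc.port", rpc_port),
        ("jobmanager.heap.mb", jm_heap_mb),
        ("taskmanager.heap.mb", tm_heap_mb),
        ("taskmanager.numberOfTaskSlots", slots),
    ]
    if java_home is not None:
        entries.append(("env.java.home", java_home))
    # keep the space after the colon; "key:value" would not be read as an entry
    return "".join(f"{key}: {value}\n" for key, value in entries)
```

You could write render_flink_conf("<jobmanager-ip-or-host>", java_home="/usr/java/jdk1.8.0_92") to conf/flink-conf.yaml in the identical installation directory on each node.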

We also need to list the slave (Task Manager) nodes in the conf/slaves file, with each node on a separate line. Here is what a sample conf/slaves file looks like:

<task-manager-1>
<task-manager-2>
<task-manager-3>
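Scripts that need the worker list, for example for health checks or for copying files to every node, can read conf/slaves directly. A minimal Python sketch follows. read_slaves is a hypothetical name. Skipping blank lines and '#' comments is this sketch's own choice, so check how your Flink version's scripts treat them before relying on comments in the file.

```python
from pathlib import Path
from typing import List


def read_slaves(path: Path) -> List[str]:
    """Return the Task Manager hosts listed in conf/slaves, one per line.

    Blank lines and lines starting with '#' are skipped, and surrounding
    whitespace is stripped.
    """
    hosts = []
    for line in path.read_text().splitlines():
        entry = line.strip()
        if entry and not entry.startswith("#"):
            hosts.append(entry)
    return hosts
```

For the sample file above, read_slaves(Path("conf/slaves")) returns the three Task Manager host names in order.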

Starting daemons

Now the only thing left is to start the Flink processes. We can start each process separately on the individual nodes, or we can run the start-cluster.sh script on the Job Manager machine to start the required processes on every node:

bin/start-cluster.sh
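Roughly, start-cluster.sh starts the Job Manager on the machine where you run it. It then connects over SSH to every host in conf/slaves and starts a Task Manager there, which is why passwordless SSH and an identical installation path were required earlier. The Python sketch below only builds those commands for a dry run. The exact arguments used by Flink's script vary between versions. The /usr/local/flink-1.1.4 path is an assumption based on the download step above.

```python
import shlex
from typing import List


def cluster_start_commands(flink_home: str, hosts: List[str]) -> List[List[str]]:
    """Build, without running them, the commands a start-cluster style script issues.

    The Job Manager is started locally; each Task Manager is started over SSH,
    which relies on passwordless SSH and the same flink_home on every host.
    """
    commands = [[f"{flink_home}/bin/jobmanager.sh", "start", "cluster"]]
    for host in hosts:
        remote_cmd = f"{shlex.quote(flink_home + '/bin/taskmanager.sh')} start"
        # -n: ssh does not read from stdin, as usual for non-interactive use in a loop
        commands.append(["ssh", "-n", host, remote_cmd])
    return commands
```

To preview the commands, print shlex.join(cmd) for each cmd in cluster_start_commands("/usr/local/flink-1.1.4", hosts).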

If all the configurations are correct, you will see that the cluster is up and running. You can check the web UI at http://<job-manager-ip>:8081/. The following are some snapshots of the Flink web UI:

[Screenshot: Flink web UI cluster overview]

You can click on the Job Manager link to get the following view:

[Screenshot: Job Manager view]

Similarly, you can check out the Task Managers view as follows:

[Screenshot: Task Managers view]
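Besides the browser, the server behind the web UI can be queried for a JSON summary. This is handy in scripts that wait for all Task Managers to register. The sketch below assumes an /overview endpoint on port 8081 that returns a "taskmanagers" count. We believe Flink's web monitor provides this, but verify the path and field names against your version. fetch_overview and cluster_ready are hypothetical names.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_overview(base_url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET <base_url>/overview from the Flink web front end and decode the JSON body."""
    url = base_url.rstrip("/") + "/overview"    # assumed endpoint, check your version
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def cluster_ready(overview: Dict[str, Any], expected_taskmanagers: int) -> bool:
    """True once at least the expected number of Task Managers have registered."""
    return int(overview.get("taskmanagers", 0)) >= expected_taskmanagers
```

For the four-machine design above, cluster_ready(fetch_overview("http://<job-manager-ip>:8081"), 3) should become True once all three Task Managers have joined (replace the placeholder first).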

Adding additional Job/Task Managers

Flink lets you add Job Manager and Task Manager instances to a running cluster. Before you start a new daemon, make sure you have followed the steps given previously on that machine. To add a Job Manager to the existing cluster, execute the following command:

sudo bin/jobmanager.sh start cluster

Similarly, execute the following command to add a Task Manager:

sudo bin/taskmanager.sh start cluster

Stopping daemons and cluster

Once job execution is complete, you may want to shut down the cluster. The following commands are used for that.
To stop the complete cluster in one go:

sudo bin/stop-cluster.sh

To stop an individual Job Manager:

sudo bin/jobmanager.sh stop cluster

To stop an individual Task Manager:

sudo bin/taskmanager.sh stop cluster

Running sample application

The Flink binaries come with sample applications that can be used as is. Let's start with a very simple one: word count. Here we are going to try a streaming application that reads data from a netcat server on a specific port.
So let's get started. First, start the netcat server on port 9000 by executing the following command:

nc -l 9000
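If nc is not installed (for example on Windows), a few lines of Python can stand in for it. This is an illustrative substitute, not part of Flink. It listens on a port, waits for the Flink socket source to connect, and forwards every line you type. It binds to localhost on port 9000 to match the command above; bind to a reachable interface instead if the job runs on other machines. serve_lines and listen_and_forward are hypothetical names.

```python
import socket
import sys
from typing import Iterable, Optional


def serve_lines(server: socket.socket, lines: Iterable[str]) -> None:
    """Accept one client on a listening socket and send it each line, newline-terminated."""
    conn, _addr = server.accept()          # the Flink socket source connects here
    with conn:
        for line in lines:
            conn.sendall(line.rstrip("\n").encode("utf-8") + b"\n")


def listen_and_forward(host: str = "localhost", port: int = 9000,
                       lines: Optional[Iterable[str]] = None) -> None:
    """Rough stand-in for `nc -l 9000`: forward typed lines (stdin by default)."""
    with socket.create_server((host, port)) as server:
        serve_lines(server, sys.stdin if lines is None else lines)
```

Call listen_and_forward() (Python 3.8+) before starting the Flink job, then type lines as you would into netcat.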

The netcat server now listens on port 9000, so whatever you type at the prompt will be sent to Flink for processing.
Next we need to start the Flink sample program, which connects to the netcat server. The command is as follows:

bin/flink run examples/streaming/SocketTextStreamWordCount.jar --hostname localhost --port 9000
08/06/2016 10:32:40 Job execution switched to status RUNNING
08/06/2016 10:32:40 Source: Socket stream -> Flat Map (1/1) switched to SCHEDULED
08/06/2016 10:32:40 Source: Socket stream -> Flat Map (1/1) switched to DEPLOYING
08/06/2016 10:32:40 Keyed Aggregation -> Sink: Unnamed (1/1) switched to SCHEDULED
08/06/2016 10:32:40 Keyed Aggregation -> Sink: Unnamed (1/1) switched to DEPLOYING
08/06/2016 10:32:40 Source: Socket stream -> Flat Map (1/1) switched to RUNNING
08/06/2016 10:32:40 Keyed Aggregation -> Sink: Unnamed (1/1) switched to RUNNING

This starts the Flink job. Now you can type something on the netcat console and Flink will process it.
For example, type the following on the netcat server:

$ nc -l 9000
hi Hello
Hello World
This distribution includes cryptographic software.
The country in which you currently reside may have restrictions on the import,
possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software,
please check your country's laws, regulations and policies
concerning the import, possession, or use, and re-export of
encryption software, to see if this is permitted.
See <http://www.wassenaar.org/> for more information.

You can verify the output in the Task Manager .out log files:

$ tail -f flink-*-taskmanager-*-flink-instance-*.out
==> flink-root-taskmanager-0-flink-instance-1.out <==
(see, 2)
(http, 1)
(www, 1)
(wassenaar, 1)
(org, 1)
(for, 1)
(more, 1)
(information, 1)
(hello, 1)
(world, 1)
==> flink-root-taskmanager-1-flink-instance-1.out <==
(is, 1)
(permitted, 1)
(see, 2)
(http, 1)
(www, 1)
(wassenaar, 1)
(org, 1)
(for, 1)
(more, 1)
(information, 1)

==> flink-root-taskmanager-2-flink-instance-1.out <==
(hello, 1)
(world, 1)
(hi, 1)
(how, 1)
(are, 1)
(you, 1)
(how, 2)
(is, 1)
(it, 1)
(going, 1)

You can also check out the Flink web UI to see how your job is performing. The following screenshot shows the data flow plan for the execution:

[Screenshot: data flow plan of the word count job]

For this job execution, Flink has two operators. The first is the source operator, which reads data from the socket stream and splits it into words (Source: Socket stream -> Flat Map in the job log). The second is the transformation operator, which aggregates the word counts (Keyed Aggregation -> Sink). We can also look at the timeline of the job execution:

[Screenshot: timeline of the job execution]
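To make the two operators concrete, the following plain-Python model reproduces what the word count job computes. It does not use Flink's API. The tokenizing rule (lowercase, then split on non-word characters) is our reading of Flink's example tokenizer. It matches the output above, where http://www.wassenaar.org/ turned into http, www, wassenaar and org. The keyed aggregation emits an updated running total every time a word arrives, which is why lines such as (see, 2) and (how, 2) appear.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def tokenize(line: str) -> Iterator[Tuple[str, int]]:
    """Flat Map step: lowercase, split on non-word characters, emit (word, 1)."""
    for word in re.split(r"\W+", line.lower()):
        if word:                       # skip empty strings from leading/trailing separators
            yield word, 1


def running_word_counts(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Keyed Aggregation step: keep a sum per word and emit the new total each time."""
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        for word, one in tokenize(line):
            totals[word] += one
            yield word, totals[word]
```

For example, list(running_word_counts(["hi Hello", "Hello World"])) yields (hi, 1), (hello, 1), (hello, 2), (world, 1).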

Summary

In this chapter, we talked about how Flink started as a university project and then became a full-fledged, enterprise-ready data processing platform. We looked at the details of Flink's architecture and how its process model works. We also learned how to run Flink in local and cluster modes.
In the next chapter, we are going to learn about Flink's Streaming API, look at its details, and see how we can use that API to solve our stream processing problems.
