Quick start setup
Now that we understand the details of Flink's architecture and its process model, it's time to get started with a quick setup and try things out on our own. Flink works on both Windows and Linux machines.
The very first thing we need to do is download Flink's binaries. Flink can be downloaded from the Flink download page at: http://flink.apache.org/downloads.html
On the download page, you will see multiple options as shown in the following screenshot:
In order to install Flink, you don't need to have Hadoop installed. But in case you need to connect to Hadoop using Flink, you need to download the binary that is compatible with the Hadoop version you have.
As I have the latest version of Hadoop, 2.7.0, installed with me, I am going to download the Flink binary compatible with Hadoop 2.7.0 and built on Scala 2.11. Here is the direct link to download: http://www-us.apache.org/dist/flink/flink-1.1.4/flink-1.1.4-bin-hadoop27-scala_2.11.tgz
Prerequisites
Flink needs Java to be installed first. So before you start, please make sure Java is installed. I have JDK 1.8 installed on my machine:
D:\>java -version
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
Installing on Windows
Flink is very easy to install: just extract the compressed file and store it in the desired location.
Once extracted, go to the folder and execute start-local.bat:
>cd flink-1.1.4
>bin\start-local.bat
And you will see that the local instance of Flink has started. You can also check the web UI at http://localhost:8081/.
You can stop the Flink process by pressing Ctrl + C.
Installing on Linux
Similar to Windows, installing Flink on Linux machines is very easy. We need to download the binary, place it in a specific folder, and extract it:
$ sudo tar -xzf flink-1.1.4-bin-hadoop27-scala_2.11.tgz
$ cd flink-1.1.4
$ bin/start-local.sh
As in Windows, please make sure Java is installed on the machine.
Now we are all set to submit a Flink job. To stop the local Flink instance on Linux, execute the following command:
$ bin/stop-local.sh
Cluster setup
Setting up a Flink cluster is very simple as well. Those who have a background in installing a Hadoop cluster will be able to relate to these steps very easily. In order to set up the cluster, let's assume we have four Linux machines with us, each having a moderate configuration. At least two cores and 4 GB RAM per machine would be a good option to get started. The very first thing we need to do is choose the cluster design. As we have four machines, we will use one machine as the Job Manager and the other three machines as Task Managers.
SSH configurations
In order to set up the cluster, we first need passwordless SSH connections from the Job Manager machine to the Task Managers. The following steps need to be performed on the Job Manager machine; they create an SSH key and copy it to authorized_keys:
ssh-keygen
This will generate the public and private keys in the /home/flinkuser/.ssh folder. Now copy the public key to the Task Manager machine and perform the following steps on the Task Manager to allow passwordless connections from the Job Manager:
sudo mkdir -p /home/flinkuser/.ssh
sudo touch /home/flinkuser/.ssh/authorized_keys
sudo cp id_rsa.pub /home/flinkuser/.ssh/
sudo sh -c "cat id_rsa.pub >> /home/flinkuser/.ssh/authorized_keys"
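For scripted setups, the same key generation and appending can be rehearsed non-interactively. Here is a minimal sketch run in a scratch directory (the directory name is a placeholder for this demo; on the real machines, the files live under /home/flinkuser/.ssh):

```shell
# Non-interactive variant of the key setup above, rehearsed in a scratch
# directory: generate an RSA key pair (-N "" sets an empty passphrase, -q
# suppresses the banner) and append the public half to authorized_keys.
DEMO=./ssh_key_demo
mkdir -p "$DEMO"
ssh-keygen -t rsa -b 2048 -N "" -q -f "$DEMO/id_rsa"
cat "$DEMO/id_rsa.pub" >> "$DEMO/authorized_keys"
ls "$DEMO"
```

In a real cluster setup you would run the keygen once on the Job Manager and append the resulting id_rsa.pub on each Task Manager, as shown above.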
Make sure the keys have restricted access by executing the following commands:
sudo chmod 700 /home/flinkuser/.ssh
sudo chmod 600 /home/flinkuser/.ssh/authorized_keys
Now you can test the passwordless SSH connection from the Job Manager machine:
sudo ssh <task-manager-1>
sudo ssh <task-manager-2>
sudo ssh <task-manager-3>
If you are using cloud service instances for the installation, please make sure that root login is enabled over SSH. To do this, log in to each machine and open the file /etc/ssh/sshd_config. Then change the value of PermitRootLogin to yes. Once you save the file, restart the SSH service by executing the following command:
sudo service sshd restart
Java installation
Next we need to install Java on each machine. The following commands will help you install Java on Red Hat/CentOS-based machines:
wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u92-b14/jdk-8u92-linux-x64.rpm
sudo rpm -ivh jdk-8u92-linux-x64.rpm
Next we need to set up the JAVA_HOME environment variable so that Java is available to access from everywhere. Create a java.sh file:
sudo vi /etc/profile.d/java.sh
And add the following content to it and save it:
#!/bin/bash
JAVA_HOME=/usr/java/jdk1.8.0_92
PATH=$JAVA_HOME/bin:$PATH
export PATH JAVA_HOME
export CLASSPATH=.
Make the file executable and source it:
sudo chmod +x /etc/profile.d/java.sh
source /etc/profile.d/java.sh
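If you want to rehearse this wiring before touching /etc/profile.d, the same script can be written and sourced from the current directory. A minimal sketch, assuming the book's example JDK path (substitute your actual install location):

```shell
# Write a local copy of the java.sh profile script and source it, then show
# the resulting JAVA_HOME. The JDK path is illustrative only.
cat > ./java_demo.sh <<'EOF'
#!/bin/bash
JAVA_HOME=/usr/java/jdk1.8.0_92
PATH=$JAVA_HOME/bin:$PATH
export PATH JAVA_HOME
export CLASSPATH=.
EOF
chmod +x ./java_demo.sh
. ./java_demo.sh
echo "JAVA_HOME=$JAVA_HOME"
```

Sourcing (rather than executing) the script is what makes the exported variables visible in the current shell, which is why /etc/profile.d scripts take effect on login.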
You can now check if Java is installed properly:
$ java -version
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
Repeat these installation steps on the Job Manager and Task Manager machines.
Flink installation
Once the SSH and Java installations are done, we need to download the Flink binaries and extract them into a specific folder. Please note that the installation directory on all nodes should be the same.
So let's get started:
cd /usr/local
sudo wget http://www-eu.apache.org/dist/flink/flink-1.1.4/flink-1.1.4-bin-hadoop27-scala_2.11.tgz
sudo tar -xzf flink-1.1.4-bin-hadoop27-scala_2.11.tgz
Now that the Flink binaries are ready, we need to do some related configuration.
Configurations
Flink's configuration is simple. We need to tune a few parameters and we are all set. Most of the configurations are the same for the Job Manager node and the Task Manager nodes. All configuration is done in the conf/flink-conf.yaml file.
The following is a configuration file for a Job Manager node:
jobmanager.rpc.address: localhost
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 256
taskmanager.heap.mb: 512
taskmanager.numberOfTaskSlots: 1
You may want to change memory configurations for the Job Manager and Task Manager based on your node configurations. For the Task Manager, jobmanager.rpc.address should be populated with the correct Job Manager hostname or IP address.
So for all Task Managers, the configuration file should be like the following:
jobmanager.rpc.address: <jobmanager-ip-or-host>
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 256
taskmanager.heap.mb: 512
taskmanager.numberOfTaskSlots: 1
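Since the Task Manager file differs from the Job Manager's only in jobmanager.rpc.address, it can be handy to generate it from a small script. This is just a sketch: the default hostname and output file name are placeholders, and the heap sizes simply mirror the values above.

```shell
# Write a minimal flink-conf.yaml for a Task Manager node.
# $1 = Job Manager host/IP (placeholder default), $2 = output file.
JM_HOST="${1:-jobmanager-host}"
OUT="${2:-flink-conf.yaml}"
cat > "$OUT" <<EOF
jobmanager.rpc.address: $JM_HOST
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 256
taskmanager.heap.mb: 512
taskmanager.numberOfTaskSlots: 1
EOF
echo "wrote $OUT pointing at $JM_HOST"
```

Run it once per node (or once, then copy the file), passing the real Job Manager host as the first argument.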
We need to add the JAVA_HOME details in this file so that Flink knows exactly where to look for the Java binaries:
export JAVA_HOME=/usr/java/jdk1.8.0_92
We also need to add the slave node details in the conf/slaves file, with each node on a separate line. Here is how a sample conf/slaves file should look:
<task-manager-1>
<task-manager-2>
<task-manager-3>
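Because the installation directory must be identical on all nodes, one common approach is to loop over the slaves file and copy the tree with scp. The sketch below only prints the commands (a dry run) so you can review them first; the hostnames and install path are the placeholders used above, and the slaves file is created locally for the demo.

```shell
# Dry run: print one scp command per Task Manager listed in a slaves file.
# Replace 'echo' with direct execution once the output looks right.
SLAVES=./slaves_demo
printf '%s\n' task-manager-1 task-manager-2 task-manager-3 > "$SLAVES"
while read -r host; do
  [ -n "$host" ] && echo "scp -r /usr/local/flink-1.1.4 $host:/usr/local/"
done < "$SLAVES"
```

With passwordless SSH already configured, dropping the echo makes this a one-shot distribution step.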
Starting daemons
Now the only thing left is starting the Flink processes. We can start each process separately on individual nodes, or we can execute the start-cluster.sh command to start the required processes on each node:
bin/start-cluster.sh
If all the configurations are good, then you will see that the cluster is up and running. You can check the web UI at http://<job-manager-ip>:8081/. The following are some snapshots of the Flink web UI.
You can click on the Job Manager link to get the following view:
Similarly, you can check out the Task Managers view as follows:
Adding additional Job/Task Managers
Flink provides you with the facility to add additional instances of Job Managers and Task Managers to a running cluster. Before you start the daemon, please make sure that you have followed the steps given previously.
To add an additional Job Manager to the existing cluster, execute the following command:
sudo bin/jobmanager.sh start cluster
Similarly, execute the following command to add an additional Task Manager:
sudo bin/taskmanager.sh start cluster
Stopping daemons and cluster
Once the job execution is completed, you will want to shut down the cluster. The following commands are used for that.
To stop the complete cluster in one go:
sudo bin/stop-cluster.sh
To stop an individual Job Manager:
sudo bin/jobmanager.sh stop cluster
To stop an individual Task Manager:
sudo bin/taskmanager.sh stop cluster
Running sample application
Flink binaries come with a sample application which can be used as is. Let's start with a very simple application, word count. Here we are going to try a streaming application which reads data from the netcat server on a specific port.
So let's get started. First start the netcat server on port 9000 by executing the following command:
nc -l 9000
Now the netcat server will start listening on port 9000, so whatever you type on the command prompt will be sent to Flink for processing.
Next we need to start the Flink sample program to listen to the netcat server. The following is the command:
bin/flink run examples/streaming/SocketTextStreamWordCount.jar --hostname localhost --port 9000
08/06/2016 10:32:40 Job execution switched to status RUNNING
08/06/2016 10:32:40 Source: Socket stream -> Flat Map (1/1) switched to SCHEDULED
08/06/2016 10:32:40 Source: Socket stream -> Flat Map (1/1) switched to DEPLOYING
08/06/2016 10:32:40 Keyed Aggregation -> Sink: Unnamed (1/1) switched to SCHEDULED
08/06/2016 10:32:40 Keyed Aggregation -> Sink: Unnamed (1/1) switched to DEPLOYING
08/06/2016 10:32:40 Source: Socket stream -> Flat Map (1/1) switched to RUNNING
08/06/2016 10:32:40 Keyed Aggregation -> Sink: Unnamed (1/1) switched to RUNNING
This will start the Flink job execution. Now you can type something on the netcat console and Flink will process it.
For example, type the following on the netcat server:
$ nc -l 9000
hi Hello
Hello World
This distribution includes cryptographic software.
The country in which you currently reside may have restrictions on the import,
possession, use, and/or re-export to another country, of encryption software.
BEFORE using any encryption software,
please check your country's laws, regulations and policies
concerning the import, possession, or use, and re-export of
encryption software, to see if this is permitted.
See <http://www.wassenaar.org/> for more information.
You can verify the output in the logs:
$ tail -f flink-*-taskmanager-*-flink-instance-*.out
==> flink-root-taskmanager-0-flink-instance-1.out <==
(see, 2)
(http, 1)
(www, 1)
(wassenaar, 1)
(org, 1)
(for, 1)
(more, 1)
(information, 1)
(hello, 1)
(world, 1)
==> flink-root-taskmanager-1-flink-instance-1.out <==
(is, 1)
(permitted, 1)
(see, 2)
(http, 1)
(www, 1)
(wassenaar, 1)
(org, 1)
(for, 1)
(more, 1)
(information, 1)
==> flink-root-taskmanager-2-flink-instance-1.out <==
(hello, 1)
(worlds, 1)
(hi, 1)
(how, 1)
(are, 1)
(you, 1)
(how, 2)
(is, 1)
(it, 1)
(going, 1)
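The tallies above are just per-word counts over whatever lines reached each Task Manager. As a sanity check, you can reproduce the same kind of tally for the sample input with plain shell tools; this is only an approximation of the job's logic (lowercase, split on spaces, count), not Flink itself:

```shell
# Lowercase the input, split it into one word per line, then count
# occurrences -- roughly what the Keyed Aggregation operator computes.
printf 'hi Hello\nHello World\n' \
  | tr '[:upper:]' '[:lower:]' \
  | tr -s ' ' '\n' \
  | sort | uniq -c | sort -rn
```

For the two sample lines this prints "hello" with a count of 2 and "hi" and "world" with a count of 1 each, matching what you would expect the streaming job to emit for the same input.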
You can also check out the Flink web UI to see how your job is performing. The following screenshot shows the data flow plan for the execution:
Here, for the job execution, Flink has two operators. The first is the source operator, which reads data from the socket stream. The second is the transformation operator, which aggregates the counts of words. We can also look at the timeline of the job execution:
Summary
In this chapter, we talked about how Flink started as a university project and then became a full-fledged enterprise-ready data processing platform. We looked at the details of Flink's architecture and how its process model works. We also learnt how to run Flink in local and cluster modes.
In the next chapter, we are going to learn about Flink's Streaming API, look at its details, and see how we can use that API to solve our data stream processing problems.