Apache Flink features two relational APIs - the Table API and SQL - for unified stream and batch processing.
Flink’s SQL support is based on Apache Calcite, which implements the SQL standard.
Please note that the Table API and SQL are not yet feature complete and are being actively developed. Not all operations are supported by every combination of [Table API, SQL] and [stream, batch] input.
Concepts & Common API
Streaming Table API & SQL
Relational Queries on Data Streams
Differences
| Relational Algebra / SQL | Stream Processing |
| --- | --- |
| Relations (or tables) are bounded (multi-)sets of tuples. | A stream is an infinite sequence of tuples. |
| A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data. | A streaming query cannot access all data when it is started and has to "wait" for data to be streamed in. |
| A batch query terminates after it produces a fixed-sized result. | A streaming query continuously updates its result based on the received records and never completes. |
Relationship
- A database table is the result of a stream of INSERT, UPDATE, and DELETE DML statements, often called changelog stream.
- A materialized view is defined as a SQL query. In order to update the view, the query continuously processes the changelog streams of the view’s base relations.
- The materialized view is the result of the streaming SQL query.
Dynamic Tables & Continuous Queries
Dynamic tables are the core concept of Flink’s Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic tables change over time.
[Figure: stream → dynamic table → continuous query → result stream]
- A stream is converted into a dynamic table.
- A continuous query is evaluated on the dynamic table yielding a new dynamic table.
- The resulting dynamic table is converted back into a stream.
Defining a Table on a Stream
Essentially, we are building a table from an INSERT-only changelog stream.
The following figure visualizes how a stream of click events (left-hand side) is converted into a table (right-hand side). The resulting table grows continuously as more records of the click stream are inserted.
[Figure: a stream of click events (left) converted into a continuously growing table (right)]
Note: a table that is defined on a stream is not materialized internally.
Continuous Queries
A continuous query is evaluated on a dynamic table and produces a new dynamic table as its result. In contrast to a batch query, a continuous query never terminates and updates its result table according to the updates on its input tables. At any point in time, the result of a continuous query is semantically equivalent to the result of the same query executed in batch mode on a snapshot of the input tables.
[Figure: a continuous GROUP BY count query on the clicks table]
When the query is started, the clicks table (left-hand side) is empty. The query starts to compute the result table, when the first row is inserted into the clicks table. After the first row [Mary, ./home] was inserted, the result table (right-hand side, top) consists of a single row [Mary, 1]. When the second row [Bob, ./cart] is inserted into the clicks table, the query updates the result table and inserts a new row [Bob, 1]. The third row [Mary, ./prod?id=1] yields an update of an already computed result row such that [Mary, 1] is updated to [Mary, 2]. Finally, the query inserts a third row [Liz, 1] into the result table, when the fourth row is appended to the clicks table.
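The incremental behaviour described above can be sketched outside of Flink with a few lines of plain Python. This is an illustrative simulation of a `GROUP BY user` count query, not Flink API; the `clicks` data mirrors the rows in the example:

```python
from collections import OrderedDict

def continuous_count(clicks):
    """Simulate a continuous 'SELECT user, COUNT(url) ... GROUP BY user' query:
    each inserted click row either inserts a new result row or updates an
    existing one, exactly as in the step-by-step walkthrough above."""
    result = OrderedDict()  # result table: user -> count
    history = []            # snapshot of the result table after each insert
    for user, url in clicks:
        result[user] = result.get(user, 0) + 1
        history.append(dict(result))
    return history

clicks = [("Mary", "./home"), ("Bob", "./cart"),
          ("Mary", "./prod?id=1"), ("Liz", "./home")]
for snapshot in continuous_count(clicks):
    print(snapshot)
```

The final snapshot contains `Mary → 2`, `Bob → 1`, and `Liz → 1`, matching the result table in the example.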
[Figure: a continuous hourly tumbling-window query on the clicks table]
As before, the input table clicks is shown on the left. The query continuously computes results every hour and updates the result table. The clicks table contains four rows with timestamps (cTime) between 12:00:00 and 12:59:59. The query computes two result rows from this input (one for each user) and appends them to the result table. For the next window between 13:00:00 and 13:59:59, the clicks table contains three rows, which results in another two rows being appended to the result table. The result table is updated as more rows are appended to clicks over time.
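The hourly windowed variant can be sketched the same way. This is again an illustrative simulation rather than Flink API; cTime is simplified to a plain seconds-since-midnight integer, and the result is keyed by (window start, user):

```python
from collections import Counter

def hourly_window_counts(clicks):
    """Group clicks into one-hour tumbling windows keyed by (window_start, user).
    Each finished window appends new rows to the result table instead of
    updating previously emitted rows."""
    counts = Counter()
    for c_time, user, url in clicks:
        window_start = c_time - (c_time % 3600)  # truncate timestamp to the hour
        counts[(window_start, user)] += 1
    return dict(counts)

# Timestamps in seconds: three clicks in the 12:00 hour, one in the 13:00 hour.
clicks = [
    (12 * 3600 + 5,  "Mary", "./home"),
    (12 * 3600 + 60, "Bob",  "./cart"),
    (12 * 3600 + 90, "Mary", "./prod?id=1"),
    (13 * 3600 + 10, "Bob",  "./home"),
]
print(hourly_window_counts(clicks))
```

Unlike the non-windowed count query, each window's rows are final once the window closes: later clicks only ever add rows for later windows.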
Query Restrictions
State Size: Continuous queries often run for weeks or months. The state a query must maintain in order to update its result can therefore grow very large, and a query can eventually fail if its state exceeds the available storage.
Computing Updates: Some queries have to recompute and update a large fraction of the emitted result rows even if only a single input record is added, which makes them too expensive to run as continuous queries.
Table to Stream Conversion
A dynamic table can be continuously modified by INSERT, UPDATE, and DELETE changes just like a regular database table. It might be a table with a single row, which is constantly updated, an insert-only table without UPDATE and DELETE modifications, or anything in between.
- Append-only stream: A dynamic table that is only modified by INSERT changes can be converted into a stream by emitting the inserted rows.
- Retract stream: A retract stream is a stream with two types of messages, add messages and retract messages. A dynamic table is converted into a retract stream by encoding an INSERT change as an add message, a DELETE change as a retract message, and an UPDATE change as a retract message for the updated (previous) row and an add message for the updating (new) row.
- Upsert stream: An upsert stream is a stream with two types of messages, upsert messages and delete messages. A dynamic table that is converted into an upsert stream requires a (possibly composite) unique key. A dynamic table with a unique key is converted into an upsert stream by encoding INSERT and UPDATE changes as upsert messages and DELETE changes as delete messages. The stream-consuming operator needs to be aware of the unique key attribute in order to apply the messages correctly. The main difference to a retract stream is that UPDATE changes are encoded with a single message and are hence more efficient.
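The two encodings can be made concrete with a small simulation. This is an illustrative sketch, not Flink's internal representation; the `"+"`/`"-"` markers and the changelog tuple format are assumptions made for the example:

```python
def to_retract_stream(changes):
    """Encode changelog entries as a retract stream:
    INSERT -> one add message, DELETE -> one retract message,
    UPDATE -> a retract message for the old row plus an add message
    for the new row (two messages per update)."""
    out = []
    for op, *rows in changes:
        if op == "INSERT":
            out.append(("+", rows[0]))
        elif op == "DELETE":
            out.append(("-", rows[0]))
        elif op == "UPDATE":               # rows = [old_row, new_row]
            out.append(("-", rows[0]))
            out.append(("+", rows[1]))
    return out

def to_upsert_stream(changes, key):
    """Encode changelog entries as an upsert stream on a unique key:
    INSERT and UPDATE each become a single upsert message,
    DELETE becomes a delete message carrying only the key."""
    out = []
    for op, *rows in changes:
        row = rows[-1]                     # for UPDATE, the new row
        if op == "DELETE":
            out.append(("delete", key(row)))
        else:
            out.append(("upsert", row))
    return out

changes = [
    ("INSERT", ("Mary", 1)),
    ("UPDATE", ("Mary", 1), ("Mary", 2)),  # old row, new row
    ("DELETE", ("Mary", 2)),
]
print(to_retract_stream(changes))
print(to_upsert_stream(changes, key=lambda row: row[0]))
```

Note how the UPDATE produces two messages in the retract stream but only one in the upsert stream, which is the efficiency difference described above.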
Time Attributes
Flink supports three notions of time:
1. Processing time: the wall-clock time of the machine that processes the record.
2. Event time: the time at which an event actually occurred, usually based on a timestamp carried by each row.
3. Ingestion time: the time at which an event enters Flink; it is usually treated similarly to event time.
Processing Time
During DataStream-to-Table Conversion
Using a TableSource
Event Time
During DataStream-to-Table Conversion
There are two ways of defining the time attribute when converting a DataStream into a Table:
- Extending the physical schema by an additional logical field
- Replacing a physical field by a logical field (e.g. because it is no longer needed after timestamp extraction).
Using a TableSource
The event time attribute is defined by a TableSource that implements the DefinedRowtimeAttribute interface. The logical time attribute is appended to the physical schema defined by the return type of the TableSource.
Timestamps and watermarks must be assigned in the stream that is returned by the getDataStream() method.
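The relationship between timestamps and watermarks can be illustrated with a bounded-out-of-orderness generator, sketched here in plain Python rather than Flink's assigner API; the 2-second bound is an arbitrary assumption for the example:

```python
def with_watermarks(events, max_out_of_orderness=2):
    """Attach a watermark after each event: the highest timestamp seen so far
    minus the allowed out-of-orderness. A watermark w asserts that no further
    events with timestamp <= w are expected."""
    max_ts = float("-inf")
    out = []
    for ts, payload in events:
        max_ts = max(max_ts, ts)
        watermark = max_ts - max_out_of_orderness
        out.append((ts, payload, watermark))
    return out

# Event "c" arrives out of order; the watermark never moves backwards.
events = [(1, "a"), (4, "b"), (3, "c"), (7, "d")]
for ts, payload, wm in with_watermarks(events):
    print(ts, payload, "watermark:", wm)
```

Event-time windows (such as the hourly windows above) are closed once the watermark passes their end timestamp, which is how a query decides that a window's result is final.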
Query Configuration
Table API
Update and Append Queries
The first example query can update previously emitted rows: the changelog stream that defines its result table contains INSERT and UPDATE changes. The second example query only appends to the result table: its changelog stream consists of INSERT changes only.
SQL
SQL queries are specified with the sql() method of the TableEnvironment. The method returns the result of the SQL query as a Table. A Table can be used in subsequent SQL and Table API queries, be converted into a DataSet or DataStream, or written to a TableSink. SQL and Table API queries can be seamlessly mixed and are holistically optimized and translated into a single program.
In order to access a table in a SQL query, it must be registered in the TableEnvironment. A table can be registered from a TableSource, Table, DataStream, or DataSet. Alternatively, users can also register external catalogs in a TableEnvironment to specify the location of the data sources.
Note: Flink's SQL support is not yet feature complete. Queries that include unsupported SQL features cause a TableException. The features of SQL that are supported on batch and streaming tables are listed below.
Supported Syntax
Flink parses SQL using Apache Calcite, which supports standard ANSI SQL. DML and DDL statements are not supported by Flink.
Scan, Projection, and Filter
Aggregations
GroupBy Aggregation
GroupBy Window Aggregation
Over Window Aggregation
Group Windows
| Group Window Function | Description |
| --- | --- |
| TUMBLE(time_attr, interval) | Defines a tumbling (fixed-size) window. |
| HOP(time_attr, interval, interval) | Defines a hopping (sliding) window. It takes two interval parameters: the first defines the slide interval, the second defines the window size. |
| SESSION(time_attr, interval) | Defines a session window. Session windows have no fixed duration; their bounds are defined by an interval of inactivity. If no event arrives within the defined gap, the session window closes; otherwise the row is added to the existing window. |
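The three window types can be mimicked with small window-assignment functions. This is an illustrative sketch, not Flink's implementation; windows are represented as half-open `(start, end)` intervals, and the session variant is simplified to a single pass over sorted timestamps:

```python
def tumble(ts, size):
    """Assign a timestamp to its single tumbling window [start, start + size)."""
    start = ts - ts % size
    return (start, start + size)

def hop(ts, slide, size):
    """Return every hopping window [start, start + size) that contains ts.
    Window starts are multiples of `slide`, so a timestamp usually
    belongs to several overlapping windows."""
    windows = []
    start = ts - ts % slide          # latest window that can contain ts
    while start > ts - size:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)

def session(timestamps, gap):
    """Merge timestamps into session windows: a window is extended while
    consecutive timestamps are at most `gap` apart, and closes otherwise."""
    windows = []
    for ts in sorted(timestamps):
        if windows and ts - windows[-1][1] <= gap:
            windows[-1][1] = ts      # extend the current session
        else:
            windows.append([ts, ts]) # start a new session
    return [tuple(w) for w in windows]

print(tumble(7, 5))                  # one fixed window
print(hop(7, 2, 4))                  # overlapping windows
print(session([1, 2, 6, 7, 20], 3))  # gap-separated sessions
```

Note that tumbling windows assign each row to exactly one window, hopping windows to several, and session windows are derived from the data itself rather than from the clock.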
Time Attributes
- Processing time: the wall-clock time of the machine while the record is being processed.
- Event time: a timestamp carried by the record itself, e.g. assigned when the event was created.
- Ingestion time: the time at which an event arrives in Flink; it is treated similarly to event time.
| Auxiliary Function | Description |
| --- | --- |
| TUMBLE_START(time_attr, interval)<br>HOP_START(time_attr, interval, interval)<br>SESSION_START(time_attr, interval) | Returns the start timestamp of the corresponding tumbling, hopping, or session window. Commonly used to compute a window's boundaries. |
| TUMBLE_END(time_attr, interval)<br>HOP_END(time_attr, interval, interval)<br>SESSION_END(time_attr, interval) | Returns the end timestamp of the corresponding tumbling, hopping, or session window. |
Table Sources & Sinks
User-Defined Functions