Flink官方翻譯-03Table API & SQL

Apache Flink features two relational APIs - the Table API and SQL - for unified stream and batch processing.

Flink’s SQL support is based on Apache Calcite which implements the SQL standard.

注意事項:table api和sql還處于活躍開發(fā)狀態(tài)烦衣,不是所有功能都已經實現(xiàn)

Please note that the Table API and SQL are not yet feature complete and are being actively developed. Not all operations are supported by every combination of [Table API, SQL] and [stream, batch] input.

Concepts & Common API

Concepts & Common API

Streaming Table API & SQL

關系查詢在Data Streams

區(qū)別

<colgroup><col style="width: 424px;"><col style="width: 424px;"></colgroup>
|

Relational Algebra / SQL

|

Stream Processing

|
|

Relations (or tables) are bounded (multi-)sets of tuples.

|

A stream is an infinite sequences of tuples.

|
|

A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data.

|

A streaming query cannot access all data when is started and has to "wait" for data to be streamed in.

|
|

A batch query terminates after it produced a fixed sized result.

|

A streaming query continuously updates its result based on the received records and never completes.

|

聯(lián)系

  • A database table is the result of a stream of INSERT, UPDATE, and DELETE DML statements, often called changelog stream.
  • A materialized view is defined as a SQL query. In order to update the view, the query is continuously processes the changelog streams of the view’s base relations.
  • The materialized view is the result of the streaming SQL query.

Dynamic Tables & Continuous Queries

Dynamic tables are the core concept of Flink’s Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic table are changing over time.

[圖片上傳失敗...(image-da00e1-1524133656710)]

  1. A stream is converted into a dynamic table.
  2. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
  3. The resulting dynamic table is converted back into a stream.

Defining a Table on a Stream

Essentially, we are building a table from an INSERT-only changelog stream.

The following figure visualizes how the stream of click event (left-hand side) is converted into a table (right-hand side). The resulting table is continuously growing as more records of the click stream are inserted.

[圖片上傳失敗...(image-b30af4-1524133656710)]

注意: A table which is defined on a stream is internally not materialized

Continuous Queries

A continuous query is evaluated on a dynamic table and produces a new dynamic table as result. In contrast to a batch query, a continuous query never terminates and updates its result table according to the updates on its input tables.At any point in time, the result of a continuous query is semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables.本質上與在batch table中使用sql查詢,效果是一致的

[圖片上傳失敗...(image-1345f7-1524133656710)]

When the query is started, the clicks table (left-hand side) is empty. The query starts to compute the result table, when the first row is inserted into the clicks table. After the first row [Mary, ./home] was inserted, the result table (right-hand side, top) consists of a single row [Mary, 1]. When the second row [Bob, ./cart] is inserted into the clicks table, the query updates the result table and inserts a new row [Bob, 1]. The third row [Mary, ./prod?id=1] yields an update of an already computed result row such that [Mary, 1] is updated to [Mary, 2]. Finally, the query inserts a third row [Liz, 1] into the result table, when the fourth row is appended to the clicks table.

[圖片上傳失敗...(image-6a8473-1524133656710)]

As before, the input table clicks is shown on the left. The query continuously computes results every hour and updates the result table. The clicks table contains four rows with timestamps (cTime) between 12:00:00 and 12:59:59. The query computes two results rows from this input (one for each user) and appends them to the result table. For the next window between 13:00:00 and 13:59:59, the clicks table contains three rows, which results in another two rows being appended to the result table. The result table is updated, as more rows are appended to clicks over time.

Query Restrictions

State Size: 有些程序會運行經年累月的,中間的state都會做保存,如果數據量大了會造成查詢失敗潮尝。

Computing Updates:

Table to Stream Conversion

A dynamic table can be continuously modified by INSERT, UPDATE, and DELETE changes just like a regular database table. It might be a table with a single row, which is constantly updated, an insert-only table without UPDATE and DELETE modifications, or anything in between.

  • Append-only stream: A dynamic table that is only modified by INSERT changes can be converted into a stream by emitting the inserted rows.
  • Retract stream: A retract stream is a stream with two types of messages, add messages and retract messages. A dynamic table is converted into an retract stream by encoding an INSERT change as add message, a DELETE change as retract message, and an UPDATE change as a retract message for the updated (previous) row and an add message for the updating (new) row. The following figure visualizes the conversion of a dynamic table into a retract stream.
  • Upsert stream: An upsert stream is a stream with two types of messages, upsert messages and delete message. A dynamic table that is converted into an upsert stream requires a (possibly composite) unique key. A dynamic table with unique key is converted into a dynamic table by encoding INSERT and UPDATE changes as upsert message and DELETE changes as delete message. The stream consuming operator needs to be aware of the unique key attribute in order to apply messages correctly. The main difference to a retract stream is that UPDATE changes are encoded with a single message and hence more efficient. The following figure visualizes the conversion of a dynamic table into an upsert stream.

Time Attributes

Flink主要提供三種時間粒度:

1.Processing time:使用機器的的時間斜纪,通常被稱為:wall-clock time

2.Event time :事件時間,通常都是基于每行數據本身的timestamp

3.Ingestion time:通常是事件進入到flink的洞就,通常與事件時間相同棍厌。

Processing Time

During DataStream-to-Table Conversion

Using a TableSource

Event time

During DataStream-to-Table Conversion

There are two ways of defining the time attribute when converting a DataStream into a Table:

  • Extending the physical schema by an additional logical field
  • Replacing a physical field by a logical field (e.g. because it is no longer needed after timestamp extraction).

Using a TableSource

The event time attribute is defined by a TableSource that implements the DefinedRowtimeAttribute interface. The logical time attribute is appended to the physical schema defined by the return type of the TableSource.

Timestamps and watermarks must be assigned in the stream that is returned by the getDataStream() method.

Query Configuration

Table API

Update and Append Queries

第一個栗子肾胯,是可以更新前面的數據,changelog stream定義了結果包含insert 和 update changes耘纱。第二個栗子敬肚,只是追加結果到結果表中。changelog stream 結果表只是由insert changes組成

SQL

SQL queries are specified with the sql() method of the TableEnvironment. The method returns the result of the SQL query as a Table. A Table can be used in subsequent SQL and Table API queries, be converted into a DataSet or DataStream, or written to a TableSink). SQL and Table API queries can seamlessly mixed and are holistically optimized and translated into a single program.

Table可以從 TableSource, Table, DataStream, 或者 DataSet 轉化而來束析。

或者艳馒,用戶可以從 注冊外部的目錄到 TableEnvironment中

為了用sql查詢的方式查詢table,必須注冊到 tableEnvironment中员寇。一個table可以從 TableSource弄慰,Table,DataStream 蝶锋, 或者 DataSet中注冊生成陆爽。或者扳缕,用戶也可以從外部目錄注冊到TableEnvironment中慌闭,通過指定一個特定的目錄。

In order to access a table in a SQL query, it must be registered in the TableEnvironment. A table can be registered from a TableSource, Table, DataStream, or DataSet. Alternatively, users can also register external catalogs in a TableEnvironment to specify the location of the data sources.

注意:flink sql的功能支持還沒有健全躯舔。有些不支持的sql查詢了之后會報 TableException驴剔。所以支持的sql在批量處理中或者流式處理中都列舉在下面

Supported Syntax

Flink parses SQL using Apache Calcite, which supports standard ANSI SQL. DML and DDL statements are not supported by Flink.

Scan, Projection, and Filter

Aggregations

GroupBy Aggregation

GroupBy Window Aggregation

Over Window aggregation

Group Windows

<colgroup><col style="width: 310px;"><col style="width: 310px;"></colgroup>
|

TUMBLE(time_attr, interval)

|

固定窗口

|
|

HOP(time_attr, interval, interval)

|

滑動窗口(跳躍窗口)。有兩個interval參數庸毫,第一個主要定義了滑動間隔,第二個主要定義了窗口大小

|
|

SESSION(time_attr, interval)

|

會話窗口衫樊。session time window沒有固定的持續(xù)時間飒赃,但是它們的邊界是通過時間 interval來交互的利花。如果在固定的間隙之間沒有新的event進入,session window就會關閉载佳,或者這行數據就會添加到已有的window中

|

Time Attributes

  • Processing time炒事。記錄是機器和系統(tǒng)的時間,當在做處理的時候蔫慧。
  • Event time挠乳。消息自帶的時間,可以通過encode等指定姑躲。
  • Ingestion time睡扬。事件到達flink的時間。這個與process time的功能類似黍析。

<colgroup><col style="width: 297px;"><col style="width: 293px;"></colgroup>
|

Auxiliary Function

|

Description

|
|

TUMBLE_START(time_attr, interval)

HOP_START(time_attr, interval, interval)

SESSION_START(time_attr, interval)

|

常用于計算窗口的開始時間和結束時間卖怜。Returns the start timestamp of the corresponding tumbling, hopping, and session window.

|
|

TUMBLE_END(time_attr, interval)

HOP_END(time_attr, interval, interval)

SESSION_END(time_attr, interval)

|

Returns the end timestamp of the corresponding tumbling, hopping, and session window.

|

Table Sources & Sinks

User-Defined Functions

?著作權歸作者所有,轉載或內容合作請聯(lián)系作者
  • 序言:七十年代末,一起剝皮案震驚了整個濱河市阐枣,隨后出現(xiàn)的幾起案子马靠,更是在濱河造成了極大的恐慌,老刑警劉巖蔼两,帶你破解...
    沈念sama閱讀 219,270評論 6 508
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件甩鳄,死亡現(xiàn)場離奇詭異,居然都是意外死亡额划,警方通過查閱死者的電腦和手機妙啃,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 93,489評論 3 395
  • 文/潘曉璐 我一進店門,熙熙樓的掌柜王于貴愁眉苦臉地迎上來锁孟,“玉大人彬祖,你說我怎么就攤上這事∑烦椋” “怎么了储笑?”我有些...
    開封第一講書人閱讀 165,630評論 0 356
  • 文/不壞的土叔 我叫張陵,是天一觀的道長圆恤。 經常有香客問我突倍,道長,這世上最難降的妖魔是什么盆昙? 我笑而不...
    開封第一講書人閱讀 58,906評論 1 295
  • 正文 為了忘掉前任羽历,我火速辦了婚禮,結果婚禮上淡喜,老公的妹妹穿的比我還像新娘秕磷。我一直安慰自己,他們只是感情好炼团,可當我...
    茶點故事閱讀 67,928評論 6 392
  • 文/花漫 我一把揭開白布澎嚣。 她就那樣靜靜地躺著疏尿,像睡著了一般。 火紅的嫁衣襯著肌膚如雪易桃。 梳的紋絲不亂的頭發(fā)上褥琐,一...
    開封第一講書人閱讀 51,718評論 1 305
  • 那天,我揣著相機與錄音晤郑,去河邊找鬼敌呈。 笑死,一個胖子當著我的面吹牛造寝,可吹牛的內容都是我干的磕洪。 我是一名探鬼主播,決...
    沈念sama閱讀 40,442評論 3 420
  • 文/蒼蘭香墨 我猛地睜開眼匹舞,長吁一口氣:“原來是場噩夢啊……” “哼褐鸥!你這毒婦竟也來了?” 一聲冷哼從身側響起赐稽,我...
    開封第一講書人閱讀 39,345評論 0 276
  • 序言:老撾萬榮一對情侶失蹤叫榕,失蹤者是張志新(化名)和其女友劉穎,沒想到半個月后姊舵,有當地人在樹林里發(fā)現(xiàn)了一具尸體晰绎,經...
    沈念sama閱讀 45,802評論 1 317
  • 正文 獨居荒郊野嶺守林人離奇死亡,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內容為張勛視角 年9月15日...
    茶點故事閱讀 37,984評論 3 337
  • 正文 我和宋清朗相戀三年括丁,在試婚紗的時候發(fā)現(xiàn)自己被綠了荞下。 大學時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
    茶點故事閱讀 40,117評論 1 351
  • 序言:一個原本活蹦亂跳的男人離奇死亡史飞,死狀恐怖尖昏,靈堂內的尸體忽然破棺而出,到底是詐尸還是另有隱情构资,我是刑警寧澤抽诉,帶...
    沈念sama閱讀 35,810評論 5 346
  • 正文 年R本政府宣布,位于F島的核電站吐绵,受9級特大地震影響迹淌,放射性物質發(fā)生泄漏。R本人自食惡果不足惜己单,卻給世界環(huán)境...
    茶點故事閱讀 41,462評論 3 331
  • 文/蒙蒙 一唉窃、第九天 我趴在偏房一處隱蔽的房頂上張望。 院中可真熱鬧纹笼,春花似錦纹份、人聲如沸。這莊子的主人今日做“春日...
    開封第一講書人閱讀 32,011評論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽削咆。三九已至,卻和暖如春蠢笋,著一層夾襖步出監(jiān)牢的瞬間,已是汗流浹背鳞陨。 一陣腳步聲響...
    開封第一講書人閱讀 33,139評論 1 272
  • 我被黑心中介騙來泰國打工昨寞, 沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留,地道東北人厦滤。 一個月前我還...
    沈念sama閱讀 48,377評論 3 373
  • 正文 我出身青樓援岩,卻偏偏與公主長得像,于是被迫代替她去往敵國和親掏导。 傳聞我的和親對象是個殘疾皇子享怀,可洞房花燭夜當晚...
    茶點故事閱讀 45,060評論 2 355

推薦閱讀更多精彩內容

  • rljs by sennchi Timeline of History Part One The Cognitiv...
    sennchi閱讀 7,334評論 0 10
  • 一、備課反思 高二物理綜合性強趟咆,我在準備過程添瓷,參照了近幾年全國各地高考題,同時結合教學進度和學生情況值纱,進行了綜合分...
    酒泉教研室王乾祥閱讀 1,105評論 1 1
  • 想寫點什么鳞贷,不知道從那里開始,正好想復習安卓基礎那就寫寫安卓基礎吧!! 當創(chuàng)建第一個項目時,文件新建一直下一步直到...
    icechao閱讀 670評論 0 1
  • 清明雪紛紛虐唠,誰料人間白搀愧? 若有祭掃人,勿動墳上雪疆偿! 一瓶二鍋頭咱筛,生前吾所愛。 今日可帶來杆故?幽冥尤寒潮迅箩。 讓我痛飲盡...
    樓臺花舍閱讀 170評論 0 11
  • 悼一位親人 我來了你卻走了 一個無語的老人走了 一個無欲的老人走了 一個無痕的老人走了 一個無恨的老人走了 我也會...
    一了0820閱讀 311評論 3 3