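Embracing Open Source, and We Mean It: NetEase EasyData's 2020 Apache Spark Contribution Summary

Name: Deng Fuyuan

Student ID: 17020120041

Reposted from: https://blog.csdn.net/NetEaseResearch/article/details/111661473?spm=1000.2115.3001.4128

[Qianniu Guide]: This article covers the outlook for open source and NetEase EasyData's year-end summary.

[Qianniu Keywords]: A new era of open source

[Qianniu Module]: The background, current state, and future prospects of open source

[Qianniu Body]: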

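Foreword

"Building in-house does not mean being in control of your own technology; openness is the future." These words from NetEase Vice President Wang Yuan reflect the determination and long-standing commitment of people at NetEase to embracing open source and building an open source ecosystem.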

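Why We Embrace Open Source

For a business, it comes down to a four-character maxim: there is profit in it. That is the practical goal in front of us, and one we should face squarely. For a company, "using" open source lowers total cost of ownership, improves software quality, and gives early access to cutting-edge technology, which cuts cost, raises efficiency, and drives growth. "Participating in" open source raises employees' technical skills, builds a brand in the technical community, and earns the trust of leading engineers, which serves both talent development and recruiting. "Building" an open source ecosystem promotes technical ideas, helps establish industry standards, and deepens cooperation with upstream and downstream partners, so that technical leadership and collaboration go together.

According to Red Hat's 2020 report "The State of Enterprise Open Source", enterprises are embracing open source more and more:

More and more enterprises recognize the importance of open source. 95% of IT leaders consider enterprise open source critically important to their organization's infrastructure software strategy.

Use of proprietary software is declining rapidly. Expensive, inflexible proprietary licenses lead to high capital expenditure (CapEx) and vendor lock-in.

For engineers, open source is also the best place to commit fully to work they love and to change the world through technical innovation.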

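NetEase and Apache Spark

Apache Spark is currently the mainstream big data compute engine inside NetEase Group. It handles PB-scale data processing every day, covering offline computing, real-time computing, traditional machine learning, and other kinds of jobs.

To reduce the cost of maintaining Spark inside NetEase and to speed up the adoption of new Spark technology at NetEase, the NetEase EasyData team adopted the following three strategies.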

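1. Integration

Our engineers all take part in community contribution to varying degrees, deepening cooperation with the community and becoming part of it.

The community looks for developers who can contribute continuously, with whom it can build a good long-term working relationship and mutual trust. To some extent, this lets our internal requirements turn into industry standards, and lets the community's latest technology land with us promptly.

The list at the end of this article is an incomplete count of the main contributions by NetEase engineers to Apache Spark as of the end of 2020, about 300 commits.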

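2. Plugins

Of course, because business priorities differ, the supporting technology always diverges somewhat between companies and industries, so modifying the Spark source code is unavoidable. For maintainability, our strategy is to pull such specific requirements out of Spark into plugins, which reduces coupling with the Spark mainline and allows lightweight iteration. A few of these plugins are introduced below: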

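1. Spark-ranger

Spark-ranger is an access control plugin that gives the Spark compute engine SQL-standard fine-grained access control, including column-level authorization, row-level filtering, and data masking. In big data warehouse scenarios, Spark SQL is a high-performance query engine, but data security has always been its weak point. The project was created to close the last permission gap in the data warehouse management features of the Mammut product. As part of Mammut's security components, Spark-ranger provides authorization for hundreds of thousands of Spark jobs every day for internal business teams, and it also protects customer data at all commercial deployment sites outside the company. The project is now hosted at the Apache Software Foundation and maintained as a submodule of the https://submarine.apache.org/ project.

Spark-ranger source: https://github.com/NetEase/spark-ranger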
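To make the three kinds of control named above concrete (column-level authorization, row-level filtering, and data masking), here is a minimal plain-Python sketch. It is only a conceptual illustration under stated assumptions: `Policy`, `authorize`, `mask_tail`, and `allow_all` are hypothetical names invented for this example, and none of it reflects the actual Spark-ranger or Apache Ranger API, which works on Spark SQL query plans rather than Python dictionaries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Row = Dict[str, object]


def allow_all(row: Row) -> bool:
    """Default row filter: every row is visible."""
    return True


def mask_tail(value: object, visible: int = 4) -> str:
    """Hide everything except the last `visible` characters."""
    text = str(value)
    return "*" * max(len(text) - visible, 0) + text[-visible:]


@dataclass
class Policy:
    """Hypothetical per-user policy, not the Spark-ranger data model."""
    allowed_columns: Set[str]
    row_filter: Callable[[Row], bool] = allow_all
    masks: Dict[str, Callable[[object], object]] = field(default_factory=dict)


def authorize(rows: List[Row], requested: List[str], policy: Policy) -> List[Row]:
    # Column-level authorization: reject the query if any column is not granted.
    denied = [c for c in requested if c not in policy.allowed_columns]
    if denied:
        raise PermissionError(f"no SELECT privilege on columns: {denied}")

    result = []
    for row in rows:
        # Row-level filtering: silently drop rows the user may not see.
        if not policy.row_filter(row):
            continue
        # Data masking: transform sensitive values before returning them.
        out = {}
        for column in requested:
            mask = policy.masks.get(column)
            out[column] = mask(row[column]) if mask else row[column]
        result.append(out)
    return result
```

With a policy that grants `name` and `phone`, restricts rows to one region, and masks `phone`, a query for those two columns returns only the permitted rows with the phone number partly hidden, while a query for an ungranted column raises `PermissionError`.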

2. Spark-greenplum

Spark-greenplum is a performance-oriented data transfer tool between the big data warehouse and PostgreSQL and Greenplum databases, offering a hundredfold performance improvement over the native Apache Spark API. The project was created to improve data exchange between NetEase Mammut and NetEase Youdata. Spark-greenplum is used in the step where NetEase Youdata pulls data from the NetEase Mammut big data platform.

Spark-greenplum source: https://github.com/NetEase/spark-greenplum

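3. Spark-alarm

Spark-alarm is a fine-grained monitoring tool for Spark jobs. It monitors Spark jobs comprehensively, supports monitoring of custom key metrics, and offers a range of alerting channels such as NetEase Sentry, email, and EasyOps. The project's goal is to keep the KPI/SLA-critical jobs of each business line running safely. Spark-alarm is a job-level SDK, currently provided to business teams inside NetEase, who instrument their own critical jobs with it.

Spark-alarm source: https://github.com/NetEase/spark-alarm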
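The idea described above, custom metric rules plus pluggable alert channels embedded in a job, can be sketched in a few lines of plain Python. This is an illustration only: `Rule`, `AlarmMonitor`, and the metric names are hypothetical and are not the Spark-alarm SDK, which hooks into Spark's own listener events.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Rule:
    """Hypothetical threshold rule: fire when metric > threshold."""
    metric: str
    threshold: float
    message: str


class AlarmMonitor:
    """Checks job metrics against rules and fans alerts out to channels."""

    def __init__(self, rules: Iterable[Rule], channels: Iterable[Callable[[str], None]]):
        self.rules = list(rules)
        # Each channel is any callable taking the alert text,
        # e.g. an email sender or a chat webhook wrapper.
        self.channels = list(channels)

    def check(self, job_name: str, metrics: Dict[str, float]) -> List[str]:
        fired = []
        for rule in self.rules:
            value = metrics.get(rule.metric)
            # Metrics the job did not report are skipped rather than alerted on.
            if value is not None and value > rule.threshold:
                text = f"[{job_name}] {rule.message}: {rule.metric}={value} > {rule.threshold}"
                fired.append(text)
                for send in self.channels:
                    send(text)
        return fired
```

A job would build one `AlarmMonitor` with its rules and channels, then call `check` with the metrics it collected at the end of a stage or run; treating channels as plain callables keeps the delivery mechanism swappable.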

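3. Ecosystem

As mentioned earlier, building an open source ecosystem promotes technical ideas, helps establish industry standards, and deepens cooperation with upstream and downstream partners, combining technical leadership with collaboration.

In big data, the industry first built the Apache Hadoop ecosystem around three Google papers, then built an active Apache Spark ecosystem around Hadoop. Now products at other layers, such as data lakes, are being built around that ecosystem to achieve truly unified batch and stream computing, while cross-pollination with the CNCF Kubernetes community brings big data and cloud computing closer together. On top of this ecosystem, and starting from the business needs of NetEase and its partners, we built the Kyuubi ecosystem.

Kyuubi is a high-performance, general-purpose JDBC service engine for big data. With a flexible architecture and a unified SQL API, Kyuubi adapts to different compute engines in pursuit of the best compute performance, to different resource schedulers so that deployments can switch freely between coupled and decoupled storage and compute, to different compute models to achieve a unified batch and stream architecture, and to different business scenarios to support one-stop big data application development. The goal is to let users work with big data the way they work with ordinary data.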

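First, a general and easy-to-use way to access data. Kyuubi relies on the standard JDBC interface to offer convenient data access in big data scenarios. End users do not need to be aware of the underlying big data platform (compute engine, storage service, metadata management, and so on) and can focus on developing their own business systems and extracting value from data.

Second, high-performance data querying. Kyuubi relies on compute engines such as Apache Spark and Flink for high-performance queries, so every improvement in an engine translates into a step change in the performance of the Kyuubi service. On top of that, Kyuubi also provides data caching and dynamic query optimization to improve performance further. On the one hand, frequently run queries are sped up by caching; on the other hand, query plans are optimized dynamically according to the volume of data the user accesses, supporting streaming return of very large result sets while keeping performance optimized.

Third, complete enterprise-grade features. Building on its own architecture, Kyuubi provides authentication and authorization services to protect data security; a robust high-availability service to keep the service available; multi-tenant resource isolation, with end-to-end isolation of compute resources and data; and two-level elastic resource management, which controls cost sensibly while improving resource utilization and covers the performance and response requirements of interactive queries, batch processing, point lookups, full-table scans, and other scenarios.

Fourth, rich ecosystem support and ecosystem building. A good open source product cannot do without a good open source ecosystem. While embracing top-level open source ecosystems such as Spark, Kyuubi on the one hand makes use of the openness of those projects' ecosystems, so that it can quickly extend to their existing ecosystems as well as to new features and new ecosystems, such as cloud-native support and data lake (Data Lake/Lake House) support. On the other hand, Kyuubi actively builds and improves its own ecosystem to fill gaps along the pipeline: for example, the https://github.com/netease/spark-ranger project addresses the access control weakness in the big data pipeline, and the https://github.com/netease/spark-greenplum project solves the performance problem of exchanging data between Spark and the traditional database PostgreSQL and the MPP database Greenplum.

Kyuubi source: https://github.com/netease/kyuubi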
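The second point above mentions speeding up frequently run queries with a cache. The article does not say how Kyuubi implements this, so the following is only a generic sketch of the idea, assuming a least-recently-used result cache keyed by the SQL text. `QueryResultCache` and the `run_query` callable are hypothetical names for this example and are not part of Kyuubi.

```python
from collections import OrderedDict
from typing import Callable, List


class QueryResultCache:
    """Small LRU cache in front of any callable that executes SQL."""

    def __init__(self, run_query: Callable[[str], List[tuple]], capacity: int = 128):
        self._run_query = run_query
        self._capacity = capacity
        self._entries: "OrderedDict[str, List[tuple]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def execute(self, sql: str) -> List[tuple]:
        # Only surrounding whitespace is stripped; rewriting case or inner
        # whitespace could change the meaning of string literals.
        key = sql.strip()
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]

        self.misses += 1
        result = self._run_query(sql)
        self._entries[key] = result
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return result
```

A sketch like this ignores invalidation: a real service would also have to drop or refresh entries when the underlying tables change, otherwise cached results go stale.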

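Summary

2020 was an extraordinary year. A threat from nature made us deeply aware of how important open cooperation across all of humanity is.

At its core, an open source community is its developers. Embracing open source and building an open source ecosystem fits NetEase's mission and vision: bringing together the power of people and creating a better life through technological innovation.

Beyond serving the company's own interests, as described above, we also take part in open source out of passion: giving everything to what we love.

Appendix: Main contributions by NetEase engineers to Apache Spark as of the end of 2020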

*ae1d05927a [SPARK-33892][SQL] Display char/varchar in DESC and SHOW CREATE TABLE

*2287f56a3e [SPARK-33879][SQL] Char Varchar values fails w/ match error as partition columns

*a3dd8dacee [SPARK-33877][SQL] SQL reference documents for INSERT w/ a column list

*6da5cdf1db [SPARK-33876][SQL] Add length-check for reading char/varchar from tables w/ a external location

*f5fd10b1bc [SPARK-33834][SQL] Verify ALTER TABLE CHANGE COLUMN with Char and Varchar

*dd44ba5460 [SPARK-32976][SQL][FOLLOWUP] SET and RESTORE hive.exec.dynamic.partition.mode for HiveSQLInsertTestSuite to avoid flakiness

*c17c76dd16 [SPARK-33599][SQL][FOLLOWUP] FIX Github Action with unidoc

*728a1298af [SPARK-33806][SQL] limit partition num to 1 when distributing by foldable expressions

*205d8e40bc [SPARK-32991][SQL] [FOLLOWUP] Reset command relies on session initials first

*4d47ac4b4b [SPARK-33705][SQL][TEST] Fix HiveThriftHttpServerSuite flakiness

*31e0baca30 [SPARK-33740][SQL] hadoop configs in hive-site.xml can overrides pre-existing hadoop ones

*c88eddac3b [SPARK-33641][SQL][DOC][FOLLOW-UP] Add migration guide for CHAR VARCHAR types

*da72b87374 [SPARK-33641][SQL] Invalidate new char/varchar types in public APIs that produce incorrect results

*2da72593c1 [SPARK-32976][SQL] Support column list in INSERT statement

*cdd8e51742 [SPARK-33419][SQL] Unexpected behavior when using SET commands before a query in SparkSession.sql

*4335af075a [MINOR][DOC] spark.executor.memoryOverhead is not cluster-mode only

*036c11b0d4 [SPARK-33397][YARN][DOC] Fix generating md to html for available-patterns-for-shs-custom-executor-log-url

*82d500a05c [SPARK-33193][SQL][TEST] Hive ThriftServer JDBC Database MetaData API Behavior Auditing

*e21bb710e5 [SPARK-32991][SQL] Use conf in shared state as the original configuraion for RESET

*dcb0820433 [SPARK-32785][SQL][DOCS][FOLLOWUP] Update migaration guide for incomplete interval literals

*2507301705 [SPARK-33159][SQL] Use hive-service-rpc as dependency instead of inlining the generated code

*17d309dfac [SPARK-32963][SQL] empty string should be consistent for schema name in SparkGetSchemasOperation

*e2a740147c [SPARK-32874][SQL][FOLLOWUP][TEST-HIVE1.2][TEST-HADOOP2.7] Fix spark-master-test-sbt-hadoop-2.7-hive-1.2

*9e9d4b6994 [SPARK-32905][CORE][YARN] ApplicationMaster fails to receive UpdateDelegationTokens message

*316242b768 [SPARK-32874][SQL][TEST] Enhance result set meta data check for execute statement operation with thrift server

*5669b212ec [SPARK-32840][SQL] Invalid interval value can happen to be just adhesive with the unit

*9ab8a2c36d [SPARK-32826][SQL] Set the right column size for the null type in SparkGetColumnsOperation

*de44e9cfa0 [SPARK-32785][SQL] Interval with dangling parts should not results null

*1fba286407 [SPARK-32781][SQL] Non-ASCII characters are mistakenly omitted in the middle of intervals

*6dacba7fa0 [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperation

*0626901bcb [SPARK-32729][SQL][DOCS] Add missing since version for math functions

*f14f3742e0 [SPARK-32696][SQL][TEST-HIVE1.2][TEST-HADOOP2.7] Get columns operation should handle interval column properly

*1f3bb51757 [SPARK-32683][DOCS][SQL] Fix doc error and add migration guide for datetime pattern F

*c26a97637f Revert "[SPARK-32412][SQL] Unify error handling for spark thrift serv…

*1b6f482adb [SPARK-32492][SQL][FOLLOWUP][TEST-MAVEN] Fix jenkins maven jobs

*7f5326c082 [SPARK-32492][SQL] Fulfill missing column meta information COLUMN_SIZE /DECIMAL_DIGITS/NUM_PREC_RADIX/ORDINAL_POSITION for thriftserver client tools

* 3deb59d5c2 [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path

* f4800406a4 [SPARK-32406][SQL][FOLLOWUP] Make RESET fail against static and core configs

* 510a1656e6 [SPARK-32412][SQL] Unify error handling for spark thrift server operations

* d315ebf3a7 [SPARK-32424][SQL] Fix silent data change for timestamp parsing if overflow happens

* d3596c04b0 [SPARK-32406][SQL] Make RESET syntax support single configuration reset

* b151194299 [SPARK-32392][SQL] Reduce duplicate error log for executing sql statement operation in thrift server

* 29b7eaa438 [MINOR][SQL] Fix warning message for ThriftCLIService.GetCrossReference and GetPrimaryKeys

* efa70b8755 [SPARK-32145][SQL][FOLLOWUP] Fix type in the error log of SparkOperation

* bdeb626c5a [SPARK-32272][SQL] Add SQL standard command SET TIME ZONE

* 4609f1fdab [SPARK-32207][SQL] Support 'F'-suffixed Float Literals

* 59a70879c0 [SPARK-32145][SQL][TEST-HIVE1.2][TEST-HADOOP2.7] ThriftCLIService.GetOperationStatus should include exception's stack trace to the error message

* 9f8e15bb2e [SPARK-32034][SQL] Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly upon shutdown

* 93529a8536 [SPARK-31957][SQL] Cleanup hive scratch dir for the developer api startWithContext

* abc8ccc37b [SPARK-31926][SQL][TESTS][FOLLOWUP][TEST-HIVE1.2][TEST-MAVEN] Fix concurrency issue for ThriftCLIService to getPortNumber

* a0187cd6b5 [SPARK-31926][SQL][TEST-HIVE1.2][TEST-MAVEN] Fix concurrency issue for ThriftCLIService to getPortNumber

* 22dda6e18e [SPARK-31939][SQL][TEST-JAVA11] Fix Parsing day of year when year field pattern is missing

* 6a424b93e5 [SPARK-31830][SQL] Consistent error handling for datetime formatting and parsing functions

* 02f32cfae4 [SPARK-31926][SQL][TEST-HIVE1.2] Fix concurrency issue for ThriftCLIService to getPortNumber

* fc6af9d900 [SPARK-31867][SQL][FOLLOWUP] Check result differences for datetime formatting

* 9d5b5d0a58 [SPARK-31879][SQL][TEST-JAVA11] Make week-based pattern invalid for formatting too

* afcc14c6d2 [SPARK-31896][SQL] Handle am-pm timestamp parsing when hour is missing

* afe95bd9ad [SPARK-31892][SQL] Disable week-based date filed for parsing

* c59f51bcc2 [SPARK-31879][SQL] Using GB as default Locale for datetime formatters

* 547c5bf552 [SPARK-31867][SQL] Disable year type datetime patterns which are longer than 10

* fe1da296da [SPARK-31833][SQL][TEST-HIVE1.2] Set HiveThriftServer2 with actual port while configured 0

* 311fe6a880 [SPARK-31835][SQL][TESTS] Add zoneId to codegen related tests in DateExpressionsSuite

* 695cb617d4 [SPARK-31771][SQL] Disable Narrow TextStyle for datetime pattern 'G/M/L/E/u/Q/q'

* 0df8dd6073 [SPARK-30352][SQL] DataSourceV2: Add CURRENT_CATALOG function

*7e2ed40d58 [SPARK-31759][DEPLOY] Support configurable max number of rotate logs for spark daemons

*1f29f1ba58 [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same as the table

*1d66085a93 [SPARK-31289][TEST][TEST-HIVE1.2] Eliminate org.apache.spark.sql.hive.thriftserver.CliSuite flakiness

*503faa24d3 [SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard

*ce714d8189 [SPARK-31678][SQL] Print error stack trace for Spark SQL CLI when error occurs

*b31ae7bb0b [SPARK-31615][SQL] Pretty string output for sql method of RuntimeReplaceable expressions

*bd6b53cc0b [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry

*9241f8282f [SPARK-31586][SQL][FOLLOWUP] Restore SQL string for datetime - interval operations

*ea525fe8c0 [SPARK-31597][SQL] extracting day from intervals should be interval.days + days in interval.microsecond

*295d866969 [SPARK-31596][SQL][DOCS] Generate SQL Configurations from hive module to configuration doc

*54996be4d2 [SPARK-31527][SQL][TESTS][FOLLOWUP] Add a benchmark test for datetime add/subtract interval operations

*beec8d535f [SPARK-31586][SQL] Replace expression TimeSub(l, r) with TimeAdd(l -r)

*5ba467ca1d [SPARK-31550][SQL][DOCS] Set nondeterministic configurations with general meanings in sql configuration doc

*ebc8fa50d0 [SPARK-31527][SQL] date add/subtract interval only allow those day precision in ansi mode

*7959808e96 [SPARK-31564][TESTS] Fix flaky AllExecutionsPageSuite for checking 1970

*f92652d0b5 [SPARK-31528][SQL] Remove millennium, century, decade from trunc/date_trunc fucntions

* caf3ab8411 [SPARK-31552][SQL] Fix ClassCastException in ScalaReflection arrayClassFor

* 8424f55229 [SPARK-31532][SQL] Builder should not propagate static sql configs to the existing active or default SparkSession

* 8dc2c0247b [SPARK-31522][SQL] Hive metastore client initialization related configurations should be static

* 3b5792114a [SPARK-31474][SQL][FOLLOWUP] Replace _FUNC_ placeholder with functionname in the note field of expression info

* 37d2e037ed [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function

* 2c2062ea7c [SPARK-31498][SQL][DOCS] Dump public static sql configurations through doc generation

* 1985437110 [SPARK-31474][SQL] Consistency between dayofweek/dow in extract exprsession and dayofweek function

* 77cb7cde0d [SPARK-31469][SQL][TESTS][FOLLOWUP] Remove unsupported fields from ExtractBenchmark

* 697083c051 [SPARK-31469][SQL] Make extract interval field ANSI compliance

* 31b907748d [SPARK-31414][SQL][DOCS][FOLLOWUP] Update default datetime pattern for json/csv APIs documentations

* d65f534c5a [SPARK-31414][SQL] Fix performance regression with new TimestampFormatter for json and csv time parsing

* a454510917 [SPARK-31392][SQL] Support CalendarInterval to be reflect to CalendarntervalType

* 3c94a7c8f5 [SPARK-29311][SQL][FOLLOWUP] Add migration guide for extracting second from datetimes

* 1ce584f6b7 [SPARK-31321][SQL] Remove SaveMode check in v2 FileWriteBuilder

* f376d24ea1 [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery

* 5945d46c11 [SPARK-31225][SQL] Override sql method of OuterReference

* 8be16907c2 [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir

* 44bd36ad7b [SPARK-31234][SQL] ResetCommand should reset config to sc.conf only

* b024a8a69e [MINOR][DOCS] Fix some links for python api doc

* 336621e277 [SPARK-31258][BUILD] Pin the avro version in SBT

* f81f11822c [SPARK-31189][R][DOCS][FOLLOWUP] Replace Datetime pattern links in R doc

* 88ae6c4481 [SPARK-31189][SQL][DOCS] Fix errors and missing parts for datetime pattern document

* 3d695954e5 [SPARK-31150][SQL][FOLLOWUP] handle ' as escape for text

* 57fcc49306 [SPARK-31176][SQL] Remove support for 'e'/'c' as datetime pattern charactar

* f1d27cdd91 [SPARK-31119][SQL] Add interval value support for extract expression as extract source

* 5bc0d76591 [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir

* 0946a9514f [SPARK-31150][SQL] Parsing seconds fraction with variable length for timestamp

* fbc9dc7e9d [SPARK-31129][SQL][TESTS] Fix IntervalBenchmark and DateTimeBenchmark

* 7b4b29e8d9 [SPARK-31131][SQL] Remove the unnecessary config spark.sql.legacy.timeParser.enabled

* 18f2730874 [SPARK-31066][SQL][TEST-HIVE1.2] Disable useless and uncleaned hive SessionState initialization parts

* 2b46662bd0 [SPARK-31111][SQL][TESTS] Fix interval output issue in ExtractBenchmark

* 3bd6ebff81 [SPARK-30189][SQL] Interval from year-month/date-time string should handle whitespaces

* f45ae7f2c5 [SPARK-31038][SQL] Add checkValue for spark.sql.session.timeZone

* 3edab6cc1d [MINOR][CORE] Expose the alias -c flag of --conf for spark-submit

* 1fac06c430 Revert "[SPARK-30808][SQL] Enable Java 8 time API in Thrift server"

* 1383bd459a [SPARK-30970][K8S][CORE] Fix NPE while resolving k8s master url

* 2d2706cb86 [SPARK-30956][SQL][TESTS] Use intercept instead of try-catch to assert failures in IntervalUtilsSuite

* a6026c830a [MINOR][BUILD] Fix make-distribution.sh to show usage without 'echo' cmd

* 761209c1f2 [SPARK-30919][SQL] Make interval multiply and divide's overflow behavior consistent with other operations

* 46019b6e6c [MINOR][DOCS] Fix fabric8 version in documentation

* 0353cbf092 [MINOR][DOC] Fix 2 style issues in running-on-kubernetes doc

* 58b9ca1e6f [SPARK-30592][SQL][FOLLOWUP] Add some round-trip test cases

* 3228d723a4 [SPARK-30603][SQL] Move RESERVED_PROPERTIES from SupportsNamespaces and TableCatalog to CatalogV2Util

*8e280cebf2 [SPARK-30592][SQL] Interval support for csv and json funtions

*f2d71f5838 [SPARK-30591][SQL] Remove the nonstandard SET OWNER syntax for namespaces

*af705421db [SPARK-30593][SQL] Revert interval ISO/ANSI SQL Standard output since we decide not to follow ANSI and no round trip

*730388b369 [SPARK-30547][SQL][FOLLOWUP] Update since anotation for CalendarInterval class

*0388b7a3ec [SPARK-30568][SQL] Invalidate interval type as a field table schema

*24efa43826 [SPARK-30019][SQL] Add the owner property to v2 table

*4806cc5bd1 [SPARK-30547][SQL] Add unstable annotation to the CalendarInterval class

*17857f9b8b [SPARK-30551][SQL] Disable comparison for interval type

*82f25f5855 [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties

*bcf07cbf5f [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

*c37312342e [SPARK-30183][SQL] Disallow to specify reserved properties in CREATE/ALTER NAMESPACE syntax

*8c121b0827 [SPARK-30431][SQL] Update SqlBase.g4 to create commentSpec pattern like locationSpec

*c49388a484 [SPARK-30214][SQL] A new framework to resolve v2 commands

*e04309cb1f [SPARK-30341][SQL] Overflow check for interval arithmetic operations

*f0bf2eb006 [SPARK-30356][SQL] Codegen support for the function str_to_map

*da65a955ed [SPARK-30266][SQL] Avoid match error and int overflow in ApproximatePercentile and Percentile

*12249fcdc7 [SPARK-30301][SQL] Fix wrong results when datetimes as fields of complex types

*d38f816748 [MINOR][SQL][DOC] Fix some format issues in Dataset API Doc

*cc7f1eb874 [SPARK-29774][SQL][FOLLOWUP] Add a migration guide for date_add and date_sub

*bf7215c510 [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache

*d3ec8b1735 [SPARK-30066][SQL] Support columnar execution on interval types

*8f0eb7dc86 [SPARK-29587][SQL] Support SQL Standard type real as float(4) numeric as decimal

*24c4ce1e64 [SPARK-28351][SQL][FOLLOWUP] Remove 'DELETE FROM' from unsupportedHiveNativeCommands

*e88d74052b [SPARK-30147][SQL] Trim the string when cast string type to booleans

*35bab33984 [SPARK-30121][BUILD] Fix memory usage in sbt build script

*b9cae37750 [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

*332e252a14 [SPARK-29425][SQL] The ownership of a database should be respected

*65552a81d1 [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking

*39291cff95 [SPARK-30048][SQL] Enable aggregates with interval type values for RelationalGroupedDataset

*4e073f3c50 [SPARK-30047][SQL] Support interval types in UnsafeRow

*4fd585d2c5 [SPARK-30008][SQL] The dataType of collect_list/collect_set aggs should be ArrayType(_, false)

* ed0c33fdd4 [SPARK-30026][SQL] Whitespaces can be identified as delimiters in interval string

* 8b0121bea8 [MINOR][DOC] Fix the CalendarIntervalType description

* de21f28f8a [SPARK-29986][SQL] casting string to date/timestamp/interval should trim all whitespaces

* 5cf475d288 [SPARK-30000][SQL] Trim the string when cast string type to decimals

* 2dd6807e42 [SPARK-28023][SQL] Add trim logic in UTF8String's toInt/toLong to make it consistent with other string-numeric casting

* d555f8fcc9 [SPARK-29961][SQL][FOLLOWUP] Remove useless test for VectorUDT

* 7a70670345 [SPARK-29961][SQL] Implement builtin function - typeof

* 79ed4ae2db [SPARK-29926][SQL] Fix weird interval string whose value is only a dangling decimal point

* ea010a2bc2 [SPARK-29873][SQL][TEST][FOLLOWUP] set operations should not escape when regen golden file with --SET --import both specified

* ae6b711b26 [SPARK-29941][SQL] Add ansi type aliases for char and decimal

* 50f6d930da [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval

* 5cebe587c7 [SPARK-29783][SQL] Support SQL Standard/ISO_8601 output style for interval type

*0c68578fa9 [SPARK-29888][SQL] new interval string parser shall handle numeric with only fractional part

*15a72f3755 [SPARK-29287][CORE] Add LaunchedExecutor message to tell driver which executor is ready for making offers

*f926809a1f [SPARK-29390][SQL] Add the justify_days(), justify_hours() and justif_interval() functions

* d99398e9f5 [SPARK-29855][SQL] typed literals with negative sign with proper result or exception

* d06a9cc4bd [SPARK-29822][SQL] Fix cast error when there are white spaces between signs and values

* e026412d9c [SPARK-29679][SQL] Make interval type comparable and orderable

* e7f7990bc3 [SPARK-29688][SQL] Support average for interval type values

* 0a03839366 [SPARK-29787][SQL] Move methods add/subtract/negate from CalendarInterval to IntervalUtils

* 9562b26914 [SPARK-29757][SQL] Move calendar interval constants together

* 3437862975 [SPARK-29387][SQL][FOLLOWUP] Fix issues of the multiply and divide for intervals

* 4615769736 [SPARK-29603][YARN] Support application priority for YARN priority scheduling

* 44b8fbcc58 [SPARK-29663][SQL] Support sum with interval type values

* 8cf76f8d61 [SPARK-29285][SHUFFLE] Temporary shuffle files should be able to handle disk failures

* 5ba17d09ac [SPARK-29722][SQL] Non reversed keywords should be able to be used in high order functions

* dc987f0c8b [SPARK-29653][SQL] Fix MICROS_PER_MONTH in IntervalUtils

* 8e667db5d8 [SPARK-29629][SQL] Support typed integer literal expression

* 9a46702791 [SPARK-29554][SQL] Add `version` SQL function

* 0cf4f07c66 [SPARK-29545][SQL] Add support for bit_xor aggregate function

*5b4d9170ed [SPARK-27879][SQL] Add support for bit_and and bit_or aggregates

*ef4c298cc9 [SPARK-29405][SQL] Alter table / Insert statements should not change a table's ownership

*4b902d3b45 [SPARK-29491][SQL] Add bit_count function support

* 6d4cc7b855 [SPARK-27880][SQL] Add bool_and for every and bool_or for any as function aliases

* 02c5b4f763 [SPARK-28947][K8S] Status logging not happens at an interval for liveness

* f4c73b7c68 [SPARK-27301][DSTREAM] Shorten the FileSystem cached life cycle to the cleanup method inner scope

* ac9c0536bc [SPARK-26794][SQL] SparkSession enableHiveSupport does not point to hive but in-memory while the SparkContext exists

* f8346d2fc0 [SPARK-25174][YARN] Limit the size of diagnostic message for am to unregister itself from rm

* 4a2b15f0af [SPARK-24241][SUBMIT] Do not fail fast when dynamic resource allocation enabled with 0 executor

* a7755fd8ce [SPARK-23639][SQL] Obtain token before init metastore client in SparkSQL CLI

* 189f56f3dc [SPARK-23383][BUILD][MINOR] Make a distribution should exit with usage while detecting wrong options

* eefec93d19 [SPARK-23295][BUILD][MINOR] Exclude Waring message when generating versions in make-distribution.sh

* dd52681bf5 [SPARK-23253][CORE][SHUFFLE] Only write shuffle temporary index file when there is not an existing one

* 793841c6b8 [SPARK-21771][SQL] remove useless hive client in SparkSQLEnv

* 9fa703e893 [SPARK-22950][SQL] Handle ChildFirstURLClassLoader's parent

* 28ab5bf597 [SPARK-22487][SQL][HIVE] Remove the unused HIVE_EXECUTION_VERSION property

* c755b0d910 [SPARK-22463][YARN][SQL][HIVE] add hadoop/hive/hbase/etc configuration files in SPARK_CONF_DIR to distribute archive

* ee571d79e5 [SPARK-22466][SPARK SUBMIT] export SPARK_CONF_DIR while conf is default

* 99e32f8ba5 [SPARK-22224][SQL] Override toString of KeyValue/Relational-GroupedDataset

* 581200af71 [SPARK-21428][SQL][FOLLOWUP] CliSessionState should point to the actual metastore not a dummy one

* b83b502c41 [SPARK-21428] Turn IsolatedClientLoader off while using builtin Hive jars for reusing CliSessionState

* 2387f1e316 [SPARK-21675][WEBUI] Add a navigation bar at the bottom of the Details for Stage Page

* e9d268f63e [SPARK-20096][SPARK SUBMIT][MINOR] Expose the right queue name not null if set by --conf or configure file

* 7363dde634 [SPARK-19626][YARN] Using the correct config to set credentials update time

* e33053ee00 [SPARK-11583] [CORE] MapStatus Using RoaringBitmap More Properly

* 7466031632 [SPARK-32106][SQL] Implement script transform in sql/core

* 0603913c66 [SPARK-33593][SQL] Vector reader got incorrect data with binary partition value

* 25c6cc25f7 [SPARK-26341][WEBUI] Expose executor memory metrics at the stage level, in the Stages tab

* 5f9a7fea06 [SPARK-33428][SQL] Conv UDF use BigInt to avoid Long value overflow

* d7f4b2ad50 [SPARK-28704][SQL][TEST] Add back Skiped HiveExternalCatalogVersionsSuite in HiveSparkSubmitSuite at JDK9+

* 47326ac1c6 [SPARK-28704][SQL][TEST] Add back Skiped HiveExternalCatalogVersionsSuite in HiveSparkSubmitSuite at JDK9+

* dd32f45d20 [SPARK-31069][CORE] Avoid repeat compute `chunksBeingTransferred` cause hight cpu cost in external shuffle service when `maxChunksBeingTransferred` use default value

* 34f5e7ce77 [SPARK-33302][SQL] Push down filters through Expand

* 0c943cd2fb [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size

* e43cd8ccef [SPARK-32388][SQL] TRANSFORM with schema-less mode should keep the same with hive

* a1629b4a57 [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location

* f8277d3aa3 [SPARK-32069][CORE][SQL] Improve error message on reading unexpected directory

* ddc7012b3d [SPARK-32243][SQL] HiveSessionCatalog call super.makeFunctionExpression should throw earlier when got Spark UDAF Invalid arguments number error

* 0b5a379c1f [SPARK-33023][CORE] Judge path of Windows need add condition `Utils.isWindows`

* c336ddfdb8 [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

* 5e6173ebef [SPARK-31670][SQL] Trim unnecessary Struct field alias in Aggregate/GroupingSets

* 55ce49ed28 [SPARK-32400][SQL][TEST][FOLLOWUP][TEST-MAVEN] Fix resource loading error in HiveScripTransformationSuite

* 9808c15eec [SPARK-32608][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value

* c75a82794f [SPARK-32667][SQL] Script transform 'default-serde' mode should pad null value to filling column

* 6dae11d034 [SPARK-32607][SQL] Script Transformation ROW FORMAT DELIMITED `TOK_TABLEROWFORMATLINES` only support '\n'

*03e2de99ab [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value

*643cd876e4 [SPARK-32352][SQL] Partially push down support data filter if it mixed in partition filters

*4cf8c1d07d [SPARK-32400][SQL] Improve test coverage of HiveScriptTransformationExec

*d251443a02 [SPARK-32403][SQL] Refactor current ScriptTransformationExec

*5521afbd22 [SPARK-32220][SQL][FOLLOW-UP] SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

*6d499647b3 [SPARK-32105][SQL] Refactor current ScriptTransformationExec code

*09789ff725 [SPARK-31226][CORE][TESTS] SizeBasedCoalesce logic will lose partition

*560fe1f54c [SPARK-32220][SQL] SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

*15fb5d7677 [SPARK-28169][SQL] Convert scan predicate condition to CNF

*0d9faf602e [SPARK-31655][BUILD] Upgrade snappy-java to 1.1.7.5

*6bc8d84130 [SPARK-29492][SQL] Reset HiveSession's SessionState conf's ClassLoader when sync mode

*246c398d59 [SPARK-30435][DOC] Update doc of Supported Hive Features

*3eade744f8 [SPARK-29800][SQL] Rewrite non-correlated EXISTS subquery use ScalaSubquery to optimize perf

*da27f91560 [SPARK-29957][TEST] Reset MiniKDC's default enctypes to fit jdk8/jdk11

*6146dc4562 [SPARK-29874][SQL] Optimize Dataset.isEmpty()

*eb79af8dae [SPARK-29145][SQL][FOLLOW-UP] Move tests from`SubquerySuite`to`subquery/in-subquery/in-joins.sql`

*e524a3a223 [SPARK-29742][BUILD] Update checkstyle plugin's check dir scope

*d6e33dc377 [SPARK-29599][WEBUI] Support pagination for session table in JDBC/ODBC Tab

*67cf0433ee [SPARK-29145][SQL] Support sub-queries in join conditions

*484f93e255 [SPARK-29530][SQL] Make SQLConf in SQL parse process thread safe

*9a3dccae72 [SPARK-29379][SQL] SHOW FUNCTIONS show '!=', '<>' , 'between', 'case'

*ef81525a1a [SPARK-29308][BUILD] Update deps in dev/deps/spark-deps-hadoop-3.2 for hadoop-3.2

*178a1f3558 [SPARK-29305][BUILD] Update LICENSE and NOTICE for Hadoop 3.2

*0cf2f48dfe [SPARK-29022][SQL] Fix SparkSQLCLI can not add jars by AddJarCommand

*1d4b2f010b [SPARK-29247][SQL] Redact sensitive information in when construct HiveClientHive.state

*cc852d4eec [SPARK-29015][SQL][TEST-HADOOP3.2] Reset class loader after initializing SessionState for built-in Hive 2.3

*d22768a6be [SPARK-29036][SQL] SparkThriftServer cancel job after execute() thread interrupted

*fe4bee8fd8 [SPARK-29162][SQL] Simplify NOT(IsNull(x)) and NOT(IsNotNull(x))

*54d3f6e7ec [SPARK-28982][SQL] Implementation Spark's own GetTypeInfoOperation

*9f478a6832 [SPARK-28901][SQL] SparkThriftServer's Cancel SQL Operation show it in JDBC Tab UI

*036fd3903f [SPARK-27637][SHUFFLE][FOLLOW-UP] For nettyBlockTransferService, if IOException occurred while create client, check whether relative executor is alive before retry #24533

*e853f068f6 [SPARK-33526][SQL][FOLLOWUP] Fix flaky test due to timeout and fix docs

*1dd63dccd8 [SPARK-33860][SQL] Make CatalystTypeConverters.convertToCatalyst match special Array value

*bc46d273e0 [SPARK-33840][DOCS] Add spark.sql.files.minPartitionNum to performence tuning doc

*839d6899ad [SPARK-33733][SQL] PullOutNondeterministic should check and collect deterministic field

*5bab27e00b [SPARK-33526][SQL] Add config to control if cancel invoke interrupt task on thriftserver
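For readers who want to see how the contributions above are spread across Spark modules, the list can be tallied by its bracketed tags. The sketch below is an assumption-laden helper, not part of the original article: `parse_commit` and `tally_components` are hypothetical names, and the parser simply assumes each line has the shape `* <10-character hash> [SPARK-xxxxx][COMPONENT] title` seen in the list. Tags such as `MINOR`, `FOLLOWUP`, or `TEST-HIVE1.2` are counted alongside module tags like `SQL` and `CORE`.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# "*" marker, optional spaces, a 10-character abbreviated hash, then the title.
LINE_RE = re.compile(r"^\*\s*([0-9a-f]{10})\s+(.*)$")
# Any "[...]" tag inside the title.
TAG_RE = re.compile(r"\[([^\]]+)\]")


def parse_commit(line: str) -> Optional[Tuple[str, List[str], List[str]]]:
    """Return (hash, JIRA issue ids, other tags), or None for non-commit lines."""
    match = LINE_RE.match(line.strip())
    if not match:
        return None
    sha, title = match.groups()
    tags = TAG_RE.findall(title)
    issues = [t for t in tags if t.startswith("SPARK-")]
    components = [t for t in tags if not t.startswith("SPARK-")]
    return sha, issues, components


def tally_components(lines: Iterable[str]) -> Counter:
    """Count how often each non-issue tag appears across the commit lines."""
    counts: Counter = Counter()
    for line in lines:
        parsed = parse_commit(line)
        if parsed:
            counts.update(parsed[2])
    return counts
```

Feeding the appendix lines into `tally_components` and calling `most_common()` on the result gives a quick ranking of the tags that appear most often.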

Author: NetEase EasyData Spark development team

