Name: Deng Fuyuan (鄧復元)
Student ID: 17020120041
Reposted from: https://blog.csdn.net/NetEaseResearch/article/details/111661473?spm=1000.2115.3001.4128
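【Qianniu Intro】: This article discusses the outlook for open source and presents the NetEase Yishu (網易易數) team's year-end review.
【Qianniu Keywords】: a new era of open source
【Qianniu Topic】: the background, current state, and future of open source
【Qianniu Body】:
Preface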
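“Building in-house does not mean being self-reliant and in control; openness is the future.” These words from Wang Yuan, Vice President of NetEase, reflect NetEase's determination and long-standing commitment to embracing open source and building open-source ecosystems.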
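For a company, it comes down to a four-character motto: 有利可圖, "there must be something to gain." This is a practical goal we have to face squarely. For a company, "using" open source lowers total cost of ownership, improves software quality, and gives early access to cutting-edge innovation, cutting costs, raising efficiency, and driving growth. "Participating" in open source raises employees' technical skills, builds a brand in technical communities, and establishes trust with leading experts, helping with both developing and recruiting talent. "Building" an open-source ecosystem promotes technical ideas, shapes industry standards, and deepens cooperation with upstream and downstream partners, letting a company lead technically while collaborating with others.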
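According to Red Hat's report "The State of Enterprise Open Source 2020", companies are embracing open source more and more:
More and more companies recognize the importance of open source: 95% of IT leaders consider enterprise open source crucial to their company's infrastructure software strategy.
Use of proprietary software is declining rapidly. Expensive, inflexible proprietary software licenses lead to high capital expenditure (CapEx) and vendor lock-in.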
For engineers, open source is also the best place to devote themselves fully to work they love and to change the world through technical innovation.
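Apache Spark is currently the mainstream big data compute engine inside NetEase Group. It processes petabytes of data every day, covering offline computing, real-time computing, traditional machine learning, and more.
To reduce the cost of maintaining Spark inside NetEase and to bring new Spark technologies into production at NetEase quickly, the NetEase Yishu team adopted the following three strategies.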
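All of our engineers contribute actively to the community to varying degrees, deepening our collaboration with it and becoming part of it.
The community looks for developers who contribute consistently, which makes it possible to build a good long-term working relationship with enough mutual trust. To some extent, this lets our internal requirements become industry standards, and lets the community's latest technology land in-house right away.
The list at the end of this article gives an incomplete count of the main contributions NetEase engineers made to Apache Spark as of the end of 2020: about 300 commits.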
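Of course, because business priorities differ, the supporting technology always diverges between companies and industries, so modifying Spark's source code is unavoidable. For maintainability, our strategy is to split such company-specific requirements out of Spark into plugins, which reduces coupling with the Spark mainline and keeps iteration lightweight. Several of these plugins are introduced below: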
1. Spark-ranger
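Spark-ranger is an access-control plugin that provides fine-grained, SQL-standard access control for the Spark compute engine, including column-level authorization, row-level filtering, and data masking. In big data warehouse scenarios, data security has long been a weak spot of Spark SQL as a high-performance query engine. The project was created to close the last access-control gap in the data warehouse management features of the Mammut (猛犸) product. As part of Mammut's security components, Spark-ranger authorizes hundreds of thousands of Spark jobs from business teams inside the company every day, and it also protects customer data at every external commercial deployment site. The project has since been handed over to the Apache Software Foundation and is maintained as a submodule of the https://submarine.apache.org/ project.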
Spark-ranger repository: https://github.com/NetEase/spark-ranger
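To make the three features concrete, here is a toy Python model of what a fine-grained policy does to a query result: reject unauthorized columns, drop rows that fail a row filter, and mask sensitive values. This is a sketch under stated assumptions, not Spark-ranger's code: the `Policy` structure, the keep-the-last-four-characters masking rule, and `secure_select` are all hypothetical. In Spark-ranger itself, as described above, such checks are applied to Spark SQL queries based on Apache Ranger policies.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Sequence, Set

Row = Dict[str, Any]


@dataclass
class Policy:
    """Hypothetical per-user policy covering the three features described above."""
    allowed_columns: Set[str]                               # column-level authorization
    row_filter: Callable[[Row], bool] = lambda row: True    # row-level filtering
    masked_columns: Set[str] = field(default_factory=set)   # data masking


def mask(value: Any) -> str:
    """Hide everything except the last four characters."""
    text = str(value)
    return "*" * max(len(text) - 4, 0) + text[-4:]


def secure_select(rows: List[Row], columns: Sequence[str], policy: Policy) -> List[Row]:
    """Project `columns` from `rows`, enforcing the policy along the way."""
    denied = set(columns) - policy.allowed_columns
    if denied:
        raise PermissionError(f"no access to columns: {sorted(denied)}")
    result = []
    for row in rows:
        if not policy.row_filter(row):
            continue  # this row is invisible to the user
        result.append({c: mask(row[c]) if c in policy.masked_columns else row[c]
                       for c in columns})
    return result


rows = [
    {"id": 1, "region": "east", "phone": "13800001234"},
    {"id": 2, "region": "west", "phone": "13900005678"},
]
analyst = Policy(allowed_columns={"id", "phone"},
                 row_filter=lambda row: row["region"] == "east",
                 masked_columns={"phone"})
print(secure_select(rows, ["id", "phone"], analyst))
# [{'id': 1, 'phone': '*******1234'}]
```

Requesting a column outside `allowed_columns`, such as `region`, raises `PermissionError` instead of returning partial data, which mirrors the "deny by default" stance of column-level authorization.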
2. Spark-greenplum
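Spark-greenplum is a high-performance data transfer tool between the big data warehouse and PostgreSQL and Greenplum databases, offering a hundredfold performance improvement over Apache Spark's native API. The project was created to improve data exchange between NetEase Mammut and NetEase Youdata (有數); Spark-greenplum is used in the step where Youdata pulls data from the Mammut big data platform.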
Spark-greenplum repository: https://github.com/NetEase/spark-greenplum
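The paragraph above credits Spark-greenplum with the speedup but does not say how it is achieved. A common technique for fast loading into PostgreSQL and Greenplum is to stream each partition's rows through a single `COPY ... FROM STDIN` statement rather than issuing row-by-row or batched `INSERT`s over JDBC. Treat that as an assumption about Spark-greenplum, not a description of it. The Python sketch below only prepares the statement and a CSV payload for one batch, so it runs without a database. `build_copy_payload` is a hypothetical helper; actually sending the payload would go through a driver's COPY support, for example `CopyManager` in the PostgreSQL JDBC driver.

```python
import csv
import io
from typing import Iterable, Sequence, Tuple


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier: wrap in double quotes, double embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_copy_payload(table: str, columns: Sequence[str],
                       rows: Iterable[Sequence[object]]) -> Tuple[str, str]:
    """Return (COPY statement, CSV body) for one batch of rows (hypothetical helper)."""
    column_list = ", ".join(quote_ident(c) for c in columns)
    sql = f"COPY {quote_ident(table)} ({column_list}) FROM STDIN WITH (FORMAT csv)"
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # csv writes None as an empty unquoted field, which COPY's CSV format loads as NULL.
        # Limitation of this sketch: empty strings are written the same way and also load as NULL.
        writer.writerow(row)
    return sql, buffer.getvalue()


sql, body = build_copy_payload("sales", ["id", "city"],
                               [(1, "Hangzhou"), (2, None), (3, 'a,"b"')])
print(sql)  # COPY "sales" ("id", "city") FROM STDIN WITH (FORMAT csv)
print(body, end="")
# 1,Hangzhou
# 2,
# 3,"a,""b"""
```

The design point is that the database parses one continuous stream per batch instead of planning and executing one statement per row, which is where bulk loaders usually win over generic JDBC writes.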
3. Spark-alarm
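Spark-alarm is a fine-grained Spark job monitoring tool. It monitors Spark jobs comprehensively, as well as user-defined key metrics, and supports a range of alerting channels, such as NetEase Sentinel (哨兵), email, and EasyOps. Its purpose is to keep jobs tied to business KPIs/SLAs running reliably. Spark-alarm is a job-level SDK currently provided to business teams across NetEase, who instrument their critical jobs with it.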
Spark-alarm repository: https://github.com/NetEase/spark-alarm
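The article does not show Spark-alarm's API. On the JVM, Spark publishes job, stage, and task events through `SparkListener` (registered via the `spark.extraListeners` setting or `SparkContext.addSparkListener`), which is a natural hook for a job-level monitoring SDK. The Python sketch below models only the alerting side under that assumption. Hypothetical threshold `Rule`s are checked against a job's metrics, and each violation is sent to every configured channel. The channel functions stand in for NetEase Sentinel, email, or EasyOps, and the metric names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Channel = Callable[[str], None]  # stand-in for Sentinel, email, EasyOps, ...


@dataclass(frozen=True)
class Rule:
    metric: str        # hypothetical metric name, e.g. "duration_s"
    threshold: float   # alert when the observed value exceeds this


class AlarmMonitor:
    """Checks one job's metrics against its rules and fans alerts out to channels."""

    def __init__(self, job: str, rules: List[Rule], channels: List[Channel]) -> None:
        self.job = job
        self.rules = rules
        self.channels = channels

    def check(self, metrics: Dict[str, float]) -> List[str]:
        alerts = []
        for rule in self.rules:
            value = metrics.get(rule.metric)
            if value is not None and value > rule.threshold:
                message = f"[{self.job}] {rule.metric}={value} exceeds {rule.threshold}"
                alerts.append(message)
                for send in self.channels:
                    send(message)
        return alerts


inbox: List[str] = []
monitor = AlarmMonitor("daily_kpi_report",
                       [Rule("duration_s", 3600), Rule("failed_tasks", 0)],
                       [inbox.append, print])
monitor.check({"duration_s": 4200, "failed_tasks": 0})
# prints: [daily_kpi_report] duration_s=4200 exceeds 3600
```

Keeping channels as plain callables means adding a new alert destination does not touch the rule-checking code, which matches the goal of offering several alerting channels from one SDK.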
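As mentioned earlier, building an open-source ecosystem promotes technical ideas, shapes industry standards, and deepens cooperation with upstream and downstream partners, letting us lead technically while collaborating with others.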
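In the big data field, the industry first built the Apache Hadoop ecosystem around Google's three papers, then built the active Apache Spark ecosystem around Hadoop. Now products at different layers, such as data lakes, are being built around this ecosystem to achieve truly unified batch and stream computing, while cross-pollination with the CNCF's Kubernetes community is deeply integrating big data with cloud computing. On top of this ecosystem, starting from the business needs of NetEase and its partners, we built the Kyuubi ecosystem.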
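Kyuubi is a high-performance, general-purpose JDBC service engine for big data. With a flexible architecture and a unified SQL API, Kyuubi adapts to different compute engines to pursue maximum compute performance, to different resource schedulers to switch freely between coupled and separated storage and compute, to different computing models to enable a unified batch/stream architecture, and to different business scenarios to enable one-stop big data application development. The goal is to let users process big data as easily as ordinary data.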
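First, general-purpose, easy-to-use data access. Building on the standard JDBC interface, Kyuubi provides convenient data access for big data scenarios: end users need no knowledge of the underlying big data platform (compute engines, storage services, metadata management, and so on) and can focus on building their own business systems and extracting value from data.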
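Because Kyuubi speaks the HiveServer2 Thrift protocol, a client usually reaches it through the Hive JDBC driver with a `jdbc:hive2://` URL, and Kyuubi's default frontend port is 10009. The small helper below sketches how such a URL is put together, including optional session variables in the `;key=value` form of the Hive JDBC URL syntax. The host name and the helper itself are hypothetical. The resulting URL can be handed to any JDBC client, for example `beeline -u <url>`.

```python
from typing import Mapping, Optional


def kyuubi_jdbc_url(host: str, port: int = 10009, database: str = "default",
                    session_vars: Optional[Mapping[str, str]] = None) -> str:
    """Build a HiveServer2-style URL: jdbc:hive2://host:port/db;key1=value1;key2=value2"""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    for key, value in (session_vars or {}).items():
        url += f";{key}={value}"
    return url


print(kyuubi_jdbc_url("kyuubi.example.com"))
# jdbc:hive2://kyuubi.example.com:10009/default
print(kyuubi_jdbc_url("kyuubi.example.com", session_vars={"user": "alice"}))
# jdbc:hive2://kyuubi.example.com:10009/default;user=alice
```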
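Second, high-performance querying. Kyuubi relies on compute engines such as Apache Spark and Flink for high-performance querying, so every improvement in the engines can lift Kyuubi's performance substantially. On top of that, Kyuubi adds data caching and dynamic query optimization to improve performance further. On one hand, frequently accessed queries are served from a cache; on the other, query plans are optimized dynamically according to the volume of data a user accesses, keeping performance high while supporting streaming return of very large result sets.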
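The two techniques named here, caching frequently issued queries and returning large results as a stream, can each be shown in isolation. The sketch below is illustrative only and is not Kyuubi's implementation. `QueryCache` is a hypothetical LRU cache keyed by SQL text, which is only safe if the underlying data does not change between calls. `fetch_in_batches` shows how results can be handed out incrementally instead of being materialized in full.

```python
from collections import OrderedDict
from typing import Callable, Iterable, Iterator


class QueryCache:
    """LRU cache of query results keyed by SQL text (assumes the data does not change)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.hits = 0
        self._entries: "OrderedDict[str, list]" = OrderedDict()

    def get_or_run(self, sql: str, run: Callable[[str], list]) -> list:
        key = sql.strip()
        if key in self._entries:
            self._entries.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self._entries[key]
        result = run(key)
        self._entries[key] = result
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # evict the least recently used entry
        return result


def fetch_in_batches(rows: Iterable, batch_size: int) -> Iterator[list]:
    """Yield results batch by batch; if `rows` is lazy, the full result is never in memory."""
    batch: list = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


print(list(fetch_in_batches(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```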
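Third, complete enterprise-grade features. Building on its architecture, Kyuubi provides authentication and authorization to keep data secure; robust high availability to keep the service available; multi-tenant resource isolation, with end-to-end isolation of compute resources and data; and two-level elastic resource management, which raises resource utilization while keeping costs under control, and meets the performance and response-time requirements of interactive, batch, point-lookup, full-table-scan, and other workloads.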
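One reading of "two-level elastic resource management" is this: the first level starts and stops whole compute engines per tenant on demand, and the second level scales resources inside a running engine, which in Spark corresponds to dynamic executor allocation (`spark.dynamicAllocation.enabled`). The Python sketch below models only the first level, and its class and method names are hypothetical rather than Kyuubi's API. Each user gets a dedicated engine (the isolation point above), a repeat request reuses it, and engines idle past a timeout are released. Time is passed in explicitly to keep the example deterministic.

```python
import itertools
from typing import Dict, List, Tuple


class EnginePool:
    """Level one of the model: one engine per user, started on demand, released when idle."""

    def __init__(self, idle_timeout_s: float) -> None:
        self.idle_timeout_s = idle_timeout_s
        self._engines: Dict[str, Tuple[str, float]] = {}  # user -> (engine id, last used)
        self._next_id = itertools.count(1)

    def acquire(self, user: str, now: float) -> str:
        """Reuse the user's engine if one is running, otherwise 'start' a new one."""
        if user in self._engines:
            engine_id, _ = self._engines[user]
        else:
            engine_id = f"engine-{next(self._next_id)}-{user}"  # stands in for launching an engine
        self._engines[user] = (engine_id, now)
        return engine_id

    def reap_idle(self, now: float) -> List[str]:
        """Release engines idle for at least idle_timeout_s; return the affected users."""
        expired = [user for user, (_, last_used) in self._engines.items()
                   if now - last_used >= self.idle_timeout_s]
        for user in expired:
            del self._engines[user]  # stands in for stopping the engine and freeing resources
        return expired


pool = EnginePool(idle_timeout_s=60)
first = pool.acquire("alice", now=0)
print(pool.acquire("alice", now=30) == first)  # True: reused within the timeout
pool.acquire("bob", now=40)
print(pool.reap_idle(now=95))                  # ['alice']
```

Releasing idle engines frees cluster resources between bursts of queries, while the per-user mapping keeps one tenant's workload out of another tenant's engine.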
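Fourth, rich ecosystem support and ecosystem building. A good open-source product depends on a good open-source ecosystem. While embracing top-tier open-source ecosystems such as Spark, Kyuubi on the one hand makes effective use of their openness to extend quickly to their existing ecosystems and to new features and ecosystems, such as cloud-native support and data lake (Data Lake/Lake House) support. On the other hand, Kyuubi actively builds and improves its own ecosystem to fill gaps at each stage. For example, https://github.com/netease/spark-ranger closes the access-control gap in the big data pipeline, and https://github.com/netease/spark-greenplum solves the performance problem of exchanging data between Spark and the traditional database PostgreSQL or the MPP database Greenplum.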
Kyuubi repository: https://github.com/netease/kyuubi
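2020 was an extraordinary year. The threat from nature made us deeply aware of how important open cooperation among all of humanity is.
The essence of an open-source community is its developers. Embracing open source and building open-source ecosystems fit NetEase's mission and vision: "gathering the power of people to create a better life through technological innovation."
Beyond serving the company's own interests as described above, we take part in open source because we love it, and we commit fully to what we love.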
Appendix: Main contributions of NetEase engineers to Apache Spark as of the end of 2020
*ae1d05927a [SPARK-33892][SQL] Display char/varchar in DESC and SHOW CREATE TABLE
*2287f56a3e [SPARK-33879][SQL] Char Varchar values fails w/ match error as partition columns
*a3dd8dacee [SPARK-33877][SQL] SQL reference documents for INSERT w/ a column list
*6da5cdf1db [SPARK-33876][SQL] Add length-check for reading char/varchar from tables w/ a external location
*f5fd10b1bc [SPARK-33834][SQL] Verify ALTER TABLE CHANGE COLUMN with Char and Varchar
*dd44ba5460 [SPARK-32976][SQL][FOLLOWUP] SET and RESTORE hive.exec.dynamic.partition.mode for HiveSQLInsertTestSuite to avoid flakiness
*c17c76dd16 [SPARK-33599][SQL][FOLLOWUP] FIX Github Action with unidoc
*728a1298af [SPARK-33806][SQL] limit partition num to 1 when distributing by foldable expressions
*205d8e40bc [SPARK-32991][SQL] [FOLLOWUP] Reset command relies on session initials first
*4d47ac4b4b [SPARK-33705][SQL][TEST] Fix HiveThriftHttpServerSuite flakiness
*31e0baca30 [SPARK-33740][SQL] hadoop configs in hive-site.xml can overrides pre-existing hadoop ones
*c88eddac3b [SPARK-33641][SQL][DOC][FOLLOW-UP] Add migration guide for CHAR VARCHAR types
*da72b87374 [SPARK-33641][SQL] Invalidate new char/varchar types in public APIs that produce incorrect results
*2da72593c1 [SPARK-32976][SQL] Support column list in INSERT statement
*cdd8e51742 [SPARK-33419][SQL] Unexpected behavior when using SET commands before a query in SparkSession.sql
*4335af075a [MINOR][DOC] spark.executor.memoryOverhead is not cluster-mode only
*036c11b0d4 [SPARK-33397][YARN][DOC] Fix generating md to html for available-patterns-for-shs-custom-executor-log-url
*82d500a05c [SPARK-33193][SQL][TEST] Hive ThriftServer JDBC Database MetaData API Behavior Auditing
*e21bb710e5 [SPARK-32991][SQL] Use conf in shared state as the original configuraion for RESET
*dcb0820433 [SPARK-32785][SQL][DOCS][FOLLOWUP] Update migaration guide for incomplete interval literals
*2507301705 [SPARK-33159][SQL] Use hive-service-rpc as dependency instead of inlining the generated code
*17d309dfac [SPARK-32963][SQL] empty string should be consistent for schema name in SparkGetSchemasOperation
*e2a740147c [SPARK-32874][SQL][FOLLOWUP][TEST-HIVE1.2][TEST-HADOOP2.7] Fix spark-master-test-sbt-hadoop-2.7-hive-1.2
*9e9d4b6994 [SPARK-32905][CORE][YARN] ApplicationMaster fails to receive UpdateDelegationTokens message
*316242b768 [SPARK-32874][SQL][TEST] Enhance result set meta data check for execute statement operation with thrift server
*5669b212ec [SPARK-32840][SQL] Invalid interval value can happen to be just adhesive with the unit
*9ab8a2c36d [SPARK-32826][SQL] Set the right column size for the null type in SparkGetColumnsOperation
*de44e9cfa0 [SPARK-32785][SQL] Interval with dangling parts should not results null
*1fba286407 [SPARK-32781][SQL] Non-ASCII characters are mistakenly omitted in the middle of intervals
*6dacba7fa0 [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperation
*0626901bcb [SPARK-32729][SQL][DOCS] Add missing since version for math functions
*f14f3742e0 [SPARK-32696][SQL][TEST-HIVE1.2][TEST-HADOOP2.7] Get columns operation should handle interval column properly
*1f3bb51757 [SPARK-32683][DOCS][SQL] Fix doc error and add migration guide for datetime pattern F
*c26a97637f Revert "[SPARK-32412][SQL] Unify error handling for spark thrift serv…
*1b6f482adb [SPARK-32492][SQL][FOLLOWUP][TEST-MAVEN] Fix jenkins maven jobs
*7f5326c082 [SPARK-32492][SQL] Fulfill missing column meta information COLUMN_SIZE /DECIMAL_DIGITS/NUM_PREC_RADIX/ORDINAL_POSITION for thriftserver client tools
* 3deb59d5c2 [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path
* f4800406a4 [SPARK-32406][SQL][FOLLOWUP] Make RESET fail against static and core configs
* 510a1656e6 [SPARK-32412][SQL] Unify error handling for spark thrift server operations
* d315ebf3a7 [SPARK-32424][SQL] Fix silent data change for timestamp parsing if overflow happens
* d3596c04b0 [SPARK-32406][SQL] Make RESET syntax support single configuration reset
* b151194299 [SPARK-32392][SQL] Reduce duplicate error log for executing sql statement operation in thrift server
* 29b7eaa438 [MINOR][SQL] Fix warning message for ThriftCLIService.GetCrossReference and GetPrimaryKeys
* efa70b8755 [SPARK-32145][SQL][FOLLOWUP] Fix type in the error log of SparkOperation
* bdeb626c5a [SPARK-32272][SQL] Add SQL standard command SET TIME ZONE
* 4609f1fdab [SPARK-32207][SQL] Support 'F'-suffixed Float Literals
* 59a70879c0 [SPARK-32145][SQL][TEST-HIVE1.2][TEST-HADOOP2.7] ThriftCLIService.GetOperationStatus should include exception's stack trace to the error message
* 9f8e15bb2e [SPARK-32034][SQL] Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly upon shutdown
* 93529a8536 [SPARK-31957][SQL] Cleanup hive scratch dir for the developer api startWithContext
* abc8ccc37b [SPARK-31926][SQL][TESTS][FOLLOWUP][TEST-HIVE1.2][TEST-MAVEN] Fix concurrency issue for ThriftCLIService to getPortNumber
* a0187cd6b5 [SPARK-31926][SQL][TEST-HIVE1.2][TEST-MAVEN] Fix concurrency issue for ThriftCLIService to getPortNumber
* 22dda6e18e [SPARK-31939][SQL][TEST-JAVA11] Fix Parsing day of year when year field pattern is missing
* 6a424b93e5 [SPARK-31830][SQL] Consistent error handling for datetime formatting and parsing functions
* 02f32cfae4 [SPARK-31926][SQL][TEST-HIVE1.2] Fix concurrency issue for ThriftCLIService to getPortNumber
* fc6af9d900 [SPARK-31867][SQL][FOLLOWUP] Check result differences for datetime formatting
* 9d5b5d0a58 [SPARK-31879][SQL][TEST-JAVA11] Make week-based pattern invalid for formatting too
* afcc14c6d2 [SPARK-31896][SQL] Handle am-pm timestamp parsing when hour is missing
* afe95bd9ad [SPARK-31892][SQL] Disable week-based date filed for parsing
* c59f51bcc2 [SPARK-31879][SQL] Using GB as default Locale for datetime formatters
* 547c5bf552 [SPARK-31867][SQL] Disable year type datetime patterns which are longer than 10
* fe1da296da [SPARK-31833][SQL][TEST-HIVE1.2] Set HiveThriftServer2 with actual port while configured 0
* 311fe6a880 [SPARK-31835][SQL][TESTS] Add zoneId to codegen related tests in DateExpressionsSuite
* 695cb617d4 [SPARK-31771][SQL] Disable Narrow TextStyle for datetime pattern 'G/M/L/E/u/Q/q'
* 0df8dd6073 [SPARK-30352][SQL] DataSourceV2: Add CURRENT_CATALOG function
*7e2ed40d58 [SPARK-31759][DEPLOY] Support configurable max number of rotate logs for spark daemons
*1f29f1ba58 [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same as the table
*1d66085a93 [SPARK-31289][TEST][TEST-HIVE1.2] Eliminate org.apache.spark.sql.hive.thriftserver.CliSuite flakiness
*503faa24d3 [SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard
*ce714d8189 [SPARK-31678][SQL] Print error stack trace for Spark SQL CLI when error occurs
*b31ae7bb0b [SPARK-31615][SQL] Pretty string output for sql method of RuntimeReplaceable expressions
*bd6b53cc0b [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry
*9241f8282f [SPARK-31586][SQL][FOLLOWUP] Restore SQL string for datetime - interval operations
*ea525fe8c0 [SPARK-31597][SQL] extracting day from intervals should be interval.days + days in interval.microsecond
*295d866969 [SPARK-31596][SQL][DOCS] Generate SQL Configurations from hive module to configuration doc
*54996be4d2 [SPARK-31527][SQL][TESTS][FOLLOWUP] Add a benchmark test for datetime add/subtract interval operations
*beec8d535f [SPARK-31586][SQL] Replace expression TimeSub(l, r) with TimeAdd(l -r)
*5ba467ca1d [SPARK-31550][SQL][DOCS] Set nondeterministic configurations with general meanings in sql configuration doc
*ebc8fa50d0 [SPARK-31527][SQL] date add/subtract interval only allow those day precision in ansi mode
*7959808e96 [SPARK-31564][TESTS] Fix flaky AllExecutionsPageSuite for checking 1970
*f92652d0b5 [SPARK-31528][SQL] Remove millennium, century, decade from trunc/date_trunc fucntions
* caf3ab8411 [SPARK-31552][SQL] Fix ClassCastException in ScalaReflection arrayClassFor
* 8424f55229 [SPARK-31532][SQL] Builder should not propagate static sql configs to the existing active or default SparkSession
* 8dc2c0247b [SPARK-31522][SQL] Hive metastore client initialization related configurations should be static
* 3b5792114a [SPARK-31474][SQL][FOLLOWUP] Replace _FUNC_ placeholder with functionname in the note field of expression info
* 37d2e037ed [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
* 2c2062ea7c [SPARK-31498][SQL][DOCS] Dump public static sql configurations through doc generation
* 1985437110 [SPARK-31474][SQL] Consistency between dayofweek/dow in extract exprsession and dayofweek function
* 77cb7cde0d [SPARK-31469][SQL][TESTS][FOLLOWUP] Remove unsupported fields from ExtractBenchmark
* 697083c051 [SPARK-31469][SQL] Make extract interval field ANSI compliance
* 31b907748d [SPARK-31414][SQL][DOCS][FOLLOWUP] Update default datetime pattern for json/csv APIs documentations
* d65f534c5a [SPARK-31414][SQL] Fix performance regression with new TimestampFormatter for json and csv time parsing
* a454510917 [SPARK-31392][SQL] Support CalendarInterval to be reflect to CalendarntervalType
* 3c94a7c8f5 [SPARK-29311][SQL][FOLLOWUP] Add migration guide for extracting second from datetimes
* 1ce584f6b7 [SPARK-31321][SQL] Remove SaveMode check in v2 FileWriteBuilder
* f376d24ea1 [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery
* 5945d46c11 [SPARK-31225][SQL] Override sql method of OuterReference
* 8be16907c2 [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir
* 44bd36ad7b [SPARK-31234][SQL] ResetCommand should reset config to sc.conf only
* b024a8a69e [MINOR][DOCS] Fix some links for python api doc
* 336621e277 [SPARK-31258][BUILD] Pin the avro version in SBT
* f81f11822c [SPARK-31189][R][DOCS][FOLLOWUP] Replace Datetime pattern links in R doc
* 88ae6c4481 [SPARK-31189][SQL][DOCS] Fix errors and missing parts for datetime pattern document
* 3d695954e5 [SPARK-31150][SQL][FOLLOWUP] handle ' as escape for text
* 57fcc49306 [SPARK-31176][SQL] Remove support for 'e'/'c' as datetime pattern charactar
* f1d27cdd91 [SPARK-31119][SQL] Add interval value support for extract expression as extract source
* 5bc0d76591 [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir
* 0946a9514f [SPARK-31150][SQL] Parsing seconds fraction with variable length for timestamp
* fbc9dc7e9d [SPARK-31129][SQL][TESTS] Fix IntervalBenchmark and DateTimeBenchmark
* 7b4b29e8d9 [SPARK-31131][SQL] Remove the unnecessary config spark.sql.legacy.timeParser.enabled
* 18f2730874 [SPARK-31066][SQL][TEST-HIVE1.2] Disable useless and uncleaned hive SessionState initialization parts
* 2b46662bd0 [SPARK-31111][SQL][TESTS] Fix interval output issue in ExtractBenchmark
* 3bd6ebff81 [SPARK-30189][SQL] Interval from year-month/date-time string should handle whitespaces
* f45ae7f2c5 [SPARK-31038][SQL] Add checkValue for spark.sql.session.timeZone
* 3edab6cc1d [MINOR][CORE] Expose the alias -c flag of --conf for spark-submit
* 1fac06c430 Revert "[SPARK-30808][SQL] Enable Java 8 time API in Thrift server"
* 1383bd459a [SPARK-30970][K8S][CORE] Fix NPE while resolving k8s master url
* 2d2706cb86 [SPARK-30956][SQL][TESTS] Use intercept instead of try-catch to assert failures in IntervalUtilsSuite
* a6026c830a [MINOR][BUILD] Fix make-distribution.sh to show usage without 'echo' cmd
* 761209c1f2 [SPARK-30919][SQL] Make interval multiply and divide's overflow behavior consistent with other operations
* 46019b6e6c [MINOR][DOCS] Fix fabric8 version in documentation
* 0353cbf092 [MINOR][DOC] Fix 2 style issues in running-on-kubernetes doc
* 58b9ca1e6f [SPARK-30592][SQL][FOLLOWUP] Add some round-trip test cases
* 3228d723a4 [SPARK-30603][SQL] Move RESERVED_PROPERTIES from SupportsNamespaces and TableCatalog to CatalogV2Util
*8e280cebf2 [SPARK-30592][SQL] Interval support for csv and json funtions
*f2d71f5838 [SPARK-30591][SQL] Remove the nonstandard SET OWNER syntax for namespaces
*af705421db [SPARK-30593][SQL] Revert interval ISO/ANSI SQL Standard output since we decide not to follow ANSI and no round trip
*730388b369 [SPARK-30547][SQL][FOLLOWUP] Update since anotation for CalendarInterval class
*0388b7a3ec [SPARK-30568][SQL] Invalidate interval type as a field table schema
*24efa43826 [SPARK-30019][SQL] Add the owner property to v2 table
*4806cc5bd1 [SPARK-30547][SQL] Add unstable annotation to the CalendarInterval class
*17857f9b8b [SPARK-30551][SQL] Disable comparison for interval type
*82f25f5855 [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties
*bcf07cbf5f [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
*c37312342e [SPARK-30183][SQL] Disallow to specify reserved properties in CREATE/ALTER NAMESPACE syntax
*8c121b0827 [SPARK-30431][SQL] Update SqlBase.g4 to create commentSpec pattern like locationSpec
*c49388a484 [SPARK-30214][SQL] A new framework to resolve v2 commands
*e04309cb1f [SPARK-30341][SQL] Overflow check for interval arithmetic operations
*f0bf2eb006 [SPARK-30356][SQL] Codegen support for the function str_to_map
*da65a955ed [SPARK-30266][SQL] Avoid match error and int overflow in ApproximatePercentile and Percentile
*12249fcdc7 [SPARK-30301][SQL] Fix wrong results when datetimes as fields of complex types
*d38f816748 [MINOR][SQL][DOC] Fix some format issues in Dataset API Doc
*cc7f1eb874 [SPARK-29774][SQL][FOLLOWUP] Add a migration guide for date_add and date_sub
*bf7215c510 [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache
*d3ec8b1735 [SPARK-30066][SQL] Support columnar execution on interval types
*8f0eb7dc86 [SPARK-29587][SQL] Support SQL Standard type real as float(4) numeric as decimal
*24c4ce1e64 [SPARK-28351][SQL][FOLLOWUP] Remove 'DELETE FROM' from unsupportedHiveNativeCommands
*e88d74052b [SPARK-30147][SQL] Trim the string when cast string type to booleans
*35bab33984 [SPARK-30121][BUILD] Fix memory usage in sbt build script
*b9cae37750 [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
*332e252a14 [SPARK-29425][SQL] The ownership of a database should be respected
*65552a81d1 [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking
*39291cff95 [SPARK-30048][SQL] Enable aggregates with interval type values for RelationalGroupedDataset
*4e073f3c50 [SPARK-30047][SQL] Support interval types in UnsafeRow
*4fd585d2c5 [SPARK-30008][SQL] The dataType of collect_list/collect_set aggs should be ArrayType(_, false)
* ed0c33fdd4 [SPARK-30026][SQL] Whitespaces can be identified as delimiters in interval string
* 8b0121bea8 [MINOR][DOC] Fix the CalendarIntervalType description
* de21f28f8a [SPARK-29986][SQL] casting string to date/timestamp/interval should trim all whitespaces
* 5cf475d288 [SPARK-30000][SQL] Trim the string when cast string type to decimals
* 2dd6807e42 [SPARK-28023][SQL] Add trim logic in UTF8String's toInt/toLong to make it consistent with other string-numeric casting
* d555f8fcc9 [SPARK-29961][SQL][FOLLOWUP] Remove useless test for VectorUDT
* 7a70670345 [SPARK-29961][SQL] Implement builtin function - typeof
* 79ed4ae2db [SPARK-29926][SQL] Fix weird interval string whose value is only a dangling decimal point
* ea010a2bc2 [SPARK-29873][SQL][TEST][FOLLOWUP] set operations should not escape when regen golden file with --SET --import both specified
* ae6b711b26 [SPARK-29941][SQL] Add ansi type aliases for char and decimal
* 50f6d930da [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval
* 5cebe587c7 [SPARK-29783][SQL] Support SQL Standard/ISO_8601 output style for interval type
*0c68578fa9 [SPARK-29888][SQL] new interval string parser shall handle numeric with only fractional part
*15a72f3755 [SPARK-29287][CORE] Add LaunchedExecutor message to tell driver which executor is ready for making offers
*f926809a1f [SPARK-29390][SQL] Add the justify_days(), justify_hours() and justif_interval() functions
* d99398e9f5 [SPARK-29855][SQL] typed literals with negative sign with proper result or exception
* d06a9cc4bd [SPARK-29822][SQL] Fix cast error when there are white spaces between signs and values
* e026412d9c [SPARK-29679][SQL] Make interval type comparable and orderable
* e7f7990bc3 [SPARK-29688][SQL] Support average for interval type values
* 0a03839366 [SPARK-29787][SQL] Move methods add/subtract/negate from CalendarInterval to IntervalUtils
* 9562b26914 [SPARK-29757][SQL] Move calendar interval constants together
* 3437862975 [SPARK-29387][SQL][FOLLOWUP] Fix issues of the multiply and divide for intervals
* 4615769736 [SPARK-29603][YARN] Support application priority for YARN priority scheduling
* 44b8fbcc58 [SPARK-29663][SQL] Support sum with interval type values
* 8cf76f8d61 [SPARK-29285][SHUFFLE] Temporary shuffle files should be able to handle disk failures
* 5ba17d09ac [SPARK-29722][SQL] Non reversed keywords should be able to be used in high order functions
* dc987f0c8b [SPARK-29653][SQL] Fix MICROS_PER_MONTH in IntervalUtils
* 8e667db5d8 [SPARK-29629][SQL] Support typed integer literal expression
* 9a46702791 [SPARK-29554][SQL] Add `version` SQL function
* 0cf4f07c66 [SPARK-29545][SQL] Add support for bit_xor aggregate function
*5b4d9170ed [SPARK-27879][SQL] Add support for bit_and and bit_or aggregates
*ef4c298cc9 [SPARK-29405][SQL] Alter table / Insert statements should not change a table's ownership
*4b902d3b45 [SPARK-29491][SQL] Add bit_count function support
* 6d4cc7b855 [SPARK-27880][SQL] Add bool_and for every and bool_or for any as function aliases
* 02c5b4f763 [SPARK-28947][K8S] Status logging not happens at an interval for liveness
* f4c73b7c68 [SPARK-27301][DSTREAM] Shorten the FileSystem cached life cycle to the cleanup method inner scope
* ac9c0536bc [SPARK-26794][SQL] SparkSession enableHiveSupport does not point to hive but in-memory while the SparkContext exists
* f8346d2fc0 [SPARK-25174][YARN] Limit the size of diagnostic message for am to unregister itself from rm
* 4a2b15f0af [SPARK-24241][SUBMIT] Do not fail fast when dynamic resource allocation enabled with 0 executor
* a7755fd8ce [SPARK-23639][SQL] Obtain token before init metastore client in SparkSQL CLI
* 189f56f3dc [SPARK-23383][BUILD][MINOR] Make a distribution should exit with usage while detecting wrong options
* eefec93d19 [SPARK-23295][BUILD][MINOR] Exclude Waring message when generating versions in make-distribution.sh
* dd52681bf5 [SPARK-23253][CORE][SHUFFLE] Only write shuffle temporary index file when there is not an existing one
* 793841c6b8 [SPARK-21771][SQL] remove useless hive client in SparkSQLEnv
* 9fa703e893 [SPARK-22950][SQL] Handle ChildFirstURLClassLoader's parent
* 28ab5bf597 [SPARK-22487][SQL][HIVE] Remove the unused HIVE_EXECUTION_VERSION property
* c755b0d910 [SPARK-22463][YARN][SQL][HIVE] add hadoop/hive/hbase/etc configuration files in SPARK_CONF_DIR to distribute archive
* ee571d79e5 [SPARK-22466][SPARK SUBMIT] export SPARK_CONF_DIR while conf is default
* 99e32f8ba5 [SPARK-22224][SQL] Override toString of KeyValue/Relational-GroupedDataset
* 581200af71 [SPARK-21428][SQL][FOLLOWUP] CliSessionState should point to the actual metastore not a dummy one
* b83b502c41 [SPARK-21428] Turn IsolatedClientLoader off while using builtin Hive jars for reusing CliSessionState
* 2387f1e316 [SPARK-21675][WEBUI] Add a navigation bar at the bottom of the Details for Stage Page
* e9d268f63e [SPARK-20096][SPARK SUBMIT][MINOR] Expose the right queue name not null if set by --conf or configure file
* 7363dde634 [SPARK-19626][YARN] Using the correct config to set credentials update time
* e33053ee00 [SPARK-11583] [CORE] MapStatus Using RoaringBitmap More Properly
* 7466031632 [SPARK-32106][SQL] Implement script transform in sql/core
* 0603913c66 [SPARK-33593][SQL] Vector reader got incorrect data with binary partition value
* 25c6cc25f7 [SPARK-26341][WEBUI] Expose executor memory metrics at the stage level, in the Stages tab
* 5f9a7fea06 [SPARK-33428][SQL] Conv UDF use BigInt to avoid Long value overflow
* d7f4b2ad50 [SPARK-28704][SQL][TEST] Add back Skiped HiveExternalCatalogVersionsSuite in HiveSparkSubmitSuite at JDK9+
* 47326ac1c6 [SPARK-28704][SQL][TEST] Add back Skiped HiveExternalCatalogVersionsSuite in HiveSparkSubmitSuite at JDK9+
* dd32f45d20 [SPARK-31069][CORE] Avoid repeat compute `chunksBeingTransferred` cause hight cpu cost in external shuffle service when `maxChunksBeingTransferred` use default value
* 34f5e7ce77 [SPARK-33302][SQL] Push down filters through Expand
* 0c943cd2fb [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size
* e43cd8ccef [SPARK-32388][SQL] TRANSFORM with schema-less mode should keep the same with hive
* a1629b4a57 [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location
* f8277d3aa3 [SPARK-32069][CORE][SQL] Improve error message on reading unexpected directory
* ddc7012b3d [SPARK-32243][SQL] HiveSessionCatalog call super.makeFunctionExpression should throw earlier when got Spark UDAF Invalid arguments number error
* 0b5a379c1f [SPARK-33023][CORE] Judge path of Windows need add condition `Utils.isWindows`
* c336ddfdb8 [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
* 5e6173ebef [SPARK-31670][SQL] Trim unnecessary Struct field alias in Aggregate/GroupingSets
* 55ce49ed28 [SPARK-32400][SQL][TEST][FOLLOWUP][TEST-MAVEN] Fix resource loading error in HiveScripTransformationSuite
* 9808c15eec [SPARK-32608][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value
* c75a82794f [SPARK-32667][SQL] Script transform 'default-serde' mode should pad null value to filling column
* 6dae11d034 [SPARK-32607][SQL] Script Transformation ROW FORMAT DELIMITED `TOK_TABLEROWFORMATLINES` only support '\n'
*03e2de99ab [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value
*643cd876e4 [SPARK-32352][SQL] Partially push down support data filter if it mixed in partition filters
*4cf8c1d07d [SPARK-32400][SQL] Improve test coverage of HiveScriptTransformationExec
*d251443a02 [SPARK-32403][SQL] Refactor current ScriptTransformationExec
*5521afbd22 [SPARK-32220][SQL][FOLLOW-UP] SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result
*6d499647b3 [SPARK-32105][SQL] Refactor current ScriptTransformationExec code
*09789ff725 [SPARK-31226][CORE][TESTS] SizeBasedCoalesce logic will lose partition
*560fe1f54c [SPARK-32220][SQL] SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result
*15fb5d7677 [SPARK-28169][SQL] Convert scan predicate condition to CNF
*0d9faf602e [SPARK-31655][BUILD] Upgrade snappy-java to 1.1.7.5
*6bc8d84130 [SPARK-29492][SQL] Reset HiveSession's SessionState conf's ClassLoader when sync mode
*246c398d59 [SPARK-30435][DOC] Update doc of Supported Hive Features
*3eade744f8 [SPARK-29800][SQL] Rewrite non-correlated EXISTS subquery use ScalaSubquery to optimize perf
*da27f91560 [SPARK-29957][TEST] Reset MiniKDC's default enctypes to fit jdk8/jdk11
*6146dc4562 [SPARK-29874][SQL] Optimize Dataset.isEmpty()
*eb79af8dae [SPARK-29145][SQL][FOLLOW-UP] Move tests from`SubquerySuite`to`subquery/in-subquery/in-joins.sql`
*e524a3a223 [SPARK-29742][BUILD] Update checkstyle plugin's check dir scope
*d6e33dc377 [SPARK-29599][WEBUI] Support pagination for session table in JDBC/ODBC Tab
*67cf0433ee [SPARK-29145][SQL] Support sub-queries in join conditions
*484f93e255 [SPARK-29530][SQL] Make SQLConf in SQL parse process thread safe
*9a3dccae72 [SPARK-29379][SQL] SHOW FUNCTIONS show '!=', '<>' , 'between', 'case'
*ef81525a1a [SPARK-29308][BUILD] Update deps in dev/deps/spark-deps-hadoop-3.2 for hadoop-3.2
*178a1f3558 [SPARK-29305][BUILD] Update LICENSE and NOTICE for Hadoop 3.2
*0cf2f48dfe [SPARK-29022][SQL] Fix SparkSQLCLI can not add jars by AddJarCommand
*1d4b2f010b [SPARK-29247][SQL] Redact sensitive information in when construct HiveClientHive.state
*cc852d4eec [SPARK-29015][SQL][TEST-HADOOP3.2] Reset class loader after initializing SessionState for built-in Hive 2.3
*d22768a6be [SPARK-29036][SQL] SparkThriftServer cancel job after execute() thread interrupted
*fe4bee8fd8 [SPARK-29162][SQL] Simplify NOT(IsNull(x)) and NOT(IsNotNull(x))
*54d3f6e7ec [SPARK-28982][SQL] Implementation Spark's own GetTypeInfoOperation
*9f478a6832 [SPARK-28901][SQL] SparkThriftServer's Cancel SQL Operation show it in JDBC Tab UI
*036fd3903f [SPARK-27637][SHUFFLE][FOLLOW-UP] For nettyBlockTransferService, if IOException occurred while create client, check whether relative executor is alive before retry #24533
*e853f068f6 [SPARK-33526][SQL][FOLLOWUP] Fix flaky test due to timeout and fix docs
*1dd63dccd8 [SPARK-33860][SQL] Make CatalystTypeConverters.convertToCatalyst match special Array value
*bc46d273e0 [SPARK-33840][DOCS] Add spark.sql.files.minPartitionNum to performence tuning doc
*839d6899ad [SPARK-33733][SQL] PullOutNondeterministic should check and collect deterministic field
*5bab27e00b [SPARK-33526][SQL] Add config to control if cancel invoke interrupt task on thriftserver
Author: NetEase Yishu Spark development team