Leveraging Alluxio with Spark SQL to Speed Up Ad-hoc Analysis

Background

At present, hundreds of terabytes of data are processed in the Momo big data cluster every day. However, most of this data is read from and written to disk repeatedly, which is inefficient. To speed up data processing and provide a better user experience, we investigated several options and found that Alluxio may fit our needs. Alluxio provides a unified, memory-speed distributed storage layer for various jobs, and I/O in memory is far faster than on hard disk. Hot data in Alluxio can be served at memory speed, just like a memory cache, so the more frequently data is read through Alluxio, the greater the benefit. To better understand the value Alluxio brings to our ad-hoc service, which uses Spark SQL as the execution engine, we designed a series of experiments with Alluxio and Spark SQL.

Experiment Design

We made a few design decisions aimed at taking full advantage of Alluxio:

  • Firstly, we use a decoupled compute and storage architecture, because a mixed deployment would leave a heavy I/O burden on Alluxio; therefore no DataNode is deployed alongside the Alluxio workers. The Alluxio cluster is decoupled from HDFS storage, so it reads data from remote HDFS nodes on the first execution.
  • Secondly, to mimic the online environment, we use the YARN node label feature to carve an Alluxio cluster out of the production cluster. The Alluxio cluster therefore shares the same NameNode and ResourceManager as the production cluster and may be affected by production load.
  • Thirdly, only one copy of the data is stored in Alluxio, so it cannot guarantee high availability. What's more, persisting data to a second storage tier such as HDFS is inefficient and wastes space. Considering stability and efficiency, we chose to use Alluxio as a read-only cache in our experiment.
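As a sketch of the third point, Alluxio's default client read and write types can be tuned so that data is cached on read but never kept in Alluxio on write; treating Alluxio as a read-only cache this way is our usage pattern, not an official recipe, and the exact values below are illustrative:

```properties
# alluxio-site.properties (illustrative sketch)
# Cache a block in Alluxio memory whenever it is read through Alluxio
alluxio.user.file.readtype.default=CACHE
# If anything is ever written through Alluxio, pass it straight to the
# under store (HDFS) without keeping a copy in Alluxio memory
alluxio.user.file.writetype.default=THROUGH
```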

The figure below shows the deployment of Alluxio cluster with production cluster.

Figure 1. Alluxio with Spark SQL Architecture

The experiment environment of the Alluxio cluster is the same as production, except that there is no DataNode process, so the first run incurs a data transfer cost. Besides, we use Spark Thrift Server to provide the ad-hoc analysis service, so all SQL tests are submitted through Spark Thrift Server.

Environment Preparation

Basically, we followed the official instructions of Running Apache Hive with Alluxio and Running Spark on Alluxio to deploy Alluxio with Spark SQL.
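Following those guides, the key step is putting the Alluxio client jar on the Spark driver and executor classpaths via spark-defaults.conf; the jar path below is an assumption for our install layout and should be adjusted to yours:

```properties
# spark-defaults.conf (sketch; adjust the jar path to your installation)
spark.driver.extraClassPath   /opt/alluxio-1.6.1/client/spark/alluxio-1.6.1-spark-client.jar
spark.executor.extraClassPath /opt/alluxio-1.6.1/client/spark/alluxio-1.6.1-spark-client.jar
```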

Here is the software and hardware environment on each node.

Name      Configuration
Software  Spark 2.2.1, Hadoop 2.6.0 (HDFS 2.6.0, YARN 2.8.3), Alluxio 1.6.1, Hive 0.14.0
Hardware  32 cores, 192 GB memory, 12 × 5.4 TB HDD

And the configuration of Spark Thrift Server is:

/opt/spark-2.2.1-bin-2.6.0/sbin/start-thriftserver.sh \
  --master yarn-client \
  --name adhoc_STS_Alluxio_test \
  --driver-memory 15g \
  --num-executors 132 \
  --executor-memory 10g \
  --executor-cores 3 \
  --conf spark.yarn.driver.memoryOverhead=768m \
  --conf spark.yarn.executor.memoryOverhead=1024m \
  --conf spark.sql.adaptive.enabled=true
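With the Thrift Server running, ad-hoc SQL can then be submitted over JDBC, for example with beeline; the host name and table below are placeholders, and 10000 is the default HiveServer2 port:

```shell
beeline -u jdbc:hive2://thrift-server-host:10000 \
        -e "SELECT count(*) FROM some_table"
```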

Performance Test

1) Test background

In the production environment, we provide the ad-hoc service with Spark SQL, which offers better performance and convenience than MR and Tez.

2) Test case
a) Small data test

First we ran an approximately 5-minute job that reads no more than 10 GB of data. However, its average running time was close to that in the production environment, showing no obvious improvement. Searching for the reason, we found a post on the Alluxio user list explaining that a test job must be I/O bound to benefit: with small input, the OS buffer cache already holds the data, and Spark also temporarily caches the input data, so Spark on Alluxio showed no difference from Spark alone for this job. The data size was simply too small to exercise Alluxio's caching ability.

Figure 2. Detailed Information of Test Job

The content marked by the red line shows that the job has only about 5 GB of input data, which is small enough for the OS buffer cache to hold it all. We therefore designed another test.

b) Large data test

To better evaluate the performance of Alluxio, we picked four different SQL queries from our online workload, with input data sizes ranging from 300 GB to 5.5 TB. Besides, we designed four test groups: Alluxio, Spark, Yarn and Alluxio on Disk.

SQL NO.  Input data size
SQL1     300 GB
SQL2     1 TB
SQL3     1.5 TB
SQL4     5.5 TB

We executed each SQL four times and averaged the last three runs, eliminating the cache-warming deviation of the first run.
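The averaging step above can be sketched as follows; the run times in the example are hypothetical, not our measured numbers:

```python
def avg_excluding_first(run_times):
    """Average all runs except the first, which warms the Alluxio cache."""
    assert len(run_times) >= 2, "need at least one warm-up run and one measured run"
    measured = run_times[1:]
    return sum(measured) / len(measured)

# Hypothetical wall-clock times (seconds) for one SQL's four runs:
print(avg_excluding_first([420.0, 150.0, 145.0, 155.0]))  # → 150.0
```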


Here is the explanation of the test groups.

Test Group       Comment
Alluxio          Spark on Alluxio, on the Alluxio cluster
Spark            Spark without Alluxio, on the Alluxio cluster
Yarn             Spark without Alluxio, on the production cluster
Alluxio on Disk  Spark on Alluxio with only one HDD tier, on the Alluxio cluster

Figure 3. Test Result

We plotted the following chart to better understand figure 3.


Figure 4. Comparison of Test Result

Conclusion

From the experiment we can draw some interesting conclusions.

  1. The large and small data tests show that Alluxio is not suitable for small-data jobs on machines with large memory; it should be used in large-data scenarios.
  2. In figure 4, the longest time is usually the first run. In general, when fetching data for the first time, reading through Alluxio may be slower than reading directly from HDFS, because the data goes from HDFS into Alluxio first and only then into the Spark process. In practice this depends on the SQL's characteristics: in SQL 2, SQL 3 and SQL 4, the Alluxio group performs better than the Spark group.
  3. Reading data from cache is generally faster than from disk. However, comparing the Alluxio and Alluxio on Disk groups, reading from cache is about as fast as reading from disk in SQL 2 and SQL 3, so the performance improvement depends on the SQL workload.
  4. Generally speaking, Spark on Alluxio achieves a good acceleration: 3x–5x over Spark in the production environment and 1.5x–3x over Spark without Alluxio.

All in all, Alluxio has an obvious effect on our ad-hoc analysis service, and SQLs with different characteristics see different degrees of acceleration. The reason our tests do not reach the up-to-10x improvement the official website claims may be that our test SQLs are selected from online jobs, which are more complicated and carry plenty of compute as well as I/O cost.

What We Have Done

  1. Memory in the Alluxio cluster is limited; if every table in use were loaded into Alluxio, the first (memory) tier would fill up quickly and spill the excess data to the second tier, severely hurting Alluxio's performance. To prevent this, we developed a whitelist feature that decides which tables are loaded into Alluxio for caching.
  2. Since Alluxio keeps only one copy of the data in memory and does not guarantee high availability, it serves as a read-only cache in our scenario. We developed a feature that reads from Alluxio but writes directly to HDFS: if a query touches a table in the whitelist, the scheme of the table's path is rewritten to the Alluxio scheme, so applications read data from Alluxio and write results to HDFS.
  3. The official ad-hoc use case for Alluxio pairs it with Presto or Hive as the query engine, and that approach is widely used. In this article, we showed that Spark SQL can also serve ad-hoc queries on Alluxio. Compared to Presto, Spark has better fault tolerance while still keeping good performance, so Alluxio with Spark SQL is another good technical option for an ad-hoc service.
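A minimal sketch of the path-rewriting idea behind the whitelist is shown below; the table names, the Alluxio master address, and the function name are illustrative assumptions, not our production code (19998 is Alluxio's default master RPC port):

```python
ALLUXIO_PREFIX = "alluxio://master:19998"  # assumed Alluxio master address
WHITELIST = {"dw.user_profile", "dw.click_log"}  # hypothetical cached tables

def rewrite_location(table_name, hdfs_path):
    """Point reads at Alluxio only for whitelisted tables; all other paths
    (and all writes) keep their original HDFS locations."""
    if table_name in WHITELIST and hdfs_path.startswith("hdfs://"):
        # Drop the hdfs://namenode:port authority and keep the file path
        path = "/" + hdfs_path.split("/", 3)[3]
        return ALLUXIO_PREFIX + path
    return hdfs_path

print(rewrite_location("dw.click_log", "hdfs://nn1:8020/warehouse/dw/click_log"))
# → alluxio://master:19998/warehouse/dw/click_log
```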

Future Work

As Alluxio does speed up our service remarkably, we would like to bring more frameworks, such as Hive and Spark MLlib, onto Alluxio and make it the uniform data ingestion interface for the computing layer. Besides, more effort on security, stability and job monitoring is on the way.

References

www.alluxio.org
Alluxio users
How Alluxio is Accelerating Apache Spark Workloads

About MOMO

MOMO Inc (Nasdaq: MOMO) is a leading mobile pan-entertainment social platform in China, with approximately 100 million monthly active users and over 300 million users worldwide. Currently the MOMO big data platform has over 25,000 cores and 80 TB of memory, supporting approximately 20,000 jobs every day. The MOMO data infrastructure team provides stable and efficient batch and streaming solutions to support services such as personalized recommendation, ads and business data analysis. By using Alluxio, the efficiency of the ad-hoc service has improved significantly. This blog presented performance tests evaluating how much benefit Alluxio contributes to our ad-hoc service, as well as a new scenario of using Spark SQL on Alluxio.
