Project repository: https://github.com/lqkweb/sqlflow
SQLflow (Python 3+)
SQLflow is developed in Python and supports Spark, Flink, and other engines as the underlying distributed computing backend. Through a unified set of configuration files it covers batch processing, stream computation, and REST service development.
2019-01-22: UI updated. The code will be cleaned up soon, with comments added and read/write functionality introduced.
Home page:
Results page:
SQLflow
SQLflow is developed in Python and lets you operate a distributed cluster by writing SQL: data processing, machine learning and deep learning model training, model deployment, distributed crawling, data visualization, and more.
Build
Requires Python 3.6.
git clone https://github.com/lqkweb/sqlflow.git
pip install -r requirements.txt
python manage.py
Home page: http://127.0.0.1:5000
Script page: http://127.0.0.1:5000/script
Single-SQL page: http://127.0.0.1:5000/sql
[Note: 1. Download Apache Spark and set the SPARK_HOME path in manage.py to its location. 2. Place data.csv in the sqlflow/data directory.]
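The setup note above can be sketched as a couple of shell commands. The Spark version and paths below are illustrative assumptions only; use the Spark distribution you actually downloaded and adjust the paths to your machine:

```shell
# Example only: point SPARK_HOME at your unpacked Spark distribution
# (the same path must be configured as SPARK_HOME in manage.py).
export SPARK_HOME=/opt/spark-2.4.0-bin-hadoop2.7

# Place the sample data file where sqlflow expects it.
cp /path/to/data.csv sqlflow/data/
```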
Usage
On the script execution page (http://127.0.0.1:5000/script), enter select * from A limit 3; or select * from A limit 3 as B; to create temporary table A or B.
Create temporary table A:
select * from A limit 3;
Create temporary table B:
select * from A limit 3 as B;
Then open the single-SQL execution page (http://127.0.0.1:5000/sql), where you can query tables A and B with any Spark SQL syntax:
desc A
select * from A limit 2
select * from B limit 2
[Note] "as B" effectively creates a temporary table named B.
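As a rough illustration of the "as B" convention, here is a hypothetical helper (not sqlflow's actual parser) that splits an optional trailing alias off a statement to decide the temp-table name:

```python
import re

def temp_table_name(statement):
    """Split an optional trailing "as <name>" off a SQL statement.

    Returns (sql, name); name is None when no alias is given.
    Hypothetical sketch for illustration, not sqlflow's real code.
    """
    sql = statement.strip().rstrip(";").strip()
    m = re.search(r"\s+as\s+(\w+)\s*$", sql, flags=re.IGNORECASE)
    if m:
        return sql[:m.start()], m.group(1)
    return sql, None

print(temp_table_name("select * from A limit 3 as B;"))
# ('select * from A limit 3', 'B')
```

In Spark terms, registering the result under that name would correspond to calling DataFrame.createOrReplaceTempView(name), which is the standard way to make a result set queryable by name in later SQL statements.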
That is the whole demo: driving a Spark cluster with nothing but SQL.
[Ref] Spark SQL function reference: https://spark.apache.org/docs/latest/api/sql/index.html