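User behavior logs; signaling data (cell); mapWithState; integrating a DStream with an RDD == transform. Data set 1: log records, DStream: domain,traff...
Spark Streaming: stream processing built on top of Spark. Stream: source ==> compute ==> store. Offline (batch) processing is a special case of streaming. letting you write ...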
Function functions.scala hobbies.txt alice jogging,Coding,cooking 3 lina travel,danc...
External Data Source API (external data sources): MapReduce, Hive, Spark. Loading data; formats: json, parquet, text, jdbc........
DataFrame python pandas R RDD MapReduce DataFrame vs Dataset(1.6) DS: Java Scala DF: 4 ...
1. Core concepts: broker (a process), producer, consumer, topic, partitions (replica count), consumergr...
Spark SQL IOE SQL: schema + file select ... from xxx where..... SQL on Hadoop Hive Impal...
Download links: Zookeeper: http://mirror.bit.edu.cn/apache/zookeeper/current/ Scala: http://www.s...
Kafka: message middleware --> distributed streaming platform. MQ, Redis. Kafka vs. Flume: producer / source, Broker / channel, consumer / sink. Normal ...
collect collect countByKey countByValue collectAsMap groupByKey vs reduceByKey val rdd=...
Spark on YARN: submit Spark jobs to YARN for execution; Spark acts only as a client. ./spark-submit \ --class org.apache.sp...
Application a driver program + executors SparkContext = application spark-shell ? appli...
x.y.z 1.6.1 2.3.1 2.2.2 RDD transformation: lazy map filter union flatMap mapPartition ...
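Hadoop HDFS HA and Yarn HA cluster deployment. 1. HDFS: NN, SNN (secondary). Hot standby: when NN (active) goes down, NN (standby) --> acti...
Advanced Hive, part 2: ***** Hive: complex data types, JDBC programming. ZK: Compression: compression ratio, decompression speed. 1 GB of uncompressed data: 1 GB of gzip-compressed data: codec: ...
ZK 1) High availability: HDFS/HBase/Spark HA 2) API: ZK/Curator. Development: operating ZK from Java/Scala. Kafka: offsets can be stored in ZK =...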
Coding conventions from the official Python website: 1. Use 4-space indentation, and no tabs. 2. Wrap lines so that they don’t...
anaconda3 download links. Official site: https://www.anaconda.com/download/ Baidu Cloud link: https://pan.baidu.com/s/17jHe...
.../page_views/201808082008 .... .../page_views/201808082009 .... ./flume-ng agent \ --...