- Application
User program built on Spark (e.g. a Scala object written in IntelliJ IDEA).
Consists of one driver program and multiple executors on the cluster.
- Application jar
A jar containing the user's Spark application.
In some cases users will want to create an "uber jar" containing their application along with its dependencies.
The user's jar should never include Hadoop or Spark libraries; however, these will be added at runtime.
- Driver program
The process running the main() function of the application and creating the SparkContext.
In other words, the program whose main method creates the SparkContext is the driver program.
- Cluster manager
An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN).
Benefit: while developing, you do not need to care where the code will run.
The application code is the same under every cluster manager.
- Deploy mode
Distinguishes where the driver process runs.
In "cluster" mode, the framework launches the driver inside of the cluster.
In "client" mode, the submitter launches the driver outside of the cluster.
- Worker node
Any node that can run application code in the cluster.
- Executor
A process launched for an application on a worker node, that runs tasks and keeps data in memory or disk storage across them.
On YARN, for example, the worker node is a NodeManager and the executor runs inside a container, executing tasks such as map or filter.
Each application has its own executors.
- Task
A unit of work that will be sent to one executor.
- Job
A parallel computation consisting of multiple tasks that gets spawned in response to a Spark action (e.g. save, collect).
One action triggers one job.
Put simply, each call to an action (such as collect) is a job.
- Stage
Each job gets divided into smaller sets of tasks called stages that depend on each other (similar to the map and reduce stages in MapReduce).
One job is split into multiple stages.
Each shuffle starts a new stage.
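
To make the Job, Stage and Task entries above more concrete, here is a small toy model in plain Python. It is only a sketch and not Spark's actual scheduler: the names `WIDE_OPS`, `split_into_stages` and `plan_job` are hypothetical, the set of shuffle operators is a hand-picked sample, and it assumes a linear chain of operators where every stage has the same number of partitions (in real Spark the partition count can change after a shuffle).

```python
# Toy model only: hypothetical helper names, not part of any Spark API.

# Hand-picked sample of operators that require a shuffle (wide dependencies).
WIDE_OPS = {"reduceByKey", "groupByKey", "join", "repartition"}


def split_into_stages(transformations):
    """Group a linear chain of operator names into stages.

    A new stage starts whenever a shuffle (wide) operator is reached.
    """
    stages = [[]]
    for op in transformations:
        # Open a new stage at a shuffle, unless the current stage is still empty.
        if op in WIDE_OPS and stages[-1]:
            stages.append([])
        stages[-1].append(op)
    return stages


def plan_job(transformations, action, num_partitions):
    """Model one job: a single action triggers it, and it holds stages and tasks.

    Assumption: each stage runs one task per partition, and the partition
    count is the same in every stage.
    """
    stages = split_into_stages(transformations)
    tasks = [
        (stage_id, partition)
        for stage_id in range(len(stages))
        for partition in range(num_partitions)
    ]
    return {"action": action, "stages": stages, "tasks": tasks}


# A word-count style chain: the single action "collect" yields a single job.
job = plan_job(
    ["textFile", "flatMap", "map", "reduceByKey"],
    action="collect",
    num_partitions=3,
)

for stage_id, ops in enumerate(job["stages"]):
    print(f"stage {stage_id}: {ops}")
print(f"tasks in job: {len(job['tasks'])}")
```

In this model the `reduceByKey` shuffle splits the chain into two stages, and with three partitions the job ends up with six tasks, each of which would be sent to one executor.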