Apache Spark is a fast and general engine for large-scale data processing.
1. Speed
Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
2. Ease of use
Spark offers over 80 high-level operators that make it easy to build parallel apps. You can also use it interactively from the Scala and Python shells.
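To give a feel for those operators, here is a minimal word-count sketch in Python. It assumes an existing `SparkContext` is passed in as `sc` (the `pyspark` shell predefines one). The function names `tokenize` and `word_count` are illustrative and not part of Spark.

```python
import re


def tokenize(line):
    """Split a line into lowercase words (plain Python, no Spark needed)."""
    return re.findall(r"[a-z0-9']+", line.lower())


def word_count(sc, lines):
    """Count words with three RDD operators: flatMap, map, reduceByKey.

    Assumption: `sc` is an existing pyspark.SparkContext.
    """
    return (
        sc.parallelize(lines)               # distribute the input lines
        .flatMap(tokenize)                  # one record per word
        .map(lambda word: (word, 1))        # pair each word with a count of 1
        .reduceByKey(lambda a, b: a + b)    # sum the counts for each word
        .collectAsMap()                     # return the result as a dict
    )
```

In the `pyspark` shell, calling `word_count(sc, ["to be or not to be"])` should return a dictionary of per-word counts. The same chain of operators works unchanged on a much larger dataset, because only the input source changes.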
3. Generality (a unified stack)
Spark powers a stack of high-level tools including Spark SQL, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.
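As one possible illustration of combining libraries, the sketch below filters data with Spark SQL and then clusters it with MLlib in a single function. It assumes Spark 2.0 or later, where `SparkSession` and the DataFrame-based `pyspark.ml` package are available. The view name `points`, the column names, and the function `cluster_points` are hypothetical.

```python
# Illustrative query; the view name `points` is an assumption of this sketch.
POINTS_QUERY = "SELECT x, y FROM points WHERE x IS NOT NULL AND y IS NOT NULL"


def cluster_points(spark, rows, k=2):
    """Filter rows with Spark SQL, then cluster them with MLlib KMeans.

    Assumptions: `spark` is an existing SparkSession and `rows` is a list
    of (x, y) float tuples.
    """
    # Imported here so this module still loads where PySpark is not installed.
    from pyspark.ml.clustering import KMeans
    from pyspark.ml.feature import VectorAssembler

    # Spark SQL step: register the data as a view and query it.
    spark.createDataFrame(rows, ["x", "y"]).createOrReplaceTempView("points")
    clean = spark.sql(POINTS_QUERY)

    # MLlib step: pack the numeric columns into one feature vector column.
    assembler = VectorAssembler(inputCols=["x", "y"], outputCol="features")
    features = assembler.transform(clean)

    # MLlib step: fit KMeans and attach a cluster id to every row.
    model = KMeans(k=k, seed=1).fit(features)
    return model.transform(features).select("x", "y", "prediction")
```

The point of the sketch is that the DataFrame returned by `spark.sql` is passed straight to the MLlib stages, with no export or format conversion between the two libraries.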
4. Runs everywhere
Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3.
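In practice, the choice of cluster manager mostly shows up as the master URL given to the application. The sketch below lists the usual URL forms and reads a text file through a `SparkSession`. It assumes Spark 2.0 or later; all host names, ports, and paths are placeholders, and `count_lines` is an illustrative helper. Reading from S3 typically needs the `hadoop-aws` module on the classpath, and Cassandra and HBase need separate connector libraries that are not shown here.

```python
# Master URL forms per cluster manager; hosts and ports are placeholders.
MASTER_URLS = {
    "local": "local[*]",                       # all cores on one machine
    "standalone": "spark://master-host:7077",  # Spark's own cluster manager
    "mesos": "mesos://mesos-host:5050",        # Apache Mesos
    "yarn": "yarn",                            # Hadoop YARN; cluster location is
                                               # read from the Hadoop configuration
}

# Example input locations; bucket, host, and file names are placeholders.
EXAMPLE_PATHS = {
    "hdfs": "hdfs://namenode:8020/data/events.txt",
    "s3": "s3a://my-bucket/data/events.txt",
}


def count_lines(master, path, app_name="runs-everywhere-demo"):
    """Count the lines of a text file on the given cluster manager."""
    # Imported here so this module still loads where PySpark is not installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master(master).appName(app_name).getOrCreate()
    try:
        return spark.read.text(path).count()
    finally:
        spark.stop()  # release cluster resources even if the read fails
```

For example, `count_lines(MASTER_URLS["local"], "README.md")` would run on the local machine, while swapping in another master URL and an `hdfs://` or `s3a://` path targets a cluster without changing the application code.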