GeoSpark is a distributed geographic information computing engine built on Spark. Compared with traditional ArcGIS, GeoSpark delivers better-performing spatial analysis and query services.
Prerequisites
- Ubuntu 18.04
- IDEA
- GeoSpark supports both Java and Scala; this guide uses Java.
Installing JDK 8
Download JDK 8: https://download.oracle.com/otn/java/jdk/8u211-b12/478a62b7d4e34b78b671c754eaaf38ab/jdk-8u211-linux-x64.tar.gz (Note: an Oracle account is now required to download.)
- After downloading and extracting, copy it to `/opt`, then add the following environment variables to `~/.bashrc`:

```bash
export JAVA_HOME=/opt/jdk1.8.0_172  # change this to your JDK directory name
export PATH=${JAVA_HOME}/bin:$PATH
# CLASSPATH should not be needed on JDK 8 and later, but it is set here just to be safe
export CLASSPATH=.:/opt/jdk1.8.0_172/lib:/opt/jdk1.8.0_172/lib/dt.jar:/opt/jdk1.8.0_172/lib/tools.jar
```
Configuring Scala
Download Scala 2.12.8: https://downloads.lightbend.com/scala/2.12.8/scala-2.12.8.tgz
- After downloading and extracting, copy it to `/opt`, then add the following environment variables to `~/.bashrc`:

```bash
export SCALA_HOME=/opt/scala-2.12.8
export PATH=${SCALA_HOME}/bin:$PATH
```

Then run `source ~/.bashrc`.
- Run `scala -version`; if you see output like the following, the installation succeeded:

```
Scala code runner version 2.12.8 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.
```
Spark單機(jī)配置
This sets up a standalone (single-machine) Spark; no cluster and no Hadoop deployment are required.
Download Spark 2.4.3: https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.6.tgz
- After downloading and extracting, copy it to your home directory `/home/{user}`, then add the following environment variables to `~/.bashrc`:

```bash
export SPARK_HOME=/home/hwang/spark-2.4.3-bin-hadoop2.6
export SPARK_LOCAL_IP="127.0.0.1"
export PATH=${SPARK_HOME}/bin:$PATH
```
- Then run `spark-shell`; if you see output like the following, the installation succeeded:

```
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1559006613213).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_172)
scala>
```
GeoSpark
- Open IDEA, create a new Maven project, and edit the pom.xml file:

```xml
<properties>
    <scala.version>2.11</scala.version>
    <geospark.version>1.2.0</geospark.version>
    <spark.compatible.version>2.3</spark.compatible.version>
    <spark.version>2.4.3</spark.version>
    <hadoop.version>2.7.2</hadoop.version>
    <!-- scope shared by the Spark and Hadoop dependencies below -->
    <dependency.scope>compile</dependency.scope>
</properties>

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.11.0</version>
    </dependency>
    <dependency>
        <groupId>org.datasyslab</groupId>
        <artifactId>geospark</artifactId>
        <version>${geospark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.datasyslab</groupId>
        <artifactId>geospark-sql_${spark.compatible.version}</artifactId>
        <version>${geospark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.datasyslab</groupId>
        <artifactId>geospark-viz_${spark.compatible.version}</artifactId>
        <version>${geospark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.datasyslab</groupId>
        <artifactId>sernetcdf</artifactId>
        <version>0.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.version}</artifactId>
        <version>${spark.version}</version>
        <scope>${dependency.scope}</scope>
        <exclusions>
            <exclusion>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>*</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.version}</artifactId>
        <version>${spark.version}</version>
        <scope>${dependency.scope}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>${hadoop.version}</version>
        <scope>${dependency.scope}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
        <scope>${dependency.scope}</scope>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.8.0</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
    </plugins>
</build>
```
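The snippets in the following steps are assumed to run inside the `main` method of a class named `Learn01` (the class name comes from the `getResource` call below; the skeleton itself is an illustrative assumption), with `checkin.csv` on the classpath, e.g. under `src/main/resources`:

```java
// Hypothetical skeleton for the snippets below. Only the class name Learn01
// appears in the original code; everything else here is an assumption.
public class Learn01 {
    public static void main(String[] args) throws Exception {
        // With Learn01 in the default package, getResource("checkin.csv")
        // resolves against the classpath root, i.e. src/main/resources/checkin.csv.
        // The GeoSpark code from the following steps goes here.
    }
}
```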
- We create a Spark RDD from a CSV file with the following content:

```
-88.331492,32.324142,hotel
-88.175933,32.360763,gas
-88.388954,32.357073,bar
-88.221102,32.35078,restaurant
```
Then we initialize a SparkContext and use GeoSpark's PointRDD to load the CSV:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.datasyslab.geospark.enums.FileDataSplitter;
import org.datasyslab.geospark.spatialRDD.PointRDD;

SparkConf conf = new SparkConf();
conf.setAppName("GeoSpark01");
conf.setMaster("local[*]");
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
conf.set("spark.kryo.registrator", "org.datasyslab.geospark.serde.GeoSparkKryoRegistrator");
JavaSparkContext sc = new JavaSparkContext(conf);

String pointRDDInputLocation = Learn01.class.getResource("checkin.csv").toString();
Integer pointRDDOffset = 0; // the location (longitude/latitude) starts at column 0
FileDataSplitter pointRDDSplitter = FileDataSplitter.CSV;
Boolean carryOtherAttributes = true; // also carry the attribute column (e.g. "hotel")
PointRDD rdd = new PointRDD(sc, pointRDDInputLocation, pointRDDOffset, pointRDDSplitter, carryOtherAttributes);
```
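As a quick sanity check, you can inspect what was loaded. This is a minimal sketch rather than part of the original walkthrough; `analyze()`, `boundaryEnvelope`, `approximateTotalCount`, and `rawSpatialRDD` are members of GeoSpark's `SpatialRDD` base class:

```java
rdd.analyze(); // computes the boundary envelope and an approximate record count
System.out.println(rdd.boundaryEnvelope);      // bounding box enclosing all points
System.out.println(rdd.approximateTotalCount); // 4 for the sample CSV above
// With carryOtherAttributes = true, each point's userData holds the extra column.
System.out.println(rdd.rawSpatialRDD.take(1));
```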
- Coordinate system transformation

GeoSpark uses EPSG standard coordinate reference systems; the codes can be looked up on the EPSG website: https://epsg.io/

```java
// Coordinate system transformation
String sourceCrsCode = "epsg:4326"; // WGS 84 longitude/latitude
String targetCrsCode = "epsg:3857"; // Web Mercator
rdd.CRSTransform(sourceCrsCode, targetCrsCode);
```
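Once the RDD is in the target CRS, it can be queried in meters. Below is a minimal sketch of a spatial range query using GeoSpark's `RangeQuery.SpatialRangeQuery`; the envelope values are illustrative EPSG:3857 coordinates roughly covering the sample points, not figures from the original post, and in some GeoSpark builds the JTS classes live under `com.vividsolutions.jts` instead of `org.locationtech.jts`:

```java
import org.apache.spark.api.java.JavaRDD;
import org.datasyslab.geospark.spatialOperator.RangeQuery;
import org.locationtech.jts.geom.Envelope;
import org.locationtech.jts.geom.Point;

// Illustrative bounding box in EPSG:3857 meters (minX, maxX, minY, maxY)
Envelope queryEnvelope = new Envelope(-9850000, -9800000, 3790000, 3820000);
boolean considerBoundaryIntersection = false; // exclude points exactly on the boundary
boolean usingIndex = false;                   // no spatial index has been built
// SpatialRangeQuery declares `throws Exception`, hence the throws clause on main()
JavaRDD<Point> result = RangeQuery.SpatialRangeQuery(rdd, queryEnvelope, considerBoundaryIntersection, usingIndex);
System.out.println(result.count()); // number of points inside the envelope
```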