GeoSpark源碼解析(四)
上節(jié)我們講了GeoSpark如何根據(jù)已經(jīng)分好的格網(wǎng)(Grid)來進行分塊(partition)操作毒姨。對于GeoSpark來說搂鲫,格網(wǎng)的生成算法實際上是決定了并行計算的關鍵,在上一節(jié)中同窘,我們講到了SpatialRDD
的spatialPartitioning
方法,他需要一個SpatialPartitioner
類型的參數(shù),這個參數(shù)有三種類型:
這三種SpatialPartitioner
其實是在public void spatialPartitioning(GridType gridType, int numPartitions)
這個函數(shù)中構造的讼庇。我們先看下GridType
:
public enum GridType{
/**
* The equalgrid.
*/
EQUALGRID,
/**
* The hilbert.
*/
HILBERT,
/**
* The rtree.
*/
RTREE,
/**
* The voronoi.
*/
VORONOI,
/**
* The voronoi.
*/
QUADTREE,
/**
* K-D-B-tree (k-dimensional B-tree)
*/
KDBTREE;
}
代碼中共有6種,我們分別介紹下:
-
EQUALGRID
近尚,這個我們從名字上就可以看出他是等分蠕啄,就是等分為多少行多少列。
public EqualPartitioning(Envelope boundary, int partitions)
{
//Local variable should be declared here
Double root = Math.sqrt(partitions);
int partitionsAxis;
double intervalX;
double intervalY;
//Calculate how many bounds should be on each axis
partitionsAxis = root.intValue();
intervalX = (boundary.getMaxX() - boundary.getMinX()) / partitionsAxis;
intervalY = (boundary.getMaxY() - boundary.getMinY()) / partitionsAxis;
//System.out.println("Boundary: "+boundary+"root: "+root+" interval: "+intervalX+","+intervalY);
for (int i = 0; i < partitionsAxis; i++) {
for (int j = 0; j < partitionsAxis; j++) {
Envelope grid = new Envelope(boundary.getMinX() + intervalX * i, boundary.getMinX() + intervalX * (i + 1), boundary.getMinY() + intervalY * j, boundary.getMinY() + intervalY * (j + 1));
//System.out.println("Grid: "+grid);
grids.add(grid);
}
//System.out.println("Finish one column/one certain x");
}
}
代碼的第11戈锻、12行首先根據(jù)要分塊的長寬計算X歼跟、Y間隔,然后第14格遭、15行for循環(huán)遍歷生成網(wǎng)格。
-
HILBERT
:Hilbert其實是最近很流行的一種新型NoSQL空間索引拒迅,就是利用這個算法生成key-Value格式的空間索引骚秦,存儲在HBase、Accumulo等分布式非關系型數(shù)據(jù)庫中坪它,目前LocationTech下的GeoWave骤竹、GeoMesa就是采用了這個思路。 -
RTREE
往毡,下面這張圖很好的展示R樹的工作原理蒙揣,其實就是對整個空間建立索引關系,加快搜索效率开瞭。然后GeoSpark是將RTree的最后一級葉子節(jié)點作為網(wǎng)格建立懒震。
-
VORONOI
,這個不了解嗤详。 -
QUADTREE
个扰,GIS中很經(jīng)典的一種索引算法,就是數(shù)據(jù)結構中的四叉樹葱色,其原理就是將一個空間每次四等分下去递宅,分到某個級別后停止。 -
KDBTREE
,這個我也不是特別理解办龄,感覺和RTREE大致上差不多的烘绽,可能具體針對特別的空間分布下的空間數(shù)據(jù)有相應的優(yōu)勢吧。
當Grid構建好之后俐填,就可以構造SpatialPartitioner
安接,然后進行分塊操作。
public void spatialPartitioning(GridType gridType, int numPartitions)
throws Exception{
// 省略部分代碼
switch (gridType) {
case EQUALGRID: {
EqualPartitioning EqualPartitioning = new EqualPartitioning(paddedBoundary, numPartitions);
grids = EqualPartitioning.getGrids();
partitioner = new FlatGridPartitioner(grids);
break;
}
case HILBERT: {
HilbertPartitioning hilbertPartitioning = new HilbertPartitioning(samples, paddedBoundary, numPartitions);
grids = hilbertPartitioning.getGrids();
partitioner = new FlatGridPartitioner(grids);
break;
}
case RTREE: {
RtreePartitioning rtreePartitioning = new RtreePartitioning(samples, numPartitions);
grids = rtreePartitioning.getGrids();
partitioner = new FlatGridPartitioner(grids);
break;
}
case VORONOI: {
VoronoiPartitioning voronoiPartitioning = new VoronoiPartitioning(samples, numPartitions);
grids = voronoiPartitioning.getGrids();
partitioner = new FlatGridPartitioner(grids);
break;
}
case QUADTREE: {
QuadtreePartitioning quadtreePartitioning = new QuadtreePartitioning(samples, paddedBoundary, numPartitions);
partitionTree = quadtreePartitioning.getPartitionTree();
partitioner = new QuadTreePartitioner(partitionTree);
break;
}
case KDBTREE: {
final KDBTree tree = new KDBTree(samples.size() / numPartitions, numPartitions, paddedBoundary);
for (final Envelope sample : samples) {
tree.insert(sample);
tree.assignLeafIds();
partitioner = new KDBTreePartitioner(tree);
break;
}
}
this.spatialPartitionedRDD = partition(partitioner);
}
到這里英融,基本上GeoSpark基本代碼就介紹完了盏檐,也算是對前期自己工作的一個總結,后續(xù)根據(jù)時間驶悟、項目再來進行探討胡野。