So in this video and the next, we're going to study a very cool divide and conquer algorithm for the closest pair problem.
This is the problem where you're given n points in the plane and you want to figure out which pair of points is closest to each other.
So this is the first taste we get of an application in computational geometry, which is the part of algorithms that studies how to reason about and manipulate geometric objects.
So those algorithms are important in, among other areas, robotics, computer vision, and computer graphics.
So this is relatively advanced material, it's a bit more difficult than the other applications of divide and conquer that we've seen.
The algorithm's a little bit tricky, and it has a quite nontrivial proof of correctness, so just be ready for that, and also be warned that, because it's more advanced, I'm going to talk about the material at a slightly faster pace than I do in most of the other videos.
So let's begin now by defining the problem formally. We're given as input n points in the plane, each one defined by its x coordinate and its y coordinate.
And when we talk about the distance between two points in this problem, we're going to focus on Euclidean distance.
So, let me remind you what that is briefly, but we're going to introduce some simple notation for that, which we'll use for the rest of the lecture.
So we're just going to denote the Euclidean distance between two points, pi and pj, by d of pi, pj.
So in terms of the x and y coordinates of these two points, we just look at the squared differences in each coordinate, sum them up, and take the square root.
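Written out in symbols (with (xi, yi) and (xj, yj) denoting the coordinates of pi and pj), that distance is:

d(pi, pj) = sqrt( (xi - xj)^2 + (yi - yj)^2 )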
And now as the name of the problem would suggest, the goal is to identify among all pairs of points that pair which has the smallest distance between them.
Next, let's start getting a feel for the problem by making some preliminary observations.
First, I want to make an assumption, purely for convenience, that there are no ties.
That is, I'm going to assume all n points have distinct x coordinates, and also all n points have distinct y coordinates.
It's not difficult to extend the algorithm to accommodate ties.
I'll leave it to you to think about how to do that.
So next, let's draw some parallels with the problem of counting inversions, which was an earlier application of divide and conquer that we saw.
The first parallel I want to point out is that, if we're comfortable with a quadratic-time algorithm, then this is not a hard problem; we can simply solve it by brute-force search.
And again, by brute-force search, I just mean we set up a double for loop, which iterates over all distinct pairs of points.
We compute the distance for each such pair and we remember the smallest.
That's clearly a correct algorithm, and it has to iterate over a quadratic number of pairs, so its running time is going to be theta of n squared.
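Just as a sketch of what that brute-force search might look like in code (a minimal Python version, assuming the points are given as (x, y) tuples; the function names are made up for illustration):

```python
from math import sqrt, inf

def dist(p, q):
    # Euclidean distance between two points given as (x, y) tuples.
    return sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def closest_pair_brute_force(points):
    # Double for loop over all distinct pairs of points: Theta(n^2) time.
    best, best_pair = inf, None
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = dist(points[i], points[j])
            if d < best:
                best, best_pair = d, (points[i], points[j])
    return best_pair
```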
And, as always, the question is, can we apply some algorithmic ingenuity to do better? Can we have a better algorithm than this naive one which iterates over all pairs of points? You might have an initial instinct that, because the problem asks about a quadratic number of different objects, perhaps we fundamentally need to do quadratic work.
But again, recall back in counting inversions, using divide and conquer, we were able to get an n log n algorithm despite the fact that there might be as many as a quadratic number of inversions in an array.
So the question is, can we do something similar here for the closest pair problem? Now, one of the keys to getting an n log n time algorithm for counting inversions was to leverage a sorting subroutine.
Recall that we piggybacked on merge sort to count the number of inversions in n log n time.
So the question here, with the closest pair problem, is whether sorting can again be useful in some way to beat the quadratic barrier.
So, to develop some evidence that sorting will indeed help us compute the closest pair of points in better than quadratic time, let's look at a special case of the problem, really an easier version of the problem, which is when the points are just in one dimension, so on a line rather than in two dimensions in the plane.
So in the 1D version, all the points just lie on a line like this one, and we're given the points in some arbitrary order not necessarily in sorted order.
So a way to solve the closest pair problem in one dimension is to simply sort the points, and then of course, the closest pair had better be adjacent in this ordering, so you just iterate through the n minus 1 consecutive pairs and see which one is closest. So, more formally, here's how you solve the one-dimensional version of the problem.
You sort the points according to their only coordinate; remember, this is one dimension.
So as we've seen, using merge sort, we can sort the points in n log n time and then we just do a scan through the points, so this takes linear time.
And for each consecutive pair, we compute their distance and we remember the smallest of those consecutive pairs and we return that.
That's gotta be the closest pair.
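Here's a minimal sketch of that one-dimensional algorithm in Python (assuming the input is just a list of numbers; the function name is made up for illustration):

```python
def closest_pair_1d(points):
    # Sort the points (O(n log n)), then scan the n - 1 consecutive pairs (O(n))
    # and return the pair with the smallest gap.
    pts = sorted(points)
    return min(zip(pts, pts[1:]), key=lambda pair: pair[1] - pair[0])
```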
So, in this picture here on the right, I'm just going to circle here in green the closest pair of points.
So this is something we discover by sorting and then doing a linear scan.
Now, needless to say, this isn't directly useful, this is not the problem I started out with.
We wanted to find the closest pair among points in the plane, not points on a line.
But I want to point out that, even on the line, there are a quadratic number of different pairs, so brute-force search is still a quadratic-time algorithm even in the 1D case.
So at least, with one dimension, we can use sorting, piggyback on it, to beat the naive brute-force search bound and solve the problem in n log n time.
So our goal for this lecture is going to be to devise an equally good algorithm for the two-dimensional case; that is, we want to solve closest pair of points in the plane, again, in O of n log n time.
So we will succeed in this goal.
I'm going to show you an n log n time algorithm for 2D closest pair.
It's going to take us a couple steps.
So let me begin with a high level approach.
Alright.
So the first idea to try is just to copy what worked for us in the one-dimensional case.
In the one-dimensional case, we first sorted the points by their coordinate, and that was really useful.
Now, in the 2D case, points have two coordinates, x coordinates and y coordinates, so there's two ways to sort them.
So let's just sort them both ways, that is, the first step of our algorithm, which you should really think of as a preprocessing step.
We're going to take the input points.
We invoke merge sort once to sort them according to x coordinate, that's one copy of the points.
And then we make a second copy of the points where they're sorted by y coordinates.
So we're going to call those copies of the points Px, that's an array of the points sorted by x coordinate, and Py for them sorted by y coordinate.
Now, we know merge sort takes O of n log n time, so this preprocessing step only takes O of n log n time.
And again, given that we're shooting for an algorithm with running time big O of n log n, why not sort the points? We don't even know how we're going to use this fact right now, but it's sort of harmless.
It's not going to affect our goal of getting a big O of n log n time algorithm.
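As a sketch, that preprocessing step is just two sorts of the same point set (here Python's built-in sort stands in for merge sort, and points are assumed to be (x, y) tuples):

```python
def preprocess(points):
    # Two sorted copies of the point set: Px by x coordinate, Py by y coordinate.
    Px = sorted(points, key=lambda p: p[0])
    Py = sorted(points, key=lambda p: p[1])
    return Px, Py
```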
And indeed, this illustrates a broader point, which is one of the themes of this course.
So recall, I hope one of the things you take away from this course is a sense for what the for-free primitives are, what manipulations or operations you can do on data which are basically costless.
Meaning that if your data set fits in the main memory of your computer, you can basically invoke the primitive and it's just going to run blazingly fast, and you can just do it even if you don't know why.
And again, sorting is the canonical for-free primitive, although we'll see some more later in the course, and so, here, we're using exactly that principle.
So we don't even understand yet why we might want the points to be sorted.
It just seems like it's probably going to be useful, motivated by the 1D case, so let's go ahead and make sorted copies of the points by x and y coordinate up front.
So reasoning by analogy with the 1D case suggests that sorting the points might be useful, but we can't carry this analogy too far.
So in particular, we're not going to be able to get away with just a simple linear scan through these arrays to identify the closest pair of points.
So, to see that, consider the following example.
So we're going to look at a point set which has six points.
There's going to be two points, which I'll put in blue which are very close in x coordinate, but very far away in y coordinate.
And then there's going to be another pair of points which I'll do in green, which are very close in y coordinate, but very far away in x coordinate.
And then there's going to be a red pair of points, which are not too far away in either the x coordinate or the y coordinate.
So in this set of six points, the closest pair is the pair of red points.
They're not even going to show up consecutively in either of the two arrays, right? So in the array that's sorted by x coordinate, this blue point here is going to be wedged in between the two red points, so they won't be consecutive.
And similarly in Py, which is sorted by y coordinate, this green point is going to be wedged between the two red points.
So you won't even notice these red points if you just do a linear scan over Px and Py and look at the consecutive pairs of points.
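To make this concrete, here is one hypothetical six-point configuration with exactly that structure (the coordinates are made up for illustration); the true closest pair is the red one, at distance about 1.41, yet the smallest distance between consecutive points in either sorted order is about 4.0:

```python
# Hypothetical example: the blue points are close in x but far apart in y, the
# green points are close in y but far apart in x, and the red points are the
# closest pair overall.
blue  = [(3.95, 0.0), (4.05, 10.0)]
green = [(0.0, 3.95), (10.0, 4.05)]
red   = [(4.0, 4.0), (5.0, 5.0)]
points = blue + green + red

# Sorted by x, a blue point is wedged between the two red points; sorted by y,
# a green point is, so a consecutive-pair scan of Px or Py misses the red pair.
Px = sorted(points, key=lambda p: p[0])
Py = sorted(points, key=lambda p: p[1])
```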
So, following our preprocessing step, where we just invoke merge sort twice, we're going to do a quite nontrivial divide and conquer algorithm to compute the closest pair.
So really, in this algorithm, we're applying the divide and conquer paradigm twice.
First, internal to the sorting subroutine, assuming that we use the merge sort algorithm to sort.
Divide and conquer is being used there to get an n log n running time in this preprocessing step, and then we're going to use it again on the sorted arrays in a new way, and that's what I'm going to tell you about next.
So let's just briefly review the divide and conquer algorithm design paradigm before we apply it to the closest pair problem.
So, as usual, the first step is to figure out a way to divide your problem into smaller subproblems.
Sometimes this requires a reasonable amount of ingenuity, but it's not going to here.
Here in the closest pair problem, we're going to proceed exactly as we did in the merge sort and counting inversions problems, where we took the array and broke it into its left and right half.
So here, we're going to take the input point set, and again, just recurse on the left half of the points, and recurse on the right half of the points.
So here, by left and right, I mean with respect to the points' x coordinates.
There's pretty much never any ingenuity in the conquer step, that just means you take the sub-problems you identified in the first step, and you solve them recursively.
That's what we'll do here: we'll recursively compute the closest pair in the left half of the points, and the closest pair in the right half of the points.
So, as usual, all the creativity in divide and conquer algorithms is in the combine step.
Given the solutions to your sub problems, how do you somehow recover a solution to the original problem? The one that you actually care about.
So for closest pair, the question is going to be, given that you've computed the closest pair on the left half of the points, and the closest pair on the right half of the points, how do you then quickly recover the closest pair for the whole point set? That's a tricky problem, and that's what we're going to spend most of our time on.
So let's make this divide and conquer approach for closest pair a little bit more precise, so let's now actually start spelling out our closest pair algorithm.
The input we're given follows the preprocessing step: recall that we invoke merge sort and get our two sorted copies of the point set, Px, sorted by x coordinate, and Py, sorted by y coordinate.
So the first step is the division step.
So given that we have a copy of the points, Px, sorted by x coordinate, it's easy to identify the left half of the points, those with the n over 2 smallest x coordinates, and the right half, those with the n over 2 largest x coordinates.
We're going to call those Q and R respectively.
One thing I'm skipping over is the base case.
I'm not going to bother writing that down, so base case omitted, but it's what you would think it would be.
So basically, once you have a small number of points, say two points or three points, then you can just solve the problem in constant time by brute-force search.
You just look at all the pairs and you return the closest pair.
So think of there being at least four points in the input.
Now, in order to recurse, to call ClosestPair again on the left and right halves, we need sorted versions of Q and R, both by x coordinate and by y coordinate, so we're just going to form those by doing suitable linear scans through Px and Py.
And so one thing I encourage you to think through carefully or maybe even code up after the video is how would you form Qx, Qy, Rx and Ry given that you already have Px and Py.
And if you think about it, because Px and Py are already sorted just producing these sorted sublists takes linear time.
It's in some sense the opposite of the merge subroutine used in merge sort.
Here, we're sort of splitting rather than merging.
But again, this can be done in linear time, that's something you should think through carefully later.
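Here's one way that linear-time splitting could look in Python (a sketch, assuming Px and Py are the sorted lists of (x, y) tuples from the preprocessing step and relying on the distinct-x-coordinates assumption from earlier):

```python
def split_sorted(Px, Py):
    # Q is the left half of the points by x coordinate, R is the right half.
    mid = len(Px) // 2
    Qx, Rx = Px[:mid], Px[mid:]      # both halves stay sorted by x
    x_boundary = Px[mid][0]          # smallest x coordinate appearing in R

    # A single linear pass over Py preserves the y-ordering within each half.
    Qy, Ry = [], []
    for p in Py:
        (Qy if p[0] < x_boundary else Ry).append(p)
    return Qx, Qy, Rx, Ry
```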
So that's the division step; now we just conquer, meaning we recursively call ClosestPair on each of the two subproblems. So when we invoke ClosestPair on the left half of the points, on Q, we're going to get back what is indeed the closest pair of points amongst those in Q.
So we're going to call those P1 and Q1; so, among all pairs of points that both lie in Q, P1 and Q1 minimize the distance between them.
Similarly, we're going to call P2 and Q2 the result of the second recursive call; that is, P2 and Q2 are, amongst all pairs of points that both lie in R, the pair that has the minimum Euclidean distance.
Now, conceptually, there's two cases.
There's a lucky case and there's an unlucky case.
In the original point set P, if we're lucky, the closest pair of points in all of P, actually, both of them lie in Q or both of them lie in R.
In this lucky case, we'd already be done: if the closest pair in the entire point set happens to both lie in Q, then the first recursive call is going to recover it, and we just have it in our hands as P1, Q1.
Similarly, if both of the closest pair of points in all of P lie on the right side, in R, then they get handed to us on a silver platter by the second recursive call, which just operates on R.
So in the unlucky case, the closest pair of points in P happens to be split.
That is, one of the points lies in the left half, in Q, and the other point lies in the right half, in R.
Notice, if the closest pair of points in all of P is split, with one point in Q and one point in R, neither recursive call is going to find it.
Okay? The pair of points is not passed to either of the two recursive calls, so there's no way it's going to be returned to us.
Okay? So we have not identified the closest pair after these two recursive calls, if the closest pair happens to be split.
This is exactly analogous to what happened when we were counting inversions.
The recursive call on the left half of the array counted the left inversions.
The recursive call on the right half of the array counted the right inversions.
But we still had to count the split inversions, so in this closest pair algorithm, we still need a special purpose subroutine that computes the closest pair for the case in which it is split, in which there is one point in Q and one point in R.
So just like in counting inversions, I'm going to write down that subroutine, and I'm going to leave it unimplemented for now; we'll figure out how to implement it quickly in the rest of the lecture.
Now, if we have a correct implementation of ClosestSplitPair, which takes as input the original point set sorted by x and y coordinate and returns the closest pair that's split, with one point in Q and one point in R, then we're done.
Because then the closest pair has to either be on the left, or on the right, or it has to be split.
Steps two through four compute the closest pair in each of those categories, so those are the only possible candidates for the closest pair, and we just return the best of them.
So that's an argument for why, if we have a correct implementation of the ClosestSplitPair subroutine, then that implies a correct implementation of ClosestPair.
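Putting the pieces together, here is a sketch of the overall recursive structure in Python (dist, closest_pair_brute_force, and split_sorted are the helpers sketched above; closest_split_pair is only a placeholder here, since the lecture deliberately leaves it unimplemented for now):

```python
def closest_pair(Px, Py):
    # Base case: with three or fewer points, brute-force search is constant time.
    if len(Px) <= 3:
        return closest_pair_brute_force(Px)

    # Divide: left half Q and right half R, each sorted both by x and by y.
    Qx, Qy, Rx, Ry = split_sorted(Px, Py)

    # Conquer: best pair lying entirely within Q, and entirely within R.
    pair_left  = closest_pair(Qx, Qy)
    pair_right = closest_pair(Rx, Ry)

    # Combine: best split pair, with one point in Q and one in R.
    pair_split = closest_split_pair(Px, Py)

    # Return whichever of the three candidate pairs is closest.
    return min(pair_left, pair_right, pair_split, key=lambda pair: dist(*pair))

def closest_split_pair(Px, Py):
    # Placeholder only: the rest of the lecture shows how to do this in linear
    # time; a quadratic scan over split pairs stands in so the sketch runs.
    mid = len(Px) // 2
    Q, R = Px[:mid], Px[mid:]
    return min(((p, q) for p in Q for q in R), key=lambda pair: dist(*pair))
```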
Now, what about the running time? The running time of the ClosestPair algorithm is going to be in part determined by the running time of ClosestSplitPair.
So in the next quiz, I want you to think about what kind of running time we should be shooting for with a closest split pair subroutine.
O(n log n) Algorithm for Closest Pair I [Advanced - Optional] - Question 1
Suppose we can correctly implement the ClosestSplitPair subroutine in O(n) time.
What will be the overall running time of the ClosestPair algorithm? (Choose the smallest upper bound that applies.)
A. O(n)
B. O(n log(n))
C. O(n (log(n))^2)
D. O(n^2)