3.1「Stanford Algorithms」O(n log n) Algorithm for Counting Inversions1 - Part1

In this next series of videos, we'll get some more practice applying the divide and conquer algorithm and design paradigm to various problems.

This will also give us a glimpse of the kinds of application [inaudible] to which it's been successfully applied.

We're gonna start by extending the merge sort algorithm to solve a problem involving counting the number of inversions of an array.

Before we tackle the specific problem of counting the number of inversions in an array, let me say a few words about the divide and conquer paradigm in general.

So again, you've already seen the totally canonical example of divide and conquer, namely merge sort.

So the following three conceptual steps will be familiar to you.

The first step, no prizes for guessing is you divide.

The problem.

Into smaller sub-problems.

Sometimes this division happens only in your mind.

It's really more of a conceptual step than part of your code.

Sometimes you really do copy over parts of the input into say new arrays to pass on to your recursive calls.

The second step, again no prizes here, is you conquer the sub-problems just using recursion.

So for example, in Merge Sort, you conceptually divide the array into two different pieces.

And then you [inaudible] with the conquer or sort to the first half of the array.

And you, you do the same thing with the second half of the array.

Now, of course, it's not quite as easy as just doing these two steps.

Dividing the problem, and then solving the sub problems recursively.

Usually, you have some extra cleanup work after the recursive calls, and to stitch together the solutions to the sub problems into one for the big problem, the problem that you actually care about.

Recall, for example, in Merge Sort, after our recursive calls, the left half of the array was sorted, the right half of the array was sorted.

But we still had to stitch those together.

Merge into a sorted version of the entire array.

So the [inaudible] step is to combine.

The solutions to the subproblem into one problem.

Generally the largest amount of ingenuity happens in the third step.

How do you actually quickly combine solutions to subproblems into one to the original problem? Sometimes you also get some cleverness in the first step with division.

Sometimes it's as simple as just spliting a ray in two.

But there are cases where the division step also has some ingenuity.

Now let's move on to the specific problem of counting inversions and see how to apply this divide and conquer paradygm.

So let begin by defining the problem formally now.

We're given as input an array A with a length N.

And you can define the problem so that the array a contains any ole distinct numbers.

But, let's just keep thing simple and assume that it contains the numbers one through n.

The integers in that range in some order.

That captures the essence of the problem.

And the goal is to compute the number of inversions of this array so what's an inversion you may ask well an inversion is just a pair of array [inaudible] I and J with I smaller than J so that earlier array entry the I entry is bigger than the latter one the Jake one so one thing that should be evident is that if the array contains these numbers in sorted order if the array is simply one two three four all the way up to N then the number of inversions is zero.

The converse you might also want to think through if the array has any other ordering of the numbers between one and N other than the assorted one, then it's going to have a non.

Of zero number of inversions.

Let's look at another example.

So'spose we have an array of six entries.

So the numbers one thru six in the following order.

One, three, five followed by two, four, six.

So how many inversions does this array have? So again what we need to look for are pairs of array entries so that the earlier or left entry is bigger than the later or right entry.

So one example which we see right here would five and two.

Those are right next to each other and out of order, the earlier entry is bigger than the other one.

But there's others, there's three and two for example Those are out of order.

And, five and four are also out of order.

And I'll leave it to you to check that those are the only three pairs that are out of order.

So summarizing the inversions in this array of length six are 3-2, 5-2, and 5-4.

Corresponding to the array entries, 2-4, 3-4, and 3-5.

Pictorially, we can think of it thusly, we can first.

Write down the numbers in order, one up to six.

And then we can write down the numbers again but, ordered in the way that their given in the input array.

So, one three five two four six.

And then we can connect the dots, meaning we connect one to one.

Reconnect two to two, and so on.

It turns out, and I'll leave to for you to, to think this through, that the number of crossing pairs of line segments prescisely correspond to the number of inversions.

So we see that there are one, two, three crossing line segments.

And these are exactly in correspondence with the three inversions, we found earlier.

Five and two, three and two, and five and four.

Now, [inaudible] wanna solve this problem you might ask.

Well there's few reasons that come up.

One would be to have a numerical similarity measure that quantifies how close to [inaudible] lists are to each other.

So for example, suppose I took you and a friend, and I took, identified ten movies that both of you had seen.

And I asked each of you to order, or to rank these movies from your most favorite to your least favorite.

Now I can form an array, and compute inversions.

And it quantifies, in some sense, how dissimilar your two rankings are to each other.

So in more detail, in the first entry of this array, I would put down the ranking that your friend gave to your favorite movie.

So if you had your favorite movie, Star Wars or whatever.

And your friend only thought it was the fifth best out of the ten, then I would write down a five in the first entry of this array.

Generally, I would take your second favorite movie.

I would look at how your friend ranked that.

I would put that in the second entry of the array and so on, all the way up to the tenth entry of the array, where I would put your friend's ranking of your least favorite movie.

Now, if you have exactly identical preferences, if you rank them exactly the same way, the number of inversions of this array would be zero.

And in general, the more inversions this array has, it quantifies that your lists look more and more different from each other.

Now why might you want to do this why might you want to know whether two different people ranked things in the similar way had similar preferences well one reason might be what's called collaborative filtering, probably many of you have had the experience of going to a website and if you've made a few purchases through this website it starts recommending further purchases for you, so one way to solve this problem under the hood, is to look at your purchases look at what you seem to like, find other people who have similar preferences similar history look at things they've bought that you haven't, and then recommend.

New products to you based on what similar customers seemed to have bought.

So this problem captures some of the essence of identifying which customers or which people are similar based on data about what they prefer.

So just to make sure we're all on the same page, let me pause for a brief quiz.

We've already noticed that a given array will have zero inversions, if and only if it's in sorted order.

If it only contains the numbers of one through N in order.

So, on the other side, what is the largest number of inversions an array could possibly have? Let's say, just for an array of size six, like the one in this example here.

在接下來的視頻系列中旨剥，我們將獲得更多實踐，將分而治之算法和設(shè)計范例應(yīng)用于各種問題软族。

這也將使我們了解成功應(yīng)用了哪些應(yīng)用程序[聽不清]绸吸。

我們將從擴(kuò)展合并排序算法開始罩锐，以解決涉及計算數(shù)組反轉(zhuǎn)數(shù)的問題。

在我們解決對數(shù)組中的求逆數(shù)進(jìn)行計數(shù)的特定問題之前葛家，讓我先談一些關(guān)于分而治之的范式喷好。

同樣蚀同，您已經(jīng)看到了分而治之的完全典范示例缅刽，即合并排序啊掏。

因此，您將熟悉以下三個概念性步驟衰猛。

第一步迟蜜，猜猜是沒有獎品的。

問題啡省。

分為較小的子問題娜睛。

有時這種分裂只發(fā)生在您的腦海中懈涛。

這實際上是更多概念性步驟耍铜，而不是代碼的一部分。

有時闪盔，您確實確實將部分輸入復(fù)制到新數(shù)組中结序，以傳遞給遞歸調(diào)用障斋。

第二步，這里也沒有獎品徐鹤，是您僅使用遞歸即可解決子問題垃环。

因此，例如返敬，在“合并排序”中遂庄，您從概念上將數(shù)組分為兩個不同的部分。

然后劲赠，您[聽不清]用征服或排序到數(shù)組的前半部分涛目。

而您，對數(shù)組的后半部分執(zhí)行相同的操作凛澎。

現(xiàn)在泌绣，當(dāng)然，這并不像完成這兩個步驟那樣容易预厌。

劃分問題阿迈，然后遞歸解決子問題。

通常轧叽，在遞歸調(diào)用之后苗沧，您需要進(jìn)行一些額外的清理工作，并將子問題的解決方案組合為一個大問題炭晒，即您真正關(guān)心的問題待逞。

回想一下，例如网严，在“合并排序”中识樱，在我們進(jìn)行遞歸調(diào)用之后，對數(shù)組的左半部分進(jìn)行了排序，對數(shù)組的右半部分進(jìn)行了排序怜庸。

但是我們?nèi)匀槐仨殞⑺鼈兛p合在一起当犯。

合并到整個數(shù)組的排序版本中。

因此割疾，[聽不清]步驟是合并嚎卫。

該子問題的解決方案成為一個問題。

通常宏榕，最大的創(chuàng)造力發(fā)生在第三步拓诸。

您實際上如何快速將子問題的解決方案組合為原始問題？有時麻昼，在除法的第一步中奠支，您也會變得很聰明。

有時抚芦，就像將光線一分為二一樣簡單胚宦。

但是在某些情況下，分割步驟也有一些技巧燕垃。

現(xiàn)在枢劝，讓我們繼續(xù)研究反轉(zhuǎn)計數(shù)的特定問題，并了解如何應(yīng)用這種分而治之卜壕。

因此您旁，讓我們從現(xiàn)在開始正式定義問題開始。

我們將輸入長度為N的數(shù)組A作為輸入轴捎。

并且您可以定義問題鹤盒，以便數(shù)組a包含任何ole互不相同的數(shù)字。

但是侦副，讓我們保持簡單侦锯，并假設(shè)它包含數(shù)字1到n。

該范圍內(nèi)的整數(shù)按一定順序排列秦驯。

這抓住了問題的實質(zhì)尺碰。

而且目標(biāo)是計算此數(shù)組的求逆數(shù)，因此您可能會問什么是求逆译隘？求逆只是一對數(shù)組[聽不清] I和J亲桥，且I小于J，因此固耘，較早的數(shù)組入口I入口較大比后一個Jake更重要的一點是题篷，如果數(shù)組按排序順序包含這些數(shù)字，并且該數(shù)組只是一二三四一直到N厅目，則求逆數(shù)為零番枚。

相反法严，您可能還想考慮一下，如果數(shù)組在除N之外的其他任何一個介于1和N之間的數(shù)字葫笼，則它將有一個非整數(shù)深啤。

反轉(zhuǎn)數(shù)為零。

讓我們看另一個例子渔欢。

因此，假設(shè)我們有一個包含六個條目的數(shù)組瘟忱。

因此奥额，數(shù)字按以下順序從一到六。

一访诱，三垫挨，五，然后是二触菜，四九榔，六。

那么這個數(shù)組有多少個反轉(zhuǎn)呢涡相？因此哲泊，我們再次需要尋找的是成對的數(shù)組條目，以便較早或較左的條目大于較晚或較右的條目催蝗。

因此切威，我們在這里看到的一個示例將是五個和兩個。

那些是彼此相鄰且順序混亂的丙号，較早的條目要比另一個條目大先朦。

但是還有其他一些，例如三個和兩個犬缨，它們是亂序的喳魏。

并且，五個和四個也出現(xiàn)故障怀薛。

我將留給您檢查刺彩，以確保只有三對故障。

因此枝恋，總結(jié)此長度為6的數(shù)組中的反演為3-2迂苛、5-2和5-4。

對應(yīng)于數(shù)組條目2-4鼓择、3-4和3-5三幻。

在圖形上，我們可以這樣思考呐能，我們可以首先念搬。

依次寫下數(shù)字抑堡，最多1個。

然后我們可以再次寫下數(shù)字朗徊，但是按照輸入數(shù)組中給定的方式排序首妖。

因此，一三五二四六爷恳。

然后我們可以連接點有缆，這意味著我們一對一地連接。

重新連接兩個到兩個温亲，依此類推棚壁。

事實證明，我將讓您仔細(xì)考慮一下栈虚，線段的交叉對的數(shù)量精確地對應(yīng)于反轉(zhuǎn)的數(shù)量袖外。

因此，我們看到有1魂务、2曼验、3個交叉線段。

這些恰好與我們先前發(fā)現(xiàn)的三個反演相對應(yīng)粘姜。

五和二鬓照，三和二，五和四孤紧。

現(xiàn)在颖杏，[聽不清]想解決您可能會問的問題。

好吧坛芽，有幾個原因留储。

一種方法是采用數(shù)值相似性度量，以量化[聽不清]列表彼此之間的接近程度咙轩。

例如获讳，假設(shè)我?guī)愫鸵粋€朋友，然后帶我確定了你們倆都看過的十部電影活喊。

我請你們每個人訂購丐膝，或?qū)⑦@些電影從您最喜歡的到最不喜歡的排列。

現(xiàn)在钾菊，我可以形成一個數(shù)組帅矗，并計算反轉(zhuǎn)。

從某種意義上說煞烫，它量化了兩個排名之間的差異浑此。

因此，更詳細(xì)地講滞详，在該數(shù)組的第一個條目中凛俱，我將放下您的朋友對您最喜歡的電影的排名紊馏。

因此，如果您有自己喜歡的電影蒲犬，《星球大戰(zhàn)》或其他電影朱监。

而您的朋友只認(rèn)為這是十個最佳中的第五個，那么我會在該數(shù)組的第一項中寫下一個五個原叮。

通常赫编，我會拍第二部您喜歡的電影。

我會看看你的朋友如何評價的奋隶。

我會將其放在數(shù)組的第二個條目中擂送，依此類推，一直到數(shù)組的第十個條目达布，然后將您的朋友在您最不喜歡的電影中的排名放在此處团甲。

現(xiàn)在逾冬，如果您具有完全相同的首選項黍聂，如果以完全相同的方式對它們進(jìn)行排名，則此數(shù)組的反轉(zhuǎn)數(shù)將為零身腻。

通常产还，此數(shù)組的反轉(zhuǎn)次數(shù)更多，它可以量化列表之間的差異越來越大嘀趟。

現(xiàn)在脐区，為什么要執(zhí)行此操作，為什么要知道兩個不同的人是否以相似的方式對事物進(jìn)行排名她按，是否具有相似的偏好牛隅？一個原因可能就是所謂的協(xié)作過濾，也許你們中的許多人都有訪問網(wǎng)站的經(jīng)驗如果您通過該網(wǎng)站進(jìn)行了幾次購買酌泰，它就會開始為您推薦進(jìn)一步的購買媒佣，因此，解決這個問題的一種方法是查看您的購買陵刹，看看您的喜好默伍，找到其他有購買意愿的人相似的偏好相似的歷史記錄會查看他們購買的，您沒有的東西衰琐，然后推薦也糊。

根據(jù)相似客戶購買的商品為您提供新產(chǎn)品。

因此羡宙，此問題捕獲了一些本質(zhì)狸剃，這些本質(zhì)是基于有關(guān)他們偏愛的數(shù)據(jù)來識別哪些客戶或哪些人是相似的。

因此狗热，為了確保我們都在同一頁面上捕捂，讓我暫停一下簡短的測驗瑟枫。

我們已經(jīng)注意到，給定數(shù)組只有在按排序順序時才具有零反轉(zhuǎn)指攒。

如果僅按順序包含1到N的數(shù)字慷妙。

那么，另一方面允悦，數(shù)組可能具有的最大反轉(zhuǎn)數(shù)是多少膝擂？假設(shè)僅是大小為6的數(shù)組，例如此處的示例隙弛。

O(n log n) Algorithm for Counting Inversions I - Question 1

What is the largest-possible number of inversions that a 6-element array can have?

A 15

B 21

C 36

D 64