[Week 1] Princeton Algorithm PartII WordNet

最近在coursera上學(xué)習(xí)Princeton大學(xué)的Algorithm PartII臊诊,這個(gè)系列的兩門(mén)課是我見(jiàn)過(guò)最好的算法課癞埠。主講Robert Sedgewick師承高德納状原,聲名遠(yuǎn)播聋呢,雖然年紀(jì)大了,但是講課思路清晰颠区,深入淺出削锰。與Kevin Wayne共同編著的《算法》第四版作為教材,沒(méi)有《算法導(dǎo)論》那么偏理論毕莱、晦澀難懂器贩,能夠把常見(jiàn)的數(shù)據(jù)結(jié)構(gòu)和算法講得很透徹,容易理解朋截。
更值得稱(chēng)道的是這門(mén)課編程作業(yè)及其評(píng)分系統(tǒng)蛹稍。上課的內(nèi)容是基礎(chǔ)的數(shù)據(jù)結(jié)構(gòu)和算法,但編程作業(yè)往往是這些算法的實(shí)際應(yīng)用部服,能真正理解為什么在給定的情況下要使用這樣的數(shù)據(jù)結(jié)構(gòu)唆姐、算法。作業(yè)的評(píng)分不僅僅是看正確與否廓八,還有運(yùn)行時(shí)間奉芦、內(nèi)存的要求,會(huì)給出詳細(xì)的分析報(bào)告剧蹂,如果不是ACM大神声功,這些作業(yè)都值得花時(shí)間去仔細(xì)琢磨。
下面進(jìn)入正題宠叼,第一周的作業(yè)WordNet

題目

WordNet is a semantic lexicon for theEnglish language that is used extensively by computational linguistsand cognitive scientists; for example, it was a key component in IBM'sWatson.WordNet groups words into sets of synonyms called synsets and describes semantic relationships between them.One such relationship is the is-a relationship, which connects a hyponym(more specific synset) to a hypernym (more general synset).For example, locomotion is a hypernym of runningand running is a hypernym of dash.

The WordNet digraph.Your first task is to build the wordnet digraph: each vertex v is an integer that represents a synset, and each directed edge v→w represents that w is a hypernym of v.The wordnet digraph is a rooted DAG: it is acylic and has one vertex thatis an ancestor of every other vertex.However, it is not necessarily a tree because a synset can have more than onehypernym. A small subgraph of the wordnet digraph is illustrated below.

The WordNet input file formats.We now describe the two data files that you will use to create the wordnet digraph.The files are in CSV format: each line contains a sequence of fields,separated by commas.

List of noun synsets.The file synsets.txtlists all the (noun) synsets in WordNet.The first field is the synset id (an integer),the second field is the synonym set (or synset), and thethird field is its dictionary definition (or gloss).For example, the line
36,AND_circuit AND_gate,a circuit in a computer that fires only when all of its inputs fire
means that the synset { AND_circuit, AND_gate }has an id number of 36 and it's gloss isa circuit in a computer that fires only when all of its inputs fire.The individual nouns that comprise a synset are separatedby spaces (and a synset element is not permitted to contain a space).The S synset ids are numbered 0 through S ? 1;the id numbers will appear consecutively in the synset file.
List of hypernyms.The file hypernyms.txtcontains the hypernym relationships:The first field is a synset id; subsequent fields are the id numbersof the synset's hypernyms. For example, the following line
164,21012,56099
means that the the synset 164 ("Actifed") has two hypernyms:21012 ("antihistamine") and56099 ("nasal_decongestant"),representing that Actifed is both an antihistamine and a nasal decongestant.The synsets are obtained from the corresponding lines in the file synsets.txt.

164,Actifed,trade name for a drug containing an antihistamine and a decongestant...
21012,antihistamine,a medicine used to treat allergies...
56099,nasal_decongestant,a decongestant that provides temporary relief of nasal...
WordNet data type.Implement an immutable data type WordNet with the following API:

// constructor takes the name of the two input files
public WordNet(String synsets, String hypernyms)

// the set of nouns (no duplicates), returned as an Iterable
public Iterable<String> nouns()

// is the word a WordNet noun?
public boolean isNoun(String word)

// distance between nounA and nounB (defined below)
public int distance(String nounA, String nounB)

// a synset (second field of synsets.txt) that is the common ancestor of nounA and nounB
// in a shortest ancestral path (defined below)
public String sap(String nounA, String nounB)

// for unit testing of this class
public static void main(String[] args)
The constructor should throw a java.lang.IllegalArgumentExceptionif the input does not correspond to a rooted DAG.The distance() and sap() methodsshould throw a java.lang.IllegalArgumentExceptionunless both of the noun arguments are WordNet nouns.
Your data type should use space linear in the input size(size of synsets and hypernyms files).The constructor should take time linearithmic (or better) in the input size.The method isNoun() should run in time logarithmic (or better) inthe number of nouns.The methods distance() and sap() should run in time linear in thesize of the WordNet digraph.

Shortest ancestral path.An ancestral path between two verticesv and w in a digraph is a directed path fromv to a common ancestor x, together witha directed path from w to the same ancestor x. A shortest ancestral path is an ancestral path of minimum total length.For example, in the digraph at left(digraph1.txt),the shortest ancestral path between3 and 11 has length 4 (with common ancestor 1).In the digraph at right (digraph2.txt),one ancestral path between 1 and 5 has length 4(with common ancestor 5), but the shortest ancestral path has length 2(with common ancestor 0).

SAP data type.Implement an immutable data type SAP with the following API:

// constructor takes a digraph (not necessarily a DAG)
public SAP(Digraph G)

// length of shortest ancestral path between v and w; -1 if no such path
public int length(int v, int w)

// a common ancestor of v and w that participates in a shortest ancestral path; -1 if no such path
public int ancestor(int v, int w)

// length of shortest ancestral path between any vertex in v and any vertex in w; -1 if no such path
public int length(Iterable<Integer> v, Iterable<Integer> w)

// a common ancestor that participates in shortest ancestral path; -1 if no such path
public int ancestor(Iterable<Integer> v, Iterable<Integer> w)

// for unit testing of this class (such as the one below)
public static void main(String[] args)
All methods should throw a java.lang.IndexOutOfBoundsException if one (or more) of the input arguments is not between 0 and G.V() - 1.You may assume that the iterable arguments contain at least one integer.All methods (and the constructor) should take time at mostproportional to E + Vin the worst case, where E and V are the number of edges and verticesin the digraph, respectively.Your data type should use space proportional to E + V.
Test client.The following test client takes the name of a digraph input file asas a command-line argument, constructs the digraph,reads in vertex pairs from standard input,and prints out the length of the shortest ancestral path between the two verticesand a common ancestor that participates in that path:

public static void main(String[] args) {
In in = new In(args[0]);
Digraph G = new Digraph(in);
SAP sap = new SAP(G);
while (!StdIn.isEmpty()) {
int v = StdIn.readInt();
int w = StdIn.readInt();
int length = sap.length(v, w);
int ancestor = sap.ancestor(v, w);
StdOut.printf("length = %d, ancestor = %d\n", length, ancestor);
}
}
Here is a sample execution:
% more digraph1.txt % java SAP digraph1.txt
13 3 11
11 length = 4, ancestor = 1
7 3
8 3 9 12
3 1 length = 3, ancestor = 5
4 1
5 1 7 2
9 5 length = 4, ancestor = 0
10 5
11 10 1 6
12 10 length = -1, ancestor = -1
1 0
2 0
Measuring the semantic relatedness of two nouns.Semantic relatedness refers to the degree to which two concepts are related. Measuring semantic relatedness is a challenging problem. For example, most of us agree that George Bush and John Kennedy (two U.S. presidents)are more related than are George Bushand chimpanzee (two primates). However, not most of us agree that George Bush and Eric Arthur Blair are related concepts. But if one is aware that George Bush and Eric Arthur Blair (aka George Orwell) are both communicators, then it becomes clear that the two concepts might be related.

We define the semantic relatednessof two wordnet nouns A and B as follows:

distance(A, B) = distance is the minimum length of any ancestral path betweenany synset v of A and any synset w of B.
This is the notion of distance that you will use to implement thedistance() and sap() methods in the WordNet data type.

Outcast detection.Given a list of wordnet nouns A1, A2,..., An, which nounis the least related to the others? To identify an outcast,compute the sum of the distances between each noun and every other one:

di = dist(Ai, A1) + dist(Ai, A2) + ... + dist(Ai, An)
and return a noun Atfor which dt is maximum.
Implement an immutable data type Outcast with the following API:

// constructor takes a WordNet object
public Outcast(WordNet wordnet)

// given an array of WordNet nouns, return an outcast
public String outcast(String[] nouns)

// for unit testing of this class (such as the one below)
public static void main(String[] args)
Assume that argument array to the outcast() methodcontains only valid wordnet nouns (and that it contains at least two such nouns).
The following test client takes from the command line the name of a synset file, the name of a hypernym file, followed by thenames of outcast files, and prints out an outcast in each file:

public static void main(String[] args) {
WordNet wordnet = new WordNet(args[0], args[1]);
Outcast outcast = new Outcast(wordnet);
for (int t = 2; t < args.length; t++) {
In in = new In(args[t]);
String[] nouns = in.readAllStrings();
StdOut.println(args[t] + ": " + outcast.outcast(nouns));
}
}
Here is a sample execution:
% more outcast5.txt
horse zebra cat bear table

% more outcast8.txt
water soda bed orange_juice milk apple_juice tea coffee

% more outcast11.txt
apple pear peach banana lime lemon blueberry strawberry mango watermelon potato

% java Outcast synsets.txt hypernyms.txt outcast5.txt outcast8.txt outcast11.txt
outcast5.txt: table
outcast8.txt: bed
outcast11.txt: potato

題目分析

wordnet是一個(gè)有上下位關(guān)系的英語(yǔ)詞典先巴,可以描述詞語(yǔ)間的關(guān)系。下位詞是上位詞的一種具體描述车吹,是一種isa關(guān)系筹裕。例如running是dash的上位詞。從數(shù)據(jù)結(jié)構(gòu)上來(lái)看wordnet是一個(gè)有向無(wú)環(huán)圖窄驹。每個(gè)節(jié)點(diǎn)是一個(gè)同義詞集合(synset)朝卒,一個(gè)英語(yǔ)單詞或復(fù)合詞可能有多個(gè)意思,因此可能會(huì)出現(xiàn)在多個(gè)不同節(jié)點(diǎn)中乐埠。每條邊都是由下位詞指向上位詞抗斤,有且僅有從下位詞到上位詞的路徑,因此不可能存在環(huán)丈咐。
題目中wordnet是以synsets.txt和hypernyms.txt的形式給出瑞眼。synsets.txt中按字母順序給出了所有同義詞集合的序號(hào)、集合棵逊、描述伤疙。hypernyms.txt按同義詞集合的序號(hào)給出了上下位的關(guān)系,每行第一個(gè)序號(hào)是下位詞,之后的序號(hào)都是上位詞徒像。一個(gè)下位詞可能存在多個(gè)上位詞黍特。
題目的要求是實(shí)現(xiàn)三個(gè)類(lèi):

  • WordNet.java
  • SAP.java
  • Outcast.java

SAP.java是一個(gè)基礎(chǔ)類(lèi),給定一個(gè)有向圖锯蛀,計(jì)算兩點(diǎn)之間的最短距離灭衷,以及兩點(diǎn)的共同祖先。首先應(yīng)該完成這個(gè)類(lèi)旁涤。
WordNet.java中實(shí)現(xiàn)計(jì)算wordnet結(jié)構(gòu)中兩個(gè)詞的最短距離及共同祖先翔曲。
Outcast.java就比較簡(jiǎn)單了,找出一組詞中最不相關(guān)的一個(gè)劈愚,調(diào)用WordNet.java很容易實(shí)現(xiàn)瞳遍。

SAP.java

下面具體分析每個(gè)類(lèi)的實(shí)現(xiàn)
首先實(shí)現(xiàn)SAP.java,在這個(gè)類(lèi)中實(shí)現(xiàn)計(jì)算最短距離和共同祖先的算法造虎,然后可以應(yīng)用到wordnet中傅蹂。

分析

求最短路徑和共同祖先的思想就是從兩個(gè)不同節(jié)點(diǎn)出發(fā),找到都能到達(dá)且距離最短的節(jié)點(diǎn)算凿。要計(jì)算距離份蝴,使用廣度優(yōu)先搜索,并且保存和更新每個(gè)節(jié)點(diǎn)的到出發(fā)點(diǎn)的距離氓轰。

成員變量:

private Digraph digraph;

private int ancestor;
    
private int length;

構(gòu)造方法:

 public SAP(Digraph G) {
        if (G == null) throw new NullPointerException();
        this.digraph = G;
    }

由于最短距離是兩個(gè)節(jié)點(diǎn)到某公共節(jié)點(diǎn)的距離組合婚夫,可能出現(xiàn)多種情況,因此自定義一個(gè)數(shù)據(jù)結(jié)構(gòu)Node來(lái)表示公共節(jié)點(diǎn)署鸡,把所有可能的公共節(jié)點(diǎn)放入一個(gè)優(yōu)先隊(duì)列中案糙,最后從中取最小值。

private class Node implements Comparable<Node> {
        private int length;
        private int id;

        Node(int length, int id) {
            this.length = length;
            this.id = id;
        }

        @Override
        public int compareTo(Node o) {
            if (length > o.length) {
                return 1;
            } else if (length == o.length) {
                return 0;
            } else {
                return -1;
            }
        }
    }

求最短距離的完整代碼:

  1. 先對(duì)一個(gè)節(jié)點(diǎn)進(jìn)行廣度優(yōu)先搜素靴庆,計(jì)算到所有節(jié)點(diǎn)的距離时捌。
  2. 對(duì)另一個(gè)節(jié)點(diǎn)進(jìn)行廣度優(yōu)先搜素,計(jì)算到所有節(jié)點(diǎn)的距離炉抒,如果某節(jié)點(diǎn)可以從兩個(gè)節(jié)點(diǎn)分別到達(dá)奢讨,則存在優(yōu)先隊(duì)列中。
  3. 從優(yōu)先隊(duì)列中取出最小值焰薄,即為最短距離拿诸。
// length of shortest ancestral path between v and w; -1 if no such path
    public int length(int v, int w) {
        if (v < 0 || v >= digraph.V() || w < 0 || w >= digraph.V())
            throw new IndexOutOfBoundsException();
        MinPQ<Node> possibleLength = new MinPQ();
        boolean[] marked = new boolean[this.digraph.V()];
        marked[v] = true;
        int[] pathV = new int[digraph.V()];
        int[] pathW = new int[digraph.V()];
        for (int i = 0; i < digraph.V(); i++) {
            pathV[i] = -1;
            pathW[i] = -1;
        }
        pathV[v] = 0;
        pathW[w] = 0;

        // bfs on v
        Queue<Integer> queue = new Queue<>();
        queue.enqueue(v);
        while (!queue.isEmpty()) {
            int vertex = queue.dequeue();
            for (int nextVertex : digraph.adj(vertex)) {
                if (!marked[nextVertex]) {
                    marked[nextVertex] = true;
                    queue.enqueue(nextVertex);
                    if (pathV[nextVertex] == -1 || pathV[nextVertex] > pathV[vertex] + 1) {
                        pathV[nextVertex] = pathV[vertex] + 1;
                    }

                }
            }
        }

        // bfs on w
        marked = new boolean[this.digraph.V()];
        marked[w] = true;
        queue.enqueue(w);
        while (!queue.isEmpty()) {
            int vertex = queue.dequeue();
            if (pathV[vertex] > -1) {
                Node node = new Node(pathV[vertex] + pathW[vertex], vertex);
                possibleLength.insert(node);
            }
            for (int nextVertex : digraph.adj(vertex)) {
                if (!marked[nextVertex]) {
                    marked[nextVertex] = true;
                    queue.enqueue(nextVertex);
                    if (pathW[nextVertex] == -1 || pathW[nextVertex] > pathW[vertex] + 1) {
                        pathW[nextVertex] = pathW[vertex] + 1;
                    }

                }
            }
        }
        if (possibleLength.size() > 0) {
            Node node = possibleLength.delMin();
            this.length = node.length;
            this.ancestor = node.id;
        } else {
            this.length = -1;
            this.ancestor = -1;
        }
        return this.length;
    }

共同祖先在計(jì)算最短距離時(shí)也能獲得,Node中的id就是共同祖先塞茅,因此可以復(fù)用亩码。

// a common ancestor of v and w that participates in a shortest ancestral path; -1 if no such path
    public int ancestor(int v, int w) {
        length(v, w);
        return this.ancestor;
    }

輸入的v,w為多個(gè)節(jié)點(diǎn)時(shí),直接用暴力for循環(huán)找出每?jī)蓚€(gè)節(jié)點(diǎn)間的最短距離和共同祖先野瘦,存到優(yōu)先隊(duì)列中描沟,最后取出的就是兩組節(jié)點(diǎn)間的最短紀(jì)錄和共同祖先。
代碼:

    // length of shortest ancestral path between any vertex in v and any vertex in w; -1 if no such path
    public int length(Iterable<Integer> v, Iterable<Integer> w) {
        if (v == null || w == null) throw new NullPointerException();
        MinPQ<Node> possibleNodes = new MinPQ();
        for (int nodeV : v) {
            if (nodeV < 0 || nodeV > this.digraph.V())
                throw new IndexOutOfBoundsException();
            for (int nodeW : w) {
                if (nodeW < 0 || nodeW > this.digraph.V())
                    throw new IndexOutOfBoundsException();

                Node node = new Node(length(nodeV, nodeW), ancestor(nodeV, nodeW));
                possibleNodes.insert(node);
            }
        }

        if (possibleNodes.size() > 0) {
            Node node = possibleNodes.delMin();
            this.length = node.length;
            this.ancestor = node.id;
        } else {
            this.length = -1;
            this.ancestor = -1;
        }
        return this.length;
    }
    
    // a common ancestor that participates in shortest ancestral path; -1 if no such path
    public int ancestor(Iterable<Integer> v, Iterable<Integer> w) {
        length(v, w);
        return this.ancestor;
    }

WordNet.java

要把sap中的算法實(shí)際高效地應(yīng)用到wordnet中并不簡(jiǎn)單,需要找到適合的數(shù)據(jù)結(jié)構(gòu)來(lái)儲(chǔ)存我們需要的信息并且能夠高效地查詢(xún)啊掏。

題目要求

  1. 返回wordnet中所有的詞
  2. 判斷一個(gè)詞是否在wordnet中
  3. 兩個(gè)詞的最短距離
  4. 兩個(gè)詞最短路徑上的共同上位詞的同義詞集合

要做到1和2兩個(gè)要求蠢络,需要一個(gè)沒(méi)有重復(fù)詞的集合,而且能夠根據(jù)詞來(lái)快速搜索迟蜜。提到快速搜索,立即想到時(shí)間復(fù)雜度log(n)的二叉搜索樹(shù)啡省,正好符合題目的時(shí)間要求娜睛。因此,使用PartI中講到過(guò)的SET<key>(在Alg4.jar中卦睹,用平衡二叉樹(shù)實(shí)現(xiàn)log(n)的插入畦戒、搜索)。

SET需要一個(gè)key來(lái)實(shí)現(xiàn)搜索结序。要求搜索的是詞障斋,但不能直接以string作為key,因?yàn)檫€要考慮到最短路徑的實(shí)現(xiàn)徐鹤。因此垃环,用一個(gè)自定義結(jié)構(gòu):實(shí)現(xiàn)Comparable接口時(shí)比較的是詞的字母順序,另外由于詞的多義性用一個(gè)ArrayList存放出現(xiàn)在wordnet中的可能位置返敬。

private class Node implements Comparable<Node> {
        private ArrayList<Integer> ids;
        private String noun;
        Node(String noun) {
            this.noun = noun;
            this.ids = new ArrayList<>();
        }

        private void addId(int id) {
            this.ids.add(id);
        }

        private ArrayList<Integer> getIds() {
            return this.ids;
        }

        @Override
        public int compareTo(Node o) {
            return this.noun.compareTo(o.noun);
        }
    }

成員變量:

private SAP wordnetSap;
    private Digraph digraph;
    private boolean hasCycle;
    private boolean[] onStack;
    private boolean[] marked;
    private ArrayList<Integer> possibleRoots; // a DAG has only one root
    private ArrayList<String[]> synsets; // synsets for each wordnet node
    private SET<Node> allNouns; // SET to store Node

構(gòu)造方法:讀取synsets.txt和hypernyms.txt,構(gòu)建表示wordnet的圖遂庄。Node表示詞以及詞可能出現(xiàn)的位置,建立SET劲赠。與sap中不同涛目,共同祖先需要根據(jù)序號(hào)返回同義詞組,與SET的思路不一致凛澎,因此還需要一個(gè)ArrayList來(lái)表示每個(gè)序號(hào)對(duì)應(yīng)的同義詞組霹肝。

根據(jù)題目要求,還需要檢查構(gòu)建的wordnet圖是否合法塑煎,即檢查是否無(wú)環(huán)沫换、是否只有一個(gè)root。這個(gè)檢查合法性同樣用廣度優(yōu)先搜索可以完成:

  • 是否無(wú)環(huán):搜索時(shí)如果之前訪(fǎng)問(wèn)過(guò)某節(jié)點(diǎn)轧叽,則構(gòu)成環(huán)苗沧。
  • 是否只有一個(gè)root:無(wú)論從哪個(gè)節(jié)點(diǎn)開(kāi)始搜索,最后一個(gè)節(jié)點(diǎn)(沒(méi)有指向其他節(jié)點(diǎn)的邊)是唯一的炭晒。
public WordNet(String synsets, String hypernyms) {
        if (synsets == null || hypernyms == null)
            throw new NullPointerException();
        possibleRoots = new ArrayList<>();
        this.synsets = new ArrayList<>();
        allNouns = new SET<Node>();
        int count = 0;
        try {
            BufferedReader in = new BufferedReader(new FileReader(synsets));
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split(",");
                String aSynset = parts[1];
                String[] strings = aSynset.split(" ");
                for (String str : strings) {
                    Node node = new Node(str);
                    if (this.allNouns.contains(node)) {
                        this.allNouns.ceiling(node).addId(Integer.parseInt(parts[0]));
                    }else {
                        node.addId(Integer.parseInt(parts[0]));
                        this.allNouns.add(node);
                    }
                }
                this.synsets.add(strings);

                count++;
            }
            in.close();

            this.digraph = new Digraph(count);

            BufferedReader hypernymsReader = new BufferedReader(new FileReader(hypernyms));
            while ((line = hypernymsReader.readLine()) != null){
                String[] temp = line.split(",");
                int id = Integer.parseInt(temp[0]);
                for (int i = 1; i < temp.length; i++) {
                    this.digraph.addEdge(id, Integer.parseInt(temp[i]));
                }
            }
            hypernymsReader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }

        this.wordnetSap = new SAP(this.digraph);

        onStack = new boolean[digraph.V()];
        marked = new boolean[digraph.V()];
        if (hasCycle(this.digraph)) throw new IllegalArgumentException();
        if (this.possibleRoots.size() > 1) throw new IllegalArgumentException();
    }
    
    private boolean hasCycle(Digraph digraph) {
        for (int i = 0; i < digraph.V(); i++) {
            if (!this.marked[i]) dfs(digraph, i);
        }
        return hasCycle;
    }

    private void dfs(Digraph digraph, int v) {

        onStack[v] = true;
        marked[v] = true;

        if (!digraph.adj(v).iterator().hasNext()) {
            if (!this.possibleRoots.contains(v))
                this.possibleRoots.add(v);
        }
        for (int w : digraph.adj(v)) {
            if (this.hasCycle) return;
            else if (!marked[w]) {
                dfs(digraph, w);
            }
            else if (onStack[w]) this.hasCycle = true;
        }
        onStack[v] = false;
    }

選對(duì)了合適的數(shù)據(jù)結(jié)構(gòu)待逞,完成了構(gòu)造方法,其實(shí)就完成了絕大部分工作网严,剩下的實(shí)現(xiàn)就相對(duì)容易了识樱。

返回wordnet中所有的詞:

public Iterable<String> nouns() {
        ArrayList<String> nouns = new ArrayList<>();
        for (Node node : this.allNouns) {
            nouns.add(node.noun);
        }
        return nouns;
    }

判斷一個(gè)詞是否在wordnet中:

public boolean isNoun(String word) {
        if (word == null) throw new NullPointerException();
        Node node = new Node(word);
        return this.allNouns.contains(node);
    }

兩個(gè)詞的最短距離:用SET和Node迅速找到詞所對(duì)應(yīng)的可能位置,用sap中的方法可以馬上計(jì)算。

public int distance(String nounA, String nounB) {
        if (nounA == null || nounB == null)
            throw new NullPointerException();
        if (!isNoun(nounA) || !isNoun(nounB))
            throw new IllegalArgumentException();
        ArrayList<Integer> idAs;
        ArrayList<Integer> idBs;

        Node nodeA = new Node(nounA);
        Node nodeB = new Node(nounB);
        nodeA = this.allNouns.ceiling(nodeA);
        nodeB = this.allNouns.ceiling(nodeB);
        idAs = nodeA.getIds();
        idBs = nodeB.getIds();

        return wordnetSap.length(idAs, idBs);
    }

兩個(gè)詞最短路徑上的共同上位詞的同義詞集合:同樣用sap中的方法找到位置怜庸,然后從事先存好的arraylist中獲取同義詞集合当犯,拼接成給定的形式。

public String sap(String nounA, String nounB) {
        if (nounA == null || nounB == null)
            throw new NullPointerException();
        if (!isNoun(nounA) || !isNoun(nounB))
            throw new IllegalArgumentException();

        ArrayList<Integer> idAs;
        ArrayList<Integer> idBs;

        Node nodeA = new Node(nounA);
        Node nodeB = new Node(nounB);
        nodeA = this.allNouns.ceiling(nodeA);
        nodeB = this.allNouns.ceiling(nodeB);
        idAs = nodeA.getIds();
        idBs = nodeB.getIds();

        int id = wordnetSap.ancestor(idAs, idBs);
        String[] strings = this.synsets.get(id);
        String result = "";
        for (int i = 0; i< strings.length; i++) {
            result += strings[i];
            if (i != strings.length - 1)
                result += " ";
        }
        return result;
    }

Outcast.java

outcast就簡(jiǎn)單了割疾,從給定的詞組中嚎卫,計(jì)算每個(gè)詞與其他詞的最短距離之和。和最大的詞自然就是與別的詞最不同的了宏榕。

public class Outcast {
    private WordNet wordNet;

    public Outcast(WordNet wordnet) {
        this.wordNet = wordnet;
    }

    public String outcast(String[] nouns) {
        int[] distances = new int[nouns.length];
        for (int i = 0; i < nouns.length; i++) {
            String noun = nouns[i];
            for (int j = 0; j < nouns.length; j++) {
                if (i != j) {
                    distances[i] += wordNet.distance(noun, nouns[j]);
                }
            }
        }
        int max = 0;
        int id = 0;
        for (int i = 0; i < distances.length; i++) {
            if (distances[i] > max) {
                max = distances[i];
                id = i;
            }
        }

        return nouns[id];
    }

}

總結(jié)

筆者水平有限拓诸,上述答案其實(shí)只有91分,運(yùn)行時(shí)間不達(dá)標(biāo)麻昼。拋磚引玉奠支,希望大神們能夠提出改進(jìn)方法,大家一起學(xué)習(xí)進(jìn)步抚芦,謝謝倍谜!

項(xiàng)目源碼在GitHub中,請(qǐng)點(diǎn)這里

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
  • 序言:七十年代末叉抡,一起剝皮案震驚了整個(gè)濱河市尔崔,隨后出現(xiàn)的幾起案子,更是在濱河造成了極大的恐慌卜壕,老刑警劉巖您旁,帶你破解...
    沈念sama閱讀 219,110評(píng)論 6 508
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現(xiàn)場(chǎng)離奇詭異轴捎,居然都是意外死亡鹤盒,警方通過(guò)查閱死者的電腦和手機(jī),發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 93,443評(píng)論 3 395
  • 文/潘曉璐 我一進(jìn)店門(mén)侦副,熙熙樓的掌柜王于貴愁眉苦臉地迎上來(lái)侦锯,“玉大人,你說(shuō)我怎么就攤上這事秦驯〕吲觯” “怎么了?”我有些...
    開(kāi)封第一講書(shū)人閱讀 165,474評(píng)論 0 356
  • 文/不壞的土叔 我叫張陵译隘,是天一觀的道長(zhǎng)亲桥。 經(jīng)常有香客問(wèn)我,道長(zhǎng)固耘,這世上最難降的妖魔是什么题篷? 我笑而不...
    開(kāi)封第一講書(shū)人閱讀 58,881評(píng)論 1 295
  • 正文 為了忘掉前任,我火速辦了婚禮厅目,結(jié)果婚禮上番枚,老公的妹妹穿的比我還像新娘法严。我一直安慰自己,他們只是感情好葫笼,可當(dāng)我...
    茶點(diǎn)故事閱讀 67,902評(píng)論 6 392
  • 文/花漫 我一把揭開(kāi)白布深啤。 她就那樣靜靜地躺著,像睡著了一般路星。 火紅的嫁衣襯著肌膚如雪溯街。 梳的紋絲不亂的頭發(fā)上,一...
    開(kāi)封第一講書(shū)人閱讀 51,698評(píng)論 1 305
  • 那天洋丐,我揣著相機(jī)與錄音苫幢,去河邊找鬼。 笑死垫挨,一個(gè)胖子當(dāng)著我的面吹牛,可吹牛的內(nèi)容都是我干的触菜。 我是一名探鬼主播九榔,決...
    沈念sama閱讀 40,418評(píng)論 3 419
  • 文/蒼蘭香墨 我猛地睜開(kāi)眼,長(zhǎng)吁一口氣:“原來(lái)是場(chǎng)噩夢(mèng)啊……” “哼涡相!你這毒婦竟也來(lái)了哲泊?” 一聲冷哼從身側(cè)響起,我...
    開(kāi)封第一講書(shū)人閱讀 39,332評(píng)論 0 276
  • 序言:老撾萬(wàn)榮一對(duì)情侶失蹤催蝗,失蹤者是張志新(化名)和其女友劉穎切威,沒(méi)想到半個(gè)月后,有當(dāng)?shù)厝嗽跇?shù)林里發(fā)現(xiàn)了一具尸體丙号,經(jīng)...
    沈念sama閱讀 45,796評(píng)論 1 316
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡先朦,尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 37,968評(píng)論 3 337
  • 正文 我和宋清朗相戀三年,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了犬缨。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片喳魏。...
    茶點(diǎn)故事閱讀 40,110評(píng)論 1 351
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡,死狀恐怖怀薛,靈堂內(nèi)的尸體忽然破棺而出刺彩,到底是詐尸還是另有隱情,我是刑警寧澤枝恋,帶...
    沈念sama閱讀 35,792評(píng)論 5 346
  • 正文 年R本政府宣布创倔,位于F島的核電站,受9級(jí)特大地震影響焚碌,放射性物質(zhì)發(fā)生泄漏畦攘。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 41,455評(píng)論 3 331
  • 文/蒙蒙 一呐能、第九天 我趴在偏房一處隱蔽的房頂上張望念搬。 院中可真熱鬧抑堡,春花似錦、人聲如沸朗徊。這莊子的主人今日做“春日...
    開(kāi)封第一講書(shū)人閱讀 32,003評(píng)論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽(yáng)爷恳。三九已至有缆,卻和暖如春,著一層夾襖步出監(jiān)牢的瞬間温亲,已是汗流浹背棚壁。 一陣腳步聲響...
    開(kāi)封第一講書(shū)人閱讀 33,130評(píng)論 1 272
  • 我被黑心中介騙來(lái)泰國(guó)打工检碗, 沒(méi)想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留漂佩,地道東北人雇盖。 一個(gè)月前我還...
    沈念sama閱讀 48,348評(píng)論 3 373
  • 正文 我出身青樓所袁,卻偏偏與公主長(zhǎng)得像戈轿,于是被迫代替她去往敵國(guó)和親脂男。 傳聞我的和親對(duì)象是個(gè)殘疾皇子卡骂,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 45,047評(píng)論 2 355

推薦閱讀更多精彩內(nèi)容

  • **2014真題Directions:Read the following text. Choose the be...
    又是夜半驚坐起閱讀 9,511評(píng)論 0 23
  • 在空空蕩蕩的房間里只有手機(jī)亮著光翠忠,我盯著這一塊小小的屏幕粘姜,聽(tīng)著窗外的人聲嘈雜鬓照。餓著的肚子提醒我要出去覓食了,可是我...
    柯KAI閱讀 211評(píng)論 0 0
  • 幻想成為鋼琴曲中一個(gè)個(gè)輕快的音符孤紧,悠閑地在你指間跳躍豺裆。 幻想成為山澗中那股清冽的泉水,釋?xiě)训卦诖笞匀坏挠H吻中流淌号显。...
    小編閱讀 316評(píng)論 0 0
  • 今天我做了很多事情臭猜,各種各樣的事情,你們是不是都不知道咙轩,那就讓我來(lái)為大家解開(kāi)這個(gè)謎获讳。 ...
    11小溪流連星皓閱讀 321評(píng)論 2 3