文章目錄:
- 題目要求--分析
- 具體實(shí)現(xiàn)--動(dòng)手
- 源碼分析--知其然,知其所以然
- 優(yōu)化--創(chuàng)新
- 總結(jié)
1. 題目要求--分析
寫一個(gè)腳本以統(tǒng)計(jì)一個(gè)文本文件 words.txt 中每個(gè)單詞出現(xiàn)的頻率鸳粉。
為了簡(jiǎn)單起見(jiàn)谴餐,你可以假設(shè):
words.txt只包括小寫字母和 ' ' ;
每個(gè)單詞只由小寫字母組成。
單詞間由一個(gè)或多個(gè)空格字符分隔加袋。
示例:
假設(shè) words.txt 內(nèi)容如下:
the day is sunny the the the sunny is is
你的腳本應(yīng)當(dāng)輸出(以詞頻降序排列):
the 4
is 3
sunny 2
day 1
2. 具體實(shí)現(xiàn)--動(dòng)手
1.輸入哥艇,讀取文件谐腰,使用 FileInputStream显设、BufferedReader讀取文件框弛;
2.分割字符串,采用StringTokenizer進(jìn)行字符分割捕捂;
3.用 HashMap 保存統(tǒng)計(jì)數(shù)據(jù)瑟枫;
4.統(tǒng)計(jì)詞頻斗搞,降序排序輸出,采用Comparator用來(lái)實(shí)現(xiàn)按value排序
5.輸出
package algorithm;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
public class WordFrequency {
public static void main(String[] args) {
long startTime=System.nanoTime(); //獲取開(kāi)始時(shí)間
String string = "";
Map<String, Integer> map = new HashMap<String,Integer>();
try {
//[1] 讀取 2.txt 文本
FileInputStream fis = new FileInputStream("/Users/sweetgirl/Documents/MyCode/2.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fis));
String temp = "";
try {
while((temp = br.readLine()) != null) {
string = string + temp;
}
} catch (IOException e) {
// TODO: handle exception
e.printStackTrace();
}
} catch (Exception e) {
// TODO: handle exception
e.printStackTrace();
}
//[2] 分割字符串
StringTokenizer st = new StringTokenizer(string); //用于切分字符串
int count;
String word;
while(st.hasMoreTokens()) {
word = st.nextToken(",?.!:\"\"' '\n");
if (map.containsKey(word)) {
//[3] HashMap 保存數(shù)據(jù)
count = map.get(word);
map.put(word, count + 1);
}else {
map.put(word, 1);
}
}
//[4] 排序
Comparator<Map.Entry<String, Integer>> valueComparator = new Comparator<Map.Entry<String,Integer>>() {
public int compare(Map.Entry<String, Integer> o1,Map.Entry<String, Integer> o2) {
return o2.getValue()-o1.getValue();
}
};
//[5] 輸出結(jié)果
List<Map.Entry<String, Integer>> list = new ArrayList<Map.Entry<String,Integer>>(map.entrySet());
Collections.sort(list,valueComparator);
System.out.println("---------------------map 按照 value 降序排序----------");
for(Map.Entry<String, Integer> entry:list) {
System.out.println(entry.getKey() + ":"+ entry.getValue());
}
long endTime=System.nanoTime(); //獲取結(jié)束時(shí)間
System.out.println("程序運(yùn)行時(shí)間: "+(endTime-startTime)+"ns");
}
}
測(cè)試文本:
Photo sphere panoramic camera function; keyboard gesture input function; improved lock screen function, including support for desktop pendant and direct opening camera function in lock screen state; expandable notification, allowing users to directly open the application; Gmail mail zoom display; Daydream screen saver Program; the user can zoom in on the entire display three times, and can also rotate and zoom display with two fingers, as well as voice output and gesture mode navigation designed for blind users; support Miracast wireless display sharing function; Google Now is now available Allow users to use Gamail as a new source of data, such as improved flight tracking, hotel and restaurant reservations, and music and movie recommendations.Photo sphere panoramic camera function; keyboard gesture input function; improved lock screen function, including support for desktop pendant and direct opening camera function in lock screen state; expandable notification, allowing users to directly open the application; Gmail mail zoom display; Daydream screen saver Program; the user can zoom in on the entire display three times, and can also rotate and zoom display with two fingers, as well as voice output and gesture mode navigation designed for blind users; support Miracast wireless display sharing function.
測(cè)試結(jié)果:
---------------------map 按照 value 降序排序----------
and:11
screen:6
as:6
display:6
zoom:6
the:6
function:5
function;:5
lock:4
in:4
support:4
for:4
gesture:4
can:4
camera:4
improved:3
users:3
to:3
voice:2
rotate:2
mail:2
state;:2
panoramic:2
Photo:2
entire:2
fingers:2
three:2
output:2
mode:2
notification:2
navigation:2
saver:2
users;:2
directly:2
Miracast:2
including:2
sharing:2
input:2
application;:2
Daydream:2
blind:2
display;:2
expandable:2
direct:2
pendant:2
two:2
times:2
desktop:2
sphere:2
designed:2
on:2
keyboard:2
Gmail:2
also:2
allowing:2
opening:2
with:2
Program;:2
well:2
wireless:2
user:2
open:2
Gamail:1
data:1
movie:1
use:1
available:1
source:1
tracking:1
recommendations:1
music:1
Google:1
new:1
is:1
Now:1
flight:1
now:1
of:1
hotel:1
a:1
restaurant:1
Allow:1
such:1
reservations:1
程序運(yùn)行時(shí)間: 13034950ns
3. 源碼分析--知其然慷妙,知其所以然
Hashmap [1] 存值
1 public static void main(String[] args) {
2
3 HashMap<String, Integer> map=new HashMap<>();
4 System.out.println(map.put("1", 1));//null
5 System.out.println(map.put("1", 2));//1
6 }
Hashmap [2] 取值
1 public static void main(String[] args) {
2 HashMap<String, Integer> map=new HashMap<>();
3 map.put("DEMO", 1);
4 System.out.println(map.get("1"));//null
5 System.out.println(map.get("DEMO"));//1
6 }
4. 優(yōu)化--創(chuàng)新
分割字符串 方法二:
package algorithm;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
public class WordFrequency {
public static void main(String[] args) {
long startTime=System.nanoTime(); //獲取開(kāi)始時(shí)間
String string = "";
Map<String, Integer> map = new HashMap<String,Integer>();
try {
//[1] 讀取 2.txt 文本
FileInputStream fis = new FileInputStream("/Users/sweetgirl/Documents/MyCode/2.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fis));
String temp = "";
try {
while((temp = br.readLine()) != null) {
string = string + temp;
}
} catch (IOException e) {
// TODO: handle exception
e.printStackTrace();
}
} catch (Exception e) {
// TODO: handle exception
e.printStackTrace();
}
//[2] 分割字符串
String[] spit = string.split(" ");
for(int i = 0; i < spit.length; i++) {
if (map.get(spit[i]) == null) {
map.put(spit[i], 1);
}else {
//[3] HashMap 保存數(shù)據(jù)
int frequency = map.get(spit[i]);
map.put(spit[i], ++frequency);
}
}
//[4] 排序
Comparator<Map.Entry<String, Integer>> valueComparator = new Comparator<Map.Entry<String,Integer>>() {
public int compare(Map.Entry<String, Integer> o1,Map.Entry<String, Integer> o2) {
return o2.getValue()-o1.getValue();
}
};
List<Map.Entry<String, Integer>> list = new ArrayList<Map.Entry<String,Integer>>(map.entrySet());
Collections.sort(list,valueComparator);
System.out.println("---------------------map 按照 value 降序排序----------");
for(Map.Entry<String, Integer> entry:list) {
System.out.println(entry.getKey() + ":"+ entry.getValue());
}
long endTime=System.nanoTime(); //獲取結(jié)束時(shí)間
System.out.println("程序運(yùn)行時(shí)間: "+(endTime-startTime)+"ns");
}
}
測(cè)試文本:
同上
測(cè)試結(jié)果:
---------------------map 按照 value 降序排序----------
and:11
screen:6
as:6
display:6
zoom:6
the:6
function;:5
lock:4
in:4
support:4
for:4
gesture:4
can:4
camera:4
improved:3
users:3
to:3
voice:2
rotate:2
mail:2
state;:2
panoramic:2
entire:2
three:2
output:2
mode:2
navigation:2
function:2
saver:2
users;:2
directly:2
Miracast:2
including:2
sharing:2
input:2
application;:2
Daydream:2
blind:2
display;:2
expandable:2
direct:2
times,:2
function,:2
pendant:2
two:2
desktop:2
sphere:2
designed:2
on:2
keyboard:2
Gmail:2
also:2
allowing:2
opening:2
with:2
fingers,:2
notification,:2
Program;:2
well:2
wireless:2
user:2
open:2
Gamail:1
movie:1
use:1
available:1
Photo:1
source:1
music:1
Google:1
new:1
is:1
Now:1
flight:1
function.:1
now:1
of:1
hotel:1
tracking,:1
a:1
recommendations.Photo:1
restaurant:1
data,:1
Allow:1
such:1
reservations,:1
程序運(yùn)行時(shí)間: 9878808ns
5. 總結(jié)
當(dāng)文本為 1 KB 時(shí)僻焚,方法一 (使用 StringTokenizer )程序運(yùn)行時(shí)間: 13034950ns;方法二 (使用 spit )程序運(yùn)行時(shí)間: 9878808ns .
StringTokenizer > spit
當(dāng)文本為 24 KB 時(shí)膝擂,方法一 (使用 StringTokenizer )程序運(yùn)行時(shí)間: 25411294ns虑啤;方法二 (使用 spit )程序運(yùn)行時(shí)間: 28782200ns .
StringTokenizer < spit
綜上可知,當(dāng)你需要統(tǒng)計(jì)詞頻的文本較大時(shí)猿挚,例如一本長(zhǎng)篇小說(shuō)咐旧,那么 StringTokenizer 的效率更高。