Usage
import nltk
from nltk.stem.porter import PorterStemmer
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem import SnowballStemmer
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

# WordNetLemmatizer needs the WordNet corpus (run once; some NLTK versions also need 'omw-1.4')
nltk.download('wordnet', quiet=True)

porter_stemmer = PorterStemmer()
lancaster_stemmer = LancasterStemmer()
snowball_stemmer = SnowballStemmer('english')
wordnet_lemmatizer = WordNetLemmatizer()

# (word, WordNet part of speech) pairs; the POS tags are assigned by hand
words = [('bottles', wordnet.NOUN), ('vases', wordnet.NOUN), ('lit', wordnet.VERB),
         ('said', wordnet.VERB), ('earlier', wordnet.ADJ)]

# Each comment lists what that call returns for the five words, in order
for word, pos in words:
    print(porter_stemmer.stem(word))                    # 'bottl', 'vase', 'lit', 'said', 'earlier'
    print(lancaster_stemmer.stem(word))                 # 'bottl', 'vas', 'lit', 'said', 'ear'
    print(snowball_stemmer.stem(word))                  # 'bottl', 'vase', 'lit', 'said', 'earlier'
    print(wordnet_lemmatizer.lemmatize(word))           # default pos='n': 'bottle', 'vas', 'lit', 'said', 'earlier'
    print(wordnet_lemmatizer.lemmatize(word, pos=pos))  # 'bottle', 'vas', 'light', 'say', 'early'
Conclusion
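Judging from this example alone, WordNetLemmatizer recovers the base form of English words better than the stemmers when it is given the part of speech. With POS tags it maps 'lit', 'said', and 'earlier' to 'light', 'say', and 'early'. The stemmers, by contrast, return truncated stems such as 'bottl' or 'ear'. Without a POS argument the lemmatizer treats every word as a noun and leaves those three words unchanged. Even with POS tags, 'vases' still becomes 'vas'.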
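The comparison above relies on POS tags written by hand in `words`; in practice they usually come from a tagger. The sketch below maps tags to WordNet POS values and assumes Penn Treebank tags such as those returned by `nltk.pos_tag`. The helper `penn_to_wordnet` and its prefix table are hypothetical, not part of NLTK. It returns the plain letters that `wordnet.NOUN`, `wordnet.VERB`, `wordnet.ADJ`, and `wordnet.ADV` stand for ('n', 'v', 'a', 'r'), so the sketch itself runs without NLTK.

```python
# Hypothetical helper: map a Penn Treebank tag to the POS letter WordNet uses.
# The letters equal nltk.corpus.wordnet.ADJ / VERB / NOUN / ADV ('a', 'v', 'n', 'r').
PENN_PREFIX_TO_WORDNET = {
    "J": "a",  # JJ, JJR, JJS -> adjective
    "V": "v",  # VB, VBD, VBG, VBN, VBP, VBZ -> verb
    "N": "n",  # NN, NNS, NNP, NNPS -> noun
    "R": "r",  # RB, RBR, RBS -> adverb (note: RP also starts with R)
}


def penn_to_wordnet(tag: str, default: str = "n") -> str:
    """Return the WordNet POS letter for a Penn Treebank tag.

    Tags with no WordNet counterpart (e.g. DT, IN) fall back to `default`,
    which is noun, the same default WordNetLemmatizer.lemmatize uses.
    """
    if not tag:
        return default
    return PENN_PREFIX_TO_WORDNET.get(tag[0], default)


if __name__ == "__main__":
    # Illustrative tags only; a real pipeline would take them from a tagger.
    tagged = [("bottles", "NNS"), ("lit", "VBD"), ("said", "VBD"), ("earlier", "JJR")]
    for word, tag in tagged:
        print(word, tag, penn_to_wordnet(tag))
```

Once NLTK's tagger model is downloaded, the result can be passed straight through. For each `(word, tag)` in `nltk.pos_tag(tokens)`, call `wordnet_lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))`. Mapping by first letter is a simplification: for example, the particle tag `RP` would be treated as an adverb.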
[Note] Comparison of stemming and lemmatization tools