https://github.com/HIT-SCIR/pyltp
http://ltp.readthedocs.io/zh_CN/latest/
http://blog.csdn.net/churximi/article/details/51174182
http://www.cnblogs.com/ybf-yyj/p/7658571.html
http://www.cnblogs.com/anderslly/p/jiebanet.html
https://www.codeproject.com/Articles/32201/Lucene-Net-Custom-Synonym-Analyzer
https://github.com/linezero/jieba.NET
https://github.com/anderscui/jieba.NET
https://github.com/chapzq77/LTP_Python_Interface
https://github.com/NLPchina/nlp-lang
https://github.com/NLPchina/ansj_seg
http://www.nlpcn.org/resource/list/4
https://github.com/sing1ee/jieba-solr
https://www.nuget.org/packages/jieba.NET
https://python.libhunt.com/project/snownlp/vs/jieba
https://github.com/FudanNLP/fnlp
https://github.com/hankcs/HanLP/
https://github.com/crownpku/awesome-chinese-nlp
https://www.codeproject.com/Articles/32175/Lucene-Net-Text-Analysis
https://github.com/apache/lucenenet
https://github.com/JimLiu/Lucene.Net.Analysis.PanGu
https://github.com/LonghronShen/OurAspNet.Lucene.Net.Analysis.PanGu
1. Chinese NLP Toolkits
- Toolkits (general-purpose NLP toolkits)
- Popular NLP Toolkits for English/Multi-Language
- Chinese Word Segmentation
- Information Extraction
- QA & Chatbot
2. Corpus
3. Organizations (Chinese NLP organizations and conferences)
4. Learning Materials
Chinese NLP Toolkits
Toolkits (general-purpose NLP toolkits)
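THULAC Chinese lexical analysis toolkit by Tsinghua University (C++/Java/Python)
NLPIR by the Chinese Academy of Sciences (Java)
LTP Language Technology Platform by Harbin Institute of Technology (C++)
FudanNLP by Fudan University (Java)
BosonNLP by Boson (commercial API service)
HanLP (Java)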
SnowNLP (Python) Python library for processing Chinese text
YaYaNLP (Python) A Chinese natural language processing package written in pure Python, named after the phrase “牙牙學語” (a baby's first babbling).
DeepNLP (Python) Deep Learning NLP Pipeline implemented on Tensorflow with pretrained Chinese models.
chinese_nlp (C++ & Python) Chinese Natural Language Processing tools and examples
Chinese-Annotator (Python) Annotator for Chinese Text Corpus
Popular NLP Toolkits for English/Multi-Language
CoreNLP by Stanford (Java)
NLTK (Python)
spaCy (Python)
OpenNLP (Java)
gensim (Python) Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora.
Chinese Word Segmentation
Jieba Chinese word segmentation (Python) Aims to be the best Python Chinese word segmentation component.
kcws deep-learning Chinese word segmentation (Python) BiLSTM+CRF and IDCNN+CRF
ID-CNN-CWS (Python) Iterated Dilated Convolutions for Chinese Word Segmentation
Genius Chinese word segmentation (Python) Genius is an open-source Python Chinese word segmentation component that uses the CRF (Conditional Random Field) algorithm.
loso Chinese word segmentation (Python)
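As a rough illustration of what a word segmenter does, here is a minimal sketch of forward maximum matching, a classic dictionary-based baseline. It is not how any of the tools listed above is implemented (several of them are described as CRF or BiLSTM+CRF models). The function name `forward_max_match`, the toy vocabulary `VOCAB`, and the `max_len` parameter are assumptions made only for this example.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy left-to-right segmentation against a fixed vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a vocabulary hit.
        # A single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy vocabulary, for illustration only.
VOCAB = {"中文", "分词", "工具", "自然", "语言", "自然语言", "处理"}

print(forward_max_match("中文分词工具", VOCAB))
print(forward_max_match("自然语言处理", VOCAB))
```

Greedy matching is simple but cannot resolve ambiguous boundaries, which is one reason statistical and neural segmenters exist.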
Information Extraction
MITIE (C++) library and tools for information extraction
Duckling (Haskell) Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.
IEPY (Python) IEPY is an open source tool for Information Extraction focused on Relation Extraction.
Snorkel: A training data creation and management system focused on information extraction
Neural Relation Extraction implemented with LSTM in TensorFlow
Information-Extraction-Chinese Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT
QA & Chatbot
Rasa NLU (Python) turn natural language into structured data
Rasa Core (Python) machine learning based dialogue engine for conversational software
Chatterbot (Python) ChatterBot is a machine learning, conversational dialog engine for creating chat bots.
Chatbot (Python) A contextual chatbot based on vector matching.
Tipask (PHP) An open-source PHP Q&A system built on the Laravel framework; easy to extend, with strong load capacity and stability.
QuestionAnsweringSystem (Java) A human-machine question answering system implemented in Java that automatically analyzes questions and returns candidate answers.
A Sequence to Sequence chatbot model implemented with TensorFlow (Python)
A Chinese reading-comprehension question answering system implemented with deep learning (Python)
DuReader Chinese reading comprehension baseline code (Python)
Corpus
Baidu Baike 100 GB corpus on Baidu Pan (password: neqs); the source is probably Liang Bin (penny).
UDChinese (for training spaCy POS)
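Chinese word2vec model for Chinese Wikipedia: scripts and model files trained on the Chinese Wikipedia corpus of June 20, 2017.
Synonyms: a Chinese synonym toolkit. A synonym lexicon trained with word2vec on Chinese Wikipedia, packaged as a Python package.
Chinese_conversation_sentiment A Chinese sentiment dataset that may be useful for sentiment analysis.
Chinese Emergency Corpus
dgk_lost_conv Chinese conversation corpus
Datasets for Training Chatbot System: corpora for training Chinese and English dialogue systems.
Chinese stock market announcement crawler: Python scripts that fetch Chinese stock market (sz, sh) announcements (from listed companies and regulators) from the Juchao Network (巨潮網絡) servers.
tushare financial data interface: TuShare is a free, open-source Python financial data interface package.
Insurance industry corpus [52nlp introduction blog] OpenData in insurance area for Machine Learning Tasks
The most complete database of classical Chinese poetry: nearly 14,000 poets from the Tang and Song dynasties, close to 55,000 Tang poems and 260,000 Song poems, plus 21,050 ci by 1,564 ci poets of the Song period.
Small Chinese corpus datasets: small datasets for Chinese named entity recognition, Chinese relation extraction, Chinese reading comprehension, and similar tasks.
Chinese personal name corpus: Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names.
Chinese data preprocessing materials: Chinese word segmentation dictionaries and Chinese stop words.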
Organizations (Chinese NLP organizations and conferences)
NLP Conference Calendar Main conferences, journals, workshops and shared tasks in NLP community.
Learning Materials
Stanford CS224n Natural Language Processing with Deep Learning 2017
Speech and Language Processing by Dan Jurafsky and James H. Martin
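Text processing practice course materials: covers text feature extraction (TF-IDF), text classification, text clustering, training word vectors with word2vec, Chinese word similarity computation with the Tongyici Cilin thesaurus, automatic document summarization, information extraction, and sentiment analysis and opinion mining experiments.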
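For readers unfamiliar with the TF-IDF feature named in the course materials above, here is a minimal sketch of one common unsmoothed variant, `tf * log(N / df)`, applied to documents that are already tokenized. It is not taken from the course itself; the function name `tf_idf` and the toy `DOCS` data are assumptions for illustration, and libraries often use smoothed or normalized formulas that give different numbers.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # Relative term frequency times inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical pre-segmented documents, for illustration only.
DOCS = [
    ["中文", "分词", "工具"],
    ["中文", "语料"],
    ["中文", "分词", "语料", "语料"],
]

scores = tf_idf(DOCS)
for doc_scores in scores:
    print(doc_scores)
```

A term that appears in every document, such as "中文" here, gets weight 0 under this variant, which is the intended effect: it carries no information for telling the documents apart.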
https://github.com/crownpku/Awesome-Chinese-NLP
《自己動手構造編譯系統》 (Build Your Own Compiler System), GCC
https://github.com/fanzhidongyzby/cit/
https://code.google.com/archive/p/redis/#!
https://github.com/antirez/redis/
https://github.com/rabbitmq/rabbitmq-dotnet-client
https://www.microsoft.com/en-us/cognitive-toolkit/
https://marketplace.visualstudio.com/items?itemName=ms-toolsai.vstoolsai-vs2015
https://marketplace.visualstudio.com/items?itemName=ms-toolsai.vstoolsai-vs2017
https://docs.microsoft.com/en-us/cognitive-toolkit/setup-cntk-on-your-machine
https://www.microsoft.com/en-us/cognitive-toolkit/features/model-gallery/
https://github.com/Microsoft/CNTK
https://github.com/migueldeicaza/TensorFlowSharp