https://github.com/HIT-SCIR/pyltp
http://ltp.readthedocs.io/zh_CN/latest/
http://blog.csdn.net/churximi/article/details/51174182
http://www.cnblogs.com/ybf-yyj/p/7658571.html
http://www.cnblogs.com/anderslly/p/jiebanet.html
https://www.codeproject.com/Articles/32201/Lucene-Net-Custom-Synonym-Analyzer
https://github.com/linezero/jieba.NET
https://github.com/anderscui/jieba.NET
https://github.com/chapzq77/LTP_Python_Interface
https://github.com/NLPchina/nlp-lang
https://github.com/NLPchina/ansj_seg
http://www.nlpcn.org/resource/list/4
https://github.com/sing1ee/jieba-solr
https://www.nuget.org/packages/jieba.NET
https://python.libhunt.com/project/snownlp/vs/jieba
https://github.com/FudanNLP/fnlp
https://github.com/hankcs/HanLP/
https://github.com/crownpku/awesome-chinese-nlp
https://www.codeproject.com/Articles/32175/Lucene-Net-Text-Analysis
https://github.com/apache/lucenenet
https://github.com/JimLiu/Lucene.Net.Analysis.PanGu
https://github.com/LonghronShen/OurAspNet.Lucene.Net.Analysis.PanGu
1. Chinese NLP Toolkits
Popular NLP Toolkits for English/Multi-Language
3. Organizations: Chinese NLP organizations and conferences
4. Learning Materials
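THULAC: Chinese lexical analysis toolkit, by Tsinghua University (C++/Java/Python)
NLPIR, by the Chinese Academy of Sciences (Java)
LTP: Language Technology Platform, by Harbin Institute of Technology (C++)
FudanNLP, by Fudan University (Java)
BosonNLP, by Boson (commercial API service)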
HanLP (Java)
SnowNLP (Python): Python library for processing Chinese text
YaYaNLP (Python): a Chinese natural language processing package written in pure Python, named after the expression for a baby babbling its first words
DeepNLP (Python): Deep Learning NLP Pipeline implemented on Tensorflow with pretrained Chinese models.
chinese_nlp (C++ & Python): Chinese Natural Language Processing tools and examples
Chinese-Annotator (Python): annotation tool for Chinese text corpora
Popular NLP Toolkits for English/Multi-Language
CoreNLP, by Stanford (Java)
NLTK (Python)
spaCy (Python)
OpenNLP (Java)
gensim (Python): Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora.
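Jieba Chinese word segmentation (Python): aims to be the best Python Chinese word segmentation component
kcws deep-learning Chinese word segmentation (Python): BiLSTM+CRF and IDCNN+CRF
ID-CNN-CWS (Python): Iterated Dilated Convolutions for Chinese Word Segmentation
Genius Chinese word segmentation (Python): Genius is an open-source Python Chinese word segmentation component that uses the CRF (Conditional Random Field) algorithm.
loso Chinese word segmentation (Python)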
MITIE (C++): library and tools for information extraction
Duckling (Haskell): Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.
IEPY (Python): IEPY is an open source tool for Information Extraction focused on Relation Extraction.
Snorkel: A training data creation and management system focused on information extraction
Neural Relation Extraction implemented with LSTM in TensorFlow
A neural network model for Chinese named entity recognition
Information-Extraction-Chinese: Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT
Rasa NLU (Python): turn natural language into structured data
Rasa Core (Python): machine learning based dialogue engine for conversational software
Chatterbot (Python): ChatterBot is a machine learning, conversational dialog engine for creating chat bots.
Chatbot (Python): a context-aware chatbot based on vector matching
Tipask (PHP): an open-source PHP question-and-answer system built on the Laravel framework; easy to extend, with strong load capacity and stability.
QuestionAnsweringSystem (Java): a human-machine question answering system implemented in Java that automatically analyzes questions and returns candidate answers.
Sequence to Sequence chatbot model implemented with TensorFlow (Python)
Chinese reading comprehension question answering system implemented with deep learning (Python)
DuReader Chinese reading comprehension baseline code (Python)
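CN-Probase, a large-scale Chinese concept graph (introduced on its WeChat official account)
1998 People's Daily part-of-speech tagged corpus (on Baidu Pan)
Baidu Baike 100 GB corpus (on Baidu Pan, password neqs); the source is probably 梁斌penny
UDChinese (for training spaCy POS)
Chinese word2vec model for Chinese Wikipedia: scripts and model files trained on the Chinese Wikipedia corpus of June 20, 2017.
Synonyms: a Chinese synonym toolkit; a synonym lexicon trained on Chinese Wikipedia with word2vec and packaged as a Python package.
Chinese_conversation_sentiment: a Chinese sentiment dataset that may be useful for sentiment analysis.
Chinese Emergency Corpus
dgk_lost_conv: Chinese conversation corpus
Datasets for Training Chatbot System: corpora for training Chinese and English dialogue systems
China stock market announcement crawler: Python scripts that fetch Chinese stock market (sz, sh) announcements (from listed companies and regulators) from the Juchao network servers
tushare financial data interface: TuShare is a free, open-source Python package for financial data.
Insurance industry corpus [introduced on the 52nlp blog]: OpenData in insurance area for Machine Learning Tasks
The most complete database of classical Chinese poetry: nearly 14,000 poets of the Tang and Song dynasties, close to 55,000 Tang poems plus 260,000 Song poems, and 21,050 ci by 1,564 ci poets of the Song period.
Small Chinese corpus datasets: small amounts of data for Chinese named entity recognition, Chinese relation recognition, Chinese reading comprehension, and similar tasks
Chinese personal-name corpus: Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, English names.
Chinese data preprocessing materials: Chinese word segmentation dictionaries and Chinese stop words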
Organizations: Chinese NLP organizations and conferences
NLP Conference Calendar: main conferences, journals, workshops and shared tasks in the NLP community.
Learning Materials
Stanford CS224n Natural Language Processing with Deep Learning 2017
Speech and Language Processing, by Dan Jurafsky and James H. Martin
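Text processing practice course materials: experiments covering text feature extraction (TF-IDF), text classification, text clustering, training word vectors with word2vec, Chinese word similarity computation with the Tongyici Cilin synonym thesaurus, automatic document summarization, information extraction, and sentiment analysis and opinion mining.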
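The course materials above list TF-IDF as their text feature extraction step. As a rough illustration of that idea only, and not code taken from the course, here is a minimal sketch in plain Python. The function name `tf_idf`, the toy corpus, and the choice of an unsmoothed `log(N / df)` weighting are all assumptions made for this example; in practice the token lists would come from one of the word segmentation tools listed earlier.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists, i.e. text that has already been segmented.
    Weight = (term count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)

    # Document frequency: the number of documents each term appears in.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-segmented toy corpus (illustration only).
docs = [
    ["中文", "分词", "工具"],
    ["中文", "语料", "下载"],
    ["中文", "分词", "语料"],
]
weights = tf_idf(docs)
```

A term that occurs in every document, such as the first token of each toy document here, gets weight zero under this unsmoothed formula, while a term unique to one document gets the largest weight in that document. Library implementations often add smoothing and normalization, so their numbers will differ from this sketch.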
https://github.com/crownpku/Awesome-Chinese-NLP
Book: 《自己動手構造編譯系統》 (Build Your Own Compiler System), GCC
https://github.com/fanzhidongyzby/cit/
https://code.google.com/archive/p/redis/#!
https://github.com/antirez/redis/
https://github.com/rabbitmq/rabbitmq-dotnet-client
https://www.microsoft.com/en-us/cognitive-toolkit/
https://marketplace.visualstudio.com/items?itemName=ms-toolsai.vstoolsai-vs2015
https://marketplace.visualstudio.com/items?itemName=ms-toolsai.vstoolsai-vs2017
https://docs.microsoft.com/en-us/cognitive-toolkit/setup-cntk-on-your-machine
https://www.microsoft.com/en-us/cognitive-toolkit/features/model-gallery/
https://github.com/Microsoft/CNTK
https://github.com/migueldeicaza/TensorFlowSharp
http://www.csharpkit.com/2017-10-15_55288.html
https://github.com/Microsoft/vs-tools-for-ai
Author: readilen
Link: http://www.reibang.com/p/f678372b0444
Source: Jianshu
Copyright on Jianshu belongs to the author. For any form of reproduction, please contact the author for authorization and cite the source.