Python datasets, machine learning, and intelligent datasets [repost]
Reposted from: http://bbs.w3china.org/blog/more.asp?name=idmer&id=24017
When doing data mining research, people often struggle to find suitable data. KDNuggets has a Datasets section that provides a number of datasets: http://www.kdnuggets.com/datasets/
Another very good resource is http://kdd.ics.uci.edu/, which contains the following data resources (grouped by application area):
Direct Marketing
  KDD CUP 1998 Data
GIS
  Forest CoverType
Indexing
  Corel Image Features
  Pseudo Periodic Synthetic Time Series
Intrusion Detection
  KDD CUP 1999 Data
Process Control
  Synthetic Control Chart Time Series
Recommendation Systems
  Entree Chicago Recommendation Data
Robots
  Pioneer-1 Mobile Robot Data
  Robot Execution Failures
Sign Language Recognition
  Australian Sign Language Data
  High-quality Australian Sign Language Data
Text Categorization
  20 Newsgroups Data
  Reuters-21578 Text Categorization Collection
  NSF Research Awards Abstracts 1990-2003
World Wide Web
  Microsoft Anonymous Web Data
  MSNBC Anonymous Web Data
  Syskill Webert Web Data
Reposted from: http://blogger.org.cn/blog/more.asp?name=DMman&id=24043
------------------------------------------------------------------ Divider ------------------------------------------------------------------
DMman's note: the links below were collected from the Internet. DMman has not tested each one for validity or usefulness.
1. Climate monitoring dataset: http://cdiac.ornl.gov/ftp/ndp026b
2. Several websites offering useful test datasets for download
http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html
The Reuters dataset can be found at: http://www.research.att.com/~lewis/reuters21578.html
The following page lists a variety of datasets:
http://kdd.ics.uci.edu/summary.data.type.html
For text classification, another usable dataset is the rainbow dataset:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
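3. I tracked down a lot of test datasets. Anyone writing papers will surely need them, at the very least to check how well an algorithm performs.
Some may no longer be accessible, but some of them should still work:
Machine learning datasets collected by UCI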
ftp://pami.sjtu.edu.cn/
http://www.ics.uci.edu/~mlearn//MLRepository.htm
statlib
http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm
Sample databases
http://www.ics.uci.edu/~mlearn/MLRepository.html
Websites on data mining for investment funds
http://www.gotofund.com/index.asp
http://lans.ece.utexas.edu/~strehl/
Reuters dataset
http://www.research.att.com/~lewis/reuters21578.html
Various datasets:
http://kdd.ics.uci.edu/summary.data.type.html
http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=datasets.html
http://lib.stat.cmu.edu/datasets/
http://dctc.sjtu.edu.cn/adaptive/datasets/
http://fimi.cs.helsinki.fi/data/
http://www.almaden.ibm.com/software/quest/Resources/index.shtml
http://miles.cnuce.cnr.it/~palmeri/datam/DCI/
Text classification and Web
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
http://www.w3.org/TR/WD-logfile-960221.html
http://www.w3.org/Daemon/User/Config/Logging.html#AccessLog
http://www.w3.org/1998/11/05/WC-workshop/Papers/bala2.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.web-caching.com/traces-logs.html
http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf
http://www.cs.cornell.edu/projects/kddcup/index.html
Time series data
http://www.stat.wisc.edu/~reinsel/bjr-data/
Test data for the Apriori algorithm
http://www.almaden.ibm.com/cs/quest/syndata.html
Links to data generators
http://www.cse.cuhk.edu.hk/~kdd/data_collection.html
http://www.almaden.ibm.com/cs/quest/syndata.html
Association:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData
WEKA:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
1. A jarfile containing 37 classification problems, originally obtained from the UCI repository
http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
2. A jarfile containing 37 regression problems, obtained from various sources
http://prdownloads.sourceforge.net/weka/datasets-numeric.jar
3. A jarfile containing 30 regression datasets collected by Luis Torgo
http://prdownloads.sourceforge.net/weka/regression-datasets.jar
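A jar file is an ordinary zip archive, so the WEKA dataset jars listed above can be inspected from Python without Java. The following is only a minimal sketch: it assumes the jars contain ARFF files (WEKA's usual format; the post does not describe the archive layout), and the function names `list_arff_members` and `read_arff_header` are hypothetical helpers invented for this illustration. The header parser is deliberately naive and does not handle quoted attribute names that contain spaces.

```python
import io
import zipfile


def list_arff_members(jar_source):
    """Return the sorted names of all .arff entries in a jar (zip) archive.

    jar_source may be a path or a seekable binary file object.
    """
    with zipfile.ZipFile(jar_source) as jar:
        return sorted(
            name for name in jar.namelist() if name.lower().endswith(".arff")
        )


def read_arff_header(jar_source, member):
    """Return (relation_name, attribute_names) read from an ARFF header.

    Assumption: simple headers only; names with embedded spaces are not handled.
    """
    relation, attributes = None, []
    with zipfile.ZipFile(jar_source) as jar:
        with jar.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
            for line in text:
                line = line.strip()
                lowered = line.lower()
                if lowered.startswith("@relation"):
                    relation = line.split(None, 1)[1].strip("'\"")
                elif lowered.startswith("@attribute"):
                    attributes.append(line.split(None, 2)[1].strip("'\""))
                elif lowered.startswith("@data"):
                    break  # the header ends where the data section begins
    return relation, attributes
```

With a downloaded archive, `list_arff_members("datasets-UCI.jar")` would list the dataset files, and `read_arff_header` can then be called on any one of the returned names.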
Cancer genes:
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi
Financial data:
http://lisp.vse.cz/pkdd99/Challenge/chall.htm
Provided by another contributor:
http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html
The Reuters dataset can be found at:
http://www.research.att.com/~lewis/reuters21578.html
The following page lists a variety of datasets:
http://kdd.ics.uci.edu/summary.data.type.html
For text classification, another usable dataset is the rainbow dataset:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
Download the Financial Data (~17.5M zipped file, ~67M unzipped data)
Download the Medical Data (~2M zipped file, ~6M unzipped data)
http://lisp.vse.cz/pkdd99/Challenge/chall.htm
KDnuggets dataset links (passing along someone else's work):
http://www.kdnuggets.com/datasets/index.html
You can also visit http://blogger.org.cn/blog/more.asp?name=idmer&id=24017
for a detailed introduction to the KDnuggets dataset resources.
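Since the links collected above were not tested one by one, and several URLs appear more than once, a small script can help tidy the list before use. This is a rough sketch rather than a definitive tool: `unique_urls` and `is_reachable` are hypothetical names invented here, the 10-second timeout is an arbitrary assumption, and a successful response does not prove that the dataset is still hosted at that address.

```python
import urllib.error
import urllib.request


def unique_urls(lines):
    """Keep the first occurrence of each http/https/ftp URL, preserving order.

    Lines that are not bare URLs (headings, notes) are skipped.
    """
    seen, result = set(), []
    for line in lines:
        url = line.strip()
        if not url.startswith(("http://", "https://", "ftp://")):
            continue
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


def is_reachable(url, timeout=10):
    """Return True if the URL can be opened without an error (needs network)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

A typical use would be to read the saved link list line by line, pass it through `unique_urls`, and then call `is_reachable` on each remaining entry to see which sites still respond.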
------------------------------------------------------------------ Divider ------------------------------------------------------------------
Source: online material
Data mining competitions and their datasets
2005 University of California data mining contest, predicting bad accounts and their churn date using real-world CRM data, deadline June 30, 2005.
ILP 2005 Challenge, on the prediction of functional classes of genes.
KDD Cup 2005, on classifying internet user search queries, deadline July 8.
Data Mining Cup 2005 (Chemnitz, Germany), for students; topic: How data mining can ascertain the risk of loss of payments and reduce this risk.
KDD Cup 2004, focuses on data mining for several performance criteria using datasets from bioinformatics and quantum physics.
InfoVis 2004 Contest, The History of InfoVis.
DATA MINING CUP 2004 (Chemnitz, Germany), for students.
InfoVis 2003 Contest: Visualization and Pair Wise Comparison of Trees, results announced Sep 5, 2003.
KDD Cup 2003, focuses on problems motivated by network mining and the analysis of usage logs.
DATA MINING CUP 2003 (Chemnitz, Germany). The task is to identify spam emails before they reach the user's mailbox.
KDD Cup 2002, focuses on data mining in molecular biology.
Student Data Mining Cup (2002), Chemnitz University and Prudential Systems.
2014-11-30 22:42:51, Python training (Zhipu Education, www.jeapedu.com)
http://www.ars.usda.gov/SP2UserFiles/Place/80400525/Data/SR27/asc/NUT_DATA.txt
2014-12-13 14:10:11, Python training (Zhipu Education, www.jeapedu.com)
http://www.ars.usda.gov/Services/docs.htm?docid=8964
2015-01-19 00:03:18, Python training (Zhipu Education, www.jeapedu.com)
open source map
http://blog.sina.com.cn/s/articlelist_1592094021_0_1.html
http://blog.csdn.net/scy411082514/article/details/7471499
2015-01-19 00:23:44, Python training (Zhipu Education, www.jeapedu.com)
http://ladsweb.nascom.nasa.gov/data/search.html
2015-01-19 00:26:58, Python training (Zhipu Education, www.jeapedu.com)
http://planet.openstreetmap.org/
2015-06-24 23:01:20, Python training (Zhipu Education, www.jeapedu.com)
http://www.rita.dot.gov/bts/data_and_statistics/by_mode/rail.html