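While organizing student work and slides for my "Data Mining and Analysis" course, I collected a few useful code snippets to share here for study. The longer I work in data analysis, the more I feel that, unless you are researching new algorithms, the data itself is what matters most and is where the value lies. Later lessons will gradually cover using Python with Hadoop and Spark, as well as data science topics such as networkx.
If this article contains mistakes or gaps, please bear with me. I hope it helps.
We use Pandas to run a time series analysis on a commodity housing price dataset covering 2002 to 2014, pick several cities to compare with Guiyang, and analyze Guiyang's commodity housing prices.
The dataset is 32.csv. Its contents are listed below and can be copied directly: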
year,Beijing,Chongqing,Shenzhen,Guiyang,Kunming,Shanghai,Wuhai,Changsha
2002,4764.00,1556.00,5802.00,1643.00,2276.00,4134.00,1928.00,1802.00
2003,4737.00,1596.00,6256.00,1949.00,2233.00,5118.00,2072.00,2040.00
2004,5020.93,1766.24,6756.24,1801.68,2473.78,5855.00,2516.32,2039.09
2005,6788.09,2134.99,7582.27,2168.90,2639.72,6842.00,3061.77,2313.73
2006,8279.51,2269.21,9385.34,2372.66,2903.32,7196.00,3689.64,2644.15
2007,11553.26,2722.58,14049.69,2901.63,3108.12,8361.00,4664.03,3304.74
2008,12418.00,2785.00,12665.00,3149.00,3750.00,8195.00,4781.00,3288.00
2009,13799.00,3442.00,14615.00,3762.00,3807.00,12840.00,5329.00,3648.00
2010,17782.00,4281.00,19170.00,4410.00,3660.00,14464.00,5746.00,4418.00
2011,16851.95,4733.84,21350.13,5069.52,4715.23,14603.24,7192.90,5862.39
2012,17021.63,5079.93,19589.82,4846.14,5744.68,14061.37,7344.05,6100.87
2013,18553.00,5569.00,24402.00,5025.00,5795.00,16420.00,7717.00,6292.00
2014,18833.00,5519.00,24723.00,5608.00,6384.00,16787.00,7951.00,6116.00
The following code plots the housing prices of all the cities for comparison:
# -*- coding: utf-8 -*-
"""
Created on Mon Mar 06 10:55:17 2017
@author: eastmount
"""
import pandas as pd
import matplotlib.pyplot as plt

# index_col: name of the column used as the row index
data = pd.read_csv("32.csv", index_col='year')
# Show the shape and the first 6 rows
print(data.shape)
print(data.head(6))

plt.rcParams['font.sans-serif'] = ['simHei']  # display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False    # display the minus sign correctly
data.plot()
plt.savefig(u'時序圖.png', dpi=500)
plt.show()
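This draws a time series chart with one line per city and saves it as 時序圖.png.
Key points:
1. plt.rcParams configures matplotlib to display Chinese text and the minus sign;
2. plt.savefig saves the figure to a local file;
3. pandas reads the data and plots it directly; index_col picks the column used as the row index.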
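To make the effect of index_col more concrete, here is a minimal sketch. It assumes the file is comma-separated and, so that it runs without 32.csv, reads the first three rows and four of the city columns from an inline string. The names CSV_TEXT, plain, and indexed are only illustrative.

```python
import io

import pandas as pd

# First three rows of the dataset, inlined so the sketch runs without 32.csv.
# Only four of the city columns are included to keep it short.
CSV_TEXT = """year,Beijing,Chongqing,Shenzhen,Guiyang
2002,4764.00,1556.00,5802.00,1643.00
2003,4737.00,1596.00,6256.00,1949.00
2004,5020.93,1766.24,6756.24,1801.68
"""

# Without index_col, 'year' is an ordinary column and rows are labeled 0, 1, 2.
plain = pd.read_csv(io.StringIO(CSV_TEXT))

# With index_col='year', the years become the row labels.
indexed = pd.read_csv(io.StringIO(CSV_TEXT), index_col="year")

print(plain.shape)    # 'year' counts as a data column here
print(indexed.shape)  # one column fewer: 'year' moved into the index
print(indexed.loc[2003, "Guiyang"])  # look up a value by year label
```

Because the years are the index, data.plot() uses them as the x-axis automatically, which is why the plotting code never has to mention the year column.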
Continuing from the experiment above, we now extract the Guiyang column and plot it.
# -*- coding: utf-8 -*-
"""
Created on Mon Mar 06 10:55:17 2017
@author: eastmount
"""
import pandas as pd
import matplotlib.pyplot as plt

# index_col: name of the column used as the row index
data = pd.read_csv("32.csv", index_col='year')
# Show the shape and the first 6 rows
print(data.shape)
print(data.head(6))

plt.rcParams['font.sans-serif'] = ['simHei']  # display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False    # display the minus sign correctly
data.plot()
plt.savefig(u'時序圖.png', dpi=500)
plt.show()

# Extract the Guiyang column and plot it
gy = data['Guiyang']
print(u'Guiyang data:')
print(gy)
gy.plot()
plt.show()
data['Guiyang'] selects a single column, which is then plotted as a line chart of Guiyang prices by year.
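Once a column has been selected, it is an ordinary pandas Series, so simple comparisons are one-liners. The sketch below is my own addition rather than part of the course code: it rebuilds the Beijing and Guiyang columns inline (values copied from the table above) and computes the year-over-year change for Guiyang and the Guiyang-to-Beijing price ratio. The variable names growth and ratio are illustrative.

```python
import pandas as pd

years = list(range(2002, 2015))
guiyang = [1643.00, 1949.00, 1801.68, 2168.90, 2372.66, 2901.63, 3149.00,
           3762.00, 4410.00, 5069.52, 4846.14, 5025.00, 5608.00]
beijing = [4764.00, 4737.00, 5020.93, 6788.09, 8279.51, 11553.26, 12418.00,
           13799.00, 17782.00, 16851.95, 17021.63, 18553.00, 18833.00]

# Same layout as the CSV after read_csv(..., index_col='year').
data = pd.DataFrame({"Beijing": beijing, "Guiyang": guiyang},
                    index=pd.Index(years, name="year"))

gy = data["Guiyang"]                        # selecting one column gives a Series
growth = gy.pct_change()                    # year-over-year change; first year is NaN
ratio = data["Guiyang"] / data["Beijing"]   # element-wise, aligned on the year index

print(type(gy).__name__)
print(growth.round(3))
print(ratio.round(3))
```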
With the same dataset, calling the bar function draws the corresponding bar chart. Note that the x-axis is the year, so the plot uses two series: the years and the Guiyang prices.
# -*- coding: utf-8 -*-
"""
Created on Mon Mar 06 10:55:17 2017
@author: eastmount
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # missing in the original listing; needed for plt.bar

# index_col: name of the column used as the row index
data = pd.read_csv("32.csv", index_col='year')
# Show the shape and the first 6 rows
print(data.shape)
print(data.head(6))

# Extract the Guiyang column
gy = data['Guiyang']
print(u'Guiyang data:')
print(gy)

x = ['2002', '2003', '2004', '2005', '2006', '2007', '2008',
     '2009', '2010', '2011', '2012', '2013', '2014']
N = 13
ind = np.arange(N)  # bar positions 0..12
width = 0.35
plt.bar(ind, gy, width, color='r', label='sum num', align='center')
# Put the year labels under the bars, rotated by 40 degrees
plt.xticks(ind, x, rotation=40)
plt.title('The price of Guiyang')
plt.xlabel('year')
plt.ylabel('price')
plt.savefig('guiyang.png', dpi=400)
plt.show()
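The output is a bar chart of Guiyang prices by year, saved as guiyang.png.
Here is an extra snippet that uses hist to draw a histogram: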
import numpy as np
import pylab as pl
# make an array of random numbers with a gaussian distribution with
# mean = 5.0
# standard deviation = 3.0
# number of points = 1000
data = np.random.normal(5.0, 3.0, 1000)
# make a histogram of the data array
pl.hist(data, histtype='stepfilled')  # filled steps, no black outline around bars
# make plot labels
pl.xlabel('data')
pl.show()
The output is a histogram of the 1000 samples.
Recommended article: http://www.cnblogs.com/jasonfreak/p/5441512.html
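To see the numbers behind a histogram, np.histogram returns the bin counts and bin edges without drawing anything. This is a small sketch of my own, not from the course material; it assumes the default of 10 equal-width bins and uses a fixed seed so the run is repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed so the run is repeatable
data = rng.normal(5.0, 3.0, 1000)    # mean 5.0, standard deviation 3.0

# By default np.histogram splits the data range into 10 equal-width bins.
counts, edges = np.histogram(data)

# Each bin is half-open [left, right), except the last one, which also
# includes its right edge.
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print("%6.2f to %6.2f: %d" % (left, right, n))
```

There is always one more edge than there are bins, and the counts add up to the number of samples.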
3. Plotting a Time Series Autocorrelation Chart in Python
The core code is as follows:
# -*- coding: utf-8 -*-
"""
Created on Mon Mar 06 10:55:17 2017
@author: yxz15
"""
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.stattools import adfuller as ADF

data = pd.read_csv("32.csv", index_col='year')
# Show the shape and the first 6 rows
print(data.shape)
print(data.head(6))

plt.rcParams['font.sans-serif'] = ['simHei']
plt.rcParams['axes.unicode_minus'] = False
data.plot()
plt.savefig(u'時序圖.png', dpi=500)
plt.show()

# Autocorrelation plot of the Guiyang series
gy = data['Guiyang']
print(gy)
fig = plot_acf(gy)  # plot_acf returns the matplotlib Figure
fig.savefig(u'貴陽自相關圖.png', dpi=300)  # save before showing
plt.show()

# Augmented Dickey-Fuller unit root test
print('ADF:', ADF(gy))
The script shows the autocorrelation plot for Guiyang and prints the ADF test result.
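For readers who want to see what the autocorrelation plot is built from, here is a minimal sketch that computes the sample autocorrelation by hand with numpy. It uses the usual estimator, the lag-k autocovariance divided by the lag-0 autocovariance. I believe this matches what plot_acf draws by default, but treat that as an assumption. The helper name autocorr is hypothetical, and the Guiyang values are copied from the table above.

```python
import numpy as np

# Guiyang prices for 2002 to 2014, copied from the dataset.
gy = np.array([1643.00, 1949.00, 1801.68, 2168.90, 2372.66, 2901.63, 3149.00,
               3762.00, 4410.00, 5069.52, 4846.14, 5025.00, 5608.00])


def autocorr(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    # Lag k pairs each value with the value k steps earlier.
    return np.array([np.sum(centered[k:] * centered[:len(x) - k]) / denom
                     for k in range(max_lag + 1)])


acf = autocorr(gy, 5)
print(acf.round(3))
```

The value at lag 0 is always 1. For a series with a strong upward trend such as this one, the first few lags stay positive, which is one informal sign that the series is not stationary.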
Recommended reading on time series:
Python Statsmodels package: time series analysis with the ARIMA model
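4. Cluster Analysis of a Dalian Commodity Exchange Dataset
This part mainly gives you a URL for downloading datasets. Earlier articles mentioned that sklearn ships with some datasets and that the UCI website offers many more. Here we look at a dataset from the Dalian Commodity Exchange.
URL: http://www.dce.com.cn/dalianshangpin/xqsj/lssj/index.html#
For example, download the "coke" dataset, name it "35.csv", and then run a cluster analysis on it.
The code is as follows: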
# -*- coding: utf-8 -*-
"""
Created on Mon Mar 06 10:19:15 2017
@author: yxz15
"""
# Part 1: load the dataset
import pandas as pd
Coke1 = pd.read_csv("35.csv")
print(Coke1[:4])

# Part 2: clustering
from sklearn.cluster import KMeans
clf = KMeans(n_clusters=3)
pre = clf.fit_predict(Coke1)
print(pre[:4])

# Part 3: dimensionality reduction to 2 components for plotting
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
newData = pca.fit_transform(Coke1)
print(newData[:4])
x1 = [n[0] for n in newData]
x2 = [n[1] for n in newData]

# Part 4: plot with matplotlib, coloring each point by its cluster label
import matplotlib.pyplot as plt
plt.xlabel("x feature")
plt.ylabel("y feature")
plt.scatter(x1, x2, c=pre, marker='x')
plt.savefig("bankloan.png", dpi=400)
plt.show()
The output is a scatter plot of the two PCA components, with each point colored by its cluster.
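Since 35.csv has to be downloaded separately, here is a self-contained sketch of the same cluster-then-reduce steps on synthetic data. The three blobs, the four features, and the fixed seeds are all assumptions of mine to make the example repeatable; the real file may also contain non-numeric columns (dates, for instance) that would need to be dropped before calling KMeans.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Three well-separated synthetic blobs with 4 numeric features each,
# standing in for the downloaded dataset.
centers = np.array([[0, 0, 0, 0],
                    [10, 10, 10, 10],
                    [-10, 10, -10, 10]], dtype=float)
X = np.vstack([c + rng.normal(0.0, 1.0, size=(50, 4)) for c in centers])

# Cluster in the original 4-dimensional space.
clf = KMeans(n_clusters=3, n_init=10, random_state=0)
pre = clf.fit_predict(X)

# Reduce to 2 dimensions only for plotting; the labels come from step above.
pca = PCA(n_components=2)
newData = pca.fit_transform(X)

print(pre[:4])
print(newData.shape)
```

Clustering happens on the full feature set, and PCA is used only to get two coordinates per point for the scatter plot, which is the same order of operations as in the listing above.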
For plotting PCA dimensionality reduction, see this blog post:
http://blog.csdn.net/xiaolewennofollow/article/details/46127485
The code is as follows:
# -*- coding: utf-8 -*-
"""
Created on Mon Mar 06 21:47:46 2017
@author: yxz
"""
import numpy as np
import matplotlib.pyplot as plt


def loadDataSet(fileName, delim='\t'):
    # Read a delimited text file into a matrix of floats
    with open(fileName) as fr:
        stringArr = [line.strip().split(delim) for line in fr.readlines()]
    datArr = [list(map(float, line)) for line in stringArr]
    return np.asmatrix(datArr)


def pca(dataMat, topNfeat=9999999):
    meanVals = np.mean(dataMat, axis=0)
    meanRemoved = dataMat - meanVals                  # center the data
    covMat = np.cov(meanRemoved, rowvar=0)            # covariance matrix
    eigVals, eigVets = np.linalg.eig(np.asmatrix(covMat))
    eigValInd = np.argsort(eigVals)                   # ascending order
    eigValInd = eigValInd[:-(topNfeat + 1):-1]        # indices of the topNfeat largest
    redEigVects = eigVets[:, eigValInd]
    print(meanRemoved)
    print(redEigVects)
    lowDDatMat = meanRemoved * redEigVects            # project into the reduced space
    reconMat = (lowDDatMat * redEigVects.T) + meanVals  # map back to the original space
    return lowDDatMat, reconMat


def plotPCA(dataMat, reconMat):
    datArr = np.array(dataMat)
    reconArr = np.array(reconMat)
    fig = plt.figure()
    ax = fig.add_subplot(111)
    # Original points as red triangles, reconstructed points as yellow circles
    ax.scatter(datArr[:, 0], datArr[:, 1], s=90, c='red', marker='^')
    ax.scatter(reconArr[:, 0], reconArr[:, 1], s=50, c='yellow', marker='o')
    plt.title('PCA')
    plt.savefig('ccc.png', dpi=400)
    plt.show()


dataMat = loadDataSet('41.txt')
lowDMat, reconMat = pca(dataMat, 1)
plotPCA(dataMat, reconMat)
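In the resulting figure, the original points are red triangles and the reconstructed points are yellow circles.
PCA reduces the dimensionality of the dataset: the red triangle points are projected onto the yellow line, so a plane is reduced to a straight line. In essence, PCA diagonalizes the covariance matrix: it decomposes an n*n symmetric matrix into its eigenvectors and then projects the data onto those basis vectors (here only the first one is kept).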
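The "diagonalize the covariance matrix" step can be checked numerically. The sketch below is my own illustration: it uses only the first eight rows of 41.txt and np.linalg.eigh (the eigen-solver for symmetric matrices) rather than the eig call in the listing above. Variable names such as top, low_d, and recon are illustrative.

```python
import numpy as np

# First eight rows of 41.txt.
X = np.array([[61.5, 55], [59.8, 61], [56.9, 65], [62.4, 58],
              [63.3, 58], [62.8, 57], [62.3, 57], [61.9, 55]], dtype=float)

mean = X.mean(axis=0)
centered = X - mean
cov = np.cov(centered, rowvar=False)      # 2 x 2 symmetric covariance matrix

# eigh returns eigenvalues in ascending order, so reverse to put the
# largest-variance direction first.
eig_vals, eig_vecs = np.linalg.eigh(cov)
order = np.argsort(eig_vals)[::-1]
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]

# Diagonalization: V^T C V has the eigenvalues on the diagonal, zeros elsewhere.
diag = eig_vecs.T @ cov @ eig_vecs

top = eig_vecs[:, :1]                     # first principal direction, shape (2, 1)
low_d = centered @ top                    # 1-D coordinates, shape (8, 1)
recon = low_d @ top.T + mean              # back in 2-D; all points lie on one line

print(np.round(diag, 6))
print(low_d.shape, recon.shape)
```

The reconstructed points have no component along the discarded second eigenvector, which is why they all fall on a single straight line in the plot.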
The dataset is 41.txt (two tab-separated columns), with the following values:
61.5	55
59.8	61
56.9	65
62.4	58
63.3	58
62.8	57
62.3	57
61.9	55
65.1	61
59.4	61
64	55
62.8	56
60.4	61
62.2	54
60.2	62
60.9	58
62	54
63.4	54
63.8	56
62.7	59
63.3	56
63.8	55
61	57
59.4	62
58.1	62
60.4	58
62.5	57
62.2	57
60.5	61
60.9	57
60	57
59.8	57
60.7	59
59.5	58
61.9	58
58.2	59
64.1	59
64	54
60.8	59
61.8	55
61.2	56
61.1	56
65.2	56
58.4	63
63.1	56
62.4	58
61.8	55
63.8	56
63.3	60
60.7	60
60.9	61
61.9	54
60.9	55
61.6	58
59.3	62
61	59
59.3	61
62.6	57
63	57
63.2	55
60.9	57
62.6	59
62.5	57
62.1	56
61.5	59
61.4	56
62	55.3
63.3	57
61.8	58
60.7	58
61.5	60
63.1	56
62.9	59
62.5	57
63.7	57
59.2	60
59.9	58
62.4	54
62.8	60
62.6	59
63.4	59
62.1	60
62.9	58
61.6	56
57.9	60
62.3	59
61.2	58
60.8	59
60.7	58
62.9	58
62.5	57
55.1	69
61.6	56
62.4	57
63.8	56
57.5	58
59.4	62
66.3	62
61.6	59
61.5	58
63.2	56
59.9	54
61.6	55
61.7	58
62.9	56
62.2	55
63	59
62.3	55
58.8	57
62	55
61.4	57
62.2	56
63	58
62.2	59
62.6	56
62.7	53
61.7	58
62.4	54
60.7	58
59.9	59
62.3	56
62.3	54
61.7	63
64.5	57
65.3	55
61.6	60
61.4	56
59.6	57
64.4	57
65.7	60
62	56
63.6	58
61.9	59
62.6	60
61.3	60
60.9	60
60.1	62
61.8	59
61.2	57
61.9	56
60.9	57
59.8	56
61.8	55
60	57
61.6	55
62.1	64
63.3	59
60.2	56
61.1	58
60.9	57
61.7	59
61.3	56
62.5	60
61.4	59
62.9	57
62.4	57
60.7	56
60.7	58
61.5	58
59.9	57
59.2	59
60.3	56
61.7	60
61.9	57
61.9	55
60.4	59
61	57
61.5	55
61.7	56
59.2	61
61.3	56
58	62
60.2	61
61.7	55
62.7	55
64.6	54
61.3	61
63.7	56.4
62.7	58
62.2	57
61.6	56
61.5	57
61.8	56
60.7	56
59.7	60.5
60.5	56
62.7	58
62.1	58
62.8	57
63.8	58
57.8	60
62.1	55
61.1	60
60	59
61.2	57
62.7	59
61	57
61	58
61.4	57
61.8	61
59.9	63
61.3	58
60.5	58
64.1	59
67.9	60
62.4	58
63.2	60
61.3	55
60.8	56
61.7	56
63.6	57
61.2	58
62.1	54
61.5	55
61.4	59
61.8	60
62.2	56
61.2	56
60.6	63
57.5	64
61.3	56
57.2	62
62.9	60
63.1	58
60.8	57
62.7	59
62.8	60
55.1	67
61.4	59
62.2	55
63	54
63.7	56
63.6	58
62	57
61.5	56
60.5	60
61.1	60
61.8	56
63.3	56
59.4	64
62.5	55
64.5	58
62.7	59
64.2	52
63.7	54
60.4	58
61.8	58
63.2	56
61.6	56
61.6	56
60.9	57
61	61
62.1	57
60.9	60
61.3	60
65.8	59
61.3	56
58.8	59
62.3	55
60.1	62
61.8	59
63.6	55.8
62.2	56
59.2	59
61.8	59
61.3	55
62.1	60
60.7	60
59.6	57
62.2	56
60.6	57
62.9	57
64.1	55
61.3	56
62.7	55
63.2	56
60.7	56
61.9	60
62.6	55
60.7	60
62	60
63	57
58	59
62.9	57
58.2	60
63.2	58
61.3	59
60.3	60
62.7	60
61.3	58
61.6	60
61.9	55
61.7	56
61.9	58
61.8	58
61.6	56
58.8	66
61	57
67.4	60
63.4	60
61.5	59
58	62
62.4	54
61.9	57
61.6	56
62.2	59
62.2	58
61.3	56
62.3	57
61.8	57
62.5	59
62.9	60
61.8	59
62.3	56
59	70
60.7	55
62.5	55
62.7	58
60.4	57
62.1	58
57.8	60
63.8	58
62.8	57
62.2	58
62.3	58
59.9	58
61.9	54
63	55
62.4	58
62.9	58
63.5	56
61.3	56
60.6	54
65.1	58
62.6	58
58	62
62.4	61
61.3	57
59.9	60
60.8	58
63.5	55
62.2	57
63.8	58
64	57
62.5	56
62.3	58
61.7	57
62.2	58
61.5	56
61	59
62.2	56
61.5	54
67.3	59
61.7	58
61.9	56
61.8	58
58.7	66
62.5	57
62.8	56
61.1	68
64	57
62.5	60
60.6	58
61.6	55
62.2	58
60	57
61.9	57
62.8	57
62	57
66.4	59
63.4	56
60.9	56
63.1	57
63.1	59
59.2	57
60.7	54
64.6	56
61.8	56
59.9	60
61.7	55
62.8	61
62.7	57
63.4	58
63.5	54
65.7	59
68.1	56
63	60
59.5	58
63.5	59
61.7	58
62.7	58
62.8	58
62.4	57
61	59
63.1	56
60.7	57
60.9	59
60.1	55
62.9	58
63.3	56
63.8	55
62.9	57
63.4	60
63.9	55
61.4	56
61.9	55
62.4	55
61.8	58
61.5	56
60.4	57
61.8	55
62	56
62.3	56
61.6	56
60.6	56
58.4	62
61.4	58
61.9	56
62	56
61.5	57
62.3	58
60.9	61
62.4	57
55	61
58.6	60
62	57
59.8	58
63.4	55
64.3	58
62.2	59
61.7	57
61.1	59
61.5	56
58.5	62
61.7	58
60.4	56
61.4	56
61.5	55
61.4	56
65	56
56	60
60.2	59
58.3	58
53.1	63
60.3	58
61.4	56
60.1	57
63.4	55
61.5	59
62.7	56
62.5	55
61.3	56
60.2	56
62.7	57
62.3	58
61.5	56
59.2	59
61.8	59
61.3	55
61.4	58
62.8	55
62.8	64
62.4	61
59.3	60
63	60
61.3	60
59.3	62
61	57
62.9	57
59.6	57
61.8	60
62.7	57
65.3	62
63.8	58
62.3	56
59.7	63
64.3	60
62.9	58
62	57
61.6	59
61.9	55
61.3	58
63.6	57
59.6	61
62.2	59
61.7	55
63.2	58
60.8	60
60.3	59
60.9	60
62.4	59
60.2	60
62	55
60.8	57
62.1	55
62.7	60
61.3	58
60.2	60
60.7	56
Original article: http://blog.csdn.net/eastmount/article/details/60675865