import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
def normfun(x, mu, sigma):
    # Normal (Gaussian) probability density with mean mu and standard deviation sigma
    pdf = np.exp(-((x-mu)**2) / (2*sigma**2)) / (sigma * np.sqrt(2*np.pi))
    return pdf
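A quick sanity check on the density function can be useful before fitting it to real data. The sketch below restates `normfun` so it runs on its own, then confirms numerically that the curve integrates to roughly 1 and peaks at the mean. The values `mu = 100` and `sigma = 15` and the grid spacing are illustrative assumptions, not taken from the CSV files.

```python
import numpy as np

def normfun(x, mu, sigma):
    # Normal probability density, same formula as in the notebook
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

mu, sigma = 100.0, 15.0   # illustrative parameters (assumed)
dx = 0.01                 # grid spacing for the numerical integral
grid = np.arange(mu - 8 * sigma, mu + 8 * sigma, dx)

# Riemann-sum approximation of the area under the curve
area = np.sum(normfun(grid, mu, sigma)) * dx

# The density is highest at x = mu, where it equals 1 / (sigma * sqrt(2*pi))
peak = normfun(mu, mu, sigma)

print("area under curve:", area)
print("peak height:", peak)
```

If the area is far from 1, the formula or the grid range is likely wrong.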
iq_data = pd.read_csv('IQscore.csv')
iq = iq_data['IQ']
len(iq)
70
max(iq)
140
min(iq)
69
mean = iq.mean()
std = iq.std()
x = np.arange(60,150,1)
y = normfun(x,mean,std)
plt.plot(x,y)
plt.hist(iq, bins = 10, rwidth = 0.9, density = True)
plt.title('IQ distribution')
plt.xlabel('IQ score')
plt.ylabel('Probability density')
plt.show()
[Figure output_5_0.png: histogram of IQ scores with the fitted normal curve overlaid]
std = iq.std()
std
15.015905990389498
mean
100.82857142857142
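Interpretation of the data:
- The mean of this dataset is 100.83 and the standard deviation is 15.02.
- Most of the data falls between 85 and 115, which is roughly one standard deviation on either side of the mean.
- The farther a score lies from the mean, the fewer observations there are. Put another way, the more standard deviations a value is from the mean, the rarer it is.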
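The claim that most values sit within one standard deviation of the mean can be checked directly. The following is a minimal sketch: `fraction_within` is a hypothetical helper, and the sample is synthetic data drawn with NumPy as a stand-in for the `IQ` column, since `IQscore.csv` is not reproduced here.

```python
import numpy as np

def fraction_within(values, k=1.0):
    """Share of observations within k sample standard deviations of the mean."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    std = values.std(ddof=1)  # ddof=1 matches the pandas Series.std() default
    inside = np.abs(values - mean) <= k * std
    return inside.mean()

# Synthetic stand-in for the IQ scores (assumed parameters, not the real data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=100, scale=15, size=10_000)

for k in (1, 2, 3):
    print(f"within {k} std: {fraction_within(sample, k):.3f}")
```

With the real data, `fraction_within(iq)` would give the share of scores between about 85 and 115.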
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
def normfun(x, mu, sigma):
    # Normal (Gaussian) probability density with mean mu and standard deviation sigma
    pdf = np.exp(-((x-mu)**2) / (2*sigma**2)) / (sigma * np.sqrt(2*np.pi))
    return pdf
data = pd.read_csv('stakes.csv')
time = data['time']
len(time)
89
min(time)
146.0
max(time)
153.19999999999999
mean = time.mean()
std = time.std()
x = np.arange(145,155,0.1)
y = normfun(x,mean,std)
plt.plot(x,y)
plt.hist(time, bins = 10, rwidth = 0.9, density = True)
plt.title('Time')
plt.xlabel('Time')
plt.ylabel('Probability density')
plt.show()
[Figure output_14_0.png: histogram of times with the fitted normal curve overlaid]
mean
149.22101123595513
std
1.6278164717748154
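Interpretation of the data:
- After loading the data, len() gives the size of the dataset, which provides a rough first impression of it.
- min() and max() give the smallest and largest values, which helps establish the range of the data.
- From the plot together with the mean and standard deviation, the data is concentrated between 147.59 and 150.85, that is, within one standard deviation of the mean.
- The farther from the mean, the fewer observations there are.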
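The exploratory steps listed above (size, range, mean, standard deviation, and the mean ± 1 std interval) can be gathered into one helper. This is only a sketch: `describe_column` is a hypothetical name, and the demo values are made up for illustration rather than taken from `stakes.csv`.

```python
import pandas as pd

def describe_column(series):
    """Collect size, range, mean, std, and the mean +/- 1 std interval."""
    mean = series.mean()
    std = series.std()  # sample standard deviation (ddof=1), the pandas default
    return {
        "n": len(series),
        "min": series.min(),
        "max": series.max(),
        "mean": mean,
        "std": std,
        "low": mean - std,   # lower edge of the one-std band
        "high": mean + std,  # upper edge of the one-std band
    }

# Made-up values for illustration only
demo = pd.Series([146.0, 148.5, 149.0, 149.5, 153.2])
summary = describe_column(demo)
for key, value in summary.items():
    print(key, value)
```

Applied to the notebook's `time` column, the `low` and `high` entries would correspond to the 147.59 and 150.85 bounds quoted above.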