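Description
collapse converts the dataset in memory into a dataset of means, sums, medians, and other summary statistics, with one observation per group defined by by() (or a single observation if by() is not given). clist must refer to numeric variables.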
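To make the idea concrete, here is a minimal Python sketch of what `collapse (mean) ..., by(...)` computes. It illustrates the semantics only and is not Stata's implementation: `collapse_mean` is a hypothetical helper, the rows are made-up data, and missing values are represented as `None` and skipped, as Stata skips missing values when computing a mean.

```python
from collections import defaultdict


def collapse_mean(rows, by, variables):
    """Replace `rows` (a list of dicts) with one row per by-group of means.

    Hypothetical helper illustrating `collapse (mean) variables, by(by)`;
    None stands for a missing value and is excluded from the mean.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[b] for b in by)  # by() may list several variables
        groups[key].append(row)

    result = []
    for key in sorted(groups):  # collapsed output is ordered by the by-variables
        out = dict(zip(by, key))
        for var in variables:
            values = [r[var] for r in groups[key] if r[var] is not None]
            # A group with no nonmissing values gets a missing result
            out[var] = sum(values) / len(values) if values else None
        result.append(out)
    return result


rows = [
    {"year": 1, "gpa": 3.0, "hour": 30},
    {"year": 1, "gpa": 3.5, "hour": 20},
    {"year": 2, "gpa": 2.0, "hour": 40},
]
print(collapse_mean(rows, by=["year"], variables=["gpa", "hour"]))
```

The three input rows become two output rows, one per value of year, which is exactly the "replace the data with summary statistics" behavior described above.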
Syntax and options
collapse clist [if] [in] [weight] [, options]
where clist is either
[(stat)] varlist [ [(stat)] ... ]
[(stat)] target_var=varname [target_var=varname ...] [ [(stat)] ...]
or any combination of the varlist or target_var forms, and stat is one of
stat | Description |
---|---|
mean | means (default) |
median | medians |
p1 | 1st percentile |
p2 | 2nd percentile |
... | 3rd-49th percentiles |
p50 | 50th percentile (same as median) |
... | 51st-97th percentiles |
p98 | 98th percentile |
p99 | 99th percentile |
sd | standard deviations |
semean | standard error of the mean (sd/sqrt(n)) |
sebinomial | standard error of the mean, binomial (sqrt(p(1-p)/n)) |
sepoisson | standard error of the mean, Poisson (sqrt(mean)) |
sum | sums |
rawsum | sums, ignoring optionally specified weight except observations with a weight of zero are excluded |
count | number of nonmissing observations |
percent | percentage of nonmissing observations |
max | maximums |
min | minimums |
iqr | interquartile range |
first | first value |
last | last value |
firstnm | first nonmissing value |
lastnm | last nonmissing value |
If no stat is specified, mean is assumed.
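The formulas in the stat list above can be checked with a small Python sketch. This is not Stata's code: `summarize` is a hypothetical helper, the data are made up, missing values are `None`, and weights are ignored. The percentile-based statistics (median, pN, iqr) and percent are left out, because reproducing Stata's exact percentile definition and cross-group percentages is beyond this sketch.

```python
import math


def summarize(values):
    """Illustrative, unweighted versions of several collapse statistics.

    `values` holds one group's observations in their current sort order,
    with None standing for a missing value. Assumes at least two nonmissing
    values, because sd divides by n - 1.
    """
    nonmissing = [v for v in values if v is not None]
    n = len(nonmissing)
    mean = sum(nonmissing) / n
    # Sample standard deviation with divisor n - 1
    sd = math.sqrt(sum((v - mean) ** 2 for v in nonmissing) / (n - 1))
    stats = {
        "mean": mean,
        "sum": sum(nonmissing),
        "count": n,                   # nonmissing observations only
        "min": min(nonmissing),
        "max": max(nonmissing),
        "sd": sd,
        "semean": sd / math.sqrt(n),  # sd/sqrt(n)
        "first": values[0],           # may itself be missing
        "last": values[-1],
        "firstnm": nonmissing[0],     # first nonmissing value
        "lastnm": nonmissing[-1],
    }
    # sqrt(mean) is only defined for a nonnegative mean (count data)
    if mean >= 0:
        stats["sepoisson"] = math.sqrt(mean)
    # sqrt(p(1-p)/n) assumes a 0/1 variable, whose mean p is a proportion
    if set(nonmissing) <= {0, 1}:
        stats["sebinomial"] = math.sqrt(mean * (1 - mean) / n)
    return stats


print(summarize([None, 1, 0, 1, 1, None]))
```

Note how first and firstnm differ when the group starts with a missing value: first returns the missing value itself, while firstnm skips ahead to the first nonmissing one.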
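Option | Description |
---|---|
by(varlist) | groups over which the statistics are computed; one or more variables may be given |
cw | casewise deletion: drops every observation that has a missing value in any variable of clist; without cw, each statistic uses all available observations of its own variable |
fast | do not restore the original dataset if the user presses Break; intended for programmers, so most users can ignore it |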
Examples
use https://www.stata-press.com/data/r16/college,clear
list, sep(4)
Compute the mean GPA (gpa) for each year:
collapse (mean) gpa , by(year)
list
統(tǒng)計(jì)出每個(gè)年級(jí)的平均績點(diǎn)(gap),并命名為mean_gpa
use https://www.stata-press.com/data/r16/college,clear
collapse (mean) mean_gpa=gpa, by(year)
list
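The target_var=varname form matters most when you want several statistics of the same variable, for example `collapse (mean) mean_gpa=gpa (sd) sd_gpa=gpa, by(year)`, because each result needs its own name. The Python sketch below mirrors that idea with a hypothetical `CLIST` mapping from target name to (statistic, source variable); the helper, the mapping, and the data are illustrative assumptions, not Stata internals.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical mini "clist": target name -> (statistic, source variable),
# mirroring `(mean) mean_gpa=gpa (sd) sd_gpa=gpa`
CLIST = {
    "mean_gpa": ("mean", "gpa"),
    "sd_gpa": ("sd", "gpa"),
}
STATS = {"mean": mean, "sd": stdev}  # stdev uses the n - 1 divisor


def collapse(rows, clist, by):
    """One output row per by-group, one column per target name in clist."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    result = []
    for key in sorted(groups):
        out = {by: key}
        for target, (stat, source) in clist.items():
            # Missing (None) values of the source variable are skipped
            values = [r[source] for r in groups[key] if r[source] is not None]
            out[target] = STATS[stat](values)
        result.append(out)
    return result


rows = [
    {"year": 1, "gpa": 3.0},
    {"year": 1, "gpa": 3.5},
    {"year": 1, "gpa": 2.5},
    {"year": 2, "gpa": 3.8},
    {"year": 2, "gpa": 3.2},
]
print(collapse(rows, CLIST, by="year"))
```

The source variable gpa feeds two differently named outputs, which is only possible because each statistic is given its own target name.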
Compute the mean GPA (gpa) and the mean study hours (hour) for each year:
use https://www.stata-press.com/data/r16/college,clear
collapse (mean) gpa hour, by(year)
list
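Next, take weights into account: the frequency weight is the number of students, stored in the variable number and specified as [fw=number]. collapse allows all four weight types (aweights, fweights, iweights, and pweights); if no type is given, aweights are assumed. Weight normalization affects only the sum, count, sd, semean, and sebinomial statistics.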
use https://www.stata-press.com/data/r16/college,clear
collapse (mean) gpa [fw=number], by(year)
list
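With a frequency weight, each observation counts as if it appeared w times, so the weighted mean is the sum of w*x divided by the sum of w. The Python sketch below, a hypothetical helper on made-up data with all values assumed nonmissing, contrasts the weighted mean and sum with rawsum, which ignores the weight except that observations with a weight of zero are excluded.

```python
def weighted_stats(values, weights):
    """Illustrative fweight versions of mean, sum and rawsum for one group.

    Hypothetical helper: `weights` are nonnegative integer frequency weights,
    and all values are assumed nonmissing.
    """
    total_w = sum(weights)
    wsum = sum(w * x for x, w in zip(values, weights))
    return {
        # Each observation counts w times, as in the expanded data
        "mean": wsum / total_w,
        "sum": wsum,
        # rawsum ignores the weights, but drops observations whose weight is 0
        "rawsum": sum(x for x, w in zip(values, weights) if w != 0),
    }


gpa = [3.0, 2.0, 4.0]
number = [3, 1, 0]
print(weighted_stats(gpa, number))
```

Because the mean is a ratio of two weighted sums, rescaling all weights by a constant (which is what aweight normalization does) leaves it unchanged. That is consistent with the statement above that normalization affects the sum, count, sd, semean, and sebinomial, but not the mean.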
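When variables contain missing values, the cw option drops every observation (row) that has a missing value in any of the collapsed variables, so all statistics are computed from the remaining rows. Without cw, a missing value affects only the statistics of the variable that contains it. Again using the college data, we set gpa to missing in observations 2 through 4: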
use https://www.stata-press.com/data/r16/college,clear
replace gpa= . in 2/4
list in 1/5
Next, compute the means of gpa and hour by year. First, here is the result without the cw option:
collapse (mean) gpa hour , by(year)
list
Compared with the means obtained earlier without missing values, only the year-1 mean of gpa changes, because observations 2 through 4 all belong to year 1; the mean of hour is the same as before. Now with the cw option:
use https://www.stata-press.com/data/r16/college,clear
replace gpa= . in 2/4
collapse (mean) gpa hour, by(year) cw
list
With cw, only the first observation remains for year 1 (the other year-1 rows are dropped), so the year-1 mean of hour changes as well: it is computed from the first row of the original data alone.
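The difference between the two runs can be reproduced with a small Python sketch. `collapse_means` is a hypothetical helper and the rows are made-up data shaped like the example (gpa missing in two year-1 rows); it is not Stata's implementation. With cw=True, rows are dropped before grouping if any collapsed variable is missing; otherwise each variable uses all of its own nonmissing values.

```python
def collapse_means(rows, by, variables, cw=False):
    """Group means with and without casewise deletion (None = missing).

    Hypothetical helper mirroring `collapse (mean) variables, by(by) [cw]`.
    """
    if cw:
        # Casewise deletion: drop a row if ANY collapsed variable is missing
        rows = [r for r in rows if all(r[v] is not None for v in variables)]
    groups = {}
    for r in rows:
        groups.setdefault(r[by], []).append(r)
    result = {}
    for key in sorted(groups):
        result[key] = {}
        for v in variables:
            # Each variable averages its own nonmissing values in the group
            vals = [r[v] for r in groups[key] if r[v] is not None]
            result[key][v] = sum(vals) / len(vals) if vals else None
    return result


rows = [
    {"year": 1, "gpa": 3.0, "hour": 30},
    {"year": 1, "gpa": None, "hour": 20},  # gpa set to missing
    {"year": 1, "gpa": None, "hour": 25},  # gpa set to missing
    {"year": 2, "gpa": 2.5, "hour": 25},
]
print(collapse_means(rows, "year", ["gpa", "hour"]))
print(collapse_means(rows, "year", ["gpa", "hour"], cw=True))
```

Without cw, the year-1 mean of hour still uses all three rows; with cw, it is based on the first row only, while year 2 is unaffected in both runs.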
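References:
The examples in this article are taken from the WeChat official account Stata and Python数据分析 (Stata and Python Data Analysis):
利用collapse命令转化原始数据 (Using the collapse command to transform raw data)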