1 The tidyverse system
https://www.math.pku.edu.cn/teachers/lidf/docs/Rbook/html/_Rbook/summary-manip.html#summm-tidyv
(full version)
Loading the tidyverse package automatically attaches readr, dplyr, and tidyr (among others) and makes the magrittr pipe operator %>% available:
library(tidyverse)
The examples below use the following data on one class of students, stored in the file class.csv:
name,sex,age,height,weight
Alice,F,13,56.5,84
Becka,F,13,65.3,98
Gail,F,14,64.3,90
Karen,F,12,56.3,77
Kathy,F,12,59.8,84.5
Mary,F,15,66.5,112
Sandy,F,11,51.3,50.5
Sharon,F,15,62.5,112.5
Tammy,F,14,62.8,102.5
Alfred,M,14,69,112.5
Duke,M,14,63.5,102.5
Guido,M,15,67,133
James,M,12,57.3,83
Jeffrey,M,13,62.5,84
John,M,12,59,99.5
Philip,M,16,72,150
Robert,M,12,64.8,128
Thomas,M,11,57.5,85
William,M,15,66.5,112
Read it in as a tibble:
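d.class <- read_csv(
  "class.csv",
  col_types = cols(
    .default = col_double(),                 # all other columns are numeric
    name = col_character(),                  # name stays a character column
    sex = col_factor(levels = c("M", "F"))   # sex is a factor with levels M, F
  )
)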
This data frame has 19 observations and the following 5 variables:
- name
- sex
- age
- height
- weight
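To confirm that the columns were read with the intended types, the tibble can be inspected after reading. The following is a minimal, self-contained sketch: it writes a three-row excerpt of the class data to a temporary file and reads it back with the same col_types specification. The temporary file and the name d.small are illustrative assumptions, not part of the original example.

```r
library(readr)

# Write a three-row excerpt of class.csv to a temporary file (illustrative only).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "name,sex,age,height,weight",
  "Alice,F,13,56.5,84",
  "Alfred,M,14,69,112.5",
  "Sandy,F,11,51.3,50.5"
), tmp)

# Same column specification as for d.class: numeric by default,
# name as character, sex as a factor with levels M, F.
d.small <- read_csv(
  tmp,
  col_types = cols(
    .default = col_double(),
    name = col_character(),
    sex = col_factor(levels = c("M", "F"))
  )
)

dim(d.small)            # number of rows and columns
sapply(d.small, class)  # class of each column
levels(d.small$sex)     # factor levels, in the order given in col_types
```

Applied to the full d.class, the same three calls should report 19 rows and 5 columns.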
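The NHANES extension package for R provides a larger example data frame, NHANES, which can be regarded as a random sample of the U.S. population excluding hospitalized patients. It has 10000 observations and 76 variables, covering personal health and nutrition information. It is intended for teaching only and is not adequate as data for rigorous scientific research. For details on the original data, see http://www.cdc.gov/nchs/nhanes.htm. Load the NHANES data frame: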
library(NHANES)
data(NHANES)
print(dim(NHANES))
## [1] 10000 76
print(names(NHANES))
## [1] "ID" "SurveyYr" "Gender" "Age"
## [5] "AgeDecade" "AgeMonths" "Race1" "Race3"
## [9] "Education" "MaritalStatus" "HHIncome" "HHIncomeMid"
## [13] "Poverty" "HomeRooms" "HomeOwn" "Work"
## [17] "Weight" "Length" "HeadCirc" "Height"
## [21] "BMI" "BMICatUnder20yrs" "BMI_WHO" "Pulse"
## [25] "BPSysAve" "BPDiaAve" "BPSys1" "BPDia1"
## [29] "BPSys2" "BPDia2" "BPSys3" "BPDia3"
## [33] "Testosterone" "DirectChol" "TotChol" "UrineVol1"
## [37] "UrineFlow1" "UrineVol2" "UrineFlow2" "Diabetes"
## [41] "DiabetesAge" "HealthGen" "DaysPhysHlthBad" "DaysMentHlthBad"
## [45] "LittleInterest" "Depressed" "nPregnancies" "nBabies"
## [49] "Age1stBaby" "SleepHrsNight" "SleepTrouble" "PhysActive"
## [53] "PhysActiveDays" "TVHrsDay" "CompHrsDay" "TVHrsDayChild"
## [57] "CompHrsDayChild" "Alcohol12PlusYr" "AlcoholDay" "AlcoholYear"
## [61] "SmokeNow" "Smoke100" "Smoke100n" "SmokeAge"
## [65] "Marijuana" "AgeFirstMarij" "RegularMarij" "AgeRegMarij"
## [69] "HardDrugs" "SexEver" "SexAge" "SexNumPartnLife"
## [73] "SexNumPartYear" "SameSex" "SexOrientation" "PregnantNow"
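The variable ID is the participant identifier and SurveyYr is the survey year; the same participant may have data in more than one survey year. The variables include demographic data such as gender, age, race, and income; basic physical-examination data such as weight, height, pulse, and blood pressure; more detailed health data such as diabetes status, depression, pregnancy, and number of children delivered; and behavioral data such as exercise habits, alcohol use, and sexual activity. The original users of this teaching dataset were Michelle Dalrymple of Cashmere High School and Chris Wild of the University of Auckland, New Zealand.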
2 Selecting a subset of rows with filter()
A subset of rows can be selected by row index, e.g. d.class[8:12,]. The function head() returns the first several rows of a data frame, and tail() returns the last several rows.
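As a small illustration of row indexing, head(), and tail(), the sketch below uses a five-row stand-in data frame built from the first rows of the class data. The name d is an assumption made for this example; the same calls apply to d.class.

```r
# A small stand-in data frame (first five rows of the class data).
d <- data.frame(
  name = c("Alice", "Becka", "Gail", "Karen", "Kathy"),
  age  = c(13, 13, 14, 12, 12),
  stringsAsFactors = FALSE
)

d[2:4, ]        # rows 2 through 4, all columns: Becka, Gail, Karen
head(d, n = 2)  # first two rows: Alice, Becka
tail(d, n = 2)  # last two rows: Karen, Kathy
```

Note the trailing comma in d[2:4, ]: leaving the column index empty keeps all columns.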
Select from d.class the girls aged 13 or younger:
d.class %>%
  filter(sex == "F", age <= 13) %>%
  knitr::kable()
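In filter(), conditions separated by commas are combined with logical AND, so a row is kept only if it satisfies all of them. The sketch below shows this on a five-row excerpt of the class data. The tibble d and the result names a and b are assumptions for illustration, and knitr::kable() is omitted so that the example depends only on dplyr.

```r
library(dplyr)

# Illustrative excerpt of the class data (not the full 19 rows).
d <- tibble(
  name = c("Alice", "Gail", "Sandy", "James", "Philip"),
  sex  = factor(c("F", "F", "F", "M", "M"), levels = c("M", "F")),
  age  = c(13, 14, 11, 12, 16)
)

# Comma-separated conditions are combined with logical AND ...
a <- d %>% filter(sex == "F", age <= 13)
# ... so this is equivalent to joining them explicitly with &.
b <- d %>% filter(sex == "F" & age <= 13)

a$name                     # "Alice" "Sandy"
identical(a$name, b$name)  # TRUE
```

Gail is dropped because she is 14, and James is dropped because he is male, even though each satisfies one of the two conditions.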