R for Data Science

Notes from a second read-through of "R for Data Science", plus some new tricks picked up while using the tidyverse.

[TOC]

Preface

The reprex package creates minimal reproducible examples, which helps pin down where a problem lies.

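Chapter 1: Data visualization with ggplot2

Coordinate system: ggplot(). Mappings added in this function act as global mappings and apply to every geom in the plot.
Layers: geom_point() adds a point layer.
Mapping data to aesthetics: mapping = aes(). To map an aesthetic to a variable, associate the name of the aesthetic with the name of the variable inside aes().
Scaling: ggplot2 assigns a unique level of the aesthetic to each unique value of the variable.
Setting an aesthetic manually: do this at the geom_point() level, outside aes(). The color then conveys no information about a variable.
Facets: facet_grid() facets the plot by two variables, e.g. facet_grid(drv ~ cyl), or by one variable with facet_grid(. ~ cyl).
Grouping: aes(group = ...) groups by a variable without adding a legend or any distinguishing feature to the geoms.
Statistical transformations: the algorithm used to compute new values for a plot is called a stat (statistical transformation). For example, geom_bar() by default maps only x, and its stat computes the count of each value of x.
- Every geom has a default stat, and every stat has a default geom.
- To plot bars whose heights are already in the data, use geom_bar(mapping = aes(x = a, y = b), stat = "identity").

Aesthetics and position adjustments:
- color, fill
- For bars, the position argument commonly takes one of three values: "identity", "fill", "dodge".
- position = "dodge" places the bars of each group side by side, which makes individual values easy to compare.
- Overplotting (many nearby points drawn on top of each other) can hide how the data cluster. For geom_point(), position = "jitter" adds a small random offset to each point to avoid it: ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")
Coordinate systems:
- coord_flip() swaps the x and y axes.
- labs(): modify axis, legend, and plot labels.
- coord_polar(): polar coordinates.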

mpg
str(mpg)
data<- mpg
?mpg ## show the documentation for the mpg dataset
ggplot(data = mpg)+geom_point(aes(x=displ,y=hwy))

ggplot(mpg)+geom_point(mapping = aes(x=displ,y=hwy,color=class),color="#EF5C4E",shape=19)
ggplot(mpg)+geom_point(mapping = aes(x=displ,y=hwy),color="#EF5C4E",shape=19)
ggplot(mpg)+geom_point(mapping = aes(x=displ,y=hwy,stroke=displ),shape=19)
## add two layers: geom_point() and geom_smooth()
ggplot(mpg)+geom_point(mapping = aes(x=displ,y=hwy,color=drv))+geom_smooth(mapping = aes(x=displ,y=hwy,linetype=drv,color=drv))
# add grouping
ggplot(data = mpg)+geom_smooth(mapping = aes(x=displ,y=hwy,group=drv))
ggplot(data = mpg)+geom_smooth(mapping = aes(x=displ,y=hwy,color=drv),show.legend = F) ## show.legend = F hides the legend

## use different data in different layers
## data = filter(mpg, class == "suv") restricts a layer to SUVs; se = F removes the confidence band
ggplot(data = mpg,mapping = aes(x=displ,y=hwy))+geom_point(mapping = aes(color=class))+geom_smooth(data = filter(.data = mpg,class=="suv"))

## exercises
ggplot(data = mpg,mapping = aes(x=displ,y=hwy))+geom_point()+geom_smooth(se = F)
ggplot(data = mpg,mapping = aes(x=displ,y=hwy))+geom_point()+geom_smooth(se = F,mapping = aes(group=drv))
ggplot(data = mpg,mapping = aes(x=displ,y=hwy,color=drv))+geom_point()+geom_smooth(se = F)
ggplot(data = mpg,mapping = aes(x=displ,y=hwy))+geom_point(mapping = aes(color=drv))+geom_smooth(se = F)
ggplot(data = mpg,mapping = aes(x=displ,y=hwy))+geom_point(mapping = aes(color=drv))+geom_smooth(mapping = aes(linetype=drv),se = F)
ggplot(data = mpg,mapping = aes(x=displ,y=hwy))+geom_point(mapping = aes(color=drv))

### statistical transformations
ggplot(data = mpg,mapping = aes(x=displ,y=hwy))+geom_point()+geom_smooth(se = F)
ggplot(data=diamonds)+stat_summary(mapping = aes(x=cut,y=depth))
ggplot(data=diamonds)+geom_boxplot(mapping = aes(x=cut,y=price))
ggplot(data=diamonds)+geom_bar(mapping = aes(x=cut))
ggplot(data=diamonds)+geom_bar(mapping = aes(x=cut,y=..prop..,group=1)) ## group goes inside aes(); without it every bar has prop = 1

### aesthetics and position adjustments
ggplot(diamonds)+geom_bar(mapping = aes(x=cut,fill=cut),color="black")+scale_fill_brewer(palette = "Set3")
ggplot(diamonds)+geom_bar(mapping = aes(x=cut,fill=clarity))+scale_fill_brewer(palette = "Set2")
ggplot(diamonds)+geom_bar(mapping=aes(x=cut,fill=clarity),position = "dodge")+scale_fill_brewer(palette = "Set2") ## fill, not color, so that scale_fill_brewer() applies
ggplot(mpg)+geom_point(mapping = aes(x=displ,y=hwy),position = "jitter")
##exercises
ggplot(mpg,mapping = aes(x=cty,y=hwy))+geom_point(position = "jitter")+geom_smooth(color="black")
ggplot(mpg,mapping = aes(x=cty,y=hwy))+geom_jitter()
ggplot(mpg,mapping = aes(x=cty,y=hwy))+geom_count()
ggplot(mpg)+geom_boxplot(mapping = aes(x=manufacturer,y=hwy),position = "identity")
?geom_boxplot

### 1.9 coordinate systems
ggplot(mpg,mapping = aes(x=class,y=hwy))+geom_boxplot()+coord_flip()
nz <- map_data("nz")
?map_data
bar <- ggplot(data=diamonds)+geom_bar(mapping = aes(x=cut,fill=cut),show.legend = FALSE)+theme(aspect.ratio = 1)+labs() ## store the plot so the next lines can reuse it
bar+scale_fill_brewer(palette = "Set2") ## the bars use fill, so the fill scale is the one to change
bar+coord_flip()
bar+coord_polar()
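
The notes mention facet_grid(), but none of the code above uses it. A minimal sketch, assuming the mpg dataset that ships with ggplot2; the object names p_grid and p_cols are only illustrative.

```r
library(ggplot2)

# Facet by two variables: rows are drv, columns are cyl
p_grid <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)

# Facet by one variable in columns only: . is the placeholder for "no row variable"
p_cols <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)

p_grid
p_cols
```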

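Chapter 2: Workflow: basics

Assignment: a handy shortcut, Alt + - (minus), types the assignment operator <- with a space on each side.
Object names: use snake_case, lowercase words separated by _.
RStudio keyboard shortcut reference: Alt + Shift + K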

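Chapter 3: Data transformation with dplyr

A tibble is a special kind of data.frame.
Variable types:
- int: integers
- dbl: doubles, i.e. double-precision floating-point (real) numbers; base R reports them as num
- chr: character vectors (strings)
- dttm: date-times (a date plus a time)
- lgl: logical values
- fctr (factor): factors
- date: dates
Core verbs: filter(), arrange(), select(), mutate(), summarize(), group_by()
Filtering rows with filter():
- filter(flights, arr_delay <= 10)
- Comparison operators: ==, !=, <=
- Logical operators: x & !y, x | y, xor(x, y)
- Missing values: NA, is.na()
- %in%: month %in% c(11, 12)
- distinct(iris, Species) removes duplicate rows
- sample_n() picks rows at random
- slice(iris, 10:15) selects rows by position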

## filter()
(jan1 <- filter(flights,month==1,day==1))
(dec25 <- filter(flights,month==12,day==25))
filter(flights,month>=11)
(nov_dec <- filter(flights,month %in% c(11,12)))
filter(flights,!(arr_delay<=120 | dep_delay<=120))
NA>=10
x <- NA
is.na(x)
df <- tibble(x=c(1,NA,2))
filter(df,x>1)
filter(df,is.na(x)|x>1)
### exercise
filter(flights,carrier %in% c("UA","AA","DL"))
filter(flights,month %in% c(7,8,9))
filter(flights,arr_delay>120 & dep_delay==0)
filter(flights,dep_delay >= 60 & (dep_delay-arr_delay>=30)) 
filter(flights,dep_time ==2400| dep_time<=600)
filter(flights,is.na(dep_time))

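Sorting rows by the values of columns (variables) with arrange():
- desc() sorts in descending order.
- Missing values always sort to the end; to bring them to the front, sort by desc(is.na(x)).
- top_n(10, wind) keeps the 10 rows with the largest values of wind.

# arrange()
arrange(flights,desc(dep_delay),desc(arr_delay)) # descending order; desc() takes one column at a time
arrange(flights,desc(is.na(dep_delay)),arr_delay) ## put rows with NA in dep_delay first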

### find the 5 cars with highest hp without ordering them
mtcars %>% top_n(5, hp)

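Selecting columns with select() (a dataset can have thousands of variables; select() picks a subset of them):
- Columns from year to day: select(flights, year:day)
- Everything except year to day: select(flights, -(year:day))
- Match variable names: matches("") takes a regular expression, contains("ijk") a literal substring
- Match the start or end of a name: starts_with(""), ends_with("")
- rename() renames column variables
- The everything() helper moves selected columns to the front.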

## select()
select(flights,year:day)
select(flights,-(year:day)) ## everything except year:day
select(flights,starts_with("s"))
select(flights,ends_with("e"))
select(flights,matches("time"))
select(flights,matches("(.)\\1"))

rename(flights,tail_num=tailnum) ## rename a variable
select(flights,-(month:day),everything()) ## with the everything() helper: move some columns to the front; moving them to the end works the same way
select(flights, hour:time_hour,everything())
###exercise
select(flights,year,year,year)
select(flights,one_of(c("year","month","day","dep_delay")))
select(flights,contains("TIME"))

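Adding new columns (variables) with mutate():
- mutate() adds new columns after the existing ones.
- transmute() keeps only the new variables.
- Useful creation functions: integer division %/%, remainder %%, the offsets lead() and lag(), cumulative and rolling aggregates, logical comparisons, ranking.
- ==rename()== renames variables.
- ==rename_all(~str_replace(., "", ""))== renames every variable by regular-expression replacement.
- ==mutate_all()== applies a function to every column, i.e. modifies all values.
- ==case_when()== creates new discrete variables from existing columns.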
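
The creation functions listed above (integer division, remainder, offsets, cumulative sums, ranking) can be tried on a tiny vector. A minimal sketch; the tibble and its column names are made up for illustration.

```r
library(dplyr)

offsets <- tibble(x = c(1, 3, 6, 10)) %>%
  mutate(
    quotient  = x %/% 3,           # integer division
    remainder = x %% 3,            # remainder
    previous  = lag(x),            # value from the row above (NA in the first row)
    following = lead(x),           # value from the row below (NA in the last row)
    change    = x - lag(x),        # difference from the previous row
    running   = cumsum(x),         # cumulative sum
    rank_desc = min_rank(desc(x))  # rank, largest value first
  )

offsets
```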

# mutate() adds new variables (columns) at the end of the tibble
flights_sml <- select(flights,year:day,matches("_delay"),distance,air_time)
flights_sml
mutate(flights_sml,flying_delay=arr_delay-dep_delay,speed=distance/air_time * 60 )
flights_sml
transmute(flights,gain=arr_delay-dep_delay,hour=air_time/60,gain_per_hour=gain/hour)

mutate(flights,dep_time=((dep_time%/%100 * 60)+(dep_time%%100))) ## overwrites dep_time (as minutes since midnight) in the returned tibble; flights itself is unchanged unless reassigned
flights
transmute(flights,air_time,duration=(arr_time-dep_time),arr_delay)
1:3+1:10
1:10+1:3
1:10
?cos

#### renaming variables (column names)
iris %>% 
  rename_all(tolower) %>% 
  rename_all(~str_replace_all(., "\\.", "_"))

#### modifying observations and values
storms %>% 
    select(name,year,status) %>% 
    mutate_all(tolower) %>% 
    mutate_all(~str_replace_all(., "\\s", "_"))

storms %>% 
    select(name,year,status) %>% 
    map(~str_replace(.,"\\s","_")) %>% 
    as_tibble()

#### make new discrete variables based on other columns.
starwars %>%
  select(name, species, homeworld, birth_year, hair_color) %>%
  mutate(new_group = case_when(
      species == "Droid" ~ "Robot",
      homeworld == "Tatooine" & hair_color == "blond" ~ "Blond Tatooinian",
      homeworld == "Tatooine" ~ "Other Tatooinian",
      hair_color == "blond" ~ "Blond non-Tatooinian",
      TRUE ~ "Other Human"))

Grouped summaries with summarize():
- Used together with group_by(), which changes the unit of analysis from the whole dataset to individual groups.
- ==add_count()== adds a count column without summarizing.

# grouped summaries with summarize()
by_year <- group_by(flights,year,month)
summarise(by_year,delay=mean(arr_delay-dep_delay,na.rm = T))
#### plot the result
(delay_byDay <- group_by(flights,month) %>%summarise(delay_time=mean(dep_delay,na.rm = T))) %>% ggplot(mapping = aes(x=month,y=delay_time))+geom_point()+geom_smooth(se=F)

#### add the amount of observations without summarising them, and rename them with rename() statement.
mtcars %>% 
  select(-(drat:vs)) %>% 
  add_count(cyl) %>% rename(n_cyl = n) %>% 
  add_count(am) %>% rename(n_am = n)

Combining operations with the pipe %>%:
- The general pattern is flights %>% group_by(~) %>% summarize(mean(~~,na.rm=T)) %>% filter(~) %>% ggplot(aes())+geom_~()
- Missing values: any computation involving NA yields NA, so pass na.rm=T, or remove the rows first with filter(!is.na(dep_delay),!is.na(arr_delay)).
- Common summary functions: n(), sum(), mean()
- Median: median(); spread: sd(), IQR(), mad()
- Counts: n(); n_distinct() counts the unique values; count() is a quick shortcut for counting.
- Counts and proportions of logical values: in summarize(n_early=sum(dep_time<50)), sum() counts the TRUEs, and mean() would give their proportion.
Combining group_by() with filter() and mutate():
- Filter within each group: flights_sml %>% group_by(year, month, day) %>% filter(rank(desc(arr_delay)) < 10)
Other dplyr functions:
- distinct("", .keep_all=T) is the dplyr replacement for base unique().

### combining multiple operations with the pipe
(delay_by_dest <- group_by(flights,dest)%>%summarise(count=n(),delay_time=mean(arr_delay,na.rm = T), dist=mean(distance,na.rm = T))) %>% filter(count>20,dest!="HNL") %>% ggplot(mapping = aes(x=dist,y=delay_time))+geom_point(aes(size=count))+geom_smooth(se=F,color="darkblue")

## the same analysis step by step, without the pipe %>%
by_dest <- group_by(flights,dest)
(delay <- summarise(by_dest,count=n(),dist=mean(distance,na.rm = T),delay=mean(arr_delay,na.rm = T))) ### count=n() is the group size, i.e. the number of flights to each dest
delay <- filter(delay,count>20,dest!="HNL")## drop destinations with few flights and the outlying airport HNL
ggplot(data = delay,mapping = aes(x=dist,y=delay))+geom_point(aes(size=count),alpha=1/3)+geom_smooth(se=F,color="darkblue")
## distribution of the mean arrival delay per plane (tailnum)
delay_by_plane <- flights %>% group_by(tailnum) %>%summarise(count=n(),delay_time=mean(arr_delay,na.rm = T)) %>%arrange(delay_time)
ggplot(delay_by_plane,mapping = aes(x=delay_time))+geom_freqpoly(binwidth = 10)
## number of flights vs. mean delay per plane: with few flights, the mean delay varies a lot
delay_by_plane %>% filter(count>25) %>% ggplot(mapping = aes(x=count,y=delay_time))+geom_point(alpha=1/5)
## other useful summary functions
flights_not_cancelled <- flights %>% filter(!is.na(dep_delay),!is.na(arr_delay))
flights_not_cancelled %>% group_by(dest) %>% summarise(carrier=n())
flights_not_cancelled %>% group_by(dest) %>% summarise(carriers=n_distinct(carrier))
flights_not_cancelled %>% group_by(tailnum) %>% summarise(sum(distance))
 
###exercises
##### which carrier has the longest average arrival delay
flights_not_cancelled %>% group_by(carrier) %>% summarise(count=n(),arr_delay_time=mean(arr_delay)) %>% arrange(desc(arr_delay_time)) %>% ggplot(mapping = aes(x=carrier,y=arr_delay_time))+geom_point(aes(size=count))

### combining group_by() with filter()
flights_sml %>% 
  group_by(year, month, day) %>%
  filter(rank(desc(arr_delay)) < 10)
#> # A tibble: 3,306 x 7
#> # Groups:   year, month, day [365]
#>    year month   day dep_delay arr_delay distance air_time
#>   <int> <int> <int>     <dbl>     <dbl>    <dbl>    <dbl>
#> 1  2013     1     1       853       851      184       41
#> 2  2013     1     1       290       338     1134      213
#> 3  2013     1     1       260       263      266       46
#> 4  2013     1     1       157       174      213       60
#> 5  2013     1     1       216       222      708      121
#> 6  2013     1     1       255       250      589      115
#> # … with 3,300 more rows

popular_dests <- flights %>% 
  group_by(dest) %>% 
  filter(n() > 365)
popular_dests
#> # A tibble: 332,577 x 19
#> # Groups:   dest [77]
#>    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
#> 1  2013     1     1      517            515         2      830            819
#> 2  2013     1     1      533            529         4      850            830
#> 3  2013     1     1      542            540         2      923            850
#> 4  2013     1     1      544            545        -1     1004           1022
#> 5  2013     1     1      554            600        -6      812            837
#> 6  2013     1     1      554            558        -4      740            728
#> # … with 332,571 more rows, and 11 more variables: arr_delay <dbl>,
#> #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
#> #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>

Ungrouping: ungroup() removes the grouping, so later operations work on the whole dataset again.
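
A minimal sketch of what ungroup() changes, using the built-in mtcars data instead of flights so it runs without nycflights13; the object names are only illustrative.

```r
library(dplyr)

by_cyl <- mtcars %>% group_by(cyl)

# Still grouped: one summary row per value of cyl
per_group <- by_cyl %>% summarise(n = n())

# After ungroup(): a single summary row for the whole dataset
overall <- by_cyl %>% ungroup() %>% summarise(n = n())

per_group
overall
```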

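Chapter 5: Exploratory data analysis (EDA)

Variation: the behavior within a single variable, i.e. the tendency of its values to change from measurement to measurement.
Covariation: the tendency of the values of two or more variables to vary together in a related way; it describes behavior between variables.
Patterns: if a systematic relationship exists between two variables, it appears as a pattern in the data.

5.3 Variation: variation describes the behavior within a variable

One-dimensional data: geom_histogram(binwidth = 1) bins a continuous variable and uses the height of each bar to show the number of observations in the bin; geom_bar() does the counting for a categorical variable. geom_freqpoly() can replace geom_bar() or geom_histogram(): it draws lines instead of bars, so several distributions can be overlaid, and aes(color = *) can map another variable onto the lines.

Unusual values: coord_cartesian(ylim = c(0, 50)) zooms in without dropping any data (values outside the limits are hidden but kept), whereas ylim(0, 50) throws away whatever falls outside the limits.

Treating unusual values as missing: use mutate() to replace the variable with a modified copy, and ifelse() to replace the unusual values with NA:
- diamonds %>% mutate(y = ifelse(y < 3 | y > 20, NA, y))
- When computing or plotting, drop the NAs with na.rm = TRUE.
If unusual values have little effect on the results, it is reasonable to replace them with missing values; if they have a substantial effect, their cause needs to be investigated.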

## 5.3 variation: visualizing distributions
ggplot(data = diamonds)+geom_bar(mapping = aes(x=cut))
ggplot(data = diamonds)+geom_histogram(mapping = aes(x=carat),binwidth = 0.5) ## binwidth belongs to geom_histogram(), not geom_bar()
diamonds %>% filter(carat<3) %>% ggplot(mapping = aes(x=carat))+geom_histogram(binwidth = 0.1)
diamonds %>% filter(carat<3) %>% ggplot(mapping = aes(x=carat))+ geom_freqpoly(aes(color=cut),binwidth=0.1)+scale_color_brewer(palette = "Set1")

### 5.3.3 unusual values
ggplot(diamonds)+geom_histogram(mapping = aes(x=y),binwidth = 0.5)+ylim(0,60)
ggplot(diamonds)+geom_histogram(mapping = aes(x=y),binwidth = 0.5)+coord_cartesian(ylim = c(0,60))

# 5.4 unusual values: the recommended approach is to turn them into missing values
diamonds %>% filter(between(y,3,20))
#### mutate() replaces the original variable with a modified copy
diamonds <- diamonds %>% mutate(y=ifelse(y<3 | y>30,NA,y))
ggplot(diamonds,aes(x=x,y=y))+geom_point(na.rm = T)
flights %>% mutate(cancelled=is.na(dep_time),sched_hour=sched_dep_time%/%100,sched_min=sched_dep_time%%100,sched_depart_time=sched_hour+sched_min/60) %>% select(sched_hour,sched_min,sched_depart_time,everything()) %>% ggplot(aes(x=sched_depart_time))+geom_freqpoly(aes(color=cancelled),binwidth=1/5)

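5.5 Covariation: covariation describes the behavior between variables

A categorical and a continuous variable:
- With geom_freqpoly(aes(color=cut)), the one-dimensional counts can be standardized by mapping y=..density..
- Boxplots with geom_boxplot(): reorder(class,hwy,FUN=median) orders the levels of the categorical variable by the median.
Two categorical variables: use a heatmap-style plot with geom_tile() or geom_count().
- geom_count() sizes each point by the number of observations at that combination.
- Alternatively, count first with count(x,y), then map the count to an aesthetic in geom_tile() or geom_count().
Two continuous variables:
- Usually geom_point(); set the transparency with the alpha argument.
- Bin the continuous data: geom_bin2d() and geom_hex() bin into rectangles and hexagons.

5.6 Patterns and models: if a systematic relationship exists between two variables, it appears as a pattern in the data.

Variation creates uncertainty, and covariation reduces it: if two variables have a systematic relationship, the value of one can be used to predict the value of the other.
Models: a model is a tool for extracting patterns, a way of finding the systematic relationship between two variables.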

# 5.5 covariation: the relationship between two or more variables
ggplot(diamonds,mapping = aes(x=price))+geom_freqpoly(aes(color=cut),binwidth=500)
ggplot(diamonds)+geom_freqpoly(aes(x=price,y=..density..,color=cut),binwidth=200)+scale_color_brewer(palette = "Set2")
###mpg
ggplot(mpg,mapping = aes(x=class,y=hwy))+geom_boxplot()
ggplot(mpg)+geom_boxplot(mapping = aes(x=reorder(class,hwy,FUN = mean),y=hwy,fill=class))
ggplot(mpg)+geom_boxplot(mapping = aes(x=reorder(class,hwy,FUN=median),y=hwy))+coord_flip()

### 5.5.1 a categorical and a continuous variable

### 5.5.2 two categorical variables, geom_tile(aes(fill = n))
ggplot(data = diamonds)+geom_count(mapping = aes(x=cut,y=color))
diamonds %>% count(color,cut) %>% ggplot()+geom_tile(aes(x=color,y=cut,fill=n))

### 5.5.3 two continuous variables
geom_point()
geom_bin2d() ## bins continuous data in two dimensions
geom_hex()
diamonds %>% ggplot(aes(carat,price))+geom_hex()

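Chapter 7: tibble

Tibbles are an upgraded replacement for the traditional R data.frame.

When printing a large dataset, a tibble shows only the first 10 rows (observations) and as many columns (variables) as fit on screen.
Convert a traditional data frame to a tibble with as_tibble().
Sometimes more output than the default is needed when printing to the console. Use print(n=10,width=Inf), or set options(tibble.print_max = n, tibble.print_min = m) and options(tibble.width = Inf).
==To extract a variable inside a pipe, use the placeholder: df %>% .$x==
Some older base functions do not work with tibbles; convert back with as.data.frame().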
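
A minimal sketch of the conversions and of one behavioral difference between the two classes, using the built-in iris data; the object names tb and df_back are only illustrative.

```r
library(tibble)

# Convert a base data frame to a tibble and back
tb <- as_tibble(iris)
df_back <- as.data.frame(tb)

class(tb)       # includes "tbl_df"
class(df_back)  # plain "data.frame"

# Single-bracket subsetting: a tibble always returns a tibble,
# while a data.frame drops to a vector when one column is selected
class(tb[, "Species"])
class(df_back[, "Species"])

# Print more than the default 10 rows, and all columns
print(tb, n = 15, width = Inf)
```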

tb <- tibble(
  x = 1:5, 
  y = 1, 
  z = x ^ 2 + y
)
tb

nycflights13::flights %>% 
  print(n = 10, width = Inf)
  
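df <- tibble(x = runif(5), y = rnorm(5)) # random values, so the output shown below will differ from run to run
df %>% .$x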
#> [1] 0.7330 0.2344 0.6604 0.0329 0.4605
df %>% .[["x"]]
#> [1] 0.7330 0.2344 0.6604 0.0329 0.4605

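Chapter 8: Data import with readr

8.1 Common import functions provided by the tidyverse:

read_csv(): comma-delimited files
read_csv2(): semicolon-delimited files
read_tsv(): tab-delimited files
read_delim(): files with any delimiter, e.g. delim="\t"

8.2 Arguments, using read_csv() as the example:

skip=2 skips the first n lines (here 2)
comment="#" ignores lines that start with #
==col_names=FALSE: do not treat the first row as column headings. Alternatively, col_names=c("a","b","c") names the columns explicitly==
na="." sets which value represents missing data in the file

read_csv("file.csv",skip=1,col_names=c("a","b","c","d")) ## read file.csv, skip the first line, and supply custom column names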
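
The arguments above can be tried on a throwaway file. A minimal sketch: the file contents are made up, and col_types = cols() is used here only to silence the column-specification message.

```r
library(readr)

# Write a small example file to a temporary location (hypothetical data)
path <- tempfile(fileext = ".csv")
writeLines(c("# exported from a hypothetical system",
             "x,y,z",
             "1,2,.",
             "4,5,6"), path)

# comment = "#" drops the comment line; na = "." turns "." into NA
with_header <- read_csv(path, comment = "#", na = ".", col_types = cols())

# skip = 2 skips the comment line and the header row,
# so col_names can supply new column names
renamed <- read_csv(path, skip = 2, col_names = c("a", "b", "c"),
                    na = ".", col_types = cols())

with_header
renamed
```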

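8.3 Parsing a vector: done by the parse_*() family. The first argument is the character vector to parse, and the na argument sets which strings are treated as missing, e.g. na=".". The family includes parse_logical(), parse_double(), parse_character(), parse_factor(), and parse_datetime().

parse_number() ignores non-numeric characters before and after the number, so it handles currencies and percentages and extracts numbers embedded in text.
Characters: UTF-8 can encode just about every character used by humans; ASCII is the American Standard Code for Information Interchange.
Factors: represent categorical variables with a known set of possible values.
Dates, date-times, and so on.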
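
A few of the parsers in action, as a minimal sketch; the input strings and variable names are made up for illustration.

```r
library(readr)

lgl <- parse_logical(c("TRUE", "FALSE", "NA"))
int <- parse_integer(c("1", "231", ".", "456"), na = ".")  # "." becomes NA
dbl <- parse_double("1.23")
num <- parse_number(c("$100", "20%", "It cost $123.45"))   # strips the surrounding text
day <- parse_date("2010-10-01")
fct <- parse_factor(c("apple", "banana"), levels = c("apple", "banana", "pear"))

int
num
```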

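8.4 Parsing a file: readr reads the first 1000 rows and uses heuristics to work out the type of each column. guess_parser() returns readr's best guess, and parse_guess() uses that guess to parse the column.

8.5 Writing files and other kinds of import:

write_csv() and write_tsv() always encode strings in UTF-8. ==append=T adds rows to an existing file; the default append=FALSE overwrites it==
write_excel_csv() writes a CSV file intended for opening in Excel (the result is still CSV, not an Excel workbook).
readxl reads Excel files.
haven reads SPSS and SAS data.
DBI, together with a backend such as RMySQL, runs queries against databases.
jsonlite reads hierarchical JSON data.
xml2 reads XML data.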
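
A minimal sketch of how append behaves in write_csv(), using a temporary file and made-up data; col_types = cols() is only there to silence the column-specification message.

```r
library(readr)
library(tibble)

path <- tempfile(fileext = ".csv")
first  <- tibble(id = 1:2, name = c("a", "b"))
second <- tibble(id = 3:4, name = c("c", "d"))

write_csv(first, path)                  # creates (or overwrites) the file
write_csv(second, path, append = TRUE)  # adds rows; the header is not written again

combined <- read_csv(path, col_types = cols())
combined
```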

Chapter 9: Tidying data with tidyr (Tidy data)

“Tidy datasets are all alike, but every messy dataset is messy in its own way.” –– Hadley Wickham

Basic rules of tidy data (using tidyr's built-in table1, table2, table3, table4a, and table4b as examples):

Each variable has its own column;
Each observation has its own row;
Each value has its own cell.

Tidying addresses two problems: one variable spread across multiple columns, or one observation scattered across multiple rows. The tools are gather() and spread().

gather(): in table4a, the two columns 1999 and 2000 hold values of the same variable, so they need to be gathered. The two resulting columns are named with key="" and value="".
- table4a %>% gather(`1999`,`2000`,key="year",value="cases")
- pivot_longer(c(`1999`,`2000`),names_to="year",values_to="cases")

tidy4a <- table4a %>% 
  pivot_longer(c(`1999`, `2000`), names_to = "year", values_to = "cases")
tidy4b <- table4b %>% 
  pivot_longer(c(`1999`, `2000`), names_to = "year", values_to = "population")
left_join(tidy4a, tidy4b)
#> Joining, by = c("country", "year")
#> # A tibble: 6 x 4
#>   country     year   cases population
#>   <chr>       <chr>  <int>      <int>
#> 1 Afghanistan 1999     745   19987071
#> 2 Afghanistan 2000    2666   20595360
#> 3 Brazil      1999   37737  172006362
#> 4 Brazil      2000   80488  174504898
#> 5 China       1999  212258 1272915272
#> 6 China       2000  213766 1280428583

spread(): table2 is redundant, because each observation is scattered across two rows. The type column needs to be spread into separate variables.
- table2 %>% spread(key=type,value=count)
- pivot_wider(names_from = type,values_from=count)
separate(): in table3, the rate column can be split into two columns.
- separate(table3,rate,into=c("cases","population"))
- By default sep splits at any non-alphanumeric character. It can also be a regular expression, or a position: sep=4 splits after the fourth character.
- ==convert = TRUE converts the new columns to better types. By default separate() leaves the pieces as character.==
unite(): combines the specified columns into one.
- unite(new,century,year,sep="")

rename(), rename_all(): rename variables, here with help from library(stringr).
- rename_all(tolower) %>% rename_all(~str_replace_all(., "\\.", "_"))
mutate(), mutate_all(): modify all values, e.g. to clean up strings.
- mutate_all(tolower) %>% mutate_all(~str_replace_all(., " ", "_"))
rowwise(): operate on each row, e.g. row sums or row means.
- iris %>% select(contains("Length")) %>% rowwise() %>% mutate(avg_length = mean(c(Petal.Length, Sepal.Length)))
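
separate() and unite() are described above but never run in these notes. A minimal sketch on a made-up tibble shaped like table3; the names rates, split_rate, by_century, and reunited are only illustrative.

```r
library(tidyr)
library(tibble)

rates <- tibble(
  country = c("A", "B"),
  year    = c(1999L, 2000L),
  rate    = c("745/19987071", "2666/20595360")
)

# Default sep: split at the non-alphanumeric "/"; convert = TRUE gives integers
split_rate <- rates %>%
  separate(rate, into = c("cases", "population"), convert = TRUE)

# Numeric sep: split by position, here after the second character
by_century <- rates %>%
  separate(year, into = c("century", "yr"), sep = 2)

# unite() is the inverse: paste the pieces back together
reunited <- by_century %>%
  unite(year, century, yr, sep = "")

split_rate
reunited
```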

## 1 tidy_data
#### compare different dataset
table1
table2
table3
table4a;table4b

table1 %>% mutate(rate=cases/population *10000)
table1 %>% count(year,wt=cases)
ggplot(table1,aes(year,cases)) + geom_line(aes(color=country))+geom_point(color="grey")


## 2 gather()
table4a
tidy4a <- table4a %>% gather(`1999`,`2000`,key = "year",value = "cases")
tidy4b <- table4b %>% gather(`1999`,`2000`,key="year",value="population")
left_join(tidy4a,tidy4b,by=c("country","year")) ## join the gathered tables; the raw table4a and table4b have no year column

## 3 spread()
table2
table2 %>% spread(key = type,value = count)

stocks <- tibble(
  year=c(2015,2015,2016,2016),
  half=c(1,2,1,2),
  return=c(1.88,0.59,0.92,0.17)
)
stocks

stocks %>% spread(year,return) %>% gather("year","return",`2015`,`2016`)


## putting it all together
who %>%
  gather(new_sp_m014:newrel_f65,key="key",value="count",na.rm = T) %>%
  mutate(key=str_replace(key,"newrel","new_rel")) %>%
  separate(key,into = c("new","var","sexage"),sep="_") %>%
  select(-iso2,-iso3,-new) %>%
  separate(sexage,into = c("sex","age"),sep=1)

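Chapter 9: Relational data with dplyr (multiple tables)

Data from several tables are combined to answer the question of interest. Data spread over multiple tables are called relational data, and relations are always defined between a pair of tables. Three families of operations work with relational data:

Mutating joins: add new variables to one table from another, matching observations by the keys that connect the two tables.
Filtering joins: filter the observations in table A based on whether their keys exist in table B.
Set operations: treat observations as elements of a set.

The example data are the tables in the nycflights13 package: airlines, airports, planes, weather, and so on.

9.3 Keys: the variables that uniquely identify an observation (A key is a variable (or set of variables) that uniquely identifies an observation)

Primary key: uniquely identifies an observation in its own table (no duplicates), e.g. planes$tailnum. Verify with count() %>% filter(n>1).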

planes %>% 
  count(tailnum) %>% 
  filter(n > 1)
#> # A tibble: 0 x 2
#> # … with 2 variables: tailnum <chr>, n <int>

weather %>% 
  count(year, month, day, hour, origin) %>% 
  filter(n > 1)
#> # A tibble: 3 x 6
#>    year month   day  hour origin     n
#>   <int> <int> <int> <int> <chr>  <int>
#> 1  2013    11     3     1 EWR        2
#> 2  2013    11     3     1 JFK        2
#> 3  2013    11     3     1 LGA        2

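Foreign key: uniquely identifies an observation in another table, e.g. flights$tailnum.
Surrogate key: when a table has no primary key, add one with mutate() and row_number().
The key to understanding relations between tables: ==remember that each relation always concerns a pair of tables==. There is no need to understand everything at once, only the tables of interest.
- For example, flights connects to planes via tailnum, and to airlines via carrier.

9.4 Mutating joins: match observations (rows) by the keys (variables) of two tables, then copy variables from one table to the other. Compare with the cheatsheet.

Inner join: comparable to taking the intersection; it keeps the observations whose key values are equal in both tables. x %>% inner_join(y,by="key")

Outer joins: keep observations that appear in at least one of the tables. Variables with no match in the other table are filled with NA.
- left_join() keeps all observations in x. This is the most commonly used join.
- right_join() keeps all observations in y.
- full_join() keeps all observations in x and y.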

flights2 %>% left_join(airlines,by="carrier")
## in base R

flights2 %>% mutate(name=airlines$name[match(carrier,airlines$carrier)])

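Duplicate keys: one-to-many, many-to-one, and many-to-many relations (when the keys are unique in neither table, the join returns the Cartesian product of the matching rows).
Defining the key columns:
- by=NULL, the default, uses all variables that appear in both tables.
- left_join(x,y,by="key") uses only the named common variable.
- left_join(x,y,c("dest"="faa")) matches variables that have different names in the two tables.
Other implementations: base::merge() and SQL
- base::merge(): left_join(x, y) <=> merge(x, y, all.x = TRUE)
- SQL: left_join(x, y, by = "z") <=> SELECT * FROM x LEFT OUTER JOIN y USING (z)

9.5 Filtering joins: filter observations by key, using one dataset to filter another.

semi_join(x,y,by=""): keeps the observations in x that have a match in y.
anti_join(x,y): drops the observations in x that have a match in y.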
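
The mutating and filtering joins can be compared side by side on two tiny tables. A minimal sketch; x, y, and their columns are made up for illustration.

```r
library(dplyr)

x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 2, 4), val_y = c("y1", "y2", "y4"))

inner <- inner_join(x, y, by = "key")  # keys present in both tables: 1, 2
left  <- left_join(x, y, by = "key")   # all of x; val_y is NA for key 3
right <- right_join(x, y, by = "key")  # all of y; val_x is NA for key 4
full  <- full_join(x, y, by = "key")   # every key: 1, 2, 3, 4

semi <- semi_join(x, y, by = "key")    # rows of x with a match; columns of x only
anti <- anti_join(x, y, by = "key")    # rows of x without a match: key 3
```

The filtering joins never add columns from y, which is what separates them from the mutating joins.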

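9.6 Join problems: checks to make on the data before joining:

Identify the variables that form the primary key in each table, based on what the data actually mean.
Make sure no variable in the primary key is missing; an observation with a missing key value cannot be identified.
Check that the foreign keys match primary keys in the other table, using anti_join().

9.7 Set operations:

intersect(x,y): observations that are in both x and y.
union(x,y): the unique observations in x or y.
setdiff(x,y): observations in x but not in y.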
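
A minimal sketch of the three set operations on two made-up one-column tibbles; they compare whole rows, so both inputs need the same columns.

```r
library(dplyr)

a <- tibble(v = c(1, 2))
b <- tibble(v = c(2, 3))

both   <- intersect(a, b)  # rows in both a and b
either <- union(a, b)      # unique rows in a or b
only_a <- setdiff(a, b)    # rows in a but not in b

both
either
only_a
```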

library(tidyverse)
library(nycflights13)
planes;airports;airlines;weather;


## 9.3 keys
weather %>% count(year,month,day,hour,origin) %>% filter(n>1) ### check whether the key is unique; any rows returned are duplicates

## 9.4 mutating joins
flights2 <- flights %>% select(year:day,hour,origin,dest,tailnum,carrier)
flights2 %>% select(-(hour:origin)) %>% left_join(airlines,by = "carrier")
flights2 %>% select(-(hour:origin)) %>% right_join(airlines,by = "carrier")

flights2 %>% left_join(weather) ## natural join: uses all variables that appear in both tables
flights2 %>% left_join(planes,by = "tailnum") ## join on the shared variable tailnum only
flights2 %>% left_join(airports,c("origin" = "faa"))


### exercise
## 9.4.6-1 mean delay by destination and its spatial distribution
flights %>% group_by(dest) %>% summarise(dest_delay=mean(arr_delay,na.rm = T)) %>% left_join(airports,c("dest"="faa")) %>% filter(dest_delay>0)%>%ggplot(aes(lon,lat))+borders("state")+geom_point(aes(size=dest_delay))

airports %>% semi_join(flights,c("faa"="dest")) %>% ggplot(aes(lon,lat))+borders("state")+geom_point()
## exercise 3: plane age vs. delay
flights %>% group_by(tailnum) %>% summarise(count=n(),delay_tailnum=mean(arr_delay,na.rm = T)) %>% left_join(planes,by="tailnum") %>% filter(!is.na(year)) %>% ggplot(aes(x=delay_tailnum))+geom_freqpoly(aes(color=year),binwidth=1)
### plotting with geom_ribbon

## 9.5 filtering joins
(top_dest <- flights %>% count(dest,sort = T) %>% head(10))
flights %>% filter(dest %in% top_dest$dest)

flights %>% semi_join(top_dest,by = "dest")

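Chapter 10: Strings with stringr

Strings usually hold unstructured or semi-structured data.

10.1 String basics

Base R has string functions, but they are inconsistent and hard to remember. The stringr functions are recommended instead; they all start with str_.

String length: str_length()
Combining strings: str_c("x","y",sep = "_")
- Vectorized: shorter vectors are recycled to the length of the longest.
- x <- c("abc", NA) ; str_c("1_",str_replace_na(x),"_1")
Subsetting strings (==by position==): str_sub(x, start, end). Given a vector, it extracts the substring from each string in it.
- To lowercase the first letter of each string in x<- c("Apple","Banana", "Pear"): ==str_sub(x,1,1) <- str_to_lower(str_sub(x,1,1))==
Changing case: str_to_upper() for all upper case, str_to_title() for title case.
Sorting the strings in a vector by English rules: str_sort(c("apple","eggplant","banana"), locale="en")

10.2 Regular expressions

Use str_view() to learn regular expressions; it needs the htmltools and htmlwidgets packages installed.

str_view(x, "abc")
Anchors: ^ $; word boundary: \b, e.g. \bsum\b matches the whole word sum
Special matchers: \\d, \\s, \\w, [abc], [^abc] (anything except a, b, or c)
Quantifiers: ? + * {n,m}
Grouping and backreferences: (..)\\1

10.3 Matching operations

Detecting matches: ==str_detect(x, "e$")== returns a logical vector of TRUE (1) or FALSE (0).
- sum() and mean() give the number and proportion of matches.
- Logical subsetting: words[str_detect(words,"x$")]
- Another trick, with dplyr: df %>% filter(str_detect(words,"ab"))
- Equivalent to == str_subset(words,"x$") ==
- ==str_count(words, "[aeiou]")== returns the number of matches in each string.
- With dplyr: df %>% mutate( vowels=str_count(w,"[aeiou]"))
Extracting matches: == str_extract() == extracts only the first match.
- str_extract_all(words,color_match) returns a list with all matches.
- str_extract_all(words,color_match, simplify= TRUE) returns a matrix.
- Note the difference from str_subset(): str_subset() returns the whole strings that contain a match and drops the rest, while str_extract() returns the matched text itself, or NA when a string has no match. A common pattern is to find the strings with a match via str_subset(), then pull out the match with str_extract().
- str_match() returns each capture group separately.
- tidyr's == extract() == pulls the capture groups out of a string column into new columns of a tibble.
Replacing matches: str_replace(words, "match_x", "replace_x")
- Replace every match: str_replace_all()
- Several replacements at once: str_replace_all(words,c("1"="one","2"="two","3"="three"))
Splitting: str_split(sentences," ") returns a list.
- "a|b|c|d" %>% str_split("\\|") %>% .[[1]]
- The built-in boundary() function splits at word boundaries and drops the characters between words automatically: str_split(x, boundary("word"))
Locating matches: str_locate() returns the position of the match in each string.
- Use str_locate() to find the matching pattern, then str_sub() to extract or modify it.
Other pattern types:
- regex()
- fixed()
- coll()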

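10.5 Other types of pattern

A pattern written as "pattern" is shorthand for regex("pattern"). The regex() function takes further arguments:

ignore_case=T ignores case when matching
multiline=T makes ^ and $ match the start and end of each line rather than of the whole string
comments = T allows comments and whitespace inside the pattern
dotall=T lets . match every character, including \n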
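
Each regex() option can be checked on a short string. A minimal sketch; the input strings and the variable names are made up for illustration.

```r
library(stringr)

# ignore_case: match regardless of case
any_case <- str_detect(c("Banana", "banana", "BANANA"),
                       regex("banana", ignore_case = TRUE))

# multiline: ^ matches at the start of every line
lines <- "Line 1\nLine 2\nLine 3"
starts <- str_extract_all(lines, regex("^Line", multiline = TRUE))[[1]]

# dotall: . also matches the newline character
across_newline <- str_detect("a\nb", regex("a.b", dotall = TRUE))

# comments: whitespace and # comments inside the pattern are ignored
year_month <- regex("
  \\d{4}  # year
  -
  \\d{2}  # month
", comments = TRUE)
found <- str_extract("released 2013-07-01", year_month)

# fixed(): match the literal characters instead of a regular expression
literal_dot <- str_detect(c("a.c", "abc"), fixed("."))
```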

Another use: when a function name slips your mind, search for it with apropos("pattern").

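Function	Description	Base R equivalent
Functions that use regular expressions
str_subset()	Return the strings that contain a match	-
str_extract()	Extract the first match	regmatches()
str_extract_all()	Extract all matches	regmatches()
str_locate()	Return the position of the first match	regexpr()
str_locate_all()	Return the positions of all matches	gregexpr()
str_replace()	Replace the first match	sub()
str_replace_all()	Replace all matches	gsub()
str_split()	Split a string by a pattern	strsplit()
str_split_fixed()	Split a string by a pattern into a fixed number of pieces	-
str_detect()	Detect whether a string contains a pattern	grepl()
str_count()	Count the number of matches	-
Other important functions
str_sub()	Extract the characters at given positions	substr()
str_dup()	Duplicate (repeat) a string	-
str_length()	Return the number of characters	nchar()
str_pad()	Pad a string	-
str_trim()	Remove padding, e.g. whitespace at the start and end	-
str_c()	Join strings	paste(),paste0()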

## 10.2 string basics
str_length(c("a","aaaaa",NA)) ## str_length() returns the number of characters in each string
str_c("x","y","z",sep = " ")
str_c("aaa",str_replace_na(c("bbb",NA)),"ccc")

x <- c("Apple","Banana","Pear")
(str_sub(x,1,1) <- str_to_lower(str_sub(x,1,1)))## change the first letter to lower case
x


## 10.3 pattern matching with regular expressions
str_view(x,".a")
str_view(x,"^a")

str_view(words,"^.{7,}$",match = T) ## exercise: show only words with 7 or more letters


## 10.4.1 detecting matches
df <- tibble(w=words,i=seq_along(words))
df %>% filter(str_detect(w,"ab")) ## filter the rows of a tibble
str_subset(words,"^y")

mean(str_count(words,"[aeiou]")) ## average number of vowels per word
df %>% mutate(vowels=str_count(w,"[aeiou]"),consonants=str_count(w,"[^aeiou]")) ## with mutate(): add columns with the number of vowels and of non-vowels in each word

####exercises
str_subset(words,"x$|^y")
words[str_detect(words,"x$|^y")]


## 10.4.3 extracting matches
colors <- c("red","orange","yellow","green","blue","purple")
(color_match <- str_c(colors,collapse = "|"))
has_color <- str_subset(sentences,color_match) ## the whole sentences that contain a match
matches <- str_extract(has_color,color_match) ## the first match in each of those sentences
str(matches)
###exercises
str_extract(sentences,"^\\S+")
str_extract_all(sentences,"\\w+s")

words_ing <- str_subset(sentences,"\\b\\w+ing\\b")
str_extract_all(words_ing,"\\b\\w+ing\\b")
## 10.4.5 grouped matches
noun <- "(a|the) (\\S+)"
has_noun <- sentences %>% str_subset(noun)
has_noun %>% str_extract(noun) 

sentences %>% str_subset(noun) %>% str_extract(noun)
str_match(has_noun,noun) ## returns each capture group separately, as a matrix
tibble(sentence=sentences) %>% extract(col = sentence,into = c("article","noun"),regex = "(a|the) (\\w+)",remove = F)

## 10.4.7 replacing
str_replace(x,"[aeiou]","-") ## replace the first vowel in each string
str_replace_all(c("1 house","2 cars","3 people"),c("1"="one","2"="two","3"="three"))


## 10.4.9 splitting
"a|b|c|d" %>% str_split("\\|") %>% .[[1]]
x <- "This is a sentence"
str_view_all(x,boundary("word"))  ## highlights every word in the sentence

apropos("str")

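Chapter 11: Factors with forcats

Factors are used in R to handle categorical variables, i.e. variables that take values from a fixed, known set.

When working with factors, the two most common operations are changing the order of the levels and changing their values.

factor(x1,levels=c("a","b","c"))
fct_reorder() ## reorders the levels of a factor
Using the gss_cat dataset. One open question: "How have the proportions of people identifying as Democrat, Republican, and Independent changed over time?"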

relig_summary <- gss_cat %>%
  group_by(relig) %>%
  summarise(
    age = mean(age, na.rm = TRUE),
    tvhours = mean(tvhours, na.rm = TRUE),
    n = n()
  )

ggplot(relig_summary, aes(tvhours, fct_reorder(relig, tvhours))) +
  geom_point()

### reordering
relig_summary %>%
  mutate(relig = fct_reorder(relig, tvhours)) %>%
  ggplot(aes(tvhours, relig)) +
    geom_point()
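
The factor() call sketched earlier in this chapter has no worked example, so here is a minimal one with month abbreviations. The vectors x1, month_levels, and f are made up for illustration.

```r
library(forcats)

x1 <- c("Dec", "Apr", "Jan", "Mar")
month_levels <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

sort(x1)                                 # plain strings sort alphabetically
y1 <- factor(x1, levels = month_levels)
sort(y1)                                 # factors sort by level order: Jan Mar Apr Dec

# Values that are not among the levels become NA
typo <- factor("Jam", levels = month_levels)

# fct_reorder(): reorder the levels by the values of another variable
f <- factor(c("a", "b", "c"))
reordered <- fct_reorder(f, c(3, 1, 2))
levels(reordered)                        # "b" "c" "a"
```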

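Chapter 14: Functions

When a piece of code is needed several times, write a function for it. Write working code first, then turn it into a function. A function consists of a name, arguments, and a body.

range() returns the minimum and maximum.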

library(tidyverse)
df <- tibble(a=rnorm(10),
             b=rnorm(10),
             c=rnorm(10),
             d=rnorm(10)
)
x <- df$a
rng <- range(x,na.rm = T) ## range() returns c(minimum, maximum)
(x-rng[1])/(rng[2]-rng[1])

#### as a function
rescale01 <- function(x){
  rng <- range(x,na.rm = T,finite=T)
  (x-rng[1])/(rng[2]-rng[1])
} ### the function is named rescale01
rescale01(c(df$a,Inf))

#### exercises
#1, parameters
rescale01_v2 <- function(x,na.rm = TRUE,finite = TRUE){ ## the argument names must match the names used in the body
  rng <- range(x,na.rm = na.rm,finite=finite)
  (x-rng[1])/(rng[2]-rng[1])
}

#2, reverse_Inf

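Naming rules: function names are generally verbs and argument names nouns. Use comments to explain the code.
- Some generic verbs: get, compute, calculate, determine.
- For a family of functions that do similar things, use a common prefix and vary the suffix, as stringr does with its str_ prefix.
- ==Ctrl + Shift + R== inserts a section marker.

## exercises
#1,
f1 <- function(string,prefix){
  substr(string,1,nchar(prefix))==prefix
}
f3 <- function(x,y){
  rep(y,length.out = length(x)) ## length.out is an argument of rep(), not a function
}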

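Conditional execution: if ... else ... statements
- The condition must evaluate to a single TRUE or FALSE.
- Combine logical expressions inside if with && and ||.
- The vectorized operators & and | are meant for vectors with multiple values.
- When testing equality, remember that == is vectorized and can easily return multiple values.
- Chain conditions as if .. else if .. else if .. else.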
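
A minimal sketch of these points in base R; describe_number() is a hypothetical helper written only for this illustration.

```r
describe_number <- function(x) {
  # || short-circuits: length(x) is checked only if x is numeric
  if (!is.numeric(x) || length(x) != 1) {
    "not a single number"
  } else if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

describe_number(-3)
describe_number("a")

# == is vectorized; collapse the result to one value with all(), any(), or identical()
v <- c(1, 2, 3)
v == c(1, 2, 4)
all(v == c(1, 2, 4))
identical(v, c(1, 2, 3))
```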


## exercise 2: greeting function
greet <- function(time=lubridate::now()){
  hr <- lubridate::hour(time)
  if(hr<12){
    print("Good morning!")
  }else if (hr<18) {
    print("Good afternoon")
  }else{
    print("Good evening")
  }
}

## exercise3
fizzbuzz <- function(x){
  ### validate the input
  stopifnot(length(x)==1)
  stopifnot(is.numeric(x))
  
  if (x%%3==0 && x%%5!=0) {
    print("fizz")
  }else if (x%%5==0 && x%%3!=0) {
    print("buzz")
  }else if (x%%5==0 && x%%3==0) {
    print("fizzbuzz")
  }else{
    print(x)
  }
}

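Function arguments: they fall into two groups, the data to compute on and the details that control the computation. Detail arguments usually have default values.
- Check argument values with stopifnot().
- ... captures any number of unmatched arguments, which can then be passed on to another function.
Return values: a function normally returns the value of the last statement it evaluates; return() can be used to return early.
- An early return is handy when checking the inputs for problems.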
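
The following sketch puts these pieces together in base R: a data argument, a detail argument with a default, input checks, an early return, and `...`. The function names `wt_mean()` and `commas()` are illustrative, not part of any package.

```r
wt_mean <- function(x, w, na.rm = FALSE) {
  # check the detail argument and the data arguments
  stopifnot(is.logical(na.rm), length(na.rm) == 1)
  stopifnot(length(x) == length(w))

  # early return for the empty case
  if (length(x) == 0) {
    return(NA_real_)
  }

  if (na.rm) {
    miss <- is.na(x) | is.na(w)
    x <- x[!miss]
    w <- w[!miss]
  }
  sum(w * x) / sum(w)   # value of the last statement is returned
}

# ... forwards any number of arguments to paste()
commas <- function(...) paste(..., sep = "", collapse = ", ")

wt_mean(c(1, 2, 3), c(1, 1, 2))   # 2.25
commas(letters[1:3])              # "a, b, c"
```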

## compute a confidence interval around the mean using the normal approximation
mean_ci <- function(x,confidence=0.95){
  se <- sd(x)/sqrt(length(x))
  alpha <- 1-confidence
  mean(x)+se*qnorm(c(alpha/2,1-alpha/2))
}

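Chapter 15: Vectors

I. Vector overview

There are two kinds of vectors: atomic vectors and lists.

Atomic vectors: six types: logical, integer, double (integer and double together are known as numeric), character, complex, and raw. All values in an atomic vector have the same type.
Lists: recursive vectors; a list can contain other lists. The values in a list can have different types.
Augmented vectors: vectors with arbitrary additional metadata attached.
Functions for inspection: typeof(), length()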
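
A quick base R check of the six atomic types and of a list, using only typeof() and length(). The example values are arbitrary.

```r
typeof(TRUE)        # "logical"
typeof(1L)          # "integer"
typeof(1.5)         # "double"
typeof("a")         # "character"
typeof(1i)          # "complex"
typeof(as.raw(1))   # "raw"

# A list can mix types and contain other lists
x <- list("a", 1L, list(TRUE, 2.5))
typeof(x)           # "list"
length(x)           # 3: the nested list counts as one element
```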


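II. Atomic vectors

Logical: takes one of three values: TRUE, FALSE, NA
Numeric: numbers are double by default;
- Note that doubles are approximations: they represent floating point numbers, so every double should be treated as an approximate value.
- Integers have one special value, NA; doubles have four: NA, NaN, Inf, and -Inf. Test for them with is.finite(), is.infinite(), is.na(), and is.nan().
Character: each element is a string, and a string can hold an arbitrary amount of data.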
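
The approximate nature of doubles and the special values can be seen in a few lines of base R. `all.equal()` is used here as the tolerance-based comparison; `dplyr::near()` would be an alternative.

```r
x <- sqrt(2)^2
x == 2                     # FALSE: floating point rounding
isTRUE(all.equal(x, 2))    # TRUE: comparison with a tolerance

v <- c(-1, 0, 1) / 0       # -Inf NaN Inf
is.finite(v)               # FALSE FALSE FALSE
is.infinite(v)             # TRUE FALSE TRUE
is.nan(v)                  # FALSE TRUE FALSE
is.na(c(v, NA))            # FALSE TRUE FALSE TRUE (NaN also counts as NA)
```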

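III. Using atomic vectors

Coercion: converting one type of atomic vector into another, either explicitly or automatically.
- Explicit coercion: as.numeric(), as.character(), etc.;
- Implicit coercion: happens automatically depending on context, e.g. logical (TRUE/FALSE) -> numeric (1/0);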

x <- sample(x = 20,size = 100,replace = T)
y <- x>10
mean(y);sum(y)

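Test functions: to check whether a vector is of a particular atomic type, use the is_* functions from purrr: is_logical(), is_character(), etc.
Scalars and recycling rules: operations on vectors of different lengths follow the vector recycling rules. For example, in 1:10+1:3 R expands the shorter vector to the length of the longer one and then adds them (with a warning here, because 10 is not a multiple of 3).
Naming vectors: any type of vector can be named. An existing vector can be named with purrr::set_names(x,nm=c("q","w","e")).
Subsetting: filter() works on tibbles; to subset a vector use [].
- Subsetting with an integer vector: x[c(1,2,2,4)] keeps the elements at those positions, x[c(-1,-2,-3)] drops them.
- ==Subsetting with a logical vector keeps the elements that correspond to TRUE==. This is usually combined with comparison functions.
- A named vector can be subset with a character vector.
- x[] selects all elements of x, which matters most for matrices: x[2,], x[,-3]. As a special case, [[]] extracts only a single element; use it to make clear that exactly one element is wanted.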
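
A short base R sketch of the recycling rule and the subsetting styles listed above. The named vector `x` is made up for illustration.

```r
# Recycling: the shorter vector is repeated to match the longer one
1:6 + 1:2                 # 2 4 4 6 6 8

x <- c(one = 10, two = 20, three = 30, four = 40, five = NA)

x[c(1, 2, 2, 4)]          # positive integers keep positions (repeats allowed)
x[c(-1, -2, -3)]          # negative integers drop positions
x[!is.na(x)]              # logical vector: keep where TRUE
x[c("two", "four")]       # character vector: select by name
x[["three"]]              # single element, name dropped: 30

m <- matrix(1:12, nrow = 3)
m[2, ]                    # second row, all columns
m[, -3]                   # all rows, every column except the third
```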

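IV. Lists (recursive vectors)

Lists are a step up in complexity from atomic vectors because a list can contain other lists. That makes them suitable for hierarchical, tree-like structures. Use str() to focus on the structure of a list.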


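Subsetting lists:
- [] extracts a sub-list; the result is always a list. It accepts logical, integer, or character vectors.
- [[]] extracts a single component from a list, not a sub-list.
- $ is a shorthand that works like [[]] for named components. The operators can be combined: x[[1]][[1]]
- Note how this carries over to tibbles: tibble[1] returns a one-column tibble, tibble[[1]] returns the column itself as a vector.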

a <- list(a=1:3,b="a string",c=pi,d=list(-1,-5))
a[c(1:2)]
a["a"]
a[["a"]][1:2]

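V. Attributes

Any vector can carry arbitrary additional metadata through its attributes. Attributes can be thought of as a named list of vectors that can be attached to any object. The three most important attributes are dimensions, names, and class.

Generic functions: they behave differently depending on the class of their input, which is the key to object-oriented programming in R. For example, as.Date().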
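
A base R sketch of setting and reading attributes, and of a generic function dispatching on its input. The `"greeting"` attribute is an arbitrary example name.

```r
x <- 1:6
attr(x, "greeting") <- "Hi!"      # attach arbitrary metadata
attributes(x)                     # a named list with one entry, greeting

m <- 1:6
dim(m) <- c(2, 3)                 # the dim attribute turns a vector into a matrix
is.matrix(m)                      # TRUE

n <- c(a = 1, b = 2)              # the names attribute
names(n)

# as.Date() is generic: it picks a method based on the class of its input
d1 <- as.Date("2020-01-02")                   # character method
d2 <- as.Date(18263, origin = "1970-01-01")   # numeric method
d1 == d2                                      # TRUE
class(d1)                                     # "Date"
```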

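VI. Augmented vectors

Augmented vectors are other important vector types built on top of atomic vectors and lists; they carry additional attributes, including a class.

Factors: built on top of integer vectors, with an added levels attribute. They represent categorical data whose levels have a fixed order, e.g. factor(c("ef","cd","ab"),levels=c("ab","cd","ef"))
Dates and date-times: numeric vectors.
tibble: augmented lists.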
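
The underlying types can be inspected with typeof(), attributes(), and unclass(). This sketch uses a base data.frame in place of a tibble to stay dependency-free; a tibble is built on a list in the same way.

```r
f <- factor(c("ef", "cd", "ab"), levels = c("ab", "cd", "ef"))
typeof(f)          # "integer"
attributes(f)      # levels and class
as.integer(f)      # 3 2 1: positions in the levels vector

d <- as.Date("1971-01-01")
typeof(d)          # "double"
unclass(d)         # 365: days since 1970-01-01

df <- data.frame(x = 1:2, y = c("a", "b"))
typeof(df)         # "list"
names(attributes(df))
```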

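Chapter 16: Iteration with purrr

Iteration: performing the same operation on multiple columns or multiple datasets. There are two iteration paradigms:

Imperative programming: while(), for().
Functional programming: reduces duplicated code.

I. For loops

Allocate enough space for the output before the loop starts. This matters a great deal for loop efficiency.
Functions and operators involved: ==vector("double",ncol(df)), seq_along(df), [[]]==. Every for loop has three components: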
1. output
2. for loop sequence
3. the body

### 16.2 for loops
require(tidyverse)
df <- tibble(a=rnorm(10),
             b=rnorm(10),
             c=rnorm(10),
             d=rnorm(10))

output <- vector("double",ncol(df))
output
for (i in seq_along(df)) {
  output[i] <- median(df[[i]])
}
output


### exercise Alice the Camel
humps <- c("five", "four","three","two","one","no")
for (i in humps) {
  ## sep = " " puts spaces between the pieces; each verse repeats the line three times
  cat(str_c("Alice the camel has",rep(i,3),"humps",sep = " ",collapse = "\n"),"\n")
}

II. For loop variations

Modifying an existing object: used together with a user-defined function().
- Note that inside the loop every for loop uses [[]], which makes it clear that a single element is being processed.
- df[[i]] <- rescale01(df[[i]])

### create a tibble
df <- tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

### create a function
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

### use the loop
for (i in seq_along(df)) {
  df[[i]] <- rescale01(df[[i]])
}

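Looping patterns:
- Loop over the numeric indices, then extract each value with x[[i]].
- Loop over the elements: for (x in xs).
- Loop over the names: for (nm in names(x)); use this when the name of the element is needed, for example in plot titles or file names.
Unknown output length:
- Save the result of each iteration in a list, then combine everything into a single vector after the loop with unlist() or purrr::flatten_dbl().
- The same pattern applies to similar problems, such as building a very long string (collect the pieces, then paste(out,collapse="")) or a very large data frame (collect the pieces in a list, then ==bind_rows()==). First use a more complex object to save the result of each iteration, then combine everything in one step at the end.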

means <- c(0,1,2)
output <- double()
for (i in seq_along(means)) {
  n <- sample(100,1)
  output <- c(output,rnorm(n,mean = means[[i]])) ### very inefficient: each iteration copies all the data from the previous iterations
}

## the efficient way
out <- vector("list",length = length(means))
for (i in seq_along(means)){
  n <- sample(100,1)
  out[[i]] <- rnorm(n,mean = means[[i]])
}
unlist(out)
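
The same collect-then-combine idea applies to strings and data frames. This is a base R sketch with made-up inputs; `do.call(rbind, ...)` stands in for `dplyr::bind_rows()` so the block needs no packages.

```r
# Build a long string: collect pieces first, paste once at the end
words <- c("alpha", "beta", "gamma")
pieces <- vector("character", length(words))
for (i in seq_along(words)) {
  pieces[[i]] <- toupper(words[[i]])
}
long_string <- paste(pieces, collapse = "")
long_string   # "ALPHABETAGAMMA"

# Build a data frame: collect rows in a list, bind once at the end
rows <- vector("list", 3)
for (i in 1:3) {
  rows[[i]] <- data.frame(id = i, value = i^2)
}
combined <- do.call(rbind, rows)
combined
```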

Unknown sequence length: use a while loop.

for (i in seq_along(x)){
    # body
}

## equivalent to 
i <- 1
while(i <= length(x)){
    # body
    i<- i + 1
}
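
A while loop is most useful when the number of iterations is not known in advance. As a sketch, this counts how many simulated coin flips it takes to get three heads in a row; the seed is arbitrary and only makes the run repeatable.

```r
flip <- function() sample(c("T", "H"), 1)

set.seed(42)
flips <- 0
nheads <- 0

while (nheads < 3) {
  if (flip() == "H") {
    nheads <- nheads + 1   # extend the current streak
  } else {
    nheads <- 0            # a tail resets the streak
  }
  flips <- flips + 1
}
flips
```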

Exercises

####exercises
# 1. read the files and combine them into one large data frame
files <- dir(path = ".",pattern = "tsv$",full.names = F) ### match all the .tsv files under the given path
data_list <- vector("list",length = length(files))
for (i in seq_along(files)) {
  data_list[[i]] <- read_delim(files[[i]],delim = "\t")
}
bind_rows(data_list)

# 2. print the name and mean of every numeric column in df, e.g. show_means(iris)
iris <- as_tibble(iris)
show_means <- function(df,digits=2){
  ## pad every name to the longest column name so the means line up
  maxstr <- max(str_length(names(df)))
  for (nm in names(df)) {
    if (is.numeric(df[[nm]])) {
      cat(
        str_c(str_pad(str_c(nm,":"),maxstr+1L,side = "right"),
              format(mean(df[[nm]]),digits=digits,nsmall=digits),
              sep=" "),
        "\n",sep = ""
      )
    }
  }
}

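III. For loops vs. functionals

R is a functional programming language, which means a for loop can be wrapped in a function and that function called instead. In other words, a function can be passed as an argument to another function.
With the purrr functions, a complex problem can be broken into sub-problems whose results are then combined with the pipe.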

col_summary <- function(x,fun){
  out <- vector("double",length(x))
  for (i in seq_along(x)) {
    out[[i]]=fun(x[[i]])
  }
  out
}

## so we can pass any function through the fun parameter.

col_summary(df,median) ## calculate the median value of each column.

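IV. The map functions

purrr provides map() (returns a list), map_lgl(), map_dbl(), and map_chr(). The second argument of map(), .f, can be a function, a formula, a character vector, or an integer vector. ==...== carries additional arguments for .f: anything supplied after the second argument is passed on to the function on every call. For example: map_dbl(df, mean, trim = 0.5). When map() is used to filter inside each element of a list, the result is still a list.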
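
A small sketch of the map variants and of passing an extra argument through `...`. The data frame is made up so the results are easy to check by hand.

```r
library(purrr)

df <- data.frame(a = c(1, 2, 3, 100), b = c(4, 5, 6, 7))

map(df, mean)                     # a list with elements a and b
means <- map_dbl(df, mean)        # named double vector: a = 26.5, b = 5.5
trimmed <- map_dbl(df, mean, trim = 0.5)   # trim = 0.5 is forwarded to mean()
types <- map_chr(df, typeof)      # "double" "double"
is_num <- map_lgl(df, is.numeric) # TRUE TRUE

# Filtering inside each element still returns a list
big <- map(df, ~ .[. > 4])
```

With trim = 0.5 the trimmed mean is the median, so the outlier 100 no longer dominates column a.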

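Shortcuts: useful, for example, when fitting a linear model to each group of a dataset, with the groups defined much like the levels of a factor.
- ==.== is a pronoun that refers to the current list element.
- ==~== stands in for function(x): a one-sided formula defines an anonymous function.
- ==~.$r.squared== combines the two.
- Extract named components with map_dbl(~.$r.squared) or map_dbl("r.squared"); map_dbl(2) extracts the 2nd value of each list element by position.
Base R has the apply family: lapply(), sapply(), vapply(). purrr, developed by Hadley Wickham, has more consistent map_* names and arguments and the . and ~ shortcuts; the book also mentions parallel computation and progress bars as planned features.
Functions used in the exercises: map(-2:2,rnorm, n=5), map(1:3,rnorm,mean=10).
Dealing with failure: safely(), possibly(), quietly()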
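
A sketch of the three failure-handling adverbs applied to `log()`. The input list is made up; the third element is a string so that `log()` fails on it.

```r
library(purrr)

inputs <- list(1, 10, "a")

# safely(): never throws; each result is a list with $result and $error
safe_log <- safely(log)
res <- map(inputs, safe_log)
ok <- map_lgl(res, ~ is.null(.x$error))   # TRUE TRUE FALSE

# possibly(): replace failures with a default value
vals <- map_dbl(inputs, possibly(log, NA_real_))

# quietly(): capture warnings and messages instead of printing them
q <- quietly(log)(-1)
q$warnings
```

safely() suits cases where the errors themselves need inspecting; possibly() is simpler when a default value is enough.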

### split() divides the data frame into a list
models <- mtcars %>% 
  split(.$cyl) %>% 
  map(function(df) lm(mpg ~ wt,data=df)) ## function(df) is an anonymous function; the argument name is arbitrary
  
## alias:
models <- mtcars %>% split( .$cyl) %>% map( ~ lm(mpg ~ wt,data=.))

### extract the component
models %>% 
  map(summary) %>% 
  map_dbl(~.$r.squared)
#>     4     6     8 
#> 0.509 0.465 0.423

map(out, ~ .[. > 0.8]) ### keep the values greater than 0.8 in each element

map(df,function(xxx) length(unique(xxx)))
## is equivalent to
map(df, ~ length(unique(.)))

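V. Mapping over multiple arguments

map2() and pmap(): use these when the operation applied to each element involves several varying arguments. Note that the arguments that change on each call come before the function, and the arguments that stay the same come after it. The varying arguments are supplied as lists.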

mu <- list(5,10,-3)
map(mu, rnorm, n=5)
### using map2()
sigma <- list(1,5,10)
map2(mu,sigma,rnorm,n=5)
### using pmap(): when several arguments vary, collect them all in one list
n <- list(3,5,8)
args <- list(n=n,sd=sigma,mean=mu)
pmap(args,rnorm)


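Multiple arguments: pmap() takes a list of input lists. Because those lists all have the same length, **they can be stored in a tibble, which ensures that every column has a name and the same length as the other columns**.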


params <- tribble(
  ~mean, ~sd, ~n,
    5,     1,  1,
   10,     5,  3,
   -3,    10,  5
)
params %>% 
  pmap(rnorm)
#> [[1]]
#> [1] 6.02
#> 
#> [[2]]
#> [1]  8.68 18.29  6.13
#> 
#> [[3]]
#> [1] -12.24  -5.76  -8.93  -4.22   8.80

Multiple arguments and multiple functions: use invoke_map() when not only the arguments vary but the function itself does too (invoke means call).
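
To my knowledge, recent purrr releases mark invoke_map() as deprecated; treat that as an assumption and check the installed version. The sketch below expresses the same idea (a different function and different arguments on each call) with map2() and base do.call(). The function names and parameter values are illustrative.

```r
library(purrr)

f <- c("runif", "rnorm", "rpois")
param <- list(
  list(min = -1, max = 1),
  list(sd = 5),
  list(lambda = 10)
)

set.seed(1)
# For each pair, call the named function with n = 5 plus its own arguments
sim <- map2(f, param, function(fn, args) do.call(fn, c(list(n = 5), args)))
str(sim)
```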

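VI. Walk, predicate functions, reduce and accumulate

Walk functions: walk() and pwalk() call a function for its side effects, which is very handy when saving several outputs to files.
- For example, given a list of plots and a vector of file names, pwalk() can save each plot to the corresponding location.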

library(ggplot2)
plots <- mtcars %>% split(.$cyl) %>% 
  map( ~ ggplot(data = .,mapping = aes(mpg,wt))+geom_point())
paths <- stringi::stri_c(names(plots),".pdf")

pwalk(list(paths,plots),ggsave,path=tempdir())

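Predicate functions: keep() and discard() keep the elements of the input where the predicate is TRUE or FALSE, respectively. some() and every() determine whether the predicate is true for any or for all of the elements. detect() finds the first element where the predicate is true, and detect_index() returns its position.
Reduce and accumulate: reduce() repeatedly applies a two-argument function to collapse a complex list into a single result; accumulate() does the same but keeps all the intermediate results.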

vs <- list(c(1,2,3,4,5),
           c(1,2,3,4),
           c(1,2,3))
reduce(vs,intersect)
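
A compact sketch of the predicate functions and of reduce() versus accumulate() on a made-up numeric vector.

```r
library(purrr)

nums <- c(3, 8, 1, 12, 5)

keep(nums, ~ .x > 4)           # 8 12 5
discard(nums, ~ .x > 4)        # 3 1
some(nums, ~ .x > 10)          # TRUE: at least one element matches
every(nums, ~ .x > 0)          # TRUE: all elements match
detect(nums, ~ .x > 4)         # 8: first match
detect_index(nums, ~ .x > 4)   # 2: position of the first match

reduce(nums, `+`)              # 29: single final result
accumulate(nums, `+`)          # 3 11 12 24 29: running results
```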

Chapter 21: Graphics for communication with ggplot2

Labels: labs()
- title="" main title
- subtitle="" subtitle, adds extra detail;
- caption=" " text at the bottom right, typically used to describe the data source
- x=" ", y=" " axis titles
- color="" legend title
- x=quote() mathematical expressions in axis titles

library(ggplot2)
require(tidyverse)
ggplot(mpg,aes(displ,hwy))+geom_point(aes(color=class))+labs(title = "aaaaa",subtitle = "bbbbb",caption = "cccc ",x="DISPL",y="HWY")

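Annotations: add labels to individual observations or to groups of observations.
- filter(row_number(desc(hwy))==1) selects the observation with the highest hwy in each group.
- geom_point() + geom_text(aes(label= ),data=)
- The ggrepel package adjusts label positions automatically so they do not overlap: geom_label_repel(aes(label=))
- Place a title in the top-right corner inside the plot
- Add reference lines with geom_hline(), geom_vline(size=2,color="white")
- Draw a rectangle around points of interest with geom_rect()
- Draw an arrow pointing at a data point of interest with geom_segment()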

library(ggplot2)
library(ggrepel)
best_in_class <- mpg %>% group_by(class) %>% filter(row_number(desc(hwy))==1)
ggplot(mpg,aes(displ,hwy))+geom_point(aes(color=class))+geom_text(mapping = aes(label=model),data = best_in_class)
### use ggrepel for the labels and large hollow circles to highlight the points
ggplot(mpg,aes(displ,hwy))+geom_point(aes(color=class))+geom_point(size=4,shape=1,data = best_in_class)+geom_label_repel(aes(label=model),data = best_in_class)

### place a label in the top-right corner inside the plot
label <- mpg %>% summarise(displ=max(displ),hwy=max(hwy),label="AAAA\nbbbbbbbb")
ggplot(mpg,aes(displ,hwy))+geom_point()+geom_text(aes(label=label),data = label,vjust="top",hjust="right")
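
The reference line, rectangle, and arrow from the list above can be sketched as follows. `annotate("rect", ...)` and `annotate("segment", ...)` draw with the same geoms as geom_rect() and geom_segment() but take fixed values instead of a data frame. All coordinates here are hypothetical, chosen only to fall inside the mpg plot.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  # horizontal reference line at the mean of hwy
  geom_hline(yintercept = mean(mpg$hwy), color = "grey40", linetype = "dashed") +
  # rectangle around a region of interest (hypothetical limits)
  annotate("rect", xmin = 5, xmax = 7.2, ymin = 21, ymax = 28,
           alpha = 0.2, fill = "steelblue") +
  # arrow pointing into that region (hypothetical start and end points)
  annotate("segment", x = 4, xend = 5.9, y = 40, yend = 29.5,
           arrow = grid::arrow(length = grid::unit(0.3, "cm")))
p
```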

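Scales: control the mapping from data values to aesthetics. Naming scheme: scale_ + aesthetic name + _ + scale type (continuous, discrete, datetime, date), e.g. scale_x_continuous().
- Axis ticks: set the tick positions with scale_y_continuous(breaks=seq(1,40,by=5)); suppress the tick labels with scale_y_continuous(labels=NULL)
- Legend: theme(legend.position="none"/"right"/"left"/"top"/"bottom"); control how the legend is drawn with guides(color=guide_legend())
- Replacing a scale: apply a mathematical transformation to the scale.
- Changing colours: use the RColorBrewer or ggsci packages; when the legend has only a few entries, a redundant shape mapping can be added.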

ggplot(mpg,aes(displ,hwy))+geom_point(aes(color=class))+geom_smooth(se=F,color="darkgrey")+theme(legend.position = "bottom")+guides(color=guide_legend(nrow = 1,override.aes = list(size=4)))

library(ggsci)
library(RColorBrewer)
ggplot(mpg,aes(displ,hwy))+geom_point(aes(color=drv))+scale_color_brewer(palette = "Set1")
ggplot(mpg,aes(displ,hwy))+geom_point(aes(color=drv,shape=drv))+scale_color_aaas()

Zooming: control the plot limits. Set the xlim and ylim arguments of coord_cartesian().
Themes: customize the non-data elements of the plot. theme_bw(), theme_grey() (the default), theme_classic(), theme_light()
Saving plots: ggsave()
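
A sketch that combines the three points: zoom with coord_cartesian(), switch the theme, and save with ggsave(). The limits, file name, and dimensions are arbitrary; the file goes to a temporary directory.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  # zoom in without dropping the data outside the limits
  coord_cartesian(xlim = c(5, 7), ylim = c(10, 30)) +
  theme_bw()

out_file <- file.path(tempdir(), "mpg_zoom.pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
file.exists(out_file)
```

coord_cartesian() only changes the visible window, so statistics such as smoothers are still computed on all the data.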
