reshape包;tidyr,dplyr包膜赃;鏈?zhǔn)讲僮鞣?>%
一浦妄、reshape2包對數(shù)據(jù)格式進(jìn)行轉(zhuǎn)換
> x <- data.frame(k1 = c(NA,NA,3,4,5), k2 = c(1,NA,NA,4,5),
+ data = 1:5)
> y <- data.frame(k1 = c(NA,2,NA,4,5), k2 = c(NA,NA,3,4,5),
+ data = 1:5)
> x
k1 k2 data
1 NA 1 1
2 NA NA 2
3 3 NA 3
4 4 4 4
5 5 5 5
> y
k1 k2 data
1 NA NA 1
2 2 NA 2
3 NA 3 3
4 4 4 4
5 5 5 5
如上兩個(gè)數(shù)據(jù)框,無法直接用
rbind和cbind進(jìn)行合并,會(huì)亂
- merge()處理
可以根據(jù)一個(gè)或多個(gè)公有的向量進(jìn)行合并
#根據(jù)k1進(jìn)行合并
> merge(x,y,by = "k1")
k1 k2.x data.x k2.y data.y
1 4 4 4 4 4
2 5 5 5 5 5
3 NA 1 1 NA 1
4 NA 1 1 3 3
5 NA NA 2 NA 1
6 NA NA 2 3 3
#incomparable = T踏施,表示丟掉NA
>> merge(x, y, by = "k2",incomparables = NA)
> merge(x, y, by = "k2",incomparables = NA)
k2 k1.x data.x k1.y data.y
1 4 4 4 4 4
2 5 5 5 5 5
#根據(jù)k1,k2進(jìn)行合并
> merge(x, y, by = c("k1","k2"))
k1 k2 data.x data.y
1 4 4 4 4
2 5 5 5 5
3 NA NA 2 1
- reshape包
reshape包重構(gòu)整個(gè)數(shù)據(jù)的萬能數(shù)據(jù)包罕邀,最新的是reshape2
> install.packages("reshape2")
> library(reshape2)
#與Excel中數(shù)據(jù)透視表功能類似
> help(package = "reshape2")
(1)melt對寬數(shù)據(jù)進(jìn)行處理畅形,得到長數(shù)據(jù)
> head(airquality)
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
#將列名首字母改為小寫
> names(airquality) <- tolower(names(airquality))
> head(airquality)
ozone solar.r wind temp month day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
#使用melt函數(shù)處理數(shù)據(jù)
> melt(airquality)
> aql <- melt(airquality)
#融合后,每一行都是標(biāo)識符和變量的組合诉探,不能有重復(fù)項(xiàng)日熬。數(shù)據(jù)變?yōu)槿小?#其中variable是因子類型
#這也就是寬數(shù)據(jù)變?yōu)殚L數(shù)據(jù)的過程
> head(aql)
variable value
1 ozone 41
2 ozone 36
3 ozone 12
4 ozone 18
5 ozone NA
6 ozone 28
> head(aql,50)
variable value
1 ozone 41
2 ozone 36
3 ozone 12
4 ozone 18
5 ozone NA
6 ozone 28
7 ozone 23
8 ozone 19
9 ozone 8
10 ozone NA
11 ozone 7
12 ozone 16
13 ozone 11
14 ozone 14
15 ozone 18
16 ozone 14
17 ozone 34
18 ozone 6
19 ozone 30
20 ozone 11
21 ozone 1
22 ozone 11
23 ozone 4
24 ozone 32
25 ozone NA
26 ozone NA
27 ozone NA
28 ozone 23
29 ozone 45
30 ozone 115
31 ozone 37
32 ozone NA
33 ozone NA
34 ozone NA
35 ozone NA
36 ozone NA
37 ozone NA
38 ozone 29
39 ozone NA
40 ozone 71
41 ozone 39
42 ozone NA
43 ozone NA
44 ozone 23
45 ozone NA
46 ozone NA
47 ozone 21
48 ozone 37
49 ozone 20
50 ozone 12
#需要設(shè)置,month和day是用來當(dāng)做ID肾胯,其余四個(gè)作為變量值竖席。
#ID就是用來區(qū)分不同行數(shù)之間的變量
#重要!需要區(qū)分哪部分作為行的觀測值敬肚,哪部分作為列的觀測值
> aql <- melt(airquality,id.vars = c("month","day"))
> head(aql,50)
month day variable value
1 5 1 ozone 41
2 5 2 ozone 36
3 5 3 ozone 12
4 5 4 ozone 18
5 5 5 ozone NA
6 5 6 ozone 28
7 5 7 ozone 23
8 5 8 ozone 19
9 5 9 ozone 8
10 5 10 ozone NA
11 5 11 ozone 7
12 5 12 ozone 16
13 5 13 ozone 11
14 5 14 ozone 14
15 5 15 ozone 18
16 5 16 ozone 14
17 5 17 ozone 34
18 5 18 ozone 6
19 5 19 ozone 30
20 5 20 ozone 11
21 5 21 ozone 1
22 5 22 ozone 11
23 5 23 ozone 4
24 5 24 ozone 32
25 5 25 ozone NA
26 5 26 ozone NA
27 5 27 ozone NA
28 5 28 ozone 23
29 5 29 ozone 45
30 5 30 ozone 115
31 5 31 ozone 37
32 6 1 ozone NA
33 6 2 ozone NA
34 6 3 ozone NA
35 6 4 ozone NA
36 6 5 ozone NA
37 6 6 ozone NA
38 6 7 ozone 29
39 6 8 ozone NA
40 6 9 ozone 71
41 6 10 ozone 39
42 6 11 ozone NA
43 6 12 ozone NA
44 6 13 ozone 23
45 6 14 ozone NA
46 6 15 ozone NA
47 6 16 ozone 21
48 6 17 ozone 37
49 6 18 ozone 20
50 6 19 ozone 12
(2)cast將長數(shù)據(jù)變?yōu)閷挃?shù)據(jù)
reshape2將cast分為了幾種
①dcast:處理數(shù)據(jù)框毕荐,讀取melt的結(jié)果,根據(jù)提供的公式進(jìn)行數(shù)據(jù)融合艳馒。
參數(shù):formula憎亚,融合后的數(shù)據(jù)格式。
"~"在R找那個(gè)表示相關(guān)聯(lián)弄慰,說明而這有關(guān)系第美,但不一定是相等
#重錄數(shù)據(jù)
> aqw <- dcast(aql,month ~ variable, fun.aggregate = mean,na.rm = TRUE)
> head(aqw)
month ozone solar.r wind temp
1 5 23.61538 181.2963 11.622581 65.54839
2 6 29.44444 190.1667 10.266667 79.10000
3 7 59.11538 216.4839 8.941935 83.90323
4 8 59.96154 171.8571 8.793548 83.96774
5 9 31.44828 167.4333 10.180000 76.90000
#fun.aggregate也可以設(shè)置為sum等其他函數(shù)
> aqw <- dcast(aql,month ~ variable, fun.aggregate = sum,na.rm = TRUE)
> head(aqw)
month ozone solar.r wind temp
1 5 614 4895 360.3 2032
2 6 265 5705 308.0 2373
3 7 1537 6711 277.2 2601
4 8 1559 4812 272.6 2603
5 9 912 5023 305.4 2307
②acast:返回向量,矩陣或數(shù)組
二曹动、tidyr&dplyr數(shù)據(jù)轉(zhuǎn)換
這兩個(gè)包相對于reshape2斋日,操作更加簡便。
安裝:
> install.packages(c("tidyr","dplyr"))
> library(tidyr)
> library(dplyr)
- tidyr包(Tidy Messy Data)
Overview
The goal of tidyr is to help you create tidy data. Tidy data is data where:
- Every column is variable.
- Every row is an observation.
- Every cell is a single value.
Tidy data describes a standard way of storing data that is used wherever possible throughout the tidyverse. If you ensure that your data is tidy, you’ll spend less time fighting with the tools and more time working on your analysis. Learn more about tidy data in[vignette("tidy-data")](https://tidyr.tidyverse.org/articles/tidy-data.html)
(1)gather(),將寬數(shù)據(jù)轉(zhuǎn)化為長數(shù)據(jù)墓陈,類似reshape2的melt()
(2)spread()恶守,將長數(shù)據(jù)轉(zhuǎn)化為寬數(shù)據(jù),類似reshape2的cast()
(3)unit()贡必,將多列合并為一列
(4)separate()兔港,將一列分為多列
以mtcars數(shù)據(jù)集作為演示
#mtcars每一列是變量,每一行是觀測值
#取部分?jǐn)?shù)據(jù)進(jìn)行演示
> tdata <- mtcars[1:10,1:3]
#汽車名是以行名存在的仔拟,對數(shù)據(jù)進(jìn)行處理衫樊,將行名添加到數(shù)據(jù)中
> tdata <- data.frame(names = rownames(tdata),tdata)
names mpg cyl disp
Mazda RX4 Mazda RX4 21.0 6 160.0
Mazda RX4 Wag Mazda RX4 Wag 21.0 6 160.0
Datsun 710 Datsun 710 22.8 4 108.0
Hornet 4 Drive Hornet 4 Drive 21.4 6 258.0
Hornet Sportabout Hornet Sportabout 18.7 8 360.0
Valiant Valiant 18.1 6 225.0
Duster 360 Duster 360 14.3 8 360.0
Merc 240D Merc 240D 24.4 4 146.7
Merc 230 Merc 230 22.8 4 140.8
Merc 280 Merc 280 19.2 6 167.6
(1)gather()函數(shù)
優(yōu)點(diǎn):固定列不變,其他列進(jìn)行轉(zhuǎn)換
> gather(tdata, key = "Key",value = "Value",cyl,disp,mpg)
> gather(tdata, key = "Key",value = "Value",cyl,disp,mpg)
names Key Value
1 Mazda RX4 cyl 6.0
2 Mazda RX4 Wag cyl 6.0
3 Datsun 710 cyl 4.0
4 Hornet 4 Drive cyl 6.0
5 Hornet Sportabout cyl 8.0
6 Valiant cyl 6.0
7 Duster 360 cyl 8.0
8 Merc 240D cyl 4.0
9 Merc 230 cyl 4.0
10 Merc 280 cyl 6.0
11 Mazda RX4 disp 160.0
12 Mazda RX4 Wag disp 160.0
13 Datsun 710 disp 108.0
14 Hornet 4 Drive disp 258.0
15 Hornet Sportabout disp 360.0
16 Valiant disp 225.0
17 Duster 360 disp 360.0
18 Merc 240D disp 146.7
19 Merc 230 disp 140.8
20 Merc 280 disp 167.6
21 Mazda RX4 mpg 21.0
22 Mazda RX4 Wag mpg 21.0
23 Datsun 710 mpg 22.8
24 Hornet 4 Drive mpg 21.4
25 Hornet Sportabout mpg 18.7
26 Valiant mpg 18.1
27 Duster 360 mpg 14.3
28 Merc 240D mpg 24.4
29 Merc 230 mpg 22.8
30 Merc 280 mpg 19.2
# : 表示將某些列聚集到同一列中
> gather(tdata, key = "Key",value = "Value",cyl:disp,mpg)
# - 減去不需要的列
> gather(tdata, key = "Key",value = "Value",cyl,-disp)
names mpg disp Key Value
1 Mazda RX4 21.0 160.0 cyl 6
2 Mazda RX4 Wag 21.0 160.0 cyl 6
3 Datsun 710 22.8 108.0 cyl 4
4 Hornet 4 Drive 21.4 258.0 cyl 6
5 Hornet Sportabout 18.7 360.0 cyl 8
6 Valiant 18.1 225.0 cyl 6
7 Duster 360 14.3 360.0 cyl 8
8 Merc 240D 24.4 146.7 cyl 4
9 Merc 230 22.8 140.8 cyl 4
10 Merc 280 19.2 167.6 cyl 6
#若敲列的名字容易敲錯(cuò),也可以敲編號
> gdata <- gather(tdata, key = "Key",value = "Value",2:4)
(2) spread()函數(shù)
與gather相反
#首先確定哪一列打散
> spread(gdata, key = "Key",value = "Value")
names cyl disp mpg
1 Datsun 710 4 108.0 22.8
2 Duster 360 8 360.0 14.3
3 Hornet 4 Drive 6 258.0 21.4
4 Hornet Sportabout 8 360.0 18.7
5 Mazda RX4 6 160.0 21.0
6 Mazda RX4 Wag 6 160.0 21.0
7 Merc 230 4 140.8 22.8
8 Merc 240D 4 146.7 24.4
9 Merc 280 6 167.6 19.2
10 Valiant 6 225.0 18.1
(3)separate()
可以將一列拆分成多列
> df <- data.frame(x = c(NA,"a.b","a.d","b.c"))
> df
x
1 <NA>
2 a.b
3 a.d
4 b.c
#將這一列科侈,按“.”分為兩列
> separate(df,col = x,into = c("A","B"))
A B
1 <NA> <NA>
2 a b
3 a d
4 b c
> df <- data.frame(x = c(NA,"a.b-c","a-d","b-c"))
> separate(df,col = x,into = c("A","B"))
A B
1 <NA> <NA>
2 a b
3 a d
4 b c
Warning message:
Expected 2 pieces. Additional pieces discarded in 1 rows [2].
#可看出第一個(gè)還是按“.”分割载佳,c的值被丟掉了
#指定sep參數(shù)為連字符
> separate(df,col = x,into = c("A","B"),sep = "-")
A B
1 <NA> <NA>
2 a.b c
3 a d
4 b c
(4)unite()
將separate之后的數(shù)據(jù)連接起來
> x <- separate(df,col = x,into = c("A","B"),sep = "-")
> unite(x,col = "AB",A,B,sep = "-")
AB
1 NA-NA
2 a.b-c
3 a-d
4 b-c
- dplyr包
不僅可以對單個(gè)表格操作,還可以對雙表格操作
[1] "%>%" "across" "add_count"
[4] "add_count_" "add_row" "add_rownames"
[7] "add_tally" "add_tally_" "all_equal"
[10] "all_of" "all_vars" "anti_join"
[13] "any_of" "any_vars" "arrange"
[16] "arrange_" "arrange_all" "arrange_at"
[19] "arrange_if" "as.tbl" "as_data_frame"
[22] "as_label" "as_tibble" "auto_copy"
[25] "band_instruments" "band_instruments2" "band_members"
[28] "bench_tbls" "between" "bind_cols"
[31] "bind_rows" "c_across" "case_when"
[34] "changes" "check_dbplyr" "coalesce"
[37] "collapse" "collect" "combine"
[40] "common_by" "compare_tbls" "compare_tbls2"
[43] "compute" "contains" "copy_to"
[46] "count" "count_" "cumall"
[49] "cumany" "cume_dist" "cummean"
[52] "cur_column" "cur_data" "cur_data_all"
[55] "cur_group" "cur_group_id" "cur_group_rows"
[58] "current_vars" "data_frame" "data_frame_"
[61] "db_analyze" "db_begin" "db_commit"
[64] "db_create_index" "db_create_indexes" "db_create_table"
[67] "db_data_type" "db_desc" "db_drop_table"
[70] "db_explain" "db_has_table" "db_insert_into"
[73] "db_list_tables" "db_query_fields" "db_query_rows"
[76] "db_rollback" "db_save_query" "db_write_table"
[79] "dense_rank" "desc" "dim_desc"
[82] "distinct" "distinct_" "distinct_all"
[85] "distinct_at" "distinct_if" "distinct_prepare"
[88] "do" "do_" "dplyr_col_modify"
[91] "dplyr_reconstruct" "dplyr_row_slice" "ends_with"
[94] "enexpr" "enexprs" "enquo"
[97] "enquos" "ensym" "ensyms"
[100] "eval_tbls" "eval_tbls2" "everything"
[103] "explain" "expr" "failwith"
[106] "filter" "filter_" "filter_all"
[109] "filter_at" "filter_if" "first"
[112] "frame_data" "full_join" "funs"
[115] "funs_" "glimpse" "group_by"
[118] "group_by_" "group_by_all" "group_by_at"
[121] "group_by_drop_default" "group_by_if" "group_by_prepare"
[124] "group_cols" "group_data" "group_indices"
[127] "group_indices_" "group_keys" "group_map"
[130] "group_modify" "group_nest" "group_rows"
[133] "group_size" "group_split" "group_trim"
[136] "group_vars" "group_walk" "grouped_df"
[139] "groups" "id" "ident"
[142] "if_else" "inner_join" "intersect"
[145] "is.grouped_df" "is.src" "is.tbl"
[148] "is_grouped_df" "lag" "last"
[151] "last_col" "lead" "left_join"
[154] "location" "lst" "lst_"
[157] "make_tbl" "matches" "min_rank"
[160] "mutate" "mutate_" "mutate_all"
[163] "mutate_at" "mutate_each" "mutate_each_"
[166] "mutate_if" "n" "n_distinct"
[169] "n_groups" "na_if" "near"
[172] "nest_by" "nest_join" "new_grouped_df"
[175] "nth" "ntile" "num_range"
[178] "one_of" "order_by" "percent_rank"
[181] "progress_estimated" "pull" "quo"
[184] "quo_name" "quos" "recode"
[187] "recode_factor" "relocate" "rename"
[190] "rename_" "rename_all" "rename_at"
[193] "rename_if" "rename_vars" "rename_vars_"
[196] "rename_with" "right_join" "row_number"
[199] "rows_delete" "rows_insert" "rows_patch"
[202] "rows_update" "rows_upsert" "rowwise"
[205] "same_src" "sample_frac" "sample_n"
[208] "select" "select_" "select_all"
[211] "select_at" "select_if" "select_var"
[214] "select_vars" "select_vars_" "semi_join"
[217] "setdiff" "setequal" "show_query"
[220] "slice" "slice_" "slice_head"
[223] "slice_max" "slice_min" "slice_sample"
[226] "slice_tail" "sql" "sql_escape_ident"
[229] "sql_escape_string" "sql_join" "sql_select"
[232] "sql_semi_join" "sql_set_op" "sql_subquery"
[235] "sql_translate_env" "src" "src_df"
[238] "src_local" "src_mysql" "src_postgres"
[241] "src_sqlite" "src_tbls" "starts_with"
[244] "starwars" "storms" "summarise"
[247] "summarise_" "summarise_all" "summarise_at"
[250] "summarise_each" "summarise_each_" "summarise_if"
[253] "summarize" "summarize_" "summarize_all"
[256] "summarize_at" "summarize_each" "summarize_each_"
[259] "summarize_if" "sym" "syms"
[262] "tally" "tally_" "tbl"
[265] "tbl_df" "tbl_nongroup_vars" "tbl_ptype"
[268] "tbl_sum" "tbl_vars" "tibble"
[271] "top_frac" "top_n" "transmute"
[274] "transmute_" "transmute_all" "transmute_at"
[277] "transmute_if" "tribble" "trunc_mat"
[280] "type_sum" "ungroup" "union"
[283] "union_all" "validate_grouped_df" "vars"
[286] "with_groups" "with_order" "wrap_dbplyr_obj"
##一共有288個(gè)函數(shù)
(1)對單個(gè)表格進(jìn)行操作
①過濾filter
#將長度小于等于7的數(shù)據(jù)過濾掉
> dplyr::filter(iris,Sepal.Length>7)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 7.1 3.0 5.9 2.1 virginica
2 7.6 3.0 6.6 2.1 virginica
3 7.3 2.9 6.3 1.8 virginica
4 7.2 3.6 6.1 2.5 virginica
5 7.7 3.8 6.7 2.2 virginica
6 7.7 2.6 6.9 2.3 virginica
7 7.7 2.8 6.7 2.0 virginica
8 7.2 3.2 6.0 1.8 virginica
9 7.2 3.0 5.8 1.6 virginica
10 7.4 2.8 6.1 1.9 virginica
11 7.9 3.8 6.4 2.0 virginica
12 7.7 3.0 6.1 2.3 virginica
由于dplyr的函數(shù)太多臀栈,所以一般運(yùn)用函數(shù)時(shí)蔫慧,是dplyr::函數(shù)名,這樣就不會(huì)出現(xiàn)歧義
②去除重復(fù)行distinct()
類似unique
> dplyr::distinct(rbind(iris[1:10,],iris[1:15,]))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
11 5.4 3.7 1.5 0.2 setosa
12 4.8 3.4 1.6 0.2 setosa
13 4.8 3.0 1.4 0.1 setosa
14 4.3 3.0 1.1 0.1 setosa
15 5.8 4.0 1.2 0.2 setosa
③切片slice()
用于取出數(shù)據(jù)的任意行
> dplyr::slice(iris,10:15)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 4.9 3.1 1.5 0.1 setosa
2 5.4 3.7 1.5 0.2 setosa
3 4.8 3.4 1.6 0.2 setosa
4 4.8 3.0 1.4 0.1 setosa
5 4.3 3.0 1.1 0.1 setosa
6 5.8 4.0 1.2 0.2 setosa
④隨機(jī)取樣sample_n()
> dplyr::sample_n(iris,10)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 6.9 3.1 4.9 1.5 versicolor
2 5.4 3.9 1.7 0.4 setosa
3 5.9 3.0 4.2 1.5 versicolor
4 6.3 2.9 5.6 1.8 virginica
5 5.1 3.5 1.4 0.2 setosa
6 6.1 2.9 4.7 1.4 versicolor
7 5.5 3.5 1.3 0.2 setosa
8 5.7 2.9 4.2 1.3 versicolor
9 5.8 2.8 5.1 2.4 virginica
10 4.8 3.4 1.6 0.2 setosa
⑤按比例隨機(jī)選取sample_frac()
> dplyr::sample_frac(iris,0.1)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 6.7 3.3 5.7 2.1 virginica
2 5.5 2.4 3.8 1.1 versicolor
3 5.1 3.5 1.4 0.2 setosa
4 4.9 2.4 3.3 1.0 versicolor
5 5.5 3.5 1.3 0.2 setosa
6 4.8 3.0 1.4 0.3 setosa
7 5.6 3.0 4.1 1.3 versicolor
8 6.7 3.3 5.7 2.5 virginica
9 4.8 3.1 1.6 0.2 setosa
10 6.1 2.6 5.6 1.4 virginica
11 6.0 2.9 4.5 1.5 versicolor
12 4.6 3.6 1.0 0.2 setosa
13 4.4 3.2 1.3 0.2 setosa
14 4.4 3.0 1.3 0.2 setosa
15 6.5 3.0 5.8 2.2 virginica
⑥排序arrange()
#按花的長度進(jìn)行排序
> dplyr::arrange(iris,Sepal.Length)
#按相反方向進(jìn)行排序
> dplyr::arrange(iris,desc(Sepal.Length))
⑦取子集select()
#理解select功能
> ?select
⑧統(tǒng)計(jì)函數(shù)summarise()
> summarise(iris,avg = mean(Sepal.Length))
avg
1 5.843333
#還可將mean換成sum权薯,計(jì)算總長度
鏈?zhǔn)讲僮鞣?>%
用于實(shí)現(xiàn)將一個(gè)函數(shù)的輸出傳遞給下一個(gè)函數(shù)姑躲,作為下一個(gè)函數(shù)的輸入,相當(dāng)于是管道函數(shù)
在Rstudio中可以使用Ctrl+shift+M快捷鍵輸出
> head(mtcars,20)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#取出數(shù)據(jù)第十一到第十二行
> head(mtcars,20) %>% tail(10)
mpg cyl disp hp drat wt qsec vs am gear carb
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
⑨分組group_by
> dplyr::group_by(iris,Species)
# A tibble: 150 x 5
# Groups: Species [3]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
# ... with 140 more rows
> iris %>% group_by(Species)
# A tibble: 150 x 5
# Groups: Species [3]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
# ... with 140 more rows
##再計(jì)算一下平均值
> iris %>% group_by(Species) %>% summarise(avg=mean(Sepal.Width))
# A tibble: 3 x 2
Species avg
* <fct> <dbl>
1 setosa 3.43
2 versicolor 2.77
3 virginica 2.97
##再按結(jié)果大小進(jìn)行排序
> iris %>% group_by(Species) %>% summarise(avg=mean(Sepal.Width)) %>% arrange(avg)
# A tibble: 3 x 2
Species avg
<fct> <dbl>
1 versicolor 2.77
2 virginica 2.97
3 setosa 3.43
⑩添加變量mutate()
> dplyr::mutate(iris,new = Sepal.Length+Petal.Length)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species new
1 5.1 3.5 1.4 0.2 setosa 6.5
2 4.9 3.0 1.4 0.2 setosa 6.3
3 4.7 3.2 1.3 0.2 setosa 6.0
4
……
(2)對雙表格操作
主要是如何將兩個(gè)表格進(jìn)行整合
多種方式:左連接盟蚣,右連接黍析,內(nèi)連接,全連接屎开,半連接阐枣,反連接等
#創(chuàng)建兩個(gè)數(shù)據(jù)框
> a=data.frame(x1=c("A","B","C"),x2=c(1,2,3))
> b=data.frame(x1=c("A","B","D"),x3=c(T,F,T))
> a
x1 x2
1 A 1
2 B 2
3 C 3
> b
x1 x3
1 A TRUE
2 B FALSE
3 D TRUE
①左連接
以左邊的表為基礎(chǔ)
> dplyr::left_join(a,b,by="x1")
x1 x2 x3
1 A 1 TRUE
2 B 2 FALSE
3 C 3 NA
②右連接
> dplyr::right_join(a,b,by="x1")
x1 x2 x3
1 A 1 TRUE
2 B 2 FALSE
3 D NA TRUE
③內(nèi)連接(取x1的交集)
全連接(取x1的并集)
> dplyr::full_join(a,b,by="x1")
x1 x2 x3
1 A 1 TRUE
2 B 2 FALSE
3 C 3 NA
4 D NA TRUE
> dplyr::inner_join(a,b,by="x1")
x1 x2 x3
1 A 1 TRUE
2 B 2 FALSE
④半連接(根據(jù)右側(cè)表內(nèi)容,對左側(cè)表進(jìn)行過濾牍戚,即將a中與b的交集取出來)
反連接(根據(jù)右側(cè)表內(nèi)容過濾侮繁,但是是將a中與b的補(bǔ)集取出來)
主要是進(jìn)行集合的各種運(yùn)算
> dplyr::semi_join(a,b,by="x1")
x1 x2
1 A 1
2 B 2
> dplyr::anti_join(a,b,by="x1")
x1 x2
1 C 3
數(shù)據(jù)集合并
intrsect union secdev
本質(zhì):集合運(yùn)算
以mtcars數(shù)據(jù)集作為演示
#slice取出的數(shù)據(jù)不包含行名
#使用mute()為數(shù)據(jù)集添加一行
> mtcars <- mutate(mtcars,Model=rownames(mtcars))
> first <- slice(mtcars,1:20)
> second <- slice (mtcars,10:30)
##這兩者之間有重合部分
##接下來測試這些函數(shù)
#intersect()取交集
> intersect(first, second)
mpg cyl disp hp drat wt qsec vs am gear carb Model
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Merc 280
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 Merc 280C
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 Merc 450SE
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 Merc 450SL
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 Merc 450SLC
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 Cadillac Fleetwood
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 Lincoln Continental
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Chrysler Imperial
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 Fiat 128
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 Honda Civic
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 Toyota Corolla
#union_all取并集
> union_all(first, second)
mpg cyl disp hp drat wt qsec vs am gear carb Model
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4 Wag
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 Datsun 710
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Hornet Sportabout
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 Valiant
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 Duster 360
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 Merc 240D
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 Merc 230
Merc 280...10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Merc 280
Merc 280C...11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 Merc 280C
Merc 450SE...12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 Merc 450SE
Merc 450SL...13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 Merc 450SL
Merc 450SLC...14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 Merc 450SLC
Cadillac Fleetwood...15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 Cadillac Fleetwood
Lincoln Continental...16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 Lincoln Continental
Chrysler Imperial...17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Chrysler Imperial
Fiat 128...18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 Fiat 128
Honda Civic...19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 Honda Civic
Toyota Corolla...20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 Toyota Corolla
Merc 280...21 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Merc 280
Merc 280C...22 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 Merc 280C
Merc 450SE...23 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 Merc 450SE
Merc 450SL...24 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 Merc 450SL
Merc 450SLC...25 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 Merc 450SLC
Cadillac Fleetwood...26 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 Cadillac Fleetwood
Lincoln Continental...27 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 Lincoln Continental
Chrysler Imperial...28 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Chrysler Imperial
Fiat 128...29 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 Fiat 128
Honda Civic...30 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 Honda Civic
Toyota Corolla...31 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 Toyota Corolla
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 Toyota Corona
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 Dodge Challenger
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 AMC Javelin
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 Camaro Z28
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Pontiac Firebird
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 Fiat X1-9
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 Porsche 914-2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 Lotus Europa
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 Ford Pantera L
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 Ferrari Dino
#union取非冗余的并集
> union(first, second)
mpg cyl disp hp drat wt qsec vs am gear carb Model
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4 Wag
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 Datsun 710
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Hornet Sportabout
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 Valiant
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 Duster 360
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 Merc 240D
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 Merc 230
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Merc 280
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 Merc 280C
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 Merc 450SE
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 Merc 450SL
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 Merc 450SLC
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 Cadillac Fleetwood
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 Lincoln Continental
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Chrysler Imperial
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 Fiat 128
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 Honda Civic
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 Toyota Corolla
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 Toyota Corona
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 Dodge Challenger
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 AMC Javelin
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 Camaro Z28
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Pontiac Firebird
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 Fiat X1-9
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 Porsche 914-2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 Lotus Europa
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 Ford Pantera L
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 Ferrari Dino
#setdiff()取first補(bǔ)集
> setdiff(first, second)
mpg cyl disp hp drat wt qsec vs am gear carb Model
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4 Wag
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 Datsun 710
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Hornet Sportabout
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 Valiant
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 Duster 360
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 Merc 240D
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 Merc 230
#取second的補(bǔ)集
> setdiff(second, first)
mpg cyl disp hp drat wt qsec vs am gear carb Model
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 Toyota Corona
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 Dodge Challenger
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 AMC Javelin
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 Camaro Z28
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Pontiac Firebird
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 Fiat X1-9
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 Porsche 914-2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 Lotus Europa
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 Ford Pantera L
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 Ferrari Dino