An introduction to data.table syntax
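Since this post is mainly about data.table, here is a brief look at dplyr before comparing the two in detail.
dplyr's strength is its elegant syntax, which follows natural reasoning and is easy to understand. data.table's strengths are its concise syntax and fast execution, which make it very powerful for large data, although its syntax can sometimes be hard to follow.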
Commonly used dplyr functions
- select(): select columns
- filter(): filter rows
- mutate(): add new columns, similar to transform()
- group_by(): group the data
- summarise(): summarise the data
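As a rough orientation, each of these dplyr verbs has a data.table counterpart written inside `[ ]`. The sketch below uses a small made-up table (`dt`, `g`, `x` are illustrative names only); the dplyr calls in the comments are for comparison and are not run.

```r
library(data.table)

dt <- data.table(g = c("a", "b", "a"), x = c(1, 2, 3))   # hypothetical toy table

sel <- dt[, .(x)]                          # like select(dt, x)
fil <- dt[x > 1]                           # like filter(dt, x > 1)
mut <- copy(dt)[, y := x * 2]              # like mutate(dt, y = x * 2); copy() leaves dt unchanged
smr <- dt[, .(total = sum(x)), by = g]     # like group_by(dt, g) %>% summarise(total = sum(x))
```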
data.table
The general form of a data.table call is DT[i, j, by]: i selects rows, j operates on columns, and by specifies the grouping.
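To see the three parts working together, here is a minimal sketch on a made-up table (the names `dt`, `g` and `x` are hypothetical). Rows are filtered by i first, then j is evaluated separately within each by group, roughly like SQL's WHERE, SELECT and GROUP BY.

```r
library(data.table)

# Hypothetical toy table (not from the original post)
dt <- data.table(g = c("a", "a", "b", "b", "b"),
                 x = c(1, 2, 3, 4, 5))

# i: keep rows with x > 1
# j: compute the sum of x
# by: do it separately for each value of g
res <- dt[x > 1, .(total = sum(x)), by = g]
print(res)
```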
Here we use the iris dataset for illustration.
> library(data.table)
> DT <- data.table(iris)
> set.seed(45L)
> DT[, c("V1","V2") := list(rep(LETTERS[1:3], length.out = .N), rep(c(1L,2L), length.out = .N))]
> setnames(DT, tolower(names(DT)))
> head(DT)
sepal.length sepal.width petal.length petal.width species v1 v2
1: 5.1 3.5 1.4 0.2 setosa A 1
2: 4.9 3.0 1.4 0.2 setosa B 2
3: 4.7 3.2 1.3 0.2 setosa C 1
4: 4.6 3.1 1.5 0.2 setosa A 2
5: 5.0 3.6 1.4 0.2 setosa B 1
6: 5.4 3.9 1.7 0.4 setosa C 2
1. Filtering rows with i
- By row number
Select rows 3 to 5
> DT[3:5,] #or DT[3:5]
sepal.length sepal.width petal.length petal.width species v1 v2
1: 4.7 3.2 1.3 0.2 setosa C 1
2: 4.6 3.1 1.5 0.2 setosa A 2
3: 5.0 3.6 1.4 0.2 setosa B 1
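- By condition
Here we use ==. It is simple and easy to read, but it scans the entire column, which can be slow on large tables, so setting a key (covered later) is recommended. Recent versions of data.table may also build an index automatically for this kind of subset.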
> head(DT[species=='setosa'])
sepal.length sepal.width petal.length petal.width species v1 v2
1: 5.1 3.5 1.4 0.2 setosa A 1
2: 4.9 3.0 1.4 0.2 setosa B 2
3: 4.7 3.2 1.3 0.2 setosa C 1
4: 4.6 3.1 1.5 0.2 setosa A 2
5: 5.0 3.6 1.4 0.2 setosa B 1
6: 5.4 3.9 1.7 0.4 setosa C 2
> tail(DT[species=='setosa'])
sepal.length sepal.width petal.length petal.width species v1 v2
1: 5.1 3.8 1.9 0.4 setosa C 1
2: 4.8 3.0 1.4 0.3 setosa A 2
3: 5.1 3.8 1.6 0.2 setosa B 1
4: 4.6 3.2 1.4 0.2 setosa C 2
5: 5.3 3.7 1.5 0.2 setosa A 1
6: 5.0 3.3 1.4 0.2 setosa B 2
> head(DT[species %in% c("setosa","versicolor")]) # matches either value (logical OR)
sepal.length sepal.width petal.length petal.width species v1 v2
1: 5.1 3.5 1.4 0.2 setosa A 1
2: 4.9 3.0 1.4 0.2 setosa B 2
3: 4.7 3.2 1.3 0.2 setosa C 1
4: 4.6 3.1 1.5 0.2 setosa A 2
5: 5.0 3.6 1.4 0.2 setosa B 1
6: 5.4 3.9 1.7 0.4 setosa C 2
> tail(DT[species %in% c("setosa","versicolor")])
sepal.length sepal.width petal.length petal.width species v1 v2
1: 5.6 2.7 4.2 1.3 versicolor B 1
2: 5.7 3.0 4.2 1.2 versicolor C 2
3: 5.7 2.9 4.2 1.3 versicolor A 1
4: 6.2 2.9 4.3 1.3 versicolor B 2
5: 5.1 2.5 3.0 1.1 versicolor C 1
6: 5.7 2.8 4.1 1.3 versicolor A 2
> head(DT[sepal.length %between% c(4.5,5)])
sepal.length sepal.width petal.length petal.width species v1 v2
1: 4.9 3.0 1.4 0.2 setosa B 2
2: 4.7 3.2 1.3 0.2 setosa C 1
3: 4.6 3.1 1.5 0.2 setosa A 2
4: 5.0 3.6 1.4 0.2 setosa B 1
5: 4.6 3.4 1.4 0.3 setosa A 1
6: 5.0 3.4 1.5 0.2 setosa B 2
> tail(DT[sepal.length %between% c(4.5,5)])
sepal.length sepal.width petal.length petal.width species v1 v2
1: 4.6 3.2 1.4 0.2 setosa C 2
2: 5.0 3.3 1.4 0.2 setosa B 2
3: 4.9 2.4 3.3 1.0 versicolor A 2
4: 5.0 2.0 3.5 1.0 versicolor A 1
5: 5.0 2.3 3.3 1.0 versicolor A 2
6: 4.9 2.5 4.5 1.7 virginica B 1
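The same filters can be combined and negated. This sketch uses a small hypothetical table so the selected rows are easy to check by eye; note that `%between%` includes both endpoints.

```r
library(data.table)

# Hypothetical toy table
dt <- data.table(id  = 1:6,
                 grp = c("x", "y", "z", "x", "y", "z"),
                 val = c(1.5, 4.0, 5.0, 2.5, 6.0, 4.5))

by_eq    <- dt[grp == "x"]               # single condition
by_in    <- dt[grp %in% c("x", "y")]     # match any of several values
by_range <- dt[val %between% c(4, 5)]    # inclusive range: 4 <= val <= 5
by_and   <- dt[grp == "z" & val > 4.6]   # combine conditions with & (and)
by_not   <- dt[!grp %in% c("x", "y")]    # negate with ! (applies to the whole %in% test)
```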
2. Operating on columns with j
2.1 Selecting columns
- Selecting one column
.() is an alias for list()
> head(DT[,sepal.width]) # returned as a vector
[1] 3.5 3.0 3.2 3.1 3.6 3.9
> head(DT[,.(sepal.width)]) # returned as a data.table
sepal.width
1: 3.5
2: 3.0
3: 3.2
4: 3.1
5: 3.6
6: 3.9
- Selecting multiple columns
> head(DT[,.(sepal.width,sepal.length)])
sepal.width sepal.length
1: 3.5 5.1
2: 3.0 4.9
3: 3.2 4.7
4: 3.1 4.6
5: 3.6 5.0
6: 3.9 5.4
- Selecting columns by position
> head(DT[,1,with=FALSE]) # first column (newer versions of data.table also accept DT[,1])
sepal.length
1: 5.1
2: 4.9
3: 4.7
4: 4.6
5: 5.0
6: 5.4
> head(DT[,2,with=FALSE]) # second column
sepal.width
1: 3.5
2: 3.0
3: 3.2
4: 3.1
5: 3.6
6: 3.9
> head(DT[,3,with=FALSE]) # third column
petal.length
1: 1.4
2: 1.4
3: 1.3
4: 1.5
5: 1.4
6: 1.7
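When the column names are stored in a variable, data.table needs to be told to look the variable up outside the table. A sketch, assuming a reasonably recent data.table (the `..` prefix and numeric j without `with = FALSE` are not available in very old versions); the table and `cols` are hypothetical.

```r
library(data.table)

dt <- data.table(x = 1:3, y = 4:6, z = 7:9)   # hypothetical toy table
cols <- c("x", "z")

s1 <- dt[, ..cols]               # `..` prefix: take `cols` from the calling scope
s2 <- dt[, cols, with = FALSE]   # older, equivalent spelling
s3 <- dt[, c(1, 3)]              # by position, no with = FALSE needed in recent versions
```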
2.2 Using functions in j
> DT[,sum(sepal.width)]
[1] 458.6
> DT[,.(sum(sepal.width))]
V1
1: 458.6
> DT[,.(SUM=sum(sepal.width))] # the result column can be named
SUM
1: 458.6
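Several summaries can be computed in one call by listing them in `.()`; each named element becomes a column of a one-row data.table, whereas a bare expression returns a plain vector. A minimal sketch on a hypothetical column:

```r
library(data.table)

dt <- data.table(x = c(2, 4, 6, 8))   # hypothetical toy column

v <- dt[, sum(x)]                                              # plain length-1 vector
s <- dt[, .(total = sum(x), avg = mean(x), n = length(x))]     # one-row data.table with named columns
```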
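- Selecting columns and calling functions can be combined
If the results have different lengths, the shorter one is recycled; here the single sd() value is repeated on every row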
> head(DT[,.(sepal.width,sd=sd(sepal.width))])
sepal.width sd
1: 3.5 0.4358663
2: 3.0 0.4358663
3: 3.2 0.4358663
4: 3.1 0.4358663
5: 3.6 0.4358663
6: 3.9 0.4358663
- Multiple expressions can be wrapped in braces { }
> DT[,{print(head(sepal.width))
+ plot(sepal.width)
+ NULL}]
[1] 3.5 3.0 3.2 3.1 3.6 3.9
# a scatter plot of sepal.width is drawn here (not shown)
NULL
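Inside `{ }` the expressions run in order and the value of the last one is returned, which is why the example above ends with NULL. The same mechanism lets you compute intermediate values and return only a final summary; the table below is hypothetical.

```r
library(data.table)

dt <- data.table(x = c(1, 2, 3, 4))   # hypothetical toy column

res <- dt[, {
  m <- mean(x)             # intermediate value, not returned
  s <- sd(x)
  .(mean = m, cv = s / m)  # last expression: this becomes the result
}]
```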
3. Operating on j by group
- Sum sepal.length for each species. The first call below shows a common mistake: by= is placed inside .()
> DT[,.(SUM=sum(sepal.length),by=species)]
SUM by
1: 876.5 setosa
2: 876.5 setosa
3: 876.5 setosa
4: 876.5 setosa
5: 876.5 setosa
---
146: 876.5 virginica
147: 876.5 virginica
148: 876.5 virginica
149: 876.5 virginica
150: 876.5 virginica
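# Note the difference: above, by= sits inside .(), so it becomes an ordinary column named "by" and SUM is the grand total (876.5) on every row. Pass by as a separate argument instead: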
> DT[,.(SUM=sum(sepal.length)),by=.(species)]
species SUM
1: setosa 250.3
2: versicolor 296.8
3: virginica 329.4
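The difference is easy to reproduce on a tiny hypothetical table: inside `.()`, `by = g` is just another output column, while as an argument of `[ ]` it triggers grouping.

```r
library(data.table)

dt <- data.table(g = c("a", "a", "b"), x = c(1, 2, 10))   # hypothetical toy table

# Wrong: `by = g` inside .() creates a column named "by"; sum(x) is the grand total, recycled
wrong <- dt[, .(total = sum(x), by = g)]

# Right: `by` is an argument of [ ], outside .()
right <- dt[, .(total = sum(x)), by = g]
```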
- Grouping by multiple columns
> DT[,.(SUM=sum(sepal.width)),by=.(species,v1)]
species v1 SUM
1: setosa A 59.0
2: setosa B 58.6
3: setosa C 53.8
4: versicolor C 46.5
5: versicolor A 45.5
6: versicolor B 46.5
7: virginica B 51.4
8: virginica C 49.6
9: virginica A 47.7
- Using a function in by
> DT[,.(SUM=sum(sepal.length)),by=sign(v2-1)]
sign SUM
1: 0 438.0
2: 1 438.5
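Any expression can serve as a grouping variable, and naming it inside `.()` sets the name of the grouping column (above it defaulted to `sign`). A sketch with a hypothetical column, grouping on a logical test:

```r
library(data.table)

dt <- data.table(x = c(1, 5, 2, 8, 3))   # hypothetical toy column

# Groups appear in order of first occurrence: x = 1 gives FALSE first
res <- dt[, .(total = sum(x), n = .N), by = .(big = x > 2)]
```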
- Grouping only the rows selected by i
> DT[1:40,.(SUM=sum(sepal.length)),by=species]
species SUM
1: setosa 201.5
- Using .N to count the rows in each group
> DT[,.(count=.N),by=species]
species count
1: setosa 50
2: versicolor 50
3: virginica 50
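Two small variations, sketched on a hypothetical column: a bare `.N` in j produces a count column named `N`, and combining i with by counts only the rows that survive the filter.

```r
library(data.table)

dt <- data.table(g = c("a", "b", "a", "c", "a", "b"))   # hypothetical toy column

counts  <- dt[, .N, by = g]                       # count column is named "N"
counts2 <- dt[g != "c", .(count = .N), by = g]    # count only the rows kept by i
```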
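4. Adding, updating, and deleting columns with :=
Note: := modifies the table in place (by reference), so DT <- DT[, ...:=...] is unnecessary; DT[, ...:=...] on its own is enough. Make a copy() first if the original must stay unchanged, as below.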
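Because `:=` works by reference, plain assignment with `<-` does not protect the original; only `copy()` does. A minimal sketch on a hypothetical table:

```r
library(data.table)

dt    <- data.table(x = 1:3)   # hypothetical toy table
alias <- dt                    # not a copy: both names refer to the same table
safe  <- copy(dt)              # an independent copy

dt[, y := x * 10L]             # modifies dt in place; no `dt <- ...` needed
```

After this, `dt` (and therefore `alias`) has the new column `y`, while `safe` still has only `x`.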
- Updating a column
> dt <- copy(DT)
> head(dt)
sepal.length sepal.width petal.length petal.width species v1 v2
1: 5.1 3.5 1.4 0.2 setosa A 1
2: 4.9 3.0 1.4 0.2 setosa B 2
3: 4.7 3.2 1.3 0.2 setosa C 1
4: 4.6 3.1 1.5 0.2 setosa A 2
5: 5.0 3.6 1.4 0.2 setosa B 1
6: 5.4 3.9 1.7 0.4 setosa C 2
> head(dt[,v1:=round(exp(v2))]) # overwrite column v1
sepal.length sepal.width petal.length petal.width species v1 v2
1: 5.1 3.5 1.4 0.2 setosa 3 1
2: 4.9 3.0 1.4 0.2 setosa 7 2
3: 4.7 3.2 1.3 0.2 setosa 3 1
4: 4.6 3.1 1.5 0.2 setosa 7 2
5: 5.0 3.6 1.4 0.2 setosa 3 1
6: 5.4 3.9 1.7 0.4 setosa 7 2
- Adding multiple columns
> dt[,c("h1","h2"):=.(round(exp(v2)),rep(LETTERS[4:6], length.out = .N))]
> head(dt)
sepal.length sepal.width petal.length petal.width species v1 v2 h1 h2
1: 5.1 3.5 1.4 0.2 setosa 3 1 3 D
2: 4.9 3.0 1.4 0.2 setosa 7 2 7 E
3: 4.7 3.2 1.3 0.2 setosa 3 1 3 F
4: 4.6 3.1 1.5 0.2 setosa 7 2 7 D
5: 5.0 3.6 1.4 0.2 setosa 3 1 3 E
6: 5.4 3.9 1.7 0.4 setosa 7 2 7 F
# Equivalently, using the functional form of :=; only columns 5 to 9 are shown here for readability
> head(dt[,':='(h1=round(exp(v2)),h2=rep(LETTERS[4:6], length.out = .N))][,5:9])
species v1 v2 h1 h2
1: setosa A 1 3 D
2: setosa B 2 7 E
3: setosa C 1 3 F
4: setosa A 2 7 D
5: setosa B 1 3 E
6: setosa C 2 7 F
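A sketch of both forms on a hypothetical table. It assumes a recent data.table, which refuses to recycle a right-hand side whose length is neither 1 nor the number of rows; hence `rep(..., length.out = .N)`, while a single value is still recycled automatically.

```r
library(data.table)

dt <- data.table(v = c(1, 2, 1, 2))   # hypothetical toy column

# Vector-of-names form
dt[, c("h1", "h2") := .(v * 10, rep(c("D", "E"), length.out = .N))]

# Functional form; a length-1 value is recycled to every row
dt[, `:=`(h3 = v + 1, h4 = "same for all rows")]
```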
- Deleting columns
> dt[,':='(h1=NULL,h2=NULL)]
> head(dt)
sepal.length sepal.width petal.length petal.width species v1 v2
1: 5.1 3.5 1.4 0.2 setosa A 1
2: 4.9 3.0 1.4 0.2 setosa B 2
3: 4.7 3.2 1.3 0.2 setosa C 1
4: 4.6 3.1 1.5 0.2 setosa A 2
5: 5.0 3.6 1.4 0.2 setosa B 1
6: 5.4 3.9 1.7 0.4 setosa C 2
It can also be written as follows
----------
> head(dt[,c("h1","h2"):=NULL])
sepal.length sepal.width petal.length petal.width species v1 v2
1: 5.1 3.5 1.4 0.2 setosa A 1
2: 4.9 3.0 1.4 0.2 setosa B 2
3: 4.7 3.2 1.3 0.2 setosa C 1
4: 4.6 3.1 1.5 0.2 setosa A 2
5: 5.0 3.6 1.4 0.2 setosa B 1
6: 5.4 3.9 1.7 0.4 setosa C 2
- Modifying values that meet a condition
> dt[sepal.length>4 & v1=='A', v2:=3L]
> head(dt[,.(v2)])
v2
1: 3
2: 2
3: 1
4: 3
5: 1
6: 2
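With i present, `:=` updates only the matching rows and leaves the others untouched. A sketch on a hypothetical table; `3L` is used so the value's type matches the integer column.

```r
library(data.table)

dt <- data.table(g = c("A", "B", "A", "C"),
                 x = c(5, 3, 2, 7),
                 v = c(1L, 1L, 2L, 2L))   # hypothetical toy table

# Only row 1 satisfies both conditions
dt[x > 4 & g == "A", v := 3L]
```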
5. Setting a key and working with it
- Setting the key when creating the data.table
> data <- data.table(a=c('A','B','C','A','A','B'),b=rnorm(6),key="a")
> head(data)
a b
1: A 0.3407997
2: A -0.7460474
3: A -0.8981073
4: B -0.7033403
5: B -0.3347941
6: C -0.3795377
- Setting the key after the data.table is created
> dt <- data.table(a=c('A','B','C','A','A','B'),b=rnorm(6))
> dt
a b
1: A -0.5013782
2: B -0.1745357
3: C 1.8090374
4: A -0.2301050
5: A -1.1304182
6: B 0.2159889
# compare the row order of dt before and after setkey()
> setkey(dt,a) # sorts dt by the key column a, in place
> dt
a b
1: A -0.5013782
2: A -0.2301050
3: A -1.1304182
4: B -0.1745357
5: B 0.2159889
6: C 1.8090374
- Checking whether the data.table has a key
> key(dt)
[1] "a"
> haskey(dt)
[1] TRUE
> attributes(dt)
$names
[1] "a" "b"
$row.names
[1] 1 2 3 4 5 6
$class
[1] "data.table" "data.frame"
$.internal.selfref
<pointer: 0x10180cf78>
$sorted
[1] "a"
> attributes(dt)$sorted
[1] "a"
- With a as the key, select the rows where a is "B"
> dt['B']
a b
1: B -0.1745357
2: B 0.2159889
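The whole key workflow can be checked on a hypothetical table: `setkey()` sorts in place and records the key, after which a character value in i is matched against the key column. Writing the value as `.("A")` makes the join explicit.

```r
library(data.table)

dt <- data.table(a = c("B", "A", "C", "A"), b = 1:4)   # hypothetical toy table
setkey(dt, a)            # sorts dt by a in place and marks a as the key

k       <- key(dt)       # "a"
rows_A  <- dt["A"]       # subset on the key column
rows_A2 <- dt[.("A")]    # same lookup, written as a join with .()
```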
- With the key set, select the first row where a is "B"
> dt['B',mult='first'] # mult defaults to "all"
a b
1: B -0.1745357
- With the key set, select the last row where a is "B"
> dt['B',mult='last']
a b
1: B 0.2159889
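The three values of `mult` side by side, on a hypothetical keyed table where "B" matches three rows:

```r
library(data.table)

dt <- data.table(a = c("A", "A", "B", "B", "B"), b = 1:5, key = "a")   # hypothetical

all_B   <- dt["B"]                   # mult = "all" (default): every matching row
first_B <- dt["B", mult = "first"]   # only the first match
last_B  <- dt["B", mult = "last"]    # only the last match
```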
- With a as the key, select the rows where a is "A" or "B"
> dt[c('A','B')]
a b
1: A -0.5013782
2: A -0.2301050
3: A -1.1304182
4: B -0.1745357
5: B 0.2159889
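- The nomatch argument controls what is returned for values in i that have no match. The default is NA (a row filled with NA is returned); setting it to 0 drops unmatched values from the result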
> dt[c('A','D')]
a b
1: A -0.5013782
2: A -0.2301050
3: A -1.1304182
4: D NA
----------
> dt[c('A','D'),nomatch=0]
a b
1: A -0.5013782
2: A -0.2301050
3: A -1.1304182
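The same behaviour on a hypothetical keyed table, where "D" does not occur. Note that with the default, the unmatched row keeps the value from i in the key column and gets NA elsewhere.

```r
library(data.table)

dt <- data.table(a = c("A", "A", "B"), b = c(1, 2, 3), key = "a")   # hypothetical

with_na <- dt[c("A", "D")]               # default nomatch = NA: "D" yields a row with b = NA
dropped <- dt[c("A", "D"), nomatch = 0]  # the unmatched "D" is dropped
```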
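- by=.EACHI groups by each value supplied in i, so j runs once per matched subset. It requires i to be a join, so set a key first (newer versions also allow on= instead)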
> dt[c('A','B'),sum(b)]
[1] -1.820448
----------
> dt[c('A','B'),sum(b),by=.EACHI]
a V1
1: A -1.86190135
2: B 0.04145319
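A small hypothetical example makes the contrast concrete: without `by = .EACHI`, j sees all matched rows at once; with it, j is evaluated once for each value in i.

```r
library(data.table)

dt <- data.table(a = c("A", "A", "B", "C"), b = c(1, 2, 10, 100), key = "a")   # hypothetical

total   <- dt[c("A", "B"), sum(b)]                # one sum over all matched rows
per_key <- dt[c("A", "B"), sum(b), by = .EACHI]   # one sum per value in i
```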
- Setting multiple key columns
> head(DT)
sepal.length sepal.width petal.length petal.width species v1 v2
1: 5.1 3.5 1.4 0.2 setosa A 1
2: 4.9 3.0 1.4 0.2 setosa B 2
3: 4.7 3.2 1.3 0.2 setosa C 1
4: 4.6 3.1 1.5 0.2 setosa A 2
5: 5.0 3.6 1.4 0.2 setosa B 1
6: 5.4 3.9 1.7 0.4 setosa C 2
> setkey(DT,v1,v2) # sorts by v1 first, then by v2
> head(DT[.('B',1)]) # rows where v1 is "B" and v2 is 1
sepal.length sepal.width petal.length petal.width species v1 v2
1: 5.0 3.6 1.4 0.2 setosa B 1
2: 5.4 3.7 1.5 0.2 setosa B 1
3: 5.4 3.9 1.3 0.4 setosa B 1
4: 4.6 3.6 1.0 0.2 setosa B 1
5: 5.2 3.4 1.4 0.2 setosa B 1
6: 4.9 3.1 1.5 0.2 setosa B 1
> head(DT[.(c('A','B'),1)]) # rows where v1 is "A" or "B" and v2 is 1
sepal.length sepal.width petal.length petal.width species v1 v2
1: 5.1 3.5 1.4 0.2 setosa A 1
2: 4.6 3.4 1.4 0.3 setosa A 1
3: 4.8 3.0 1.4 0.1 setosa A 1
4: 5.7 3.8 1.7 0.3 setosa A 1
5: 4.8 3.4 1.9 0.2 setosa A 1
6: 4.8 3.1 1.6 0.2 setosa A 1
> tail(DT[.(c('A','B'),1)]) # last rows of the same selection
sepal.length sepal.width petal.length petal.width species v1 v2
1: 7.7 2.6 6.9 2.3 virginica B 1
2: 6.7 3.3 5.7 2.1 virginica B 1
3: 7.4 2.8 6.1 1.9 virginica B 1
4: 6.3 3.4 5.6 2.4 virginica B 1
5: 5.8 2.7 5.1 1.9 virginica B 1
6: 6.2 3.4 5.4 2.3 virginica B 1
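With a two-column key, each element of `.()` in i is matched against the key columns in order, and a single value is recycled across the rows of i. A sketch on a hypothetical table:

```r
library(data.table)

dt <- data.table(v1 = c("A", "B", "A", "B"),
                 v2 = c(1L, 1L, 2L, 2L),
                 x  = 1:4)            # hypothetical toy table
setkey(dt, v1, v2)                    # sorted by v1, then v2

one  <- dt[.("B", 1L)]                # v1 == "B" and v2 == 1
both <- dt[.(c("A", "B"), 1L)]        # (A, 1) and (B, 1); 1L is recycled
```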
6. Advanced data.table operations
- Using .N for the number of rows
> DT[.N] # in i, .N is the last row number, so this returns the last row
sepal.length sepal.width petal.length petal.width species v1 v2
1: 5.9 3 5.1 1.8 virginica C 2
> DT[,.N] # in j, .N returns the number of rows
[1] 150
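Since `.N` is just a number, it can also appear in arithmetic inside i. A short sketch on a hypothetical column:

```r
library(data.table)

dt <- data.table(x = c(10, 20, 30))   # hypothetical toy column

last_row    <- dt[.N]       # last row
second_last <- dt[.N - 1]   # second-to-last row
n_rows      <- dt[, .N]     # number of rows
```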
- .SD
.SD (Subset of Data) is a data.table holding the current group's rows, with all columns except the by columns. It can only be used in j
> DT[,print(.SD),by=v1]
sepal.length sepal.width petal.length petal.width species v2
1: 5.1 3.5 1.4 0.2 setosa 1
2: 4.6 3.4 1.4 0.3 setosa 1
3: 4.8 3.0 1.4 0.1 setosa 1
4: 5.7 3.8 1.7 0.3 setosa 1
5: 4.8 3.4 1.9 0.2 setosa 1
6: 4.8 3.1 1.6 0.2 setosa 1
7: 5.5 3.5 1.3 0.2 setosa 1
8: 4.4 3.2 1.3 0.2 setosa 1
9: 5.3 3.7 1.5 0.2 setosa 1
10: 6.5 2.8 4.6 1.5 versicolor 1
11: 5.0 2.0 3.5 1.0 versicolor 1
12: 5.6 3.0 4.5 1.5 versicolor 1
13: 6.3 2.5 4.9 1.5 versicolor 1
14: 6.0 2.9 4.5 1.5 versicolor 1
15: 5.4 3.0 4.5 1.5 versicolor 1
16: 5.5 2.6 4.4 1.2 versicolor 1
17: 5.7 2.9 4.2 1.3 versicolor 1
18: 7.1 3.0 5.9 2.1 virginica 1
19: 6.7 2.5 5.8 1.8 virginica 1
20: 5.8 2.8 5.1 2.4 virginica 1
21: 6.9 3.2 5.7 2.3 virginica 1
22: 6.2 2.8 4.8 1.8 virginica 1
23: 6.4 2.8 5.6 2.2 virginica 1
24: 6.0 3.0 4.8 1.8 virginica 1
25: 6.7 3.3 5.7 2.5 virginica 1
26: 4.6 3.1 1.5 0.2 setosa 2
27: 4.9 3.1 1.5 0.1 setosa 2
28: 5.7 4.4 1.5 0.4 setosa 2
29: 5.1 3.7 1.5 0.4 setosa 2
30: 5.2 3.5 1.5 0.2 setosa 2
31: 5.5 4.2 1.4 0.2 setosa 2
32: 5.1 3.4 1.5 0.2 setosa 2
33: 4.8 3.0 1.4 0.3 setosa 2
34: 6.4 3.2 4.5 1.5 versicolor 2
35: 4.9 2.4 3.3 1.0 versicolor 2
36: 6.1 2.9 4.7 1.4 versicolor 2
37: 5.6 2.5 3.9 1.1 versicolor 2
38: 6.6 3.0 4.4 1.4 versicolor 2
39: 5.5 2.4 3.7 1.0 versicolor 2
40: 6.3 2.3 4.4 1.3 versicolor 2
41: 5.0 2.3 3.3 1.0 versicolor 2
42: 5.7 2.8 4.1 1.3 versicolor 2
43: 7.6 3.0 6.6 2.1 virginica 2
44: 6.4 2.7 5.3 1.9 virginica 2
45: 7.7 3.8 6.7 2.2 virginica 2
46: 6.3 2.7 4.9 1.8 virginica 2
47: 7.2 3.0 5.8 1.6 virginica 2
48: 7.7 3.0 6.1 2.3 virginica 2
49: 6.9 3.1 5.1 2.3 virginica 2
50: 6.5 3.0 5.2 2.0 virginica 2
sepal.length sepal.width petal.length petal.width species v2
sepal.length sepal.width petal.length petal.width species v2
1: 5.0 3.6 1.4 0.2 setosa 1
2: 5.4 3.7 1.5 0.2 setosa 1
3: 5.4 3.9 1.3 0.4 setosa 1
4: 4.6 3.6 1.0 0.2 setosa 1
5: 5.2 3.4 1.4 0.2 setosa 1
6: 4.9 3.1 1.5 0.2 setosa 1
7: 5.0 3.5 1.3 0.3 setosa 1
8: 5.1 3.8 1.6 0.2 setosa 1
9: 6.9 3.1 4.9 1.5 versicolor 1
10: 6.6 2.9 4.6 1.3 versicolor 1
11: 5.6 2.9 3.6 1.3 versicolor 1
12: 5.9 3.2 4.8 1.8 versicolor 1
13: 6.8 2.8 4.8 1.4 versicolor 1
14: 5.8 2.7 3.9 1.2 versicolor 1
15: 5.6 3.0 4.1 1.3 versicolor 1
16: 5.6 2.7 4.2 1.3 versicolor 1
17: 6.3 3.3 6.0 2.5 virginica 1
18: 4.9 2.5 4.5 1.7 virginica 1
19: 6.8 3.0 5.5 2.1 virginica 1
20: 7.7 2.6 6.9 2.3 virginica 1
21: 6.7 3.3 5.7 2.1 virginica 1
22: 7.4 2.8 6.1 1.9 virginica 1
23: 6.3 3.4 5.6 2.4 virginica 1
24: 5.8 2.7 5.1 1.9 virginica 1
25: 6.2 3.4 5.4 2.3 virginica 1
26: 4.9 3.0 1.4 0.2 setosa 2
27: 5.0 3.4 1.5 0.2 setosa 2
28: 4.3 3.0 1.1 0.1 setosa 2
29: 5.1 3.8 1.5 0.3 setosa 2
30: 5.0 3.0 1.6 0.2 setosa 2
31: 5.4 3.4 1.5 0.4 setosa 2
32: 4.9 3.6 1.4 0.1 setosa 2
33: 5.0 3.5 1.6 0.6 setosa 2
34: 5.0 3.3 1.4 0.2 setosa 2
35: 5.7 2.8 4.5 1.3 versicolor 2
36: 5.9 3.0 4.2 1.5 versicolor 2
37: 5.8 2.7 4.1 1.0 versicolor 2
38: 6.1 2.8 4.7 1.2 versicolor 2
39: 5.7 2.6 3.5 1.0 versicolor 2
40: 6.0 3.4 4.5 1.6 versicolor 2
41: 6.1 3.0 4.6 1.4 versicolor 2
42: 6.2 2.9 4.3 1.3 versicolor 2
43: 6.3 2.9 5.6 1.8 virginica 2
44: 7.2 3.6 6.1 2.5 virginica 2
45: 6.4 3.2 5.3 2.3 virginica 2
46: 5.6 2.8 4.9 2.0 virginica 2
47: 6.1 3.0 4.9 1.8 virginica 2
48: 6.3 2.8 5.1 1.5 virginica 2
49: 6.9 3.1 5.4 2.1 virginica 2
50: 6.7 3.0 5.2 2.3 virginica 2
sepal.length sepal.width petal.length petal.width species v2
sepal.length sepal.width petal.length petal.width species v2
1: 4.7 3.2 1.3 0.2 setosa 1
2: 4.4 2.9 1.4 0.2 setosa 1
3: 5.8 4.0 1.2 0.2 setosa 1
4: 5.4 3.4 1.7 0.2 setosa 1
5: 5.0 3.4 1.6 0.4 setosa 1
6: 5.2 4.1 1.5 0.1 setosa 1
7: 4.4 3.0 1.3 0.2 setosa 1
8: 5.1 3.8 1.9 0.4 setosa 1
9: 7.0 3.2 4.7 1.4 versicolor 1
10: 6.3 3.3 4.7 1.6 versicolor 1
11: 6.0 2.2 4.0 1.0 versicolor 1
12: 6.2 2.2 4.5 1.5 versicolor 1
13: 6.4 2.9 4.3 1.3 versicolor 1
14: 5.5 2.4 3.8 1.1 versicolor 1
15: 6.7 3.1 4.7 1.5 versicolor 1
16: 5.8 2.6 4.0 1.2 versicolor 1
17: 5.1 2.5 3.0 1.1 versicolor 1
18: 6.5 3.0 5.8 2.2 virginica 1
19: 6.5 3.2 5.1 2.0 virginica 1
20: 6.5 3.0 5.5 1.8 virginica 1
21: 7.7 2.8 6.7 2.0 virginica 1
22: 6.4 2.8 5.6 2.1 virginica 1
23: 6.1 2.6 5.6 1.4 virginica 1
24: 6.7 3.1 5.6 2.4 virginica 1
25: 6.3 2.5 5.0 1.9 virginica 1
26: 5.4 3.9 1.7 0.4 setosa 2
27: 4.8 3.4 1.6 0.2 setosa 2
28: 5.1 3.5 1.4 0.3 setosa 2
29: 5.1 3.3 1.7 0.5 setosa 2
30: 4.7 3.2 1.6 0.2 setosa 2
31: 5.0 3.2 1.2 0.2 setosa 2
32: 4.5 2.3 1.3 0.3 setosa 2
33: 4.6 3.2 1.4 0.2 setosa 2
34: 5.5 2.3 4.0 1.3 versicolor 2
35: 5.2 2.7 3.9 1.4 versicolor 2
36: 6.7 3.1 4.4 1.4 versicolor 2
37: 6.1 2.8 4.0 1.3 versicolor 2
38: 6.7 3.0 5.0 1.7 versicolor 2
39: 6.0 2.7 5.1 1.6 versicolor 2
40: 5.5 2.5 4.0 1.3 versicolor 2
41: 5.7 3.0 4.2 1.2 versicolor 2
42: 5.8 2.7 5.1 1.9 virginica 2
43: 7.3 2.9 6.3 1.8 virginica 2
44: 5.7 2.5 5.0 2.0 virginica 2
45: 6.0 2.2 5.0 1.5 virginica 2
46: 7.2 3.2 6.0 1.8 virginica 2
47: 7.9 3.8 6.4 2.0 virginica 2
48: 6.4 3.1 5.5 1.8 virginica 2
49: 6.8 3.2 5.9 2.3 virginica 2
50: 5.9 3.0 5.1 1.8 virginica 2
sepal.length sepal.width petal.length petal.width species v2
Empty data.table (0 rows) of 1 col: v1
> DT[,.SD,by=v1][]
v1 sepal.length sepal.width petal.length petal.width species v2
1: A 5.1 3.5 1.4 0.2 setosa 1
2: A 4.6 3.4 1.4 0.3 setosa 1
3: A 4.8 3.0 1.4 0.1 setosa 1
4: A 5.7 3.8 1.7 0.3 setosa 1
5: A 4.8 3.4 1.9 0.2 setosa 1
---
146: C 7.2 3.2 6.0 1.8 virginica 2
147: C 7.9 3.8 6.4 2.0 virginica 2
148: C 6.4 3.1 5.5 1.8 virginica 2
149: C 6.8 3.2 5.9 2.3 virginica 2
150: C 5.9 3.0 5.1 1.8 virginica 2
- Returning the first and last rows of each v1 group
> DT[,.SD[c(1,.N)],by=v1]
v1 sepal.length sepal.width petal.length petal.width species v2
1: A 5.1 3.5 1.4 0.2 setosa 1
2: A 6.5 3.0 5.2 2.0 virginica 2
3: B 5.0 3.6 1.4 0.2 setosa 1
4: B 6.7 3.0 5.2 2.3 virginica 2
5: C 4.7 3.2 1.3 0.2 setosa 1
6: C 5.9 3.0 5.1 1.8 virginica 2
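Because `.SD` is an ordinary data.table, any row index works inside it, not just `c(1, .N)`. The sketch below (hypothetical table) also picks the row with the largest value in each group.

```r
library(data.table)

dt <- data.table(g = c("a", "a", "a", "b", "b"), x = 1:5)   # hypothetical toy table

ends <- dt[, .SD[c(1, .N)], by = g]        # first and last row of each group
top  <- dt[, .SD[which.max(x)], by = g]    # row with the largest x in each group
```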
- Summing all other columns within each v1 and species group
> DT[,lapply(.SD,sum),by=c("v1","species")]
v1 species sepal.length sepal.width petal.length petal.width v2
1: A setosa 85.9 59.0 25.3 3.9 25
2: A versicolor 98.1 45.5 71.4 22.0 26
3: A virginica 108.1 47.7 89.1 33.1 24
4: B setosa 85.2 58.6 24.0 4.2 26
5: B versicolor 96.3 46.5 69.3 21.4 24
6: B virginica 109.6 51.4 93.3 35.5 25
7: C setosa 79.2 53.8 23.8 4.2 24
8: C versicolor 102.4 46.5 72.3 22.9 25
9: C virginica 111.7 49.6 95.2 32.7 26
- .SDcols
Usually used together with .SD to choose which columns .SD contains
> DT[,.SD,by=v1,.SDcols=c("species","sepal.length")]
v1 species sepal.length
1: A setosa 5.1
2: A setosa 4.6
3: A setosa 4.8
4: A setosa 5.7
5: A setosa 4.8
---
146: C virginica 7.2
147: C virginica 7.9
148: C virginica 6.4
149: C virginica 6.8
150: C virginica 5.9
> DT[,.(species,sepal.length),by=v1] # equivalent to the call above
v1 species sepal.length
1: A setosa 5.1
2: A setosa 4.6
3: A setosa 4.8
4: A setosa 5.7
5: A setosa 4.8
---
146: C virginica 7.2
147: C virginica 7.9
148: C virginica 6.4
149: C virginica 6.8
150: C virginica 5.9
# .SDcols can also be the return value of a function:
> DT[,lapply(.SD,sum),by=v1,.SDcols=paste0("v",2)]
v1 v2
1: A 75
2: B 75
3: C 75
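A typical use of `.SDcols` is applying one function to a chosen set of columns per group while skipping the rest, such as non-numeric columns. A sketch with a hypothetical table and column list:

```r
library(data.table)

dt <- data.table(g = c("a", "a", "b"),
                 x = c(1, 2, 3),
                 y = c(10, 20, 30),
                 z = c("p", "q", "r"))   # hypothetical; z is character and is left out

num_cols <- c("x", "y")
# mean() is applied to each column listed in .SDcols, separately for every group
res <- dt[, lapply(.SD, mean), by = g, .SDcols = num_cols]
```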
- Chaining, which feels a bit like piping with %>%
Without chaining:
> DT2 <- copy(DT)
> DT2 <- DT2[,.(SUM=sum(sepal.length)),by=v1]
> DT2[SUM>291.5]
v1 SUM
1: A 292.1
2: C 293.3
> ## with chaining
> DT2 <- copy(DT)
> DT2[,.(SUM=sum(sepal.length)),by=v1][SUM>291.5] # after grouping, this works much like SQL's HAVING
v1 SUM
1: A 292.1
2: C 293.3
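Chains can be longer than two steps; each `[ ]` operates on the result of the previous one. Here is a hypothetical table that is grouped, filtered and then sorted in descending order:

```r
library(data.table)

dt <- data.table(g = c("a", "a", "b", "c", "c"),
                 x = c(1, 2, 10, 4, 4))   # hypothetical toy table

# group and sum, keep totals above 5, then sort by total, largest first
res <- dt[, .(total = sum(x)), by = g][total > 5][order(-total)]
```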
7. melt and dcast in data.table
Usage is much the same as in the reshape2 package; for reference, see
Using the reshape2 package to unpivot and pivot data
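For a quick impression, data.table provides its own melt() and dcast() methods for data.table objects. This round trip on a hypothetical wide table turns two value columns into long form and back; the column names `id`, `x`, `y`, `var` and `val` are illustrative.

```r
library(data.table)

wide <- data.table(id = 1:2, x = c(10, 20), y = c(30, 40))   # hypothetical wide table

# Wide -> long: one row per (id, variable) pair
long <- melt(wide, id.vars = "id", measure.vars = c("x", "y"),
             variable.name = "var", value.name = "val")

# Long -> wide again: one column per value of var
back <- dcast(long, id ~ var, value.var = "val")
```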