1 sapply
> mydata <- read.csv("exp.csv", header = TRUE, sep = ",") # expression matrix; the first row is the header
> class(mydata)
[1] "data.frame"
> sapply(mydata, mean, na.rm = TRUE) # mean of each column of the data frame
gene_id R1 R2 R3 G1 G2 G3
NA 7.29500 6.24375 19.65438 124.26750 55.18437 232.05312
`Warning message:`
` In mean.default(X[[i]], ...) : argument is not numeric or logical: returning NA`
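sapply also works with sd, var, min, max, median, range, and quantile in place of mean, but the type of the result differs: functions that return one value per column give a named vector, while range and quantile return several values per column, so the result is a matrix.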
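The warning above comes from the non-numeric gene_id column. A minimal sketch of one way to avoid it is to restrict sapply to the numeric columns first. The toy data frame below is a hypothetical stand-in for exp.csv, not the original data.

```r
# Hypothetical toy data standing in for exp.csv (values are made up)
mydata <- data.frame(
  gene_id = c("g1", "g2", "g3", "g4"),
  R1 = c(1, 2, 3, NA),
  G1 = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Logical vector marking which columns are numeric
num_cols <- sapply(mydata, is.numeric)

# One value per column: the result is a named numeric vector
col_means <- sapply(mydata[num_cols], mean, na.rm = TRUE)

# Two values per column: the result is a matrix with one column per variable
col_ranges <- sapply(mydata[num_cols], range, na.rm = TRUE)

print(col_means)
print(col_ranges)
```

Subsetting with `mydata[num_cols]` keeps the result a data frame, so sapply still iterates over columns.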
2 summary
> summary(mydata)
# returns min, 1st quartile, median, mean, 3rd quartile, and max for each numeric column
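As a small illustration of what summary reports, the sketch below applies it to a single numeric vector. The values are made up for the example.

```r
# Made-up numeric vector with one large outlier
v <- c(1, 2, 3, 4, 100)

# summary() on a numeric vector returns six named statistics
s <- summary(v)
print(s)

# Individual statistics can be pulled out by name
s[["Median"]]
s[["Mean"]]
```

The outlier pulls the mean well above the median, which is the kind of skew the six-number summary makes visible at a glance.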
3 fivenum()
> fivenum(x, na.rm = TRUE)
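x is a numeric vector; it may contain NA as well as Inf and -Inf.
na.rm = TRUE is the default and removes NA and NaN, but Inf values are kept.
fivenum() returns five values: Tukey's minimum, lower hinge, median, upper hinge, and maximum. Note that the two hinges are not the same as the quartiles.
For example, with 1:10: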
> fivenum(1:10)
[1] 1.0 3.0 5.5 8.0 10.0
fivenum averages the two middle values, 5 and 6, to get the median 5.5, then takes the median of the lower half (1, 2, 3, 4, 5) to get 3, which is the lower hinge.
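To make the difference between hinges and quartiles concrete, here is a short sketch that compares fivenum with quantile on the same vector, and also shows that Inf survives the default na.rm = TRUE. The second input vector is made up for illustration.

```r
x <- 1:10

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
hinges <- fivenum(x)

# Quartiles from quantile() with its default interpolation (type 7)
quartiles <- quantile(x, probs = c(0.25, 0.75))

print(hinges)
print(quartiles)

# NA is dropped by default, but Inf stays in the summary
with_inf <- fivenum(c(1, NA, Inf, 3))
print(with_inf)
```

For 1:10 the hinges are 3 and 8, while the default quartiles are 3.25 and 7.75.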
4 Using the Hmisc package
> library(Hmisc)
> describe(mydata)
# n, nmiss, unique, mean, 5,10,25,50,75,90,95th percentiles
# 5 lowest and 5 highest scores
5 Using the pastecs package
> library(pastecs)
> stat.desc(mydata)
# nbr.val, nbr.null, nbr.na, min, max, range, sum,
# median, mean, SE.mean, CI.mean, var, std.dev, coef.var
6 Using the psych package
> library(psych)
> describe(mydata)
# item name ,item number, nvalid, mean, sd,
# median, mad, min, max, skew, kurtosis, se
Summary statistics by a grouping variable
> library(psych)
> describeBy(x, group = NULL, mat = FALSE, type = 3, ...) # describe.by is the older, deprecated name
7 Using the doBy package
> library(doBy)
> summaryBy(mpg + wt ~ cyl + vs, data = mtcars,
+ FUN = function(x) { c(m = mean(x), s = sd(x)) } )
# produces mpg.m wt.m mpg.s wt.s for each
# combination of the levels of cyl and vs
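If doBy is not installed, a similar grouped summary can be sketched in base R with aggregate. This is a swapped-in alternative, not what summaryBy does internally, and the output layout differs: each of mpg and wt becomes a matrix column with sub-columns m and s.

```r
# Base-R sketch of a grouped mean/sd summary on the built-in mtcars data
grouped <- aggregate(
  cbind(mpg, wt) ~ cyl + vs,
  data = mtcars,
  FUN = function(x) c(m = mean(x), s = sd(x))
)

print(grouped)

# Access the per-group means of mpg from the matrix column
grouped$mpg[, "m"]
```

Groups that contain a single car get NA for the standard deviation, since sd needs at least two values.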