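I had never studied R systematically before. On May 26 I attended the 生信技能樹 R training course in Xi'an, and these are the homework exercises that Jimmy assigned. I am a complete beginner starting from less than zero: I learn, forget, and learn again. I keep plugging away, understanding a little at a time and consolidating the basics. Putting together this more detailed version took a fair amount of time, and I hope it helps other complete beginners like me:
The code before "#run" is what I worked out myself, with reference to the textbook R in Action (Chinese edition, 《R语言实战》) and the reference answers provided by 生信技能樹. Everything after "#run" is the output of running it.
Anything after "#" is ignored by R; it is a comment.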
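1. Open RStudio and tell me its working directory
By default, a freshly installed RStudio starts in the user's home directory. I use a Mac, so my default working directory is "/Users/chenjiangshu". On Windows it is normally the user's Documents folder rather than the installation path. The default can be changed under Tools > Global Options > General.
- Create 6 vectors based on different atomic types (focus on character, numeric, and logical)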
a1 <- c("good morning") # character (string) vector
a1
a2 <- c(1,5,8,16,21,25) # numeric vector
a2
a3 <- c("a","b","c","d","e","f") # character vector
a3
a4 <- c("a","b","c",1,2,3) # numbers mixed with strings: everything is coerced, so class(a4) is "character"
a4
a5 <- c(T,F,T,T,F,F) # logical vector
a5
a6 <- c(1+0i) # complex vector
a6
#run:
> a1 <- c("good morning") # character (string) vector
> a1
[1] "good morning"
> a2 <- c(1,5,8,16,21,25) # numeric vector
> a2
[1]  1  5  8 16 21 25
> a3 <- c("a","b","c","d","e","f") # character vector
> a3
[1] "a" "b" "c" "d" "e" "f"
> a4 <- c("a","b","c",1,2,3) # numbers mixed with strings: everything is coerced, so class(a4) is "character"
> a4
[1] "a" "b" "c" "1" "2" "3"
> a5 <- c(T,F,T,T,F,F) # logical vector
> a5
[1]  TRUE FALSE  TRUE  TRUE FALSE FALSE
> a6 <- c(1+0i) # complex vector
> a6
[1] 1+0i
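Three of the six vectors above (a1, a3, a4) share the same atomic type. As a supplementary sketch, the block below builds one vector for each of R's six atomic types and checks them with typeof(). The v_* names are illustrative only and are not part of the original homework.

```r
# One vector per atomic type; the v_* names are made up for this example.
v_character <- c("good", "morning")
v_double    <- c(1, 5, 8)              # plain numbers are stored as double
v_integer   <- c(1L, 5L, 8L)           # the L suffix forces integer storage
v_logical   <- c(TRUE, FALSE, TRUE)
v_complex   <- c(1+0i, 2-3i)
v_raw       <- as.raw(c(1, 255))       # raw bytes

# typeof() reports the underlying storage type of each vector
types <- sapply(
  list(v_character, v_double, v_integer, v_logical, v_complex, v_raw),
  typeof
)
print(types)

# Mixing types inside c() coerces every element to the most general type
mixed <- c("a", 1, TRUE)
print(class(mixed))
```

Writing TRUE and FALSE in full is usually safer than T and F, because T and F are ordinary variables that can be overwritten.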
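- Tell me what getwd() returns when you run it in the RStudio session you opened.
Running getwd() in RStudio returns the current working directory; in other words, it is how you check which directory R is currently working in.
[Screenshot: R 第3題.png]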
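A minimal sketch of checking and changing the working directory. It uses tempdir() as a stand-in target so that it runs on any machine; in practice you would pass the path to your own project folder.

```r
old_dir <- getwd()        # remember the current working directory
tmp_dir <- tempdir()      # a scratch directory that exists in every R session

setwd(tmp_dir)            # change the working directory
new_dir <- getwd()        # confirm where we are now
print(new_dir)

setwd(old_dir)            # restore the original working directory
```

Relative file paths such as read.csv("sample.csv") are resolved against whatever getwd() returns, so it is worth checking before reading files.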
- Create some data structures, such as a matrix, an array, a data frame, and a list (focus on data frames and matrices)
4.1 Create a matrix with 5 rows and 4 columns using matrix()
y <- matrix(1:20,nrow=5,ncol=4)
y
#run
> y <- matrix(1:20,nrow=5,ncol=4)
> y
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
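The output above shows that matrix() fills column by column by default. The sketch below contrasts that with byrow = TRUE and adds row and column names; the labels r1..r5 and c1..c4 are made up for illustration.

```r
# Default: values fill down each column in turn
y_col <- matrix(1:20, nrow = 5, ncol = 4)

# byrow = TRUE fills across each row instead; dimnames labels rows and columns
y_row <- matrix(1:20, nrow = 5, ncol = 4, byrow = TRUE,
                dimnames = list(paste0("r", 1:5), paste0("c", 1:4)))

print(y_col[2, 3])        # row 2, column 3 of the column-filled matrix
print(y_row[2, 3])        # same position in the row-filled matrix
print(y_row["r2", "c3"])  # the same element, addressed by name
print(dim(y_row))         # number of rows and columns
```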
- 4.2 Create an array using array()
dim1 <- c("A1","A2")
dim2 <- c("B1","B2","B3")
dim3 <- c("C1","C2","C3","C4")
z <- array(1:24,c(2,3,4),dimnames = list(dim1,dim2,dim3))
z
#run
> dim1 <- c("A1","A2")
> dim2 <- c("B1","B2","B3")
> dim3 <- c("C1","C2","C3","C4")
> z <- array(1:24,c(2,3,4),dimnames = list(dim1,dim2,dim3))
> z
, , C1

   B1 B2 B3
A1  1  3  5
A2  2  4  6

, , C2

   B1 B2 B3
A1  7  9 11
A2  8 10 12

, , C3

   B1 B2 B3
A1 13 15 17
A2 14 16 18

, , C4

   B1 B2 B3
A1 19 21 23
A2 20 22 24

An array is a natural generalization of a matrix to more than two dimensions.
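As a short follow-up sketch, elements of the same array can be picked out by position or by dimension name, and fixing one index of a 3-dimensional array leaves an ordinary matrix.

```r
z <- array(1:24, c(2, 3, 4),
           dimnames = list(c("A1", "A2"),
                           c("B1", "B2", "B3"),
                           c("C1", "C2", "C3", "C4")))

print(dim(z))                 # the three dimension sizes
print(z[1, 2, 3])             # by position: A1, B2, C3
print(z["A2", "B3", "C4"])    # by name

slice <- z[, , "C2"]          # fix the third index to get a 2 x 3 matrix
print(is.matrix(slice))
```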
- 4.3 Create a data frame using data.frame()
patientID <- c(1,2,3,4)
age <- c(25,34,28,52)
diabetes <- c("Type1","Type1","Type2","Type1")
status <- c("Poor","Improved","Excellent","Poor")
patientdata <- data.frame(patientID,age,diabetes,status)
patientdata
#run
> patientID <- c(1,2,3,4)
> age <- c(25,34,28,52)
> diabetes <- c("Type1","Type1","Type2","Type1")
> status <- c("Poor","Improved","Excellent","Poor")
> patientdata <- data.frame(patientID,age,diabetes,status)
> patientdata
  patientID age diabetes    status
1         1  25    Type1      Poor
2         2  34    Type1  Improved
3         3  28    Type2 Excellent
4         4  52    Type1      Poor
Different columns of a data frame can hold different types of data, for example numeric in one column and character in another.
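To see the type of each column directly, one option is sapply() with class(), sketched below. stringsAsFactors = FALSE is set explicitly here because R versions before 4.0 convert strings to factors by default; that argument is an addition for reproducibility and is not in the code above.

```r
patientdata <- data.frame(
  patientID = c(1, 2, 3, 4),
  age       = c(25, 34, 28, 52),
  diabetes  = c("Type1", "Type1", "Type2", "Type1"),
  status    = c("Poor", "Improved", "Excellent", "Poor"),
  stringsAsFactors = FALSE   # keep strings as character on every R version
)

# Apply class() to each column; the result is a named character vector
col_types <- sapply(patientdata, class)
print(col_types)

# str() gives a compact overview: dimensions, column types, first values
str(patientdata)
```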
- 4.4 Create a list using list()
g <- "My First List"
h <- c(25,26,18,39)
j <- matrix(1:10,nrow = 5)
k <- c("one","two","three")
mylist <- list(title=g,ages=h,j,k)
mylist
#run
> g <- "My First List"
> h <- c(25,26,18,39)
> j <- matrix(1:10,nrow = 5)
> k <- c("one","two","three")
> mylist <- list(title=g,ages=h,j,k)
> mylist
$title
[1] "My First List"

$ages
[1] 25 26 18 39

[[3]]
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

[[4]]
[1] "one"   "two"   "three"

A list can contain vectors, matrices, data frames, and even other lists.
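The printout shows that named components appear as $name and unnamed ones as [[position]]. The sketch below shows the matching ways to pull components back out of the same list.

```r
mylist <- list(title = "My First List",
               ages  = c(25, 26, 18, 39),
               matrix(1:10, nrow = 5),
               c("one", "two", "three"))

print(mylist$title)          # named component via $
print(mylist[["ages"]])      # named component via [[ ]]
print(mylist[[3]][2, 2])     # unnamed component by position, then matrix indexing

sub <- mylist["ages"]        # single brackets return a list, not the vector itself
print(class(sub))
print(names(mylist))         # unnamed components have empty names
```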
- Slice the data frame you created: for example, first take rows 1 and 3, then take columns 4 and 6
The data frame needs at least 3 rows and 6 columns for this, so first create one with 4 rows and 6 columns:
patientID <- c(1,2,3,4)
age <- c(25,34,28,52)
diabetes <- c("Type1","Type1","Type2","Type1")
status <- c("Poor","Improved","Excellent","Poor")
gender <- c("male","female","female","male")
income <- c("8k","12k","4.5k","7k")
patientdata <- data.frame(patientID,age,diabetes,status,gender,income)
patientdata
patientdata[c(1,3),]
patientdata[,c(4,6)]
patientdata[c(1,3),c(4,6)]
#run
> patientID <- c(1,2,3,4)
> age <- c(25,34,28,52)
> diabetes <- c("Type1","Type1","Type2","Type1")
> status <- c("Poor","Improved","Excellent","Poor")
> gender <- c("male","female","female","male")
> income <- c("8k","12k","4.5k","7k")
> patientdata <- data.frame(patientID,age,diabetes,status,gender,income)
> patientdata
  patientID age diabetes    status gender income
1         1  25    Type1      Poor   male     8k
2         2  34    Type1  Improved female    12k
3         3  28    Type2 Excellent female   4.5k
4         4  52    Type1      Poor   male     7k
> patientdata[c(1,3),]
  patientID age diabetes    status gender income
1         1  25    Type1      Poor   male     8k
3         3  28    Type2 Excellent female   4.5k
> patientdata[,c(4,6)]
     status income
1      Poor     8k
2  Improved    12k
3 Excellent   4.5k
4      Poor     7k
> patientdata[c(1,3),c(4,6)]
     status income
1      Poor     8k
3 Excellent   4.5k
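Besides numeric positions, rows and columns can also be selected by column name or by a logical condition. The following is a supplementary sketch on the same data; the age threshold of 30 is an arbitrary example value.

```r
patientdata <- data.frame(
  patientID = c(1, 2, 3, 4),
  age       = c(25, 34, 28, 52),
  diabetes  = c("Type1", "Type1", "Type2", "Type1"),
  status    = c("Poor", "Improved", "Excellent", "Poor"),
  gender    = c("male", "female", "female", "male"),
  income    = c("8k", "12k", "4.5k", "7k"),
  stringsAsFactors = FALSE
)

# Columns by name instead of position: same result as patientdata[c(1,3), c(4,6)]
by_name <- patientdata[c(1, 3), c("status", "income")]
print(by_name)

# Rows by logical condition: keep patients older than 30
older <- patientdata[patientdata$age > 30, ]
print(older)

# Selecting a single column drops to a vector unless drop = FALSE is given
age_vec <- patientdata[, "age"]
age_df  <- patientdata[, "age", drop = FALSE]
```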
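- Use the data function to load the built-in R dataset rivers and describe it. More of R's built-in datasets are listed here: https://mp.weixin.qq.com/s/dZPbCXccTzuj0KkOL7R31g
data() # with no arguments, lists the available built-in datasets
rivers # lengths of the major North American rivers
head(rivers)
tail(rivers)
length(rivers) # number of elements in rivers
str(rivers) # structure of rivers
summary(rivers) # descriptive statistics (min, max, quartiles and mean for numeric data; counts for factors and logical vectors)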
#run
> head(rivers)
[1] 735 320 325 392 524 450
> tail(rivers)
[1]  500  720  270  430  671 1770
> length(rivers) # number of elements in rivers
[1] 141
> str(rivers) # structure of rivers
 num [1:141] 735 320 325 392 524 ...
> summary(rivers) # descriptive statistics (min, max, quartiles and mean for numeric data; counts for factors and logical vectors)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  135.0   310.0   425.0   591.2   680.0  3710.0 
By default, head() and tail() return the first 6 and the last 6 items (elements of a vector, or rows of a data frame).
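- Download the RunInfo Table file from https://www.ncbi.nlm.nih.gov/sra?term=SRP133642 and read it into R. Explore this data frame: how many columns it has and what type of elements each column holds. (See the 生信小技巧 videos on Bilibili for how to get the RunInfo Table.) This is the data for a single-cell transcriptome project with 768 cells in total. If you cannot find the RunInfo Table file, you can download a copy and read that into R instead.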
SraRunTable <- read.table("http://www.bio-info-trainee.com/tmp/5years/SraRunTable.txt",fill=TRUE,header = T,sep = "\t")
dim(SraRunTable)
class(colnames(SraRunTable))
#run
> SraRunTable <- read.table("http://www.bio-info-trainee.com/tmp/5years/SraRunTable.txt",fill=TRUE,header = T,sep = "\t")
> dim(SraRunTable)
[1] 768 31
> class(colnames(SraRunTable))
[1] "character"
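The table has 768 rows and 31 columns. Note that class(colnames(SraRunTable)) only reports the type of the column names, which is always character. To see the type of each column itself, use str(SraRunTable) or sapply(SraRunTable, class).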
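A minimal sketch of checking per-column types after read.table(). It reads a tiny made-up tab-separated table from a string so that it runs without a network connection; the run IDs and values are hypothetical and are not taken from SraRunTable.txt.

```r
# Hypothetical three-column table standing in for the real RunInfo file
txt <- "Run\tMBases\tOrganism\nSRR001\t12\tMus musculus\nSRR002\t15\tMus musculus\n"

tab <- read.table(text = txt, header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE)

print(dim(tab))                 # rows and columns
print(class(colnames(tab)))     # type of the names: always "character"

col_types <- sapply(tab, class) # type of each column
print(col_types)
print(table(col_types))         # how many columns of each type
```

The same two calls, sapply(SraRunTable, class) and table(), can be applied to the real table once it has been read in.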
- Download the sample information file sample.csv from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111229 and read it into R. Explore this data frame: how many columns it has and what type of elements each column holds. (See https://mp.weixin.qq.com/s/fbHMNXOdwiQX5BAlci8brA for how to get sample.csv.) If you really cannot find sample.csv, you can download a copy instead.
Download link: https://links.jianshu.com/go?to=http%3A%2F%2Fwww.bio-info-trainee.com%2Ftmp%2F5years%2Fsample.csv
sample <-read.csv("sample.csv")
colnames(sample)
#run
> sample <-read.csv("sample.csv")
> colnames(sample)
[1] "Accession"           "Title"               "Sample.Type"         "Taxonomy"
[5] "Channels"            "Platform"            "Series"              "Supplementary.Types"
[9] "Supplementary.Links" "SRA.Accession"       "Contact"             "Release.Date"
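Several of the names above contain dots (Sample.Type, Release.Date). By default read.csv() passes headers through make.names(), which replaces spaces and other invalid characters with dots. The sketch below shows the effect on a made-up header; it is an assumption that the original sample.csv headers contain spaces, and the row values are hypothetical.

```r
# Hypothetical CSV text whose headers contain spaces
csv_text <- 'Accession,Sample Type,Release Date\nGSM0000001,SRA,"Mar 01, 2018"\n'

# Default behaviour: check.names = TRUE turns spaces into dots
default_names <- colnames(read.csv(text = csv_text))
print(default_names)

# check.names = FALSE keeps the headers exactly as written in the file
raw_names <- colnames(read.csv(text = csv_text, check.names = FALSE))
print(raw_names)
```

Names with spaces have to be quoted with backticks when used after $, which is why the default conversion is usually convenient.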
- Link the two tables from the previous two steps (the RunInfo Table file and the sample information sample.csv) using the merge function.
SraRunTable <- read.table("http://www.bio-info-trainee.com/tmp/5years/SraRunTable.txt",fill=TRUE,header = T,sep = "\t")
sample <-read.csv("sample.csv")
m <- merge(SraRunTable, sample, by.x = 'Sample_Name', by.y = 'Accession')
str(m)
#run
> str(m)
'data.frame': 768 obs. of 42 variables
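After merging there are 768 observations and 42 variables: 31 columns from the RunInfo table plus 12 from sample.csv, minus 1 because the key column is kept only once.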
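The same join can be sketched on two tiny made-up tables to show how by.x and by.y pair up key columns that have different names. All IDs and titles below are hypothetical.

```r
# Hypothetical stand-ins for the RunInfo table and sample.csv
runs <- data.frame(Run         = c("SRR1", "SRR2", "SRR3"),
                   Sample_Name = c("GSM1", "GSM2", "GSM3"),
                   stringsAsFactors = FALSE)
samples <- data.frame(Accession = c("GSM3", "GSM1", "GSM2"),
                      Title     = c("cell_c", "cell_a", "cell_b"),
                      stringsAsFactors = FALSE)

# Match runs$Sample_Name against samples$Accession; row order does not matter
m <- merge(runs, samples, by.x = "Sample_Name", by.y = "Accession")
print(m)

# 2 + 2 columns, minus 1 for the shared key
print(dim(m))
```

By default merge() keeps only rows whose key appears in both tables; all.x = TRUE or all.y = TRUE keeps unmatched rows from one side and fills the gaps with NA.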
Course resources
生信技能樹 worldwide free lecture tour
(https://mp.weixin.qq.com/s/E9ykuIbc-2Ja9HOY0bn_6g)
Free 74-hour bioinformatics engineer video course collection on Bilibili
(https://mp.weixin.qq.com/s/IyFK7l_WBAiUgqQi8O7Hxw)