RNA-seq workshop-Day 1


This week's Data Workshop has begun. This time it focuses on analysing RNA-seq and scRNA-seq data with R. Today mainly reviewed the R introduction, went over some commands that will be needed later, and then gave an overview of the RNA-seq workflow.

1. Introduction to R (Dr. Rocio T Martinez-Nunez)

1.1 Objects

Use <- to assign a value to a named object (vectors, tables, single values, functions).
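
As a minimal sketch of assignment, the snippet below creates one object of each kind. All object names and values are made up for illustration and do not come from the workshop material.

```r
# Hypothetical objects, named only for this example
n_samples <- 6                                   # a single value
read_lengths <- c(75, 100, 150)                  # a numeric vector
samples <- data.frame(id = c("s1", "s2"),        # a table (data frame)
                      condition = c("control", "treated"))
square <- function(x) x^2                        # a function

square(read_lengths)  # apply the function to the whole vector
```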

1.2 Commenting your code

Add a # before anything you want to turn into a comment; R ignores the rest of that line.

1.3 system(): communicates with the shell in your computer

system("ls -F /")

1.4 cmd as a group of commands

cmd <- paste("gunzip -c", fastq.files, "| head")
cmd  # print the commands to inspect them (this does not run them)
system(cmd[1]) # Run the first command of cmd
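
The notes do not show how fastq.files was created, so the sketch below uses two invented file names to show why cmd ends up holding one command per file.

```r
# Hypothetical file names, invented for this example
fastq.files <- c("sample1.fastq.gz", "sample2.fastq.gz")

# paste() is vectorized: the fixed strings are recycled across every file name
cmd <- paste("gunzip -c", fastq.files, "| head")

length(cmd)  # one command per file
cmd[1]       # "gunzip -c sample1.fastq.gz | head"
```

Because cmd is an ordinary character vector, cmd[1] picks out a single command, which is what system(cmd[1]) then hands to the shell.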

1.5 Some R tips

1.5.1 ask for help

# In R: ? followed by the function name
?system
# In the shell: the -h flag
system("trim_galore -h")

1.5.2 Tab: autocompletes what you are typing and lists the matching names in R.

1.5.3 Arrow keys: the up arrow recalls the last command you typed.

1.5.4 Pipes: %>% in R or | in the shell

install.packages("tidyverse")  # install the package (only needed once)
library("tidyverse")  # load the package
download.file("website", "path and name.csv")  # download a file
surveys <- read_csv("path and name.csv")  # read the file into a data frame
str(surveys)  # inspect the data: an overview of an object's structure and its elements
dim(surveys)  # size: number of rows and number of columns
head(surveys)  # show the top (first six rows) of the data frame
surveys_new <- surveys %>%  # pipe surveys into the next step
  filter(weight < 5) %>%  # keep the rows where weight is below 5
  select(species_id, sex, weight)  # keep only these columns
str(surveys_new)  # inspect the data: an overview of an object's structure and its elements
dim(surveys_new)  # size: number of rows and number of columns
head(surveys_new)  # show the top (first six rows) of the data frame

%>% only works once the tidyverse is installed and loaded.
%>% shortcut in RStudio on a PC: Ctrl + Shift + M
%>% means "then": the thing we want to pipe goes on the left, and the function we want to pipe it into goes on the right.
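
To make the "then" reading concrete, here is a base R sketch of the same two steps done one at a time, without the pipe. The small surveys table is invented for illustration, and the base R indexing only approximates filter() and select() (for example, filter() also drops rows where the condition is NA).

```r
# Hypothetical stand-in for the surveys table (values invented for illustration)
surveys <- data.frame(
  species_id = c("NL", "PF", "PF", "DM"),
  sex        = c("M", "F", "M", "F"),
  weight     = c(40, 4, 7, 3),
  year       = c(1977, 1978, 1978, 1979)
)

# Step 1: keep the rows where weight < 5 (the role of filter() in the pipeline)
light <- surveys[surveys$weight < 5, ]

# Step 2: then keep three columns (the role of select() in the pipeline)
surveys_new <- light[, c("species_id", "sex", "weight")]

dim(surveys_new)  # 2 rows, 3 columns
```

The pipe version avoids the intermediate object light: each step receives the result of the previous one directly.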

1.6 Some R functions we will be using:

# create the commands in cmd: trim_galore and its flags, followed by the file each command applies to
cmd <- paste("trim_galore --length 21 --output_dir trimgalore", fastq.files)
# run only the first of the commands
system(cmd[1])
# create a vector with the squares of 1, 2 and 3:
sapply(1:3, function(x) x^2)
# [1] 1 4 9

system(): communicates with the shell.
dir.create(): create directories.
list.files(): list the files in your working directory.
paste(): concatenates vectors after converting into character.
data.frame(): generates a data frame.
sapply(): applies a function to an object and returns a simplified object.
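
The functions listed above can be combined in a few lines. The following is only a sketch: the directory and file names are hypothetical, and everything happens inside R's temporary directory so no real project files are touched.

```r
# Work inside a temporary directory (the folder name is made up for this example)
work_dir <- file.path(tempdir(), "day1_demo")
dir.create(work_dir, showWarnings = FALSE)

# Create two empty placeholder files (hypothetical names)
invisible(file.create(file.path(work_dir, c("a.fastq", "b.fastq"))))

# list.files(): find the files we just made
fastq.files <- list.files(work_dir, pattern = "\\.fastq$")

# sapply(): apply nchar() to every file name and simplify the result to a vector
name_lengths <- sapply(fastq.files, nchar, USE.NAMES = FALSE)

# data.frame(): collect the results in a table
file_table <- data.frame(file = fastq.files, name_length = name_lengths)
file_table
```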

1.7 Loops: vectorization & sapply

for (year in c(2010, 2011, 2012, 2013, 2014, 2015)){
      print(paste("The year is", year))
}
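
As the section title suggests, the same messages can be built without an explicit loop. A minimal sketch of the two alternatives, using the same years as the loop above:

```r
years <- 2010:2015

# Vectorized: paste() works on the whole vector at once, no loop needed
messages <- paste("The year is", years)

# sapply(): apply a function to each element and collect the results
messages_sapply <- sapply(years, function(year) paste("The year is", year))

identical(messages, messages_sapply)  # TRUE
```

Unlike the for loop, which only prints, both alternatives return a character vector that can be stored and reused.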

2. Introduction to RNA-seq data analysis (Dr. Alessandra Vigilante)

2.1 What is NGS

Next-generation sequencing (NGS), also known as high-throughput sequencing, is the term used to describe a number of different modern sequencing technologies. Applications include RNA-seq, scRNA-seq and ChIP-seq, among others.

2.2 Eight stages in RNA-seq Analysis

2.2.1 Define the question of interest (what RNA-seq data can tell us)

Relative expression levels within a biological sample
Gene expression differences between biological samples
Quantify alternative transcript levels
Confirm annotated 5′ and 3′ ends of genes
Map exon/intron boundaries

2.2.2 Get the data (data formats)

Raw data: Fastq
Aligned data: SAM, BAM, CRAM
Genome annotation: GFF
Intervals: BED
Variants: VCF, BCF
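
To give a feel for the raw data format, the sketch below reads a tiny FASTQ file with base R. The two reads are invented, and the code assumes simple four-line records (no line-wrapped sequences) with the common Phred+33 quality encoding.

```r
# A made-up two-read FASTQ file, written to a temporary location
fq_path <- tempfile(fileext = ".fastq")
writeLines(c("@read1", "ACGT", "+", "IIII",
             "@read2", "GGCA", "+", "!!5I"), fq_path)

# Each record spans four lines: identifier, sequence, separator, quality string
fq_lines <- readLines(fq_path)
records <- matrix(fq_lines, ncol = 4, byrow = TRUE,
                  dimnames = list(NULL, c("id", "seq", "sep", "qual")))

# Decode each quality string into Phred scores, assuming Phred+33 encoding
phred <- lapply(records[, "qual"], function(q) utf8ToInt(q) - 33)
phred[[2]]  # 0 0 20 40
```

For real data, a dedicated package such as ShortRead is a better choice than parsing lines by hand.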

2.2.3 Clean the data (quality control)

FastQC for quality reports; Trimmomatic and cutadapt for trimming adapters and low-quality bases
The ShortRead package in R/Bioconductor, using the qa() and report() functions

2.2.4 Map the data

Challenges: high memory cost; introns; updates to reference genomes, tools and software.
Mapping strategies: de novo assembly, align to transcriptome, align to genome.
Tools: Bowtie 2, TopHat 2, STAR
Pseudo-alignment: Kallisto, which was presented as faster and more accurate than full alignment
If you have SAM files you have to transform them to BAM
You can visualise your BAM files in IGV
Use either your BAM file or the transcript abundance file (from Kallisto) to generate a Count Table
Perform differential expression analysis and downstream analyses
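
The SAM-to-BAM step can be scripted in the same paste() and system() style used earlier. This is a sketch under assumptions: the notes do not name a conversion tool, so samtools is assumed here, and the file names are hypothetical. The commands are only built, not run.

```r
# Hypothetical SAM files, invented for this example
sam.files <- c("sample1.sam", "sample2.sam")
bam.files <- sub("\\.sam$", ".bam", sam.files)
sorted.files <- sub("\\.bam$", ".sorted.bam", bam.files)

# Convert SAM to BAM, sort by coordinate, then index (IGV needs a sorted, indexed BAM)
cmd_convert <- paste("samtools view -b -o", bam.files, sam.files)
cmd_sort    <- paste("samtools sort -o", sorted.files, bam.files)
cmd_index   <- paste("samtools index", sorted.files)

cmd_convert[1]  # "samtools view -b -o sample1.bam sample1.sam"
# system(cmd_convert[1])  # uncomment to run; assumes samtools is installed and on the PATH
```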

2.2.5 Explore the data

2.2.6 Fit statistical models

2.2.7 Make your analysis reproducible

RNA-seq workflow in the workshop

3. Learning experience

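I was the first to arrive at the workshop today. Everything was well prepared, and I stayed focused throughout the day.
Today's course covered a wide mix of topics, and I ran into many new problems and challenges that I need to digest properly.
Today I met a Chinese PhD in dentistry from Guys Campus and we had a great chat. KCL's dentistry is now ranked second in the world. I learned more about how PhD students abroad live and study, and their new techniques and methods are well worth learning.
Today I also met a bioinformatics expert from Denmark Campus, who was very helpful and told us about his own learning journey. I hope to keep asking them for advice so that we can help each other.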

These notes draw on the study materials and slides of the KCL Workshop. Please do not repost them; if you quote them, please cite the source.

Last edited: 2018.06.12 13:44:43
