English content reproduced from:
https://bioinformaticsworkbook.org/introduction/dataTerminology.html
Learning Objective
- base/nucleotide
- read
- contig
- scaffold
- chromosome
What is a base?
There are four common bases in DNA sequences: Adenine (A), Guanine (G), Cytosine (C) and Thymine (T). Uracil (U) is found in RNA in place of Thymine.
Image taken from Wikipedia, where more information about nucleotides can also be found.
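As a small illustration of Uracil taking the place of Thymine in RNA, the sketch below checks that a string uses only the four DNA bases and then writes the same sequence in RNA letters by substituting U for T. The function names are hypothetical, and the sketch assumes uppercase one-letter codes with no ambiguity codes such as N.

```python
DNA_BASES = set("ACGT")

def is_dna(seq: str) -> bool:
    """Return True if seq is non-empty and contains only A, C, G and T."""
    return len(seq) > 0 and set(seq) <= DNA_BASES

def dna_to_rna(seq: str) -> str:
    """Rewrite a DNA string in RNA letters: Uracil (U) replaces Thymine (T)."""
    if not is_dna(seq):
        raise ValueError(f"not a plain DNA sequence: {seq!r}")
    return seq.replace("T", "U")

print(dna_to_rna("GATTACA"))  # GAUUACA
```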
What is a read?
A read is a string of bases represented by their one-letter codes. Here is an example of a read that is 70 bases long:
TTAACCTTGGTTTTGAACTTGAACACTTAGGGGATTGAAGATTCAACAACCCTAAAGCTTGGGGTAAAAC
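The read length is easy to check by counting characters. This short sketch assumes nothing beyond the example read itself: it prints the number of bases and tallies how often each one-letter code occurs.

```python
from collections import Counter

# The example read, as one string of one-letter base codes.
read = "TTAACCTTGGTTTTGAACTTGAACACTTAGGGGATTGAAGATTCAACAACCCTAAAGCTTGGGGTAAAAC"

# A read's length is the number of bases (characters) it contains.
print(len(read))  # 70

# Count how many times each base occurs in the read.
counts = Counter(read)
for base in "ACGT":
    print(base, counts[base])
```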
What is a contig?
A contig is the consensus sequence generated by aligning reads to one another.
In the alignment figure, the last line is the consensus of the aligned reads. We call this consensus sequence a contig.
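A minimal sketch of reading a consensus off aligned reads, assuming the reads have already been placed at known start offsets with no insertions or deletions (real assemblers also handle indels and base qualities). At each column the most frequent base wins. The function name, the alphabetical tie-break, and the use of N for uncovered columns are illustrative choices, not any particular assembler's method.

```python
from collections import Counter

def consensus(aligned_reads: list[tuple[int, str]]) -> str:
    """Majority-vote consensus of gapless reads given as (start_offset, sequence)."""
    length = max(start + len(seq) for start, seq in aligned_reads)
    columns = [Counter() for _ in range(length)]
    for start, seq in aligned_reads:
        for i, base in enumerate(seq):
            columns[start + i][base] += 1
    result = []
    for col in columns:
        if not col:
            result.append("N")  # no read covers this column
            continue
        # Highest count wins; ties are broken alphabetically.
        best = min(col.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        result.append(best)
    return "".join(result)

reads = [
    (0, "ACGTTGCA"),
    (3, "TTGCATTA"),
    (5, "GCAGTAG"),   # disagrees with the other reads at column 8 (G vs T)
    (6, "CATTAGCC"),
]
print(consensus(reads))  # ACGTTGCATTAGCC
```

The single disagreeing base is outvoted, which is why a consensus built from several overlapping reads is more reliable than any one read.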
What is a scaffold?
A scaffold is a set of contigs that have been ordered and oriented based on mate-pair or other long-distance information.
contigNNNNNNNNNNNNgitnocNNNNNNNNcontigNNNNNNNNcontigNNNNgitnoc
In the line above:
- contig is a string of bases (A, T, C or G)
- N is an unknown base
- gitnoc is the word contig written backwards to represent the reverse complement of a contig
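The scaffold line above can be produced from real sequences. The sketch below assumes uppercase A/C/G/T/N strings; the helper names and the (sequence, strand) input format are hypothetical. Contigs placed on the minus strand (the "gitnoc" pieces) are reverse-complemented, and each gap is filled with a run of N whose length is the estimated gap size.

```python
# Base pairing A<->T, C<->G; an unknown base N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]

def build_scaffold(pieces: list[tuple[str, str]], gaps: list[int]) -> str:
    """Join oriented contigs into one scaffold string.

    pieces: (sequence, strand) pairs in scaffold order; strand "+" keeps the
            contig as-is, "-" uses its reverse complement.
    gaps:   estimated gap length between consecutive pieces, written as N.
    """
    if len(gaps) != len(pieces) - 1:
        raise ValueError("need exactly one gap length between each pair of contigs")
    parts = []
    for i, (seq, strand) in enumerate(pieces):
        parts.append(seq if strand == "+" else reverse_complement(seq))
        if i < len(gaps):
            parts.append("N" * gaps[i])
    return "".join(parts)

print(reverse_complement("AACCG"))  # CGGTT
print(build_scaffold([("ACGT", "+"), ("AACCG", "-"), ("TTA", "+")], [3, 2]))  # ACGTNNNCGGTTNNTTA
```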
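Below are some supplementary excerpts from other articles (ones with figures are even better):
contig/scaffold and N50/N90
When sequencing reads are assembled, a sequence that can be assembled completely, with no gaps in it, is a contig. If a sequence contains gaps but the length of each gap is known (in practice, estimated), it is called a scaffold.
contig N50 and scaffold N50
Sort the contigs or scaffolds from longest to shortest and add up their lengths in that order. When the running total reaches 50% of the genome size (here, the total length of all contigs or scaffolds), the length of the contig/scaffold just added is the contig/scaffold N50. The larger the N50, the more contiguous, and generally the better, the genome assembly. Likewise, N90 is the length of the contig/scaffold at which the running total reaches 90% of the genome size.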
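The N50/N90 procedure described above translates directly into a short function. This is a sketch: the name nx and its percent parameter are hypothetical, and, as in the text, "genome size" is taken to be the total length of the assembled contigs or scaffolds.

```python
def nx(lengths: list[int], percent: int = 50) -> int:
    """Return the Nx statistic (N50 by default) of contig or scaffold lengths.

    Lengths are sorted from longest to shortest and accumulated; the answer is
    the length of the sequence at which the running total first reaches
    percent% of the total assembly length.
    """
    if not lengths or not 0 < percent <= 100:
        raise ValueError("need at least one length and 0 < percent <= 100")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length
    # The last iteration makes running == total, so the loop always returns.
    raise AssertionError("unreachable")

contigs = [2, 8, 4, 10, 6]  # total length 30
print(nx(contigs, 50))  # 8  (10 + 8 = 18 >= 15)
print(nx(contigs, 90))  # 4  (10 + 8 + 6 + 4 = 28 >= 27)
```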
Author: wo_monic
Link: http://www.reibang.com/p/9876964e3d20
Source: Jianshu
Copyright belongs to the author. For commercial reprints, please contact the author for permission; for non-commercial reprints, please credit the source.
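Genome assembly is generally organized into three levels: contigs, scaffolds and chromosomes. A contig is a consensus sequence found among the short reads produced by large-scale sequencing. The first step of assembly is to build contigs from short-insert (paired-end) libraries. Next, using large-insert (mate-pair) libraries of various sizes, the previously isolated contigs are linked in order; in this step contig orientations are adjusted, and gaps (represented by N) may remain between contigs. This step produces scaffolds, also known as supercontigs or metacontigs. Finally, the scaffolds are merged and adjusted based on a genetic map or an optical map to form a chromosome-level assembly.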
https://zhuanlan.zhihu.com/p/38317398
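To see how a mate-pair library gives both the order of two contigs and an estimate of the gap between them, consider the simplified model below. It assumes the two contigs are already oriented (A before B), that all linking pairs come from fragments of one nominal insert size, and that distances are measured from each read's outer end. The function name and input format are hypothetical; real scaffolders also model the spread of insert sizes and the library's read orientation.

```python
from statistics import median

def estimate_gap(insert_size: int, spans: list[tuple[int, int]]) -> int:
    """Estimate the gap between adjacent, already-oriented contigs A and B.

    Each linking pair contributes (d_a, d_b):
      d_a = distance from the pair's outer end on contig A to the end of A
      d_b = distance from the start of contig B to the pair's outer end on B
    With a fragment of length insert_size:
      insert_size = d_a + gap + d_b  =>  gap = insert_size - d_a - d_b
    A negative result suggests the two contigs actually overlap.
    """
    if not spans:
        raise ValueError("need at least one linking pair")
    estimates = [insert_size - d_a - d_b for d_a, d_b in spans]
    # The median damps the effect of fragments that are longer or shorter than nominal.
    return round(median(estimates))

# Three made-up pairs from a 3 kb library linking contig A to contig B.
pairs = [(1200, 1500), (900, 1750), (1400, 1330)]
print(estimate_gap(3000, pairs))  # 300
```

In a scaffold, that estimate becomes the number of N written between contig A and contig B.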
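What is a scaffold? In de novo genome sequencing, after contigs have been obtained by assembling reads, one usually also needs to build 454 Paired-end libraries or Illumina Mate-pair libraries to obtain the sequences at both ends of fragments of defined sizes (e.g., 3 kb, 6 kb, 10 kb, 20 kb). Based on these sequences, the order of some contigs relative to each other can be determined, and contigs whose order is known in this way make up a scaffold.
Contig N50: Assembling reads yields contigs of various lengths. Adding up the lengths of all contigs gives the total contig length. Then sort all contigs from longest to shortest, e.g., Contig 1, Contig 2, Contig 3, ..., Contig 25. Add the contig lengths in this order; when the cumulative length reaches half of the total contig length, the length of the last contig added is the contig N50. For example, if Contig 1 + Contig 2 + Contig 3 + Contig 4 reaches half of the total contig length, the length of Contig 4 is the contig N50. The contig N50 can serve as one criterion for judging the quality of a genome assembly.
Scaffold N50: Scaffold N50 is defined analogously to contig N50. Assembling contigs yields scaffolds of various lengths. Adding up the lengths of all scaffolds gives the total scaffold length. Then sort all scaffolds from longest to shortest, e.g., Scaffold 1, Scaffold 2, Scaffold 3, ..., Scaffold 25. Add the scaffold lengths in this order; when the cumulative length reaches half of the total scaffold length, the length of the last scaffold added is the scaffold N50. For example, if Scaffold 1 + Scaffold 2 + Scaffold 3 + Scaffold 4 + Scaffold 5 reaches half of the total scaffold length, the length of Scaffold 5 is the scaffold N50. The scaffold N50 can also serve as a criterion for judging the quality of a genome assembly.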
Author: 白羊鐵蛋
Link: http://www.reibang.com/p/117441ac6eb8
Source: Jianshu
Copyright belongs to the author. For commercial reprints, please contact the author for permission; for non-commercial reprints, please credit the source.
What is a chromosome?
Chromosomes are the largest DNA molecules in a cell.
Scaffolds can be ordered and oriented into linkage groups or chromosomes using a genetic map or Hi-C data.
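For the genetic-map route, a much simplified sketch: assume each scaffold carries genetic markers with a known position on the scaffold (bp) and on the map (cM), all within one linkage group. Scaffolds are ordered by their mean map position, and a scaffold is flipped when map position decreases along its coordinates. The function name, input format and "?" convention are hypothetical; the sketch does not cover Hi-C, and real tools must also cope with marker errors and conflicting evidence.

```python
from statistics import mean

def order_and_orient(markers: dict[str, list[tuple[int, float]]]) -> list[tuple[str, str]]:
    """Order and orient scaffolds along one linkage group using a genetic map.

    markers maps a scaffold name to (position_on_scaffold_bp, map_position_cM)
    pairs; every scaffold needs at least one marker. Returns (name, strand)
    in map order, where strand is "+" if map position increases along the
    scaffold, "-" if it decreases, and "?" if the markers show no trend
    (for example, a single marker).
    """
    placed = []
    for name, pts in markers.items():
        bp = [p for p, _ in pts]
        cm = [c for _, c in pts]
        center = mean(cm)
        mean_bp = mean(bp)
        # Sign of the covariance between scaffold coordinate and map position.
        cov = sum((x - mean_bp) * (y - center) for x, y in zip(bp, cm))
        strand = "+" if cov > 0 else "-" if cov < 0 else "?"
        placed.append((center, name, strand))
    placed.sort()  # order scaffolds by mean map position
    return [(name, strand) for _, name, strand in placed]

markers = {
    "scaffold_1": [(10_000, 12.0), (250_000, 14.5)],  # cM increases along scaffold
    "scaffold_2": [(5_000, 3.2), (400_000, 1.0)],     # cM decreases along scaffold
    "scaffold_3": [(80_000, 20.0)],                   # one marker: orientation unknown
}
print(order_and_orient(markers))
```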
The ultimate goal of a genome assembly project is to assemble reads into phased chromosomes that represent an actual individual.
Most chromosomal assemblies produced today are not phased or may represent multiple individuals.