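To grab attention, I gave this post a professional-sounding title. Really, it is just me reading a paper to learn how the experts approach their research.
Following the Norway spruce genome, published in Nature in 2013, another gymnosperm genome has finally appeared in a top journal, this time Cell. Gymnosperm genomes are large and hard to assemble. Genomes of ginkgo, yew, giant sequoia, and other gymnosperms have been published before, but none of those papers reached a top-tier journal. So let's look at what this paper did to earn its place in Cell.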
Paper title: “The Chinese pine genome and methylome unveil key features of conifer evolution”
Paper URL: https://www.sciencedirect.com/science/article/pii/S0092867421014288?
Abstract:
Conifers dominate the world’s forest ecosystems and are the most widely planted tree species. Their giant and complex genomes present great challenges for assembling a complete reference genome for evolutionary and genomic studies. We present a 25.4-Gb chromosome-level assembly of Chinese pine (Pinus tabuliformis) and revealed that its genome size is mostly attributable to huge intergenic regions and long introns with high transposable element (TE) content. Large genes with long introns exhibited higher expressions levels. Despite a lack of recent whole-genome duplication, 91.2% of genes were duplicated through dispersed duplication, and expanded gene families are mainly related to stress responses, which may underpin conifers’ adaptation, particularly in cold and/or arid conditions. The reproductive regulation network is distinct compared with angiosperms. Slow removal of TEs with high-level methylation may have contributed to genomic expansion. This study provides insights into conifer evolution and resources for advancing research on conifer adaptation and development.
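The Chinese pine genome is 25.4 Gb, spread across 12 chromosomes. Gymnosperm genomes are all large (17–35 Gb), which is attributed to TE expansion. Besides the TEs in intergenic regions, the introns of Chinese pine genes also contain TEs. Gymnosperm introns are much longer than those of angiosperms, as earlier gymnosperm genome papers have also reported. This paper adds that genes with long introns also tend to be highly expressed. High-level methylation slows the removal of TEs, which may have contributed to the expansion of the Chinese pine genome.
Assembly and annotation results:
The average intron length in Chinese pine is 10,034 bp, and 29,883 introns are longer than 10 kb. The authors then selected 67 other species and compared metrics such as genome size and gene, exon, and intron lengths.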
“We found a positive correlation between the ratio of total intron/exon length with the genome’s size, especially in the gymnosperm plants (Figure 2A), indicating that the genome expansion not only occurs in the intergenic region but also in the genic region.”
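The authors found that genome size is positively correlated with the ratio of total intron length to total exon length, especially in gymnosperms. This indicates that genome expansion occurs not only in intergenic regions but also in genic regions.
The authors really did not spare any effort, compiling statistics for 67 species in one go. Impressive. I, on the other hand, always look for ways to do less work, which is a little embarrassing.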
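The intron/exon-ratio comparison described above can be sketched in a few lines. This is only an illustration: the species names and numbers are made-up placeholders rather than data from the paper, and using a Pearson correlation is my assumption, since the quoted passage does not say which statistic the authors used.

```python
from math import sqrt


def intron_exon_ratio(total_intron_bp, total_exon_bp):
    """Ratio of total intron length to total exon length for one genome."""
    if total_exon_bp <= 0:
        raise ValueError("total exon length must be positive")
    return total_intron_bp / total_exon_bp


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up placeholder values, NOT data from the paper:
# species -> (genome size in Gb, total intron bp, total exon bp)
toy_species = {
    "toy_angiosperm_1": (0.5, 4.0e7, 4.0e7),
    "toy_angiosperm_2": (1.0, 6.0e7, 4.5e7),
    "toy_gymnosperm_1": (12.0, 6.0e8, 5.0e7),
    "toy_gymnosperm_2": (25.0, 1.2e9, 6.0e7),
}

genome_sizes = [size for size, _, _ in toy_species.values()]
ratios = [intron_exon_ratio(i, e) for _, i, e in toy_species.values()]
r = pearson_r(genome_sizes, ratios)
print(f"Pearson r between genome size and intron/exon ratio: {r:.3f}")
```

With real annotations, the intron and exon totals would come from each species' GFF file. A rank-based statistic might be the safer choice when a few giant genomes dominate the range.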
We assessed the annotation completeness of P. tabuliformis using both BUSCO genome model and protein model. The result showed that BUSCO covered 84% of complete genes in the protein model in contrast to 44.5% in the genome model. We compared the gene sets that could be recognized as complete by both BUSCO protein and genome model (Pc-Gc) with the gene sets that could only be recognized by protein mode (Pc-Gm). We found that most super long genes with multi-introns were not detected under genome model but were recognized in protein model, indicating that multiple long introns are the primary causes of low BUSCO genome completeness (Figure 2B).
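The authors assessed annotation completeness with BUSCO: protein mode recovered 84% of the BUSCO genes as complete, whereas genome mode recovered only 44.5%. Their comparison shows that the gap arises because very long genes with many introns are not detected in genome mode. (Aside: by that logic, is the problem with the BUSCO software itself?)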
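The Pc-Gc versus Pc-Gm comparison boils down to set arithmetic on BUSCO IDs, as the minimal sketch below shows. The IDs and gene lengths are invented for illustration. Treating every BUSCO that is complete in protein mode but not complete in genome mode as "Pc-Gm" is my simplifying assumption; the authors may have restricted that set to a specific BUSCO status.

```python
from statistics import median


def split_busco_sets(protein_complete, genome_complete):
    """Split BUSCO IDs that are complete in protein mode into two groups.

    Pc-Gc: also complete in genome mode.
    Pc-Gm: not complete in genome mode (assumed definition, see lead-in).
    """
    pc_gc = protein_complete & genome_complete
    pc_gm = protein_complete - genome_complete
    return pc_gc, pc_gm


# Hypothetical BUSCO IDs and gene lengths (bp), NOT values from the paper.
protein_complete = {"busco_1", "busco_2", "busco_3", "busco_4"}
genome_complete = {"busco_1", "busco_2"}
gene_length_bp = {
    "busco_1": 3_000,
    "busco_2": 8_000,
    "busco_3": 250_000,
    "busco_4": 400_000,
}

pc_gc, pc_gm = split_busco_sets(protein_complete, genome_complete)
median_len_gc = median(gene_length_bp[b] for b in pc_gc)
median_len_gm = median(gene_length_bp[b] for b in pc_gm)
print(f"Pc-Gc median gene length: {median_len_gc} bp")
print(f"Pc-Gm median gene length: {median_len_gm} bp")
```

If the Pc-Gm genes turn out to be much longer, as in this toy data, that would match the paper's explanation that long multi-intron genes are the ones genome mode misses.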
To study whether such extraordinarily long introns would disrupt transcription, we divided genes into two groups by the sizes of the first introns and found that the genes with longer first introns always had relatively higher expression levels in all eleven studied organs/tissues than those with shorter introns (Figure S2B).
To test whether extremely long introns disrupt transcription, the authors split genes into two groups by the length of the first intron. Genes with a longer first intron showed relatively higher expression in all 11 organs/tissues studied.
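The grouping step can be sketched as follows. The gene records, tissue names, and the 10 kb cutoff are all hypothetical, and comparing group medians is my assumption about how to summarize the difference; the quoted passage does not give the authors' threshold or statistic.

```python
from statistics import median


def split_by_first_intron(genes, threshold_bp):
    """Split genes into (long, short) groups by first-intron length."""
    long_group = [g for g in genes if g["first_intron_bp"] >= threshold_bp]
    short_group = [g for g in genes if g["first_intron_bp"] < threshold_bp]
    return long_group, short_group


def median_expression(group, tissue):
    """Median expression value of a gene group in one tissue."""
    return median(g["expr"][tissue] for g in group)


# Hypothetical genes with made-up expression values, NOT data from the paper.
toy_genes = [
    {"id": "g1", "first_intron_bp": 200, "expr": {"needle": 2.0, "root": 1.0}},
    {"id": "g2", "first_intron_bp": 900, "expr": {"needle": 3.0, "root": 2.0}},
    {"id": "g3", "first_intron_bp": 15_000, "expr": {"needle": 9.0, "root": 6.0}},
    {"id": "g4", "first_intron_bp": 40_000, "expr": {"needle": 12.0, "root": 8.0}},
]

# The 10 kb cutoff is an illustrative choice only.
long_group, short_group = split_by_first_intron(toy_genes, threshold_bp=10_000)
for tissue in ("needle", "root"):
    long_med = median_expression(long_group, tissue)
    short_med = median_expression(short_group, tissue)
    print(f"{tissue}: long first intron {long_med}, short first intron {short_med}")
```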
(Tip: to set text color on Jianshu, insert a formula such as \color{red}{your text}.)
To gain insight into the gene-expression recognition mechanism of small exons from super-long introns in conifers, we manually checked the RNA-junction and DNA methylation patterns of the 10 long genes. Large amounts of RNA-junction data confirm that small exons can be accurately identified and transcribed in a huge DNA that was thousands of times longer than exons (Figure S2E). It is noteworthy that almost all CG and CHG sites in long introns were methylated, whereas exon regions were marked by low methylation levels, especially for the CHG context (Figure S2F), indicating that DNA methylation was probably involved in the accurate exon recognition from super-long introns.
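To explore how small exons are recognized and expressed from within super-long introns, the authors manually inspected the RNA-junction and DNA methylation patterns of 10 long genes. Nearly all CG and CHG sites in the long introns were methylated, whereas exons showed low methylation, especially in the CHG context.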
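To make the intron versus exon contrast concrete, here is a rough sketch that summarizes methylation per region and sequence context. It uses a read-weighted level (methylated reads divided by total reads), which is a common convention but is my assumption here; the paper's exact calculation is not quoted above. All counts are invented.

```python
from collections import defaultdict


def methylation_levels(sites):
    """Read-weighted methylation level per (region, context).

    sites: iterable of (region, context, methylated_reads, total_reads),
    one entry per cytosine site.
    """
    meth = defaultdict(int)
    total = defaultdict(int)
    for region, context, methylated_reads, total_reads in sites:
        meth[(region, context)] += methylated_reads
        total[(region, context)] += total_reads
    # Skip groups with no coverage to avoid dividing by zero.
    return {key: meth[key] / total[key] for key in total if total[key] > 0}


# Made-up per-site read counts, NOT data from the paper.
toy_sites = [
    ("intron", "CG", 19, 20),
    ("intron", "CG", 18, 20),
    ("intron", "CHG", 17, 20),
    ("exon", "CG", 6, 20),
    ("exon", "CHG", 1, 20),
]

levels = methylation_levels(toy_sites)
for (region, context), level in sorted(levels.items()):
    print(f"{region:6s} {context:3s} {level:.3f}")
```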