Overview
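From the two-gram bumblebee bat to whales weighing many tons, Earth is home to a rich array of species, humans included, that over a very long time have adapted to almost every environment on the planet. Among them, mammals are one of the most diverse groups of animals, varying widely in both size and shape. Since the beginnings of life science research, when, how, and under what selective pressures mammalian variation arose has been a question of lasting interest. Studying our evolutionary history can also shed light on human health: genes that are conserved across many species, for example, are likely to be essential for normal function, so changes to them may cause disease.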
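On April 28, 2023, researchers in the Zoonomia Project, an international collaboration and the world's largest comparative genomics resource for mammals, published 11 research papers in the same issue of Science. They catalogued the genomic diversity of 240 mammalian species, representing more than 80% of mammalian families. Some of the findings pinpoint parts of the human genome that have remained unchanged over millions of years of evolution, offering information that may shed light on human health and disease.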
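The Zoonomia Project is a large international research effort led by scientists at the Massachusetts Institute of Technology, Harvard University, and other institutions. The researchers sequenced the genomes of a range of mammals and then aligned and jointly analyzed the genomes of hundreds of species, a massive computational task that opens a new door to understanding mammals, mammalian evolution, and ourselves. Using this alignment, they identified key regions of the genome that are the most conserved, or unchanged, across mammalian species and millions of years of evolution.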
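The authors hypothesized that although these regions do not produce proteins, they may contain instructions that control when and how much protein is made, and that mutations in these regions may play an important role in the origin of disease or in the distinctive traits of mammalian species. Their analysis supported this hypothesis and allowed them to determine that at least 10% of the human genome is functional, roughly ten times the share that codes for proteins (about 1%). The results further indicate that genetic variation may play a causal role in both rare and common human diseases, including cancer.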
01
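If something is important for a species to function normally, it tends to be preserved over the course of evolution; this is the concept of evolutionary constraint. Evolutionary constraint therefore measures how much a particular region of the genome has changed across the evolutionary tree of life. In one study in this Science special issue, Leveraging base-pair mammalian constraint to understand genetic variation and human disease, Sullivan et al. observed DNA sequences that stayed unchanged across many species and long stretches of evolution, as well as sequences that suddenly began accumulating mutations in one or a few lineages; both patterns strongly indicate functional relevance and evolutionary forces at work. By studying patients with medulloblastoma, the researchers also found mutations at evolutionarily conserved positions in the human genome, which they believe may cause brain tumors to grow faster or resist treatment. The results suggest that using these data and methods in disease research can make it easier to find the genetic changes that increase disease risk.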
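To make the idea of constraint concrete, here is a minimal sketch that scores each column of a tiny multiple-sequence alignment by the fraction of species sharing the most common base. This is not the method used in the Zoonomia papers, which rely on phylogenetic models over hundreds of genomes. The function name `column_conservation`, the toy sequences, and the species labels are all hypothetical and serve only as an illustration.

```python
from collections import Counter


def column_conservation(alignment):
    """Score each alignment column by the fraction of sequences that
    share its most common base. Gaps ('-') never count as a match."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for column in zip(*alignment):
        bases = [base for base in column if base != "-"]
        if not bases:
            # A column made only of gaps carries no evidence of conservation.
            scores.append(0.0)
            continue
        most_common_count = Counter(bases).most_common(1)[0][1]
        # Divide by the number of sequences, so gaps lower the score.
        scores.append(most_common_count / len(column))
    return scores


# Hypothetical toy alignment; real alignments span hundreds of species.
toy_alignment = {
    "human": "ACGTAC",
    "mouse": "ACGTTC",
    "dog":   "ACGAGC",
    "bat":   "ACG-CC",
}
scores = column_conservation(list(toy_alignment.values()))
for position, score in enumerate(scores):
    print(position, score)
```

In this toy example, columns that are identical in every species score 1.0, which is the crude analogue of a highly constrained position, while columns that differ between species score lower.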
02
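In the study Evolutionary constraint and innovation across hundreds of placental mammals, researchers identified parts of the genome associated with some exceptional traits in the mammalian world, such as extraordinary brain size, a superior sense of smell, and the ability to hibernate through the winter. The authors also used the genomes to show that estimates of effective population size and diversity can help predict risk for species that are difficult to monitor and sample.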
03
Another study, A genomic timescale for placental mammal evolution, shows that mammals had already begun to change and diversify before about 65 million years ago, when an asteroid struck Earth and the dinosaurs went extinct.
Superordinal mammalian diversification took place in the Cretaceous during periods of continental fragmentation and sea level rise with little phylogenomic discordance (pie charts: left, autosomes; right, X chromosome), which is consistent with allopatric speciation. By contrast, the Paleogene hosted intraordinal diversification in the aftermath of the K-Pg mass extinction event, when clades exhibited higher phylogenomic discordance consistent with speciation with gene flow and incomplete lineage sorting.
04
Another study, titled Three-dimensional genome rewiring in loci with human accelerated regions, used Zoonomia data and experimental analyses to examine more than 10,000 human-specific genetic deletions and linked some of them to neuronal function.
The HAR is nearby and regulates gene A, but not gene B, as the chimpanzee genome folds. An insertion in the human genome brings the HAR closer to gene B, causing expression of gene B. The HAR adapts to being in gene B’s regulatory domain through substitutions to previously conserved nucleotides.
05
A study titled Comparative genomics of Balto, a famous historic dog, captures lost diversity of 1920s sled dogs offers a genetic explanation for why Balto, a famous sled dog of the 1920s, was able to survive the harsh environment of Alaska.
In an unsupervised admixture analysis, Balto’s ancestry, representing 20th-century Alaskan sled dogs, is assigned predominantly to four Arctic lineage dog populations. He had no discernable wolf ancestry. The Alaskan sled dogs (a working population) did not fall into a distinct ancestry cluster but shared about a third of their ancestry with Balto in the supervised admixture analysis. Balto and working sled dogs carried fewer constrained and missense rare variants than modern dog breeds. IMAGE CREDIT: K. MORRILL
06
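In a study titled The functional and evolutionary impacts of human-specific deletions in conserved elements, Xue et al. report on their investigation of genome structure. After identifying deletions that span only a few bases, they analyzed the ability of these deletions to regulate gene expression in multiple human cell types and explored whether they might contribute to uniquely human phenotypes. They found that complex cognitive function once again emerged as one of the main beneficiaries of sequence change during human evolution: genes near these small deletions were systematically enriched for roles in brain and neuronal function. After experimentally confirming the deletions' function in multiple cell types, the authors also observed that many of the deletions increased gene expression in human cells, which can drive the acquisition of new functions.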
We assessed 10,032 hCONDELs across diverse, biologically relevant datasets and identified tissue-specific enrichment (top left). The regulatory impact of hCONDELs was characterized by comparing chimp and human sequences in MPRAs (bottom left). The ability of hCONDELs to either improve or perturb activating and repressing gene-regulatory elements was assessed (top right). The deleted chimpanzee sequence was reintroduced back into human cells, causing a cascade of transcriptional differences for an hCONDEL regulating LOXL2 (bottom right).
07
In a study titled Relating enhancer genetic variation across mammals to complex phenotypes using machine learning, researchers used machine learning to identify genomic regions associated with brain size.
TACIT works by generating open chromatin data from a few species in a tissue related to a phenotype, using the sequences underlying open and closed chromatin regions to train a machine learning model for predicting tissue-specific open chromatin and associating open chromatin predictions across dozens of mammals with the phenotype. [Species silhouettes are from PhyloPic]
08
The study titled Mammalian evolution of human cis-regulatory elements and transcription factor binding sites describes the evolution of regulatory sequences in the human genome.
(A) Distribution of human cCREs by the number of genomes they align.
(B) Projection of cCREs by alignments to the other 240 mammalian genomes.
(C) Projection of HNF4A sites (constrained, red; unconstrained, blue).
(D) Heritability enrichment for 69 human traits in partitions of TFBSs ordered by evolutionary constraint.
(E) Heritability enrichment for human traits by subsets of TFBSs.
09
The study titled Insights into mammalian TE diversity through the curation of 248 genome assemblies examined the transposable element (TE) content of 248 placental mammal genome assemblies, the largest de novo TE curation effort in eukaryotes to date. It found that although mammals are similar in total TE content and diversity, they differ substantially in recent TE accumulation. At any given time, mammals tend to accumulate only a few types of TE, with one type dominating. The study also found an association between dietary habits and DNA transposon invasions.
Five categories of TE were examined: DNA transposons, long interspersed elements (LINEs), long terminal repeat (LTR) retrotransposons, rolling circle (RC) transposons, and short interspersed elements (SINEs). Species with the highest and lowest proportions for each TE type are indicated by a picture of the organism and its common name. With regard to RC and DNA transposons, we found that most mammalian genome assemblies exhibit essentially zero recent accumulation (RC: 240 of 248 mammals had <0.1%; DNA: 210 of 248 mammals had <0.1%). ILLUSTRATIONS: BRITTANY ANN HALE
10
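The study titled The contribution of historical processes to contemporary extinction risk in placental mammals surveyed genetic variation in single genomes from 240 mammalian species. It found that species with historically small populations carry a larger proportion of deleterious alleles, owing to the long-term accumulation and fixation of genetic load, and face a higher risk of extinction.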
Across 240 mammals, species with smaller historical Ne had lower genetic diversity, higher genetic load, and were more likely to be threatened with extinction. Genomic data were used to train models that predict whether a species is threatened, which can be valuable for assessing extinction risk in species lacking ecological or census data. [Animal silhouettes are from PhyloPic]
11
The study titled Integrating gene annotation with orthology inference at scale presents TOGA (Tool to infer Orthologs from Genome Alignments), a method that integrates structural gene annotation with orthology inference. The researchers applied it to 488 placental mammals and 501 birds, creating the largest comparative gene resource to date.
Orthologous, but not paralogous, genes have partially aligning intronic and intergenic regions. TOGA uses this principle to infer orthologous gene loci and integrates orthology inference with gene annotation. Using a reference species, TOGA can be applied to hundreds of aligned query genomes to provide rich comparative genomics resources.
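The principle in the caption above, that orthologous loci also align in their intronic and intergenic regions, can be illustrated with a deliberately simplified sketch. This is not TOGA's actual algorithm or API. The `CandidateLocus` class, the function names, the base counts, and the 0.3 threshold are all hypothetical, chosen only to show how alignment of non-coding sequence could separate ortholog-like loci from paralog-like ones.

```python
from dataclasses import dataclass


@dataclass
class CandidateLocus:
    """Hypothetical summary of how well one query locus aligns to a reference gene."""
    name: str
    intronic_aligned: int    # intronic bases that align to the reference
    intronic_total: int      # total intronic bases in the reference gene
    intergenic_aligned: int  # flanking intergenic bases that align
    intergenic_total: int    # total flanking intergenic bases considered


def noncoding_alignment_fraction(locus):
    """Fraction of intronic plus intergenic reference bases that align."""
    total = locus.intronic_total + locus.intergenic_total
    if total == 0:
        return 0.0
    return (locus.intronic_aligned + locus.intergenic_aligned) / total


def looks_orthologous(locus, threshold=0.3):
    """Toy rule: call a locus ortholog-like when enough non-coding
    sequence aligns. The threshold is an arbitrary illustrative value."""
    return noncoding_alignment_fraction(locus) >= threshold


# Hypothetical candidates: exons may align in both, but only the first
# also aligns across much of its introns and flanking regions.
candidates = [
    CandidateLocus("locus_1", 4000, 8000, 1500, 2000),
    CandidateLocus("locus_2", 100, 8000, 0, 2000),
]
labels = {c.name: looks_orthologous(c) for c in candidates}
print(labels)
```

A single hard threshold is used here only to keep the example short; a real tool would need to weigh several alignment features together rather than rely on one cutoff.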
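The papers in this Science special issue compare the genomes of 240 mammalian species, including many that are threatened or endangered. The DNA samples were collected and contributed by more than 50 institutions worldwide. The findings illustrate how comparative genomics can not only explain how certain species achieve extraordinary feats, but also help scientists better understand which parts of our genome are functional and how they affect health and disease.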
References:
1. Bogdan M. Kirilenko et al. Integrating gene annotation with orthology inference at scale. Science (2023).
2. Aryn P. Wilder et al. The contribution of historical processes to contemporary extinction risk in placental mammals. Science (2023).
3. Nicole M. Foley et al. A genomic timescale for placental mammal evolution. Science (2023).
4. Austin B. Osmanski et al. Insights into mammalian TE diversity through the curation of 248 genome assemblies. Science (2023).
5. James R. Xue et al. The functional and evolutionary impacts of human-specific deletions in conserved elements. Science (2023).
6. Matthew J. Christmas and Irene M. Kaplow et al. Evolutionary constraint and innovation across hundreds of placental mammals. Science (2023).
7. Katherine L. Moon et al. Comparative genomics of Balto, a famous historic dog, captures lost diversity of 1920s sled dogs. Science (2023).
8. Gregory Andrews et al. Mammalian evolution of human cis-regulatory elements and transcription factor binding sites. Science (2023).
9. Kathleen C. Keough et al. Three-dimensional genome rewiring in loci with human accelerated regions. Science (2023).
10. Irene M. Kaplow et al. Relating enhancer genetic variation across mammals to complex phenotypes using machine learning. Science (2023).
11. Patrick F. Sullivan et al. Leveraging base-pair mammalian constraint to understand genetic variation and human disease. Science (2023).