November 2019 bioRxiv Bioinformatics Highlights

On November 6, an unusual preprint appeared on bioRxiv: Richard Sever and colleagues from the team that founded bioRxiv posted a preprint titled "bioRxiv: the preprint server for biology", which sums up the server's first five years [1]. According to the paper, bioRxiv has posted more than 64,000 preprints, receives more than 2,000 new preprints every month, and draws more than 4 million hits per week. More importantly, these numbers are still rising.

Fig.4. The growth of bioRxiv. A. Monthly submissions to bioRxiv. New articles are in blue; revised articles are in red.

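Preprints are thriving, and they are starting to move from the grassroots into the mainstream. On November 19, Nature covered a manuscript that had just been posted to bioRxiv in its news section, under the headline "Every butterfly in the United States and Canada now has a genome sequence". In this preprint, researchers from the Grishin lab at UT Southwestern Medical Center report sequencing the genomes of all 845 butterfly species found in the United States and Canada. Their aim is a detailed study of butterfly genome evolution, in particular the species-level phylogeny and the rate of species diversification.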

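Nature recently carried another widely discussed evolutionary biology paper, in which Australian researchers used population genomics to trace the homeland from which modern humans set out to a specific location in Botswana, southern Africa [2]. The paper met with dissent as soon as it was published. Some of that dissent has now been put in writing: Carina Schlebusch of Uppsala University in Sweden and colleagues posted a preprint on preprints.org last month that strongly rebuts the original paper and argues that its conclusions do not hold up at all. Will this preprint pass peer review and soon appear in Nature as a short communication?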

Want more details on these papers? Then read on for our November roundup of bioinformatics papers from bioRxiv.

1. Genome sequencing of all 845 North American butterfly species reveals general patterns of animal evolution

Genomics of a complete butterfly continent (CC BY-NC-ND 4.0)

Never before have we had the luxury of choosing a continent, picking a large phylogenetic group of animals, and obtaining genomic data for its every species. Here, we sequence all 845 species of butterflies recorded from North America north of Mexico. Our comprehensive approach reveals the pattern of diversification and adaptation occurring in this phylogenetic lineage as it has spread over the continent, which cannot be seen on a sample of selected species. We observe bursts of diversification that generated taxonomic ranks: subfamily, tribe, subtribe, genus, and species. The older burst around 70 Mya resulted in the butterfly subfamilies, with the major evolutionary inventions being unique phenotypic traits shaped by high positive selection and gene duplications. The recent burst around 5 Mya is caused by explosive radiation in diverse butterfly groups associated with diversification in transcription and mRNA regulation, morphogenesis, and mate selection. Rapid radiation correlates with more frequent introgression of speciation-promoting and beneficial genes among radiating species. Radiation and extinction patterns over the last 100 million years suggest the following general model of animal evolution. A population spreads over the land, adapts to various conditions through mutations, and diversifies into several species. Occasional hybridization between these species results in accumulation of beneficial alleles in one, which eventually survives, while others become extinct. Not only butterflies, but also the hominids may have followed this path.

2. The role of oncogenes on extrachromosomal DNA (ecDNA) in aggressive tumors

Frequent extrachromosomal oncogene amplification drives aggressive tumors (CC-BY-ND 4.0)

Extrachromosomal DNA (ecDNA) amplification promotes high oncogene copy number, intratumoral genetic heterogeneity, and accelerated tumor evolution1–3, but its frequency and clinical impact are not well understood. Here we show, using computational analysis of whole-genome sequencing data from 1,979 cancer patients, that ecDNA amplification occurs in at least 26% of human cancers, of a wide variety of histological types, but not in whole blood or normal tissue. We demonstrate a highly significant enrichment for oncogenes on amplified ecDNA and that the most common recurrent oncogene amplifications arise on ecDNA. EcDNA amplifications resulted in higher levels of oncogene transcription compared to copy number matched linear DNA, coupled with enhanced chromatin accessibility. Patients whose tumors have ecDNA-based oncogene amplification showed increase of cell proliferation signature activity, greater likelihood of lymph node spread at initial diagnosis, and significantly shorter survival, even when controlled for tissue type, than do patients whose cancers are not driven by ecDNA-based oncogene amplification. The results presented here demonstrate that ecDNA-based oncogene amplification plays a central role in driving the poor outcome for patients with some of the most aggressive forms of cancers.

3. Want to know what homology-directed repair really produces after a CRISPR knock-in? Don't miss this paper from Manuel Leonetti's group at the Chan Zuckerberg Biohub

Deep profiling reveals substantial heterogeneity of integration outcomes in CRISPR knock-in experiments (CC-BY-NC-ND 4.0)

CRISPR/Cas technologies have transformed our ability to add functionality to the genome by knock-in of payload via homology-directed repair (HDR). However, a systematic and quantitative profiling of the knock-in integration landscape is still lacking. Here, we present a framework based on long-read sequencing and an integrated computational pipeline (knock-knock) to analyze knock-in repair outcomes across a wide range of experimental parameters. Our data uncover complex repair profiles, with perfect HDR often accounting for a minority of payload integration events, and reveal markedly distinct mis-integration patterns between cell-types or forms of HDR templates used. Our analysis demonstrates that the two sides of a given double-strand break can be repaired by separate pathways and identifies a major role for sequence micro-homology in driving donor mis-integration. Altogether, our comprehensive framework paves the way for investigating repair mechanisms, monitoring accuracy, and optimizing the precision of genome engineering.

4. BlobToolKit: a visualization tool for assessing genome assembly quality

BlobToolKit – Interactive quality assessment of genome assemblies (CC-BY 4.0)

We present BlobToolKit, a software suite to aid researchers in identifying and isolating non-target data in draft and publicly available genome assemblies. BlobToolKit can be used to process assembly, read and analysis files for fully reproducible interactive exploration in the browser-based Viewer. BlobToolKit can be used during assembly to filter non-target DNA, helping researchers produce assemblies with high biological credibility. We have been running an automated BlobToolKit pipeline on eukaryotic assemblies publicly available in the International Nucleotide Sequence Data Collaboration and are making the results available through a public instance of the Viewer at https://blobtoolkit.genomehubs.org/view. We aim to complete analysis of all publicly available genomes and then maintain currency with the flow of new genomes. We have worked to embed these views into the presentation of genome assemblies at the European Nucleotide Archive, providing an indication of assembly quality alongside the public record with links out to allow full exploration in the Viewer.

5. Indian researchers: a minimal pipeline for reference-based transcriptome analysis

A Simplest Bioinformatics Pipeline for Whole Transcriptome Sequencing: Overview of the Processing and Steps from Raw Data to Downstream Analysis (CC-BY-NC 4.0)

Recent advances in next generation sequencing (NGS) technologies have heralded the genomic research. From the good-old inferring differentially expressed genes (DEG) using microarray to the current adage NGS-based whole transcriptome or RNA-Seq pipelines, there have been advances and improvements. With several bioinformatics pipelines for analysing RNA-Seq on rise, inferring the candidate DEGs prove to be a cumbersome approach as one may have to reach consensus among all the pipelines. To Check this, we have benchmarked the well known cufflinks-cuffdiff pipeline on a set of datasets and outline it in the form of a protocol where researchers interested in performing whole transcriptome shotgun sequencing and it’s downstream analysis can better disseminate the analysis using their datasets.

6. RepeatModeler, the transposon and repeat annotation tool, gets an upgrade

RepeatModeler2: automated genomic discovery of transposable element families (CC-BY 4.0)

The accelerating pace of genome sequencing throughout the tree of life is driving the need for improved unsupervised annotation of genome components such as transposable elements (TEs). Because the types and sequences of TEs are highly variable across species, automated TE discovery and annotation are challenging and time-consuming tasks. A critical first step is the de novo identification and accurate compilation of sequence models representing all the unique TE families dispersed in the genome. Here we introduce RepeatModeler2, a new pipeline that greatly facilitates this process. This new program brings substantial improvements over the original version of RepeatModeler, one of the most widely used tools for TE discovery. In particular, this version incorporates a module for structural discovery of complete LTR retroelements, which are widespread in eukaryotic genomes but recalcitrant to automated identification because of their size and sequence complexity. We benchmarked RepeatModeler2 on three model species with diverse TE landscapes and high-quality, manually curated TE libraries: Drosophila melanogaster (fruit fly), Danio rerio (zebrafish), and Oryza sativa (rice). In these three species, RepeatModeler2 identified approximately three times more consensus sequences matching with >95% sequence identity and sequence coverage to the manually curated sequences than the original RepeatModeler. As expected, the greatest improvement is for LTR retroelements. The program had an extremely low false positive rate when applied to simulated genomes devoid of TEs. Thus, RepeatModeler2 represents a valuable addition to the genome annotation toolkit that will enhance the identification and study of TEs in eukaryotic genome sequences. RepeatModeler2 is available as source code or a containerized package under an open license (https://github.com/Dfam-consortium/RepeatModeler, https://github.com/Dfam-consortium/TETools).

Note: this paper was also featured in issue 37 of the weekly literature recommendations from 生信菜鳥團.

7. Coinfinder, the well-received pangenome analysis tool in development since 2015, finally gets its paper

Coinfinder: Detecting Significant Associations and Dissociations in Pangenomes (CC-BY-NC-ND 4.0)

Coinfinder identifies genes that co-occur (associate) or avoid (dissociate) with each other across the accessory genomes of a pangenome of interest. Genes that associate or dissociate more often than expected by chance, suggests that those genes have a connection (attraction or repulsion) that is interesting to explore. Identification of these groups of genes will further the field’s understanding of the importance of accessory genes. Coinfinder is a freely available, open-source software which can identify gene patterns locally on a personal computer in a matter of hours.
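To make "more often than expected by chance" concrete, here is a minimal sketch of a co-occurrence test on two gene presence/absence vectors. It uses a one-sided hypergeometric tail (the Fisher exact test), which is an assumption chosen for illustration and not necessarily the statistic Coinfinder implements. The function name `cooccurrence_pvalue` and the toy data are hypothetical.

```python
from math import comb


def cooccurrence_pvalue(presence_a, presence_b):
    """Return (k, p) for two genes across the same list of genomes.

    presence_a / presence_b: sequences of 0/1, one entry per genome.
    k is the number of genomes carrying both genes; p is the one-sided
    hypergeometric probability of seeing at least k co-occurrences if
    the two genes were distributed independently.
    Illustrative assumption: this ignores shared ancestry between genomes.
    """
    n = len(presence_a)                      # number of genomes
    na = sum(presence_a)                     # genomes carrying gene A
    nb = sum(presence_b)                     # genomes carrying gene B
    k = sum(1 for a, b in zip(presence_a, presence_b) if a and b)

    total = comb(n, nb)                      # ways to place gene B in nb genomes
    tail = sum(
        comb(na, i) * comb(n - na, nb - i)   # exactly i co-occurrences
        for i in range(k, min(na, nb) + 1)
    )
    return k, tail / total


# Toy example: two genes that always appear together in 4 of 8 genomes.
gene_a = [1, 1, 1, 1, 0, 0, 0, 0]
gene_b = [1, 1, 1, 1, 0, 0, 0, 0]
k, p = cooccurrence_pvalue(gene_a, gene_b)
print(k, p)
```

In a real pangenome every pair of accessory genes would be tested, so the p-values would need a multiple-testing correction before any pair is called associated.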

8. SLAST, a pure Smith-Waterman local alignment tool: a challenger to BLAST, or just passing through?

SLAST: Simple Local Alignment Search Tool (CC-BY-NC-ND 4.0)

We present a local alignment search tool not based on the usual strategy of seed and grow often employed for these tools. Instead, we just find regions in the database sequences having a high density of seed matches and then we perform a Smith-Waterman local alignment of the query sequence into these regions. This approach has some advantages for some use cases.
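The second step the abstract describes, a Smith-Waterman local alignment of the query against a candidate region, can be sketched as follows. This is a textbook version with a linear gap penalty and is not SLAST's code. The scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are illustrative assumptions, and the seed-density step that selects regions is not shown.

```python
def smith_waterman(query, target, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_query, aligned_target).

    Textbook Smith-Waterman with a linear gap penalty; the scoring
    parameters are assumptions chosen for illustration.
    """
    rows, cols = len(query) + 1, len(target) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores are floored at 0 so alignments can start anywhere.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,   # match / mismatch
                score[i - 1][j] + gap,       # gap in target
                score[i][j - 1] + gap,       # gap in query
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    aligned_q, aligned_t = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if query[i - 1] == target[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            aligned_q.append(query[i - 1])
            aligned_t.append(target[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_t.append(target[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_q)), "".join(reversed(aligned_t))


print(smith_waterman("ACGT", "TTACGTTT"))
```

The full dynamic program costs time proportional to the product of the two sequence lengths, which is why restricting it to a few seed-dense regions matters for a database search.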


9. Sorbonne Universités (France): PhyChro, a phylogeny reconstruction tool based on gene order

Phylogenetic reconstruction based on synteny block and gene adjacencies (CC BY-NC-ND 4.0)

Gene order can be used as an informative character to reconstruct phylogenetic relationships between species independently from the local information present in gene/protein sequences. PhyChro is a reconstruction method based on chromosomal rearrangements, applicable to a wide range of eukaryotic genomes with different gene contents and levels of synteny conservation. For each synteny breakpoint issued from pairwise genome comparisons, the algorithm defines two disjoint sets of genomes, named partial splits, respectively supporting the two block adjacencies defining the breakpoint. Considering all partial splits issued from all pairwise comparisons, a distance between two genomes is computed from the number of partial splits separating them. Tree reconstruction is achieved through a bottom-up approach by iteratively grouping sister genomes minimizing genome distances. PhyChro estimates branch lengths based on the number of synteny breakpoints and provides confidence scores for the branches. PhyChro performance is evaluated on two datasets of 13 vertebrates and 21 yeast genomes by using up to 130 000 and 179 000 breakpoints respectively, a scale of genomic markers that has been out of reach until now. PhyChro reconstructs very accurate tree topologies even at known problematic branching positions. Its robustness has been benchmarked for different synteny block reconstruction methods. On simulated data PhyChro reconstructs phylogenies perfectly in almost all cases, and shows the highest accuracy compared to other existing tools. PhyChro is very fast, reconstructing the vertebrate and yeast phylogenies in less than 15 min. Availability: PhyChro will be freely available under the BSD license after publication.
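The basic signal behind gene-order methods, a breakpoint where an adjacency in one genome is not conserved in another, can be sketched as below. This is not PhyChro's partial-split algorithm. It is a simplified breakpoint count that assumes a single linear chromosome, unsigned genes, and unique gene names, and the helper names `adjacencies` and `breakpoint_count` are hypothetical.

```python
def adjacencies(order):
    """Set of unordered neighbouring gene pairs in a linear gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def breakpoint_count(order_a, order_b):
    """Count adjacencies of genome A that are not conserved in genome B.

    Simplifying assumptions (not PhyChro's method): one linear chromosome
    per genome, gene orientation ignored, every gene name unique. Genes
    missing from either genome are dropped before comparing.
    """
    shared = set(order_a) & set(order_b)
    a = [g for g in order_a if g in shared]
    b = [g for g in order_b if g in shared]
    return len(adjacencies(a) - adjacencies(b))


# Toy example: genome B carries an inversion of the segment c-d-e.
genome_a = ["a", "b", "c", "d", "e", "f"]
genome_b = ["a", "b", "e", "d", "c", "f"]
print(breakpoint_count(genome_a, genome_b))
```

A single inversion breaks two adjacencies (here b-c and e-f), so rearrangement events leave a countable footprint that distance-based tree building can use.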

10. The Söding lab at the Max Planck Institute for Biophysical Chemistry develops a new gene annotation tool for eukaryotic metagenomes

MetaEuk – sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics (CC-BY 4.0)

Results: MetaEuk is a toolkit for high-throughput, reference-based discovery and annotation of protein-coding genes in eukaryotic metagenomic contigs. It performs fast searches with 6-frame-translated fragments covering all possible exons and optimally combines matches into multi-exon proteins. We used a benchmark of seven diverse, annotated genomes to show that MetaEuk is highly sensitive even under conditions of low sequence similarity to the reference database. To demonstrate MetaEuk’s power to discover novel eukaryotic proteins in large-scale metagenomic data, we assembled contigs from 912 samples of the Tara Oceans project. MetaEuk predicted >12,000,000 protein-coding genes in eight days on ten 16-core servers. Most of the discovered proteins are highly diverged from known proteins and originate from very sparsely sampled eukaryotic supergroups.
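The "6-frame-translated" step mentioned in the abstract can be illustrated with a short sketch. It only shows how a contig is translated in all six reading frames with the standard genetic code. How MetaEuk splits the frames into fragments, searches them, and chains hits into multi-exon proteins is not shown, and the function names here are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for label, protein in six_frame_translation("ATGGCCTAA").items():
    print(label, protein)
```

Stop codons appear as `*`, so splitting each frame at `*` yields the candidate stop-free fragments that a protein search could then compare against a reference database.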

11. [preprints.org] Humans originated in Botswana? That claim is a stretch

Human Origins in Southern African Palaeo-wetlands? Strong Claims from Weak Evidence

Chan and colleagues in their paper titled “Human origins in a southern African palaeo-wetland and first migrations” (https://www.nature.com/articles/s41586-019-1714-1) report 198 novel whole mitochondrial DNA (mtDNA) sequences and infer that ‘anatomically modern humans’ originated in the Makgadikgadi–Okavango palaeo-wetland of southern Africa around 200 thousand years ago. This claim relies on weakly informative data. In addition to flawed logic and questionable assumptions, the authors surprisingly disregard recent evidence and debate on human origins in Africa. As a result, the emphatic and high profile conclusions of the paper are unjustified.

Figure: The location of Botswana in Africa.


References

1. Sever, R. et al. bioRxiv: the preprint server for biology. bioRxiv, 2019.

2. Chan, E. K. F. et al. Human origins in a southern African palaeo-wetland and first migrations. Nature, 2019.

