10X Single-Cell (10X Spatial Transcriptomics) TCR Data Analysis: TCRdist (2)

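Hello everyone. We continue our TCR data analysis series. This topic covers a lot of material, so we will work through it gradually. The reference paper is "Quantifiable predictive features define epitope-specific T cell receptor repertoires" (Nature, impact factor 49). Today's task is again to learn some of the basic concepts and algorithms.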

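TCRs from T cells that recognize the same pMHC epitope often share conserved sequence features, suggesting that it may be possible to predictively model epitope specificity. (For background on gene rearrangement and antigen epitopes, see my earlier post "10X Single-Cell (10X Spatial Transcriptomics) TCR Data Analysis: the TCR-intrinsic regulatory potential (TiRP) system".) The point emphasized here is that, for the same pMHC, the enriched TCR sequences share common motifs. This has been confirmed by many experiments, which suggests that epitope specificity can be modeled predictively (this is also the ultimate goal of this series).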

This is where the content of the previous post comes in. To model TCRs after antigen enrichment, we first need a distance measure on the space of TCRs that permits clustering and visualization (clustering and visualization here differ from those used in single-cell transcriptomics), a robust repertoire diversity metric that accommodates the low number of paired public receptors observed when compared to single-chain analyses (it tolerates the fact that few paired receptors are shared; after all, the aim is to find motifs), and a distance-based classifier (classifiers are very common in machine learning) that can assign previously unobserved TCRs to characterized repertoires with robust sensitivity and specificity.
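To make the idea of a distance-based classifier concrete, here is a minimal sketch. It assigns a query receptor to the repertoire whose nearest members are closest on average. The Hamming-style `toy_distance`, the epitope labels `epA`/`epB`, the CDR3 strings, and the neighbour count `k` are all hypothetical stand-ins chosen for illustration; this is not the TCRdist classifier itself.

```python
def toy_distance(a, b):
    """Toy stand-in for a TCR distance: mismatches plus length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def classify(query, repertoires, dist=toy_distance, k=2):
    """Assign `query` to the repertoire with the smallest mean distance
    to its k nearest members. Returns (best_label, all_scores)."""
    scores = {}
    for label, receptors in repertoires.items():
        nearest = sorted(dist(query, r) for r in receptors)[:k]
        scores[label] = sum(nearest) / len(nearest)
    return min(scores, key=scores.get), scores


# Hypothetical epitope-specific repertoires (CDR3 strings only).
repertoires = {
    "epA": ["CASSA", "CASSB", "CASSC"],
    "epB": ["CWWWW", "CWWWY", "CWWYY"],
}
label, scores = classify("CASSD", repertoires)
print(label, scores)
```

Any real distance function can be passed in through the `dist` parameter, so the same skeleton works once a proper TCR distance is available.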

[Figure]

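Of course, the TCR repertoire enriched for a specific epitope contains a clustered group of receptors that share core sequence similarities, together with a dispersed set of diverse 'outlier' sequences (this is natural: the similar sequences share a motif and therefore bind the epitope specifically). By identifying shared motifs in the core sequences, we can highlight the key conserved residues that drive the essential elements of TCR recognition (so the sequences here are amino acid sequences).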

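For the TCR sequences we obtain by sequencing, the features to summarize and analyse include length, charge, and hydrophobicity of the CDR3 regions, clonal diversity (within individuals), and amino acid sequence sharing (across individuals) following well-established approaches to repertoire analysis. (We will introduce these established approaches later. In short, many metrics deserve in-depth analysis beyond the gene sequence alone, and single-cell TCR analysis needs to be upgraded accordingly.)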
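As a rough illustration of the three CDR3 features, the sketch below computes length, net charge, and mean hydrophobicity for a CDR3 amino acid string. The charge rule (K/R = +1, D/E = -1, histidine ignored) and the use of the Kyte-Doolittle hydropathy scale are assumptions made here for illustration; the paper may define these quantities differently.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def cdr3_features(cdr3):
    """Return length, net charge, and mean hydrophobicity of a CDR3 string."""
    length = len(cdr3)
    # Simplified net charge: +1 for K/R, -1 for D/E.
    charge = sum((aa in "KR") - (aa in "DE") for aa in cdr3)
    hydrophobicity = sum(KYTE_DOOLITTLE[aa] for aa in cdr3) / length
    return {"length": length, "charge": charge, "hydrophobicity": hydrophobicity}


# Example CDR3 string used only for illustration.
print(cdr3_features("CASSIRSSYEQYF"))
```

Averaging these values over all receptors of one epitope gives the per-repertoire means that are compared across epitopes.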

[Figure]

Mean values for CDR3 length, charge, and hydrophobicity tightly clustered for the majority of the epitopes, and all CDR3 features showed substantially overlapping ranges (so it does seem feasible to look for functional motifs based on antigen enrichment).

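Here is a brief review of the authors' findings. (1) They found negative correlations between CDR3 charge and peptide charge, and between CDR3 length and peptide length. This suggests that charge and length complementarity may play a role in pMHC recognition for some epitopes (background knowledge; a general understanding is enough). (2) Whereas substantial levels of sharing or publicity were observed for individual chains (many single chains are identical across individuals), low levels of sharing between individuals were observed when paired αβ receptors were considered (an interesting contrast: single chains are widely shared, yet identical paired chains are rare).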

The role of single-cell TCR sequencing: By using paired single-cell TCRαβ sequencing, we were able to determine whether V and J segment usage was correlated both within a chain (for example, Vα–Jα, Vβ–Jβ) and across chains (for example, Vα–Vβ, Vα–Jβ). In other words, the aim is to look for correlations.
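The paper quantifies this kind of covariation with an adjusted mutual information (AMI) score. The sketch below computes AMI for a list of paired gene calls using the standard hypergeometric expected-mutual-information correction. Normalizing by the arithmetic mean of the two entropies is an assumption made here, and the gene labels in the example are placeholders.

```python
import math
from collections import Counter


def _entropy(counts, n):
    return -sum(c / n * math.log(c / n) for c in counts.values())


def _expected_mi(a, b, n):
    """Expected mutual information of two labelings under random pairing."""
    lg = lambda k: math.lgamma(k + 1)  # log(k!)
    emi = 0.0
    for ai in a.values():
        for bj in b.values():
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                log_p = (lg(ai) + lg(bj) + lg(n - ai) + lg(n - bj)
                         - lg(n) - lg(nij) - lg(ai - nij)
                         - lg(bj - nij) - lg(n - ai - bj + nij))
                emi += nij / n * math.log(n * nij / (ai * bj)) * math.exp(log_p)
    return emi


def adjusted_mutual_info(pairs):
    """AMI between the first and second element of each (gene, gene) pair."""
    n = len(pairs)
    a = Counter(x for x, _ in pairs)
    b = Counter(y for _, y in pairs)
    joint = Counter(pairs)
    mi = sum(c / n * math.log(c * n / (a[x] * b[y])) for (x, y), c in joint.items())
    emi = _expected_mi(a, b, n)
    denom = 0.5 * (_entropy(a, n) + _entropy(b, n)) - emi
    return (mi - emi) / denom if abs(denom) > 1e-12 else 0.0


# Placeholder gene labels: perfectly coupled V-J usage.
coupled = [("V1", "J1")] * 5 + [("V2", "J2")] * 5
print(adjusted_mutual_info(coupled))
```

Values near 1 indicate strongly covarying gene usage, while values near 0 indicate pairing that is no more structured than chance.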

Compared with TCR sequences that have not been enriched for an epitope, the TCR sequences that recognize viral epitopes showed varying degrees of dominance of single and pairwise gene associations (this is also expected).

[Figure]

Figure caption: V and J gene segment usage and covariation in epitope-specific responses. a, Gene segment usage and gene–gene pairing landscapes are illustrated using four vertical stacks (one for each V and J segment) connected by curved paths whose thickness is proportional to the number of TCR clones with the respective gene pairing (that is, a Sankey diagram) (each panel is labelled with the four gene segments atop their respective colour stacks and the epitope identifier in the top middle). Genes are coloured by frequency within the repertoire with a fixed colour sequence used throughout the manuscript which begins red (most frequent), green (second most frequent), blue, cyan, magenta, and black. The enrichment of gene segments relative to background frequencies is indicated by up or down arrows with an arrowhead number equal to the log2 of the fold change. b, Jensen–Shannon divergence (for background on JS divergence, see the article "KL divergence, JS divergence, and Wasserstein distance") between the observed gene frequency distributions and background frequencies, normalized by the mean Shannon entropy of the two distributions (higher values reflect stronger gene preferences). c, Adjusted mutual information of gene usage correlations between regions (higher values indicate more strongly covarying gene usage). The lower limits of the colour ranges in b and c were chosen to highlight significant changes. A summary of the number of subjects, total number of TCR sequences

[Figure]

Figure caption: Gene segment usage and gene–gene pairing landscapes are illustrated graphically using four vertical stacks (one for each V and J segment) connected by curved segments with thickness proportional to the number of TCRs with the respective gene pairing (each panel is labelled with the four gene segments atop their respective colour stacks and the epitope identifier in the top middle). Genes are coloured by frequency within the repertoire with a fixed colour sequence used throughout the manuscript which begins red (most frequent), green (second most frequent), blue, cyan, magenta, and black. Clonally expanded TCRs were reduced to a single data point for this analysis. The number of clones is indicated to the left of each panel. The enrichment of gene segments relative to background frequencies is indicated by up or down arrows, with each successive arrowhead corresponding to an additional twofold deviation (for example, one arrowhead = twofold enrichment, two arrowheads = fourfold enrichment). (Presented in the same way as the previous figure.)

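Each epitope-specific response is characterized by over-representation of individual genes and by significant gene-pairing preferences, which gives us the theoretical basis for modeling individual epitopes and searching for motifs. The Jensen–Shannon divergence between each epitope-specific gene frequency distribution and the background distribution is used to quantify the total magnitude of gene preference (this requires some background in algorithms). We quantified the degree of gene usage covariation between pairs of segments using the adjusted mutual information score (this is also an important step).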
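The gene-preference measure described above can be sketched in a few lines: compute the Jensen–Shannon divergence between an observed gene frequency distribution and a background distribution, then divide by the mean Shannon entropy of the two. The gene calls in the example are toy data invented for illustration, and returning 0 when both entropies are zero is a convention chosen here, not something taken from the paper.

```python
import math
from collections import Counter


def frequencies(genes):
    """Turn a list of gene calls into a frequency dictionary."""
    counts = Counter(genes)
    total = sum(counts.values())
    return {gene: count / total for gene, count in counts.items()}


def shannon_entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p.values() if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two frequency dictionaries."""
    genes = set(p) | set(q)
    m = {g: 0.5 * (p.get(g, 0.0) + q.get(g, 0.0)) for g in genes}

    def kl_to_mixture(a):
        return sum(a[g] * math.log2(a[g] / m[g]) for g in a if a[g] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def normalized_jsd(p, q):
    """JS divergence divided by the mean Shannon entropy of p and q."""
    mean_entropy = 0.5 * (shannon_entropy(p) + shannon_entropy(q))
    return js_divergence(p, q) / mean_entropy if mean_entropy > 0 else 0.0


# Toy gene calls, for illustration only.
observed = frequencies(["TRBV13", "TRBV13", "TRBV13", "TRBV29"])
background = frequencies(["TRBV13", "TRBV29", "TRBV19", "TRBV1"])
print(round(normalized_jsd(observed, background), 3))
```

Higher values mean the epitope-specific repertoire departs more strongly from background gene usage.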

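To find motifs, the TCR distance definition now comes into play (the concept and calculation were covered in the previous post; the CDR3 is penalized more heavily).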
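As a reminder of what "the CDR3 is penalized more heavily" means in practice, here is a deliberately simplified toy distance. It is not the real TCRdist: it uses plain position-wise mismatches instead of a substitution-matrix score, and the loop names, weights, and gap penalty below are hypothetical values chosen only to show how a heavier CDR3 weight changes the result.

```python
# Hypothetical per-loop weights: the CDR3 counts three times as much here.
LOOP_WEIGHTS = {"cdr1": 1, "cdr2": 1, "cdr2.5": 1, "cdr3": 3}
GAP_PENALTY = 4  # hypothetical cost per position of length difference


def loop_distance(a, b):
    """Position-wise mismatches plus a penalty for any length difference."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + GAP_PENALTY * abs(len(a) - len(b))


def toy_tcr_distance(t1, t2):
    """Weighted sum of loop distances for one chain (toy version)."""
    return sum(w * loop_distance(t1[loop], t2[loop]) for loop, w in LOOP_WEIGHTS.items())


# Made-up loop sequences for two receptors that differ at one CDR3 position.
tcr_a = {"cdr1": "SGHNS", "cdr2": "FNNNVP", "cdr2.5": "ASEGT", "cdr3": "CASSIRSSYEQYF"}
tcr_b = dict(tcr_a, cdr3="CASSIRSAYEQYF")
print(toy_tcr_distance(tcr_a, tcr_b))
```

With these weights a single CDR3 mismatch costs as much as three mismatches in the other loops, which is the qualitative behaviour the text refers to.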

[Figure]

Figure caption: 2D kernel principal components analysis (PCA) projection of the TCRdist landscape coloured by Vα (left panel) and Vβ (right panel) gene usage. Three groups of receptors that correspond to TCR logos and clusters depicted in c are indicated with dashed ellipses. (This kind of projection is also common in single-cell analysis.)

[Figure]

Figure caption: Epitope-specific TCR landscapes were projected into two dimensions (2D) using kernel PCA analysis applied to the TCRdist distance matrix: TCRs with small TCRdist values tend to project to nearby points in 2D. The same 2D projection is shown in the four panels of each row, coloured by Vα, Jα, Vβ and Jβ gene segment usage (left to right, respectively). The colours are based on gene frequency in the projected repertoire and follow the same sequence used throughout the manuscript: in decreasing order, 1, red; 2, green; 3, blue; 4, cyan; 5, magenta; 6, black; followed by assorted colours for rare frequencies. A summary of number of subjects,

To complement these landscape projections, we performed TCRdist-based clustering of the epitope-specific receptors and constructed hierarchical distance trees (TCRdist is a very good analysis tool). (It is important to note that clonal expansions are not reflected in these repertoire landscape analyses, as each unique receptor is included only once; that is, duplicates are not counted.) The authors also developed a TCR logo representation that summarizes the gene frequencies, CDR3 amino acid sequences, and inferred rearrangement structures of a set of TCRs as a tool to further annotate these clusters (this is worth noting; anyone who has done biochemistry experiments will be familiar with it). Each repertoire consists mainly of one cluster, and the remaining sequences have similar structures, which makes the motif search easier. In addition to the core cluster of similar receptors, each repertoire also contains multiple regions of receptors that differ markedly from one another.
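The hierarchical trees in the paper use average linkage (see the dendrogram caption that follows). A naive average-linkage clustering over a precomputed distance matrix can be sketched as below. The stopping `threshold` and the toy matrix are assumptions for illustration, and this quadratic-per-merge loop is only suitable for small inputs.

```python
def average_linkage(dist, threshold):
    """Greedy average-linkage clustering on a square distance matrix.

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance and stops once that distance exceeds `threshold`.
    Returns a list of clusters, each a list of row indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def mean_distance(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min(
            (mean_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy distance matrix with two tight pairs of receptors.
toy = [
    [0, 1, 10, 10],
    [1, 0, 10, 10],
    [10, 10, 0, 2],
    [10, 10, 2, 0],
]
print(average_linkage(toy, threshold=5))
```

Because each unique receptor appears once in the matrix, clonal expansion does not influence the resulting clusters, matching the note above.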

[Figure]

Figure caption: Average-linkage dendrogram of TCRdist receptor clusters coloured by generation probability, with TCR logos for selected receptor subsets (the branches enclosed in dashed boxes labelled with size of the TCR clusters). Each logo depicts the V- (left side) and J- (right side) gene frequencies, CDR3 amino acid sequences (middle), and inferred rearrangement structure (bottom bars coloured by source region, light grey for the V-region, dark grey for J, black for D, and red for N-insertions) of the grouped receptors. (n = 13 mice, 291 TCR clones.)

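Although CDR3 sequence conservation is evident in the TCRdist cluster logos, many of these shared CDR3 residues come directly from the genomic sequence of the V and J regions and therefore reflect the observed gene usage bias. To find CDR3 motifs, the authors used a recursive search algorithm and identified sequence patterns that occur significantly more often in the observed receptors than in two V- and J-gene-matched background sets of receptor sequences (this requires knowledge of structural biology, which I admittedly lack).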

[Figure]

Figure caption: Enriched CDR3 sequence motifs define key features of epitope specificity. The top-scoring CDR3α (left TCR logo) and CDR3β (right TCR logo) sequence motifs are shown for each repertoire. The motif sequence logo is shown at full height (top) and scaled (bottom) by per-column relative entropy to background frequencies derived from TCRs with matching gene-segment composition in order to highlight motif positions under selection. For three epitopes with solved ternary TCR–pMHC structures, the enriched motif positions are mapped onto the 3D structure: motif positions shown in green sticks; peptide in magenta; alpha (beta) chain in yellow (blue) cartoons; selected hydrogen bonds shown as dotted green lines.

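The authors propose that these statistically enriched, non-germline-encoded motifs have a critical role in mediating TCR recognition (which seems right), and structural analysis of TCR proteins supports this. Sequence analysis of TCRs can therefore identify the key conserved residues that drive the essential elements of TCR (antigen) recognition. This analysis is extremely important.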

Next, the TCRdist measure is applied to quantitatively assess receptor diversity and density within epitope-specific repertoires. The authors use a new diversity metric (TCRdiv) that generalizes Simpson's diversity index (look it up if it is unfamiliar) by capturing similarity among receptors in addition to exact identity, as Simpson's diversity index is highly sensitive to sampling noise because of the relative rarity of observing identical αβ pairs among individuals.
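To see how a similarity-aware index can generalize Simpson's index, consider the sketch below. Simpson's index is the probability that two randomly drawn receptors are identical; the smoothed version replaces the identity test with a kernel of the distance and reports the inverse of the mean similarity. The Gaussian kernel, the bandwidth `sigma`, and the Hamming-style `toy_distance` are assumptions made for illustration; the exact TCRdiv definition should be taken from the paper.

```python
import math
from itertools import combinations


def toy_distance(a, b):
    """Toy stand-in for TCRdist: mismatches plus length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def simpson_index(receptors):
    """Probability that two distinct draws are exactly identical."""
    pairs = list(combinations(receptors, 2))
    return sum(a == b for a, b in pairs) / len(pairs)


def smoothed_diversity(receptors, dist=toy_distance, sigma=1.0):
    """Inverse of the mean Gaussian-kernel similarity over all pairs.

    As sigma approaches 0 only identical receptors count as similar,
    so the value approaches 1 / simpson_index.
    """
    pairs = list(combinations(receptors, 2))
    mean_similarity = sum(
        math.exp(-(dist(a, b) / sigma) ** 2) for a, b in pairs
    ) / len(pairs)
    return 1.0 / mean_similarity


sample = ["AAA", "AAA", "AAB", "CCC"]
print(simpson_index(sample), smoothed_diversity(sample, sigma=1.0))
```

Near-identical receptors now lower the diversity score even when no exact duplicates are observed, which is what makes this kind of index less sensitive to sampling noise.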

Examination of TCRdiv scores for the analysed repertoires for single chains as well as paired receptors clarified trends seen in the earlier analyses (for example, the PB1 repertoire exhibited low diversity in the α-chain and high β-chain diversity).

[Figure]

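As described above, the landscape analysis shows that each repertoire consists of one or more groups of clustered receptors sharing similar sequence features, together with a more diverse set of outlier receptors. To account for the contributions of both clustered and dispersed TCRs, the authors developed a repertoire-specific nearest-neighbour score (NN-distance) that captures the density of receptors around each receptor (calculated as the average TCRdist between a receptor and its nearest-neighbour receptors in the repertoire). Although variation across repertoires was apparent in the NN-distance distributions, most epitopes showed an approximately bimodal distribution: one peak of receptors with low NN-distance represents the main, densely sampled clusters of the receptor distribution, whereas a second peak of receptors with larger NN-distance reflects outlier receptors.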
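The NN-distance described above can be sketched directly from a precomputed distance matrix: for each receptor, average the distances to its nearest neighbours in the same repertoire. The neighbour count `k` and the toy matrix are assumptions for illustration; the source only says that the nearest neighbours are averaged.

```python
def nn_distance(index, dist_matrix, k=2):
    """Mean distance from receptor `index` to its k nearest neighbours."""
    others = sorted(d for j, d in enumerate(dist_matrix[index]) if j != index)
    nearest = others[:k]
    return sum(nearest) / len(nearest)


# Toy matrix: receptors 0-2 form a dense cluster, receptor 3 is an outlier.
toy = [
    [0, 1, 2, 20],
    [1, 0, 1, 21],
    [2, 1, 0, 22],
    [20, 21, 22, 0],
]
scores = [nn_distance(i, toy, k=2) for i in range(len(toy))]
print(scores)
```

Plotting a histogram of such scores for a whole repertoire would show the two peaks described above: a low-score peak for clustered receptors and a high-score peak for outliers.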

[Figure]

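To confirm the antigen specificity of these non-clustered receptors, receptors from both peaks were selected and tested experimentally for their ability to bind the specific tetramer (that is, to recognize the corresponding antigen). Reactivity was confirmed in every case, indicating that at least some of these diverse outlier receptors are legitimate, if unconventional, solutions to the problem of epitope specificity, which partly explains the phenomenon.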

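The software also has a classifier feature that helps us identify motifs of specific T cells, such as tumor-infiltrating TCR sequences, which is very valuable. That is all for today's background; in the next post we will cover the algorithm and code of the TCRdist software.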

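Life is good, and even better with you.

Last edited: 2021.09.09 09:39:57

© Copyright belongs to the author. For reprints or content partnerships, please contact the author.

Reprinting is prohibited. To request permission, contact the author via private message or comment.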
