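1. Sample composition
99 virus-inactivated serum samples, divided into four groups: healthy controls, suspected cases that turned out to be ordinary influenza, non-severe (mild) COVID-19, and severe COVID-19.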
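2. Sample processing
- 5 μL of serum was dissolved in 50 μL of lysis buffer (8 M urea in 100 mM triethylammonium bicarbonate, TEAB), followed by reduction, alkylation, two-step trypsin digestion, and TMTpro 16-plex labeling;
- Peptides were prefractionated into 120 fractions, combined into 40 final fractions, and analyzed by DDA on a Q Exactive HF-X;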
- Database search with PD (Proteome Discoverer): the Homo sapiens fasta database downloaded from UniProtKB on 07 Jan 2020 and the SARS-CoV-2 virus fasta downloaded from NCBI (version NC_045512.2).
- Peptide-spectrum matches were filtered at a 1% target false discovery rate (FDR) (strict) and a 5% target FDR (relaxed). Normalization was performed against the total peptide amount.
- Quality control: the quality of the proteomic data was ensured at multiple levels.
a. First, a mouse liver digest was used for instrument performance evaluation.
b. Water samples (buffer A) were run as blanks every 4 injections to avoid carry-over.
c. Serum samples of the four patient groups from both the training and validation cohorts were randomly distributed across eight different batches.
d. Six samples were injected as technical replicates.
- Non-targeted metabolomics analysis: each sample was split into four aliquots for four assays: two for analysis using two separate reverse-phase/ultra-performance liquid chromatography (RP/UPLC)-MS/MS methods with positive-ion mode electrospray ionization (ESI), one for analysis using RP/UPLC-MS/MS with negative-ion mode ESI, and one for analysis using hydrophilic interaction liquid chromatography (HILIC)/UPLC-MS/MS with negative-ion mode ESI.
- Statistical analysis
a. Fold-change cutoff: Log2 fold-change (log2 FC) was calculated on the mean of the same patient group for each pair of comparing groups. Statistically significantly changed proteins or metabolites were selected using the criteria of adjusted p value less than 0.05 and absolute log2 FC larger than 0.25.
b. t-test: Two-sided unpaired Welch's t test was performed for each pair of comparing groups, and adjusted p values were calculated using Benjamini & Hochberg correction.
c. Machine learning: From the training cohort, important features were selected as those with a mean decrease in accuracy larger than 3 in a random forest of a thousand trees (R package randomForest, version 4.6.14). The analysis was run with 10-fold cross validation as a binary classification of the paired severe and non-severe groups, using the combined differentially regulated protein and metabolite features. The random forest analysis was then repeated a hundred times on the matrix containing only the selected important features, using the normalized additive predicted probability as the final predicted probability and the class with the larger probability as the predicted label. The selected important features were then used for random forest analysis on the independent validation cohort.
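The selection rule in (a) and the correction in (b) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the p values are assumed to come from a Welch's t test computed elsewhere, and the fold change is taken as the log2 ratio of group means.

```python
import math
from statistics import mean


def log2_fold_change(group_a, group_b):
    """log2 of the ratio of group means (group_a over group_b)."""
    return math.log2(mean(group_a) / mean(group_b))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_features(log2_fcs, p_values, fc_cutoff=0.25, alpha=0.05):
    """Indices with adjusted p < alpha and |log2 FC| > fc_cutoff."""
    adjusted = benjamini_hochberg(p_values)
    return [
        i
        for i, (fc, q) in enumerate(zip(log2_fcs, adjusted))
        if q < alpha and abs(fc) > fc_cutoff
    ]
```

For example, `select_features(fcs, pvals)` with the default cutoffs applies the same two thresholds quoted above (adjusted p < 0.05, |log2 FC| > 0.25) to lists of per-feature values.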
3. Results
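To process and analyze single-cell sequencing data effectively, and in particular to identify cell subtypes, the data usually first need to be reduced in dimensionality. Methods for doing so fall into two broad classes:
1. Dimensionality reduction: these methods typically project high-dimensional data into a low-dimensional space while optimizing to preserve the key features of the original data, so that the data can be displayed in two or three dimensions.
Commonly used dimensionality reduction methods:
1) PCA (Principal Component Analysis), a linear dimensionality reduction method;
2) t-SNE (t-distributed stochastic neighbor embedding), a nonlinear dimensionality reduction method;
3) UMAP (uniform manifold approximation and projection) (Becht et al., 2018, Nat. Biotechnol.);
4) scvis (Ding et al., 2018, Nat. Commun.).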
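2. Feature selection: reduces the dimensionality of the data mainly by removing genes that carry little information and keeping the most informative genes.
Commonly used feature selection methods:
1) Methods based on prior information (such as known cell subtypes). For example, use the SCDE software to identify genes differentially expressed between known cell subtypes, then cluster on those differentially expressed genes.
2) Unsupervised methods, which can be further divided into:
(i) based on highly variable genes (HVG);
(ii) based on spike-ins, such as scLVM (Buettner et al., 2015) and BASiCS (Vallejos et al., 2015);
(iii) based on dropouts, such as M3Drop (Andrews and Hemberg, 2018).
Reference: https://www.cnblogs.com/aipufu/articles/11470334.html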
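To make the HVG idea in 2)(i) concrete, here is a deliberately simplified Python sketch. It ranks genes by dispersion (variance divided by mean) and keeps the top few. Real HVG tools usually fit a mean-variance trend first, so treat this as a toy version; the function name and the input layout (a dict from gene name to per-cell values) are assumptions made for illustration.

```python
from statistics import mean, pvariance


def top_variable_genes(expression, n_top):
    """Rank genes by dispersion (variance / mean) and return the top n_top names.

    expression: dict mapping gene name -> list of per-cell expression values.
    Genes with zero mean are skipped because their dispersion is undefined.
    """
    scores = {}
    for gene, values in expression.items():
        mu = mean(values)
        if mu > 0:
            scores[gene] = pvariance(values) / mu
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_top]
```

Only the retained genes would then be passed on to clustering or to one of the projection methods listed under 1.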
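- Part 1. Proteomic and metabolomic profiling of COVID-19 sera
A total of 894 proteins and 941 metabolites were identified. The authors then examined the CVs of the QC samples and the distribution of samples after UMAP dimensionality reduction.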
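A QC check based on CV is easy to reproduce. The sketch below computes the coefficient of variation (sample standard deviation divided by mean) for each feature across repeated QC injections and summarizes it with the median. The function names and the input layout are hypothetical; the paper's exact QC calculation may differ.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean for one feature across QC runs."""
    return stdev(values) / mean(values)


def median_cv(intensity_by_feature):
    """Median CV over all features.

    intensity_by_feature: dict mapping feature name -> list of intensities
    measured in repeated QC injections (assumed layout).
    """
    return median(
        coefficient_of_variation(v) for v in intensity_by_feature.values()
    )
```

A low median CV across QC samples is the usual evidence that the measurements are reproducible enough for downstream comparison.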
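- Part 2. Identification of severe patients using machine learning
Part of the proteomic and metabolomic data was used as a training set for random forest machine learning to distinguish severe COVID-19 patients. This identified 29 important variables, comprising 22 proteins and 7 metabolites. The resulting model was then validated on another 10 samples.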
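The Methods describe combining a hundred random forest runs through a "normalized additive" probability. One plausible reading, sketched below, is to sum each sample's per-class probabilities over all runs, normalize so the two classes sum to 1, and take the class with the larger probability as the label. This interpretation, the function name, and the input layout are assumptions; the per-run probabilities would come from a random forest trained elsewhere.

```python
def aggregate_predictions(run_probabilities):
    """Combine repeated binary-classifier runs into one prediction per sample.

    run_probabilities: list of runs; each run is a list holding P(severe)
    for every sample (assumed layout). Returns (p_severe, label) per sample.
    """
    n_runs = len(run_probabilities)
    n_samples = len(run_probabilities[0])
    results = []
    for s in range(n_samples):
        # Add up the probability mass each class received across runs.
        severe_sum = sum(run[s] for run in run_probabilities)
        non_severe_sum = n_runs - severe_sum
        # Normalize so the two class probabilities sum to 1.
        p_severe = severe_sum / (severe_sum + non_severe_sum)
        label = "severe" if p_severe > 0.5 else "non-severe"
        results.append((p_severe, label))
    return results
```

Averaging over many runs smooths out the randomness of individual forests, which matters when the training cohort is small.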
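- Part 3. Proteomic and metabolomic changes in severe COVID-19 sera
Between COVID-19 and non-COVID-19 patients there were 105 differential proteins and 373 differential metabolites, of which 93 proteins and 204 metabolites were associated with COVID-19 severity. The 93 differential proteins were mainly enriched in three pathways (activation of the complement system, macrophage function, and platelet degranulation), which together cover 50 proteins. Correspondingly, 82 of the metabolites fall in these three pathways. The rest of the paper discusses these three pathways in detail and is not covered here.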
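4. Postscript
The analysis in the paper is not complicated. The overall flow: QC (the data are trustworthy) -> machine learning to classify patients (grouping) -> differential proteins or metabolites, especially those associated with disease severity (differential molecules) -> pathway analysis to describe the main pathological features of the disease.
Judging from the proteomics data, the fold-change cutoff chosen in this paper is not large: log2(fold change) = 1/4, i.e. about 1.19-fold (possibly to account for the ratio compression of 16-plex labeling; for my own 10-plex data I use 1.2-fold). Validation with a second, independent technique would make the result more convincing. The sample size for machine learning is also small.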