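Today we continue our in-depth look at NMF and at what it can actually do for us. The reference is "Single-Cell Transcriptomic Analysis of Primary and Metastatic Tumor Ecosystems in Head and Neck Cancer", published in Cell, a top-tier journal. Here we focus only on the NMF analysis. The rest of the paper matters too, and we will cover it in later posts.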
First, let's look at why the authors ran an NMF analysis.
Intra-tumoral Expression Heterogeneity of the Malignant Compartment (that is, studying the heterogeneity of the tumor cells).
We next explored how expression states varied among different malignant cells within the same tumor, focusing on ten tumors from which the largest numbers of malignant cell transcriptomes were acquired. (Tumors of the same type from different patients.) We used non-negative matrix factorization to uncover coherent sets of genes that were preferentially co-expressed by subsets of malignant cells. (This is where NMF comes in: it identifies sets of related genes that are preferentially co-expressed by subsets of malignant cells. It resembles WGCNA, but both the principle and the goal differ, so take care to tell them apart.) For example, we defined six gene signatures that vary among malignant cells of MEEI25. (An example: the specific gene modules that NMF found among the malignant cell subsets.)
Applying the approach to each of the ten tumors defined a total of 60 gene signatures that coherently vary across individual cells in at least one tumor. Next, we used hierarchical clustering to distill these 60 signatures into meta-signatures that reflect common expression programs that vary within multiple tumors. (The identified gene modules are hierarchically clustered.)
The high concordance between signatures from different tumors suggests that they reflect common patterns of intra-tumoral expression heterogeneity. (This is the point of the clustering.)
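The next step is to analyze the characteristics of the genes in the identified modules: "Seven expression programs were preferentially expressed by subsets of malignant cells in at least two tumors." So the main role of NMF here is to identify the specific gene modules that distinguish subpopulations within a tumor. This is very important; study it carefully and put it into practice.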
Next, let's look at the specific methods.
For each of the 10 tumors, non-negative matrix factorization (as implemented by the MATLAB nnmf function, with the number of factors set to 10) was used to identify variable expression programs. (Note the number of factors: it happens to equal the number of tumors, but NMF is run on each tumor separately, so pay special attention here and set it by hand to suit your own data.) NNMF was applied to the relative expression values (Er), by transforming all negative values to zero. (Compare this with the PCA we usually use: NMF requires non-negative input, hence the clipping, whereas PCA works on centered values of either sign.) Notably, undetected genes include many drop-out events (genes that are expressed but are not detected in particular cells due to the incomplete transcriptome coverage), which introduce challenges for normalization of single-cell RNA-seq; since NNMF avoids the exact normalized values of undetected genes (as they are all zero), it may be beneficial in analysis of single-cell RNA-seq (data not shown).
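The clipping and factorization step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the paper used MATLAB's nnmf, while the code below is a plain NumPy stand-in based on multiplicative updates, which may not match MATLAB's default algorithm. The function name `nmf_multiplicative`, the genes x cells orientation, and the random toy matrix `Er` are all hypothetical.

```python
import numpy as np


def nmf_multiplicative(V, n_factors=10, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (genes x cells) as V ~ W @ H.

    W (genes x factors) holds gene scores and H (factors x cells) holds
    cell scores. Uses multiplicative updates for the squared-error
    objective; this is a stand-in, not MATLAB's nnmf.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = V.shape
    W = rng.random((n_genes, n_factors)) + eps
    H = rng.random((n_factors, n_cells)) + eps
    for _ in range(n_iter):
        # Each update keeps W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy stand-in for one tumor's relative expression matrix Er (genes x cells).
# Real Er values are centered, so roughly half of them are negative.
rng = np.random.default_rng(1)
Er = rng.normal(size=(200, 60))

V = np.clip(Er, 0, None)  # transform all negative values to zero
W, H = nmf_multiplicative(V, n_factors=10)
print(W.shape, H.shape)  # (200, 10) (10, 60)
```

Each column of `W` is then one candidate expression program for this tumor, and the matching row of `H` gives its score in every cell.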
(The handling of undetected genes is also very important: it is an advantage of NMF over PCA.) We retained only programs for which the standard deviation in cell scores within the respective tumor was larger than 0.8, which resulted in a total of 60 programs across the 10 tumors. (Feature extraction is a craft in its own right.) The 60 programs were compared by hierarchical clustering (data not shown), using one minus the Pearson correlation coefficient over all gene scores as a distance metric (as in the figure above). Six clusters of programs were identified manually and used to define meta-signatures. For each cluster, NNMF gene scores were log2-transformed and then averaged across the programs in the cluster, and genes were ranked by their average scores. The top 30 genes for each cluster were defined as the meta signature that was used to define cell scores.
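The three operations in this passage (keeping programs by the standard deviation of their cell scores, measuring distance as one minus the Pearson correlation of gene scores, and ranking genes by averaged log2 scores) can be sketched as follows. This is a rough sketch, not the authors' code: the function names, the small pseudocount added before the log2 transform, and the toy matrices are assumptions, and the 0.8 cutoff only makes sense if cell scores are on the same scale as in the paper.

```python
import numpy as np


def filter_programs(W, H, min_sd=0.8):
    """Keep programs whose cell scores (rows of H) have SD above min_sd."""
    keep = H.std(axis=1, ddof=1) > min_sd
    return W[:, keep], H[keep, :]


def pearson_distance(gene_scores):
    """One minus the Pearson correlation between programs (columns)."""
    return 1.0 - np.corrcoef(gene_scores, rowvar=False)


def meta_signature(gene_scores, gene_names, members, top_n=30, pseudo=1e-9):
    """Average log2 gene scores over one cluster of programs, rank genes.

    `pseudo` avoids log2(0); it is an assumption, not taken from the paper.
    """
    log_scores = np.log2(gene_scores[:, members] + pseudo)
    mean_scores = log_scores.mean(axis=1)
    order = np.argsort(mean_scores)[::-1][:top_n]
    return [gene_names[i] for i in order]


# Toy data: 100 genes, 4 programs, 50 cells (all values are made up).
rng = np.random.default_rng(0)
genes = [f"gene{i}" for i in range(100)]
W = rng.random((100, 4))
W[:, 1] = W[:, 0] + 0.01 * rng.random(100)  # program 1 nearly copies program 0
scale = np.array([[5.0], [5.0], [5.0], [0.1]])  # program 3 barely varies
H = rng.random((4, 50)) * scale

W_kept, H_kept = filter_programs(W, H)
D = pearson_distance(W_kept)
signature = meta_signature(W_kept, genes, members=[0, 1], top_n=30)
print(W_kept.shape[1], "programs kept; distance between 0 and 1:", round(D[0, 1], 4))
```

In a real analysis the columns of `W` would be the programs pooled from all tumors, and `D` (in condensed form) could be handed to a hierarchical clustering routine such as `scipy.cluster.hierarchy.linkage`. The `members` list stands for one of the clusters, which the paper says were identified manually.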
(Choosing the top 30 genes is the essence of the method; some papers take the top 50 instead.) Each of those genes had average scores above 1 and a t test p value below 0.05, based on their scores across the individual programs in the cluster. Since the number of programs in a cluster was small this analysis was not powered to correct for multiple testing and thus we refer to an uncorrected p value and selected the top ranked genes. However, while confidence is difficult to establish for individual genes in each meta-program, each gene-set defined as a meta-program is highly significant in its co-variation in tumors. For each of the meta-programs, and within each of the tumors included in those meta-programs (2-8 tumors for each meta-program), the average Pearson correlation between all pairs of genes included in the gene-set (calculated across single malignant cells from the respective tumor) was higher than that obtained for 10,000 control gene-sets, which were selected to reproduce the overall distribution of expression levels of the meta-program genes. (We will cover how the scores are computed in detail in the next post.)
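The control gene-set comparison can be illustrated with a small simulation. This is a sketch under assumptions, not the authors' procedure: the paper does not say how the control sets were matched, so the code bins genes by mean expression rank and draws each control gene from the same bin as a signature gene. The function names, the ten bins, the exclusion of signature genes from the control pools, and the use of 1,000 rather than 10,000 controls (to keep the example fast) are all choices made for illustration.

```python
import numpy as np


def mean_pairwise_corr(expr, idx):
    """Average Pearson correlation over all distinct gene pairs in idx.

    expr is genes x cells; correlations are computed across cells.
    """
    c = np.corrcoef(expr[idx, :])
    upper = np.triu_indices(len(idx), k=1)
    return c[upper].mean()


def control_distribution(expr, idx, n_controls=1000, n_bins=10, seed=0):
    """Mean pairwise correlation of expression-matched control gene sets."""
    rng = np.random.default_rng(seed)
    mean_expr = expr.mean(axis=1)
    # Rank-based bins of nearly equal size, from lowest to highest expression.
    ranks = np.argsort(np.argsort(mean_expr))
    bins = ranks * n_bins // len(mean_expr)
    # Assumption: signature genes are excluded from the control pools.
    pools = [np.setdiff1d(np.flatnonzero(bins == b), idx) for b in range(n_bins)]
    used_bins, counts = np.unique(bins[idx], return_counts=True)
    controls = np.empty(n_controls)
    for i in range(n_controls):
        # Draw as many genes from each bin as the signature has in that bin.
        ctrl = np.concatenate([
            rng.choice(pools[b], size=n, replace=False)
            for b, n in zip(used_bins, counts)
        ])
        controls[i] = mean_pairwise_corr(expr, ctrl)
    return controls


# Toy data: 500 genes x 80 cells of noise; 20 genes share one program.
rng = np.random.default_rng(1)
expr = rng.normal(size=(500, 80))
signature = np.arange(20)
expr[signature] += 2.0 * rng.normal(size=80)  # shared signal across cells

observed = mean_pairwise_corr(expr, signature)
controls = control_distribution(expr, signature)
print(f"observed={observed:.2f}, highest control={controls.max():.2f}")
```

A gene set passes in the paper's sense when its observed value exceeds every control value, which is what the toy example is built to show.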
To show the robustness of the NNMF-derived programs with regards to the number of NNMF factors, we repeated the NNMF analysis with the number of factors between 5 and 15. (Repeating the analysis over a range of factor numbers is also important.) Each of the seven meta-signatures was robustly identified with each of the NNMF parameters.
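A robustness check of this kind can be sketched by rerunning NMF for every factor number from 5 to 15 and asking whether a known program is still recovered. The paper does not describe how recovery was scored, so everything below is an assumption: the compact `nmf` function (multiplicative updates, not MATLAB's nnmf), the `best_match` criterion, and the toy matrix with one planted program.

```python
import numpy as np


def nmf(V, k, n_iter=300, seed=0, eps=1e-9):
    """Minimal NMF (V ~ W @ H) via multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def best_match(reference, W):
    """Highest Pearson correlation between reference and any column of W."""
    r = [np.corrcoef(reference, W[:, j])[0, 1] for j in range(W.shape[1])]
    return float(np.nanmax(r))


# Toy data: one planted program (20 genes, active in half the cells) plus noise.
rng = np.random.default_rng(2)
n_genes, n_cells = 150, 80
planted = np.zeros(n_genes)
planted[:20] = rng.uniform(1.0, 2.0, size=20)
activity = np.zeros(n_cells)
activity[:40] = rng.uniform(1.0, 3.0, size=40)
V = np.outer(planted, activity) + rng.random((n_genes, n_cells))

# Repeat the factorization for each number of factors between 5 and 15.
best = {k: best_match(planted, nmf(V, k)[0]) for k in range(5, 16)}
for k, r in best.items():
    print(f"k={k:2d}  best correlation with planted program: {r:.2f}")
```

With real data the reference would be a meta-signature obtained at 10 factors rather than a planted vector. A program counts as robust if a well-matching factor shows up at every value of `k`.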
If you want to go deep into bioinformatics, you still need a grasp of some basic mathematics.
Life is good, and it is waiting for you to go further.