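Hello everyone. Today we share a very good tool for cell-cell communication analysis. It goes a step beyond existing cell communication tools by using single-cell data to construct signaling networks de novo. Let's dig into it and look at the software's main functions and the settings it suits.
Quite a few cell communication tools have already been covered here, each with its own features, strengths, and weaknesses. They are listed below for anyone interested: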
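10X single-cell (10X spatial transcriptomics) communication analysis: NicheNet
10X single-cell (10X spatial transcriptomics) communication analysis: CellChat, multi-sample differential communication analysis
10X single-cell (10X spatial transcriptomics) communication analysis: CellChat
10X single-cell communication analysis: scMLnet (ligand-receptor, TF, and differentially expressed (target) gene network communication analysis)
10X single-cell cell communication series: Connectome
10X single-cell communication analysis: CrosstalkR (changes in specificity and in communication strength both matter)
10X single-cell communication analysis: ICELLNET
10X spatial transcriptomics communication analysis, part 3
Spatial communication analysis, part 2
How to approach cell communication with 10X spatial transcriptomics
Cell communication software RNAMagnet
NATMI, a cell communication analysis tool for single-cell data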
OK, let's start today's share. The paper is "CytoTalk: De novo construction of signal transduction networks using single-cell RNA-Seq data", published this year in Science Advances (impact factor 13). We first go through the paper and then look at the example code.
Abstract
Single-cell technology has opened the door for studying signal transduction in a complex tissue at unprecedented resolution. However, there is a lack of analytical methods for de novo construction of signal transduction pathways using single-cell omics data. (The paper opens by stating the problem: constructing signal transduction pathways de novo.) So the authors developed a new method, CytoTalk.
- CytoTalk first constructs intracellular and intercellular (within-cell and between-cell) gene-gene interaction networks (the intercellular part being ligand-receptor pairs) using an information-theoretic measure between two cell types.
- Candidate signal transduction pathways in the integrated network are identified using the prize-collecting Steiner forest algorithm. (This is the signal identification step; we look at the algorithm in the Methods.)
We applied CytoTalk to a single-cell RNA-Seq data set on mouse visual cortex and evaluated predictions using high-throughput spatial transcriptomics data generated from the same tissue. (Note that both single-cell and spatial transcriptomics data are used here.) Compared to published methods, genes in our inferred signaling pathways have significantly higher spatial expression correlation only in cells that are spatially closer to each other, suggesting improved accuracy of CytoTalk. (A good result: the selected ligand-receptor pairs show clear spatial localization, with ligands and receptors communicating between neighboring regions.) Furthermore, using single-cell RNA-Seq data with receptor gene perturbation, we found that predicted pathways are enriched for differentially expressed genes between the receptor knockout and wild type cells, further validating the accuracy of CytoTalk. (We come back to this in the Results.) In summary, CytoTalk enables de novo construction of signal transduction pathways and facilitates comparative analysis of these pathways across tissues and conditions.
Introduction (key points distilled)
- Signal transduction is the primary mechanism for cell-cell communication
- Signaling pathways are highly dynamic and crosstalk among them is prevalent.
Here is the key point: Due to these two features, simply examining expression levels of ligand and receptor genes cannot reliably capture the overall activities of signaling pathways and interactions among them. The paper mentions NicheNet here, which performs network analysis of ligand-receptor pairs and target genes.
The shortcoming of these methods: However, these methods are based on known annotations of signaling pathways.
To our knowledge, currently no method exists to perform de novo prediction of the entire signal transduction pathways emanating from the ligand-receptor pairs. This idea should be the same as in the post I shared earlier, "10X single-cell communication analysis: scMLnet (ligand-receptor, TF, and differentially expressed (target) gene network communication analysis)".
Here we describe the CytoTalk algorithm for de novo construction of signaling network (union of multiple signaling pathways) between two cell types using scRNA-Seq data.
- The algorithm first constructs an integrated network consisting of intracellular and intercellular functional gene interactions.
- It then identifies the signaling network by solving a prize-collecting Steiner forest problem. (This term is introduced in the Methods.)
- We demonstrate the performance of the algorithm using high-throughput spatial transcriptomics data and scRNA-Seq data with perturbation to the receptor genes in a signaling pathway.
Results
Result 1: Wiring of signaling pathways is highly cell type-dependent
A hallmark of signal transduction pathways is their high level of cell-type-specific wiring pattern. (The word "hallmark" should be familiar to everyone.) Single-cell transcriptome data allows us to examine the cell type-specific activity of individual signaling pathways beyond just ligand and receptor genes. (Note here: the activity of a signaling pathway can be computed by enrichment, but how strongly a pathway is expressed is itself regulated by ligand-receptor signals.) To this end, we examined the canonical fibroblast growth factor receptor 2 (FGFR2) signaling pathway in two tissue types, mammary gland and skin. We will not dwell on the physiology and instead focus on what the software offers. What we need to know is that activation of certain receptors up-regulates certain pathway genes, which in turn changes certain biological functions.
For a public single-cell data set (already annotated, of course), the authors compute an expression specificity score, the preferential expression measure (PEM), for each pathway gene in each involved cell type. (The PEM calculation is described in the paper's Methods.) They find that the four canonical sub-pathways downstream of the same receptor (FGFR2) show strikingly cell-type-specific activity, depending on the cell types involved. In other words, for the same receptor, the signaling pathways that get activated differ between cell types. The PI3K/AKT pathway is most active for signaling between fibroblasts and luminal epithelial cells in the mammary gland. In contrast, the JAK-STAT pathway is most active for signaling between keratinocyte stem cells and basal cells in skin.
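The PEM idea can be sketched in a few lines. This is a minimal illustration of the classic preferential expression measure (log10 of observed over expected expression), and CytoTalk's exact variant may differ. The function name `pem_scores` and the toy expression values are assumptions made for this example only.

```python
import math

def pem_scores(mean_expr):
    """mean_expr: {cell_type: {gene: mean expression}} -> {cell_type: {gene: PEM}}.

    Assumption of this sketch: the expected expression of gene g in cell type c is
    (total of g over all cell types) * (share of c in the total expression).
    """
    cell_totals = {c: sum(genes.values()) for c, genes in mean_expr.items()}
    grand_total = sum(cell_totals.values())
    gene_totals = {}
    for genes in mean_expr.values():
        for g, x in genes.items():
            gene_totals[g] = gene_totals.get(g, 0.0) + x
    scores = {}
    for c, genes in mean_expr.items():
        share = cell_totals[c] / grand_total
        scores[c] = {}
        for g, x in genes.items():
            expected = gene_totals[g] * share
            # Genes with zero observed or expected expression get no score here.
            scores[c][g] = math.log10(x / expected) if x > 0 and expected > 0 else None
    return scores

# Toy data (hypothetical values): Fgfr2 is enriched in fibroblasts.
toy = {
    "Fibroblast": {"Fgfr2": 8.0, "Actb": 10.0},
    "Macrophage": {"Fgfr2": 1.0, "Actb": 11.0},
}
pem = pem_scores(toy)
```

A positive PEM means the gene is expressed above what its overall abundance would predict for that cell type, and a negative PEM means below.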
To evaluate the extent of cell type-specific wiring of signaling pathways, we examined all manually annotated signaling pathways in the Reactome database. For each pathway, we computed its cell type-specific activity score. We found that the majority of pathways exhibit high degree of cell type-specific activities. (This seems expected to me rather than a new finding.)
This is true even for the same cell types but located in different tissues. (This point deserves particular attention.) In summary, these results highlight the need for analytical tools for de novo construction of complete signaling pathways (instead of ligand-receptor pairs) using single-cell transcriptome data. (Agreed.)
Result 2: Overview of the CytoTalk algorithm (key points distilled)
CytoTalk is designed for de novo construction of a signal transduction network between two cell types, which is defined as the union of multiple signal transduction pathways.
- It first constructs a weighted integrated gene network comprised of both intracellular and intercellular functional gene-gene interactions (the intercellular part is the ligand-receptor network). Intracellular functional gene interactions are computed and weighted using mutual information between two genes. Two intracellular networks are connected via crosstalk edges.
- Ligand-receptor pairs with higher cell-type-specific gene expression but lower correlated expression within the same cell type (thus more likely to be involved in crosstalk instead of self talk) are assigned higher crosstalk weights. (This is worth understanding properly: when a ligand and a receptor are each strongly cell-type-specific but only weakly correlated within a single cell type, the pair is more likely to act between the two cell types than within one of them, so giving it a higher weight is reasonable.)
- Nodes in the integrated network are weighted by combining their cell-type-specific gene expression with their closeness to the ligand/receptor genes in the network. (Quite a few algorithms are involved.) We use a network propagation procedure to determine the closeness of a gene to the ligand/receptor gene.
- With the integrated network as the input, we formulate the identification of signaling network as a prize-collecting Steiner forest (PCSF) problem. (This is unfamiliar territory; the paper "PRODIGY: personalized prioritization of driver genes" is a useful reference.) The rationale for using the PCSF algorithm is to find an optimal subnetwork that includes genes with high levels of cell-type-specific expression that are closely connected to high-scoring ligand-receptor pairs. (This is the part we need to know.) This optimal subnetwork is defined as the signaling network between the two cell types.
- The statistical significance of the candidate signaling network is computed using a null score distribution of signaling networks generated using degree-preserving randomized networks. (Significance testing; this deserves close attention in the Methods.)
Result 3: Performance evaluation using spatial transcriptomics data (mouse cortex data)
We identified signaling networks between the three pairs of cell types, endothelial-microglia (EndoMicro), endothelial-astrocyte (EndoAstro) and astrocyte-neuron (AstroNeuro), respectively. The predicted cell-type-specific signaling networks consist of 481, 404, and 1051 genes and involve 51, 44, and 35 ligand-receptor interactions (crosstalk edges), respectively. Compared to PCSFs identified using 1000 randomized input networks (a permutation test), all predicted signaling networks have significantly smaller objective function scores and larger fractions of crosstalk edges (empirical p-values < 0.001).
Several predicted ligand-receptor pairs are known to mediate signal transduction between the three cell types.
Next, the spatial data is brought in, so that the distance between cells is taken into account.
Our rationale is that cells that are close together are more likely to signal to each other. (The same applies to 10X spatial transcriptomics.) Therefore, signaling pathway genes are expected to have higher spatial expression correlation in these cells than cells that are further apart.
First, a comparison between methods:
We first asked what fractions of the predicted ligand-receptor pairs are shared among the six methods. We reason that a more accurate method will have on average a larger fraction of overlapped predictions with all other methods. (By this criterion, the authors' software comes out best.)
Then, the spatial data shows that neighboring cell types are more likely to communicate and distant cells communicate less, while the other methods do not show this pattern.
However, pathways predicted by NicheNet and SoptSC also show significantly larger PCCs compared to random gene pairs among intermediate and distant cell pairs, suggesting that those predictions are false positive predictions.
Taken together, these results demonstrate that CytoTalk has significant improvement over published methods.
Result 4: Performance evaluation using scRNA-Seq data without receptor gene expression (the receptor gene is knocked out)
Under this condition, the authors found new signaling pathways, and, of course, their software had the highest prediction accuracy.
Discussion
We introduce a computational method, CytoTalk, for the construction of cell-type-specific signal transduction pathways using scRNA-Seq data. The inputs to CytoTalk are scRNA-Seq data and known ligand-receptor interactions. Unlike previous methods using known pathway annotations, CytoTalk constructs full pathways.
In short, it works well.
In summary, CytoTalk provides a much-needed means for de novo construction of complete cell-type-specific signaling pathways. Comparative analysis of signaling pathways will lead to a better understanding of cell-cell communication in healthy and diseased tissues.
Method
Method 1: Construction of intracellular functional gene interaction network
This is a gene co-expression network that captures pairwise relationships between genes. The algorithm may be unfamiliar, so it is worth looking up.
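Since the overview says intracellular edges are weighted by mutual information between two genes, a small sketch may help make that concrete. This is a plain plug-in estimator on equal-width bins, chosen for readability; the paper's estimator and binning are not described in this post, so `discretize`, `mutual_information`, the bin count, and the toy vectors are all assumptions.

```python
import math
from collections import Counter

def discretize(values, n_bins=3):
    """Equal-width binning of expression values into integer bin labels."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x, y, n_bins=3):
    """Plug-in estimate of mutual information (in bits) between two genes across cells."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

# Toy expression vectors over six cells (hypothetical values).
gene_a = [0.1, 0.2, 1.5, 1.6, 3.0, 3.1]
gene_b = [0.0, 0.1, 1.4, 1.7, 2.9, 3.2]   # tracks gene_a closely
gene_c = [2.0, 0.1, 2.1, 0.0, 2.2, 0.2]   # independent of gene_a's bins in this toy
mi_ab = mutual_information(gene_a, gene_b)
mi_ac = mutual_information(gene_a, gene_c)
```

Computing this for every gene pair is quadratic in the number of genes, which is consistent with the long runtime reported for the intracellular network step in the README.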
2. Crosstalk score of a ligand-receptor pair between two cell types
The authors define a crosstalk score between gene i in cell type A and gene j in cell type B, where genes i and j encode a ligand and a receptor or vice versa.
3. Construction of an integrated network between two cell types
We construct an integrated network consisting of two intracellular networks connected through known ligand-receptor interactions. We collected 1,941 manually annotated ligand-receptor interactions. If the ligand gene and the receptor gene are present in the two intracellular networks, we connect them and denote the edge as a crosstalk edge.
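The rule for adding crosstalk edges can be sketched as follows. This only illustrates the connection step described above, under the assumption that both signaling directions (A to B and B to A) are considered; `add_crosstalk_edges`, the node tagging scheme, and the toy gene sets are hypothetical.

```python
def add_crosstalk_edges(genes_a, genes_b, lr_pairs):
    """Return crosstalk edges linking two intracellular networks.

    genes_a / genes_b: sets of gene symbols present in the networks of cell type A / B.
    lr_pairs: iterable of (ligand, receptor) tuples from a curated list.
    Nodes are tagged with their cell type because the same gene symbol
    can occur in both intracellular networks.
    """
    edges = set()
    for ligand, receptor in lr_pairs:
        if ligand in genes_a and receptor in genes_b:   # A signals to B
            edges.add(((ligand, "A"), (receptor, "B")))
        if receptor in genes_a and ligand in genes_b:   # B signals to A
            edges.add(((receptor, "A"), (ligand, "B")))
    return edges

# Toy inputs (hypothetical network contents).
genes_a = {"Fgf7", "Col1a1", "Kit"}
genes_b = {"Fgfr2", "Krt8", "Kitl"}
lr_pairs = [("Fgf7", "Fgfr2"), ("Kitl", "Kit"), ("Wnt5a", "Fzd4")]
crosstalk = add_crosstalk_edges(genes_a, genes_b, lr_pairs)
```

Pairs whose ligand or receptor is missing from the filtered networks (here Wnt5a and Fzd4) simply contribute no edge.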
4. Key point: De novo identification of signaling network between two cell types
We formulate the identification of a signaling network between two cell types as a prize-collecting Steiner forest (PCSF) problem. Because the forest is a disjoint set of trees, PCSF problem is a generalization of the classical prize-collecting Steiner tree (PCST) problem. The individual signaling pathways are represented as trees, the collection of which (forest) represents the entire signaling network between two cell types.
We define edge costs and node prizes in the integrated network as follows. The z-score normalized edge weights of the integrated network are first scaled to the range of [0, 1]. Edge cost is then defined as $1 - \text{scaled edge weight}$. Node prize is defined based on both PEM value of a gene and its closeness to the ligand/receptor genes in the network in order to identify signaling networks centered around the crosstalk edges. To capture the closeness, we use a network propagation procedure to calculate a relevance coefficient for each gene in an intracellular network.

$$F_{t+1} = \alpha W' F_t + (1 - \alpha) F_0$$

where $F_t$ is the relevance coefficient vector for all genes in the intracellular network at iteration t. $F_0$ is the initial value of the relevance coefficient vector such that $F_0(i) = 1$ if gene i is a ligand or receptor. Otherwise, $F_0(i) = 0$. $W'$ is a normalized edge weight matrix for an intracellular network, which is defined as $W' = D^{-1/2} W D^{-1/2}$. Here, W is set to the original mutual information matrix and D is defined as a diagonal matrix such that $D(i, i)$ is the sum of row i of the matrix W. This network propagation procedure is equivalent to a random walk with restart on the network. $\alpha$ is a tuning parameter that controls the balance between prior information (known ligands or receptors) and network smoothing. Node prize of a gene is defined as the product of its PEM value and the relevance coefficient to capture both the cell-type-specificity and the closeness of this gene to the ligand or receptor gene in the network. To avoid extremely large node prizes for ligand or receptor genes, we used $\alpha = 0.9$ in this study.
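The propagation step can be sketched in plain Python. This is an illustration, not the package's code, and it assumes the update has the usual random-walk-with-restart form F <- alpha * W' F + (1 - alpha) * F0 with W' = D^{-1/2} W D^{-1/2}, where alpha weights the smoothing term. The function `propagate`, the toy chain graph, and the PEM values are hypothetical.

```python
import math

def propagate(W, seeds, alpha=0.9, tol=1e-9, max_iter=1000):
    """Iterate F <- alpha * W' F + (1 - alpha) * F0 until the update is below tol.

    W: symmetric non-negative weight matrix (list of lists) with no isolated nodes.
    seeds: set of indices of ligand/receptor genes (F0 is 1 there and 0 elsewhere).
    """
    n = len(W)
    deg = [sum(row) for row in W]
    # Symmetric normalization W' = D^{-1/2} W D^{-1/2}.
    Wn = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    f0 = [1.0 if i in seeds else 0.0 for i in range(n)]
    f = f0[:]
    for _ in range(max_iter):
        new = [alpha * sum(Wn[i][j] * f[j] for j in range(n)) + (1 - alpha) * f0[i]
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, f)) < tol:
            return new
        f = new
    return f

# Toy chain 0 - 1 - 2 - 3 with gene 0 as the only ligand/receptor seed.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
relevance = propagate(W, seeds={0})
pem = [0.5, 1.2, 0.8, 0.3]                        # hypothetical PEM values
prizes = [p * r for p, r in zip(pem, relevance)]  # node prize = PEM * relevance
```

With alpha close to 1 the relevance spreads far from the seeds, so ligand and receptor genes do not dominate the prizes; a small alpha would concentrate nearly all relevance on the seeds themselves.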
The PCSF algorithm identifies an optimal forest in a network that maximizes the total amount of node prizes and minimizes the total amount of edge costs in the forest. While the PCSF problem is NP-hard and often needs a high computational cost, we employ a previously established PCSF formulation and use a highly efficient prize-collecting Steiner tree (PCST) algorithm to identify the PCSF. The objective function of the PCSF problem is defined as below.

$$\min_{F} \; c(F) + \beta \, p(F^{c}) + \omega \, k$$

where F represents a forest (i.e. multiple disconnected trees) in the integrated network. $c(F)$ denotes the sum of edge costs in the forest F and $p(F^{c})$ denotes the sum of node prizes of the remaining subnetwork excluding the forest F from the network. We modify the integrated network by introducing an artificial node and a number of artificial edges to the original network. The artificial edges connect the artificial node to all genes in the original network. The costs of all artificial edges are the same and are defined as $\omega$, which influences the number of trees, k, in the resulting PCSF. $\beta$ is a parameter for balancing the edge costs and node prizes, which influences the size of the resulting PCSF. By tuning parameters $\beta$ and $\omega$, multiple PCSTs can be identified with the artificial node as the root node. For each identified PCST, a PCSF can be obtained by removing the artificial node and artificial edges from the PCST.
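To make the trade-off concrete, the sketch below scores a candidate forest. It assumes the objective has the common PCSF form "edge costs + beta x prizes of excluded nodes + omega x number of trees", which is an assumption about the formula rather than something this post confirms. It only evaluates a given forest and does not solve the optimization; all names and numbers are hypothetical.

```python
def count_trees(nodes, edges):
    """Number of connected components among `nodes` using `edges` (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in nodes})

def pcsf_objective(forest_edges, edge_cost, node_prize, beta, omega):
    """c(F) + beta * p(F^c) + omega * k for a candidate forest F given by its edges.

    Single-node trees cannot be expressed through edges in this sketch.
    """
    forest_nodes = {v for e in forest_edges for v in e}
    cost = sum(edge_cost[e] for e in forest_edges)
    missed = sum(p for v, p in node_prize.items() if v not in forest_nodes)
    k = count_trees(forest_nodes, forest_edges)
    return cost + beta * missed + omega * k

# Toy network (hypothetical costs and prizes).
edge_cost = {("a", "b"): 0.2, ("b", "c"): 0.9, ("d", "e"): 0.1}
node_prize = {"a": 1.0, "b": 0.8, "c": 0.1, "d": 0.6, "e": 0.7}
small = pcsf_objective([("a", "b"), ("d", "e")], edge_cost, node_prize, beta=2, omega=0.5)
full = pcsf_objective(list(edge_cost), edge_cost, node_prize, beta=2, omega=0.5)
```

In this toy case the expensive edge to the low-prize node `c` is not worth including, so the smaller forest scores lower. Raising beta makes missed prizes more costly and grows the forest, while raising omega penalizes having many separate trees.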
We identify the signaling network between two cell types by searching for a robust PCSF across the full parameter space. For each identified PCSF, we compute the occurrence of each edge in all identified PCSFs to construct a background distribution of edge occurrence frequency. Next, we calculate a p-value for each PCSF by comparing the edge occurrence frequency distribution of this PCSF to the distribution of all other identified PCSFs using one-sided Kolmogorov-Smirnov test. The PCSF with the minimum p-value is considered as the most robust signaling network predicted by CytoTalk.
To further evaluate the statistical significance of the identified PCSF, we construct null distributions for the objective function and for the fraction of crosstalk edges in a PCSF using 1000 null PCSFs identified from randomized integrated networks. To generate the randomized networks, we separately shuffle the edges of the two intracellular networks while preserving the node degree distribution, node prizes and crosstalk edges as the original integrated network.
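The randomization and the empirical p-value can be sketched like this. Double-edge swapping is one standard way to shuffle edges while preserving node degrees; whether CytoTalk uses exactly this procedure is an assumption, as is the "+1" correction in `empirical_p`. All function names and the toy graph are hypothetical.

```python
import random

def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomize an undirected simple graph by double-edge swaps.

    Each swap turns (a, b), (c, d) into (a, d), (c, b), which keeps every
    node's degree unchanged. Swaps that would create a self-loop or a
    duplicate edge are skipped.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def degrees(edges):
    """Degree of every node appearing in an edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def empirical_p(observed, null_scores):
    """Share of null scores at least as small as the observed one (smaller is better)."""
    return (1 + sum(s <= observed for s in null_scores)) / (1 + len(null_scores))

original = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
shuffled = degree_preserving_shuffle(original, n_swaps=50)
```

In the paper's setting the null scores would be objective values of PCSFs found on 1000 such randomized networks, and a real network scoring lower than all of them gives an empirical p-value below 0.001.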
The algorithm is a bit hard to follow, and my head hurts a little.
Let's look at the example code.
The scripts appear to be fully packaged, so we can use them directly.
Input files
A comma-delimited “.csv” file containing scRNA-Seq data for each cell type under study. Each file contains the ln-transformed normalized scRNA-Seq data for a cell type with rows as genes (GENE SYMBOL) and columns as cells. The files should be named as: scRNAseq_Fibroblasts.csv, scRNAseq_Macrophages.csv, scRNAseq_EndothelialCells.csv, scRNAseq_CellTypeName.csv …
A “TwoCellTypes.txt” file indicating the two cell types between which the signaling network is predicted. Please make sure that the cell type names are consistent with the scRNA-Seq data files above.
A “LigandReceptor_Human.txt” or "LigandReceptor_Mouse.txt" file listing all known ligand-receptor pairs. The first column (ligand) and the second column (receptor) are separated by a tab (\t). Currently, 1942 and 1855 ligand-receptor pairs are provided for human and mouse, respectively.
A “Species.txt” file indicating the species from which the scRNA-Seq data are generated. Currently, “Human” and “Mouse” are supported.
A “Cutoff_GeneFilter.txt” file indicating the cutoff for removing lowly-expressed genes in the processing of scRNA-Seq data. The default cutoff value is 0.1, which means that genes expressed in less than 10% of all cells of a given type are removed.
A “BetaUpperLimit.txt” file indicating the upper limit of the test values of the algorithm parameter β, which is inversely proportional to the total number of genes in a given cell-type pair after removing lowly-expressed genes in the processing of scRNA-Seq data. Based on preliminary tests, the upper limit of β value is suggested to be 100 (default) if the total number of genes in a given cell-type pair is above 10,000. However, if the total number of genes is below 5000, it is necessary to increase the upper limit of β value to 500.
Please download "CytoTalk_package_v2.0.zip". All example input files are in the /Input/ folder and should be customized and copied into the /CytoTalk/ folder before running. The /CytoTalk/ folder can only be used ONCE for a given cell-type pair. Please use a new /CytoTalk/ folder for analysis of other cell-type pairs.
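As a rough guide to preparing the expression input, the sketch below applies the lowly-expressed-gene filter described for "Cutoff_GeneFilter.txt" and serializes a genes-by-cells table as CSV. The package does its own filtering, so this only illustrates the rule. Treating "expressed" as a value above zero, and the header layout with an empty first cell, are assumptions; `filter_genes`, `to_csv`, and the toy data are hypothetical.

```python
import csv
import io

def filter_genes(rows, cutoff=0.1):
    """Keep genes expressed (value > 0) in at least `cutoff` of the cells.

    rows: list of (gene_symbol, [expression value per cell]).
    """
    kept = []
    for gene, values in rows:
        frac = sum(v > 0 for v in values) / len(values)
        if frac >= cutoff:
            kept.append((gene, values))
    return kept

def to_csv(rows, cell_ids):
    """Serialize genes-by-cells data as comma-delimited text (genes as rows)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([""] + cell_ids)          # assumed header: empty cell, then cell IDs
    for gene, values in rows:
        writer.writerow([gene] + values)
    return buf.getvalue()

# Toy ln-transformed expression values for five cells (hypothetical).
cells = ["c1", "c2", "c3", "c4", "c5"]
rows = [
    ("GeneA", [0, 0, 0, 0, 0]),          # never expressed, removed
    ("GeneB", [1.2, 0, 0, 0, 0]),        # expressed in 20% of cells, kept
    ("GeneC", [0.7, 2.1, 0, 1.3, 0.4]),
]
kept = filter_genes(rows, cutoff=0.1)
csv_text = to_csv(kept, cells)
```

The resulting text would be saved under a name following the pattern above, for example scRNAseq_Fibroblasts.csv, with the same cell type name repeated in TwoCellTypes.txt.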
Run CytoTalk
Copy the input file-added “/CytoTalk/” folder to your working directory and execute the following script:
bash InferSignalingNetwork.sh
[Alternative way] The whole computation above may take 5.5 hours (2.3 GHz 8-Core Intel Core i9, 14 logical cores for parallel computation), of which 4 hours are used for computing pair-wise mutual information between genes in the construction of intracellular networks for the given two cell types. Considering that users may have alternative ways for constructing cell-type-specific intracellular networks, we divide the whole computation into two steps below.
bash InferIntracellularNetwork_part1.sh # around 4 hours
bash InferIntercellularNetwork_part2.sh # around 1.5 hours
The outputs of the script "part1.sh" are two comma-delimited files "IntracellularNetwork_TypeA.txt" and "IntracellularNetwork_TypeB.txt", containing the adjacency matrices of two intracellular networks for the given two cell types, respectively. These two files are the inputs of the script "part2.sh", which can generate the final predicted signaling network.
CytoTalk output
The output folder, “/CytoTalk/IllustratePCSF/”, contains a network topology file and six attribute files that are ready for import into Cytoscape for visualization and further analysis of the predicted signaling network between the given two cell types.
Network topology | Edge attribute | Node attribute |
---|---|---|
PCSF_edgeSym.sif | PCSF_edgeCellType.txt, PCSF_edgeCost.txt | PCSF_geneCellType.txt, PCSF_geneExp.txt, PCSF_genePrize.txt, PCSF_geneRealName.txt |
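For a quick look at the predicted network outside Cytoscape, a SIF file can be read with a few lines of Python. The parser below follows the general SIF convention (source, relation, then one or more targets, tab-delimited when tabs are present). The sample content and node naming are hypothetical, since the actual contents of PCSF_edgeSym.sif are not shown in this post.

```python
def parse_sif(text):
    """Parse SIF lines of the form 'source <relation> target [target ...]'."""
    edges = []
    for line in text.splitlines():
        # SIF is tab-delimited if the line contains tabs, else whitespace-delimited.
        parts = line.split("\t") if "\t" in line else line.split()
        parts = [p for p in parts if p]
        if len(parts) < 3:
            continue   # blank line or isolated node
        source, relation, targets = parts[0], parts[1], parts[2:]
        edges.extend((source, relation, t) for t in targets)
    return edges

# Hypothetical sample in SIF layout; real node names in the output may differ.
sample = "Fgf7__A\tpp\tFgfr2__B\nFgfr2__B\tpp\tGrb2__B\tFrs2__B\n\nOrphan__A\n"
edges = parse_sif(sample)
genes = sorted({n for s, _, t in edges for n in (s, t)})
```

In practice the text would come from reading the file in /CytoTalk/IllustratePCSF/, and the attribute files could then be joined on the same node and edge names.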
Give it a try. Life is good, and even better with you.