GSEA is designed to analyze ranked lists of all available genes and does not require a threshold.
Y叔的clusterProfiler做GSEA分析超級(jí)方便~~
參數(shù)解讀:
- GSEA searches for pathways whose genes are enriched at the top or bottom of the ranked gene list, more so than expected by chance alone.
- To calculate an enrichment score (ES) for a pathway, GSEA progressively examines genes from the top to the bottom of the ranked list, increasing the ES if a gene is part of the pathway and decreasing the score otherwise.
- These running sum values are weighted, so that enrichment in the very top- (and bottom-) ranking genes is amplified, whereas enrichment in genes with more moderate ranks are not amplified.
- The ES score is calculated as the maximum value of the running sum and normalized relative to pathway size, resulting in a normalized enrichment score (NES) that reflects the enrichment of the pathway in the list. Positive and negative NES values represent enrichment at the top and bottom of the list, respectively.
- Finally, a permutation-based P value is computed and corrected for multiple testing to produce a permutation-based false-discovery rate (FDR) Q value that ranges from 0 (highly significant) to 1 (not significant)
- Resulting pathways are selected using the FDR Q value threshold (e.g., Q < 0.05) and ranked using NES. In addition, the ‘leading edge’ aspect of the GSEA analysis identifies specific genes that most strongly contribute to the detected enrichment signal of a pathway.
從以上參數(shù)解讀了解到儒旬,結(jié)果可通過(guò) FDR q-value篩選滑负,利用NES進(jìn)行排序解讀毒涧。
# Select gene set form msigdf
msigdf.human <-msigdf::msigdf.human
GeneS<-msigdf.human[grepl("LYSOSOM",msigdf.human$geneset),]%>% select(geneset, symbol) %>% as.data.frame
# gene.rank is a data.frame, which contains Gene and FC (Fold-Change)
gene.rank
gene.sort <- arrange(gene.rank, desc(logFC)) # 降序排列
geneList1<-gene.sort$FC
names(geneList1)<-gene.sort$Gene # 命名
# GSEA
gsea<- GSEA(geneList, TERM2GENE = GeneS, verbose=FALSE, pvalueCutoff = 0.5)
# Note: 1. 結(jié)果是不斷變的誉碴,原因是by="fgsea",可以將nPerm設(shè)置的大一些,默認(rèn)是1000.
# 2. 原始文獻(xiàn)中的選擇:p value < 0.05诸蚕, FDR < 0.25,可結(jié)合自己需求調(diào)整。
# 可視化第一個(gè)結(jié)果氧猬,gseaplot2中by函數(shù)可以選擇部分結(jié)果展示背犯,如"preranked".
gseaplot2(gsea,geneSetID=1,title=gsea$Description[1],pvalue_table=T)
# geneSetID多選擇幾個(gè),即可實(shí)現(xiàn)多個(gè)gene set的可視化盅抚。