10X單細(xì)胞 & 10XATAC聯(lián)合之調(diào)控網(wǎng)絡(luò)分析（IReNA）

hello富纸，大家好毁欣，這一篇給大家?guī)淼氖?0X單細(xì)胞和10XATAC聯(lián)合進(jìn)行網(wǎng)絡(luò)調(diào)控的分析谤辜，也很經(jīng)典，參考的文獻(xiàn)在IReNA: integrated regulatory network analysis of single-cell transcriptomes董习，我們看看分析的方法和內(nèi)容,關(guān)于10X單細(xì)胞聯(lián)合ATAC分析調(diào)控網(wǎng)絡(luò)的內(nèi)容，大家可以參考我之前的文章10X單細(xì)胞（10X空間轉(zhuǎn)錄組）基因調(diào)控網(wǎng)絡(luò)分析之Pando

圖片.png

好了爱只，開始分享

IReNA（Integrated Regulatory Network Analysis）是一個用于執(zhí)行regulatory network analysis的 R 包皿淋。 IReNA 包含兩種重建基因調(diào)控網(wǎng)絡(luò)的方法。第一種是單獨(dú)使用單細(xì)胞 RNA 測序 (scRNA-seq) 數(shù)據(jù)恬试。第二個是使用測序 (ATAC-seq) 整合 scRNA-seq 數(shù)據(jù)和來自 Assay for Transposase Accessible Chromatin 的染色質(zhì)可及性概況窝趣。 IReNA 執(zhí)行模塊化調(diào)控網(wǎng)絡(luò)，以揭示模塊之間的關(guān)鍵轉(zhuǎn)錄因子和重要調(diào)控關(guān)系训柴，提供有關(guān)調(diào)控機(jī)制的生物學(xué)見解哑舒。

For users who want to use ATAC-seq data to refine regulatory relationships, Computer or server of linux system and the following software are needed: samtools, bedtools and fimo.

安裝

if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(version = "3.12")
BiocManager::install(c('Rsamtools', 'ChIPseeker', 'monocle',
                       'RcisTarget', 'RCy3', 'clusterProfiler'))
install.packages("devtools")
devtools::install_github("jiang-junyao/IReNA")
library(IReNA)

Workflow

圖片.png

Inputs of IReNA

如果僅使用 scRNA-seq 或大量 RNA-seq 數(shù)據(jù)運(yùn)行 IReNA，則需要以下輸入：(2) scRNA-seq 或bulk RNA-seq 數(shù)據(jù)的Raw counts和 (3) 物種的參考基因組 (4) GTF 文件 (5) 轉(zhuǎn)錄因子的Motif information幻馁。

If both ATAC-seq data and scRNA-seq data are used to perform IReNA, the following files are needed: (1) Bam, peak and footprint files of ATAC-seq data (2) Raw counts of scRNA-seq or bulk RNA-seq data (3) Reference genome of the species (5) Motif information of transcription factors.

(1) Bam, peak and footprint files of ATAC-seq data
(2) Seurat object and monocle object / Raw counts of scRNA-seq or bulk RNA-seq data
對于分析過seurat對象和monocle對象的文件洗鸵，IReNA提供了將monocle對象的偽時間添加到seurat對象的meta數(shù)據(jù)中的功能（該功能僅支持monocle2的monocle對象）。然后仗嗦，繼續(xù)下一步膘滨。

###Add pseudotime to the Seurat object
seurat_with_time <- add_pseudotime(seurat_object, monocle_object)

對于只有raw counts的分析文件，IReNA 提供了“l(fā)oad_counts”函數(shù)來加載 scRNA-seq 數(shù)據(jù)的raw counts稀拐，并返回 seurat 對象火邓。如果數(shù)據(jù)是 10X 格式，設(shè)置參數(shù)‘datatype = 0’德撬。如果數(shù)據(jù)為普通計數(shù)格式（'txt'為文件名后綴）铲咨，則設(shè)置參數(shù)'dayatype =1'。

### load 10X counts
seurat_object <- load_counts('10X_data/sample1/', datatype = 0)
### load normal counts
seurat_object <- load_counts('test_data.txt',datatype = 1)

如果使用bulk RNA-seq 數(shù)據(jù)來識別基本的調(diào)控關(guān)系蜓洪，只需加載bulk RNA-seq 數(shù)據(jù)的raw counts鸣驱，然后使用與 scRNA-seq 數(shù)據(jù)相同的代碼進(jìn)行進(jìn)一步分析。 建議只加載差異表達(dá)基因的表達(dá)譜蝠咆，否則將花費(fèi)更多的時間來計算每個基因?qū)Φ南嚓P(guān)性踊东。 Deseq2 和 edgeR 可用于識別bulk RNA-seq 數(shù)據(jù)中差異表達(dá)的基因北滥。

(3) Reference genome of the species
(4) GTF file
Gene transfer format, you can download it from http://www.ensembl.org/info/data/ftp/index.html
(5) Motif information of transcription factors
IReNA 包含源自 TRANSFAC 版本 201803 的四種物種（智人、Mus musculus闸翅、斑馬魚和雞）的 DNA 模體數(shù)據(jù)集再芋。以下代碼用于從 TRANSFAC 或user定義的模體數(shù)據(jù)集中獲取模體數(shù)據(jù)集，其格式應(yīng)與這些相同來自 TRANSFAC 數(shù)據(jù)庫坚冀。

library(IReNA)
###call Mus musculus motif database
motif1 <- Tranfac201803_Mm_MotifTFsF
###call Homo sapiens motif database
motif1 <- Tranfac201803_Hs_MotifTFsF
###call Zebrafish motif database
motif1 <- Tranfac201803_Zf_MotifTFsF
###call Chicken motif database
motif1 <- Tranfac201803_Ch_MotifTFsF

以下內(nèi)容包含4個部分

IReNA 包含四個主要部分來重建監(jiān)管網(wǎng)絡(luò)：

第 1 部分：分析 scRNA-seq 或bulk RNA-seq 數(shù)據(jù)以獲得基本的調(diào)控關(guān)系
第 2 部分：使用 Fimo（或替代選項：RcisTarget）refine regulatory relaionships（無 ATAC-seq 數(shù)據(jù)）
第 3 部分：分析 ATAC-seq 數(shù)據(jù)以refine regulatory relationships（使用 ATAC-seq 數(shù)據(jù)）
第 4 部分：監(jiān)管網(wǎng)絡(luò)分析和可視化

圖片.png

Part 1: Analyze scRNA-seq or bulk RNA-seq data to get basic regulatory relationships

IReNA 支持三種輸入格式：

scRNA-seq 或bulk RNA-seq 數(shù)據(jù)的raw count济赎。 raw count可以通過 IReNA 中的“l(fā)oad_counts”函數(shù)加載并轉(zhuǎn)換為 Seurat 對象；
包含raw count的 Seurat 對象记某。數(shù)據(jù)加載完畢后司训，IReNA會使用R包monocle計算pseudotime，并將pseudotime添加到Seurat對象的meta數(shù)據(jù)中液南。
meta數(shù)據(jù)中帶有偽時間的 Seurat 對象壳猜。

Parameter ‘gene.use’ in ‘get_pseudotime’ function indicate the genes use to calculate pseudotime, if it’s null, this function will automatically use variable genes calculated by ‘FindVariableFeatures’ function in Seurat package to calculate pseudotime.

###Read seurat_object
seurat_object <- readRDS('seurat_object.rds')
###calculate the pseudotime and return monocle object
monocle_object <- get_pseudotime(seurat_object,gene.use)
###Add pseudotime to the Seurat object
###This function only support monocle object from monocle2
seurat_with_time <- add_pseudotime(seurat_object, monocle_object)

Next, use differentially expressed genes (DEGs) across the pseudotime to refine the seurat object, if you already have identified DEGs, you just need to run subset function in seurat:

### DEGs used here is the character class
seurat_with_time <- subset(seurat_with_time, features = DEGs)

I also recommend ‘differentialGeneTest’ function in monocle to calculate DEGs across pseudotime. DEGs will be used to make expression profile in further analysis. (If you use our test data, you can skip this part, because our test data only contains differentially expressed genes)

### identify DEGs across pseudotime (qvalue < 0.05 and num_cells_expressed > 0.1)
library(monocle)
monocle_object <- detectGenes(monocle_object, min_expr = 3)
monocle_object <- estimateDispersions(monocle_object)
diff1 <- monocle::differentialGeneTest(mo,fullModelFormulaStr = "~Pseudotime",relative_expr = TRUE)
sig_genes <- subset(diff1, qval < 0.05)
sig_genes <- subset(sig_genes, num_cells_expressed > 0.1)
### Use DEGs to refine seurat object
seurat_with_time <- subset(seurat_with_time, features = rownames(sig_genes))

Then, cells are divided into 50 bins across pseudotime. The bin is removed if all genes in this bin have no expression. The gene is filtered if absolute fold change < 0.01 (set by the parameter ‘FC’). Then, genes will be clustered through K-means algorithm (the number of clusters ‘K’ is set by the parameter ‘K1’).
If bulk RNA-seq data are used to identify regulatory relationships, load your expression matrix as expression_profile that generated by get_SmoothByBin_PseudotimeExp(), then run the following codes. I suggest to input expression profile that only contains differentially expressed genes, or you will a huge of time to calculate correlation of each gene pair.

###Get expression profiles ordered by pseudotime
expression_profile <- get_SmoothByBin_PseudotimeExp(seurat_with_time, Bin = 50)
###Filter noise and logFC in expression profile
expression_profile_filter <- fileter_expression_profile(expression_profile, FC=0.01)
###K-means clustering
clustering <- clustering_Kmeans(expression_profile_filter, K1=4)
clustering[1:5,1:5]
##        KmeansGroup FoldChangeQ95     SmExp1     SmExp2     SmExp3
## TCEB3            1      2.395806 -0.2424532 -0.8964990 -0.9124960
## CLK1             1      2.508335 -0.1819044  0.7624798  0.4867972
## MATR3            1      2.700294 -1.4485729  0.7837425  0.3028892
## AKAP11           1      2.415084 -0.6120681 -0.3849580  0.3898393
## HSF2             1      2.528111 -0.8125698 -0.6166004  0.8533309

Visualize the clustering of gene expression profiles through the heatmap

plot_kmeans_pheatmap(clustering,ModuleColor1 = c('#67C7C1','#5BA6DA','#FFBF0F','#C067A9'))

圖片.png

由于Kmeans算法的特點(diǎn)，每次聚類都會得到不同的結(jié)果滑凉。

在第一列中添加基因的Ensmble ID统扳，然后計算每個基因?qū)Φ南嚓P(guān)性并選擇包含至少一個轉(zhuǎn)錄因子且絕對相關(guān)性> 0.4（由參數(shù)'correlation_filter'設(shè)置）的基因?qū)Α?‘correlation_filter’的值取決于數(shù)據(jù)的噪聲，數(shù)據(jù)噪聲越大畅姊，需要越大的‘correlation_filter’來獲得可信的相關(guān)關(guān)系咒钟。建議將 p 值 < 0.05 的相關(guān)性用于參數(shù)“correlation_filter”。

###Add Ensembl ID as the first column of clustering results
Kmeans_clustering_ENS <- add_ENSID(clustering, Spec1='Hs')
Kmeans_clustering_ENS[1:5,1:5]
##                 Symbol KmeansGroup FoldChangeQ95     SmExp1     SmExp2
## ENSG00000011007  TCEB3           1      2.395806 -0.2424532 -0.8964990
## ENSG00000013441   CLK1           1      2.508335 -0.1819044  0.7624798
## ENSG00000015479  MATR3           1      2.700294 -1.4485729  0.7837425
## ENSG00000023516 AKAP11           1      2.415084 -0.6120681 -0.3849580
## ENSG00000025156   HSF2           1      2.528111 -0.8125698 -0.6166004
### Caculate the correlation
motif1 <- Tranfac201803_Hs_MotifTFsF
### for 
regulatory_relationships <- get_cor(Kmeans_clustering_ENS, motif = motif1, correlation_filter = 0.6, start_column = 4)

Part 2: Analyze binding motifs of target genes to refine regulatory relaionships (without ATAC-seq data)

For ATAC-seq data is not available. IReNA uses fimo to check whether motifs of transcription factors that regulate the target gene occur in the upstream of target genes. If motifs of transcription factors that regulate the target gene exist, this gene pair will be retained.

First, get TSS (transcription start site) regions (default is upstream 1000 to downstream 500) of target genes. Gtf file used here is available from http://www.ensembl.org/info/data/ftp/index.html

gtf <- read.delim("D:/Homo_sapiens.GRCh38.104.gtf", header=FALSE, comment.char="#")
gene_tss <- get_tss_region(gtf,rownames(Kmeans_clustering_ENS))
head(gene_tss)
##              gene  chr    start      end
## 1 ENSG00000237973 chr1   630074   631574
## 2 ENSG00000116285 chr1  8027309  8025809
## 3 ENSG00000162441 chr1  9944407  9942907
## 4 ENSG00000116273 chr1  6612731  6614231
## 5 ENSG00000177674 chr1 11735084 11736584
## 6 ENSG00000171608 chr1  9650731  9652231

Then, get the sequence of these TSS regions based on reference genome (reference genome can be download from UCSC database), and use fimo to scan these sequence to check whether the motif of transcription factor that regulates the target gene occur in the promoter regions of the target gene. Because fimo software only supports linux environment, we generate some shell script to run Fimo software.

應(yīng)該設(shè)置以下四個參數(shù)：

refdir：參考基因組路徑
fimodir：Fimo 軟件的路徑若未。如果 Fimo 已經(jīng)設(shè)置為全局環(huán)境變量朱嘴，只需設(shè)置‘fimodir <- fimo’。
outputdir1：shell腳本的輸出路徑和目標(biāo)基因tss區(qū)域的序列粗合。（函數(shù)'find_motifs_targetgenes'會在'outputdir1'路徑下自動生成兩個文件夾（'fasta'和'fimo'）腕够，并存儲目標(biāo)基因tss區(qū)域的序列在“fimo”中的“fasta”和shell腳本中）
Motifdir：motif 文件的路徑，可從 https://github.com/jiang-junyao/MEMEmotif 或 TRANSFAC 數(shù)據(jù)庫下載舌劳。

Please note that, at the end of ‘outputdir1’,‘motifdir’ and ‘sequencedir’, the symbol ‘/’should be contained. What’s more, the chromosome name of your reference genome used here should be the same as the chromosome name in the gene_tss

### run the following codes in the R under linux environment
refdir='/public/home/user/genome/hg38.fa'
fimodir <- 'fimo'
outputdir1 <- '/public/home/user/fimo/'
motifdir <- '/public/home/user/fimo/Mememotif/'
find_motifs_targetgenes(gene_tss,motif1,refdir,fimodir,outputdir1,motifdir)

Then run the following shell codes to activate fimo

### run the following codes in the shell
cd /public/home/user/fimo/fimo
sh fimoall.sh

Then, Fimo result are stored in the ‘fimo’ folder under outputdir1, and we run the following R codes to refine regulatory relationships

motif1 <- Tranfac201803_Hs_MotifTFsF
outputdir <- paste0(outputdir1,'fimo/')
fimo_regulation <- generate_fimo_regulation(outputdir,motif1)
filtered_regulatory_relationships <- filter_regulation_fimo(fimo_regulation, regulatory_relationships)

Part 3: Analyze ATAC-seq data to refine regulatory relationships (with ATAC-seq data)

For ATAC-seq data is available, IReNA calculates the footprints of transcription factors and footprint occupancy score (FOS) to refine regulatory relationships. The footprints whose distance is less than 4 are merged and then the sequence of each footprint is obtained from the reference genome through the function ‘get_merged_fasta’. The reference genome should be in fasta/fa format which can be downloaded from UCSC or other genome database.

###merge footprints whose distance is less than 4
filtered_footprints <- read.table('footprints.bed',sep = '\t')
fastadir <- 'Genome/hg38.fa' 
merged_fasta <- get_merged_fasta(filtered_footprints,fastadir)
write.table(merged_fasta,'merged_footprints.fasta',row.names=F,quote=F,col.names=F)

After obtaining the motif sequences, use fimo software to identify binding motifs in the footprints. Because fimo software only supports linux environment, we generate a shell script to run Fimo software.

First, identify differentially expressed genes related motifs through the function ‘motif_select’, which will reduce the running time of the subsequent analysis process.

Then, you should set the following five parameters:

1. fimodir: path of Fimo software. If Fimo has been set to the global environment variable, just set ‘fimodir <- fimo’.
1. outputdir1: output path of the shell scripts.
1. outputdir: output path of Fimo result.
1. motifdir: path of the motif file, which can be downloaded from https://github.com/jiang-junyao/MEMEmotif or TRANSFAC database.
1. sequencedir: path of the sequence which is generated by the function ‘get_merged_fasta’.

Please note that, at the end of ‘outputdir’ and ‘sequencedir’ the symbol ‘/’should be contained.

### Identify differentially expressed genes related motifs
motif1 <- motifs_select(Tranfac201803_Hs_MotifTFsF, rownames(Kmeans_clustering_ENS)) ###Kmeans_clustering_ENS was obtained in part1
### run find_motifs()
fimodir <- 'fimo'
outputdir1 <- '/public/home/user/fimo/'
outputdir <- '/public/home/user/fimo/output/'
motifdir <- '/public/home/user/fimo/Mememotif/'
sequencedir <- '/public/home/user/fimo/merged_footprints.fasta'
find_motifs(motif1,step=20,fimodir, outputdir1, outputdir, motifdir, sequencedir)

Then run the following shell codes to activate fimo

### run the following commands in the shell
cd /public/home/user/fimo/ 
mkdir output
sh ./fimo_all.sh

Then, we combine these Fimo consequence in ‘outputdir’. Notably outputdir folder should only contain fimo result files. Next, we load the peaks file and overlap differential peaks and motif footprints through overlap_footprints_peaks() function

###Combine all footprints of motifs
combined <- combine_footprints(outputdir)
peaks <- read.delim('differential_peaks.bed')
overlapped <- overlap_footprints_peaks(combined,peaks)

However, the running time of overlap_footprints() is too long, so it’s highly recommanded to use bedtools to do overlap in linux system. If you want to use bedtools to do overlap, you need to output ‘combined_footprints’ dataframe, and transfer it to shell.(If you make analysis in linux system, ignore transfer part)

### output combined_footprints
write.table(combined,'combined.txt',quote = F,row.names = F,col.names = F,sep = '\t')
### Transfer combied.txt and differential_peaks.bed to linux system, and then run the following commands in the shell
bedtools intersect -a combined.txt -b differential_peaks.bed -wa -wb > overlappd.txt

Next, we intergrate bioconductor package ChIPseeker to get footprint-related genes. Before we run get_related_genes(), we need to specify TxDb, which can be download from: Bioconductor TxDb list. Kmeans_clustering_ENS used here was obtained in part1.

### If you make overlap by bedtools, read 'overlapped.txt' to R
overlapped <- read.table('overlapped.txt')
###get footprint-related genes
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
list1 <- get_related_genes(overlapped,txdb = txdb,motif=Tranfac201803_Mm_MotifTFsF,Species = 'Hs')
###Get candidate genes/TFs-related peaks
list2 <- get_related_peaks(list1,Kmeans_clustering_ENS)
### output filtered footprints
write.table(list2[[1]],'filtered_footprints.bed', quote = F, row.names = F, col.names = F, sep = '\t')

Then, because of the size of original BAM file is too large, so we need to use samtools to extract footprints realated regions in BAM to reduce running time of function which analyze bam files in IReNA. (If you use our test data, just skip this step)

### transfer filtered_footprints.bed to linux system and run the following codes
samtools view -hb -L filtered_footprint.bed SSC_patient1.bam > SSC1_filter.bam
samtools view -hb -L filtered_footprint.bed SSC_patient2.bam > SSC2_filter.bam
samtools view -hb -L filtered_footprint.bed esc.bam > esc_filter.bam

此外，通過 wig_track() 計算足跡中每個位置的切割玫荣，并使用這些切割來計算足跡的 FOS甚淡，以識別決定監(jiān)管關(guān)系的豐富 TF。此處使用的regulatory_relationships 是在第1 部分中計算的捅厂。這里使用的參數(shù) FOS_threshold 是過濾低質(zhì)量足跡的閾值贯卦，您可以增加它以減少導(dǎo)出足跡的數(shù)量。

### calculate cuts of each each position in footprints
bamfilepath1 <- 'SSC1_filter.bam'
bamfilepath2 <- 'SSC2_filter.bam'
bamfilepath3 <- 'esc_filter.bam'
cuts1 <- wig_track(bamfilepath = bamfilepath1,bedfile = list2[[1]])
cuts2 <- wig_track(bamfilepath = bamfilepath2,bedfile = list2[[1]])
cuts3 <- wig_track(bamfilepath = bamfilepath3,bedfile = list2[[1]])
wig_list <- list(cuts1,cuts2,cuts3)
### get related genes of footprints with high FOS
potential_regulation <- Footprints_FOS(wig_list,list2[[2]], FOS_threshold = 0.1)
### Use information of footprints with high FOS to refine regulatory relationships
filtered_regulatory <- filter_ATAC(potential_regulation,regulatory_relationships)

Part 4: Regulatory network analysis and visualization

After we get ‘filtered_regulatory_relationships’ and ‘Kmeans_clustering_ENS’, we can reconstruct regulatory network. Run network_analysis() to get regulatory, this function will generate a list which contain the following 9 elements:

(1)Cor_TFs.txt: list of expressed TFs in the gene networks.
(2)Cor_EnTFs.txt: list of TFs which significantly regulate gene modules (or enriched TFs).
(3)FOSF_RegMTF_Cor_EnTFs.txt: regulatory pairs in which the source gene is enriched TF.
(4)FOSF_RegMTF_Cor_EnTFs.txt: regulatory pairs in which both source gene and target gene are enriched TFs.
(5)FOSF_RegMTF_Cor_EnTFs.txt: regulatory pairs only including regulations within each module but not those between modules, in this step
(6)TF_list: enriched TFs which significantly regulate gene modules
(7)TF_module_regulation: details of enriched TFs which significantly regulate gene modules
(8)TF_network: regulatory network for enriched transcription factors of each module
(9)intramodular_network: intramodular regulatory network

Here, we use refined regulatory relationships from part2 to reconstruct regulatory networks

TFs_list <- network_analysis(filtered_regulatory_relationships,Kmeans_clustering_ENS,TFFDR1 = 10,TFFDR2 = 10)

We can also make enrichment analysis for differentially expressed genes in each module. Before you run this function, you need to download the org.db for your species through BiocManager.

### Download Homo sapiens org.db
#BiocManger::install('org.Hs.eg.db')
library(org.Hs.eg.db)
### Enrichment analysis (KEGG)
enrichment_KEGG <- enrich_module(Kmeans_clustering_ENS, org.Hs.eg.db, enrich.db = 'KEGG',organism = 'hsa')
#enrichment_GO <- enrich_module(Kmeans_cluster_ENS, org.Hs.eg.db, 'GO')
head(enrichment_KEGG)
##                ID                 Description module -log10(q-value) GeneRatio
## hsa03010 hsa03010                    Ribosome      1        6.500237    13/101
## hsa03040 hsa03040                 Spliceosome      1        1.964315     9/101
## hsa03022 hsa03022 Basal transcription factors      1        1.964315     5/101
## hsa05016 hsa05016        Huntington's disease      1        1.126355     9/101
## hsa03013 hsa03013               RNA transport      1        1.126355     8/101
## hsa05200 hsa05200          Pathways in cancer      2        4.866522    38/276
##           BgRatio       pvalue     p.adjust       qvalue
## hsa03010  92/5894 3.661614e-09 3.258836e-07 3.160551e-07
## hsa03040 128/5894 3.138210e-04 1.119399e-02 1.085638e-02
## hsa03022  37/5894 3.773255e-04 1.119399e-02 1.085638e-02
## hsa05016 183/5894 3.932733e-03 7.708051e-02 7.475579e-02
## hsa03013 152/5894 4.330366e-03 7.708051e-02 7.475579e-02
## hsa05200 327/5894 1.153409e-07 1.949262e-05 1.359809e-05
##                                                                                                                                                                                                  geneID
## hsa03010                                                                                                                              6122/6143/6206/6194/51187/6135/6161/6167/6159/6166/9045/6136/6191
## hsa03040                                                                                                                                             10992/3312/9775/10569/83443/5356/10594/8449/494115
## hsa03022                                                                                                                                                                       9519/2962/6877/2957/6879
## hsa05016                                                                                                                                                 9519/5978/7019/51164/498/27089/54205/4512/2876
## hsa03013                                                                                                                                                     11218/8761/1983/9775/5042/50628/6612/10921
## hsa05200 8312/2263/2260/6932/7188/3912/6655/331/5595/7976/5296/1021/7473/4790/329/1027/2335/6772/6654/5290/5335/2258/3845/5156/54583/5899/3688/26060/3716/836/6774/8313/5293/3913/6777/5727/5154/112398
##          Count
## hsa03010    13
## hsa03040     9
## hsa03022     5
## hsa05016     9
## hsa03013     8
## hsa05200    38

Moreover, you can do GO analysis

library(org.Hs.eg.db)
### Enrichment analysis (GO)
enrichment_GO <- enrich_module(Kmeans_clustering_ENS, enrich.db = 'GO',org.Hs.eg.db)

You can visualize regulatory networks for enriched transcription factors of each module through plot_network() function by setting type parameter as ‘TF’. This plot shows regulatory relationships between transcription factors in different modules that significantly regulating other modules. The size of each vertex determine the significance of this transcription factor. Yellow edges are positive regulation, grey edges are negative regulation.

plot_tf_network(TFs_list)

圖片.png

可以通過 plot_intramodular_network() 函數(shù)可視化具有豐富功能的模塊內(nèi)網(wǎng)絡(luò)焙贷。在運(yùn)行此功能之前撵割，您可以選擇要在圖中顯示的每個模塊的豐富功能。如果輸入所有豐富的函數(shù)辙芍，該函數(shù)將自動選擇每個模塊中-log10(qvalue) 最高的函數(shù)呈現(xiàn)在圖中啡彬。此外羹与，每個模塊中邊數(shù)最多的轉(zhuǎn)錄因子也將顯示在圖中。

### select functions that you want to present in the figure
enrichment_KEGG_select <- enrichment_KEGG[c(3,7,11),]
### plotting
plot_intramodular_network(TFs_list,enrichment_KEGG_select,layout = 'circle')

圖片.png

It is strongly recommended to use Cytoscape to display the regulatory networks. We provide a function that can provide different Cytoscape styles. You need to install and open Cytoscape before running the function.

###optional: display the network in cytoscape, open cytoscape before running this function
initiate_cy(TFs_list, layout1='degree-circle', type='TF')
initiate_cy(TFs_list, layout1='grid', type='module')

These are the picture we processed through Cytoscape, which can show the regulatory relationship of modularized transcription factors.

Use Cytoscape to visualize regulatory network for enriched transcription factors of each module

圖片.png

Use Cytoscape to visualize intramodular network

圖片.png

Option: Use RcisTarget to refine regulatory relaionships (without ATAC-seq data)

IReNA also provides filter_regulation function (Based on RcisTarget) to refine regulation relationships. Due to the limitations of RcisTarget, this function currently only supports three species (Hs, Mm and Fly). So if the species of your data is not included, and you don’t have ATAC-seq data, you can use unrefined regulatory relaionships to perform part4 analysis directly.

Before run this function, you need to download Gene-motif rankings database from https://resources.aertslab.org/cistarget/ and set the Rankingspath1 as the path of the downloaded Gene-motif rankings database. If you don’t know which database to choose, we suggest that using ‘hg19-500bp-upstream-7species.mc9nr’ for human, using ‘mm9-500bp-upstream-10species.mc9nr’ for mouse, using ‘dm6-5kb-upstream-full-tx-11species.mc8nr’ for fly. You can download it manually, or use R code (If you are in mainland china, i suggest you to download database from website):

### Download Gene-motif rankings database
featherURL <- "https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-tss-centered-10kb-7species.mc9nr.feather"
download.file(featherURL, destfile=basename(featherURL)) # saved in current dir
### Refine regulatory relaionships
Rankingspath1 <- 'hg19-500bp-upstream-7species.mc9nr1.feather' # download from https://resources.aertslab.org/cistarget/
filtered_regulation_Rcis <- filter_regulation(Kmeans_clustering_ENS, regulatory_relationships, 'Hs', Rankingspath1)

生活很好庶灿，有你更好

最后編輯于：2021.11.30 17:23:12

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者

禁止轉(zhuǎn)載纵搁，如需轉(zhuǎn)載請通過簡信或評論聯(lián)系作者。

人面猴
序言：七十年代末往踢，一起剝皮案震驚了整個濱河市腾誉，隨后出現(xiàn)的幾起案子，更是在濱河造成了極大的恐慌峻呕，老刑警劉巖利职，帶你破解...
沈念sama閱讀 216,324評論 6贊 498
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件，死亡現(xiàn)場離奇詭異瘦癌，居然都是意外死亡猪贪，警方通過查閱死者的電腦和手機(jī)，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 92,356評論 3贊 392
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進(jìn)店門佩憾，熙熙樓的掌柜王于貴愁眉苦臉地迎上來哮伟，“玉大人，你說我怎么就攤上這事妄帘±慊疲” “怎么了？”我有些...
開封第一講書人閱讀 162,328評論 0贊 353
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵抡驼，是天一觀的道長鬼廓。經(jīng)常有香客問我，道長致盟，這世上最難降的妖魔是什么碎税？我笑而不...
開封第一講書人閱讀 58,147評論 1贊 292
?港島之戀（遺憾婚禮）
正文為了忘掉前任，我火速辦了婚禮馏锡，結(jié)果婚禮上雷蹂，老公的妹妹穿的比我還像新娘。我一直安慰自己杯道，他們只是感情好匪煌，可當(dāng)我...
茶點(diǎn)故事閱讀 67,160評論 6贊 388
惡毒庶女頂嫁案：這布局不是一般人想出來的
文/花漫我一把揭開白布。她就那樣靜靜地躺著党巾，像睡著了一般萎庭。火紅的嫁衣襯著肌膚如雪。梳的紋絲不亂的頭發(fā)上齿拂，一...
開封第一講書人閱讀 51,115評論 1贊 296
城市分裂傳說
那天驳规，我揣著相機(jī)與錄音，去河邊找鬼署海。笑死吗购，一個胖子當(dāng)著我的面吹牛医男，可吹牛的內(nèi)容都是我干的。我是一名探鬼主播巩搏，決...
沈念sama閱讀 40,025評論 3贊 417
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開眼昨登，長吁一口氣：“原來是場噩夢啊……” “哼！你這毒婦竟也來了贯底？” 一聲冷哼從身側(cè)響起丰辣，我...
開封第一講書人閱讀 38,867評論 0贊 274
萬榮殺人案實(shí)錄
序言：老撾萬榮一對情侶失蹤，失蹤者是張志新（化名）和其女友劉穎禽捆，沒想到半個月后笙什，有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體，經(jīng)...
沈念sama閱讀 45,307評論 1贊 310
?護(hù)林員之死
正文獨(dú)居荒郊野嶺守林人離奇死亡胚想，尸身上長有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點(diǎn)故事閱讀 37,528評論 2贊 332
?白月光啟示錄
正文我和宋清朗相戀三年琐凭，在試婚紗的時候發(fā)現(xiàn)自己被綠了。大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片浊服。...
茶點(diǎn)故事閱讀 39,688評論 1贊 348
活死人
序言：一個原本活蹦亂跳的男人離奇死亡统屈，死狀恐怖，靈堂內(nèi)的尸體忽然破棺而出牙躺，到底是詐尸還是另有隱情愁憔，我是刑警寧澤，帶...
沈念sama閱讀 35,409評論 5贊 343
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布孽拷，位于F島的核電站吨掌，受9級特大地震影響，放射性物質(zhì)發(fā)生泄漏脓恕。R本人自食惡果不足惜膜宋，卻給世界環(huán)境...
茶點(diǎn)故事閱讀 41,001評論 3贊 325
男人毒藥：我在死后第九天來索命
文/蒙蒙一、第九天我趴在偏房一處隱蔽的房頂上張望炼幔。院中可真熱鬧秋茫，春花似錦、人聲如沸乃秀。這莊子的主人今日做“春日...
開封第一講書人閱讀 31,657評論 0贊 22
一樁弒父案，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽环形。三九已至，卻和暖如春衙傀，著一層夾襖步出監(jiān)牢的瞬間抬吟，已是汗流浹背。一陣腳步聲響...
開封第一講書人閱讀 32,811評論 1贊 268
情欲美人皮
我被黑心中介騙來泰國打工统抬，沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留火本，地道東北人危队。一個月前我還...
沈念sama閱讀 47,685評論 2贊 368
代替公主和親
正文我出身青樓，卻偏偏與公主長得像钙畔，于是被迫代替她去往敵國和親茫陆。傳聞我的和親對象是個殘疾皇子，可洞房花燭夜當(dāng)晚...
茶點(diǎn)故事閱讀 44,573評論 2贊 353

10X單細(xì)胞 & 10XATAC聯(lián)合之調(diào)控網(wǎng)絡(luò)分析（IReNA）

好了爱只，開始分享

For users who want to use ATAC-seq data to refine regulatory relationships, Computer or server of linux system and the following software are needed: samtools, bedtools and fimo.

安裝

Workflow

Inputs of IReNA

如果僅使用 scRNA-seq 或大量 RNA-seq 數(shù)據(jù)運(yùn)行 IReNA，則需要以下輸入：(2) scRNA-seq 或bulk RNA-seq 數(shù)據(jù)的Raw counts和 (3) 物種的參考基因組 (4) GTF 文件 (5) 轉(zhuǎn)錄因子的Motif information幻馁。

If both ATAC-seq data and scRNA-seq data are used to perform IReNA, the following files are needed: (1) Bam, peak and footprint files of ATAC-seq data (2) Raw counts of scRNA-seq or bulk RNA-seq data (3) Reference genome of the species (5) Motif information of transcription factors.

以下內(nèi)容包含4個部分

IReNA 包含四個主要部分來重建監(jiān)管網(wǎng)絡(luò)：

Part 1: Analyze scRNA-seq or bulk RNA-seq data to get basic regulatory relationships

IReNA 支持三種輸入格式：

Next, use differentially expressed genes (DEGs) across the pseudotime to refine the seurat object, if you already have identified DEGs, you just need to run subset function in seurat:

Visualize the clustering of gene expression profiles through the heatmap

Part 2: Analyze binding motifs of target genes to refine regulatory relaionships (without ATAC-seq data)

應(yīng)該設(shè)置以下四個參數(shù)：

Please note that, at the end of ‘outputdir1’,‘motifdir’ and ‘sequencedir’, the symbol ‘/’should be contained. What’s more, the chromosome name of your reference genome used here should be the same as the chromosome name in the gene_tss

Then run the following shell codes to activate fimo

Then, Fimo result are stored in the ‘fimo’ folder under outputdir1, and we run the following R codes to refine regulatory relationships

Part 3: Analyze ATAC-seq data to refine regulatory relationships (with ATAC-seq data)

Please note that, at the end of ‘outputdir’ and ‘sequencedir’ the symbol ‘/’should be contained.

Then run the following shell codes to activate fimo

Next, we intergrate bioconductor package ChIPseeker to get footprint-related genes. Before we run get_related_genes(), we need to specify TxDb, which can be download from: Bioconductor TxDb list. Kmeans_clustering_ENS used here was obtained in part1.

Part 4: Regulatory network analysis and visualization

We can also make enrichment analysis for differentially expressed genes in each module. Before you run this function, you need to download the org.db for your species through BiocManager.

It is strongly recommended to use Cytoscape to display the regulatory networks. We provide a function that can provide different Cytoscape styles. You need to install and open Cytoscape before running the function.

Use Cytoscape to visualize intramodular network

Option: Use RcisTarget to refine regulatory relaionships (without ATAC-seq data)

推薦閱讀更多精彩內(nèi)容