RcisTarget || 從基因列表到調控網(wǎng)絡

單細胞技術正在加強我們對細胞本身的理解颤霎。什么是細胞類型，不就是基因選擇性表達的結果嗎？而基因的選擇表達受到一系列的轉錄調控扒俯，在這個意義上奶卓，細胞命運背后的驅動力在于各自轉錄因子表達的程序化及其靶標基因。有的細胞類型里面是不是有其特異的轉錄因子呢撼玄？

我們不止一次提過夺姑，細胞的特征由其表達的基因描述。

所以在單細胞數(shù)據(jù)分析中掌猛，比較重要的一步就是拿到某一群細胞的差異基因（一般是 one to others的方法）盏浙。然后，由這個基因集來推斷細胞的特征荔茬，而基因的表達受到轉錄因子的調節(jié)废膘。不同的轉錄因子調控不同的基因，進而使細胞表現(xiàn)出不同的狀態(tài)或類型兔院。進一步地說殖卑，細胞保持其狀態(tài)又需要表觀遺傳信號來維持。拿到基因集之后坊萝，我們要描述該亞群的特點就轉變?yōu)榛蚣奶卣鞣趸１疚奈覀兝^續(xù)跟著SCENIC的內核，來基于基因集分析轉錄因子（Transcription factor 十偶，TF）菩鲜。在我們應用工具的時候，往往困難的不是如何使用工具惦积，而是為什么要用接校？

今天我們介紹的RcisTarget是一個R-package用于識別基因列表中轉錄因子(TF)的結合基序（binding motifs ）。首先我們獲得一個基因列表：

library(RcisTarget)
library(Seurat)
library(SeuratData)
library(tidyverse)

mk <-FindAllMarkers( pbmc3k.final)

 mk %>% filter(cluster == 'B') %>% top_n(35,avg_logFC) %>% head()
                    p_val avg_logFC pct.1 pct.2     p_val_adj cluster      gene
CD79A.3      0.000000e+00  2.987583 0.936 0.041  0.000000e+00       B     CD79A
MS4A1.3      0.000000e+00  2.341555 0.855 0.053  0.000000e+00       B     MS4A1
CD79B.3     2.655974e-274  2.413475 0.916 0.142 3.642403e-270       B     CD79B
LINC00926.1 2.397625e-272  1.970393 0.564 0.009 3.288103e-268       B LINC00926
TCL1A.3     9.481783e-271  2.489493 0.622 0.022 1.300332e-266       B     TCL1A
HLA-DQA1.2  2.942395e-266  2.119572 0.890 0.118 4.035201e-262       B  HLA-DQA1

(mk %>% filter(cluster == 'B') %>% top_n(35,avg_logFC))$gene   -> Bcell
(mk %>% filter(cluster == 'Naive CD4 T') %>% top_n(35,avg_logFC))$gene   -> NaiveCD4T


geneLists <- list('Bcell'=Bcell,'NaiveCD4T'=NaiveCD4T)
geneLists
geneLists
$Bcell
 [1] "CD79A"     "MS4A1"     "CD79B"     "LINC00926" "TCL1A"     "HLA-DQA1"  "VPREB3"   
 [8] "HLA-DQB1"  "CD74"      "HLA-DRA"   "FCER2"     "HLA-DPB1"  "BANK1"     "HLA-DRB1" 
[15] "HLA-DPA1"  "TSPAN13"   "HLA-DQA2"  "FCRLA"     "CD37"      "HLA-DRB5"  "HVCN1"    
[22] "PKIG"      "HLA-DOB"   "BLK"       "HLA-DMA"   "GNG7"      "CD72"      "HLA-DMB"  
[29] "P2RX5"     "IGLL5"     "EAF2"      "PPAPDC1B"  "IRF8"      "SMIM14"    "IGJ"      

$NaiveCD4T
 [1] "RPS12"     "RPS27"     "RPS6"      "RPS25"     "RPL9"      "RPL31"     "RPS3A"    
 [8] "LDHB"      "RPS27A"    "MALAT1"    "RPS13"     "RPS29"     "CCR7"      "CD3D"     
[15] "NPM1"      "CD3E"      "NOSIP"     "LEF1"      "PRKCQ-AS1" "PIK3IP1"   "JUNB"     
[22] "TMEM66"    "FHIT"      "CD7"       "IL7R"      "MAL"       "LINC00176" "TCF7"     
[29] "LDLRAP1"   "C6orf48"   "LEPROTL1"  "NDFIP1"    "C12orf57"  "CD8B"      "DNAJB1"

我們知道這B 細胞和 NaiveCD4T高度相關的兩個基因集狮崩，里面有我們熟悉的B 細胞的marker 蛛勉。

我們要用RcisTarget 來預測轉錄因子，就要知道這個是什么睦柴？首先是英語詞根（cis-）是順的意思诽凌。也就是在我們的討論體系里面是順式轉錄調控因子（cisTarget），這個是什么坦敌？

真核生物轉錄起始過程十分復雜侣诵，往往需要多種蛋白因子的協(xié)助，轉錄因子與RNA聚合酶Ⅱ形成轉錄起始復合體狱窘，共同參與轉錄起始的過程杜顺。根據(jù)轉錄因子的作用特點可分為二類；第一類為普遍轉錄因子,它們與RNA聚合酶Ⅱ共同組成轉錄起始復合體時蘸炸，轉錄才能在正確的位置開始躬络。除TFⅡD以外，還發(fā)現(xiàn)TFⅡA搭儒，TFⅡB洗鸵，TFⅡF越锈，TFⅡE，TFⅡH等膘滨，它們在轉錄起始復合體組裝的不同階段起作用甘凭。第二類轉錄因子為組織細胞特異性轉錄因子，這些TF是在特異的組織細胞或是受到一些類固醇激素\生長因子或其它刺激后火邓，開始表達某些特異蛋白質分子時丹弱，才需要的一類轉錄因子。

順式作用元件(cis-acting element)存在于基因旁側序列中能影響基因表達的序列铲咨。順式作用元件包括啟動子躲胳、增強子、調控序列和可誘導元件等纤勒，它們的作用是參與基因表達的調控坯苹。順式作用元件本身不編碼任何蛋白質，僅僅提供一個作用位點摇天，要與反式作用因子相互作用而起作用粹湃。

作為獨立的R程序包，RcisTarget使用特有的數(shù)據(jù)庫泉坐。在運行RcisTarget之前为鳄，您需要下載并安裝相關數(shù)據(jù)庫：RcisTarget需要兩種數(shù)據(jù)庫來分析基因list:

1。Gene-motif rankings::提供每個motif的所有基因的排名(~得分)腕让。
2孤钦。轉錄因子的motif注釋。

每對基因基序的得分可以用不同的參數(shù)來進行纯丸。因此偏形，我們提供多個數(shù)據(jù)庫(motif-rankings)，根據(jù)以下幾種可能性:

Species: Species of the input gene set. Available values: Human (Homo sapiens), mouse (Mus musculus) or fly (Drosophila melanogaster)
Scoring/search space: determine the search space around the transcription start site (TSS). Available values: 500 bp uptream the TSS, 5kbp or 10kbp around the TSS (e.g. 10kbp upstream and 10kbp downstream of the TSS).
Number of orthologous species taken into account to score the motifs (e.g. conservation of regions). Available values: 7 or 10 species

具體的數(shù)據(jù)庫在：https://resources.aertslab.org/cistarget/

作為一個例子觉鼻，可以在R里面這樣下載：

featherURL <- "https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-tss-centered-10kb-7species.mc9nr.feather" 
download.file(featherURL, destfile=basename(featherURL)) # saved in current dir

當然這個不容易成功俊扭。

RcisTarget執(zhí)行的所有計算都是基于motif。然而滑凉，大多數(shù)用戶會對TFs可能調節(jié)基因集感興趣统扳。motif與轉錄因子的關聯(lián)在一個獨立的文件中提供喘帚。除了被源數(shù)據(jù)所注釋的基序(即直接注釋)外畅姊，我們還根據(jù)基序相似性和基因序列(例如與被注釋基序的其他基因的相似性)推斷出了一些進一步的注釋，即Motif 的注釋文件吹由。

下面用一個小的數(shù)據(jù)庫來演示一下若未。除了完整的數(shù)據(jù)庫(20k motifs)外，這是一個包含4.6k motifs的子集(人類:rcistarget .hg19. motifdb . cisbpont .500bp)倾鲫。這些子集可用于示范目的粗合。他們將為現(xiàn)有的主題提供相同的AUC評分萍嬉。但是，為了得到更準確的結果隙疚，我們強烈推薦使用完整版本(~20k motif)壤追，因為motif的歸一化富集評分(NES)取決于數(shù)據(jù)庫中motif的總數(shù)。

# From Bioconductor
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("RcisTarget.hg19.motifDBs.cisbpOnly.500bp") 

library(RcisTarget.hg19.motifDBs.cisbpOnly.500bp)
# Rankings
data(hg19_500bpUpstream_motifRanking_cispbOnly)
motifRankings <- hg19_500bpUpstream_motifRanking_cispbOnly
motifRankings

Rankings for RcisTarget.
  Organism: human (hg19)
  Number of genes: 22284 (22285 available in the full DB)
  Number of MOTIFS: 4687
** This database includes rankings up to 5050

Subset (4.6k cisbp motifs) of the Human database scoring motifs 500bp upstream the TSS (hg19-500bp-upstream-7species.mc9nr)

motif 注釋文件

data(motifAnnotations_hgnc)
 motifAnnotations_hgnc[1:4,1:4]
              motif     TF directAnnotation inferred_Orthology
1:   bergman__Abd-B  HOXA9            FALSE               TRUE
2:    bergman__Aef1   ZNF8            FALSE               TRUE
3:     bergman__Cf2 ZNF853            FALSE               TRUE
4: bergman__EcR_usp  NR1H2            FALSE               TRUE


head(motifAnnotations_hgnc[(directAnnotation==TRUE) 
+                       & 
+                           (TF %in% c("FOXP3")), ])
                             motif    TF directAnnotation inferred_Orthology
1:                    cisbp__M3285 FOXP3             TRUE              FALSE
2:                    cisbp__M3286 FOXP3             TRUE              FALSE
3:                    cisbp__M5474 FOXP3             TRUE              FALSE
4:                    cisbp__M6249 FOXP3             TRUE              FALSE
5: hocomoco__FOXP3_HUMAN.H11MO.0.D FOXP3             TRUE              FALSE
6:      swissregulon__hs__FOXP3.p2 FOXP3             TRUE              FALSE
   inferred_MotifSimil annotationSource                description
1:               FALSE directAnnotation gene is directly annotated
2:               FALSE directAnnotation gene is directly annotated
3:               FALSE directAnnotation gene is directly annotated
4:               FALSE directAnnotation gene is directly annotated
5:               FALSE directAnnotation gene is directly annotated
6:               FALSE directAnnotation gene is directly annotated

一旦加載了基因列表和數(shù)據(jù)庫供屉，cisTarget()就可以使用它們行冰。cisTarget()運行按順序執(zhí)行的步驟
(1)motif 富集分析,
(2)motif-TF注釋,和
(3)選擇的重要基因。

motifEnrichmentTable_wGenes <- cisTarget(geneLists, 
         motifRankings,
         motifAnnot=motifAnnotations_hgnc)
motifEnrichmentTable_wGenes_wLogo <- addLogo(motifEnrichmentTable_wGenes)

resultsSubset <- motifEnrichmentTable_wGenes_wLogo[1:10,]

library(DT)
datatable(resultsSubset[,-c("enrichedGenes", "TF_lowConf"), with=FALSE], 
          escape = FALSE, # To show the logo
          filter="top", options=list(pageLength=5))

也可以將這些步驟作為單獨的命令分別運行伶丐。對用戶感興趣的一個輸出進行分析悼做，或者優(yōu)化工作流以在多個基因列表上運行它。我們將分步驟來演示哗魂。

motif 富集分析

估計每個基序在基因集上的過表達（over-representation）的第一步是計算每對基序-基序集的曲線下面積(AUC)肛走。這是根據(jù)基因集對基序排序的恢復曲線計算的(根據(jù)基序在其鄰近度上的得分，基因依次遞減录别，如motifRanking數(shù)據(jù)庫中提供的那樣)朽色。

motifs_AUC <- calcAUC(geneLists, motifRankings, nCores=1)
(motifs_AUC )


AUC for 2 gene-sets and 4687 motifs.
  Organism: human (hg19)
  Subset (4.6k cisbp motifs) of the Human database scoring motifs 500bp upstream the TSS (hg19-500bp-upstream-7species.mc9nr)

AUC matrix preview:
          cisbp__M4240 cisbp__M5090 cisbp__M0288 cisbp__M4475 cisbp__M1434
Bcell       0.01490239    0.0476514  0.012773475   0.06948408   0.02396159
NaiveCD4T   0.00000000    0.0000000  0.009829234   0.01775604   0.03664447

AUC是由geneset提供的圖形矩陣。原則上庶灿，AUC主要是作為下一步的輸入纵搁。然而，也有可能探索分數(shù)的分布往踢，例如:

par(mfrow = c(1,2))

for(i in c("Bcell",'NaiveCD4T')){
auc <- getAUC(motifs_AUC)[i,]
hist(auc, main=i, xlab="AUC histogram",
     breaks=100, col="#ff000050", border="darkred")
nes3 <- (3*sd(auc)) + mean(auc)
abline(v=nes3, col="red")
}

motif-TF注釋

顯著性motif的選擇是基于歸一化富集評分( Normalized Enrichment Score,NES)進行的腾誉。每個motif的NES是根據(jù)基因集中所有基序的AUC分布[(x-mean)/sd]計算。那些通過給定閾值(默認為3.0)的motifs 被認為是重要的峻呕。

motifEnrichmentTable <- addMotifAnnotation(motifs_AUC, nesThreshold=3,
                                           motifAnnot=motifAnnotations_hgnc,
                                           highlightTFs=list(Bcell="RFX5",NaiveCD4T='ZNF274'))
head(motifEnrichmentTable[,-"TF_lowConf", with=FALSE])
   geneSet        motif   NES   AUC highlightedTFs TFinDB                 TF_highConf
1:   Bcell cisbp__M4510 13.80 0.281           RFX5     **   RFX5 (directAnnotation). 
2:   Bcell cisbp__M4546 13.10 0.267           RFX5     **   RFX5 (directAnnotation). 
3:   Bcell cisbp__M4476 12.60 0.258           RFX5     **   RFX5 (directAnnotation). 
4:   Bcell cisbp__M0830  7.02 0.150           RFX5                                   
5:   Bcell cisbp__M4575  6.21 0.134           RFX5     **   RFX5 (directAnnotation). 
6:   Bcell cisbp__M6544  5.83 0.127           RFX5        HIVEP1 (directAnnotation).

找出每個基序富集最佳的基因

從RcisTarget搜索motif 的富集結果利职，找到一個motif 是“enriched”并不意味著所有的gene-list的motif 高分。這樣瘦癌，工作流程的第三步是確定哪些基因(基因集)對每個重要的motifs是高度排序的猪贪。有兩種方法來識別這些基因:(1)相當于那些用于iRegulon和i-cisTarget(method=“iCisTarget”方法,如果運行時間不是問題，建議使用該方法),和(2)更快的一種是實現(xiàn)基于一個近似使用平均分布在每一個等級(method=“aprox”方法,掃描多個基因集)讯私。

motifEnrichmentTable_wGenes <- addSignificantGenes(motifEnrichmentTable,
                      rankings=motifRankings, 
                      geneSets=geneLists)

motifEnrichmentTable_wGenes[1:4,]
   geneSet        motif   NES   AUC highlightedTFs TFinDB               TF_highConf
1:   Bcell cisbp__M4510 13.80 0.281           RFX5     ** RFX5 (directAnnotation). 
2:   Bcell cisbp__M4546 13.10 0.267           RFX5     ** RFX5 (directAnnotation). 
3:   Bcell cisbp__M4476 12.60 0.258           RFX5     ** RFX5 (directAnnotation). 
4:   Bcell cisbp__M0830  7.02 0.150           RFX5                                 
                                                                                             TF_lowConf nEnrGenes rankAtMax
1: RFX1; RFX2; RFX3 (inferredBy_MotifSimilarity). RFX6; RFX7 (inferredBy_MotifSimilarity_n_Orthology).         10       184
2:                                                      RFX6 (inferredBy_MotifSimilarity_n_Orthology).         10       184
3:                                                      RFX6 (inferredBy_MotifSimilarity_n_Orthology).         10       324
4:                                                                                                             16      3680
                                                                                                                    enrichedGenes
1:                                              CD74;HLA-DMA;HLA-DMB;HLA-DOB;HLA-DPA1;HLA-DPB1;HLA-DQB1;HLA-DRA;HLA-DRB1;HLA-DRB5
2:                                              CD74;HLA-DMA;HLA-DMB;HLA-DOB;HLA-DPA1;HLA-DPB1;HLA-DQB1;HLA-DRA;HLA-DRB1;HLA-DRB5
3:                                              CD74;HLA-DMA;HLA-DMB;HLA-DOB;HLA-DPA1;HLA-DPB1;HLA-DQB1;HLA-DRA;HLA-DRB1;HLA-DRB5
4: BANK1;EAF2;HLA-DMA;HLA-DMB;HLA-DOB;HLA-DPA1;HLA-DPB1;HLA-DQA1;HLA-DQA2;HLA-DQB1;HLA-DRA;HLA-DRB1;HLA-DRB5;IGJ;PPAPDC1B;TSPAN13

geneSetName <- "Bcell"
selectedMotifs <-sample(motifEnrichmentTable$motif, 3)
par(mfrow=c(2,2))
getSignificantGenes(geneLists[[geneSetName]], 
                    motifRankings,
                    signifRankingNames=selectedMotifs,
                    plotCurve=TRUE, maxRank=5000, genesFormat="none",
                    method="aprox")

圖中紅線為各motif的恢復曲線平均值热押，綠線為平均值+標準差，藍線為當前motif的恢復曲線斤寇。當前motif與綠色曲線的最大距離點(mean+sd)桶癣，為選擇的最大富集等級。所有等級較低的基因都被認為是富集的娘锁。

RcisTarget的最終輸出是一個包含以下領域中基序富集及其注釋信息的表:


    geneSet: Name of the gene set
    motif: ID of the motif
    NES: Normalized enrichment score of the motif in the gene-set
    AUC: Area Under the Curve (used to calculate the NES)
    TFinDB: Indicates whether the highlightedTFs are included within the high confidence annotation (two asterisks) or low confidence annotation (one asterisk).
    TF_highConf: Transcription factors annotated to the motif according to ‘motifAnnot_highConfCat’.
    TF_lowConf: Transcription factors annotated to the motif according to ‘motifAnnot_lowConfCat’.
    enrichedGenes: Genes that are highly ranked for the given motif.
    nErnGenes: Number of genes highly ranked
    rankAtMax: Ranking at the maximum enrichment, used to determine the number of enriched genes.

motifEnrichmentTable_wGenes_wLogo <- addLogo(motifEnrichmentTable_wGenes)

resultsSubset <- motifEnrichmentTable_wGenes_wLogo[1:10,]

library(DT)
datatable(resultsSubset[,-c("enrichedGenes", "TF_lowConf"), with=FALSE], 
          escape = FALSE, # To show the logo
          filter="top", options=list(pageLength=5))

TFs注釋富集的motif

注意牙寞，TFs是基于提供的motif注釋。它們可以作為選擇相關motif或對某些TFs進行排序的參考，但motif注釋并不意味著表中出現(xiàn)的所有TFs都對基因列表進行調控间雀。

anotatedTfs  <- lapply(split(motifEnrichmentTable_wGenes$TF_highConf,
                            motifEnrichmentTable$geneSet),
                     function(x) {
                           genes <- gsub(" \\(.*\\). ", "; ", x, fixed=FALSE)
                          genesSplit <- unique(unlist(strsplit(genes, "; ")))
                         return(genesSplit)
                      })
anotatedTfs  
$Bcell
 [1] "RFX5"   "HIVEP1" "RELA"   "NFKB1"  "GTF3A"  "ZXDA"   "ZXDB"   "ZXDC"   "EBF1"   "REL"    "RREB1"  "ZNF821" "SOX21"  "SOX9"  
[15] "GFI1B"  "NFKB2"  "PPARA"  "SMAD3"  "PRDM4"  "STAT3"  "NFYA"   "SMAD4"  "NKX2-1" "SRY"    "ETV5"   "MECOM"  "POU2F1" "HIVEP2"
[29] "HIVEP3" "ZNF831" "POU2F2" "SOX3"   "TBX3"   "DUX4"   "TBX1"   "SOX2"  

$NaiveCD4T
 [1] "ERG"    "ETS1"   "ERF"    "FLI1"   "GABPA"  "ELK4"   "ELF1"   "ETV2"   "ETV6"   "FEV"    "ELK1"   "ETV5"   "ETV3"   "ELK3"  
[15] "HSF1"   "ETV1"   "ETV4"   "FOXN1"  "FOXN2"  "FOXN3"  "FOXN4"  "FOXR1"  "FOXR2"  "NR2C2"  "ETV7"   "ELF2"   "ELF4"   "ESRRG" 
[29] "FOXO3"  "SP2"    "ESRRA"  "LEF1"   "TCF7"   "TCF7L1" "TCF7L2" "ZNF274"

到這里悔详，兩種細胞類型的基因集轉化為相應的轉錄因子集，也就是由基因表達層面轉到了轉錄調控層面惹挟。找到轉錄因子之后茄螃，我們可以在jaspar上面來查看進一步的信息。為什么Bcell 是這些轉錄因子连锯，而T 又是那些呢责蝠？關于人類轉錄因子，不得不提2018年的一篇綜述文章：

所以我們看到的大部分轉錄因子字母縮寫不認識可以或者不知道有什么關系萎庭，是因為我們的背景知識不夠扎實啊霜医。在這篇綜述中：

This review considers how TFs are identified and functionally characterized, principally through the lens of a catalog of over 1,600 likely human TFs and binding motifs for two-thirds of them.

創(chuàng)建網(wǎng)絡

signifMotifNames <- motifEnrichmentTable$motif[1:5]

incidenceMatrix <- getSignificantGenes(geneLists$Bcell, 
                                       motifRankings,
                                       signifRankingNames=signifMotifNames,
                                       plotCurve=TRUE, maxRank=5000, 
                                       genesFormat="incidMatrix",
                                       method="aprox")$incidMatrix
incidenceMatrix
             BANK1 BLK CD37 CD72 CD74 CD79A CD79B EAF2 FCER2 FCRLA GNG7 HLA-DMA HLA-DMB HLA-DOB HLA-DPA1 HLA-DPB1 HLA-DQA1
cisbp__M4510     0   0    0    0    1     0     0    0     0     0    0       1       1       1        1        1        0
cisbp__M4546     0   0    0    0    1     0     0    0     0     0    0       1       1       1        1        1        0
cisbp__M4476     0   0    0    0    1     0     0    0     0     0    0       1       1       1        1        1        0
cisbp__M0830     1   0    0    0    0     0     0    1     0     0    0       1       1       1        1        1        1
cisbp__M4575     0   0    0    0    1     0     0    0     0     0    0       1       1       1        1        1        1
             HLA-DQA2 HLA-DQB1 HLA-DRA HLA-DRB1 HLA-DRB5 HVCN1 IGJ IGLL5 IRF8 MS4A1 P2RX5 PKIG PPAPDC1B TCL1A TSPAN13 VPREB3
cisbp__M4510        0        1       1        1        1     0   0     0    0     0     0    0        0     0       0      0
cisbp__M4546        0        1       1        1        1     0   0     0    0     0     0    0        0     0       0      0
cisbp__M4476        0        1       1        1        1     0   0     0    0     0     0    0        0     0       0      0
cisbp__M0830        1        1       1        1        1     0   1     0    0     0     0    0        1     0       1      0
cisbp__M4575        0        1       1        1        1     0   0     0    0     0     0    0        1     0       0      0

看到incidenceMatrix 面熟嗎？我們可以用igraph直接構建網(wǎng)絡啊:

library(igraph)
mugh<- graph_from_incidence_matrix(incidenceMatrix, directed = F)
mugh

IGRAPH 7199158 UN-B 38 58 -- 
+ attr: type (v/l), name (v/c)
+ edges from 7199158 (vertex names):
 [1] cisbp__M4510--CD74     cisbp__M4510--HLA-DMA  cisbp__M4510--HLA-DMB  cisbp__M4510--HLA-DOB  cisbp__M4510--HLA-DPA1
 [6] cisbp__M4510--HLA-DPB1 cisbp__M4510--HLA-DQB1 cisbp__M4510--HLA-DRA  cisbp__M4510--HLA-DRB1 cisbp__M4510--HLA-DRB5
[11] cisbp__M4546--CD74     cisbp__M4546--HLA-DMA  cisbp__M4546--HLA-DMB  cisbp__M4546--HLA-DOB  cisbp__M4546--HLA-DPA1
[16] cisbp__M4546--HLA-DPB1 cisbp__M4546--HLA-DQB1 cisbp__M4546--HLA-DRA  cisbp__M4546--HLA-DRB1 cisbp__M4546--HLA-DRB5
[21] cisbp__M4476--CD74     cisbp__M4476--HLA-DMA  cisbp__M4476--HLA-DMB  cisbp__M4476--HLA-DOB  cisbp__M4476--HLA-DPA1
[26] cisbp__M4476--HLA-DPB1 cisbp__M4476--HLA-DQB1 cisbp__M4476--HLA-DRA  cisbp__M4476--HLA-DRB1 cisbp__M4476--HLA-DRB5
[31] cisbp__M0830--BANK1    cisbp__M0830--EAF2     cisbp__M0830--HLA-DMA  cisbp__M0830--HLA-DMB  cisbp__M0830--HLA-DOB 
[36] cisbp__M0830--HLA-DPA1 cisbp__M0830--HLA-DPB1 cisbp__M0830--HLA-DQA1 cisbp__M0830--HLA-DQA2 cisbp__M0830--HLA-DQB1
+ ... omitted several edges

更改一些屬性

V(mugh)[c(1:5)]$color  = 'red'
V(mugh)[c(6:38)]$color ='green'



E(mugh)[grep("cisbp__M4510",as_ids(E(mugh)))]$color <- "red"
E(mugh)[grep("cisbp__M4546",as_ids(E(mugh)))]$color <- "blue"
E(mugh)[grep("cisbp__M0830",as_ids(E(mugh)))]$color <- "green"
E(mugh)[grep("cisbp__M4575",as_ids(E(mugh)))]$color <- "yellow"
E(mugh)[grep("cisbp__M4476",as_ids(E(mugh)))]$color <- "pink"

畫出各種布局的調控網(wǎng)絡

layouts <- grep("^layout_", ls("package:igraph"), value=TRUE)[-1] 
# Remove layouts that do not apply to our graph.
layouts <- layouts[!grepl("bipartite|merge|norm|sugiyama|tree", layouts)]

length(layouts)
par(mfrow=c(3,5), mar=c(1,1,1,1))
for (layout in layouts) {
    print(layout)
    l <- do.call(layout, list(mugh)) 
    plot(mugh, edge.arrow.mode=0, layout=l, main=layout) }

我們看到有些基因與所有的motif都沒有鏈接驳规，我們把它們去掉肴敛。

library(reshape2)
edges <- melt(incidenceMatrix)
edges <- edges[which(edges[,3]==1),1:2]
colnames(edges) <- c("from","to")


library(visNetwork)
motifs <- unique(as.character(edges[,1]))
genes <- unique(as.character(edges[,2]))
nodes <- data.frame(id=c(motifs, genes),   
                    label=c(motifs, genes),    
                    title=c(motifs, genes), # tooltip 
                    shape=c(rep("diamond", length(motifs)), rep("elypse", length(genes))),
                    color=c(rep("purple", length(motifs)), rep("skyblue", length(genes))))
visNetwork(nodes, edges) %>% visOptions(highlightNearest = TRUE, 
                                        nodesIdSelection = TRUE)

Lambert SA, Jolma A, Campitelli LF, Das PK, Yin Y, Albu M, Chen X, Taipale J, Hughes TR, Weirauch MT. The Human Transcription Factors. Cell. 2018;175:598–9.
Cell-type–specific || 單細胞文章新范式
 細胞身份何以在分裂中得以保持？
https://www.affixes.org/alpha/c/cis-.html
https://www.cnblogs.com/wangprince2017/p/9813731.html
https://zhuanlan.zhihu.com/p/156613539
https://cloud.tencent.com/developer/article/1376739
https://www.cell.com/cell/fulltext/S0092-8674(18)30106-5