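Background: Human and mouse research have progressed at different rates. Functional and annotation studies of human genes go deeper, and some databases only provide human annotations. In addition, studies commonly use mouse models for validation. Both situations require converting gene names / IDs between the two species.
The usual ways to convert genes are outlined below:
Converting a specific set of gene IDs/names: the R package biomaRt
Converting genome-wide IDs/names: download the mapping file directly from Ensembl and convert with it
Another interesting R package (for querying annotations of model-organism genes across the major databases): AnnotationDbi
List of orthologous gene databases: List of orthology databases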
1. Using the R package biomaRt
Install the biomaRt package:
library("BiocManager")
BiocManager::install("biomaRt")
library("biomaRt")
listMarts()
##                biomart                version
## 1 ENSEMBL_MART_ENSEMBL      Ensembl Genes 106
## 2   ENSEMBL_MART_MOUSE      Mouse strains 106
## 3     ENSEMBL_MART_SNP  Ensembl Variation 106
## 4 ENSEMBL_MART_FUNCGEN Ensembl Regulation 106
Converting mouse genes to human genes:
library("biomaRt")
human = useEnsembl(biomart="ensembl", dataset = "hsapiens_gene_ensembl")
mouse = useEnsembl(biomart="ensembl", dataset = "mmusculus_gene_ensembl")
# Basic function to convert mouse to human gene names
convertMouseGeneList <- function(x){
  genesV2 <- getLDS(attributes = c("mgi_symbol"), filters = "mgi_symbol", values = x, mart = mouse, attributesL = c("hgnc_symbol"), martL = human, uniqueRows = TRUE)
  # Column 2 holds the linked human (HGNC) symbols; keep each one once
  humanx <- unique(genesV2[, 2])
  # Return the unique human gene symbols that were found
  return(humanx)
}
## Test
musGenes <- c("Hmmr", "Tlx3", "Cpeb4")
convertMouseGeneList(musGenes)
## [1] "HMMR" "CPEB4" "TLX3"
# Put the genes to convert in a file and read them in
mmu_genes <- read.table("Gene.mmu", header = TRUE, sep = "\t")
head(mmu_genes$Gene)
## [1] "Xkr4"    "Gm1992"  "Gm19938" "Rp1"     "Sox17"   "Gm37587"
Passing the full gene list from the file to convertMouseGeneList() fails with an error:
##Error: biomaRt has encountered an unexpected server error.
##Consider trying one of the Ensembl mirrors (for more details look at ?useEnsembl)
Converting human genes to mouse genes:
hsa <- read.table("hsa.raw", header = TRUE, sep = "\t")
head(hsa)
##     Gene
## 1   Xkr4
## 2 Gm1992
convertHumanGeneList <- function(x){
  genesV2 <- getLDS(attributes = c("hgnc_symbol"), filters = "hgnc_symbol", values = x, mart = human, attributesL = c("mgi_symbol"), martL = mouse, uniqueRows = TRUE)
  # Column 2 holds the linked mouse (MGI) symbols; keep each one once
  mousex <- unique(genesV2[, 2])
  # Return the unique mouse gene symbols that were found
  return(mousex)
}
humGenes <- hsa$Gene
convertHumanGeneList(humGenes)
## Error: biomaRt has encountered an unexpected server error.
##Consider trying one of the Ensembl mirrors (for more details look at ?useEnsembl)
These attempts show that converting a partial list of gene names works well, whereas converting the gene names of the whole genome still fails. For a discussion of possible fixes, see: link.
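Since short lists convert fine and the whole-genome list does not, one workaround worth trying is to send the genes in smaller batches. The following is only a minimal sketch under that assumption, not a confirmed fix for the server error: `convert_in_batches` and `batch_size` are hypothetical names introduced here, and `toupper` stands in for the real converter so that the example runs without a network connection.

```r
# Hypothetical helper: run a converter function over a gene list in batches.
# convert_fn is any function that takes a character vector of gene names
# and returns a character vector (for example convertMouseGeneList).
convert_in_batches <- function(genes, convert_fn, batch_size = 500) {
  # Drop missing/empty entries and duplicates before querying
  genes <- unique(genes[!is.na(genes) & genes != ""])
  # Assign each gene to a batch of at most batch_size elements
  batches <- split(genes, ceiling(seq_along(genes) / batch_size))
  results <- lapply(batches, function(batch) {
    tryCatch(
      convert_fn(batch),
      error = function(e) {
        # A failed batch is reported and skipped instead of aborting the run
        warning("batch failed: ", conditionMessage(e))
        character(0)
      }
    )
  })
  unique(as.character(unlist(results, use.names = FALSE)))
}

# Offline demonstration with a stand-in converter (toupper is NOT a real
# ortholog mapping; it only mimics the shape of the result)
demo <- convert_in_batches(c("Hmmr", "Tlx3", "Cpeb4", "Hmmr"), toupper, batch_size = 2)
print(demo)
```

With the real function this would be called as `convert_in_batches(mmu_genes$Gene, convertMouseGeneList)`. The error message itself also points to the Ensembl mirrors: `useEnsembl()` accepts a `mirror` argument (for example `mirror = "useast"`), which may be worth trying when the main server fails.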
2. Download the mapping file directly from Ensembl and convert with it
Step 1: Ensembl website -> BioMart; choose the corresponding genome: link
Step 2: under Attributes, select "Homologues": Gene stable ID, Gene name;
Step 3: select the species of the corresponding orthologs (look it up by its first letter)
Step 4: download: Results -> Go
Step 5: inspect the downloaded results and write your own script to do the conversion;
    Conversion result: number of original mouse genes: 24784
                       number of genes after conversion: 16412
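The script in Step 5 could look roughly like the sketch below. It is only an illustration under stated assumptions: the data frame `orthologs` is a toy stand-in for the downloaded table, the column names `mouse_symbol` and `human_symbol` and the helper `map_mouse_to_human` are hypothetical, and the real BioMart export will have its own headers that need to be renamed accordingly.

```r
# Toy stand-in for the mapping table downloaded from BioMart.
# Column names and the empty entry are illustrative assumptions.
orthologs <- data.frame(
  mouse_symbol = c("Hmmr", "Tlx3", "Cpeb4", "Gm1992"),
  human_symbol = c("HMMR", "TLX3", "CPEB4", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the human symbol for each mouse gene
map_mouse_to_human <- function(genes, table) {
  # Keep only rows that actually have a human gene name
  table <- table[!is.na(table$human_symbol) & table$human_symbol != "", ]
  # Keep only rows for the genes of interest
  hits <- table[table$mouse_symbol %in% genes, ]
  # One row per distinct mouse/human pair
  unique(hits[, c("mouse_symbol", "human_symbol")])
}

mapped <- map_mouse_to_human(c("Hmmr", "Gm1992", "Cpeb4"), orthologs)
print(mapped)
```

With a real download, `orthologs` would instead come from something like `read.delim()` on the exported file. Rows without a human gene name are dropped in this sketch, which is one plausible reason why fewer genes remain after the conversion than before it.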
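3. Other
In addition, I came across a rather interesting R package that helps with exploring gene function annotation and enrichment analysis: AnnotationDbi, org.Hs.eg.db;
Install:
library(BiocManager)
BiocManager::install(c("AnnotationDbi", "org.Hs.eg.db", "Orthology.eg.db"))
library(AnnotationDbi)
library(org.Hs.eg.db)
keytypes(org.Hs.eg.db)    ## list the key types available in the gene annotation database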
##  [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
##  [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"
## [11] "GO"           "GOALL"        "IPI"          "MAP"          "OMIM"
## [16] "ONTOLOGY"     "ONTOLOGYALL"  "PATH"         "PFAM"         "PMID"
## [21] "PROSITE"      "REFSEQ"       "SYMBOL"       "UCSCKG"       "UNIGENE"
## [26] "UNIPROT"
columns(org.Hs.eg.db)    ## list the ID/annotation columns that can be retrieved
##  [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
##  [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"
## [11] "GO"           "GOALL"        "IPI"          "MAP"          "OMIM"
## [16] "ONTOLOGY"     "ONTOLOGYALL"  "PATH"         "PFAM"         "PMID"
## [21] "PROSITE"      "REFSEQ"       "SYMBOL"       "UCSCKG"       "UNIGENE"
## [26] "UNIPROT"
For concrete usage, please look up the details yourself~
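As a starting point, a lookup with these packages might look like the sketch below. It assumes that AnnotationDbi and org.Hs.eg.db are installed and uses `mapIds()` to translate gene symbols (key type "SYMBOL") into Entrez IDs (column "ENTREZID"), both of which appear in the listings above. The variable names `symbols` and `entrez` are illustrative, and the lookup is skipped when the packages are missing.

```r
# Human gene symbols to look up (taken from the biomaRt test above)
symbols <- c("HMMR", "TLX3", "CPEB4")
entrez <- NULL

# Only run the lookup when both packages are available
if (requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  # mapIds returns one value per key; multiVals = "first" keeps the
  # first match when a symbol maps to several IDs
  entrez <- AnnotationDbi::mapIds(
    org.Hs.eg.db::org.Hs.eg.db,
    keys = symbols,
    column = "ENTREZID",
    keytype = "SYMBOL",
    multiVals = "first"
  )
  print(entrez)
} else {
  message("AnnotationDbi / org.Hs.eg.db not installed; skipping the lookup")
}
```

Any other entry from `keytypes()` or `columns()` can be substituted for "SYMBOL" and "ENTREZID" in the same way.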