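Before running a sequence homology search, you first need to build a reference database from gene.fa.
Sometimes the extracted gene.fa contains duplicate gene names, and makeblastdb fails with:
BLAST Database creation error: Error: Duplicate seq_ids are found (makeblastdb does not distinguish upper and lower case in gene names, so names that differ only in case also count as duplicates)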
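To see which names trigger this error before building the database, a quick shell check can list them. This is a minimal sketch that assumes the ID is the first word of each `>` header (the part `-parse_seqids` reads) and compares IDs case-insensitively; `find_dup_ids` is a hypothetical helper name.

```shell
# find_dup_ids FILE (hypothetical helper): print every header ID that occurs
# more than once in FILE, comparing IDs case-insensitively.
find_dup_ids() {
  grep '^>' "$1" \
    | awk '{ print toupper(substr($1, 2)) }' \
    | sort | uniq -d
  # awk: take the first word of the header, drop the leading ">", upper-case it
  # sort | uniq -d: print only IDs seen at least twice
}
# Usage: find_dup_ids dre_gene.fa
```

An empty result means no duplicate IDs were found under these assumptions.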
解決辦法R :
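library(Biostrings)
sequences <- readDNAStringSet("dre_gene.fa")
# Convert all gene names to upper case so names differing only in case are caught (optional, use as needed)
names(sequences) <- toupper(names(sequences))
# makeblastdb -parse_seqids takes the first word of each header as the ID, so compare on that word
ids <- sub("[[:space:]].*$", "", names(sequences))
# Keep only the first sequence for each ID
unique_sequences <- sequences[!duplicated(ids)]
# Save the deduplicated sequences to a new file
writeXStringSet(unique_sequences, "gene_unique.fa")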
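If R is not available, the same cleanup can be sketched with awk. This sketch assumes the ID is the first word of each `>` header and keeps the first record for each ID, comparing case-insensitively; unlike the R code, it leaves the original case of the headers unchanged. `dedup_fasta` is a hypothetical helper name.

```shell
# dedup_fasta FILE (hypothetical helper): print FILE keeping only the first
# record for each header ID; IDs are compared case-insensitively.
dedup_fasta() {
  awk '
    /^>/ {
      id = toupper(substr($1, 2))  # first word of the header, without the ">"
      keep = !(id in seen)         # 1 for the first occurrence, 0 for repeats
      seen[id] = 1
    }
    keep { print }                 # print header and sequence lines of kept records
  ' "$1"
}
# Usage: dedup_fasta dre_gene.fa > gene_unique.fa
```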
新文件即可用makeblastdb軟件構(gòu)建參考庫(kù)
linux命令:
makeblastdb -in gene_unique.fa -dbtype nucl -parse_seqids -out Anno_ref_dre