Trinity assembles all the reads into a large number of transcripts, so the next step is to annotate those transcripts.
There are many annotation tools; here we use Trinotate to annotate the assembled transcripts.
First, install Trinotate.
Download the latest Trinotate release from the download page
and unzip it:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft$ wget https://github.com/Trinotate/Trinotate/archive/Trinotate-v3.1.1.zip
--2019-01-30 10:54:44-- https://github.com/Trinotate/Trinotate/archive/Trinotate-v3.1.1.zip
Connecting to 127.0.0.1:8118... connected.
Proxy request sent, awaiting response... 302 Found
Location: https://codeload.github.com/Trinotate/Trinotate/zip/Trinotate-v3.1.1 [following]
--2019-01-30 10:54:45-- https://codeload.github.com/Trinotate/Trinotate/zip/Trinotate-v3.1.1
Connecting to 127.0.0.1:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘Trinotate-v3.1.1.zip’
Trinotate-v3.1.1.zip [ <=> ] 28.59M 5.06MB/s in 7.5s
2019-01-30 10:54:53 (3.80 MB/s) - ‘Trinotate-v3.1.1.zip’ saved [29979458]
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft$ l
mafft_7.407-1_amd64.deb Trinotate-v3.1.1.zip
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft$ unzip Trinotate-v3.1.1.zip
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft$ cd Trinotate-Trinotate-v3.1.1/
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ l
admin/ Changelog.txt notes README.md resources/ sample_data/ Trinotate.github.io/ TrinotateWeb.conf/
auto/ LICENSE.txt PerlLib/ README.txt run_TrinotateWebserver.pl* Trinotate* TrinotateWeb/ util/
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ pwd
/home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ export TRINOTATE_HOME=/home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ $TRINOTATE_HOME/
admin/ PerlLib/ run_TrinotateWebserver.pl Trinotate TrinotateWeb/ util/
auto/ resources/ sample_data/ Trinotate.github.io/ TrinotateWeb.conf/
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ echo 'export TRINOTATE_HOME=/home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1' >> ~/.bashrc
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ tail ~/.bashrc
fi
fi
export https_proxy='127.0.0.1:8118'
export http_proxy='127.0.0.1:8118'
export PATH=/usr/local/texlive/2018/bin/x86_64-linux:$PATH
export MANPATH=/usr/local/texlive/2018/texmf-dist/doc/man:$MANPATH
export INFOPATH=/usr/local/texlive/2018/texmf-dist/doc/info:$INFOPATH
export PATH=/home/yeyuntian/Biodata/trinitytest/trinityrnaseq-Trinity-v2.8.3:$PATH
export TRINITY_HOME=/home/yeyuntian/Biodata/trinitytest/trinityrnaseq-Trinity-v2.8.3
export TRINOTATE_HOME=/home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ source ~/.bashrc
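Running the `echo ... >> ~/.bashrc` line a second time would add a duplicate export. The following is a minimal sketch of an idempotent alternative; `append_once` is a hypothetical helper name chosen for this example and is not part of Trinotate.

```shell
# append_once FILE LINE: add LINE to FILE only if an identical line is not already there.
append_once() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (path taken from the session above):
# append_once ~/.bashrc 'export TRINOTATE_HOME=/home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1'
```

Calling it repeatedly leaves a single copy of the line in the file.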
According to the Trinotate documentation,
the following software is required:
- Trinity (generates the assembled transcript FASTA file)
- TransDecoder (predicts protein-coding regions in the transcripts)
- SQLite (integrates the database data)
- NCBI BLAST+ (searches against BLAST databases)
- HMMER/PFAM (annotates protein domains with the HMMER tools)
The following software is also recommended:
- signalP v4 (predicts signal peptides)
- tmhmm v2 (predicts transmembrane domains)
- RNAMMER (predicts rRNA transcripts)
The required data are the SwissProt and Pfam databases, which the database build step downloads.
With Trinotate installed, we can start running it.
The first step is to build the database that the later steps need:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ $TRINOTATE_HOME/admin/Build_Trinotate_Boilerplate_SQLite_db.pl Trinotate
This step downloads the required databases:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ $TRINOTATE_HOME/admin/Build_Trinotate_Boilerplate_SQLite_db.pl Trinotate
-- Skipping CMD: wget "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz", checkpoint exists.
-- Skipping CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_swissprot_parser.pl uniprot_sprot.dat.gz Trinotate, checkpoint exists.
-- Skipping CMD: mv uniprot_sprot.dat.gz.pep uniprot_sprot.pep, checkpoint exists.
* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --create
Can't locate DBI.pm in @INC (you may need to install the DBI module) (@INC contains: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/../../PerlLib /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.22.1 /usr/local/share/perl/5.22.1 /usr/lib/x86_64-linux-gnu/perl5/5.22 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.22 /usr/share/perl/5.22 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base .) at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/../../PerlLib/Sqlite_connect.pm line 15.
BEGIN failed--compilation aborted at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/../../PerlLib/Sqlite_connect.pm line 15.
Compilation failed in require at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl line 8.
BEGIN failed--compilation aborted at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl line 8.
Error, cmd: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --create died with ret 512 at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/../PerlLib/Pipeliner.pm line 102.
Pipeliner::run(Pipeliner=HASH(0x23cf860)) called at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/Build_Trinotate_Boilerplate_SQLite_db.pl line 119
Here we ran into trouble: Perl cannot locate DBI.pm. After searching on Google, we found that a missing Perl module has to be installed:
perl -MCPAN -e shell
install DBD::SQLite
exit
The actual session looked like this. Because this was our first time using CPAN, some extra configuration prompts appear.
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ perl -MCPAN -e shell
CPAN.pm requires configuration, but most of it can be done automatically.
If you answer 'no' below, you will enter an interactive dialog for each
configuration option instead.
Would you like to configure as much as possible automatically? [yes] yes
<install_help>
Warning: You do not have write permission for Perl library directories.
To install modules, you need to configure a local Perl library directory or
escalate your privileges. CPAN can help you by bootstrapping the local::lib
module or by configuring itself to use 'sudo' (if available). You may also
resolve this problem manually if you need to customize your setup.
What approach do you want? (Choose 'local::lib', 'sudo' or 'manual')
[local::lib]
Autoconfiguration complete.
Attempting to bootstrap local::lib...
Writing /home/yeyuntian/.cpan/CPAN/MyConfig.pm for bootstrap...
commit: wrote '/home/yeyuntian/.cpan/CPAN/MyConfig.pm'
Proxy must be specified as absolute URI; '127.0.0.1:8118' is not at /usr/share/perl/5.22/CPAN/FTP.pm line 351.
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ perl -MCPAN -e shell
Terminal does not support AddHistory.
cpan shell -- CPAN exploration and modules installation (v2.11)
Enter 'h' for help.
cpan[1]> install DBD::SQListe
Catching error: "Proxy must be specified as absolute URI; '127.0.0.1:8118' is not at /usr/share/perl/5.22/CPAN/FTP.pm line 351.\cJ" at /usr/share/perl/5.22/CPAN.pm line 391, <FIN> line 1.
CPAN::shell() called at -e line 1
Fetching with LWP:
http://www.cpan.org/authors/01mailrc.txt.gz
Reading '/home/yeyuntian/.cpan/sources/authors/01mailrc.txt.gz'
............................................................................DONE
Fetching with LWP:
http://www.cpan.org/modules/02packages.details.txt.gz
Reading '/home/yeyuntian/.cpan/sources/modules/02packages.details.txt.gz'
Database was generated on Sat, 02 Feb 2019 03:55:12 GMT
.............
New CPAN.pm version (v2.22) available.
[Currently running version is v2.11]
You might want to try
install CPAN
reload cpan
to both upgrade CPAN.pm and run the new version without leaving
the current session.
...............................................................DONE
Fetching with LWP:
http://www.cpan.org/modules/03modlist.data.gz
Reading '/home/yeyuntian/.cpan/sources/modules/03modlist.data.gz'
DONE
Writing /home/yeyuntian/.cpan/Metadata
cpan[2]> exit
Terminal does not support GetHistory.
Lockfile removed.
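The "Proxy must be specified as absolute URI" message in the session above points at the proxy variables: the .bashrc shown earlier sets them to `127.0.0.1:8118` without a scheme, which wget accepts but CPAN's LWP client rejects. A small sketch of a fix follows; `normalize_proxy` is a hypothetical helper, and treating a scheme-less value as a plain HTTP proxy is an assumption.

```shell
# normalize_proxy VALUE: print VALUE with an explicit scheme.
# Assumption: a value without a scheme refers to a plain HTTP proxy.
normalize_proxy() {
    case $1 in
        *://*) printf '%s\n' "$1" ;;          # already an absolute URI
        '')    printf '\n' ;;                 # nothing set, nothing to fix
        *)     printf 'http://%s\n' "$1" ;;   # host:port only, add the scheme
    esac
}

# Example for the current shell session, before starting the CPAN shell:
# export http_proxy=$(normalize_proxy "$http_proxy")
# export https_proxy=$(normalize_proxy "$https_proxy")
```

With the scheme in place, the CPAN shell should be able to fetch through the proxy without that error.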
Once DBD::SQLite is installed (note the exact spelling of the module name), we can continue:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ $TRINOTATE_HOME/admin/Build_Trinotate_Boilerplate_SQLite_db.pl Trinotate
-- Skipping CMD: wget "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz", checkpoint exists.
-- Skipping CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_swissprot_parser.pl uniprot_sprot.dat.gz Trinotate, checkpoint exists.
-- Skipping CMD: mv uniprot_sprot.dat.gz.pep uniprot_sprot.pep, checkpoint exists.
* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --create
CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/init_Trinotate_sqlite_db.pl --sqlite Trinotate.sqlite
-done creating database Trinotate.sqlite
* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --uniprot_index Trinotate.UniprotIndex
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import Trinotate.UniprotIndex UniprotIndex" | sqlite3 Trinotate.sqlite
sh: 5: sqlite3: not found
Error, cmd: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import Trinotate.UniprotIndex UniprotIndex" | sqlite3 Trinotate.sqlite died with ret 32512 at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/../../PerlLib/Sqlite_connect.pm line 190.
Sqlite_connect::bulk_load_sqlite("Trinotate.sqlite", "UniprotIndex", "Trinotate.UniprotIndex") called at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl line 95
Error, cmd: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --uniprot_index Trinotate.UniprotIndex died with ret 32512 at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/../PerlLib/Pipeliner.pm line 102.
Pipeliner::run(Pipeliner=HASH(0x19d2860)) called at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/Build_Trinotate_Boilerplate_SQLite_db.pl line 119
Another error. Following a solution posted on SourceForge, the cause is that sqlite3 is not installed.
So we simply install it with apt-get:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ sudo apt-get install sqlite3
[sudo] password for yeyuntian:
Reading package lists... Done
Building dependency tree
Reading state information... Done
Suggested packages:
sqlite3-doc
The following NEW packages will be installed:
sqlite3
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 515 kB of archives.
After this operation, 1,938 kB of additional disk space will be used.
Get:1 https://mirrors.tuna.tsinghua.edu.cn/ubuntu xenial/main amd64 sqlite3 amd64 3.11.0-1ubuntu1 [515 kB]
Fetched 515 kB in 8s (62.2 kB/s)
Selecting previously unselected package sqlite3.
(Reading database ... 282659 files and directories currently installed.)
Preparing to unpack .../sqlite3_3.11.0-1ubuntu1_amd64.deb ...
Unpacking sqlite3 (3.11.0-1ubuntu1) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up sqlite3 (3.11.0-1ubuntu1) ...
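Both failures so far came from missing dependencies discovered halfway through the build. A preflight check along the following lines can surface them up front; `missing_tools` is a hypothetical helper written for this example, and the tool list is simply the executables this walkthrough calls.

```shell
# missing_tools NAME...: print each NAME that is not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: executables used in this walkthrough.
# missing_tools perl wget sqlite3 makeblastdb blastx blastp hmmpress hmmscan
```

An empty result means every listed executable was found. Perl modules can be checked in a similar spirit with `perl -MDBI -e 1`, which fails if DBI cannot be loaded.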
Then continue:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ $TRINOTATE_HOME/admin/Build_Trinotate_Boilerplate_SQLite_db.pl Trinotate
-- Skipping CMD: wget "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz", checkpoint exists.
-- Skipping CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_swissprot_parser.pl uniprot_sprot.dat.gz Trinotate, checkpoint exists.
-- Skipping CMD: mv uniprot_sprot.dat.gz.pep uniprot_sprot.pep, checkpoint exists.
-- Skipping CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --create, checkpoint exists.
* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --uniprot_index Trinotate.UniprotIndex
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import Trinotate.UniprotIndex UniprotIndex" | sqlite3 Trinotate.sqlite
memory
* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --taxonomy_index Trinotate.TaxonomyIndex
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import Trinotate.TaxonomyIndex TaxonomyIndex" | sqlite3 Trinotate.sqlite
memory
* Running CMD: wget "http://eggnogdb.embl.de/download/latest/data/NOG/NOG.annotations.tsv.gz"
--2019-02-02 15:11:16-- http://eggnogdb.embl.de/download/latest/data/NOG/NOG.annotations.tsv.gz
Connecting to 127.0.0.1:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 1911409 (1.8M) [application/octet-stream]
Saving to: ‘NOG.annotations.tsv.gz’
NOG.annotations.tsv.gz 100%[==================================================================>] 1.82M 236KB/s in 7.9s
2019-02-02 15:11:26 (236 KB/s) - ‘NOG.annotations.tsv.gz’ saved [1911409/1911409]
* Running CMD: gunzip -c NOG.annotations.tsv.gz | /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/print.pl 1 5 > NOG.annotations.tsv.gz.bulk_load
* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --eggnog NOG.annotations.tsv.gz.bulk_load
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import NOG.annotations.tsv.gz.bulk_load eggNOGIndex" | sqlite3 Trinotate.sqlite
memory
* Running CMD: wget "http://purl.obolibrary.org/obo/go/go-basic.obo"
--2019-02-02 15:11:27-- http://purl.obolibrary.org/obo/go/go-basic.obo
Connecting to 127.0.0.1:8118... connected.
Proxy request sent, awaiting response... 302 Found
Location: http://snapshot.geneontology.org/ontology/go-basic.obo [following]
--2019-02-02 15:11:29-- http://snapshot.geneontology.org/ontology/go-basic.obo
Reusing existing connection to 127.0.0.1:8118.
Proxy request sent, awaiting response... 200 OK
Length: 31348362 (30M) [text/obo]
Saving to: ‘go-basic.obo’
go-basic.obo 100%[==================================================================>] 29.90M 4.98MB/s in 5.1s
2019-02-02 15:11:37 (5.88 MB/s) - ‘go-basic.obo’ saved [31348362/31348362]
* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/obo_to_tab.pl go-basic.obo > go-basic.obo.tab
* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --go_obo_tab go-basic.obo.tab
CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/obo_tab_to_sqlite_db.pl Trinotate.sqlite go-basic.obo.tab
[47000]
done.
* Running CMD: wget "ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz"
--2019-02-02 15:11:38-- ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz
=> ‘Pfam-A.hmm.gz’
Resolving ftp.ebi.ac.uk (ftp.ebi.ac.uk)... 193.62.192.4
Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.192.4|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/databases/Pfam/current_release ... done.
==> SIZE Pfam-A.hmm.gz ... 270995712
==> PASV ... done. ==> RETR Pfam-A.hmm.gz ... done.
Length: 270995712 (258M) (unauthoritative)
Pfam-A.hmm.gz 100%[==================================================================>] 258.44M 239KB/s in 33m 56s
2019-02-02 15:45:41 (130 KB/s) - ‘Pfam-A.hmm.gz’ saved [270995712]
* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/PFAM_dat_parser.pl Pfam-A.hmm.gz
[17900] * Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --pfam Pfam-A.hmm.gz.pfam_sqlite_bulk_load
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import Pfam-A.hmm.gz.pfam_sqlite_bulk_load PFAMreference" | sqlite3 Trinotate.sqlite
memory
* Running CMD: wget "http://www.geneontology.org/external2go/pfam2go"
--2019-02-02 15:45:49-- http://www.geneontology.org/external2go/pfam2go
Connecting to 127.0.0.1:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 700762 (684K) [text/plain]
Saving to: ‘pfam2go’
pfam2go 100%[==================================================================>] 684.34K 436KB/s in 1.6s
2019-02-02 15:45:52 (436 KB/s) - ‘pfam2go’ saved [700762/700762]
* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/PFAMtoGoParser.pl pfam2go > pfam2go.tab
* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --pfam2go pfam2go.tab
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import pfam2go.tab pfam2go" | sqlite3 Trinotate.sqlite
memory
At this point the build is complete.
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ l -alt
total 1417012
drwxrwxr-x 11 yeyuntian yeyuntian 4096 2月 2 15:45 ./
-rw-r--r-- 1 yeyuntian yeyuntian 366116864 2月 2 15:45 Trinotate.sqlite
-rw-rw-r-- 1 yeyuntian yeyuntian 270995712 2月 2 15:45 Pfam-A.hmm.gz
-rw-rw-r-- 1 yeyuntian yeyuntian 237871496 2月 1 19:11 uniprot_sprot.pep
-rw-rw-r-- 1 yeyuntian yeyuntian 575923689 2月 1 18:57 uniprot_sprot.dat.gz
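These four files are the downloaded database files.
The next step is to run the searches.
First comes BLAST. Installing it is covered in another article, so please refer to "Installing and Configuring BLAST+ on Ubuntu". Build a protein BLAST database from uniprot_sprot.pep: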
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ makeblastdb -in uniprot_sprot.pep -dbtype prot
Building a new DB, current time: 02/02/2019 16:13:53
New DB name: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/uniprot_sprot.pep
New DB title: uniprot_sprot.pep
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 559077 sequences in 12.2485 seconds.
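Next we prepare the HMMER database. For installing the software, see my other article,
"Installing HMMER on Ubuntu and Mining Gene Family Members".
Again we build a database. The download is Pfam-A.hmm.gz, so decompress it to Pfam-A.hmm first: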
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ hmmpress Pfam-A.hmm
Working... done.
Pressed and indexed 17929 HMMs (17929 names and 17929 accessions).
Models pressed into binary file: Pfam-A.hmm.h3m
SSI index for binary model file: Pfam-A.hmm.h3i
Profiles (MSV part) pressed into: Pfam-A.hmm.h3f
Profiles (remainder) pressed into: Pfam-A.hmm.h3p
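The earlier directory listing shows Pfam-A.hmm.gz, while hmmpress is given Pfam-A.hmm, so a decompression step presumably happened in between. A minimal sketch of that step, keeping the original archive; `unpack_keep` is a hypothetical helper name.

```shell
# unpack_keep FILE.gz: write the decompressed data to FILE, leaving FILE.gz in place.
unpack_keep() {
    gz=$1
    out=${gz%.gz}
    # Skip the work if the output already exists and is non-empty.
    [ -s "$out" ] && return 0
    gunzip -c "$gz" > "$out"
}

# Example, followed by the hmmpress call shown above:
# unpack_keep Pfam-A.hmm.gz && hmmpress Pfam-A.hmm
```

Keeping the .gz file around avoids repeating the download if the decompressed copy is deleted later.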
Now we move on. In "Transcriptome Analysis in Practice, Part 2: De Novo Transcriptome Assembly Without a Reference" we obtained the protein sequences encoded by the transcripts. We use that file here.
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ l -alt
total 4463240
-rw-rw-r-- 1 yeyuntian yeyuntian 56068696 2月 2 20:08 Trinity.fasta.transdecoder.pep
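Sequence alignment
The next job is to run the searches and predictions with the BLAST+ and HMMER installations configured earlier (the two tools use different algorithms).
This takes a long time, so I recommend running it on a server inside a screen session.
I therefore just list the commands here; on my side the run took nearly two days to finish.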
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ blastx -query Trinity.fasta -db uniprot_sprot.pep -num_threads 8 -max_target_seqs 1 -outfmt 6 -evalue 1e-3 > blastx.outfmt6
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ blastp -query Trinity.fasta.transdecoder.pep -db uniprot_sprot.pep -num_threads 8 -max_target_seqs 1 -outfmt 6 -evalue 1e-3 > blastp.outfmt6
Both commands take a -num_threads parameter, which sets the number of CPU cores to use. The number of cores currently available can be obtained with:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ nproc
4
Of course this is my laptop; a server will usually have more.
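Since nproc reports 4 here while the commands ask for 8 threads, it can help to cap the requested thread count at what the machine has. A sketch under that assumption; `available_cores` and `pick_threads` are hypothetical helper names, and the `getconf` fallback is only for systems without nproc.

```shell
# available_cores: print the number of online CPU cores.
available_cores() {
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}

# pick_threads WANTED: print WANTED, capped at the number of available cores.
pick_threads() {
    wanted=$1
    cores=$(available_cores)
    if [ "$wanted" -gt "$cores" ]; then
        printf '%s\n' "$cores"
    else
        printf '%s\n' "$wanted"
    fi
}

# Example:
# blastx -query Trinity.fasta -db uniprot_sprot.pep -num_threads "$(pick_threads 8)" -max_target_seqs 1 -outfmt 6 -evalue 1e-3 > blastx.outfmt6
```

The same value can be passed to hmmscan through its --cpu option.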
Both commands above use the BLAST alignment algorithm. The other search uses HMMER, which is based on hidden Markov models.
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ hmmscan --cpu 12 --domtblout Trinit_TrinotatePFAM.out Pfam-A.hmm Trinity.fasta.transdecoder.pep > pfam.log
These runs produce three result files. The Trinotate guide also uses other software to predict signal peptides and transmembrane domains, which we do not demonstrate here.
The results are as follows:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ ll -alt
total 5636108
-rw-rw-r-- 1 yeyuntian yeyuntian 10296365 2月 6 14:37 blastx.outfmt6
drwxrwxr-x 11 yeyuntian yeyuntian 4096 2月 6 14:37 ./
-rw-rw-r-- 1 yeyuntian yeyuntian 7231505 2月 6 14:37 blastp.outfmt6
-rw-rw-r-- 1 yeyuntian yeyuntian 770576424 2月 6 14:35 pfam.log
-rw-rw-r-- 1 yeyuntian yeyuntian 168149360 2月 6 14:29 Trinit_TrinotatePFAM.out
drwxrwxr-x 4 yeyuntian yeyuntian 4096 2月 3 11:15 ../
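Before loading the BLAST output, a quick sanity check is to count how many distinct queries received a hit. In `-outfmt 6` the first tab-separated column is the query ID, and one query can occupy several lines, so the line count alone overstates the number of annotated sequences. `queries_with_hits` is a hypothetical helper name.

```shell
# queries_with_hits FILE: number of distinct query IDs (column 1) in a tabular BLAST file.
queries_with_hits() {
    cut -f1 "$1" | sort -u | wc -l | tr -d ' '
}

# Example:
# queries_with_hits blastx.outfmt6
# queries_with_hits blastp.outfmt6
```

Comparing these counts with the number of sequences in the query files gives a rough idea of how much of the assembly found a SwissProt match.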
With that done, we need to merge these results.
Loading into the SQLite database
We have produced many results and need to manage them in one place,
so we load them into the Trinotate SQLite database to combine them.
First, get the help message:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ perl Trinotate -h
usage: Trinotate <sqlite.db> <command> <input> [...]
<commands>:
* Initial import of transcriptome and protein data:
init --gene_trans_map <file> --transcript_fasta <file> --transdecoder_pep <file>
* Transdecoder protein search results:
LOAD_swissprot_blastp <file>
LOAD_pfam <file>
LOAD_tmhmm <file>
LOAD_signalp <file>
* Trinity transcript search results:
LOAD_swissprot_blastx <file>
LOAD_rnammer <file>
* Load custom blast results using any searchable database
LOAD_custom_blast --outfmt6 <file> --prog <blastp|blastx> --dbtype <database_name>
* report generation:
report [ -E (default: 1e-5) ] [--pfam_cutoff DNC|DGC|DTC|SNC|SGC|STC (default: DNC=domain noise cutoff)]
So we need to load the following results into the SQLite database one by one; in this work the database is Trinotate.sqlite:
- the transcript and protein data
- the BLAST results (both the blastp protein search and the blastx transcript search)
- the Pfam results from HMMER
The following commands do this (keep in mind that Trinotate is essentially a tool for loading data into a database).
Loading the protein and transcript data
Note the three arguments to init: --gene_trans_map takes the transcript-to-gene mapping file, --transcript_fasta the transcript file, and --transdecoder_pep the file of proteins encoded by the transcripts. The script is run with perl, Trinotate.sqlite is the database being loaded, and init marks this as the initial import.
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ perl Trinotate Trinotate.sqlite init --gene_trans_map Trinity.fasta.gene_trans_map --transcript_fasta Trinity.fasta --transdecoder_pep Trinity.fasta.transdecoder.pep
CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/util/trinotateSeqLoader/TrinotateSeqLoader.pl --sqlite Trinotate.sqlite --gene_trans_map Trinity.fasta.gene_trans_map --transcript_fasta Trinity.fasta --transdecoder_pep Trinity.fasta.transdecoder.pep --bulk_load
-parsing gene/trans map file.... done.
-loading Transcripts.
[220400]
done.
-loading ORFs.
[210000]
done.
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import tmp.Transcript.bulk_load Transcript" | sqlite3 Trinotate.sqlite
memory
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import tmp.ORF.bulk_load ORF" | sqlite3 Trinotate.sqlite
memory
Loading complete..
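The init step relies on the mapping file and the FASTA file agreeing on transcript IDs. The sketch below lists FASTA records that have no entry in the map; it assumes the map is tab-separated with the gene ID in column 1 and the transcript ID in column 2, and that the map file is not empty. `unmapped_transcripts` is a hypothetical helper name.

```shell
# unmapped_transcripts MAP FASTA: print FASTA record IDs that are missing from column 2 of MAP.
unmapped_transcripts() {
    awk -F'\t' '
        NR == FNR { seen[$2] = 1; next }   # first file: remember transcript IDs
        /^>/ {
            id = substr($0, 2)             # drop the leading ">"
            sub(/[ \t].*/, "", id)         # keep only the ID before any description
            if (!(id in seen)) print id
        }
    ' "$1" "$2"
}

# Example:
# unmapped_transcripts Trinity.fasta.gene_trans_map Trinity.fasta
```

No output means every transcript in the FASTA file has a gene assignment.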
Next, load the BLAST results. Trinotate.sqlite is again the database, and LOAD_swissprot_blastp loads the blastp output:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ perl Trinotate Trinotate.sqlite LOAD_swissprot_blastp blastp.outfmt6
CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/util/trinotateSeqLoader/Trinotate_BLAST_loader.pl --sqlite Trinotate.sqlite --outfmt6 blastp.outfmt6 --prog blastp --dbtype Swissprot
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import tmp.blast_bulk_load.15830 BlastDbase" | sqlite3 Trinotate.sqlite
memory
BlastDbase loading complete..
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ perl Trinotate Trinotate.sqlite LOAD_swissprot_blastx blastx.outfmt6
CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/util/trinotateSeqLoader/Trinotate_BLAST_loader.pl --sqlite Trinotate.sqlite --outfmt6 blastx.outfmt6 --prog blastx --dbtype Swissprot
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import tmp.blast_bulk_load.16407 BlastDbase" | sqlite3 Trinotate.sqlite
memory
BlastDbase loading complete..
Finally, load the HMMER results:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ perl Trinotate Trinotate.sqlite LOAD_pfam Trinit_TrinotatePFAM.out
CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/util/trinotateSeqLoader/Trinotate_PFAM_loader.pl --sqlite Trinotate.sqlite --pfam Trinit_TrinotatePFAM.out
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import tmp.pfam_bulk_load.16631 HMMERDbase" | sqlite3 Trinotate.sqlite
memory
Loading complete..
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ ll *.sq*
-rw-r--r-- 1 yeyuntian yeyuntian 926683136 2月 6 20:21 Trinotate.sqlite
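The four load steps always run in the same order, so they can be collected in one small script. This is a sketch rather than anything Trinotate ships: `run` and `load_all` are hypothetical helpers, the file names are the ones used in this session, and `DRY_RUN=1` only prints the commands.

```shell
# run CMD...: echo the command, then execute it unless DRY_RUN=1. Exits on failure.
run() {
    printf '+ %s\n' "$*"
    [ "${DRY_RUN:-0}" = "1" ] && return 0
    "$@" || { printf 'step failed: %s\n' "$*" >&2; exit 1; }
}

# load_all DB: the four load steps from this section, in order.
load_all() {
    db=$1
    run perl Trinotate "$db" init --gene_trans_map Trinity.fasta.gene_trans_map \
        --transcript_fasta Trinity.fasta --transdecoder_pep Trinity.fasta.transdecoder.pep
    run perl Trinotate "$db" LOAD_swissprot_blastp blastp.outfmt6
    run perl Trinotate "$db" LOAD_swissprot_blastx blastx.outfmt6
    run perl Trinotate "$db" LOAD_pfam Trinit_TrinotatePFAM.out
}

# Preview without touching the database:
# DRY_RUN=1; load_all Trinotate.sqlite
```

Stopping at the first failed step avoids ending up with a partially loaded database without noticing.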
This way all the earlier results have been loaded into the Trinotate.sqlite file.
We will use it elsewhere later; an annotation report can also be generated directly from the database with the following command:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ perl Trinotate Trinotate.sqlite report > database_report.xls
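Despite the .xls extension, the report is redirected text output, so it can be summarized from the shell. The sketch below counts rows that carry an annotation in a given column. It assumes the report is tab-separated, has a header line starting with "#", and uses "." for missing values; these assumptions, and the column position in the example, should be checked against `head -1 database_report.xls`. `annotated_rows` is a hypothetical helper name.

```shell
# annotated_rows FILE COLUMN: count data rows whose COLUMN (1-based) is not ".".
# Assumption: tab-separated report, header line starting with "#", "." for no annotation.
annotated_rows() {
    awk -F'\t' -v col="$2" '
        /^#/ { next }                        # skip the header line
        $col != "." && $col != "" { n++ }    # count rows with a value in the column
        END { print n + 0 }
    ' "$1"
}

# Example (column 3 is assumed to hold the top blastx hit):
# annotated_rows database_report.xls 3
```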