Transcriptome Analysis in Practice, Part 7: Annotating the Assembly with Trinotate

Trinity assembles all the reads into a large number of transcripts, so the natural next step is to annotate those transcripts.

Many annotation tools exist; here we use Trinotate to annotate the assembled transcripts.

First, install Trinotate.

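Download the latest Trinotate release from the download page (v3.1.1 at the time of writing),
then unzip it and set the TRINOTATE_HOME environment variable: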
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft$ wget https://github.com/Trinotate/Trinotate/archive/Trinotate-v3.1.1.zip
--2019-01-30 10:54:44--  https://github.com/Trinotate/Trinotate/archive/Trinotate-v3.1.1.zip
Connecting to 127.0.0.1:8118... connected.
Proxy request sent, awaiting response... 302 Found
Location: https://codeload.github.com/Trinotate/Trinotate/zip/Trinotate-v3.1.1 [following]
--2019-01-30 10:54:45--  https://codeload.github.com/Trinotate/Trinotate/zip/Trinotate-v3.1.1
Connecting to 127.0.0.1:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘Trinotate-v3.1.1.zip’

Trinotate-v3.1.1.zip                                    [                               <=>                                                                               ]  28.59M  5.06MB/s    in 7.5s    

2019-01-30 10:54:53 (3.80 MB/s) - ‘Trinotate-v3.1.1.zip’ saved [29979458]

yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft$ l
mafft_7.407-1_amd64.deb  Trinotate-v3.1.1.zip
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft$ unzip Trinotate-v3.1.1.zip 
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft$ cd Trinotate-Trinotate-v3.1.1/
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ l
admin/  Changelog.txt  notes     README.md   resources/                  sample_data/  Trinotate.github.io/  TrinotateWeb.conf/
auto/   LICENSE.txt    PerlLib/  README.txt  run_TrinotateWebserver.pl*  Trinotate*    TrinotateWeb/         util/
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ pwd
/home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ export TRINOTATE_HOME=/home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ $TRINOTATE_HOME/
admin/                     PerlLib/                   run_TrinotateWebserver.pl  Trinotate                  TrinotateWeb/              util/                      
auto/                      resources/                 sample_data/               Trinotate.github.io/       TrinotateWeb.conf/   
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ echo 'export TRINOTATE_HOME=/home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1' >> ~/.bashrc
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ tail ~/.bashrc 
 fi
fi
export https_proxy='127.0.0.1:8118'
export http_proxy='127.0.0.1:8118'
export PATH=/usr/local/texlive/2018/bin/x86_64-linux:$PATH
export MANPATH=/usr/local/texlive/2018/texmf-dist/doc/man:$MANPATH
export INFOPATH=/usr/local/texlive/2018/texmf-dist/doc/info:$INFOPATH
export PATH=/home/yeyuntian/Biodata/trinitytest/trinityrnaseq-Trinity-v2.8.3:$PATH
export TRINITY_HOME=/home/yeyuntian/Biodata/trinitytest/trinityrnaseq-Trinity-v2.8.3
export TRINOTATE_HOME=/home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ source ~/.bashrc 
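The transcript above appends the export line to ~/.bashrc with echo, so repeating the command would add the same line again. A minimal sketch of an idempotent version follows; add_export_once is a hypothetical helper name, not something Trinotate provides.

```shell
# Append "export VAR=VALUE" to an rc file only if that exact line is not
# already present. add_export_once is a hypothetical helper.
add_export_once() {
    rc_file="$1"
    export_line="export $2=$3"
    touch "$rc_file"
    # -x: whole-line match, -F: fixed string, -q: no output
    if ! grep -qxF -e "$export_line" "$rc_file"; then
        printf '%s\n' "$export_line" >> "$rc_file"
    fi
}

# Example (path taken from the session above):
# add_export_once ~/.bashrc TRINOTATE_HOME /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1
```

Running the example twice leaves a single export line in the file.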
According to the Trinotate documentation,
the following software is required:
  1. Trinity (generates the assembled transcript FASTA file)
  2. TransDecoder (predicts the protein-coding regions of the transcripts)
  3. SQLite (integrates the database data)
  4. NCBI BLAST+ (searches the BLAST databases)
  5. HMMER/PFAM (annotates protein domains with the HMMER tools)
The following software is also recommended:
  1. signalP v4 (predicts signal peptides)
  2. tmhmm v2 (predicts transmembrane domains)
  3. RNAMMER (predicts rRNA transcripts)
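Before going further it can help to check which of these tools are already on the PATH. The sketch below is only an illustration: check_tools is a hypothetical helper, and the executable names in the example are assumptions about a typical installation that may differ on your system.

```shell
# Report which of the given executables are missing from PATH.
# Returns 0 when all are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (executable names are assumptions, adjust to your install):
# check_tools Trinity TransDecoder.LongOrfs sqlite3 makeblastdb blastx blastp hmmpress hmmscan
```

Running such a check up front would have revealed the missing sqlite3 binary before the database build failed halfway through.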
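Required data: the reference databases (SwissProt and Pfam-A), which the build script below downloads.
Once Trinotate is installed, we can start running it.
The first step is to build the database that the later steps need: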
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ $TRINOTATE_HOME/admin/Build_Trinotate_Boilerplate_SQLite_db.pl Trinotate
This step downloads the required databases:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ $TRINOTATE_HOME/admin/Build_Trinotate_Boilerplate_SQLite_db.pl Trinotate
-- Skipping CMD: wget "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz", checkpoint exists.
-- Skipping CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_swissprot_parser.pl uniprot_sprot.dat.gz Trinotate, checkpoint exists.
-- Skipping CMD: mv uniprot_sprot.dat.gz.pep uniprot_sprot.pep, checkpoint exists.
* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --create
Can't locate DBI.pm in @INC (you may need to install the DBI module) (@INC contains: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/../../PerlLib /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.22.1 /usr/local/share/perl/5.22.1 /usr/lib/x86_64-linux-gnu/perl5/5.22 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.22 /usr/share/perl/5.22 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base .) at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/../../PerlLib/Sqlite_connect.pm line 15.
BEGIN failed--compilation aborted at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/../../PerlLib/Sqlite_connect.pm line 15.
Compilation failed in require at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl line 8.
BEGIN failed--compilation aborted at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl line 8.
Error, cmd: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --create died with ret 512 at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/../PerlLib/Pipeliner.pm line 102.
    Pipeliner::run(Pipeliner=HASH(0x23cf860)) called at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/Build_Trinotate_Boilerplate_SQLite_db.pl line 119
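We ran into trouble here: the log reports that DBI.pm cannot be found. A Google search showed that a Perl module has to be installed; installing DBD::SQLite through CPAN also pulls in DBI as a dependency.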
perl -MCPAN -e shell
    install DBD::SQLite
    exit
The actual run looked like this. Because this was the first time CPAN was used on this machine, extra configuration prompts appear.
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ perl -MCPAN -e shell

CPAN.pm requires configuration, but most of it can be done automatically.
If you answer 'no' below, you will enter an interactive dialog for each
configuration option instead.

Would you like to configure as much as possible automatically? [yes] yes

 <install_help>

Warning: You do not have write permission for Perl library directories.

To install modules, you need to configure a local Perl library directory or
escalate your privileges.  CPAN can help you by bootstrapping the local::lib
module or by configuring itself to use 'sudo' (if available).  You may also
resolve this problem manually if you need to customize your setup.

What approach do you want?  (Choose 'local::lib', 'sudo' or 'manual')
 [local::lib] 


Autoconfiguration complete.

Attempting to bootstrap local::lib...

Writing /home/yeyuntian/.cpan/CPAN/MyConfig.pm for bootstrap...
commit: wrote '/home/yeyuntian/.cpan/CPAN/MyConfig.pm'
Proxy must be specified as absolute URI; '127.0.0.1:8118' is not at /usr/share/perl/5.22/CPAN/FTP.pm line 351.
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ perl -MCPAN -e shell
Terminal does not support AddHistory.

cpan shell -- CPAN exploration and modules installation (v2.11)
Enter 'h' for help.

cpan[1]> install DBD::SQLite
Catching error: "Proxy must be specified as absolute URI; '127.0.0.1:8118' is not at /usr/share/perl/5.22/CPAN/FTP.pm line 351.\cJ" at /usr/share/perl/5.22/CPAN.pm line 391, <FIN> line 1.
    CPAN::shell() called at -e line 1
Fetching with LWP:
http://www.cpan.org/authors/01mailrc.txt.gz
Reading '/home/yeyuntian/.cpan/sources/authors/01mailrc.txt.gz'
............................................................................DONE
Fetching with LWP:
http://www.cpan.org/modules/02packages.details.txt.gz
Reading '/home/yeyuntian/.cpan/sources/modules/02packages.details.txt.gz'
  Database was generated on Sat, 02 Feb 2019 03:55:12 GMT
.............
  New CPAN.pm version (v2.22) available.
  [Currently running version is v2.11]
  You might want to try
    install CPAN
    reload cpan
  to both upgrade CPAN.pm and run the new version without leaving
  the current session.


...............................................................DONE
Fetching with LWP:
http://www.cpan.org/modules/03modlist.data.gz
Reading '/home/yeyuntian/.cpan/sources/modules/03modlist.data.gz'
DONE
Writing /home/yeyuntian/.cpan/Metadata

cpan[2]> exit
Terminal does not support GetHistory.
Lockfile removed.
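The message "Proxy must be specified as absolute URI; '127.0.0.1:8118' is not" in the transcript above comes from the proxy variables in ~/.bashrc being set without a scheme. A minimal sketch of one way to normalize them follows. normalize_proxy is a hypothetical helper, and the assumption that the local proxy speaks plain HTTP (hence the http:// prefix) should be checked against your own proxy setup.

```shell
# Prefix a bare host:port proxy value with http://; values that already
# carry a scheme (http://, https://, socks5://, ...) are left unchanged.
normalize_proxy() {
    case "$1" in
        "")    printf '\n' ;;
        *://*) printf '%s\n' "$1" ;;
        *)     printf 'http://%s\n' "$1" ;;
    esac
}

# Example: re-export the proxy variables in absolute-URI form
# export http_proxy="$(normalize_proxy "$http_proxy")"
# export https_proxy="$(normalize_proxy "$https_proxy")"

# Afterwards, check that both Perl modules can be loaded:
# perl -MDBI -MDBD::SQLite -e 'print "DBI and DBD::SQLite are available\n"'
```

If the perl one-liner prints its message, the database build should no longer stop at the "Can't locate DBI.pm" error.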
Once the module is installed, we can continue:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ $TRINOTATE_HOME/admin/Build_Trinotate_Boilerplate_SQLite_db.pl Trinotate
-- Skipping CMD: wget "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz", checkpoint exists.
-- Skipping CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_swissprot_parser.pl uniprot_sprot.dat.gz Trinotate, checkpoint exists.
-- Skipping CMD: mv uniprot_sprot.dat.gz.pep uniprot_sprot.pep, checkpoint exists.
* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --create
CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/init_Trinotate_sqlite_db.pl --sqlite Trinotate.sqlite
-done creating database Trinotate.sqlite

* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --uniprot_index Trinotate.UniprotIndex
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import Trinotate.UniprotIndex UniprotIndex" | sqlite3 Trinotate.sqlite
sh: 5: sqlite3: not found
Error, cmd: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import Trinotate.UniprotIndex UniprotIndex" | sqlite3 Trinotate.sqlite died with ret 32512 at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/../../PerlLib/Sqlite_connect.pm line 190.
    Sqlite_connect::bulk_load_sqlite("Trinotate.sqlite", "UniprotIndex", "Trinotate.UniprotIndex") called at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl line 95
Error, cmd: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --uniprot_index Trinotate.UniprotIndex died with ret 32512 at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/../PerlLib/Pipeliner.pm line 102.
    Pipeliner::run(Pipeliner=HASH(0x19d2860)) called at /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/Build_Trinotate_Boilerplate_SQLite_db.pl line 119
Another error appears. Following a solution found on SourceForge, the cause is that sqlite3 is not installed (the log says "sh: 5: sqlite3: not found").
So we simply install it with apt-get:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ sudo apt-get install sqlite3
[sudo] password for yeyuntian: 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Suggested packages:
  sqlite3-doc
The following NEW packages will be installed:
  sqlite3
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 515 kB of archives.
After this operation, 1,938 kB of additional disk space will be used.
Get:1 https://mirrors.tuna.tsinghua.edu.cn/ubuntu xenial/main amd64 sqlite3 amd64 3.11.0-1ubuntu1 [515 kB]
Fetched 515 kB in 8s (62.2 kB/s) 
Selecting previously unselected package sqlite3.
(Reading database ... 282659 files and directories currently installed.)
Preparing to unpack .../sqlite3_3.11.0-1ubuntu1_amd64.deb ...
Unpacking sqlite3 (3.11.0-1ubuntu1) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up sqlite3 (3.11.0-1ubuntu1) ...
Then run the build again:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ $TRINOTATE_HOME/admin/Build_Trinotate_Boilerplate_SQLite_db.pl Trinotate
-- Skipping CMD: wget "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz", checkpoint exists.
-- Skipping CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_swissprot_parser.pl uniprot_sprot.dat.gz Trinotate, checkpoint exists.
-- Skipping CMD: mv uniprot_sprot.dat.gz.pep uniprot_sprot.pep, checkpoint exists.
-- Skipping CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --create, checkpoint exists.
* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --uniprot_index Trinotate.UniprotIndex
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import Trinotate.UniprotIndex UniprotIndex" | sqlite3 Trinotate.sqlite
memory
* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --taxonomy_index Trinotate.TaxonomyIndex
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import Trinotate.TaxonomyIndex TaxonomyIndex" | sqlite3 Trinotate.sqlite
memory
* Running CMD: wget "http://eggnogdb.embl.de/download/latest/data/NOG/NOG.annotations.tsv.gz"
--2019-02-02 15:11:16--  http://eggnogdb.embl.de/download/latest/data/NOG/NOG.annotations.tsv.gz
Connecting to 127.0.0.1:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 1911409 (1.8M) [application/octet-stream]
Saving to: ‘NOG.annotations.tsv.gz’

NOG.annotations.tsv.gz              100%[==================================================================>]   1.82M   236KB/s    in 7.9s    

2019-02-02 15:11:26 (236 KB/s) - ‘NOG.annotations.tsv.gz’ saved [1911409/1911409]

* Running CMD: gunzip -c NOG.annotations.tsv.gz | /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/print.pl 1 5 > NOG.annotations.tsv.gz.bulk_load
* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --eggnog NOG.annotations.tsv.gz.bulk_load
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import NOG.annotations.tsv.gz.bulk_load eggNOGIndex" | sqlite3 Trinotate.sqlite
memory
* Running CMD: wget "http://purl.obolibrary.org/obo/go/go-basic.obo"
--2019-02-02 15:11:27--  http://purl.obolibrary.org/obo/go/go-basic.obo
Connecting to 127.0.0.1:8118... connected.
Proxy request sent, awaiting response... 302 Found
Location: http://snapshot.geneontology.org/ontology/go-basic.obo [following]
--2019-02-02 15:11:29--  http://snapshot.geneontology.org/ontology/go-basic.obo
Reusing existing connection to 127.0.0.1:8118.
Proxy request sent, awaiting response... 200 OK
Length: 31348362 (30M) [text/obo]
Saving to: ‘go-basic.obo’

go-basic.obo                        100%[==================================================================>]  29.90M  4.98MB/s    in 5.1s    

2019-02-02 15:11:37 (5.88 MB/s) - ‘go-basic.obo’ saved [31348362/31348362]

* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/obo_to_tab.pl go-basic.obo > go-basic.obo.tab
* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --go_obo_tab go-basic.obo.tab
CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/obo_tab_to_sqlite_db.pl Trinotate.sqlite go-basic.obo.tab
[47000]   

done.

* Running CMD: wget "ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz"
--2019-02-02 15:11:38--  ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz
           => ‘Pfam-A.hmm.gz’
Resolving ftp.ebi.ac.uk (ftp.ebi.ac.uk)... 193.62.192.4
Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.192.4|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/databases/Pfam/current_release ... done.
==> SIZE Pfam-A.hmm.gz ... 270995712
==> PASV ... done.    ==> RETR Pfam-A.hmm.gz ... done.
Length: 270995712 (258M) (unauthoritative)

Pfam-A.hmm.gz                       100%[==================================================================>] 258.44M   239KB/s    in 33m 56s 

2019-02-02 15:45:41 (130 KB/s) - ‘Pfam-A.hmm.gz’ saved [270995712]

* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/PFAM_dat_parser.pl Pfam-A.hmm.gz
[17900]  * Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --pfam Pfam-A.hmm.gz.pfam_sqlite_bulk_load
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import Pfam-A.hmm.gz.pfam_sqlite_bulk_load PFAMreference" | sqlite3 Trinotate.sqlite
memory
* Running CMD: wget "http://www.geneontology.org/external2go/pfam2go" 
--2019-02-02 15:45:49--  http://www.geneontology.org/external2go/pfam2go
Connecting to 127.0.0.1:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 700762 (684K) [text/plain]
Saving to: ‘pfam2go’

pfam2go                             100%[==================================================================>] 684.34K   436KB/s    in 1.6s    

2019-02-02 15:45:52 (436 KB/s) - ‘pfam2go’ saved [700762/700762]

* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/PFAMtoGoParser.pl pfam2go > pfam2go.tab
* Running CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --pfam2go pfam2go.tab
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import pfam2go.tab pfam2go" | sqlite3 Trinotate.sqlite
memory
At this point the build is complete.
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ l -alt
total 1417012
drwxrwxr-x 11 yeyuntian yeyuntian      4096 2月   2 15:45 ./
-rw-r--r--  1 yeyuntian yeyuntian 366116864 2月   2 15:45 Trinotate.sqlite
-rw-rw-r--  1 yeyuntian yeyuntian 270995712 2月   2 15:45 Pfam-A.hmm.gz
-rw-rw-r--  1 yeyuntian yeyuntian 237871496 2月   1 19:11 uniprot_sprot.pep
-rw-rw-r--  1 yeyuntian yeyuntian 575923689 2月   1 18:57 uniprot_sprot.dat.gz
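These four files are the downloaded database files.
Next come the searches.
The first is the BLAST search. We covered the installation in another article, so please refer to "Installing and Configuring the BLAST+ Tools on Ubuntu". Start by building a protein BLAST database from the SwissProt sequences: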
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ makeblastdb -in uniprot_sprot.pep -dbtype prot 


Building a new DB, current time: 02/02/2019 16:13:53
New DB name:   /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/uniprot_sprot.pep
New DB title:  uniprot_sprot.pep
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 559077 sequences in 12.2485 seconds.
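Next we prepare the HMMER database. For installing the software, see another of my articles:

"Installing HMMER on Ubuntu and Mining Gene Family Members"

As before, we build the database. The downloaded file is Pfam-A.hmm.gz, so it has to be decompressed (for example with gunzip) before hmmpress is run on Pfam-A.hmm: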
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ hmmpress Pfam-A.hmm 
Working...    done.
Pressed and indexed 17929 HMMs (17929 names and 17929 accessions).
Models pressed into binary file:   Pfam-A.hmm.h3m
SSI index for binary model file:   Pfam-A.hmm.h3i
Profiles (MSV part) pressed into:  Pfam-A.hmm.h3f
Profiles (remainder) pressed into: Pfam-A.hmm.h3p
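The build script leaves Pfam-A.hmm.gz on disk, while hmmpress above is run on Pfam-A.hmm. A minimal sketch of the decompression step in between follows; decompress_if_needed is a hypothetical helper, and keeping the .gz file is a choice made here, not a Trinotate requirement.

```shell
# Decompress FILE.gz to FILE unless a non-empty FILE already exists.
# The .gz file is kept. Prints the name of the decompressed file.
decompress_if_needed() {
    gz_file="$1"
    out_file="${gz_file%.gz}"
    if [ ! -s "$out_file" ]; then
        gunzip -c "$gz_file" > "$out_file"
    fi
    printf '%s\n' "$out_file"
}

# Example:
# hmm=$(decompress_if_needed Pfam-A.hmm.gz)
# hmmpress "$hmm"
```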
Now we move on. In "Transcriptome Analysis in Practice, Part 2: De Novo Transcriptome Assembly Without a Reference Genome" we obtained the protein sequences encoded by the transcripts. That file (Trinity.fasta.transdecoder.pep) is used here.
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ l -alt
total 4463240
-rw-rw-r--  1 yeyuntian yeyuntian   56068696 2月   2 20:08 Trinity.fasta.transdecoder.pep

Sequence searches

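The next job is to run the searches and predictions with the BLAST+ and HMMER tools set up earlier (the two tools use different algorithms).
This takes a long time, so it is best to run it on a server inside a screen session so that it keeps running after you log out.
The commands are given below; on my machine they took nearly two days to finish.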
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ blastx -query Trinity.fasta -db uniprot_sprot.pep -num_threads 8 -max_target_seqs 1 -outfmt 6 -evalue 1e-3 > blastx.outfmt6
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ blastp -query Trinity.fasta.transdecoder.pep -db uniprot_sprot.pep -num_threads 8 -max_target_seqs 1 -outfmt 6 -evalue 1e-3 > blastp.outfmt6
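The advice above about keeping long searches alive after logout can be sketched as follows. This uses nohup in place of screen, only because nohup needs no interactive session. run_detached is a hypothetical helper, and the example passes the BLAST+ -out option instead of a shell redirection so that the results do not end up in the log file.

```shell
# Start a command in the background, immune to hangups, with stdout and
# stderr written to NAME.log. Prints the PID of the background job.
run_detached() {
    job_name="$1"
    shift
    nohup "$@" < /dev/null > "${job_name}.log" 2>&1 &
    echo $!
}

# Example (same options as the blastx command above):
# run_detached blastx blastx -query Trinity.fasta -db uniprot_sprot.pep \
#     -num_threads 8 -max_target_seqs 1 -outfmt 6 -evalue 1e-3 -out blastx.outfmt6
```

The printed PID can be used later with ps or kill to check on or stop the job.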
The blastx and blastp commands both take a -num_threads parameter, which sets how many CPU cores are used. The number of cores currently available can be obtained with:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ nproc
4
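This is on my laptop; a server will usually have more. Asking for more threads than there are cores generally brings no further speedup.
Both commands above use the BLAST algorithm. The third search uses HMMER, which is based on hidden Markov models: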
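The session above asks for 8 threads on a machine where nproc reports 4. A small sketch of capping the requested thread count at what nproc reports follows; pick_threads is a hypothetical helper, and the fallback to 1 when nproc is unavailable is an assumption made for portability.

```shell
# Print the number of threads to use: the requested count, capped at the
# number of cores reported by nproc (1 if nproc is not available).
pick_threads() {
    requested="$1"
    available=$(nproc 2>/dev/null || echo 1)
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}

# Example:
# blastx -query Trinity.fasta -db uniprot_sprot.pep -num_threads "$(pick_threads 8)" \
#     -max_target_seqs 1 -outfmt 6 -evalue 1e-3 > blastx.outfmt6
```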
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ hmmscan --cpu 12 --domtblout Trinit_TrinotatePFAM.out Pfam-A.hmm Trinity.fasta.transdecoder.pep > pfam.log
These runs produce three result files. The Trinotate guide also uses other software to predict signal peptides and transmembrane domains; that is not demonstrated here.
The results look like this:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ ll -alt 
total 5636108
-rw-rw-r--  1 yeyuntian yeyuntian   10296365 2月   6 14:37 blastx.outfmt6
drwxrwxr-x 11 yeyuntian yeyuntian       4096 2月   6 14:37 ./
-rw-rw-r--  1 yeyuntian yeyuntian    7231505 2月   6 14:37 blastp.outfmt6
-rw-rw-r--  1 yeyuntian yeyuntian  770576424 2月   6 14:35 pfam.log
-rw-rw-r--  1 yeyuntian yeyuntian  168149360 2月   6 14:29 Trinit_TrinotatePFAM.out
drwxrwxr-x  4 yeyuntian yeyuntian       4096 2月   3 11:15 ../
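A quick sanity check on the two BLAST result files is to count how many distinct query sequences received a hit. In -outfmt 6 output the first tab-separated column is the query ID, and a query can occupy several lines even with -max_target_seqs 1 (one line per alignment), so the line count alone overstates the number. count_queries_with_hits below is a hypothetical helper.

```shell
# Count distinct query IDs (column 1) in a BLAST tabular (-outfmt 6) file.
count_queries_with_hits() {
    cut -f1 "$1" | sort -u | wc -l | tr -d ' '
}

# Example:
# echo "transcripts with a SwissProt blastx hit: $(count_queries_with_hits blastx.outfmt6)"
# echo "proteins with a SwissProt blastp hit:    $(count_queries_with_hits blastp.outfmt6)"
```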
With the searches finished, the results need to be merged.

Importing into the SQLite database

We now have many separate results that should be managed in one place,
so we load them into the Trinotate SQLite database, where they are merged.
First, look at the help:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ perl Trinotate -h


   usage: Trinotate <sqlite.db> <command> <input> [...]


     <commands>: 

         * Initial import of transcriptome and protein data:

             init --gene_trans_map <file> --transcript_fasta <file> --transdecoder_pep <file>

         * Transdecoder protein search results:

             LOAD_swissprot_blastp <file>
             LOAD_pfam <file>
             LOAD_tmhmm <file>
             LOAD_signalp <file>

          * Trinity transcript search results:

             LOAD_swissprot_blastx <file>
             LOAD_rnammer <file>
             

          * Load custom blast results using any searchable database


             LOAD_custom_blast --outfmt6 <file> --prog <blastp|blastx> --dbtype <database_name>


          * report generation:

             report [ -E (default: 1e-5) ] [--pfam_cutoff DNC|DGC|DTC|SNC|SGC|STC (default: DNC=domain noise cutoff)]

So the following results have to be loaded one by one into the SQLite database, which in our case is Trinotate.sqlite:
  1. Transcript and protein data
  2. BLAST results (both the protein search and the nucleotide search)
  3. Pfam results from the HMMER search
The commands below do this (keep in mind that Trinotate is essentially a tool for loading data into the database).
Importing the protein and transcript data

The Trinotate script is run with perl; its first argument (Trinotate.sqlite) is the database to load into, and init marks this as the initial import. Note the three options: --gene_trans_map takes the gene-to-transcript mapping file, --transcript_fasta the transcript FASTA file, and --transdecoder_pep the file of proteins encoded by the transcripts.

yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ perl Trinotate Trinotate.sqlite init --gene_trans_map Trinity.fasta.gene_trans_map --transcript_fasta Trinity.fasta --transdecoder_pep Trinity.fasta.transdecoder.pep
CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/util/trinotateSeqLoader/TrinotateSeqLoader.pl --sqlite Trinotate.sqlite --gene_trans_map Trinity.fasta.gene_trans_map --transcript_fasta Trinity.fasta --transdecoder_pep Trinity.fasta.transdecoder.pep --bulk_load
-parsing gene/trans map file.... done.
-loading Transcripts.
[220400]   
done.
-loading ORFs.
[210000]   
done.

CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import tmp.Transcript.bulk_load Transcript" | sqlite3 Trinotate.sqlite
memory
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import tmp.ORF.bulk_load ORF" | sqlite3 Trinotate.sqlite
memory


Loading complete..
Next, import the BLAST results, first blastp and then blastx. The arguments are the database to update, the load command, and the result file:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ perl Trinotate Trinotate.sqlite LOAD_swissprot_blastp blastp.outfmt6
CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/util/trinotateSeqLoader/Trinotate_BLAST_loader.pl --sqlite Trinotate.sqlite --outfmt6 blastp.outfmt6 --prog blastp --dbtype Swissprot
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import tmp.blast_bulk_load.15830 BlastDbase" | sqlite3 Trinotate.sqlite
memory

BlastDbase loading complete..
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ perl Trinotate Trinotate.sqlite LOAD_swissprot_blastx blastx.outfmt6 
CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/util/trinotateSeqLoader/Trinotate_BLAST_loader.pl --sqlite Trinotate.sqlite --outfmt6 blastx.outfmt6 --prog blastx --dbtype Swissprot
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import tmp.blast_bulk_load.16407 BlastDbase" | sqlite3 Trinotate.sqlite
memory


BlastDbase loading complete..
Finally, import the HMMER results:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ perl Trinotate Trinotate.sqlite LOAD_pfam Trinit_TrinotatePFAM.out 
CMD: /home/yeyuntian/Biosoft/Trinotate-Trinotate-v3.1.1/util/trinotateSeqLoader/Trinotate_PFAM_loader.pl --sqlite Trinotate.sqlite --pfam Trinit_TrinotatePFAM.out
CMD: echo "pragma journal_mode=memory;
pragma synchronous=0;
pragma cache_size=4000000;
.mode tabs
.import tmp.pfam_bulk_load.16631 HMMERDbase" | sqlite3 Trinotate.sqlite
memory


Loading complete..
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ ll *.sq*
-rw-r--r-- 1 yeyuntian yeyuntian 926683136 2月   6 20:21 Trinotate.sqlite
All the earlier results are now loaded into the Trinotate.sqlite file.
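The four import steps above can be collected into one function that stops at the first failure. This is a sketch only: load_all_results and the TRINOTATE_CMD override are hypothetical names introduced here so that the sequence can be dry-run, and the file names are the ones used in this walkthrough.

```shell
# Run the Trinotate import steps in order, stopping at the first failure.
# TRINOTATE_CMD is a hypothetical override (for example TRINOTATE_CMD=echo
# for a dry run); by default the real script under $TRINOTATE_HOME is
# called. The command is word-split on purpose, so paths must not contain spaces.
load_all_results() {
    db="$1"
    trinotate=${TRINOTATE_CMD:-"perl $TRINOTATE_HOME/Trinotate"}
    $trinotate "$db" init \
        --gene_trans_map Trinity.fasta.gene_trans_map \
        --transcript_fasta Trinity.fasta \
        --transdecoder_pep Trinity.fasta.transdecoder.pep &&
    $trinotate "$db" LOAD_swissprot_blastp blastp.outfmt6 &&
    $trinotate "$db" LOAD_swissprot_blastx blastx.outfmt6 &&
    $trinotate "$db" LOAD_pfam Trinit_TrinotatePFAM.out
}

# Dry run (prints the arguments of each step):
# TRINOTATE_CMD=echo load_all_results Trinotate.sqlite
# Real run:
# load_all_results Trinotate.sqlite
```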
We will use this database elsewhere later. An annotation report can also be generated from it directly with the following command:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biosoft/Trinotate-Trinotate-v3.1.1$ perl Trinotate Trinotate.sqlite report > database_report.xls 
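Despite the .xls extension, the report should be plain tab-delimited text with a header line, so it can be summarized with standard tools. The sketch below counts the rows that carry a value in a given column. It rests on two assumptions to verify against your own report: that the header contains a column named sprot_Top_BLASTX_hit, and that "." marks a missing annotation. count_annotated is a hypothetical helper.

```shell
# Count rows of a tab-delimited report whose column named COL holds a
# value other than "." (assumed placeholder for "no annotation").
count_annotated() {
    awk -F '\t' -v col="$2" '
        NR == 1 {
            sub(/^#/, "", $1)    # the header line may start with "#"
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 2 }
            next
        }
        $idx != "." && $idx != "" { n++ }
        END { if (idx) print n + 0 }
    ' "$1"
}

# Example (column name is an assumption):
# count_annotated database_report.xls sprot_Top_BLASTX_hit
```

Looking the column up by name instead of by position keeps the sketch independent of the column order in a particular Trinotate version.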

Summary:

1. We used Trinotate to bring together the results predicted by BLAST and HMMER. Note that the Trinotate perl script serves here only as a tool for loading data into the database and viewing it; the steps that really take time are the BLAST and HMMER searches.

2. We can also see how important SQL database management is when integrating large data sets.

3. The SQLite database can likewise be queried from other scripts.
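As a small illustration of the last point, the database can be queried from a shell script with the sqlite3 command-line client. The table names in the example are the ones that appear in the import logs above (Transcript, ORF, BlastDbase, HMMERDbase); count_rows is a hypothetical helper.

```shell
# Print the number of rows in TABLE of the SQLite database DB.
count_rows() {
    sqlite3 "$1" "SELECT COUNT(*) FROM \"$2\";"
}

# Example: row counts of the tables filled by the import steps
# for t in Transcript ORF BlastDbase HMMERDbase; do
#     printf '%s\t%s\n' "$t" "$(count_rows Trinotate.sqlite "$t")"
# done
```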
