kraken2 手冊:https://github.com/DerrickWood/kraken2/wiki/Manual
1 kraken2 report是這樣子的
想把各分類都注釋上去呢
2 python3代碼
思路:
(1) 根據(jù)第四列判斷domain phylum class order family genus species兴蒸,將分類信息賦值給變量,不考慮含數(shù)字的分類昆雀。
(2) 根據(jù)條件將分類信息加到第六列輸出即可闹炉。
(U)nclassified, (R)oot, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies
#!/usr/bin/env python3
import os, sys, re
ms, infile_name, outfile_name = sys.argv
with open(infile_name, 'r') as infile:
with open(outfile_name, 'w') as o:
Domain = ""
for line in infile:
line = line.strip()
lines = re.split(r'\t', line)
lines[5] = re.split(r'^[ ]*', lines[5])[1]
if lines[3] == "D":
Domain = lines[5]
o.write("\t".join(lines))
o.write("\n")
elif lines[3] == "P":
Phylum = lines[5]
lines[5] = Domain+";"+Phylum
o.write("\t".join(lines))
o.write("\n")
elif lines[3] == "C":
Class = lines[5]
lines[5] = Domain+";"+Phylum+";"+Class
o.write("\t".join(lines))
o.write("\n")
elif lines[3] == "O":
Order = lines[5]
lines[5] = Domain+";"+Phylum+";"+Class+";"+Order
o.write("\t".join(lines))
o.write("\n")
elif lines[3] == "F":
Family = lines[5]
lines[5] = Domain+";"+Phylum+";"+Class+";"+Order+";"+Family
o.write("\t".join(lines))
o.write("\n")
elif lines[3] == "G":
Genus = lines[5]
lines[5] = Domain+";"+Phylum+";"+Class+";"+Order+";"+Family+";"+Genus
o.write("\t".join(lines))
o.write("\n")
elif lines[3] == "S":
Species = lines[5]
lines[5] = Domain+";"+Phylum+";"+Class+";"+Order+";"+Family+";"+Genus+";"+Species
o.write("\t".join(lines))
o.write("\n")
上面漏掉了kingdom的信息纹因,尤其是真菌是Kingdom在結(jié)果中完全看不到真菌。奇怪的是Kraken注釋,只有部分Domain有kingdom再芋,修改代碼。后續(xù)數(shù)據(jù)處理不變坚冀。
#!/usr/bin/env python3
import os, sys, re
ms, infile_name, outfile_name = sys.argv
with open(infile_name, 'r') as infile:
with open(outfile_name, 'w') as o:
Domain = ""
for line in infile:
line = line.strip()
lines = re.split(r'\t', line)
lines[5] = re.split(r'^[ ]*', lines[5])[1]
if lines[3] == "D":
Domain = lines[5]
o.write("\t".join(lines))
o.write("\n")
Kingdom = "NG" ## Kingdom非必須济赎,Domain必須,在這里賦空值
elif lines[3] == "K":
Kingdom = lines[5]
lines[5] = Domain+";"+Kingdom
o.write("\t".join(lines))
o.write("\n")
elif lines[3] == "P":
Phylum = lines[5]
lines[5] = Domain+";"+Kingdom+";"+Phylum
o.write("\t".join(lines))
o.write("\n")
elif lines[3] == "C":
Class = lines[5]
lines[5] = Domain+";"+Kingdom+";"+Phylum+";"+Class
o.write("\t".join(lines))
o.write("\n")
elif lines[3] == "O":
Order = lines[5]
lines[5] = Domain+";"+Kingdom+";"+Phylum+";"+Class+";"+Order
o.write("\t".join(lines))
o.write("\n")
elif lines[3] == "F":
Family = lines[5]
lines[5] = Domain+";"+Kingdom+";"+Phylum+";"+Class+";"+Order+";"+Family
o.write("\t".join(lines))
o.write("\n")
elif lines[3] == "G":
Genus = lines[5]
lines[5] = Domain+";"+Kingdom+";"+Phylum+";"+Class+";"+Order+";"+Family+";"+Genus
o.write("\t".join(lines))
o.write("\n")
elif lines[3] == "S":
Species = lines[5]
lines[5] = Domain+";"+Kingdom+";"+Phylum+";"+Class+";"+Order+";"+Family+";"+Genus+";"+Species
o.write("\t".join(lines))
o.write("\n")
3 運(yùn)行腳本,查看結(jié)果
任意找一個(gè)樣本的kraken2結(jié)果司训,包含所有物種物種注釋結(jié)果构捡,即使reads count是0。行數(shù)均為31030壳猜。
conda activate python3.7
python3 ../script/kraken_anno.py sample_report.tsv kraken_anno.txt
4 抽提種水平注釋
cat kraken_anno.txt | awk -F"\t" '{if($4=="S") print $6}' > kraken_anno2.txt
行數(shù)為17091勾徽,該數(shù)據(jù)庫中各物種總數(shù)。
為保證處理結(jié)果與kraken2的合并結(jié)果species完全一致统扳,還需做:
' -> 空
[] -> 空
5 統(tǒng)計(jì)各門中基因組數(shù)量
四大域/界:古菌喘帚,細(xì)菌,真核咒钟,病毒
kraken2數(shù)據(jù)庫的基因組統(tǒng)計(jì)吹由,門水平
1 Archaea Candidatus Geothermarchaeota
1 Archaea Candidatus Korarchaeota
1 Archaea Candidatus Lokiarchaeota
1 Archaea Candidatus Micrarchaeota
57 Archaea Crenarchaeota
230 Archaea Euryarchaeota
24 Archaea Thaumarchaeota
17 Bacteria Acidobacteria
1164 Bacteria Actinobacteria
17 Bacteria Aquificae
3 Bacteria Armatimonadetes
484 Bacteria Bacteroidetes
1 Bacteria Balneolaeota
1 Bacteria Caldiserica
1 Bacteria Calditrichaeota
1 Bacteria Candidatus Bipolaricaulota
1 Bacteria Candidatus Cloacimonetes
1 Bacteria Candidatus Omnitrophica
4 Bacteria Candidatus Saccharibacteria
22 Bacteria Chlamydiae
14 Bacteria Chlorobi
19 Bacteria Chloroflexi
1 Bacteria Chrysiogenetes
8 Bacteria Coprothermobacterota
190 Bacteria Cyanobacteria
5 Bacteria Deferribacteres
36 Bacteria Deinococcus-Thermus
2 Bacteria Dictyoglomi
3 Bacteria Elusimicrobia
1 Bacteria Fibrobacteres
1001 Bacteria Firmicutes
25 Bacteria Fusobacteria
4 Bacteria Gemmatimonadetes
2 Bacteria Ignavibacteriae
1 Bacteria Kiritimatiellaeota
9 Bacteria Nitrospirae
35 Bacteria Planctomycetes
3102 Bacteria Proteobacteria
60 Bacteria Spirochaetes
8 Bacteria Synergistetes
152 Bacteria Tenericutes
7 Bacteria Thermodesulfobacteria
32 Bacteria Thermotogae
16 Bacteria Verrucomicrobia
24 Eukaryota Apicomplexa
52 Eukaryota Ascomycota
2 Eukaryota Bacillariophyta
5 Eukaryota Basidiomycota
1 Eukaryota Cercozoa
4 Eukaryota Chlorophyta
1 Eukaryota Chordata
1 Eukaryota Ciliophora
7 Eukaryota Euglenozoa
2 Eukaryota Evosea
4 Eukaryota Microsporidia
4 Eukaryota Rhodophyta
69 Eukaryota Streptophyta
729 Viruses Artverviricota
410 Viruses Cossaviricota
903 Viruses Cressdnaviricota
7 Viruses Dividoviricota
222 Viruses Duplornaviricota
46 Viruses Hofneiviricota
789 Viruses Kitrinoviricota
1153 Viruses Lenarviricota
645 Viruses Negarnaviricota
127 Viruses Nucleocytoviricota
111 Viruses Peploviricota
61 Viruses Phixviricota
894 Viruses Pisuviricota
94 Viruses Preplasmiviricota
417 Viruses Saleviricota
3542 Viruses Uroviricota