VCF文件的變異分布圖,除了使用Circos圖之外,統(tǒng)計特定材料間的變異分布情況時可能會用到染色體分布圖,為了節(jié)省時間壳坪,實現(xiàn)統(tǒng)計及繪圖的自動化,現(xiàn)提供統(tǒng)計數(shù)量用的Python腳本和繪圖使用的R腳本掰烟,僅供參考爽蝴。如有其它更快捷的方法,或覺得腳本太拙劣纫骑,歡迎留言批評交流蝎亚。
PYTHON腳本煤墙,使用腳本整理100Kb滑窗中變異位點數(shù)量:
# variation_number_in_bin.py
import sys
input_file = sys.argv[1]
step_len = 100000
result = {}
chrom_stats = {
"Chr1":43270923,
"Chr2":35937250,
"Chr3":36413819,
"Chr4":35502694,
"Chr5":29958434,
"Chr6":31248787,
"Chr7":29697621,
"Chr8":28443022,
"Chr9":23012720,
"Chr10":23207287,
"Chr11":29021106,
"Chr12":27531856
}
for chrom in chrom_stats.keys():
step= 0
step_start = 0
result[chrom]={}
result[chrom][step]=0
for line in open(input_file):
if line.startswith(chrom):
pos = line.split()[1]
if step_start < int(pos) < (step_start + step_len):
result[chrom][step] += 1
if int(pos) >= (step_start + step_len):
step += 1
step_start += step_len
result[chrom][step] = 0
result[chrom][step] += 1
for steps,nums in result[chrom].items():
print(chrom, steps, nums, sep="\t")
下面畫圖梅惯,使用R腳本如下:
library(ggplot2)
library(Cairo)
var_density <- read.table("snp.distribution.stats", sep = "\t", header = T)
names(var_density) <- c("chrom", "region", "variation_count")
var_density$chrom <- factor(var_density$chrom, levels=c(
"Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "Chr6",
"Chr7", "Chr8", "Chr9", "Chr10", "Chr11", "Chr12"))
CairoPNG(filename = "variation_distribution.png", width = 12000, height = 7500, res = 600)
distribution <- ggplot(var_density, aes(x=region, y=variation_count))
distribution + geom_line(colour = "darkorange4", size = 0.35) +
guides(fill=FALSE) +
theme(axis.line = element_line(size=0.5),
panel.background = element_rect(fill="white",size=rel(20)),
panel.grid.minor = element_line(colour = "gray"),
axis.text = element_text(family="Times New Roman",size=16,face="bold"),
axis.title = element_text(family="Times New Roman",size=18,face="bold"),
strip.text = element_text(family="Times New Roman",size=12,face="bold")) +
facet_wrap(~chrom, ncol=2)
dev.off()
個人覺得這個圖還湊合。