Typical length distribution of small RNA sequencing data:
1. Read abundance by length: 24 nt > 21 nt > 22 nt > 23 nt
2. More than 60% of the reads are 20-24 nt long
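Both rules can be checked directly against a table of length counts. The following is only a minimal sketch, assuming the counts are already held in a dict mapping length to read count. The function names and the example numbers are hypothetical and are not real data.

```python
def fraction_in_range(counts, low=20, high=24):
    """Return the share of reads whose length lies in [low, high]."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    in_range = sum(n for length, n in counts.items() if low <= length <= high)
    return in_range / total


def follows_expected_order(counts, order=(24, 21, 22, 23)):
    """True if the counts strictly decrease along the given order of lengths."""
    values = [counts.get(length, 0) for length in order]
    return all(a > b for a, b in zip(values, values[1:]))


# Illustrative numbers only (not real data)
example = {18: 50, 20: 80, 21: 300, 22: 200, 23: 150, 24: 400, 26: 20}
print(fraction_in_range(example) > 0.60)
print(follows_expected_order(example))
```

A sample that fails either check is not necessarily wrong, but it is worth a closer look before further analysis.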
I. The data are in FASTA format. First, write a script that counts how many sRNAs there are of each length
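#!/usr/bin/env python3
# Count how many sRNA sequences of each length occur in a FASTA file.
# Assumes every sequence sits on a single line.
import sys
import collections

lenlist = []
with open(sys.argv[1], 'r') as inFile:
    for line in inFile:
        line = line.rstrip()
        # Skip FASTA header lines (starting with ">") and blank lines
        if not line or line.startswith(">"):
            continue
        lenlist.append(len(line))

# Sort first: Counter keeps insertion order, so output is by ascending length
lenlist.sort()
lencount = collections.Counter(lenlist)

with open('sRNA_count.csv', 'w') as outFile:
    for length in lencount:
        # Tab-separated: sequence length, then number of sequences
        outFile.write(str(length) + "\t" + str(lencount[length]) + "\n")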
# Run the script
python sRNA_count.py sample.sRNA.data.fa
Output file: sRNA_count.csv
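The script above treats every non-header line as one complete sequence. If a FASTA file wraps sequences across several lines, the lengths have to be accumulated per record instead. Below is a hedged sketch of that variant. `count_lengths` is a hypothetical helper name, not part of the original script, and the demo records are made up.

```python
import collections
import io


def count_lengths(handle):
    """Count sequence lengths in a FASTA stream, allowing wrapped sequences."""
    counts = collections.Counter()
    current = None  # length of the record being read; None before the first header
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                counts[current] += 1  # close the previous record
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        counts[current] += 1  # flush the last record
    return counts


# Tiny in-memory example; the second record is wrapped over two lines
demo = io.StringIO(">a\nACGTACGTACGTACGTACGTA\n>b\nACGTACGTACGT\nACGTACGTACGT\n")
for length, n in sorted(count_lengths(demo).items()):
    print(length, n, sep="\t")
```

Passing an open file handle instead of the `io.StringIO` object gives the same tab-separated table for a real file.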
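II. Manually split sRNA_count.csv into columns and add a header row (the R code in the next step reads a comma-separated file with columns named sRNA and NUM)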
sRNA_count.csv
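This manual step can also be scripted. The sketch below assumes the input is the tab-separated file written by the counting script and that the header names should be `sRNA` and `NUM`, which are the column names the R code refers to. `add_header` and the output file name are hypothetical choices made for illustration.

```python
import csv
import os


def add_header(src, dst, header=("sRNA", "NUM")):
    """Rewrite a tab-separated length/count table as comma-separated with a header."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin, delimiter="\t")
        writer = csv.writer(fout)
        writer.writerow(header)
        for row in reader:
            if row:  # skip blank lines
                writer.writerow(row)


# File names are assumptions; writing to a new file keeps the original intact
if os.path.exists("sRNA_count.csv"):
    add_header("sRNA_count.csv", "sRNA_count_with_header.csv")
```

If this route is used, the path passed to `read.csv` in R would need to point at the new file.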
III. Plot the histogram with ggplot2 in R
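# gcookbook only supplies example datasets; the plots below do not depend on it
install.packages('gcookbook')
library(ggplot2)
library(gcookbook)
# Read the table from the previous step (columns: sRNA = length, NUM = count)
sRNACount <- read.csv("J:/myProject/sRNA_count/sRNA_count.csv", header = TRUE)
sRNACount
# First attempt: log-scaled counts, x axis limited to 15-30 nt
ggplot(sRNACount, aes(x=sRNA, y=NUM)) + scale_y_log10() + xlim(15,30) + geom_bar(stat="identity", fill="lightblue", colour="black")
# Refined version: limits widened to 14-31 so the outermost bars are not dropped, with a tick at every length
ggplot(sRNACount, aes(x=sRNA, y=NUM)) + scale_y_log10() + scale_x_continuous(limits=c(14,31),breaks=seq(15,30,1)) + geom_bar(stat="identity", fill="lightblue", colour="black")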
sRNA length distribution plot