本例子展示的是如何在內(nèi)含子和外顯子中統(tǒng)計(jì)reads數(shù)量董虱。不但包括了這個(gè)內(nèi)容,還包括了其他的命名展示申鱼。例如
1愤诱、BAM file support (for more, see?Working with BAM files)
2、indexing into Interval objects (for more, see?Intervals)
3润讥、filtering (for more, see?Filtering)
4转锈、streaming (for more, see?Using BedTool objects as iterators/generators)
5、ability to use parallel processing
下面是含有注釋信息的命令行展示
import multiprocessing
import pybedtools
# get?example GFF and BAM filenames 獲得例子中的GFF和BAM的內(nèi)容gff = pybedtools.example_filename('gdc.gff')
bam = pybedtools.example_filename('gdc.bam')
# Some?GFF files have invalid entries -- like chromosomes with negative coords GFF文件中含有不實(shí)的內(nèi)容楚殿,例如染色為負(fù)數(shù)坐標(biāo)的信息
# orfeatures of length = 0.? This lineremoves them and saves the result in a?tempfile 或者特征值的長(zhǎng)度為0撮慨,可以使用命令行去除這些無(wú)意義的內(nèi)容,并且存儲(chǔ)為另外一個(gè)文件脆粥。g = pybedtools.BedTool(gff).remove_invalid().saveas()
# Next,?we create a function to pass only features for a particular?featuretype.? This is similar to a"grep" operation when applied to every#feature in a BedTool
隨后砌溺,我們構(gòu)建了一個(gè)函數(shù)用來(lái)只顯示符合要求的特征,類似于grep命令def featuretype_filter(feature, featuretype):
??? if feature[2] == featuretype:
??????? return True
??? return False
# This?function will eventually be run in parallel, applying the filter above?to?several different BedTools simultaneously?多種不同的BedTools同時(shí)選擇def subset_featuretypes(featuretype):
??? result = g.filter(featuretype_filter, featuretype).saveas()
??? return pybedtools.BedTool(result.fn)
# Thisfunction performs the intersection of a BAM file with a GFF file and?returns the total number of hits.? Itwill eventually be run in parallel.def count_reads_in_features(features_fn):
??? """
??? Callback function to count reads infeatures
??? """
??# BAM files are auto-detected; no need foran `abam` argument.? Here we?# construct a new BedTool out of the BAM?file and intersect it with the?features filename.?We use stream=True so that no?intermediate tempfile is?created, and bed=True so that the.count() method can iterate through the?resulting streamed BedTool.??? return pybedtools.BedTool(bam).intersect(b=features_fn,?stream=True).count()
# Set?up a pool of workers for parallel processing pool = multiprocessing.Pool()
#?Create separate files for introns and exons, using the function we defined?abovefeaturetypes = ('intron', 'exon')
introns, exons = pool.map(subset_featuretypes, featuretypes)
#?Perform some genome algebra to get unique and shared intron/exon regions.?Here?we keep only the filename of the results, which is safer than an entire BedTool for passing around in parallel computations.exon_only = exons.subtract(introns).merge().remove_invalid().saveas().fn
intron_only = introns.subtract(exons).merge().remove_invalid().saveas().fn
intron_and_exon = exons.intersect(introns).merge().remove_invalid().saveas().fn
# Do?intersections with BAM file in parallel, using the other function we defined abovefeatures = (exon_only, intron_only, intron_and_exon)
results = pool.map(count_reads_in_features, features)
# Print?the resultslabels = (' exon only:',?' intron only:',?'intron and exon:')
for label, reads in zip(labels, results):
??? sys.stdout.write('%s %s\n' % (label, reads))