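Requirement
A customer reported that the complete genome file was too large to open and asked me to split it by chromosome and scaffold. What is a quick way to do this?
Method 1
Use an existing tool: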
$ pip install pyfaidx
$ faidx -x sequences.fa
Method 2
Write your own script: split.pl
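#!/usr/bin/perl
use strict;
use warnings;

my $f = $ARGV[0] or die "Usage: perl split.pl sequences.fa\n";   # input FASTA file name
open(my $in, '<', $f) or die "Can't open: $f $!";
my $out;   # handle for the output file currently being written
while (my $line = <$in>) {
    chomp $line;
    if ($line =~ /^>(\S+)/) {   # header line: '>' at the start of the line
        close $out if $out;
        my $new_file = "$1.fa";   # name the file after the sequence ID (first word of the header)
        open($out, '>', $new_file) or die "Can't open: $new_file $!";
    }
    print $out "$line\n" if $out;   # ignore any lines before the first header
}
close $out if $out;
close $in;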
Run: perl split.pl sequences.fa
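For anyone who prefers Python over Perl, the same splitting logic can be sketched with the standard library alone. This is only an illustration, not part of the original solution: the function name split_fasta and the out_dir parameter are made up here, and it assumes the sequence ID is the first word after '>' and that replacing unusual characters with '_' is acceptable for file names.

```python
import re
from pathlib import Path


def split_fasta(fasta_path, out_dir="."):
    """Write each record of fasta_path to <out_dir>/<sequence ID>.fa and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                if out is not None:
                    out.close()
                # Assumption: the sequence ID is the first word after '>'.
                fields = line[1:].split()
                seq_id = fields[0] if fields else "unnamed"
                # Replace characters that are unsafe in file names.
                safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", seq_id)
                path = out_dir / f"{safe_id}.fa"
                out = open(path, "w")
                written.append(path)
            if out is not None:  # ignore anything before the first header
                out.write(line)
    if out is not None:
        out.close()
    return written
```

Calling split_fasta("sequences.fa", "split") would then write one file per chromosome or scaffold into the split directory. Note that two records with the same ID would overwrite each other in this sketch.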
Put the output files into one directory and compress them: gzip -r dir
Then send them all to the customer.
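Note that gzip -r compresses every file in the directory individually, replacing each one with a .gz file, rather than producing a single archive. Where gzip is not available, roughly the same effect can be sketched in Python. The function name gzip_directory is hypothetical, and the sketch assumes it is fine to delete the uncompressed originals, as gzip does by default.

```python
import gzip
import shutil
from pathlib import Path


def gzip_directory(directory):
    """Compress every regular file under directory to <name>.gz and remove the original."""
    compressed = []
    # sorted() materializes the listing before any file is changed.
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file() or path.suffix == ".gz":
            continue  # skip subdirectories and files that are already compressed
        gz_path = path.with_name(path.name + ".gz")
        with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()  # mirror gzip's default of removing the uncompressed file
        compressed.append(gz_path)
    return compressed
```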
https://www.biostars.org/p/173723/
http://seqanswers.com/forums/archive/index.php/t-32162.html