I have several files, each with four columns: gene_id, counts, fpkm, and tpm. Can you write a Perl script that merges the tpm columns of these files based on gene_id? The script should work out the number of files from the command-line arguments.
#!/usr/bin/perl
use strict;
use warnings;

# Get file names from command line arguments; the number of files is scalar(@files)
my @files = @ARGV;
die "Usage: $0 file1.count [file2.count ...]\n" unless @files;

# Derive one column name per file: strip a trailing ".count",
# or fall back to the full file name if the extension is different
my @column_names = map { /(\S+)\.count$/ ? $1 : $_ } @files;

# Hash of hashes: $tpm_values{gene_id}{column_name} = TPM value
my %tpm_values;

# Process each file
for my $i (0 .. $#files) {
    my ($file, $column_name) = ($files[$i], $column_names[$i]);
    open my $fh, '<', $file or die "Can't open $file: $!";

    # Skip header line
    my $header = <$fh>;

    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;    # ignore blank lines

        # Split line into its four tab-separated columns
        my ($gene_id, $counts, $fpkm, $tpm) = split /\t/, $line;

        # Store TPM value for this gene ID and column name
        $tpm_values{$gene_id}{$column_name} = $tpm;
    }
    close $fh;
}

# Print header row with column names (join avoids a trailing tab)
print join("\t", 'Gene_ID', @column_names), "\n";

# Print one row per gene ID, with NA where a file has no value for that gene
foreach my $gene_id (sort keys %tpm_values) {
    print join("\t", $gene_id,
        map { $tpm_values{$gene_id}{$_} // 'NA' } @column_names), "\n";
}
File contents before merging:
gene_id counts fpkm tpm
LOC_Os01g01010 248 10.6260353400409 17.4281762622683
LOC_Os01g01019 1 0.115196235653905 0.18893785268724
LOC_Os01g01030 31 1.63336480724046 2.67894551921027
LOC_Os01g01040 275 13.4319764240168 22.0303100053021
LOC_Os01g01050 362 23.0490775108712 37.8036937284081
LOC_Os01g01060 179 25.2596545730101 41.4293476479946
LOC_Os01g01070 200 13.7092035461406 22.4850010536979
LOC_Os01g01080 713 44.8696317769905 73.5924384220478
LOC_Os01g01090 1 0.0538921368127652 0.0883906019005892
LOC_Os01g01100 0 0 0
LOC_Os01g01110 1 0.14570836990118 0.238981997731223
LOC_Os01g01115 10 0.53360525105611 0.875186847425071
LOC_Os01g01120 232 24.9748495514202 40.9622277902293
LOC_Os01g01130 53 3.43695621970201 5.63708635307769
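The script splits each line on a tab character, so it may be worth confirming that an input file really is tab-delimited with four fields per line before merging. The following is a minimal sketch, assuming a POSIX shell with awk; `check_input` is a hypothetical helper name and is not part of the script.

```shell
# check_input FILE
# Hypothetical helper: counts lines (header included) that do not have
# exactly four tab-separated fields.
check_input() {
    awk -F'\t' -v f="$1" '
        NF != 4 { bad++ }    # wrong field count for a gene_id/counts/fpkm/tpm line
        END { printf "%s: %d of %d lines are not 4-column TSV\n", f, bad, NR }
    ' "$1"
}

# Example call (file name taken from the run command in this post):
# check_input BPT_0d_RNA_TPM.count
```

A non-zero count suggests that the file uses spaces or has missing columns, in which case `split /\t/` would not yield a TPM value for those lines.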
Run:
perl Merge_files_FPKM.pl BPT_0d_RNA_TPM.count BPT_1d_RNA_TPM.count BPT_2d_RNA_TPM.count BPT_5d_RNA_TPM.count >ALL_FPKM
After merging:
Gene_ID BPT_0d_RNA_TPM BPT_1d_RNA_TPM BPT_2d_RNA_TPM BPT_5d_RNA_TPM
ChrSy.fgenesh.gene.1 0 0 0 0
ChrSy.fgenesh.gene.10 0 0 0 0
ChrSy.fgenesh.gene.11 0 0 0 0
ChrSy.fgenesh.gene.12 0.0312652903977292 0 0.066672911473648 0.0229874436731277
ChrSy.fgenesh.gene.13 0.181866321164529 0 0 0
ChrSy.fgenesh.gene.14 0.433521572989911 0.405008007567322 0.462240156579073 1.27496691870357
ChrSy.fgenesh.gene.15 0 0 0 0
ChrSy.fgenesh.gene.16 0 0 0.177648218071233 0
ChrSy.fgenesh.gene.17 0 0 0 0.022022270106722
ChrSy.fgenesh.gene.18 0.255806921435966 0 0 0
ChrSy.fgenesh.gene.19 0.139531048055982 0.195530725416455 0.223161397907665 0.102588591599082
ChrSy.fgenesh.gene.2 0 0 0 0
ChrSy.fgenesh.gene.20 0 0 0 0
ChrSy.fgenesh.gene.21 0 0 0.0705026870674347 0
ChrSy.fgenesh.gene.22 0 0 0 0
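
A quick sanity check on the merged table is to confirm that every row has as many tab-separated fields as the header, and to count the NA placeholders written for genes missing from a file. This is a rough sketch, assuming a POSIX shell with awk and the output file name ALL_FPKM from the command above; `check_merged` is a hypothetical helper name.

```shell
# check_merged FILE
# Hypothetical helper: reports the header width, the number of data rows,
# rows whose width differs from the header, and the number of NA cells.
check_merged() {
    awk -F'\t' '
        NR == 1 { ncol = NF; next }                          # remember header width
        NF != ncol { ragged++ }                              # row width differs from header
        { for (i = 2; i <= NF; i++) if ($i == "NA") na++ }   # count NA cells
        END {
            rows = (NR > 0) ? NR - 1 : 0
            printf "columns=%d rows=%d ragged=%d NA=%d\n", ncol, rows, ragged, na
        }
    ' "$1"
}

# Example call on the merged output from this post:
# check_merged ALL_FPKM
```

A non-zero `ragged` value points to malformed rows, and a non-zero `NA` value means that at least one gene ID was absent from one of the input files.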