Linux 20 Questions - 生信技能樹

1. In any folder, create a nested series of folders of the form 1/2/3/4/5/6/7/8/9


vip52@VM-0-15-ubuntu:~/test$ mkdir -p 1/2/3/4/5/6/7/8/9
vip52@VM-0-15-ubuntu:~/test$ tree
.
├── 1
│   └── 2
│       └── 3
│           └── 4
│               └── 5
│                   └── 6
│                       └── 7
│                           └── 8
│                               └── 9

mkdir dirname                        # create a single directory
mkdir -p directory1/directory2/...   # create a nested path, including any missing parent directories

Key point: mkdir -p
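
The difference between plain mkdir and mkdir -p can be checked in a throwaway directory. This is a minimal sketch: the scratch directory from mktemp and the variable names (demo, plain, rerun) are illustrative and not part of the original exercise.

```shell
# Work in a scratch directory so nothing in your home folder is touched.
demo=$(mktemp -d)

# Plain mkdir fails here because the parent directory a/ does not exist yet.
if mkdir "$demo/a/b" 2>/dev/null; then plain=ok; else plain=failed; fi

# -p creates every missing parent and does not complain if the path already exists.
mkdir -p "$demo/1/2/3/4/5/6/7/8/9"
mkdir -p "$demo/1/2/3/4/5/6/7/8/9"   # running it a second time is harmless
rerun=$?

echo "plain mkdir: $plain, mkdir -p rerun exit status: $rerun"
```

Because -p tolerates existing directories, it is safe to keep in scripts that may be run more than once.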

2. Inside the folder you just created (mine is /Users/jimmy/tmp/1/2/3/4/5/6/7/8/9), create a text file named me.txt

vip52@VM-0-15-ubuntu:~/test$ cd 1/2/3/4/5/6/7/8/9/
vip52@VM-0-15-ubuntu:~/test/1/2/3/4/5/6/7/8/9$ touch me.txt
vip52@VM-0-15-ubuntu:~/test/1/2/3/4/5/6/7/8/9$ ls -h
me.txt

touch filename   # create an empty file (or update the timestamp of an existing one)
Key point: using touch

3. Enter the following content into the text file me.txt:

vip52@VM-0-15-ubuntu:~/test/1/2/3/4/5/6/7/8/9$ vim me.txt 
vip52@VM-0-15-ubuntu:~/test/1/2/3/4/5/6/7/8/9$ cat me.txt 
Go to: http://www.biotrainee.com/
I love bioinfomatics.
And you ?

Key point: editing text with vim, vi, or sed

vim filename   # open filename for editing
i              # enter insert mode, then type, paste, or delete content
Esc            # leave insert mode when you are done editing
:wq            # save and quit
:q!            # quit without saving
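
The same file can also be produced without an interactive editor, which helps when a step has to be repeated in a script. The sketch below assumes a scratch directory and an output name (me.edited.txt) chosen only for illustration; the three lines of content are the ones shown above.

```shell
demo=$(mktemp -d)

# A here-document writes the three lines without opening an editor.
cat > "$demo/me.txt" <<'EOF'
Go to: http://www.biotrainee.com/
I love bioinfomatics.
And you ?
EOF

# sed edits a stream: here it rewrites line 3 and the result is saved to a new file.
sed '3s/And you ?/And you?/' "$demo/me.txt" > "$demo/me.edited.txt"

lines=$(wc -l < "$demo/me.txt" | tr -d ' ')
third=$(sed -n '3p' "$demo/me.edited.txt")
echo "$lines lines; edited line 3: $third"
```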

4. Delete the folders 1/2/3/4/5/6/7/8/9 and the text file me.txt created above

vip52@VM-0-15-ubuntu:~/test/1/2/3/4/5/6/7/8/9$ cd ~/test/
vip52@VM-0-15-ubuntu:~/test$ ls
1
vip52@VM-0-15-ubuntu:~/test$ rm -rf 1
vip52@VM-0-15-ubuntu:~/test$ ls -l
total 0

Key point: the rm command

rm file.txt, or rm -r dirname for a folder   # deletes permanently. The Linux command line has no recycle bin, so use rm with care; rm -i filename asks for confirmation before deleting.

5. In any folder, create five folders named folder1 to folder5, then create folder1 to folder5 again inside each of them. The result looks like this:

vip52@VM-0-15-ubuntu:~/test$ ls
vip52@VM-0-15-ubuntu:~/test$ mkdir -p folder{1..5}/folder{1..5}
vip52@VM-0-15-ubuntu:~/test$ ls -lh
total 20K
drwxrwxr-x 7 vip52 vip52 4.0K May 10 23:29 folder1
drwxrwxr-x 7 vip52 vip52 4.0K May 10 23:29 folder2
drwxrwxr-x 7 vip52 vip52 4.0K May 10 23:29 folder3
drwxrwxr-x 7 vip52 vip52 4.0K May 10 23:29 folder4
drwxrwxr-x 7 vip52 vip52 4.0K May 10 23:29 folder5
vip52@VM-0-15-ubuntu:~/test$ tree
.
├── folder1
│   ├── folder1
│   ├── folder2
│   ├── folder3
│   ├── folder4
│   └── folder5
├── folder2
│   ├── folder1
│   ├── folder2
│   ├── folder3
│   ├── folder4
│   └── folder5
├── folder3
│   ├── folder1
│   ├── folder2
│   ├── folder3
│   ├── folder4
│   └── folder5
├── folder4
│   ├── folder1
│   ├── folder2
│   ├── folder3
│   ├── folder4
│   └── folder5
└── folder5
    ├── folder1
    ├── folder2
    ├── folder3
    ├── folder4
    └── folder5

30 directories, 0 files

Key point: brace expansion combined with mkdir -p, as in mkdir -p folder{1..5}/folder{1..5}
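
It can help to see what the shell actually hands to mkdir. Brace expansion is a bash feature rather than part of POSIX sh, so this sketch calls bash explicitly; the variable names are illustrative.

```shell
# The shell expands the pattern into a word list before mkdir ever runs.
names=$(bash -c 'echo folder{1..5}/folder{1..5}')

# 5 outer names times 5 inner names gives 25 paths.
count=$(echo "$names" | wc -w | tr -d ' ')
first=$(echo "$names" | cut -d' ' -f1)
echo "$count paths, starting with $first"
```

mkdir -p then creates each of the 25 paths, adding the five parent folders along the way, which matches the 30 directories reported by tree.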

6. In every folder created in question 5, create the text file me.txt from question 2, with the same content as before.

Method 1:

vip52@VM-0-15-ubuntu:~/test$ touch me.txt
vip52@VM-0-15-ubuntu:~/test$ vim me.txt 
vip52@VM-0-15-ubuntu:~/test$ echo folder{1..5}/folder{1..5} |xargs -n 1 cp me.txt 
vip52@VM-0-15-ubuntu:~/test$ tree
.
├── folder1
│   ├── folder1
│   │   └── me.txt
│   ├── folder2
│   │   └── me.txt
│   ├── folder3
│   │   └── me.txt
│   ├── folder4
│   │   └── me.txt
│   └── folder5
│       └── me.txt
├── folder2
│   ├── folder1
│   │   └── me.txt
│   ├── folder2
│   │   └── me.txt
│   ├── folder3
│   │   └── me.txt
│   ├── folder4
│   │   └── me.txt
│   └── folder5
│       └── me.txt
├── folder3
│   ├── folder1
│   │   └── me.txt
│   ├── folder2
│   │   └── me.txt
│   ├── folder3
│   │   └── me.txt
│   ├── folder4
│   │   └── me.txt
│   └── folder5
│       └── me.txt
├── folder4
│   ├── folder1
│   │   └── me.txt
│   ├── folder2
│   │   └── me.txt
│   ├── folder3
│   │   └── me.txt
│   ├── folder4
│   │   └── me.txt
│   └── folder5
│       └── me.txt
├── folder5
│   ├── folder1
│   │   └── me.txt
│   ├── folder2
│   │   └── me.txt
│   ├── folder3
│   │   └── me.txt
│   ├── folder4
│   │   └── me.txt
│   └── folder5
│       └── me.txt
└── me.txt

30 directories, 26 files

Key point: using xargs
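
The reason Method 1 works is that xargs appends each incoming word to the end of the command it is given, and -n 1 makes it run the command once per word. A small sketch with echo in place of a real copy (dirA, dirB, and dirC are made-up names):

```shell
# echo stands in for the real command so nothing is copied.
# Each input word becomes the last argument of one invocation.
out=$(printf '%s\n' dirA dirB dirC | xargs -n 1 echo cp me.txt)
echo "$out"

# Three input words, so the command ran three times.
calls=$(echo "$out" | wc -l | tr -d ' ')
```

Prefixing a command with echo like this is a cheap dry run before letting xargs do anything destructive.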

Method 2:

vip52@VM-0-15-ubuntu:~/test$ for dirs in folder{1..5}/folder{1..5}; do cp me.txt "$dirs"; done
vip52@VM-0-15-ubuntu:~/test$ tree
.
├── folder1
│   ├── folder1
│   │   └── me.txt
│   ├── folder2
│   │   └── me.txt
│   ├── folder3
│   │   └── me.txt
│   ├── folder4
│   │   └── me.txt
│   └── folder5
│       └── me.txt
├── folder2
│   ├── folder1
│   │   └── me.txt
│   ├── folder2
│   │   └── me.txt
│   ├── folder3
│   │   └── me.txt
│   ├── folder4
│   │   └── me.txt
│   └── folder5
│       └── me.txt
├── folder3
│   ├── folder1
│   │   └── me.txt
│   ├── folder2
│   │   └── me.txt
│   ├── folder3
│   │   └── me.txt
│   ├── folder4
│   │   └── me.txt
│   └── folder5
│       └── me.txt
├── folder4
│   ├── folder1
│   │   └── me.txt
│   ├── folder2
│   │   └── me.txt
│   ├── folder3
│   │   └── me.txt
│   ├── folder4
│   │   └── me.txt
│   └── folder5
│       └── me.txt
├── folder5
│   ├── folder1
│   │   └── me.txt
│   ├── folder2
│   │   └── me.txt
│   ├── folder3
│   │   └── me.txt
│   ├── folder4
│   │   └── me.txt
│   └── folder5
│       └── me.txt
└── me.txt

30 directories, 26 files

Key point: the for loop

for ...; do ...; done
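
A variant of the loop that does not depend on brace expansion uses a glob to find the second-level directories that already exist. This is a sketch on a scratch tree; the names folder1, sub1, and so on are placeholders.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/folder1/sub1" "$demo/folder1/sub2" "$demo/folder2/sub1"
echo "hello" > "$demo/me.txt"

# The glob */*/ matches every second-level directory that exists.
# Quoting "$d" keeps directory names containing spaces intact.
for d in "$demo"/*/*/; do
    cp "$demo/me.txt" "$d"
done

copies=$(find "$demo"/*/*/ -name me.txt | wc -l | tr -d ' ')
echo "copied into $copies directories"
```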

7. Delete the folders and files created in the previous steps

vip52@VM-0-15-ubuntu:~/test$ ls *
me.txt

folder1:
folder1  folder2  folder3  folder4  folder5

folder2:
folder1  folder2  folder3  folder4  folder5

folder3:
folder1  folder2  folder3  folder4  folder5

folder4:
folder1  folder2  folder3  folder4  folder5

folder5:
folder1  folder2  folder3  folder4  folder5
vip52@VM-0-15-ubuntu:~/test$ rm -rf folder* 
vip52@VM-0-15-ubuntu:~/test$ rm -r me.txt 

Key point: rm -rf filename

8. Download http://www.biotrainee.com/jmzeng/igv/test.bed, find which line contains H3K4me3, and count how many lines the file has in total.


vip52@VM-0-15-ubuntu:~/test$ wget -c http://www.biotrainee.com/jmzeng/igv/test.bed
--2019-05-11 09:43:18--  http://www.biotrainee.com/jmzeng/igv/test.bed
Resolving www.biotrainee.com (www.biotrainee.com)... 123.206.72.184
Connecting to www.biotrainee.com (www.biotrainee.com)|123.206.72.184|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3099 (3.0K)
Saving to: ‘test.bed’

test.bed                          100%[===========================================================>]   3.03K  --.-KB/s    in 0s      

2019-05-11 09:43:18 (526 MB/s) - ‘test.bed’ saved [3099/3099]

vip52@VM-0-15-ubuntu:~/test$ cat test.bed | grep -n 'H3K4me3'
8:chr1  9810    10438   ID=SRX387603;Name=H3K4me3%20(@%20HMLE);Title=GSM1280527:%20HMLE%20Twist3D%20H3K4me3%20rep2%3B%20Homo%20sapiens%3B%20ChIP-Seq;Cell%20group=Breast;<br>source_name=HMLE_Twist3D_H3K4me3;cell%20type=human%20mammary%20epithelial%20cells;transfected%20with=Twist1;culture%20type=sphere;chip%20antibody=H3K4me3;chip%20antibody%20vendor=Millipore;  222 .   9810    10438   0,226,255
vip52@VM-0-15-ubuntu:~/test$ cat test.bed | wc
     10      88    3099

Key point: using grep and wc
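
In the transcript above, grep -n prefixes the matching line with its number (8), and wc prints three counts: lines (10), words (88), and bytes (3099). The sketch below pulls out just the two numbers the question asks for, using a made-up three-line stand-in for test.bed.

```shell
demo=$(mktemp -d)
# Hypothetical tab-separated stand-in for test.bed.
printf 'chr1\t100\t200\tName=H3K27ac\nchr1\t300\t400\tName=H3K4me3\nchr2\t500\t600\tName=CTCF\n' > "$demo/mini.bed"

# grep -n prints "line:content"; cut keeps only the line number.
hit=$(grep -n 'H3K4me3' "$demo/mini.bed" | cut -d: -f1)

# wc -l on redirected input prints only the count, with no file name.
total=$(wc -l < "$demo/mini.bed" | tr -d ' ')
echo "match on line $hit of $total"
```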

9. Download http://www.biotrainee.com/jmzeng/rmDuplicate.zip, unzip it, and inspect the folder structure inside

# Download and unzip
vip52@VM-0-15-ubuntu:~/test$ wget http://www.biotrainee.com/jmzeng/rmDuplicate.zip
--2019-05-11 09:59:00--  http://www.biotrainee.com/jmzeng/rmDuplicate.zip
Resolving www.biotrainee.com (www.biotrainee.com)... 123.206.72.184
Connecting to www.biotrainee.com (www.biotrainee.com)|123.206.72.184|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 104931 (102K) [application/zip]
Saving to: ‘rmDuplicate.zip’

rmDuplicate.zip                   100%[===========================================================>] 102.47K   523KB/s    in 0.2s    

2019-05-11 09:59:01 (523 KB/s) - ‘rmDuplicate.zip’ saved [104931/104931]
vip52@VM-0-15-ubuntu:~/test$ ls -lhsa
total 116K
4.0K drwxrwxr-x  2 vip52 vip52   4.0K May 11 09:59 .
4.0K drwxr-xr-x 25 vip52 student 4.0K May 10 23:44 ..
104K -rw-rw-r--  1 vip52 vip52   103K Nov 12  2016 rmDuplicate.zip
4.0K -rw-rw-r--  1 vip52 vip52   3.1K May 18  2017 test.bed
vip52@VM-0-15-ubuntu:~/test$ unzip rmDuplicate.zip 
Archive:  rmDuplicate.zip
   creating: rmDuplicate/
   creating: rmDuplicate/picard/
   creating: rmDuplicate/picard/paired/
  inflating: rmDuplicate/picard/paired/readme.txt  
  inflating: rmDuplicate/picard/paired/tmp.header  
  inflating: rmDuplicate/picard/paired/tmp.MarkDuplicates.log  
  inflating: rmDuplicate/picard/paired/tmp.metrics  
  inflating: rmDuplicate/picard/paired/tmp.rmdup.bai  
  inflating: rmDuplicate/picard/paired/tmp.rmdup.bam  
  inflating: rmDuplicate/picard/paired/tmp.sam  
  inflating: rmDuplicate/picard/paired/tmp.sorted.bam  
   creating: rmDuplicate/picard/single/
  inflating: rmDuplicate/picard/single/.MarkDuplicates.log  
  inflating: rmDuplicate/picard/single/readme.txt  
  inflating: rmDuplicate/picard/single/tmp.header  
  inflating: rmDuplicate/picard/single/tmp.MarkDuplicates.log  
  inflating: rmDuplicate/picard/single/tmp.metrics  
  inflating: rmDuplicate/picard/single/tmp.rmdup.bai  
  inflating: rmDuplicate/picard/single/tmp.rmdup.bam  
  inflating: rmDuplicate/picard/single/tmp.sam  
  inflating: rmDuplicate/picard/single/tmp.sorted.bam  
   creating: rmDuplicate/samtools/
   creating: rmDuplicate/samtools/paired/
  inflating: rmDuplicate/samtools/paired/readme.txt  
  inflating: rmDuplicate/samtools/paired/tmp.header  
  inflating: rmDuplicate/samtools/paired/tmp.rmdup.bam  
  inflating: rmDuplicate/samtools/paired/tmp.rmdup.vcf.gz  
  inflating: rmDuplicate/samtools/paired/tmp.sam  
  inflating: rmDuplicate/samtools/paired/tmp.sorted.bam  
  inflating: rmDuplicate/samtools/paired/tmp.sorted.vcf.gz  
   creating: rmDuplicate/samtools/single/
  inflating: rmDuplicate/samtools/single/readme.txt  
  inflating: rmDuplicate/samtools/single/tmp.header  
  inflating: rmDuplicate/samtools/single/tmp.rmdup.bam  
  inflating: rmDuplicate/samtools/single/tmp.rmdup.vcf.gz  
  inflating: rmDuplicate/samtools/single/tmp.sam  
  inflating: rmDuplicate/samtools/single/tmp.sorted.bam  
  inflating: rmDuplicate/samtools/single/tmp.sorted.vcf.gz  
vip52@VM-0-15-ubuntu:~/test$ ls -lh 
total 112K
drwxrwxr-x 4 vip52 vip52 4.0K Nov 12  2016 rmDuplicate
-rw-rw-r-- 1 vip52 vip52 103K Nov 12  2016 rmDuplicate.zip
-rw-rw-r-- 1 vip52 vip52 3.1K May 18  2017 test.bed
# View the directory structure
vip52@VM-0-15-ubuntu:~/test$ cd rmDuplicate/
vip52@VM-0-15-ubuntu:~/test/rmDuplicate$ tree
.
├── picard
│   ├── paired
│   │   ├── readme.txt
│   │   ├── tmp.header
│   │   ├── tmp.MarkDuplicates.log
│   │   ├── tmp.metrics
│   │   ├── tmp.rmdup.bai
│   │   ├── tmp.rmdup.bam
│   │   ├── tmp.sam
│   │   └── tmp.sorted.bam
│   └── single
│       ├── readme.txt
│       ├── tmp.header
│       ├── tmp.MarkDuplicates.log
│       ├── tmp.metrics
│       ├── tmp.rmdup.bai
│       ├── tmp.rmdup.bam
│       ├── tmp.sam
│       └── tmp.sorted.bam
└── samtools
    ├── paired
    │   ├── readme.txt
    │   ├── tmp.header
    │   ├── tmp.rmdup.bam
    │   ├── tmp.rmdup.vcf.gz
    │   ├── tmp.sam
    │   ├── tmp.sorted.bam
    │   └── tmp.sorted.vcf.gz
    └── single
        ├── readme.txt
        ├── tmp.header
        ├── tmp.rmdup.bam
        ├── tmp.rmdup.vcf.gz
        ├── tmp.sam
        ├── tmp.sorted.bam
        └── tmp.sorted.vcf.gz

6 directories, 30 files 

Key point: using unzip

10. Go into the rmDuplicate/samtools/single folder extracted in question 9, view the file with the .sam suffix, and work out how SAM and BAM are defined in bioinformatics.

vip52@VM-0-15-ubuntu:~/test/rmDuplicate/samtools/single$ less -S tmp.sam 
vip52@VM-0-15-ubuntu:~/test/rmDuplicate/samtools/single$ cat tmp.sam
SRR1042600.42157053 0   chr1    629895  42  51M *   0   0   ATAACCAATACTACCAATCANTACTCATCATTAATAATCATAATGGCTATA CCCFFFFFHHHHHJJJJJJJ#4AGHJJIIJJIIIIIJJJJIJIIIIJJIJI AS:i:-6 XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:11C8A30    YT:Z:UU
SRR1042600.42212881 0   chr1    629895  42  51M *   0   0   ATAACCAATACTACCAATCANTACTCATCATTAATAATCATAATGGCTATA @@<FDFFBFDHHFJEIIGJI#3AFHGEHEIJIIGIIGGIJIIJIGIIGIIJ AS:i:-6 XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:11C8A30    YT:Z:UU
SRR1042600.12010763 16  chr1    629895  24  51M   

Key point: cat and less -S

SAM is a sequence file that carries alignment information (that is, it tells you where each read maps on the chromosome, among other details) and is used to store sequence data. (SAM format is a generic format for storing large nucleotide sequence alignments/mappings.)
BAM is the compressed binary version of the Sequence Alignment/Map (SAM) format. Binary files are used in bioinformatics mainly to save space and to be machine-readable. The samtools toolkit converts between SAM and BAM files.
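
Each alignment line in a SAM file is tab-separated, and the SAM specification names the first six mandatory columns QNAME, FLAG, RNAME, POS, MAPQ, and CIGAR. The sketch below labels those columns for one record taken from the tmp.sam output above, shortened to its first six fields; the labels in the printed summary are my own shorthand.

```shell
# One alignment record from tmp.sam, shortened to its first six columns.
record=$(printf 'SRR1042600.42157053\t0\tchr1\t629895\t42\t51M')

# Columns: QNAME (read name), FLAG, RNAME (reference), POS, MAPQ, CIGAR.
summary=$(printf '%s\n' "$record" | awk -F'\t' '{printf "read=%s flag=%s chrom=%s pos=%s mapq=%s cigar=%s", $1, $2, $3, $4, $5, $6}')
echo "$summary"
```

With samtools installed, and assuming the SAM file carries its header, samtools view -b -o tmp.bam tmp.sam converts SAM to BAM and samtools view -h -o tmp.sam tmp.bam converts back.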

11. Install samtools

# Create a samtools folder and download the samtools source package
vip52@VM-0-15-ubuntu:~/test$ mkdir samtools  && cd samtools
vip52@VM-0-15-ubuntu:~/test/samtools$ wget https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2
--2019-05-11 10:14:32--  https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2
Resolving github.com (github.com)... 13.229.188.59
Connecting to github.com (github.com)|13.229.188.59|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/3666841/fe586164-8a73-11e8-84ad-bb90bbd3b7c0?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20190511%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20190511T021433Z&X-Amz-Expires=300&X-Amz-Signature=535a1ad535409ad3b0b8e33e3716bf2462981f2db3356658bf0dfcfc9e2cc05d&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dsamtools-1.9.tar.bz2&response-content-type=application%2Foctet-stream [following]
--2019-05-11 10:14:33--  https://github-production-release-asset-2e65be.s3.amazonaws.com/3666841/fe586164-8a73-11e8-84ad-bb90bbd3b7c0?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20190511%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20190511T021433Z&X-Amz-Expires=300&X-Amz-Signature=535a1ad535409ad3b0b8e33e3716bf2462981f2db3356658bf0dfcfc9e2cc05d&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dsamtools-1.9.tar.bz2&response-content-type=application%2Foctet-stream
Resolving github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)... 52.216.229.139
Connecting to github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)|52.216.229.139|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4440405 (4.2M) [application/octet-stream]
Saving to: ‘samtools-1.9.tar.bz2’

samtools-1.9.tar.bz2              100%[===========================================================>]   4.23M  2.02MB/s    in 2.1s    

2019-05-11 10:14:37 (2.02 MB/s) - ‘samtools-1.9.tar.bz2’ saved [4440405/4440405]
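# Extract
vip52@VM-0-15-ubuntu:~/test/samtools$ tar jvxf samtools-1.9.tar.bz2
vip52@VM-0-15-ubuntu:~/test/samtools$ cd samtools-1.9 # enter the extracted source directory, which contains configure
vip52@VM-0-15-ubuntu:~/test/samtools/samtools-1.9$ ./configure # check the system and generate the Makefile
vip52@VM-0-15-ubuntu:~/test/samtools/samtools-1.9$ make # compile
vip52@VM-0-15-ubuntu:~/test/samtools/samtools-1.9$ make prefix=~/biosoft/samtools-1.9 install # install; ~/biosoft/samtools-1.9 is my own install location, change it to suit your needs
vip52@VM-0-15-ubuntu:~/test/samtools/samtools-1.9$ echo 'export PATH=$PATH:$HOME/biosoft/samtools-1.9/bin' >> ~/.profile # add the install directory to the PATH environment variable
vip52@VM-0-15-ubuntu:~/test/samtools/samtools-1.9$ source ~/.profile # reload the profile so the change takes effect immediately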

Key point: using tar and installing software from source

samtools can also be installed with conda
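
The PATH step is what lets the shell find a newly installed program by name. The sketch below shows the mechanism with a stand-in script instead of samtools; mytool and the scratch directory are made up for the demonstration.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/bin"

# A stand-in executable; "mytool" is a made-up name for this demo.
printf '#!/bin/sh\necho mytool ran\n' > "$demo/bin/mytool"
chmod +x "$demo/bin/mytool"

# Appending the directory to PATH lets the shell locate the program by name.
PATH="$PATH:$demo/bin"
export PATH

found=$(command -v mytool)
result=$(mytool)
echo "$found -> $result"
```

A PATH change made this way lasts only for the current shell session, which is why the install steps write the export line into a startup file.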

12. Open the file with the BAM suffix and find the command that generated it. As a hint, the command is:

/home/jianmingzeng/biosoft/bowtie/bowtie2-2.2.9/bowtie2-align-s --wrapper basic-0 -p 20 -x /home/jianmingzeng/reference/index/bowtie/hg38 -S /home/jianmingzeng/data/public/allMouse/alignment/WT_rep2_Input.sam -U /tmp/41440.unp  
# Find the BAM files under rmDuplicate/
vip52@VM-0-15-ubuntu:~/test$ find rmDuplicate/ -name '*.bam'
rmDuplicate/picard/paired/tmp.rmdup.bam
rmDuplicate/picard/paired/tmp.sorted.bam
rmDuplicate/picard/single/tmp.rmdup.bam
rmDuplicate/picard/single/tmp.sorted.bam
rmDuplicate/samtools/paired/tmp.rmdup.bam
rmDuplicate/samtools/paired/tmp.sorted.bam
rmDuplicate/samtools/single/tmp.rmdup.bam
rmDuplicate/samtools/single/tmp.sorted.bam
# Going into each folder and checking the files one by one with cat + grep is inconvenient, so first look up how find and grep are used.
# Problem: find which files under several directories contain a given string. Solution: let grep search recursively through the directory tree.
grep -rn "bowtie2" rmDuplicate/
rmDuplicate/picard/single/tmp.header:457:@PG    ID:bowtie2  PN:bowtie2  VN:2.2.9    CL:"/home/jianmingzeng/biosoft/bowtie/bowtie2-2.2.9/bowtie2-align-s --wrapper basic-0 -p 20 -x /home/jianmingzeng/reference/index/bowtie/hg38 -S /home/jianmingzeng/data/public/allMouse/alignment/WT_rep2_Input.sam -U /tmp/41440.unp"
rmDuplicate/samtools/single/tmp.header:457:@PG  ID:bowtie2  PN:bowtie2  VN:2.2.9    CL:"/home/jianmingzeng/biosoft/bowtie/bowtie2-2.2.9/bowtie2-align-s --wrapper basic-0 -p 20 -x /home/jianmingzeng/reference/index/bowtie/hg38 -S /home/jianmingzeng/data/public/allMouse/alignment/WT_rep2_Input.sam -U /tmp/41440.unp"
# This search used "bowtie2" in place of the full command line; searching for the full command is recommended.

Key point: recursive search with grep
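
Recursive grep can be tried on a small scratch tree first. In this sketch the directory layout mimics rmDuplicate/, but the header files and their contents are made up.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/picard/single" "$demo/samtools/single"
# Hypothetical header files; only the first one mentions bowtie2.
printf '@HD\tVN:1.0\n@PG\tID:bowtie2\n' > "$demo/picard/single/tmp.header"
printf '@HD\tVN:1.0\n' > "$demo/samtools/single/tmp.header"

# -r descends into subdirectories; -n adds the line number of each match.
matches=$(cd "$demo" && grep -rn 'bowtie2' .)
echo "$matches"
```

Note that the matches in the exercise come from the plain-text tmp.header files. A BAM file is compressed, so to read its header directly one would typically run samtools view -H on it and grep that output for @PG.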

13. Based on the command above, work out how many chromosomes my reference genome /home/jianmingzeng/reference/index/bowtie/hg38 contains

vip52@VM-0-15-ubuntu:~$ cd test
vip52@VM-0-15-ubuntu:~/test$ samtools view -H ~/test/rmDuplicate/samtools/single/tmp.sorted.bam |awk '{print $2}'|cut -c4-9|sort -n|uniq -c|grep -v '_'
      1 bowtie
      1 chr1
      1 chr10
      1 chr11
      1 chr12
      1 chr13
      1 chr14
      1 chr15
      1 chr16
      1 chr17
      1 chr18
      1 chr19
      1 chr2
      1 chr20
      1 chr21
      1 chr22
      1 chr3
      1 chr4
      1 chr5
      1 chr6
      1 chr7
      1 chr8
      1 chr9
      1 chrM
      1 chrX
      1 chrY
      1 1.0

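Note: the bowtie and 1.0 rows come from the @PG and @HD header lines rather than from chromosomes, so the reference has 25 primary sequences (chr1 to chr22, chrX, chrY, and chrM).

Key points: awk, cut, sort, uniq, grep

awk, cut, sort, and grep are both the core and the hard part of the Linux command line.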
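
A stricter version of the pipeline keeps only the @SQ lines, which are the header lines that describe reference sequences, so that other header lines cannot leak into the count. The sketch runs on a made-up miniature header instead of real samtools view -H output.

```shell
# Hypothetical miniature of `samtools view -H` output (tab-separated).
header=$(printf '@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:900\n@SQ\tSN:chr1_KI270706v1_random\tLN:50\n@PG\tID:bowtie2\tPN:bowtie2')

# Keep @SQ lines, take the SN: field, strip the tag, drop names containing "_".
chroms=$(printf '%s\n' "$header" | grep '^@SQ' | cut -f2 | sed 's/^SN://' | grep -v '_')
n=$(printf '%s\n' "$chroms" | wc -l | tr -d ' ')
echo "$n primary sequences:" $chroms
```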

14. In the BAM file above, the second column contains only the two numbers 0 and 16. Use cut, sort, uniq, and similar commands to count how many times each occurs.

vip52@VM-0-15-ubuntu:~/test/rmDuplicate/samtools/single$ samtools view tmp.sorted.bam |cut -f2|sort |uniq -c
     29 0
     24 16

15. Now open the BAM file in the rmDuplicate/samtools/paired folder, look at the second column again, and count the values

vip52@VM-0-15-ubuntu:~/test/rmDuplicate/picard/single$ samtools view  ~/test/rmDuplicate/samtools/single/tmp.sorted.bam |cut -f2|sort|uniq -c
     29 0
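     24 16

Note: the command above reads samtools/single/tmp.sorted.bam again, which is why the counts match question 14. To answer this question, run the same pipeline on rmDuplicate/samtools/paired/tmp.sorted.bam.

Key points: cut, sort, uniq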
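
The second column of a SAM record is the FLAG field: 0 means an unpaired read mapped to the forward strand, and 16 (bit 0x10) means it mapped to the reverse strand. Paired-end data usually shows more values because additional bits are set. The sketch below, on made-up flag values, shows why sort has to come before uniq -c.

```shell
# Hypothetical FLAG column values, one per read.
flags=$(printf '0\n16\n0\n16\n0\n')

# uniq -c only merges adjacent duplicates, so unsorted input yields too many groups.
unsorted=$(printf '%s\n' "$flags" | uniq -c | wc -l | tr -d ' ')

# Sorting first brings equal values together, giving one group per distinct value.
sorted=$(printf '%s\n' "$flags" | sort | uniq -c | wc -l | tr -d ' ')
echo "groups without sort: $unsorted, with sort: $sorted"
```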

16. Download http://www.biotrainee.com/jmzeng/sickle/sickle-results.zip, unzip it, and inspect the folder structure inside. The file is 2.3M, so pay attention to the download time and speed.

vip52@VM-0-15-ubuntu:~/test$ wget http://www.biotrainee.com/jmzeng/sickle/sickle-results.zip
vip52@VM-0-15-ubuntu:~/test$ unzip sickle-results.zip
vip52@VM-0-15-ubuntu:~/test$ cd sickle-results/
vip52@VM-0-15-ubuntu:~/test/sickle-results$ tree
.
├── command.txt
├── single_tmp_fastqc.html
├── single_tmp_fastqc.zip
├── test1_fastqc.html
├── test1_fastqc.zip
├── test2_fastqc.html
├── test2_fastqc.zip
├── trimmed_output_file1_fastqc.html
├── trimmed_output_file1_fastqc.zip
├── trimmed_output_file2_fastqc.html
└── trimmed_output_file2_fastqc.zip

0 directories, 11 files  

Key point: unzip, tree

17. Unzip sickle-results/single_tmp_fastqc.zip, go into the extracted folder, find fastqc_data.txt, and count how many lines in that text file start with >>.

vip52@VM-0-15-ubuntu:~/test/sickle-results/single_tmp_fastqc$ cat fastqc_data.txt | grep -c "^>>" | wc 
      1       1       3
vip52@VM-0-15-ubuntu:~/test/sickle-results/single_tmp_fastqc$ cat fastqc_data.txt | grep -c "^>>" | wc -l
1
vip52@VM-0-15-ubuntu:~/test/sickle-results/single_tmp_fastqc$ cat fastqc_data.txt | grep "^>>" | wc -l
24
vip52@VM-0-15-ubuntu:~/test/sickle-results/single_tmp_fastqc$ cat fastqc_data.txt | awk '/^>>/{print $0}'|wc -l
24
vip52@VM-0-15-ubuntu:~/test/sickle-results/single_tmp_fastqc$ less -S fastqc_data.txt | grep "^>>" |wc -l
24
vip52@VM-0-15-ubuntu:~/test/sickle-results/single_tmp_fastqc$ more fastqc_data.txt |grep "^>>" |wc -l

Key point: cat / less / more; awk / grep
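
The first two attempts above report 1 instead of 24 because grep -c already prints the count as a single line, and wc then counts that one line (the "1 1 3" is one line, one word, and three bytes: the digits 24 plus a newline). A sketch on a made-up excerpt in the style of fastqc_data.txt:

```shell
demo=$(mktemp -d)
# Hypothetical excerpt in the style of fastqc_data.txt.
printf '##FastQC\n>>Basic Statistics\tpass\n#Measure\tValue\n>>END_MODULE\n' > "$demo/mini_data.txt"

# grep -c prints the number of matching lines by itself.
by_grep=$(grep -c '^>>' "$demo/mini_data.txt")

# Piping that single number into wc -l counts the one line grep printed.
by_mistake=$(grep -c '^>>' "$demo/mini_data.txt" | wc -l | tr -d ' ')
echo "grep -c says $by_grep; grep -c | wc -l says $by_mistake"
```

Either grep -c on its own or grep piped into wc -l gives the right answer; combining the two does not.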

18. Download http://www.biotrainee.com/jmzeng/tmp/hg38.tss, look up the RefSeq database IDs of genes you are interested in (TP53, BRCA1, and so on) at NCBI, then find which line of hg38.tss each one is on.

https://www.ncbi.nlm.nih.gov/gene/7157

vip52@VM-0-15-ubuntu:~/test$ wget http://www.biotrainee.com/jmzeng/tmp/hg38.tss
--2019-05-11 19:45:29--  http://www.biotrainee.com/jmzeng/tmp/hg38.tss
Resolving www.biotrainee.com (www.biotrainee.com)... 123.206.72.184
Connecting to www.biotrainee.com (www.biotrainee.com)|123.206.72.184|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2625188 (2.5M)
Saving to: ‘hg38.tss’

hg38.tss                          100%[===========================================================>]   2.50M   271KB/s    in 25s     

2019-05-11 19:45:55 (102 KB/s) - ‘hg38.tss’ saved [2625188/2625188]

vip52@VM-0-15-ubuntu:~/test$ ls -lh
total 5.0M
-rw-rw-r-- 1 vip52 vip52 2.6M Jan 11  2017 hg38.tss
drwxrwxr-x 4 vip52 vip52 4.0K Nov 12  2016 rmDuplicate
-rw-rw-r-- 1 vip52 vip52 103K Nov 12  2016 rmDuplicate.zip
drwxrwxr-x 3 vip52 vip52 4.0K May 11 16:47 samtools
drwxrwxr-x 3 vip52 vip52 4.0K May 11 19:37 sickle-results
-rw-rw-r-- 1 vip52 vip52 2.3M Oct  6  2016 sickle-results.zip
-rw-rw-r-- 1 vip52 vip52 3.1K May 18  2017 test.bed
vip52@VM-0-15-ubuntu:~/test$ cat hg38.tss | grep -n 'NM_001126113'
29346:NM_001126113  chr17   7685550 7689550 1

Key point: cat / grep
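
grep matches substrings, so a short ID can also hit longer IDs that begin with the same characters. One hedged alternative is to compare the whole first column with awk, which also reports the line number through NR. The rows below are made up and only mimic the five-column layout seen in the output above.

```shell
demo=$(mktemp -d)
# Hypothetical rows in the hg38.tss layout; the coordinates are invented.
printf 'NM_000546\tchr17\t1\t2\t1\nNM_0005461\tchr1\t3\t4\t1\n' > "$demo/mini.tss"

# grep counts both rows because the pattern is a prefix of the second ID.
loose=$(grep -c 'NM_000546' "$demo/mini.tss")

# awk compares the entire first column; NR is the current line number.
exact=$(awk -v id='NM_000546' '$1 == id {print NR}' "$demo/mini.tss")
echo "grep hits: $loose, exact match on line: $exact"
```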

19. Parse hg38.tss and count the number of genes on each chromosome.

vip52@VM-0-15-ubuntu:~/test$ cat hg38.tss|awk '{print $2}' |sort |uniq -c|grep -v '_'
   6050 chr1
   2824 chr10
   3449 chr11
   2931 chr12
   1122 chr13
   1883 chr14
   2168 chr15
   2507 chr16
   3309 chr17
    873 chr18
   3817 chr19
   4042 chr2
   1676 chr20
    868 chr21
   1274 chr22
   3277 chr3
   2250 chr4
   2684 chr5
   3029 chr6
   2720 chr7
   2069 chr8
   2301 chr9
      2 chrM
   2553 chrX
    414 chrY
# This command also works
vip52@VM-0-15-ubuntu:~/test$ cat hg38.tss |cut -f2|sort|uniq -c | grep -v '_'
   6050 chr1
   2824 chr10
   3449 chr11
   2931 chr12
   1122 chr13
   1883 chr14
   2168 chr15
   2507 chr16
   3309 chr17
    873 chr18
   3817 chr19
   4042 chr2
   1676 chr20
    868 chr21
   1274 chr22
   3277 chr3
   2250 chr4
   2684 chr5
   3029 chr6
   2720 chr7
   2069 chr8
   2301 chr9
      2 chrM
   2553 chrX
    414 chrY

Key point: cut / sort / grep / awk / uniq
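
The same tally can be done in a single awk pass with an associative array, which avoids sorting the whole file before counting. This is a sketch on made-up rows, not a replacement for the commands above.

```shell
demo=$(mktemp -d)
# Hypothetical two-column rows: ID and chromosome.
printf 'NM_1\tchr1\nNR_2\tchr2\nNM_3\tchr1\nNM_4\tchr1_alt\n' > "$demo/mini.tss"

# count[$2]++ tallies each chromosome name as the file is read.
# The condition $2 !~ /_/ skips names that contain an underscore.
counts=$(awk '$2 !~ /_/ {count[$2]++} END {for (c in count) print count[c], c}' "$demo/mini.tss" | sort -k2,2)
echo "$counts"
```

awk does not guarantee the order of a for-in loop, so the final sort is there only to make the output order predictable.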

20. Parse hg38.tss, count the sequences whose IDs start with NM and NR, and learn what the NM and NR prefixes mean.

vip52@VM-0-15-ubuntu:~/test$ cat hg38.tss |awk '{print$1}'|cut -c1-2|sort|uniq -c 
  51064 NM
  15954 NR

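In RefSeq accessions, the NM_ prefix marks a protein-coding mRNA transcript and the NR_ prefix marks a non-coding RNA transcript.

Key point: cat / grep / awk / cut / sort / uniq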

Original questions: http://www.bio-info-trainee.com/2900.html


生信技能樹, jimmy's space on Bilibili: https://space.bilibili.com/338686099/#/
生信技能樹: free self-study bioinformatics video courses
