Linux 20 Questions - 生信技能樹 (Bioinformatics Skill Tree)
1. In any directory, create a nested series of folders of the form 1/2/3/4/5/6/7/8/9.
vip52@VM-0-15-ubuntu:~/test$ mkdir -p 1/2/3/4/5/6/7/8/9
vip52@VM-0-15-ubuntu:~/test$ tree
.
├── 1
│ └── 2
│ └── 3
│ └── 4
│ └── 5
│ └── 6
│ └── 7
│ └── 8
│ └── 9
mkdir dirname # create a single directory
mkdir -p directory1/directory2/... # create nested directories, including any missing parents
Key point: mkdir -p
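The difference between mkdir and mkdir -p can be checked with a minimal sketch like the one below. The scratch directory from mktemp and the path a/b/c are assumptions for illustration; the exercise itself works in ~/test.

```shell
# Work in a throwaway directory (hypothetical; the exercise uses ~/test).
demo=$(mktemp -d)

# Without -p, mkdir refuses to create a directory whose parent is missing.
if mkdir "$demo/a/b/c" 2>/dev/null; then
    plain_result="created"
else
    plain_result="failed"
fi

# With -p, every missing parent is created in one call.
mkdir -p "$demo/1/2/3/4/5/6/7/8/9"

# Running it again is harmless: -p does not complain about existing directories.
mkdir -p "$demo/1/2/3/4/5/6/7/8/9"

echo "plain mkdir: $plain_result"
[ -d "$demo/1/2/3/4/5/6/7/8/9" ] && echo "nested path exists"
```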
2. Inside the folder you just created (mine, for example, is /Users/jimmy/tmp/1/2/3/4/5/6/7/8/9), create a text file named me.txt.
vip52@VM-0-15-ubuntu:~/test$ cd 1/2/3/4/5/6/7/8/9/
vip52@VM-0-15-ubuntu:~/test/1/2/3/4/5/6/7/8/9$ touch me.txt
vip52@VM-0-15-ubuntu:~/test/1/2/3/4/5/6/7/8/9$ ls -h
me.txt
touch filename # create an empty file (or update the timestamp of an existing one)
Key point: usage of touch
3. Enter the following content into the text file me.txt:
vip52@VM-0-15-ubuntu:~/test/1/2/3/4/5/6/7/8/9$ vim me.txt
vip52@VM-0-15-ubuntu:~/test/1/2/3/4/5/6/7/8/9$ cat me.txt
Go to: http://www.biotrainee.com/
I love bioinfomatics.
And you ?
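Key point: editing text with vim, vi, or sed
vim filename # open filename for editing
i # enter insert mode, then type, paste, or delete text
Esc # leave insert mode when editing is done
:wq # save and quit
:q! # quit without saving (:wq! forces a save and then quits)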
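vim is interactive, so it is awkward inside a script. As a rough alternative (not the method used in the transcript), a here-document can write the same three lines. The scratch directory is an assumption for illustration.

```shell
# Throwaway directory (hypothetical; the exercise writes into 1/2/.../9).
demo=$(mktemp -d)

# Write the three lines from the exercise with a quoted here-document
# (quoting 'EOF' prevents any variable expansion inside the text).
cat > "$demo/me.txt" <<'EOF'
Go to: http://www.biotrainee.com/
I love bioinfomatics.
And you ?
EOF

line_count=$(wc -l < "$demo/me.txt" | tr -d ' ')
echo "lines written: $line_count"
```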
4. Delete the folders 1/2/3/4/5/6/7/8/9 and the text file me.txt created above.
vip52@VM-0-15-ubuntu:~/test/1/2/3/4/5/6/7/8/9$ cd ~/test/
vip52@VM-0-15-ubuntu:~/test$ ls
1
vip52@VM-0-15-ubuntu:~/test$ rm -rf 1
vip52@VM-0-15-ubuntu:~/test$ ls -l
total 0
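Key point: the rm command
rm (-r) filename or file.txt # permanently deletes a file or folder (-r is required for folders). rm does not move anything to a recycle bin, so delete with care; rm -i filename asks for confirmation before deleting.
5. In any directory, create five folders named folder1 to folder5, then create five folders named folder1 to folder5 inside each of them. The result looks like this: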
vip52@VM-0-15-ubuntu:~/test$ ls
vip52@VM-0-15-ubuntu:~/test$ mkdir -p folder{1..5}/folder{1..5}
vip52@VM-0-15-ubuntu:~/test$ ls -lh
total 20K
drwxrwxr-x 7 vip52 vip52 4.0K May 10 23:29 folder1
drwxrwxr-x 7 vip52 vip52 4.0K May 10 23:29 folder2
drwxrwxr-x 7 vip52 vip52 4.0K May 10 23:29 folder3
drwxrwxr-x 7 vip52 vip52 4.0K May 10 23:29 folder4
drwxrwxr-x 7 vip52 vip52 4.0K May 10 23:29 folder5
vip52@VM-0-15-ubuntu:~/test$ tree
.
├── folder1
│ ├── folder1
│ ├── folder2
│ ├── folder3
│ ├── folder4
│ └── folder5
├── folder2
│ ├── folder1
│ ├── folder2
│ ├── folder3
│ ├── folder4
│ └── folder5
├── folder3
│ ├── folder1
│ ├── folder2
│ ├── folder3
│ ├── folder4
│ └── folder5
├── folder4
│ ├── folder1
│ ├── folder2
│ ├── folder3
│ ├── folder4
│ └── folder5
└── folder5
├── folder1
├── folder2
├── folder3
├── folder4
└── folder5
30 directories, 0 files
Key point: mkdir -p with brace expansion, e.g. folder{1..5}/folder{1..5}
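To see what mkdir actually receives, it can help to print a brace expansion with echo first. This is a small sketch; it calls bash explicitly because brace expansion is a bash feature and may be missing from a plain sh.

```shell
# Print a small expansion so every generated path is visible.
expanded=$(bash -c 'echo folder{1..2}/folder{1..2}')
echo "$expanded"

# The pattern from the exercise expands to 5 x 5 paths.
path_count=$(bash -c 'echo folder{1..5}/folder{1..5}' | wc -w | tr -d ' ')
echo "paths generated: $path_count"
```

Each pair of braces multiplies the number of words, which is why a single mkdir -p call is enough for all 25 inner folders.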
6. In every folder created in question 5, create the text file me.txt from question 2, with the same content.
Method 1:
vip52@VM-0-15-ubuntu:~/test$ touch me.txt
vip52@VM-0-15-ubuntu:~/test$ vim me.txt
vip52@VM-0-15-ubuntu:~/test$ echo folder{1..5}/folder{1..5} |xargs -n 1 cp me.txt
vip52@VM-0-15-ubuntu:~/test$ tree
.
├── folder1
│ ├── folder1
│ │ └── me.txt
│ ├── folder2
│ │ └── me.txt
│ ├── folder3
│ │ └── me.txt
│ ├── folder4
│ │ └── me.txt
│ └── folder5
│ └── me.txt
├── folder2
│ ├── folder1
│ │ └── me.txt
│ ├── folder2
│ │ └── me.txt
│ ├── folder3
│ │ └── me.txt
│ ├── folder4
│ │ └── me.txt
│ └── folder5
│ └── me.txt
├── folder3
│ ├── folder1
│ │ └── me.txt
│ ├── folder2
│ │ └── me.txt
│ ├── folder3
│ │ └── me.txt
│ ├── folder4
│ │ └── me.txt
│ └── folder5
│ └── me.txt
├── folder4
│ ├── folder1
│ │ └── me.txt
│ ├── folder2
│ │ └── me.txt
│ ├── folder3
│ │ └── me.txt
│ ├── folder4
│ │ └── me.txt
│ └── folder5
│ └── me.txt
├── folder5
│ ├── folder1
│ │ └── me.txt
│ ├── folder2
│ │ └── me.txt
│ ├── folder3
│ │ └── me.txt
│ ├── folder4
│ │ └── me.txt
│ └── folder5
│ └── me.txt
└── me.txt
30 directories, 26 files
Key point: usage of xargs
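A dry run makes the xargs step easier to follow. In the sketch below, echo stands in for the real command and the directory names dirA, dirB, and dirC are made up, so nothing is copied.

```shell
# xargs -n 1 runs the command once per input word, appending that word
# as the last argument; echo prints the command that would be run.
preview=$(echo dirA dirB dirC | xargs -n 1 echo cp me.txt)
echo "$preview"
```

Each printed line is one cp invocation, which is how a single me.txt ends up in every target folder.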
Method 2:
vip52@VM-0-15-ubuntu:~/test$ for dirs in folder{1..5}/folder{1..5}; do cp me.txt $dirs; done
vip52@VM-0-15-ubuntu:~/test$ tree
.
├── folder1
│ ├── folder1
│ │ └── me.txt
│ ├── folder2
│ │ └── me.txt
│ ├── folder3
│ │ └── me.txt
│ ├── folder4
│ │ └── me.txt
│ └── folder5
│ └── me.txt
├── folder2
│ ├── folder1
│ │ └── me.txt
│ ├── folder2
│ │ └── me.txt
│ ├── folder3
│ │ └── me.txt
│ ├── folder4
│ │ └── me.txt
│ └── folder5
│ └── me.txt
├── folder3
│ ├── folder1
│ │ └── me.txt
│ ├── folder2
│ │ └── me.txt
│ ├── folder3
│ │ └── me.txt
│ ├── folder4
│ │ └── me.txt
│ └── folder5
│ └── me.txt
├── folder4
│ ├── folder1
│ │ └── me.txt
│ ├── folder2
│ │ └── me.txt
│ ├── folder3
│ │ └── me.txt
│ ├── folder4
│ │ └── me.txt
│ └── folder5
│ └── me.txt
├── folder5
│ ├── folder1
│ │ └── me.txt
│ ├── folder2
│ │ └── me.txt
│ ├── folder3
│ │ └── me.txt
│ ├── folder4
│ │ └── me.txt
│ └── folder5
│ └── me.txt
└── me.txt
30 directories, 26 files
Key point: the for loop
for ...; do ...; done
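The loop can also be written without brace expansion. The sketch below uses two nested loops in a scratch directory (an assumption for illustration) and quotes the loop variable, which the one-liner above leaves unquoted.

```shell
# Throwaway directory with a one-line stand-in for me.txt (hypothetical).
demo=$(mktemp -d)
printf 'Go to: http://www.biotrainee.com/\n' > "$demo/me.txt"

# Nested loops spell out what folder{1..5}/folder{1..5} expands to,
# and work in shells without brace expansion.
for outer in 1 2 3 4 5; do
    for inner in 1 2 3 4 5; do
        target="$demo/folder$outer/folder$inner"
        mkdir -p "$target"
        # Quote the variable so paths containing spaces stay intact.
        cp "$demo/me.txt" "$target/"
    done
done

copies=$(find "$demo"/folder* -name me.txt | wc -l | tr -d ' ')
echo "copies made: $copies"
```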
7. Delete the folders and files created in the previous steps once more.
vip52@VM-0-15-ubuntu:~/test$ ls *
me.txt
folder1:
folder1 folder2 folder3 folder4 folder5
folder2:
folder1 folder2 folder3 folder4 folder5
folder3:
folder1 folder2 folder3 folder4 folder5
folder4:
folder1 folder2 folder3 folder4 folder5
folder5:
folder1 folder2 folder3 folder4 folder5
vip52@VM-0-15-ubuntu:~/test$ rm -rf folder*
vip52@VM-0-15-ubuntu:~/test$ rm -r me.txt
Key point: rm -rf filename
8. Download http://www.biotrainee.com/jmzeng/igv/test.bed, then find which line contains H3K4me3 and how many lines the file has in total.
vip52@VM-0-15-ubuntu:~/test$ wget -c http://www.biotrainee.com/jmzeng/igv/test.bed
--2019-05-11 09:43:18-- http://www.biotrainee.com/jmzeng/igv/test.bed
Resolving www.biotrainee.com (www.biotrainee.com)... 123.206.72.184
Connecting to www.biotrainee.com (www.biotrainee.com)|123.206.72.184|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3099 (3.0K)
Saving to: ‘test.bed’
test.bed 100%[===========================================================>] 3.03K --.-KB/s in 0s
2019-05-11 09:43:18 (526 MB/s) - ‘test.bed’ saved [3099/3099]
vip52@VM-0-15-ubuntu:~/test$ cat test.bed | grep -n 'H3K4me3'
8:chr1 9810 10438 ID=SRX387603;Name=H3K4me3%20(@%20HMLE);Title=GSM1280527:%20HMLE%20Twist3D%20H3K4me3%20rep2%3B%20Homo%20sapiens%3B%20ChIP-Seq;Cell%20group=Breast;<br>source_name=HMLE_Twist3D_H3K4me3;cell%20type=human%20mammary%20epithelial%20cells;transfected%20with=Twist1;culture%20type=sphere;chip%20antibody=H3K4me3;chip%20antibody%20vendor=Millipore; 222 . 9810 10438 0,226,255
vip52@VM-0-15-ubuntu:~/test$ cat test.bed | wc
10 88 3099
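Key points: usage of grep and wc. In the wc output above, the three numbers are lines, words, and bytes, so the file has 10 lines and the H3K4me3 record is on line 8.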
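Both answers can be captured as plain numbers without cat. The following sketch runs on a three-line stand-in file whose contents are invented for illustration, not on the real test.bed.

```shell
demo=$(mktemp -d)
# A three-line stand-in for test.bed (hypothetical content).
printf 'chr1\t100\t200\tH3K27ac\nchr1\t300\t400\tH3K4me3\nchr2\t500\t600\tCTCF\n' > "$demo/sample.bed"

# grep -n prefixes each match with its line number; cut keeps only the number.
match_line=$(grep -n 'H3K4me3' "$demo/sample.bed" | cut -d: -f1)

# Reading from stdin makes wc -l print the bare count without the file name.
total_lines=$(wc -l < "$demo/sample.bed" | tr -d ' ')

echo "match on line $match_line of $total_lines"
```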
9. Download http://www.biotrainee.com/jmzeng/rmDuplicate.zip, unzip it, and inspect the folder structure inside.
# Download and unzip
vip52@VM-0-15-ubuntu:~/test$ wget http://www.biotrainee.com/jmzeng/rmDuplicate.zip
--2019-05-11 09:59:00-- http://www.biotrainee.com/jmzeng/rmDuplicate.zip
Resolving www.biotrainee.com (www.biotrainee.com)... 123.206.72.184
Connecting to www.biotrainee.com (www.biotrainee.com)|123.206.72.184|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 104931 (102K) [application/zip]
Saving to: ‘rmDuplicate.zip’
rmDuplicate.zip 100%[===========================================================>] 102.47K 523KB/s in 0.2s
2019-05-11 09:59:01 (523 KB/s) - ‘rmDuplicate.zip’ saved [104931/104931]
vip52@VM-0-15-ubuntu:~/test$ ls -lhsa
total 116K
4.0K drwxrwxr-x 2 vip52 vip52 4.0K May 11 09:59 .
4.0K drwxr-xr-x 25 vip52 student 4.0K May 10 23:44 ..
104K -rw-rw-r-- 1 vip52 vip52 103K Nov 12 2016 rmDuplicate.zip
4.0K -rw-rw-r-- 1 vip52 vip52 3.1K May 18 2017 test.bed
vip52@VM-0-15-ubuntu:~/test$ unzip rmDuplicate.zip
Archive: rmDuplicate.zip
creating: rmDuplicate/
creating: rmDuplicate/picard/
creating: rmDuplicate/picard/paired/
inflating: rmDuplicate/picard/paired/readme.txt
inflating: rmDuplicate/picard/paired/tmp.header
inflating: rmDuplicate/picard/paired/tmp.MarkDuplicates.log
inflating: rmDuplicate/picard/paired/tmp.metrics
inflating: rmDuplicate/picard/paired/tmp.rmdup.bai
inflating: rmDuplicate/picard/paired/tmp.rmdup.bam
inflating: rmDuplicate/picard/paired/tmp.sam
inflating: rmDuplicate/picard/paired/tmp.sorted.bam
creating: rmDuplicate/picard/single/
inflating: rmDuplicate/picard/single/.MarkDuplicates.log
inflating: rmDuplicate/picard/single/readme.txt
inflating: rmDuplicate/picard/single/tmp.header
inflating: rmDuplicate/picard/single/tmp.MarkDuplicates.log
inflating: rmDuplicate/picard/single/tmp.metrics
inflating: rmDuplicate/picard/single/tmp.rmdup.bai
inflating: rmDuplicate/picard/single/tmp.rmdup.bam
inflating: rmDuplicate/picard/single/tmp.sam
inflating: rmDuplicate/picard/single/tmp.sorted.bam
creating: rmDuplicate/samtools/
creating: rmDuplicate/samtools/paired/
inflating: rmDuplicate/samtools/paired/readme.txt
inflating: rmDuplicate/samtools/paired/tmp.header
inflating: rmDuplicate/samtools/paired/tmp.rmdup.bam
inflating: rmDuplicate/samtools/paired/tmp.rmdup.vcf.gz
inflating: rmDuplicate/samtools/paired/tmp.sam
inflating: rmDuplicate/samtools/paired/tmp.sorted.bam
inflating: rmDuplicate/samtools/paired/tmp.sorted.vcf.gz
creating: rmDuplicate/samtools/single/
inflating: rmDuplicate/samtools/single/readme.txt
inflating: rmDuplicate/samtools/single/tmp.header
inflating: rmDuplicate/samtools/single/tmp.rmdup.bam
inflating: rmDuplicate/samtools/single/tmp.rmdup.vcf.gz
inflating: rmDuplicate/samtools/single/tmp.sam
inflating: rmDuplicate/samtools/single/tmp.sorted.bam
inflating: rmDuplicate/samtools/single/tmp.sorted.vcf.gz
vip52@VM-0-15-ubuntu:~/test$ ls -lh
total 112K
drwxrwxr-x 4 vip52 vip52 4.0K Nov 12 2016 rmDuplicate
-rw-rw-r-- 1 vip52 vip52 103K Nov 12 2016 rmDuplicate.zip
-rw-rw-r-- 1 vip52 vip52 3.1K May 18 2017 test.bed
# View the directory structure
vip52@VM-0-15-ubuntu:~/test$ cd rmDuplicate/
vip52@VM-0-15-ubuntu:~/test/rmDuplicate$ tree
.
├── picard
│ ├── paired
│ │ ├── readme.txt
│ │ ├── tmp.header
│ │ ├── tmp.MarkDuplicates.log
│ │ ├── tmp.metrics
│ │ ├── tmp.rmdup.bai
│ │ ├── tmp.rmdup.bam
│ │ ├── tmp.sam
│ │ └── tmp.sorted.bam
│ └── single
│ ├── readme.txt
│ ├── tmp.header
│ ├── tmp.MarkDuplicates.log
│ ├── tmp.metrics
│ ├── tmp.rmdup.bai
│ ├── tmp.rmdup.bam
│ ├── tmp.sam
│ └── tmp.sorted.bam
└── samtools
├── paired
│ ├── readme.txt
│ ├── tmp.header
│ ├── tmp.rmdup.bam
│ ├── tmp.rmdup.vcf.gz
│ ├── tmp.sam
│ ├── tmp.sorted.bam
│ └── tmp.sorted.vcf.gz
└── single
├── readme.txt
├── tmp.header
├── tmp.rmdup.bam
├── tmp.rmdup.vcf.gz
├── tmp.sam
├── tmp.sorted.bam
└── tmp.sorted.vcf.gz
6 directories, 30 files
Key point: usage of unzip
10. Open the files unzipped in question 9, go into the rmDuplicate/samtools/single folder, view the file with the .sam suffix, and work out how SAM/BAM are defined in bioinformatics.
vip52@VM-0-15-ubuntu:~/test/rmDuplicate/samtools/single$ less -S tmp.sam
vip52@VM-0-15-ubuntu:~/test/rmDuplicate/samtools/single$ cat tmp.sam
SRR1042600.42157053 0 chr1 629895 42 51M * 0 0 ATAACCAATACTACCAATCANTACTCATCATTAATAATCATAATGGCTATA CCCFFFFFHHHHHJJJJJJJ#4AGHJJIIJJIIIIIJJJJIJIIIIJJIJI AS:i:-6 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:11C8A30 YT:Z:UU
SRR1042600.42212881 0 chr1 629895 42 51M * 0 0 ATAACCAATACTACCAATCANTACTCATCATTAATAATCATAATGGCTATA @@<FDFFBFDHHFJEIIGJI#3AFHGEHEIJIIGIIGGIJIIJIGIIGIIJ AS:i:-6 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:11C8A30 YT:Z:UU
SRR1042600.12010763 16 chr1 629895 24 51M
Key points: cat and less -S
SAM is a sequence file that carries alignment information (it tells you, for example, where each read maps on a chromosome) and is used to store sequence data (SAM format is a generic format for storing large nucleotide sequence alignments/mappings.).
BAM is the compressed binary version of the Sequence Alignment/Map (SAM) format. Binary files are used in bioinformatics mainly to save space; they are machine-readable rather than human-readable. samtools can convert between SAM and BAM files.
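Because SAM is tab-separated text, its columns can be picked apart with awk. The sketch below labels the first six mandatory columns of one record; the values come from the tmp.sam output above, while the sequence and quality strings are replaced by the placeholders SEQ and QUAL for readability.

```shell
# One alignment record, tab-separated (SEQ and QUAL are placeholders).
record=$(printf 'SRR1042600.42157053\t0\tchr1\t629895\t42\t51M\t*\t0\t0\tSEQ\tQUAL')

# The first six mandatory SAM columns are QNAME, FLAG, RNAME, POS, MAPQ, CIGAR.
summary=$(printf '%s\n' "$record" |
    awk -F'\t' '{ printf "read=%s flag=%s chrom=%s pos=%s mapq=%s cigar=%s", $1, $2, $3, $4, $5, $6 }')
echo "$summary"
```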
11. Install the samtools software
# Create a samtools folder and download the samtools package
vip52@VM-0-15-ubuntu:~/test$ mkdir samtools && cd samtools
vip52@VM-0-15-ubuntu:~/test/samtools$ wget https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2
--2019-05-11 10:14:32-- https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2
Resolving github.com (github.com)... 13.229.188.59
Connecting to github.com (github.com)|13.229.188.59|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/3666841/fe586164-8a73-11e8-84ad-bb90bbd3b7c0?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20190511%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20190511T021433Z&X-Amz-Expires=300&X-Amz-Signature=535a1ad535409ad3b0b8e33e3716bf2462981f2db3356658bf0dfcfc9e2cc05d&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dsamtools-1.9.tar.bz2&response-content-type=application%2Foctet-stream [following]
--2019-05-11 10:14:33-- https://github-production-release-asset-2e65be.s3.amazonaws.com/3666841/fe586164-8a73-11e8-84ad-bb90bbd3b7c0?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20190511%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20190511T021433Z&X-Amz-Expires=300&X-Amz-Signature=535a1ad535409ad3b0b8e33e3716bf2462981f2db3356658bf0dfcfc9e2cc05d&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dsamtools-1.9.tar.bz2&response-content-type=application%2Foctet-stream
Resolving github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)... 52.216.229.139
Connecting to github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)|52.216.229.139|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4440405 (4.2M) [application/octet-stream]
Saving to: ‘samtools-1.9.tar.bz2’
samtools-1.9.tar.bz2 100%[===========================================================>] 4.23M 2.02MB/s in 2.1s
2019-05-11 10:14:37 (2.02 MB/s) - ‘samtools-1.9.tar.bz2’ saved [4440405/4440405]
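# Extract the archive
vip52@VM-0-15-ubuntu:~/test/samtools$ tar jvxf samtools-1.9.tar.bz2
vip52@VM-0-15-ubuntu:~/test/samtools$ cd samtools-1.9 # enter the extracted source directory, where the configure script lives
vip52@VM-0-15-ubuntu:~/test/samtools/samtools-1.9$ ./configure # check the system and generate the build configuration
vip52@VM-0-15-ubuntu:~/test/samtools/samtools-1.9$ make # compile
vip52@VM-0-15-ubuntu:~/test/samtools/samtools-1.9$ make prefix=~/biosoft/samtools-1.9 install # install; ~/biosoft/samtools-1.9 is my own install location, change it as needed
vip52@VM-0-15-ubuntu:~/test/samtools/samtools-1.9$ echo 'export PATH=$PATH:~/biosoft/samtools-1.9/bin' >> ~/.bashrc # add the bin directory to the PATH environment variable
vip52@VM-0-15-ubuntu:~/test/samtools/samtools-1.9$ source ~/.bashrc # reload the file so the change takes effect immediately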
Key points: tar usage and installing software from source
samtools can also be installed with conda.
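The PATH step is what lets the shell find a freshly installed program by name. The sketch below shows the mechanism with a stand-in script; the name hello_biosoft and the scratch directory are hypothetical and have nothing to do with the samtools build itself.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/bin"

# A stand-in executable (hypothetical; not samtools).
printf '#!/bin/sh\necho "hello_biosoft 0.1"\n' > "$demo/bin/hello_biosoft"
chmod +x "$demo/bin/hello_biosoft"

# Appending the bin directory to PATH lets the shell find the program by name.
PATH="$PATH:$demo/bin"
export PATH

# command -v reports which file the shell would run for that name.
found_at=$(command -v hello_biosoft)
echo "resolved to: $found_at"
```

After a real installation, running command -v samtools in the same way is a quick check that the export line took effect.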
12. Open the files with the .bam suffix and find the command that generated them. As a hint, the command is:
/home/jianmingzeng/biosoft/bowtie/bowtie2-2.2.9/bowtie2-align-s --wrapper basic-0 -p 20 -x /home/jianmingzeng/reference/index/bowtie/hg38 -S /home/jianmingzeng/data/public/allMouse/alignment/WT_rep2_Input.sam -U /tmp/41440.unp
# Find the BAM files under rmDuplicate/
vip52@VM-0-15-ubuntu:~/test$ find rmDuplicate/ -name '*.bam'
rmDuplicate/picard/paired/tmp.rmdup.bam
rmDuplicate/picard/paired/tmp.sorted.bam
rmDuplicate/picard/single/tmp.rmdup.bam
rmDuplicate/picard/single/tmp.sorted.bam
rmDuplicate/samtools/paired/tmp.rmdup.bam
rmDuplicate/samtools/paired/tmp.sorted.bam
rmDuplicate/samtools/single/tmp.rmdup.bam
rmDuplicate/samtools/single/tmp.sorted.bam
# Going into each folder and searching file by file with cat + grep is inconvenient, so first look up how find and grep are used.
# Problem: find files containing given content across multiple directories. Solution: let grep search recursively through the nested directories.
grep "bowtie2" rmDuplicate/ -r -n
rmDuplicate/picard/single/tmp.header:457:@PG ID:bowtie2 PN:bowtie2 VN:2.2.9 CL:"/home/jianmingzeng/biosoft/bowtie/bowtie2-2.2.9/bowtie2-align-s --wrapper basic-0 -p 20 -x /home/jianmingzeng/reference/index/bowtie/hg38 -S /home/jianmingzeng/data/public/allMouse/alignment/WT_rep2_Input.sam -U /tmp/41440.unp"
rmDuplicate/samtools/single/tmp.header:457:@PG ID:bowtie2 PN:bowtie2 VN:2.2.9 CL:"/home/jianmingzeng/biosoft/bowtie/bowtie2-2.2.9/bowtie2-align-s --wrapper basic-0 -p 20 -x /home/jianmingzeng/reference/index/bowtie/hg38 -S /home/jianmingzeng/data/public/allMouse/alignment/WT_rep2_Input.sam -U /tmp/41440.unp"
# This search used just "bowtie2" in place of the full command line; searching for the full command is recommended.
Key point: recursive search with grep
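Recursive search is easy to try out on a small directory tree. The sketch below builds one with invented file contents (the real tmp.header files are much longer) and lists only the files that match.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/picard/single" "$demo/samtools/single"

# Hypothetical header files standing in for tmp.header.
printf '@HD\tVN:1.0\n@PG\tID:bowtie2\tPN:bowtie2\n' > "$demo/picard/single/tmp.header"
printf '@HD\tVN:1.0\n@PG\tID:bowtie2\tPN:bowtie2\n' > "$demo/samtools/single/tmp.header"
printf 'no aligner mentioned here\n' > "$demo/samtools/single/readme.txt"

# -r descends into subdirectories, -l prints only the names of matching files,
# and -F treats the pattern as a fixed string rather than a regular expression.
hits=$(grep -rlF 'ID:bowtie2' "$demo" | sort)
echo "$hits"
```

Using -F matters once the pattern is a whole command line, since characters such as dots and slashes would otherwise be read as regular-expression syntax.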
13. Based on the command above, find out exactly how many chromosomes are in the reference genome I used, /home/jianmingzeng/reference/index/bowtie/hg38.
vip52@VM-0-15-ubuntu:~$ cd test
vip52@VM-0-15-ubuntu:~/test$ samtools view -H ~/test/rmDuplicate/samtools/single/tmp.sorted.bam |awk '{print $2}'|cut -c4-9|sort -n|uniq -c|grep -v '_'
1 bowtie
1 chr1
1 chr10
1 chr11
1 chr12
1 chr13
1 chr14
1 chr15
1 chr16
1 chr17
1 chr18
1 chr19
1 chr2
1 chr20
1 chr21
1 chr22
1 chr3
1 chr4
1 chr5
1 chr6
1 chr7
1 chr8
1 chr9
1 chrM
1 chrX
1 chrY
1 1.0
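Key points: awk, cut, sort, uniq, grep. In the output above, the bowtie and 1.0 rows come from the @PG and @HD header lines rather than from sequences, which leaves 25 entries: chr1 to chr22, chrX, chrY, and chrM.
awk, cut, sort, and grep are among the most important and most difficult Linux commands.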
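One way to keep non-sequence header lines out of the count is to select the @SQ lines explicitly. The sketch below works on a five-line header with invented names and lengths, standing in for the output of samtools view -H.

```shell
# A tiny stand-in for a BAM header (hypothetical values).
header=$(printf '@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:900\n@SQ\tSN:chr1_KI270706v1_random\tLN:50\n@PG\tID:bowtie2\tPN:bowtie2\n')

# Keep only @SQ lines, strip the SN: prefix, and drop names containing "_",
# so the @HD and @PG lines never reach the count.
chrom_count=$(printf '%s\n' "$header" |
    awk -F'\t' '$1 == "@SQ" { sub(/^SN:/, "", $2); if ($2 !~ /_/) n++ } END { print n + 0 }')
echo "primary sequences: $chrom_count"
```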
14. The second column of the BAM file above contains only two values, 0 and 16. Use commands such as cut, sort, and uniq to count how many times each occurs.
vip52@VM-0-15-ubuntu:~/test/rmDuplicate/samtools/single$ samtools view tmp.sorted.bam |cut -f2|sort |uniq -c
29 0
24 16
15. Now open the BAM file under rmDuplicate/samtools/paired, look at the second column again, and count the values.
vip52@VM-0-15-ubuntu:~/test/rmDuplicate/picard/single$ samtools view ~/test/rmDuplicate/samtools/single/tmp.sorted.bam |cut -f2|sort|uniq -c
29 0
24 16
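Note: the command above still reads samtools/single/tmp.sorted.bam, so its output repeats question 14; to answer this question, pass ~/test/rmDuplicate/samtools/paired/tmp.sorted.bam instead.
Key points: cut, sort, uniq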
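The values 0 and 16 in the second column are SAM FLAG bit fields. The SAM specification defines bit 0x10 (decimal 16) as "read reverse strand", so a small helper can translate a flag into a strand. The function name strand_of is made up for this sketch, and it looks only at that one bit.

```shell
# Report the strand for a SAM FLAG value by testing bit 0x10 (decimal 16).
# Other bits, such as 0x4 for unmapped reads, are ignored here.
strand_of() {
    if [ $(( $1 & 16 )) -ne 0 ]; then
        echo "reverse"
    else
        echo "forward"
    fi
}

echo "flag 0  -> $(strand_of 0)"
echo "flag 16 -> $(strand_of 16)"
```

Paired-end files carry larger flag values because more bits are set, but the same bit test still applies.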
16. Download http://www.biotrainee.com/jmzeng/sickle/sickle-results.zip, unzip it, and inspect the folder structure inside. The file is 2.3M, so pay attention to the download time and speed.
vip52@VM-0-15-ubuntu:~/test$ wget http://www.biotrainee.com/jmzeng/sickle/sickle-results.zip
vip52@VM-0-15-ubuntu:~/test$ unzip sickle-results.zip
vip52@VM-0-15-ubuntu:~/test$ cd sickle-results/
vip52@VM-0-15-ubuntu:~/test/sickle-results$ tree
.
├── command.txt
├── single_tmp_fastqc.html
├── single_tmp_fastqc.zip
├── test1_fastqc.html
├── test1_fastqc.zip
├── test2_fastqc.html
├── test2_fastqc.zip
├── trimmed_output_file1_fastqc.html
├── trimmed_output_file1_fastqc.zip
├── trimmed_output_file2_fastqc.html
└── trimmed_output_file2_fastqc.zip
0 directories, 11 files
Key points: unzip, tree
17. Unzip sickle-results/single_tmp_fastqc.zip, go into the unzipped folder, find fastqc_data.txt, and count how many lines in that text file start with >>.
vip52@VM-0-15-ubuntu:~/test/sickle-results/single_tmp_fastqc$ cat fastqc_data.txt | grep -c "^>>" | wc
1 1 3
vip52@VM-0-15-ubuntu:~/test/sickle-results/single_tmp_fastqc$ cat fastqc_data.txt | grep -c "^>>" | wc -l
1
vip52@VM-0-15-ubuntu:~/test/sickle-results/single_tmp_fastqc$ cat fastqc_data.txt | grep "^>>" | wc -l
24
vip52@VM-0-15-ubuntu:~/test/sickle-results/single_tmp_fastqc$ cat fastqc_data.txt | awk '/^>>/{print $0}'|wc -l
24
vip52@VM-0-15-ubuntu:~/test/sickle-results/single_tmp_fastqc$ less -S fastqc_data.txt | grep "^>>" |wc -l
24
vip52@VM-0-15-ubuntu:~/test/sickle-results/single_tmp_fastqc$ more fastqc_data.txt |grep "^>>" |wc -l
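Key points: cat / less / more; awk / grep. grep -c already prints the number of matching lines, so piping it into wc (the first two commands) only counts grep's single line of output; grep -c "^>>" fastqc_data.txt on its own gives the answer, 24.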
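The difference between the first two attempts and the later ones can be reproduced on a tiny file. The four lines below are a made-up stand-in for fastqc_data.txt.

```shell
demo=$(mktemp -d)
# A four-line stand-in for fastqc_data.txt (hypothetical content).
printf '>>Basic Statistics\tpass\n#Measure\tValue\n>>END_MODULE\nplain line\n' > "$demo/sample.txt"

# grep -c prints the number of matching lines by itself.
count_direct=$(grep -c '^>>' "$demo/sample.txt")

# Piping grep -c into wc -l counts grep's single line of output instead.
count_wrong=$(grep -c '^>>' "$demo/sample.txt" | wc -l | tr -d ' ')

echo "grep -c: $count_direct, grep -c | wc -l: $count_wrong"
```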
18. Download http://www.biotrainee.com/jmzeng/tmp/hg38.tss, look up on NCBI the RefSeq database IDs of TP53, BRCA1, or other genes you are interested in, then find which line of hg38.tss each one is on.
https://www.ncbi.nlm.nih.gov/gene/7157
vip52@VM-0-15-ubuntu:~/test$ wget http://www.biotrainee.com/jmzeng/tmp/hg38.tss
--2019-05-11 19:45:29-- http://www.biotrainee.com/jmzeng/tmp/hg38.tss
Resolving www.biotrainee.com (www.biotrainee.com)... 123.206.72.184
Connecting to www.biotrainee.com (www.biotrainee.com)|123.206.72.184|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2625188 (2.5M)
Saving to: ‘hg38.tss’
hg38.tss 100%[===========================================================>] 2.50M 271KB/s in 25s
2019-05-11 19:45:55 (102 KB/s) - ‘hg38.tss’ saved [2625188/2625188]
vip52@VM-0-15-ubuntu:~/test$ ls -lh
total 5.0M
-rw-rw-r-- 1 vip52 vip52 2.6M Jan 11 2017 hg38.tss
drwxrwxr-x 4 vip52 vip52 4.0K Nov 12 2016 rmDuplicate
-rw-rw-r-- 1 vip52 vip52 103K Nov 12 2016 rmDuplicate.zip
drwxrwxr-x 3 vip52 vip52 4.0K May 11 16:47 samtools
drwxrwxr-x 3 vip52 vip52 4.0K May 11 19:37 sickle-results
-rw-rw-r-- 1 vip52 vip52 2.3M Oct 6 2016 sickle-results.zip
-rw-rw-r-- 1 vip52 vip52 3.1K May 18 2017 test.bed
vip52@VM-0-15-ubuntu:~/test$ cat hg38.tss | grep -n 'NM_001126113'
29346:NM_001126113 chr17 7685550 7689550 1
Key points: cat, grep
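grep matches substrings, so a short ID can also hit longer IDs that contain it. As a possible alternative, awk can compare the first column exactly. In the sketch below the second row is copied from the grep output above and the other two rows are invented.

```shell
demo=$(mktemp -d)
# Stand-in rows in the five-column layout seen above; rows 1 and 3 are made up.
printf 'NM_000001\tchr1\t100\t4100\t1\nNM_001126113\tchr17\t7685550\t7689550\t1\nNM_0011261130\tchr2\t5\t9\t1\n' > "$demo/sample.tss"

# Comparing column 1 exactly skips the longer ID on row 3,
# which a plain substring grep would also report.
line_no=$(awk -F'\t' '$1 == "NM_001126113" { print NR }' "$demo/sample.tss")
echo "found on line $line_no"
```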
19. Parse hg38.tss and count the number of genes on each chromosome.
vip52@VM-0-15-ubuntu:~/test$ cat hg38.tss|awk '{print $2}' |sort |uniq -c|grep -v '_'
6050 chr1
2824 chr10
3449 chr11
2931 chr12
1122 chr13
1883 chr14
2168 chr15
2507 chr16
3309 chr17
873 chr18
3817 chr19
4042 chr2
1676 chr20
868 chr21
1274 chr22
3277 chr3
2250 chr4
2684 chr5
3029 chr6
2720 chr7
2069 chr8
2301 chr9
2 chrM
2553 chrX
414 chrY
# This command also works
vip52@VM-0-15-ubuntu:~/test$ cat hg38.tss |cut -f2|sort|uniq -c | grep -v '_'
6050 chr1
2824 chr10
3449 chr11
2931 chr12
1122 chr13
1883 chr14
2168 chr15
2507 chr16
3309 chr17
873 chr18
3817 chr19
4042 chr2
1676 chr20
868 chr21
1274 chr22
3277 chr3
2250 chr4
2684 chr5
3029 chr6
2720 chr7
2069 chr8
2301 chr9
2 chrM
2553 chrX
414 chrY
Key points: cut, sort, grep, awk, uniq
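The same tally can be done in one awk pass, which also makes it easy to order the result by count rather than by name. The four-row input below is invented for illustration.

```shell
demo=$(mktemp -d)
# Stand-in rows with an ID and a chromosome (hypothetical content).
printf 'NM_1\tchr1\nNM_2\tchr2\nNR_3\tchr1\nNM_4\tchr1_alt\n' > "$demo/sample.tss"

# Tally column 2 in an awk array, skipping names that contain "_",
# then sort by count, largest first.
counts=$(awk -F'\t' '$2 !~ /_/ { n[$2]++ } END { for (c in n) print n[c], c }' "$demo/sample.tss" | sort -k1,1nr)
echo "$counts"
```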
20. Parse hg38.tss, count the sequences whose IDs start with NM and NR, and learn what the NM and NR prefixes mean.
vip52@VM-0-15-ubuntu:~/test$ cat hg38.tss |awk '{print$1}'|cut -c1-2|sort|uniq -c
51064 NM
15954 NR
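Key points: cat, grep, awk, cut, sort, uniq. In RefSeq, NM_ accessions are protein-coding mRNA transcripts and NR_ accessions are non-coding RNA transcripts.
Original question link: http://www.bio-info-trainee.com/2900.html
生信技能樹 (jimmy's Bilibili space): https://space.bilibili.com/338686099/#/
生信技能樹: free self-study bioinformatics video courses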