生信人的linux考試20題
一劣光、 在任意文件夾下面創(chuàng)建形如 1/2/3/4/5/6/7/8/9 格式的文件夾系列
vip39@VM-0-15-ubuntu:~/test$ mkdir -p 1/2/3/4/5/6/7/8/9
vip39@VM-0-15-ubuntu:~/test$ ls
1
vip39@VM-0-15-ubuntu:~/test$ tree
.
└── 1
└── 2
└── 3
└── 4
└── 5
└── 6
└── 7
└── 8
└── 9
9 directories, 0 files
- 這個(gè)題目考的
mkdir -p
的用法,詳見Linux Day1: cd/pwd/mkdir/rmdir
二、在創(chuàng)建好的文件夾下面褐墅,比如我的是 /Users/jimmy/tmp/1/2/3/4/5/6/7/8/9 ,里面創(chuàng)建文本文件 me.txt
三、在文本文件 me.txt 里面輸入內(nèi)容:
vip39@VM-0-15-ubuntu:~/test$ cd ./1/2/3/4/5/6/7/8/9/
vip39@VM-0-15-ubuntu:~/test/1/2/3/4/5/6/7/8/9$ touch me.txt
vip39@VM-0-15-ubuntu:~/test/1/2/3/4/5/6/7/8/9$ ls
me.txt
vip39@VM-0-15-ubuntu:~/test/1/2/3/4/5/6/7/8/9$ vim me.txt
vip39@VM-0-15-ubuntu:~/test/1/2/3/4/5/6/7/8/9$ cat me.txt
Go to: http://www.biotrainee.com/
I love bioinfomatics.
And you ?
-
cd ./1/2/3/4/5/6/7/8/9/
善用Tab鍵補(bǔ)全 -
touch
、vi
/vim
、cat
的使用,詳見:
Linux Day4: more/less/touch
Linux Day20:Vim
Linux Day3: cat/nl/head/tail - 這個(gè)題也可以使用:
vip39@VM-0-15-ubuntu:~/test$ cat > me.txt
Go to: http://www.biotrainee.com/
I love bioinfomatics.
And you ?
^C
vip39@VM-0-15-ubuntu:~/test$ cat me.txt
Go to: http://www.biotrainee.com/
I love bioinfomatics.
And you ?
-
>
的用法可以參考:Liunx Day15:管道和重定向
四床估、刪除上面創(chuàng)建的文件夾 1/2/3/4/5/6/7/8/9
及文本文件 me.txt
vip39@VM-0-15-ubuntu:~$ cd test/
vip39@VM-0-15-ubuntu:~/test$ ls
1
vip39@VM-0-15-ubuntu:~/test$ rm -r 1/
vip39@VM-0-15-ubuntu:~/test$ ls
vip39@VM-0-15-ubuntu:~/test$ tree
.
0 directories, 0 files
-
rm
的用法,詳見:Linux Day2: ls/cp/rm/mv
五诱渤、在任意文件夾下面創(chuàng)建 folder1~5這5個(gè)文件夾丐巫,然后每個(gè)文件夾下面繼續(xù)創(chuàng)建 folder1~5這5個(gè)文件夾,效果如下:
vip39@VM-0-15-ubuntu:~/test$ mkdir -p folder{1..5}/folder{1..5}
vip39@VM-0-15-ubuntu:~/test$ ls *
folder1:
folder1 folder2 folder3 folder4 folder5
folder2:
folder1 folder2 folder3 folder4 folder5
folder3:
folder1 folder2 folder3 folder4 folder5
folder4:
folder1 folder2 folder3 folder4 folder5
folder5:
folder1 folder2 folder3 folder4 folder5
六勺美、在第五題創(chuàng)建的每一個(gè)文件夾下面都 創(chuàng)建第二題文本文件 me.txt 递胧,內(nèi)容也要一樣。
vip39@VM-0-15-ubuntu:~/test$ echo folder{1..5}/folder{1..5}|xargs -n 1 cp me.txt
vip39@VM-0-15-ubuntu:~/test$ tree
.
|-- folder1
| |-- folder1
| | `-- me.txt
| |-- folder2
| | `-- me.txt
| |-- folder3
| | `-- me.txt
| |-- folder4
| | `-- me.txt
| `-- folder5
| `-- me.txt
|-- folder2
| |-- folder1
| | `-- me.txt
| |-- folder2
| | `-- me.txt
| |-- folder3
| | `-- me.txt
| |-- folder4
| | `-- me.txt
| `-- folder5
| `-- me.txt
|-- folder3
| |-- folder1
| | `-- me.txt
| |-- folder2
| | `-- me.txt
| |-- folder3
| | `-- me.txt
| |-- folder4
| | `-- me.txt
| `-- folder5
| `-- me.txt
|-- folder4
| |-- folder1
| | `-- me.txt
| |-- folder2
| | `-- me.txt
| |-- folder3
| | `-- me.txt
| |-- folder4
| | `-- me.txt
| `-- folder5
| `-- me.txt
|-- folder5
| |-- folder1
| | `-- me.txt
| |-- folder2
| | `-- me.txt
| |-- folder3
| | `-- me.txt
| |-- folder4
| | `-- me.txt
| `-- folder5
| `-- me.txt
`-- me.txt
30 directories, 26 files
-
xargs
的用法xargs命令
七赡茸,再次刪除掉前面幾個(gè)步驟建立的文件夾及文件
vip39@VM-0-15-ubuntu:~/test$ ls
folder1 folder2 folder3 folder4 folder5 me.txt
vip39@VM-0-15-ubuntu:~/test$ rm -rf folder*
vip39@VM-0-15-ubuntu:~/test$ ls
me.txt
vip39@VM-0-15-ubuntu:~/test$ rm me.txt
vip39@VM-0-15-ubuntu:~/test$ ls
vip39@VM-0-15-ubuntu:~/test$
八缎脾、下載 http://www.biotrainee.com/jmzeng/igv/test.bed 文件,后在里面選擇含有 H3K4me3 的那一行是第幾行占卧,該文件總共有幾行遗菠。
vip39@VM-0-15-ubuntu:~/test$ wget -c http://www.biotrainee.com/jmzeng/igv/test.bed
--2018-12-11 20:14:50-- http://www.biotrainee.com/jmzeng/igv/test.bed
Resolving www.biotrainee.com (www.biotrainee.com)... 123.206.72.184
Connecting to www.biotrainee.com (www.biotrainee.com)|123.206.72.184|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3099 (3.0K)
Saving to: ‘test.bed’
test.bed 100%[=========================>] 3.03K --.-KB/s in 0s
2018-12-11 20:14:50 (480 MB/s) - ‘test.bed’ saved [3099/3099]
vip39@VM-0-15-ubuntu:~/test$ ls
test.bed
vip39@VM-0-15-ubuntu:~/test$ cat test.bed | grep -n H3K4me3
8:chr1 9810 10438 ID=SRX387603;Name=H3K4me3%20(@%20HMLE);Title=GSM1280527:%20HMLE%20Twist3D%20H3K4me3%20rep2%3B%20Homo%20sapiens%3B%20ChIP-Seq;Cell%20group=Breast;<br>source_name=HMLE_Twist3D_H3K4me3;cell%20type=human%20mammary%20epithelial%20cells;transfected%20with=Twist1;culture%20type=sphere;chip%20antibody=H3K4me3;chip%20antibody%20vendor=Millipore; 222 . 9810 10438 0,226,255
vip39@VM-0-15-ubuntu:~/test$ cat test.bed |wc
10 88 3099
-
grep
及wc
的用法可見:
Linux Day21:grep/sed/awk
wc的用法
九联喘、下載 http://www.biotrainee.com/jmzeng/rmDuplicate.zip 文件,并且解壓辙纬,查看里面的文件夾結(jié)構(gòu)
# 下載
vip39@VM-0-15-ubuntu:~/test$ wget http://www.biotrainee.com/jmzeng/rmDuplicate.zip
--2018-12-11 21:02:42-- http://www.biotrainee.com/jmzeng/rmDuplicate.zip
Resolving www.biotrainee.com (www.biotrainee.com)... 123.206.72.184
Connecting to www.biotrainee.com (www.biotrainee.com)|123.206.72.184|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 104931 (102K) [application/zip]
Saving to: ‘rmDuplicate.zip’
rmDuplicate.zip 100%[=========================>] 102.47K 523KB/s in 0.2s
2018-12-11 21:02:43 (523 KB/s) - ‘rmDuplicate.zip’ saved [104931/104931]
vip39@VM-0-15-ubuntu:~/test$ ls
rmDuplicate.zip test.bed
# 解壓
vip39@VM-0-15-ubuntu:~/test$ unzip rmDuplicate.zip
Archive: rmDuplicate.zip
creating: rmDuplicate/
creating: rmDuplicate/picard/
creating: rmDuplicate/picard/paired/
inflating: rmDuplicate/picard/paired/readme.txt
···
inflating: rmDuplicate/samtools/single/tmp.sorted.vcf.gz
# 查看文件結(jié)構(gòu)
vip39@VM-0-15-ubuntu:~/test$ cd rmDuplicate/
vip39@VM-0-15-ubuntu:~/test/rmDuplicate$ tree
.
├── picard
│ ├── paired
│ │ ├── readme.txt
│ │ ├── tmp.header
│ │ ├── tmp.MarkDuplicates.log
│ │ ├── tmp.metrics
│ │ ├── tmp.rmdup.bai
│ │ ├── tmp.rmdup.bam
│ │ ├── tmp.sam
│ │ └── tmp.sorted.bam
│ └── single
│ ├── readme.txt
│ ├── tmp.header
│ ├── tmp.MarkDuplicates.log
│ ├── tmp.metrics
│ ├── tmp.rmdup.bai
│ ├── tmp.rmdup.bam
│ ├── tmp.sam
│ └── tmp.sorted.bam
└── samtools
├── paired
│ ├── readme.txt
│ ├── tmp.header
│ ├── tmp.rmdup.bam
│ ├── tmp.rmdup.vcf.gz
│ ├── tmp.sam
│ ├── tmp.sorted.bam
│ └── tmp.sorted.vcf.gz
└── single
├── readme.txt
├── tmp.header
├── tmp.rmdup.bam
├── tmp.rmdup.vcf.gz
├── tmp.sam
├── tmp.sorted.bam
└── tmp.sorted.vcf.gz
6 directories, 30 files
- 解壓命令的使用Linux Day27:Linux 壓縮豁遭、解壓縮命令
十、打開第九題解壓的文件贺拣,進(jìn)入 rmDuplicate/samtools/single 文件夾里面蓖谢,查看后綴為 .sam 的文件,搞清楚 生物信息學(xué)里面的SAM/BAM 定義是什么
- 詳細(xì)可看淺談SAM格式
十一纵柿、安裝 samtools 軟件
vip39@VM-0-15-ubuntu:~/src$ source ~/miniconda3/bin/activate
(base) vip39@VM-0-15-ubuntu:~/src$ conda install samtools=1.8 y
# 檢查一下能否運(yùn)行
(base) vip39@VM-0-15-ubuntu:~/src$ samtools
Program: samtools (Tools for alignments in the SAM format)
Version: 1.7 (using htslib 1.7)
Usage: samtools <command> [options]
- conda的安裝及使用:Conda的安裝
十二蜈抓、打開后綴為BAM 的文件,找到產(chǎn)生該文件的命令昂儒。 提示一下命令是:
十三題、根據(jù)上面的命令委可,找到我使用的參考基因組 /home/jianmingzeng/reference/index/bowtie/hg38 具體有多少條染色體
(base) vip39@VM-0-15-ubuntu:~$ samtools view -H ~/test/rmDuplicate/samtools/single/tmp.sorted.bam |awk '{print $2}'|cut -c4-9|sort -n|uniq -c|grep -v '_'
1 bowtie
1 chr1
1 chr10
1 chr11
1 chr12
1 chr13
1 chr14
1 chr15
1 chr16
1 chr17
1 chr18
1 chr19
1 chr2
1 chr20
1 chr21
1 chr22
1 chr3
1 chr4
1 chr5
1 chr6
1 chr7
1 chr8
1 chr9
1 chrM
1 chrX
1 chrY
1 1.0
(base) vip39@VM-0-15-ubuntu:~$ samtools view -H ~/test/rmDuplicate/samtools/single/tmp.sorted.bam |awk '{print $2}'|cut -c4-9|sort -n|uniq -c|grep -v '_'|wc
27 54 365
# 不算前兩個(gè)渊跋,應(yīng)該是25條
十四題、上面的后綴為BAM 的文件的第二列着倾,只有 0 和 16 兩個(gè)數(shù)字拾酝,用 cut/sort/uniq等命令統(tǒng)計(jì)它們的個(gè)數(shù)。
(base) vip39@VM-0-15-ubuntu:~$ samtools view ~/test/rmDuplicate/samtools/single/tmp.sorted.bam |cut -f2|sort|uniq -c
29 0
24 16
十五題卡者、重新打開 rmDuplicate/samtools/paired 文件夾下面的后綴為BAM 的文件蒿囤,再次查看第二列,并且統(tǒng)計(jì)
(base) vip39@VM-0-15-ubuntu:~/test/rmDuplicate/samtools/paired$ samtools view tmp.sorted.bam | cut -f2|sort -n |uniq -c
3 83
2 97
9 99
8 147
3 163
1 323
1 353
1 371
1 387
1 433
十六題崇决、下載 http://www.biotrainee.com/jmzeng/sickle/sickle-results.zip 文件材诽,并且解壓,查看里面的文件夾結(jié)構(gòu)恒傻, 這個(gè)文件有2.3M脸侥,注意留心下載時(shí)間及下載速度。
(base) vip39@VM-0-15-ubuntu:~/test$ cd sickle-results/
(base) vip39@VM-0-15-ubuntu:~/test/sickle-results$ tree
.
├── command.txt
├── single_tmp_fastqc.html
├── single_tmp_fastqc.zip
├── test1_fastqc.html
├── test1_fastqc.zip
├── test2_fastqc.html
├── test2_fastqc.zip
├── trimmed_output_file1_fastqc.html
├── trimmed_output_file1_fastqc.zip
├── trimmed_output_file2_fastqc.html
└── trimmed_output_file2_fastqc.zip
十七題盈厘、解壓 sickle-results/single_tmp_fastqc.zip 文件睁枕,并且進(jìn)入解壓后的文件夾,找到 fastqc_data.txt 文件沸手,并且搜索該文本文件以 >>開頭的有多少行外遇?
(base) vip39@VM-0-15-ubuntu:~/test/sickle-results/single_tmp_fastqc$ cat fastqc_data.txt | grep '^>>'|wc -l
24
# 也可以使用:
(base) vip39@VM-0-15-ubuntu:~/test/sickle-results/single_tmp_fastqc$ cat fastqc_data.txt | awk '/^>>/{print $0}'|wc -l
24
十八題、下載 http://www.biotrainee.com/jmzeng/tmp/hg38.tss 文件契吉,去NCBI找到TP53/BRCA1等自己感興趣的基因?qū)?yīng)的 refseq數(shù)據(jù)庫(kù) ID跳仿,然后找到它們的hg38.tss 文件的哪一行。
(base) vip39@VM-0-15-ubuntu:~/test$ cat hg38.tss | grep -n "NM_001126113"
29346:NM_001126113 chr17 7685550 7689550 1
十九題栅隐、解析hg38.tss 文件塔嬉,統(tǒng)計(jì)每條染色體的基因個(gè)數(shù)玩徊。
(base) vip39@VM-0-15-ubuntu:~/test$ cat hg38.tss |cut -f2|sort|uniq -c|grep -v '_'
6050 chr1
2824 chr10
3449 chr11
2931 chr12
1122 chr13
1883 chr14
2168 chr15
2507 chr16
3309 chr17
873 chr18
3817 chr19
4042 chr2
1676 chr20
868 chr21
1274 chr22
3277 chr3
2250 chr4
2684 chr5
3029 chr6
2720 chr7
2069 chr8
2301 chr9
2 chrM
2553 chrX
414 chrY
二十題、解析hg38.tss 文件谨究,統(tǒng)計(jì)NM和NR開頭的熟練恩袱,了解NM和NR開頭的含義。
(base) vip39@VM-0-15-ubuntu:~/test$ cat hg38.tss |awk '{print$1}'|cut -c1-2|sort|uniq -c
51064 NM
15954 NR
生信文件格式fastqc
資料推薦
- 生信菜鳥團(tuán)的淺談FastQ和FastA格式,以及測(cè)序數(shù)據(jù)質(zhì)量控制之FastQC
- 生信技能書論壇的blat簡(jiǎn)介與格式解讀
- 還有視頻講解:英語(yǔ)視頻胶哲;中文視頻
- 生信技能書系列視頻:https://www.bilibili.com/video/av28813815/?p=12
fasta和fastq格式文件的shell小練習(xí)
1)統(tǒng)計(jì)reads_1.fq 文件中共有多少條序列信息
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ ls
longreads.fq reads_12.fq reads_1.fq reads_2.fq simulate.pl
# 第一種:
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat reads_1.fq
...
@r10000
GGTGATGCGCGGCTCCGTGCCGCCAAAGCCGTCCGGCACTGACTNGTCGCAG
+
E<**G2F;';H$%9>*0,;0%---<*9-4B7(5A!4C.C,<".5**$<6,:"
# 第二種:
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat reads_1.fq | wc
40000 40000 228569
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat reads_1.fq | paste - - - - | wc
10000 40000 2285692
# 我也試過這種:
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat reads_1.fq |grep '^@' | wc
10219 10219 93042
# 嗯畔塔,還不知道問題出在哪里
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat reads_1.fq |grep '^@'
···
@r9966
@(9=@B*;&G<4/F#51*>@B3&0H03@.90-"BHH.#7'*74/.?(&&145G'89#*?:?(!"+8@G02*6B<,#+CE9+?-&67*=1/&4A$:G<:;4965D;;)/B=*?B;'6F//1A#"%7+.1D@=/?93B:A3>.<D%69:/G'6),E4(F(41;'"3C)'?BEC;8$H7A?!5D%3D;-.B'%9>/88>9DEA"H8C6#4"5*63=
@r9967
···
# 嗯,然后及發(fā)現(xiàn)一些奇妙的東西
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat reads_1.fq | paste - - - - | less -SN
2)輸出所有的reads_1.fq文件中的標(biāo)識(shí)符(即以@開頭的那一行)
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat reads_1.fq | paste - - - - | cut -f1
# 或者使用awk
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ awk '{if(NR%4==1)print}' reads_1.fq
3) 輸出reads_1.fq文件中的 所有序列信息(即每個(gè)序列的第二行)
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat reads_1.fq | paste - - - - | cut -f2
4)輸出以‘+’及其后面的描述信息(即每個(gè)序列的第三行)
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat reads_1.fq | paste - - - - | cut -f3
5)輸出質(zhì)量值信息(即每個(gè)序列的第四行)
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat reads_1.fq | paste - - - - | cut -f4
# -c 計(jì)算符合范本樣式的列數(shù)鸯屿。
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ awk '{if(NR%4==2)print}' reads_1.fq | grep -c N
6429
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ awk '{if(NR%4==2)print}' reads_1.fq | grep N |wc
6429 6429 782897
7) 統(tǒng)計(jì)文件中reads_1.fq文件里面的序列的堿基總數(shù)
# -o 只輸出文件中匹配到的部分澈吨。
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ awk '{if(NR%4==2)print}' reads_1.fq | grep -o [ATCGN]|wc
1088399 1088399 2176798
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ awk '{if(NR%4==2)print length}' reads_1.fq | paste -s -d + |bc
8)計(jì)算reads_1.fq 所有的reads中N堿基的總數(shù)
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ awk '{if(NR%4==2)print}' reads_1.fq | grep -o N |wc
26001 26001 52002
9)統(tǒng)計(jì)reads_1.fq 中測(cè)序堿基質(zhì)量值恰好為Q20的個(gè)數(shù)
-
目前Illumina機(jī)器得到的基本是illumina 1.8方案。
堿基質(zhì)量與對(duì)應(yīng)的ASCII字符
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ awk '{if(NR%4==0)print}' reads_1.fq | grep -o 5 |wc
21369 21369 42738
10)統(tǒng)計(jì)reads_1.fq 中測(cè)序堿基質(zhì)量值恰好為Q30的個(gè)數(shù)
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ awk '{if(NR%4==0)print}' reads_1.fq | grep -o ? |wc
21574 21574 43148
11)統(tǒng)計(jì)reads_1.fq 中所有序列的第一位堿基的ATCGNatcg分布情況
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ awk '{if(NR%4==2)print}' reads_1.fq | cut -c1 |sort|uniq -c
2184 A
2203 C
2219 G
1141 N
2253 T
12)將reads_1.fq 轉(zhuǎn)為reads_1.fa文件(即將fastq轉(zhuǎn)化為fasta)
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat reads_1.fq | paste - - - - | cut -f1,2|tr '\t' '\n'|tr '@' '>' > reads_1.fa
13) 統(tǒng)計(jì)上述reads_1.fa文件中共有多少條序列
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ wc reads_1.fa
20000 20000 1167293 reads_1.fa
14)計(jì)算reads_1.fa文件中總的堿基序列的GC數(shù)量
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat reads_1.fa |grep -o G|wc
264740 264740 529480
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat reads_1.fa |grep -o C|wc
265243 265243 530486
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat reads_1.fa |grep -o [GC]|wc
529983 529983 1059966
15)刪除 reads_1.fa文件中的每條序列的N堿基
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat reads_1.fa |tr -d "N"
16)刪除 reads_1.fa文件中的含有N堿基的序列
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat reads_1.fa |paste - -|grep -v N | wc
3571 7142 340122
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat reads_1.fa |paste - -|grep -v N | tr '\t' '\n'
17) 刪除 reads_1.fa文件中的短于65bp的序列
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat reads_1.fa |paste - -|awk '{if (length($2)>65) print}'|wc
7076 14152 992399
18) 刪除 reads_1.fa文件每條序列的前后五個(gè)堿基
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ head reads_1.fa|paste - - | cut -f2|cut -c5-
# 上面是前5個(gè)
生信格式SAM寄摆、BAM
資料推薦
- 生信菜鳥團(tuán)的淺談SAM格式
- 生信媛的高通量數(shù)據(jù)分析必須知道的山姆大叔(SAM)
- 生信技能樹系列視頻數(shù)據(jù)格式,以及SAM谅辣、BAM練習(xí)題講解
- 生信菜鳥團(tuán)的sam和bam格式文件的shell小練習(xí)
sam和bam格式文件的shell小練習(xí)
1) 統(tǒng)計(jì)共多少條reads(pair-end reads這里算一條)參與了比對(duì)參考基因組
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.sam | grep -v '^@'|wc
20000 391929 7049181
# 左右兩個(gè)序列算作一條,所以為10000
2) 統(tǒng)計(jì)共有多少種比對(duì)的類型(即第二列數(shù)值有多少種)及其分布婶恼。
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.sam | grep -v '^@' | cut -f2|sort|uniq -c|sort -k1,1nr
4650 163
4650 83
4516 147
4516 99
213 141
213 77
165 137
165 69
153 133
153 73
136 165
136 89
125 101
125 153
24 129
24 65
16 113
16 177
2 161
2 81
- picard sam flag:https://broadinstitute.github.io/picard/explain-flags.html
picard sam flag
3)篩選出比對(duì)失敗的reads桑阶,看看序列特征。
# 第6列的* 代表為比對(duì)失敗
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.sam | grep -v '^@' |awk '{if($6=="*")print}'|wc
1005 12608 255140
4) 比對(duì)失敗的reads區(qū)分成單端失敗和雙端失敗情況勾邦,并且拿到序列ID
# 單端失敗
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.sam | grep -v '^@' |awk '{if($6=="*")print $1}'|sort|uniq -c|grep -w 1
# 雙端失敗
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.sam | grep -v '^@' |awk '{if($6=="*")print $1}'|sort|uniq -c|grep -w 2
5) 篩選出比對(duì)質(zhì)量值大于30的情況(看第5列)
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.sam | grep -v '^@' |awk '{if($5>30)print}'|wc
18632 372088 6662664
6) 篩選出比對(duì)成功蚣录,但是并不是完全匹配的序列
“M”表示 match或 mismatch;
“I”表示 insert
“D”表示 deletion
“N”表示 skipped(跳過這段區(qū)域)
“S”表示 soft clipping(被剪切的序列存在于序列中)
“H”表示 hard clipping(被剪切的序列不存在于序列中)
“P”表示 padding
“=”表示 match
“X”表示 mismatch(錯(cuò)配眷篇,位置是一一對(duì)應(yīng)的)
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.sam | grep -v '^@' |awk '{if($6!="*")print$6}'|grep "[IDNSHPX]"|wc
1900 1900 18522
7) 篩選出inset size長(zhǎng)度大于1250bp的 pair-end reads
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.sam | grep -v '^@' |awk '{if($7>1250)print}'|less -S
8) 統(tǒng)計(jì)參考基因組上面各條染色體的成功比對(duì)reads數(shù)量
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.sam | grep -v '^@' |cut -f3|sort -u
*
gi|9626243|ref|NC_001416.1|
9) 篩選出原始fq序列里面有N的比對(duì)情況
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.sam | grep -v '^@'|awk '{if($10~N)print}'|wc
20000 391929 7049181
10) 篩選出原始fq序列里面有N萎河,但是比對(duì)的時(shí)候卻是完全匹配的情況
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.sam | grep -v '^@'|awk '{if($10 ~ N)print}'|awk '{if($6 !~ "[IDNSHP]")print}'|awk '{if($6!="*")print}'|less -S
11) sam文件里面的頭文件行數(shù)
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ grep '^@' tmp.sam|wc
3 19 262
12) sam文件里每一行的tags個(gè)數(shù)一樣嗎;13) sam文件里每一行的tags個(gè)數(shù)分別是多少個(gè)
14) sam文件里記錄的參考基因組染色體長(zhǎng)度分別是?
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ head tmp.sam | grep 'LN'
@SQ SN:gi|9626243|ref|NC_001416.1| LN:48502
15) 找到比對(duì)情況有insertion情況的
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.sam | grep -v '^@'|awk '{if($6~I)print}'|less -S
16) 找到比對(duì)情況有deletion情況的
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.sam | grep -v '^@'|awk '{if($6~D)print}'|less -S
17)取出位于參考基因組某區(qū)域的比對(duì)記錄蕉饼,比如 5013到50130 區(qū)域
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.sam | grep -v '^@'|awk '{if($4>5013 && $4 <50130)print}'|less -S
18) 把sam文件按照染色體以及起始坐標(biāo)排序
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.sam | grep -v '^@'|awk '{print $4}'|sort -n
# 還有點(diǎn)問題
19) 找到 102M3D11M 的比對(duì)情況虐杯,計(jì)算其reads片段長(zhǎng)度
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ grep 102M3D11M
tmp.sam |cut -f 10|wc
1 1 114
# 所以就114咯
20) 安裝samtools軟件后使用samtools軟件的各個(gè)功能嘗試把上述題目重新做一遍。
vip39@VM-0-15-ubuntu:~$ source miniconda3/bin/activate
(base) vip39@VM-0-15-ubuntu:~$ cd src/
(base) vip39@VM-0-15-ubuntu:~/src$ conda install samtools=1.7 y
生信格式VCF
資料推薦
- 生信菜鳥團(tuán)的vcf格式椎椰,略略略~~厦幅;還有我的基因組28-必須要理解vcf格式記錄的變異位點(diǎn)信息 ;以及轉(zhuǎn)載-VCF格式詳解.
- 生信技能樹系列視頻,詳見B站;
- 生信星球的VCF格式
VCF格式文件的shell小練習(xí)
1.把突變記錄的vcf文件區(qū)分成 INDEL和SNP條目
# SNP:
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.vcf | grep -v '^#'|less -S|awk '{if (length($4)==1 && length($5)==1) print}'
gi|9626243|ref|NC_001416.1| 1104 . C A 225 . DP=43;VDB=0.162843;SGB=-0.693079;MQSB=0.981133;MQ0F=0;AC=2;AN=2;DP4=0,0,8,21;MQ=41 GT:PL 1/1:255,84,0
gi|9626243|ref|NC_001416.1| 1344 . G T 225 . DP=37;VDB=0.273288;SGB=-0.690438;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,9,8;MQ=42 GT:PL 1/1:255,51,0
gi|9626243|ref|NC_001416.1| 2143 . C G 225 . DP=46;VDB=0.902087;SGB=-0.692831;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,10,14;MQ=42 GT:PL 1/1:255,72,0
gi|9626243|ref|NC_001416.1| 3316 . T C 225 . DP=59;VDB=0.712644;SGB=-0.69311;MQSB=0.899452;MQ0F=0;AC=2;AN=2;DP4=0,0,18,13;MQ=41 GT:PL 1/1:255,93,0
gi|9626243|ref|NC_001416.1| 3406 . G T 218 . DP=40;VDB=0.0470228;SGB=-0.69168;MQSB=0.920044;MQ0F=0;AC=2;AN=2;DP4=0,0,10,9;MQ=41 GT:PL 1/1:248,54,0
gi|9626243|ref|NC_001416.1| 5812 . A C 24.4299 . DP=13;VDB=0.0618664;SGB=-0.511536;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,2,1;MQ=30 GT:PL 1/1:54,9,0
gi|9626243|ref|NC_001416.1| 7089 . A C 208 . DP=26;VDB=0.135432;SGB=-0.688148;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,3,12;MQ=42 GT:PL 1/1:238,45,0
gi|9626243|ref|NC_001416.1| 9632 . T A 97 . DP=17;VDB=0.677364;SGB=-0.636426;MQSB=1.01283;MQ0F=0;AC=2;AN=2;DP4=0,0,3,4;MQ=42 GT:PL 1/1:154,44,29
gi|9626243|ref|NC_001416.1| 9642 . T A 132 . DP=20;VDB=0.959419;SGB=-0.670168;MQSB=1.00775;MQ0F=0;AC=2;AN=2;DP4=0,0,4,6;MQ=42 GT:PL 1/1:162,30,0
gi|9626243|ref|NC_001416.1| 12512 . G A 225 . DP=46;VDB=0.828249;SGB=-0.692562;MQSB=0.261423;MQ0F=0;AC=2;AN=2;DP4=0,0,14,8;MQ=41 GT:PL 1/1:255,66,0
gi|9626243|ref|NC_001416.1| 14897 . G C 225 . DP=38;VDB=0.449535;SGB=-0.689466;MQSB=0.976745;MQ0F=0;AC=2;AN=2;DP4=0,0,10,6;MQ=41 GT:PL 1/1:255,48,0
...
# indel:
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.vcf | grep -v '^#'|less -S|awk '{if (length($4)!=1 || length($5)!=1) print}'
gi|9626243|ref|NC_001416.1| 2 . GGCG GGCGCGGGGGCG 9.81282 . INDEL;IDV=1;IMF=0.5;DP=2;VDB=0.02;SGB=-0.379885;MQ0F=0;AC=2;AN=2;DP4=1,0,1,0;MQ=33 GT:PL 1/1:36,1,0
gi|9626243|ref|NC_001416.1| 245 . ATT AT 157 . INDEL;IDV=33;IMF=0.942857;DP=35;VDB=0.518706;SGB=-0.692562;MQSB=0.121547;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=6,7,13,9;MQ=36 GT:PL 0/1:192,0,23
gi|9626243|ref|NC_001416.1| 351 . ATGCTGAAATT A 108 . INDEL;IDV=1;IMF=0.0285714;DP=35;VDB=0.511365;SGB=-0.688148;MQSB=0.406871;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=8,12,8,7;MQ=28 GT:PL 0/1:141,0,110
gi|9626243|ref|NC_001416.1| 353 . GCTGAAATTGA G 215 . INDEL;IDV=24;IMF=0.685714;DP=35;VDB=0.58815;SGB=-0.692976;MQSB=0.711476;MQ0F=0;AC=2;AN=2;DP4=4,5,13,13;MQ=28 GT:PL 1/1:244,0,3
gi|9626243|ref|NC_001416.1| 2817 . GA G 175 . INDEL;IDV=41;IMF=0.911111;DP=45;VDB=0.367939;SGB=-0.693136;MQSB=0.45851;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=3,7,20,15;MQ=41 GT:PL 0/1:210,0,28
gi|9626243|ref|NC_001416.1| 2951 . ACCC A 159 . INDEL;IDV=1;IMF=0.0322581;DP=31;VDB=0.195299;SGB=-0.691153;MQSB=0.340099;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=5,8,9,9;MQ=38 GT:PL 0/1:193,0,16
gi|9626243|ref|NC_001416.1| 2952 . CCCACC CCC 228 . INDEL;IDV=28;IMF=0.903226;DP=31;VDB=0.210377;SGB=-0.692831;MQSB=0.340099;MQ0F=0;AC=2;AN=2;DP4=2,5,12,12;MQ=38 GT:PL 1/1:255,27,0
gi|9626243|ref|NC_001416.1| 3262 . GCC GC 152 . INDEL;IDV=1;IMF=0.0217391;DP=46;VDB=0.483903;SGB=-0.69311;MQSB=0.887966;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=7,8,13,18;MQ=41 GT:PL 0/1:187,0,17
gi|9626243|ref|NC_001416.1| 3264 . CA C 150 . INDEL;IDV=41;IMF=0.803922;DP=51;VDB=0.874867;SGB=-0.69312;MQSB=0.94394;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=10,9,14,18;MQ=41 GT:PL 0/1:184,0,26
gi|9626243|ref|NC_001416.1| 3634 . ACGC AC 228 . INDEL;IDV=25;IMF=0.892857;DP=28;VDB=0.621512;SGB=-0.692067;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=3,5,15,5;MQ=41 GT:PL 1/1:255,19,0
gi|9626243|ref|NC_001416.1| 6290 . GT G 167 . INDEL;IDV=38;IMF=0.926829;DP=41;VDB=0.910269;SGB=-0.693097;MQSB=0.903761;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=7,4
...
2.統(tǒng)計(jì)INDEL和SNP條目的各自的平均測(cè)序深度
3.把INDEL條目再區(qū)分成insertion和deletion情況
# insertion
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.vcf | grep -v '^#'|less -S|awk '{if (length($4)<length($5)) print}'
gi|9626243|ref|NC_001416.1| 2 . GGCG GGCGCGGGGGCG 9.81282 . INDEL;IDV=1;IMF=0.5;DP=2;VDB=0.02;SGB=-0.379885;MQ0F=0;AC=2;AN=2;DP4=1,0,1,0;MQ=33 GT:PL1/1:36,1,0
# deletion
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.vcf | grep -v '^#'|less -S|awk '{if (length($4)>length($5)) print}'
gi|9626243|ref|NC_001416.1| 245 . ATT AT 157 . INDEL;IDV=33;IMF=0.942857;DP=35;VDB=0.518706;SGB=-0.692562;MQSB=0.121547;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=6,7,13,9;MQ=36 GT:PL 0/1:192,0,23
gi|9626243|ref|NC_001416.1| 351 . ATGCTGAAATT A 108 . INDEL;IDV=1;IMF=0.0285714;DP=35;VDB=0.511365;SGB=-0.688148;MQSB=0.406871;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=8,12,8,7;MQ=28 GT:PL 0/1:141,0,110
gi|9626243|ref|NC_001416.1| 353 . GCTGAAATTGA G 215 . INDEL;IDV=24;IMF=0.685714;DP=35;VDB=0.58815;SGB=-0.692976;MQSB=0.711476;MQ0F=0;AC=2;AN=2;DP4=4,5,13,13;MQ=28 GT:PL 1/1:244,0,3
gi|9626243|ref|NC_001416.1| 2817 . GA G 175 . INDEL;IDV=41;IMF=0.911111;DP=45;VDB=0.367939;SGB=-0.693136;MQSB=0.45851;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=3,7,20,15;MQ=41 GT:PL 0/1:210,0,28
gi|9626243|ref|NC_001416.1| 2951 . ACCC A 159 . INDEL;IDV=1;IMF=0.0322581;DP=31;VDB=0.195299;SGB=-0.691153;MQSB=0.340099;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=5,8,9,9;MQ=38 GT:PL 0/1:193,0,16
gi|9626243|ref|NC_001416.1| 2952 . CCCACC CCC 228 . INDEL;IDV=28;IMF=0.903226;DP=31;VDB=0.210377;SGB=-0.692831;MQSB=0.340099;MQ0F=0;AC=2;AN=2;DP4=2,5,12,12;MQ=38 GT:PL 1/1:255,27,0
gi|9626243|ref|NC_001416.1| 3262 . GCC GC 152 . INDEL;IDV=1;IMF=0.0217391;DP=46;VDB=0.483903;SGB=-0.69311;MQSB=0.887966;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=7,8,13,18;MQ=41 GT:PL 0/1:187,0,17
gi|9626243|ref|NC_001416.1| 3264 . CA C 150 . INDEL;IDV=41;IMF=0.803922;DP=51;VDB=0.874867;SGB=-0.69312;MQSB=0.94394;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2
...
4.統(tǒng)計(jì)SNP條目的突變組合分布頻率
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.vcf |grep -v '^#'|awk '{if (length($4)==1 && length($5)==1) print}'|cut -f4,5|sort|uniq -c
7 A C
1 A G
4 A T
2 C A
4 C G
3 G A
2 G C
4 G T
6 T A
1 T C
2 T G
5.找到基因型不是 1/1 的條目,個(gè)數(shù)
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.vcf |grep -v '^#'|awk '{ print $10}'|grep -v '^1/1'
0/1:192,0,23
0/1:141,0,110
0/1:210,0,28
0/1:193,0,16
0/1:187,0,17
0/1:184,0,26
0/1:202,0,22
0/1:116,0,41
0/1:254,0,30
0/1:180,0,38
0/1:184,0,41
0/1:167,0,21
0/1:197,0,115
0/1:192,0,37
0/1:45,0,87
0/1:47,0,128
0/1:44,0,157
0/1:255,0,13
0/1:239,0,13
0/1:198,0,20
0/1:236,0,93
0/1:194,0,94
0/1:195,0,56
0/1:67,0,168
0/1:255,0,20
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.vcf |grep -v '^#'|awk '{ print $10}'|grep -v '^1/1'|wc
25 25 326
6.篩選測(cè)序深度大于20的條目
7.篩選變異位點(diǎn)質(zhì)量值大于30的條目
vip39@VM-0-15-ubuntu:~/test/bowtie2-2.3.4.3-linux-x86_64/example/reads$ cat tmp.vcf |awk '{if ($6>30) print}'|grep -v '^#'|less -S|head
gi|9626243|ref|NC_001416.1| 245 . ATT AT 157 . INDEL;IDV=33;IMF=0.942857;DP=35;VDB=0.518706;SGB=-0.692562;MQSB=0.121547;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=6,7,13,9;MQ=36 GT:PL 0/1:192,0,23
gi|9626243|ref|NC_001416.1| 351 . ATGCTGAAATT A 108 . INDEL;IDV=1;IMF=0.0285714;DP=35;VDB=0.511365;SGB=-0.688148;MQSB=0.406871;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=8,12,8,7;MQ=28 GT:PL 0/1:141,0,110
gi|9626243|ref|NC_001416.1| 353 . GCTGAAATTGA G 215 . INDEL;IDV=24;IMF=0.685714;DP=35;VDB=0.58815;SGB=-0.692976;MQSB=0.711476;MQ0F=0;AC=2;AN=2;DP4=4,5,13,13;MQ=28 GT:PL 1/1:244,0,3
gi|9626243|ref|NC_001416.1| 1104 . C A 225 . DP=43;VDB=0.162843;SGB=-0.693079;MQSB=0.981133;MQ0F=0;AC=2;AN=2;DP4=0,0,8,21;MQ=41 GT:PL 1/1:255,84,0
gi|9626243|ref|NC_001416.1| 1344 . G T 225 . DP=37;VDB=0.273288;SGB=-0.690438;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,9,8;MQ=42 GT:PL 1/1:255,51,0
gi|9626243|ref|NC_001416.1| 2143 . C G 225 . DP=46;VDB=0.902087;SGB=-0.692831;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,10,14;MQ=42 GT:PL 1/1:255,72,0
gi|9626243|ref|NC_001416.1| 2817 . GA G 175 . INDEL;IDV=41;IMF=0.911111;DP=45;VDB=0.367939;SGB=-0.693136;MQSB=0.45851;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=3,7,20,15;MQ=41 GT:PL 0/1:210,0,28
gi|9626243|ref|NC_001416.1| 2951 . ACCC A 159 . INDEL;IDV=1;IMF=0.0322581;DP=31;VDB=0.195299;SGB=-0.691153;MQSB=0.340099;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=5,8,9,9;MQ=38 GT:PL 0/1:193,0,16
gi|9626243|ref|NC_001416.1| 2952 . CCCACC CCC 228 . INDEL;IDV=28;IMF=0.903226;DP=31;VDB=0.210377;SGB=-0.692831;MQSB=0.340099;MQ0F=0;AC=2;AN=2;DP4=2,5,12,12;MQ=38 GT:PL 1/1:255,27,0
gi|9626243|ref|NC_001416.1| 3262 . GCC GC 152 . INDEL;IDV=1;IMF=0.0217391;DP=46;VDB=0.483903;SGB=-0.69311;MQSB=0.887966;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=7,8,13,18;MQ=41 GT:PL 0/1:187,0,17
8.組合篩選變異位點(diǎn)質(zhì)量值大于30并且深度大于20的條目
9. 理解DP4=4,7,11,18 這樣的字段慨飘,就是 Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases 計(jì)算每個(gè)變異位點(diǎn)的 AF
10.在前面步驟的bam文件里面找到這個(gè)vcf文件的某一個(gè)突變位點(diǎn)的測(cè)序深度表明的那些reads确憨,并且在IGV里面可視化bam和vcf定位到該變異位點(diǎn)。
生信技能樹公益視頻合輯:學(xué)習(xí)順序是linux瓤的,r休弃,軟件安裝,geo圈膏,小技巧塔猾,ngs組學(xué)!
請(qǐng)猛戳下面鏈接:
B站鏈接:https://m.bilibili.com/space/338686099
YouTube鏈接:https://m.youtube.com/channel/UC67sImqK7V8tSWHMG8azIVA/playlists
生信工程師入門最佳指南:https://mp.weixin.qq.com/s/vaX4ttaLIa19MefD86WfUA
學(xué)徒培養(yǎng):https://mp.weixin.qq.com/s/3jw3_PgZXYd7FomxEMxFmw