These notes start from the more advanced commands
-
cut
zless -S /teach/rna_testdata/database/gtf/gencode.v29.annotation.gtf.gz |cut -f 1,3-5 |less -S
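To see what `cut -f 1,3-5` keeps without opening the large annotation file, here is a minimal sketch on a single GTF-like line. The field values are illustrative only, and the variable names `line` and `picked` are made up for this example.

```shell
# One GTF-like line with tab-separated fields (illustrative values only).
line=$(printf 'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\n')

# -f 1,3-5 keeps field 1 and fields 3 through 5; tab is cut's default delimiter.
picked=$(printf '%s\n' "$line" | cut -f 1,3-5)
printf '%s\n' "$picked"
```

Field 2 and everything after field 5 are dropped, and the remaining fields stay tab-separated.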
-
paste
seq 10 |paste - - - -
seq 10 |paste - - - -|cut -f 1,2
-
Combining several commands to convert fastq to fasta
A fastq file stores each record as a unit of 4 lines
zless -S SRR1039510_1.fastq.gz|less -S
zless -S SRR1039510_1.fastq.gz|paste - - - -|less -S
zless -S SRR1039510_1.fastq.gz|paste - - - -|cut -f 1,2|tr '\t' '\n'|less -S
zless -S SRR1039510_1.fastq.gz|paste - - - -|cut -f 1,2|tr '\t' '\n'|tr '@' '>'|less -S
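The same pipeline can be tried on a tiny in-memory fastq, which makes each step easy to inspect. This is a sketch under assumptions: the two records are made up, and `sed 's/^@/>/'` is swapped in for `tr '@' '>'` so that only a leading `@` is changed.

```shell
# Two made-up fastq records (4 lines each); names and bases are illustrative.
fastq=$(printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\n@III\n')

# paste - - - - joins every 4 lines into one tab-separated row,
# cut keeps the header and the sequence, tr puts them back on separate lines.
# sed only rewrites an '@' at the start of a line, whereas tr '@' '>' would
# replace every '@' still present in the header.
fasta=$(printf '%s\n' "$fastq" | paste - - - - | cut -f 1,2 | tr '\t' '\n' | sed 's/^@/>/')
printf '%s\n' "$fasta"
```

Because the quality column is removed by `cut` before the `@` is rewritten, a quality string that begins with `@` (as in the second record) does no harm.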
-
Using paste to change the layout
ls *.gz|paste - -|less -S
ls /teach/rna_testdata/project/1.rna/2.raw_fq/*.gz|paste - -|cut -f 1 > 1
ls /teach/rna_testdata/project/1.rna/2.raw_fq/*.gz|paste - -|cut -f 2 > 2
This sorts the files into two lists (written to the files 1 and 2) for later processing
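One possible use of the two-column layout is to walk through the files two at a time. The sketch below assumes the listing alternates between the two files of each sample; the file names and the variables `names`, `pairs`, `fq1`, and `fq2` are hypothetical.

```shell
# Hypothetical file names standing in for the output of ls *.gz.
names=$(printf 'A_1.fastq.gz\nA_2.fastq.gz\nB_1.fastq.gz\nB_2.fastq.gz\n')

# paste - - puts every two consecutive names on one tab-separated row.
pairs=$(printf '%s\n' "$names" | paste - -)

# Read each row back into two variables, splitting on the tab.
printf '%s\n' "$pairs" | while IFS=$(printf '\t') read -r fq1 fq2; do
    echo "pair: $fq1 + $fq2"
done
```

This only pairs files correctly when sorting places the two files of a sample next to each other.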
-
sort
zcat gencode.v29.annotation.gtf.gz|cut -f 1|less -S
sort -u sorts and removes duplicate lines
zcat gencode.v29.annotation.gtf.gz|cut -f 1|sort -u
-
uniq
uniq -c collapses adjacent duplicate lines and shows how many times each occurred
zcat gencode.v29.annotation.gtf.gz|cut -f 1|sort|uniq -c
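The reason `sort` comes before `uniq -c` is that `uniq` only merges lines that are next to each other. A small sketch with a made-up column (`col`, `unsorted`, and `counted` are names chosen for this example):

```shell
# Made-up chromosome column with repeats that are not adjacent.
col=$(printf 'chr1\nchr2\nchr1\nchr2\nchr1\n')

# Without sort, no two neighbouring lines are equal, so uniq merges nothing.
unsorted=$(printf '%s\n' "$col" | uniq | wc -l | tr -d ' ')

# After sort, identical lines are adjacent and uniq -c counts each value once.
# awk only reorders the "count value" output into "value=count".
counted=$(printf '%s\n' "$col" | sort | uniq -c | awk '{print $2 "=" $1}')
printf '%s\n' "$counted"
```

Here `unsorted` is still 5 lines, while the sorted version reports each chromosome once with its count.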
-
find
Search for files under a given path
$ find ./ -name "*gz"
./gencode.v25.annotation.gtf.gz
./gencode.v29.annotation.gtf.gz
-
tr
$ echo $PATH
/home/lyshi/miniconda3/bin:/home/lyshi/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
Replace ":" with the newline character "\n", for example with echo $PATH|tr ':' '\n'
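A minimal sketch of this replacement, using a made-up PATH-like string so the result does not depend on the local machine (`p` and `dirs` are names chosen for this example):

```shell
# A made-up PATH-like string with ':' between directories.
p='/usr/local/bin:/usr/bin:/bin'

# tr maps every ':' to a newline, so each directory lands on its own line.
dirs=$(printf '%s\n' "$p" | tr ':' '\n')
printf '%s\n' "$dirs"
```

`tr` works character by character, so it suits one-to-one replacements like this one; replacing a character with a longer string needs a tool such as sed.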
-
wc
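Default output: number of lines, words, and bytes
$ cat ~/.bashrc |wc
134 575 4367
These correspond to wc -l, wc -w, and wc -c respectively (wc -m counts characters, which equals the byte count only when the text has no multibyte characters)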
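The three counts can be checked one at a time on a short made-up text. In this sketch, `tr -d ' '` only strips the padding that some wc implementations add; the variable names are chosen for this example.

```shell
# The sample text has 2 lines, 3 words, and 14 bytes (newlines included).
lines=$(printf 'one two\nthree\n' | wc -l | tr -d ' ')
words=$(printf 'one two\nthree\n' | wc -w | tr -d ' ')
bytes=$(printf 'one two\nthree\n' | wc -c | tr -d ' ')
echo "$lines $words $bytes"
```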
-
bc starts an interactive calculator for arithmetic
scale= sets the number of digits after the decimal point
-
sed
Substitution with sed
Replace ":" with "####"
$ echo $PATH
/home/lyshi/miniconda3/bin:/home/lyshi/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
$ echo $PATH|sed 's/:/####/g'
/home/lyshi/miniconda3/bin####/home/lyshi/miniconda3/condabin####/usr/local/sbin####/usr/local/bin####/usr/sbin####/usr/bin####/sbin####/bin####/usr/games####/usr/local/games####/snap/bin
grep finds the lines in a file that match a pattern
awk operates on specific columns of specific lines
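The two descriptions above can be illustrated on a few made-up tab-separated rows. This is only a sketch: the column layout (chromosome, feature, start, end) and the variable names are assumptions for this example.

```shell
# Made-up rows: chromosome, feature, start, end (tab-separated).
rows=$(printf 'chr1\tgene\t100\t500\nchr1\texon\t100\t200\nchr2\tgene\t300\t900\n')

# grep keeps whole lines that match a pattern (here: lines starting with chr1).
chr1_rows=$(printf '%s\n' "$rows" | grep '^chr1')

# awk tests a column and prints selected columns: for rows whose 2nd column
# is "gene", print the chromosome and end minus start.
gene_len=$(printf '%s\n' "$rows" | awk -F '\t' '$2 == "gene" {print $1, $4 - $3}')
printf '%s\n' "$gene_len"
```

grep selects by line, while awk can select by line and then work on individual columns.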
Exercises
-
Count how many sequences the reads_1.fq file contains
$ zcat SRR1039510_1_val_1.100000.fq.gz|paste - - - -| cut -f 1|wc -l
25000
Alternatively, because every header line starts with "@SRR"
$ zcat SRR1039510_1_val_1.100000.fq.gz|grep ^@SRR|wc -l
25000
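The two approaches agree here, but matching on a bare `@` would be risky because a quality line can also start with `@`. A small sketch with two made-up records shows the difference (the record contents and variable names are illustrative):

```shell
# Two made-up records; the first quality line deliberately starts with '@'.
fastq=$(printf '@SRR1\nACGT\n+\n@III\n@SRR2\nTTGA\n+\nIIII\n')

# Counting lines and dividing by 4 relies only on the 4-line record layout.
by_lines=$(( $(printf '%s\n' "$fastq" | wc -l) / 4 ))

# A bare '^@' also matches the quality line, so it overcounts.
by_at=$(printf '%s\n' "$fastq" | grep -c '^@')

# A longer prefix such as '^@SRR' is much less likely to match a quality line.
by_prefix=$(printf '%s\n' "$fastq" | grep -c '^@SRR')
echo "$by_lines $by_at $by_prefix"
```

The line-count approach is the most robust of the three, since it never inspects the content of the lines.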
-
Print all identifiers in the reads_1.fq file (the lines that start with @)
$ zcat SRR1039510_1_val_1.100000.fq.gz|paste - - - -| cut -f 1|head
@SRR1039510.1 HWI-ST177:290:C0TECACXX:1:1101:1373:2104 length=63
@SRR1039510.2 HWI-ST177:290:C0TECACXX:1:1101:1340:2124 length=63
@SRR1039510.3 HWI-ST177:290:C0TECACXX:1:1101:1273:2183 length=63
@SRR1039510.4 HWI-ST177:290:C0TECACXX:1:1101:1562:2147 length=63
@SRR1039510.5 HWI-ST177:290:C0TECACXX:1:1101:1577:2181 length=63
@SRR1039510.6 HWI-ST177:290:C0TECACXX:1:1101:1650:2181 length=63
@SRR1039510.7 HWI-ST177:290:C0TECACXX:1:1101:1645:2203 length=63
@SRR1039510.8 HWI-ST177:290:C0TECACXX:1:1101:1597:2229 length=63
@SRR1039510.9 HWI-ST177:290:C0TECACXX:1:1101:1791:2146 length=63
@SRR1039510.10 HWI-ST177:290:C0TECACXX:1:1101:1902:2147 length=63
-
Print all sequences in the reads_1.fq file (the second line of each record)
$ zcat SRR1039510_1_val_1.100000.fq.gz|paste - - - -| cut -f 2|head
TGGGAGGCTGAGGCAGGAGAATCACTTAAACCTGGGAGGCAGAGGTTACAGTGAGCCGAGATT
AAAGAAGGCGACAGTGAGAAGGAGTCCGAGAAGAGTGATGGAGACCCAATAGTCGATCCTGAG
CTGCTGGGCCCCAAGGTCCTCCTGGTCCCAGTGGTGAAGAAGGAAAGAGAGGCCCTAATGGGG
CTTGGCTGCAGCCATCCCGCTTAGCCTGCCTCACCCACACCCGTGTGGTACCTTCAGCCCTGG
TGAGACAGGTAATTCAGTATAGTAGATTAATATTTTTAATATATATTTTCCCTTAAGATTTCC
ATTTCTCAGTGTAGAAATCATGTCTTCTTAATTGCTGAACCTTACTGCAAAAACTTGTGATGT
ATCAAGAATACCAAAACAGTTTCCTAATATACAGTATTTGAAAGTGCTTGCCATATTGGCTCT
CTCATTTTCATCTTCACCATCAACAGAGAGAGCAGCATACTTGCTTGCAGAACTGAACTT
TCCAACCGCAGCTTGGCATCTTCGGTGGCCTGCAGCTCGTCCTCCAGCTCTTCCAGCTGCGTC
CGGCCTCCCAAAGTGCTGGGATTACAGGCATGAGCCACCGCGCTCTGCGAGGTACTTTTTCTA
-
Print the quality values (the fourth line of each record)
$ zcat SRR1039510_1_val_1.100000.fq.gz|paste - - - -| cut -f 4|head
HJJJIJJJJJJJJIJJJGHHIJIIIIIIJJEHGGIJGIJIJJIJHHHGGFFDFFFDEDDDBDC
HJJJJJJJJJJJIJIIGIJJJJGJHJJJHHDFFFE@CEEEDDDDDDDDDDDDDDDBDDDDDDD
HJJJJJJJJJJJJJJJGIIIJJJJJHIJJJJHIJFHGIJJJJJJJHHHHHFFFDDDEDDDDDD
HJJJJJJJJJJJJJJJJJIJJJJJJJJJJJJIJHJJIJJJJJHHFFFFEEEEEEEDDDDDDDB
HIJJJJJJJEHJIJJJJIIIJJIIJJJJJJJJJJJJJJJJJJJJJJJJJEHJGI>FFCBGGGI
HJJJJJJJJJJJHIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJJJJJJJHHIJJD
FJGGJGIJJIIEIIJIIIGGGIGHIDGHGHGGCHFGHHBFGFBCCC4@AD@HIDIJJDHFFHH
HIIIJJJJIIIIIJIJIGIIJJJJIJHIIIIIIGIGJJIIIIJJJJJIJJJJJJIGGIGJ
HJJJJJJIJJJJJJJJJJJJJJJJJGIJJJIJJJJIJJJIJHHHGFFFFFEEEEEEDDDDDDB
HJJJJIJJJJJJJDFGIJJJJJJJIJJJJJIIJFIJJJJJJJHFFFFEDDDD;CCDEEDDDDD
-
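Count the total number of bases across all sequences in the reads_1.fq file
$ zless SRR1039510_1_val_1.100000.fq.gz |paste - - - - |cut -f 2|grep -io '[ATCGN]'|wc -w
The run recorded in these notes used paste - - instead of paste - - - -, which also passes the quality lines to grep. The 1665852 it printed is therefore an overcount: with 25000 reads of 63 bases or fewer, as the headers indicate, the total cannot exceed 1575000.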
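As a cross-check for this kind of count, the total can be computed in two independent ways on a tiny made-up fastq. This is a sketch, not the method used in the notes; the records and the names `via_paste` and `via_awk` are illustrative.

```shell
# Two made-up records with 4 and 5 bases, so the expected total is 9.
fastq=$(printf '@r1\nACGT\n+\nIIII\n@r2\nTTGAN\n+\nIIIII\n')

# Route 1: keep the sequence column, delete the newlines, count the bytes left.
via_paste=$(printf '%s\n' "$fastq" | paste - - - - | cut -f 2 | tr -d '\n' | wc -c | tr -d ' ')

# Route 2: in awk, line numbers with NR % 4 == 2 are the sequence lines.
via_awk=$(printf '%s\n' "$fastq" | awk 'NR % 4 == 2 {n += length($0)} END {print n}')
echo "$via_paste $via_awk"
```

Both routes count every character on the sequence lines, so they agree with the grep approach only when the sequences contain nothing but A, T, C, G, and N.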
-
Count how A, T, C, G, and N are distributed at the first base of every sequence in reads_1.fq (the number of each) (uniq)
$ zless -S SRR1039510_1_val_1.100000.fq.gz|paste - - - -|cut -f 2|grep -o ^[ACTGN]|sort|uniq -c
5923 A
6112 C
5802 G
7163 T
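An alternative to `grep -o` for taking the first base is `cut -c 1`, which selects by character position. A minimal sketch on made-up sequences (`seqs` and `first` are names chosen for this example):

```shell
# Made-up sequences; the first bases are A, A, T, N.
seqs=$(printf 'ACGT\nAGGT\nTTGA\nNCCC\n')

# cut -c 1 keeps the first character of each line; sort | uniq -c tallies them.
# awk only reorders the "count value" output into "value=count".
first=$(printf '%s\n' "$seqs" | cut -c 1 | sort | uniq -c | awk '{print $2 "=" $1}')
printf '%s\n' "$first"
```

Unlike the grep pattern, `cut -c 1` reports whatever character comes first, so an unexpected symbol would show up in the tally instead of being skipped.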
-
Convert reads_1.fq to a reads_1.fa file (that is, convert fastq to fasta) (tr)
$ zless -S SRR1039510_1_val_1.100000.fq.gz|paste - - - -|cut -f 1,2|tr '\t' '\n'|tr '@' '>' > ~/1_fasta
$ cd ~
$ head 1_fasta
>SRR1039510.1 HWI-ST177:290:C0TECACXX:1:1101:1373:2104 length=63
TGGGAGGCTGAGGCAGGAGAATCACTTAAACCTGGGAGGCAGAGGTTACAGTGAGCCGAGATT
>SRR1039510.2 HWI-ST177:290:C0TECACXX:1:1101:1340:2124 length=63
AAAGAAGGCGACAGTGAGAAGGAGTCCGAGAAGAGTGATGGAGACCCAATAGTCGATCCTGAG
>SRR1039510.3 HWI-ST177:290:C0TECACXX:1:1101:1273:2183 length=63
CTGCTGGGCCCCAAGGTCCTCCTGGTCCCAGTGGTGAAGAAGGAAAGAGAGGCCCTAATGGGG
>SRR1039510.4 HWI-ST177:290:C0TECACXX:1:1101:1562:2147 length=63
CTTGGCTGCAGCCATCCCGCTTAGCCTGCCTCACCCACACCCGTGTGGTACCTTCAGCCCTGG
>SRR1039510.5 HWI-ST177:290:C0TECACXX:1:1101:1577:2181 length=63
TGAGACAGGTAATTCAGTATAGTAGATTAATATTTTTAATATATATTTTCCCTTAAGATTTCC
-
Remove the first and last five bases of every sequence in the reads_1.fa file
# I have not fully understood the if/else in awk yet
$ less -S 1_fasta |awk '{if($0~/^>/)print $0;else print substr($0,6,length($0)-10)}'|head
>SRR1039510.1 HWI-ST177:290:C0TECACXX:1:1101:1373:2104 length=63
GGCTGAGGCAGGAGAATCACTTAAACCTGGGAGGCAGAGGTTACAGTGAGCCG
>SRR1039510.2 HWI-ST177:290:C0TECACXX:1:1101:1340:2124 length=63
AGGCGACAGTGAGAAGGAGTCCGAGAAGAGTGATGGAGACCCAATAGTCGATC
>SRR1039510.3 HWI-ST177:290:C0TECACXX:1:1101:1273:2183 length=63
GGGCCCCAAGGTCCTCCTGGTCCCAGTGGTGAAGAAGGAAAGAGAGGCCCTAA
>SRR1039510.4 HWI-ST177:290:C0TECACXX:1:1101:1562:2147 length=63
CTGCAGCCATCCCGCTTAGCCTGCCTCACCCACACCCGTGTGGTACCTTCAGC
>SRR1039510.5 HWI-ST177:290:C0TECACXX:1:1101:1577:2181 length=63
CAGGTAATTCAGTATAGTAGATTAATATTTTTAATATATATTTTCCCTTAAGA
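The if/else in that awk one-liner may be easier to follow when it is spread over several lines and run on a tiny made-up record. This is the same logic with comments added; the 16-base sequence and the names `fa` and `trimmed` are illustrative.

```shell
# One made-up fasta record: 5 A's, then CCCGGG, then 5 T's.
fa=$(printf '>read1 demo\nAAAAACCCGGGTTTTT\n')

trimmed=$(printf '%s\n' "$fa" | awk '{
    if ($0 ~ /^>/)      # the whole line ($0) starts with ">": a header line
        print $0        # print the header unchanged
    else                # any other line is treated as a sequence line
        print substr($0, 6, length($0) - 10)  # from base 6, keep length-10 bases
}')
printf '%s\n' "$trimmed"
```

`substr($0, 6, length($0) - 10)` starts at position 6, which skips the first five bases, and keeps ten fewer characters than the line holds, which leaves out the last five as well. A sequence of 10 bases or fewer would come out empty.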