I sometimes tweak the output of DiffBind and then feed it into deepTools computeMatrix. As with HOMER earlier, computeMatrix also expects BED format:
chr1 3204562 3661579 NM_001011874 Xkr4 -
chr1 4481008 4486494 NM_011441 Sox17 -
chr1 4763278 4775807 NM_001177658 Mrpl15 -
chr1 4797973 4836816 NM_008866 Lypla1 +
So I did the extraction with awk as usual:
awk -F "," 'BEGIN {OFS="\t"} $9 < -1 && $11 < 0.05 {print $1,$2,$3,$12,$5}' diff.csv | sed 's/\"//g' | sort -k1,1 -k2,2n > test.bed
head -n 1 test.bed
Chr1 113 1134 peak_1 *
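That is: keep rows with Fold (column 9) < -1 and FDR (column 11) < 0.05, print columns 1, 2, 3, 12 and 5, then sort.
Forgive my non-standard BED format... it does not matter much here.
But whenever I ran computeMatrix on it, I kept getting an error: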
computeMatrix reference-point -S a.bw -R ../test.bed -a 500 -b 500 --referencePoint center --binSize 10 -p 50 -o computerMatrix_Diff.gz
Traceback (most recent call last):
File "/opt/biosoft/deepTools2.0/bin/computeMatrix", line 14, in <module>
main(args)
File "/opt/biosoft/deepTools2.0/lib/python2.7/site-packages/deeptools/computeMatrix.py", line 421, in main
hm.computeMatrix(scores_file_list, args.regionsFileName, parameters, blackListFileName=args.blackListFileName, verbose=args.verbose, allArgs=args)
File "/opt/biosoft/deepTools2.0/lib/python2.7/site-packages/deeptools/heatmapper.py", line 264, in computeMatrix
verbose=verbose)
File "/opt/biosoft/deepTools2.0/lib/python2.7/site-packages/deeptools/mapReduce.py", line 85, in mapReduce
bed_interval_tree = GTF(bedFile, defaultGroup=defaultGroup, transcriptID=transcriptID, exonID=exonID, transcript_id_designator=transcript_id_designator, keepExons=keepExons)
File "/opt/biosoft/deepTools2.0/lib/python2.7/site-packages/deeptoolsintervals/parse.py", line 595, in __init__
self.parseBED(fp, line, 3, labelColumn)
File "/opt/biosoft/deepTools2.0/lib/python2.7/site-packages/deeptoolsintervals/parse.py", line 362, in parseBED
self.parseBEDcore(line, ncols)
File "/opt/biosoft/deepTools2.0/lib/python2.7/site-packages/deeptoolsintervals/parse.py", line 225, in parseBEDcore
if int(cols[1]) < 0:
ValueError: invalid literal for int() with base 10: 'start'
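Based on a question I found online, "Question: ValueError: invalid literal for int() with base 10: 'start' computeMatrix of deeptools", I figured my BED file had a header, i.e. column names such as seqnames, start and end were still in there. But as shown above, I had already run head on it and there was no header left. Then it occurred to me that the sort might have pushed the header down to the last line, and sure enough: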
tail -n 1 test.bed
seqnames start end feature_id strand
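A quick way to catch this kind of leftover before handing the file to computeMatrix is to scan for rows whose start or end is not a plain integer. This is only a sketch: it assumes a tab-separated file, and `check_bed` is a name made up for illustration.

```shell
# check_bed (hypothetical helper): print every line of a tab-separated
# BED-like file whose 2nd or 3rd column is not a non-negative integer.
check_bed() {
    awk -F '\t' '$2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { printf "line %d: %s\n", NR, $0 }' "$1"
}
```

Running `check_bed test.bed` prints nothing for a clean file, and prints the offending line number and content otherwise, wherever the sort happened to move it.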
But this is odd. I explicitly set $9 < -1 && $11 < 0.05, which should have filtered the header out: in the header row both Fold and FDR are strings, so they should not be < -1 or < 0.05. Then I searched online and found a rather nasty gotcha: awk greater than why show string value?
It says that when you compare operands of mixed types, the number is converted to a string and the two are then compared as strings.
When comparing operands of mixed types, numeric operands are converted to strings using the value of CONVFMT. ... CONVFMT's default value is "%.6g", which prints a value with at least six significant digits.
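To see the rule in action, here is a small sketch. It assumes a POSIX-style awk and sets LC_ALL=C so that strings are compared byte by byte; the inputs `2` and `2x` are made up for illustration.

```shell
# A field that looks like a number is compared numerically: 2 < 10.
numeric=$(printf '2\n' | LC_ALL=C awk '{ print ($1 < 10) }')
# A field that does not look like a number forces a string comparison: "2x" vs "10".
stringy=$(printf '2x\n' | LC_ALL=C awk '{ print ($1 < 10) }')
echo "2  < 10 -> $numeric"
echo "2x < 10 -> $stringy"
```

The first comparison is numeric and true (1). The second compares "2x" with "10" character by character, and '2' sorts after '1', so it is false (0), even though a human would read it as "2-something is less than 10".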
But I suspect that is not the whole story, because if I set up
vim test.txt
Fold FDR
2 0.04
and then run awk, nothing is returned:
awk -F "\t" '$1 < -1 && $2 < 0.05' test.txt
whereas if I have
vim test.txt
"Fold" "FDR"
2 0.04
then it does return a line:
$ awk -F "\t" '$1 < -1 && $2 < 0.05' test.txt
"Fold" "FDR"
So it looks like the quotes are the problem... I do not know the exact reason.
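A plausible explanation, assuming awk compares the strings byte by byte (which LC_ALL=C forces below): the double quote character (0x22) sorts before both `-` (0x2D) and `0` (0x30), so `"Fold"` is "less than" the string "-1" and `"FDR"` is "less than" "0.05", while a bare `F` (0x46) sorts after them. The sketch below checks just the Fold half of that claim.

```shell
# Compare a header field against -1, with and without surrounding double quotes.
# LC_ALL=C forces plain byte-order string comparison.
quoted=$(printf '"Fold"\n' | LC_ALL=C awk '{ print ($1 < -1) }')
plain=$(printf 'Fold\n' | LC_ALL=C awk '{ print ($1 < -1) }')
echo "quoted header < -1 -> $quoted"
echo "plain header  < -1 -> $plain"
```

The quoted header yields 1 (true) and the plain one yields 0 (false), which matches what the two test.txt experiments show.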
To avoid this, either strip the double quotes right at the start with sed 's/\"//g' (before awk does the comparison, not after), or drop the first line with sed 1d.
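Putting both precautions together, the extraction could be sketched as follows. The column numbers are the ones used in the command earlier in this post; `extract_bed` is a hypothetical helper name, and skipping the header with `NR > 1` is a swapped-in alternative to `sed 1d` that does the same job inside awk.

```shell
# extract_bed (hypothetical helper): strip quotes BEFORE awk compares anything,
# skip the header row explicitly, then filter, reorder columns and sort.
extract_bed() {
    sed 's/"//g' "$1" |
        awk -F ',' 'BEGIN { OFS = "\t" }
            NR > 1 && $9 < -1 && $11 < 0.05 { print $1, $2, $3, $12, $5 }' |
        sort -k1,1 -k2,2n
}
```

Usage would be `extract_bed diff.csv > test.bed`. Either precaution alone should be enough for this file; skipping the header explicitly is the one that does not depend on how the column names happen to sort.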