I ran minimap2 on a genome larger than 10G with 100G of reads.
With default parameters it typically used 20-40 GB of memory,
and 80 GB when saving to a file.
后來思考拒垃,-I 參數(shù)祠墅,對(duì)于一些大基因組 可以以消耗時(shí)間為代價(jià)难审,降低內(nèi)存消耗
-I NUM Load at most NUM target bases into RAM for indexing [4G]. If there are more than NUM bases in target.fa,
minimap2 needs to read query.fa multiple times to map it against each batch of target sequences.
NUM may be ending with k/K/m/M/g/G. NB: mapping quality is incorrect given a multi-part index.
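To get a feel for when the batching described in the help text kicks in, here is a rough sketch that estimates how many index parts a genome will need. The helper names `to_bases` and `estimate_parts` are made up for illustration. The decimal multipliers (k = 1e3, m = 1e6, g = 1e9) are an assumption about how the suffix is read, and the real part count can differ a little because batches are presumably filled with whole sequences.

```shell
# to_bases: turn a size such as 4G, 500M or 1000 into a plain number of bases.
# Assumption: suffixes are decimal (k = 1e3, m = 1e6, g = 1e9).
to_bases() {
  awk -v s="$1" 'BEGIN {
    n = s + 0                                  # numeric prefix, e.g. 4 from "4G"
    u = toupper(substr(s, length(s), 1))       # last character, upper-cased
    if (u == "K") n *= 1e3
    else if (u == "M") n *= 1e6
    else if (u == "G") n *= 1e9
    printf "%.0f\n", n
  }'
}

# estimate_parts GENOME_SIZE BATCH_SIZE
# Rough estimate only: ceil(genome / batch), with a minimum of 1.
estimate_parts() {
  genome=$(to_bases "$1")
  batch=$(to_bases "$2")
  awk -v g="$genome" -v b="$batch" 'BEGIN {
    p = int(g / b)
    if (p * b < g) p++                         # round up when there is a remainder
    if (p < 1) p = 1
    print p
  }'
}
```

For example, `estimate_parts 10G 4G` reports 3 parts, so each query batch would be mapped against three target batches; any value above 1 means a multi-part index.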
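Note: if the genome is larger than the size set by -I, you get a multi-part index.
This has two side effects:
(1) Mapping quality becomes inaccurate; decide whether that trade-off is acceptable for your use case.
(2) With the -a option (SAM output), the @SQ header lines, such as the one below, are missing: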
@SQ SN:C14E LN:145181
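If you do need SAM, one possible workaround is to rebuild the missing @SQ lines from a FASTA index. This is a sketch, not something from the original notes: `fai_to_sq` is a made-up helper name, and it assumes a `.fai` file (as produced by `samtools faidx ref.fasta`) whose first two tab-separated columns are the sequence name and length. The minimap2 manual also documents a `--split-prefix` option meant for multi-part indexes; check whether your version has it before relying on this.

```shell
# fai_to_sq FAI_FILE
# Print one @SQ header line per sequence listed in a .fai index.
# Assumption: column 1 = sequence name, column 2 = sequence length.
fai_to_sq() {
  awk -F'\t' '{ printf "@SQ\tSN:%s\tLN:%s\n", $1, $2 }' "$1"
}

# Hypothetical usage (file names are placeholders):
#   samtools faidx ref.fasta
#   { fai_to_sq ref.fasta.fai; cat aln.sam; } > aln.with_sq.sam
```

The usage line simply places the rebuilt @SQ lines in front of the existing output; if the SAM file already starts with an @HD line, that line would have to be moved to the top by hand.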
If you are still using SAM, consider switching to PAF: the sequence lengths are all included in each PAF record.
PAF: a Pairwise mApping Format
Col Type Description
1 string Query sequence name
2 int Query sequence length
3 int Query start (0-based; BED-like; closed)
4 int Query end (0-based; BED-like; open)
5 char Relative strand: "+" or "-"
6 string Target sequence name
7 int Target sequence length
8 int Target start on original strand (0-based)
9 int Target end on original strand (0-based)
10 int Number of residue matches
11 int Alignment block length
12 int Mapping quality (0-255; 255 for missing)
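Since the columns above are plain tab-separated fields, a PAF file can be summarized with awk alone. The following is a minimal sketch: `paf_summary` is a made-up helper name, and "identity" here is simply column 10 divided by column 11, which is a common approximation rather than an official definition.

```shell
# paf_summary PAF_FILE [MIN_MAPQ]
# For each record whose mapping quality is present (not 255) and at least
# MIN_MAPQ (default 0), print: query name, query length, target name,
# target length, and matches / alignment block length.
paf_summary() {
  awk -F'\t' -v min_mapq="${2:-0}" '
    $12 != 255 && $12 >= min_mapq && $11 > 0 {
      printf "%s\t%d\t%s\t%d\t%.4f\n", $1, $2, $6, $7, $10 / $11
    }' "$1"
}

# Hypothetical usage (aln.paf is a placeholder file name):
#   paf_summary aln.paf 20
```

Note that the target length (column 7) is available on every line, which is why the missing @SQ header is not a problem for PAF output. With a multi-part index the mapping quality in column 12 is still unreliable, so a MAPQ filter should be used with care in that case.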
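The default for -I is 4G. In other words, if the genome is too large, it is split into several parts that are loaded into memory and aligned one after another.
To lower memory consumption at the cost of alignment time, change the -I value when building the index: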
minimap2 -I 3G -d ref.mmi ref.fasta
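Once the index has been built this way, mapping against it could look like the sketch below. The function names and the file names `reads.fq` and `aln.paf` are placeholders, not from the original notes; `-t` sets the number of threads. Because -I is an indexing option, the batch size is presumably fixed when the `.mmi` file is written, so it is not repeated here.

```shell
# build_index BATCH_SIZE REF_FASTA OUT_MMI
# Build a (possibly multi-part) index with a chosen -I batch size.
build_index() {
  minimap2 -I "$1" -d "$3" "$2"
}

# map_to_paf INDEX_MMI READS [THREADS]
# Map reads against a prebuilt index; PAF is the default output format.
map_to_paf() {
  minimap2 -t "${3:-4}" "$1" "$2"
}

# Hypothetical usage (file names are placeholders):
#   build_index 3G ref.fasta ref.mmi
#   map_to_paf ref.mmi reads.fq 8 > aln.paf
```

Writing PAF rather than SAM sidesteps the missing @SQ lines, though the mapping quality caveat for multi-part indexes still applies.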