這邊的解決方案是增加 --forceTableScan
--forceTableScan
Forces mongodump to scan the data store directly: typically, mongodump saves entries as they appear in the index of the _id field. Use --forceTableScan to skip the index and scan the data directly. Typically there are two cases where this behavior is preferable to the default:
If you have key sizes over 800 bytes that would not be present in the _id index.
Your database uses a custom _id field.
When you run with --forceTableScan, mongodump does not use $snapshot. As a result, the dump produced by mongodump can reflect the state of the database at many different points in time.
首先有一點漩怎,我使用過的是自己設(shè)的_id糊秆。正常情況下腔剂,mongodump 會通過mongo的_id 這個索引去讀取杏愤,會產(chǎn)生大量的隨機(jī)讀靡砌。使用這個參數(shù),可以不通過_id去讀取珊楼。
mongodump默認(rèn)使用snapshot通殃,會通過掃描_id 索引,然后去讀取實際的文檔厕宗。TableScan會按照mongodb的物理存儲順序進(jìn)行掃描画舌,沒有讀取index的過程。但是tablescan潛在的問題是已慢,如果一個文檔在dump的過程中移動了(物理上)曲聂,有可能會最終輸出兩次。