Abstract
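With the rapid development of big data systems in recent years, a variety of open source benchmarks have been designed to compare and evaluate the performance of these systems, which has in turn driven performance improvements. The paper first gives an overview of popular benchmarks, and then summarizes the three aspects that benchmarks focus on: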
- workload generation techniques
- workload input data generation techniques
- metrics
Current mainstream big data systems fall into three main types:
- Hadoop and its related systems
- data stores (database management systems (DBMSs) and NoSQL)
- specialized systems (connected graphs, continuous streams, and complex scientific data)
See the figure below for details (the figures and tables in this post are taken from the original paper).
Figure 1. Overview of big data systems
Table 1. Overview of the State-of-the-Art Open Source Big Data Benchmarks
Existing benchmarks can be divided into three main categories:
- Micro benchmarks. Used to evaluate an individual system component or a specific system behavior; common examples include Word count, NNBench, and TestDFSIO.
- End to end. Evaluate the entire system using typical application scenarios, where each scenario corresponds to a set of related workloads; common examples include the series of OLTP (On-Line Transaction Processing) queries provided by the TPC (Transaction Processing Performance Council).
- Benchmark suites. Combinations of multiple micro and end-to-end benchmarks; common examples include HiBench, CloudSuite, and BigDataBench.
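As a rough illustration of how a micro benchmark combines the three aspects listed earlier (input data generation, workload, metrics), here is a minimal single-process sketch of a word count benchmark. It is not taken from the paper or from any of the benchmark suites named above; the function names (`generate_input`, `word_count`, `run_micro_benchmark`) and all parameters are hypothetical, and a real benchmark would run the workload on a distributed system such as Hadoop rather than in local Python.

```python
import random
import time
from collections import Counter


def generate_input(num_words, vocabulary_size, seed=0):
    """Input data generation: a reproducible list of synthetic words."""
    rng = random.Random(seed)
    vocabulary = [f"w{i}" for i in range(vocabulary_size)]
    return [rng.choice(vocabulary) for _ in range(num_words)]


def word_count(words):
    """The workload under test: count occurrences of each word."""
    return Counter(words)


def run_micro_benchmark(num_words=100_000, vocabulary_size=1_000, repeats=3):
    """Run the workload several times and report simple metrics."""
    words = generate_input(num_words, vocabulary_size)
    durations = []
    counts = Counter()
    for _ in range(repeats):
        start = time.perf_counter()
        counts = word_count(words)
        durations.append(time.perf_counter() - start)
    best = min(durations)  # best-of-N reduces noise from other processes
    return {
        "total_words": sum(counts.values()),
        "distinct_words": len(counts),
        "best_seconds": best,
        "words_per_second": num_words / best if best > 0 else float("inf"),
    }


if __name__ == "__main__":
    print(run_micro_benchmark())
```

The seed makes the generated input reproducible, so repeated runs measure the same workload; throughput (words per second) is only one possible metric, chosen here for simplicity.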
Figure 2. Advent of big data benchmarks: A timeline
Common NoSQL types and examples:
- key/value stores (e.g., Amazon Dynamo, Cassandra, LinkedIn Voldemort)
- column-oriented databases (e.g., BigTable and Hypertable)
- document-oriented stores (e.g., CouchDB and MongoDB)
Two kinds of systems target graph data:
- graph databases such as Neo4j
- distributed graph processing systems such as Google Pregel