Overview
I have recently been reading Netty's source code and looked into its queue implementations. Netty provides a different event loop implementation for each IO model:
- BIO: ThreadPerChannelEventLoop. One thread per Channel; the queue used is LinkedBlockingQueue.
- NIO: NioEventLoop (level-triggered). Each thread owns one Selector, with which multiple Channels can be registered; the queue used is MpscChunkedArrayQueue or MpscLinkedAtomicQueue.
- Epoll: EpollEventLoop (edge-triggered). Same as NIO.
So why use different Queue implementations? Let's look at how each Queue works.
LinkedBlockingQueue
LinkedBlockingQueue ships with the JDK. It stores elements in a linked list and uses ReentrantLock and Condition to handle contention and to support blocking.
Since it is a linked list, it needs a node class; in LinkedBlockingQueue that class is:
```java
static class Node<E> {
    E item;
    Node<E> next;
    Node(E x) { item = x; }
}
```
The implementation is simple: a singly linked list in which next points to the following node; a null next marks the tail node.
The fields of LinkedBlockingQueue are:
```java
// Capacity: unlike ArrayList, this queue is bounded
private final int capacity;
// Current number of elements
private final AtomicInteger count = new AtomicInteger(0);
// Head node
private transient Node<E> head;
// Tail node
private transient Node<E> last;
// Take lock: must be held to remove elements from the queue
private final ReentrantLock takeLock = new ReentrantLock();
// "Not empty" condition: takers wait on it while the queue is empty
private final Condition notEmpty = takeLock.newCondition();
// Put lock: must be held to add elements to the queue
private final ReentrantLock putLock = new ReentrantLock();
// "Not full" condition: putters wait on it while the queue is full
private final Condition notFull = putLock.newCondition();
```
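From these fields we can roughly tell that:
- The capacity can be set, but there is no notion of an initial capacity that grows toward a maximum;
- It is a FIFO queue, and both enqueue and dequeue acquire a lock, so it is thread-safe;
- Enqueue and dequeue use two separate locks, so a producer and a consumer do not block each other.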
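A quick way to see the first two points is to drive the JDK's LinkedBlockingQueue directly. The snippet below is a minimal usage sketch; the class name BoundedOfferDemo is made up for illustration, and only standard JDK calls are used. It fixes the capacity at 2 and shows that the non-blocking offer and poll return false and null instead of waiting.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class: bounded capacity, FIFO order, non-blocking offer/poll.
class BoundedOfferDemo {
    static String run() {
        LinkedBlockingQueue<String> q = new LinkedBlockingQueue<>(2); // capacity fixed at construction
        StringBuilder sb = new StringBuilder();
        sb.append(q.offer("a")).append(',');  // true
        sb.append(q.offer("b")).append(',');  // true
        sb.append(q.offer("c")).append(';');  // false: full, offer does not block
        sb.append(q.poll()).append(',');      // "a": FIFO order
        sb.append(q.poll()).append(',');      // "b"
        sb.append(q.poll());                  // null: empty, poll does not block
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints true,true,false;a,b,null
    }
}
```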
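Take the enqueue method offer as an example (Netty uses this class through the Queue interface rather than BlockingQueue, so only the non-blocking methods are analyzed here):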
```java
public boolean offer(E e) {
    if (e == null) throw new NullPointerException(); // null elements are not allowed
    final AtomicInteger count = this.count;          // current number of elements
    if (count.get() == capacity)                     // full: cannot add, return false
        return false;
    int c = -1;
    Node<E> node = new Node<E>(e);                   // wrap the element in a node
    final ReentrantLock putLock = this.putLock;
    putLock.lock();                                  // all enqueue operations share this lock
    try {
        if (count.get() < capacity) {                // re-check under the lock
            enqueue(node);                           // link the node at the tail
            c = count.getAndIncrement();             // c = size before this insert
            if (c + 1 < capacity)                    // still not full: wake another waiting producer
                notFull.signal();
        }
    } finally {
        putLock.unlock();                            // release the lock
    }
    if (c == 0)
        signalNotEmpty();                            // queue was empty before: signal notEmpty to wake a consumer
    return c >= 0;                                   // true if the element was added
}
```
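The two-lock design can be sketched in a few dozen lines. TwoLockQueue below is a hypothetical, simplified class written for this article, not the JDK source: it keeps the dummy head node, the separate putLock/takeLock and the shared AtomicInteger count, but drops the Conditions and the signal cascade, so only the non-blocking offer and poll exist.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical simplified two-lock queue (not the JDK source): non-blocking offer/poll only.
class TwoLockQueue<E> {
    private static final class Node<E> {
        E item;
        Node<E> next;
        Node(E item) { this.item = item; }
    }

    private final int capacity;
    // Touched by both sides, so it must be atomic.
    private final AtomicInteger count = new AtomicInteger();
    private Node<E> head;   // dummy node, head.item is always null; guarded by takeLock
    private Node<E> last;   // guarded by putLock
    private final ReentrantLock putLock = new ReentrantLock();
    private final ReentrantLock takeLock = new ReentrantLock();

    TwoLockQueue(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
        head = last = new Node<>(null);
    }

    boolean offer(E e) {
        if (e == null) throw new NullPointerException();
        Node<E> node = new Node<>(e);
        putLock.lock();                 // producers only contend with other producers
        try {
            if (count.get() == capacity) return false;   // full: fail instead of blocking
            last = last.next = node;    // link at the tail
            count.getAndIncrement();    // publish the new node to consumers
            return true;
        } finally {
            putLock.unlock();
        }
    }

    E poll() {
        takeLock.lock();                // consumers only contend with other consumers
        try {
            if (count.get() == 0) return null;            // empty: fail instead of blocking
            Node<E> first = head.next;  // first real node
            E x = first.item;
            first.item = null;          // it becomes the new dummy head
            head = first;
            count.getAndDecrement();
            return x;
        } finally {
            takeLock.unlock();
        }
    }
}
```

The count is the only state both sides touch, which is why it is an AtomicInteger: its volatile write and read also let a consumer holding takeLock see a node linked by a producer holding putLock.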
ArrayBlockingQueue
As the name suggests, ArrayBlockingQueue stores its elements in an array. Its fields are:
```java
// Array holding the elements
final Object[] items;
// ArrayBlockingQueue keeps two indexes: one for dequeue, one for enqueue
int takeIndex;
int putIndex;
// Current number of elements
int count;
// A single reentrant lock guarding all access
final ReentrantLock lock;
// "Not empty" condition: takers wait on it while the queue is empty
private final Condition notEmpty;
// "Not full" condition: putters wait on it while the queue is full
private final Condition notFull;
```
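From this we can see that:
- Enqueue and dequeue share one lock, so producers and consumers contend with each other for it;
- Two indexes track the current take and put positions and wrap around like a ring, so elements never have to be shifted within the array;
- Because of the single shared lock, under high concurrency its performance should fall behind the linked-list implementation.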
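For contrast, here is a hypothetical single-lock ring buffer. RingBufferQueue is written for illustration only and is not the JDK code; it shows how takeIndex and putIndex wrap around so that no element is ever shifted, and why every offer and poll serialize on the same lock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical single-lock ring buffer in the style of ArrayBlockingQueue (non-blocking part only).
class RingBufferQueue<E> {
    private final Object[] items;
    private int takeIndex;   // next slot to read
    private int putIndex;    // next slot to write
    private int count;
    private final ReentrantLock lock = new ReentrantLock(); // one lock for producers and consumers

    RingBufferQueue(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        items = new Object[capacity];
    }

    boolean offer(E e) {
        if (e == null) throw new NullPointerException();
        lock.lock();
        try {
            if (count == items.length) return false;          // full
            items[putIndex] = e;
            if (++putIndex == items.length) putIndex = 0;     // wrap around instead of shifting
            count++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    @SuppressWarnings("unchecked")
    E poll() {
        lock.lock();
        try {
            if (count == 0) return null;                      // empty
            E x = (E) items[takeIndex];
            items[takeIndex] = null;                          // let GC reclaim the element
            if (++takeIndex == items.length) takeIndex = 0;
            count--;
            return x;
        } finally {
            lock.unlock();
        }
    }
}
```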
MpscChunkedArrayQueue
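MpscChunkedArrayQueue is also array-based. As its name says, it supports multiple producers and a single consumer (Multi Producer Single Consumer), so its use case is narrower than that of the two queues above, but it fits Netty exactly: any thread may submit tasks to an event loop, while only the event loop's own thread consumes them. It is optimized for this scenario in the following ways: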
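- Cache line padding
LinkedBlockingQueue's head and last fields sit next to each other, as do ArrayBlockingQueue's takeIndex and putIndex. CPUs load data into their caches one cache line at a time, so when a dequeue modifies head, the whole cache line is invalidated on other cores, and a thread that only uses last has to reload it even though last never changed. This is known as false sharing.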
```java
// Here I merged fields from several classes into one listing for readability
// (the repeated padding names live in different classes in the real source)
long p01, p02, p03, p04, p05, p06, p07;
long p10, p11, p12, p13, p14, p15, p16, p17;
protected long producerIndex;
long p01, p02, p03, p04, p05, p06, p07;
long p10, p11, p12, p13, p14, p15, p16, p17;
protected long maxQueueCapacity;
protected long producerMask;
protected E[] producerBuffer;
protected volatile long producerLimit;
protected boolean isFixedChunkSize = false;
long p0, p1, p2, p3, p4, p5, p6, p7;
long p10, p11, p12, p13, p14, p15, p16, p17;
protected long consumerMask;
protected E[] consumerBuffer;
protected long consumerIndex;
```
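As you can see, padding separates producerIndex from consumerIndex: each padding group holds 15 or 16 long fields (120 or 128 bytes), so with two groups plus the cold producer fields in between, the two hot indexes are more than 248 bytes apart. That is well beyond the 64-byte cache line typical of current x86 CPUs (the cache line size is a property of the CPU, not of the operating system). On Linux you can check it with:
```shell
cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
```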
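JCTools spreads its padding across a chain of superclasses because, in practice, HotSpot lays out a superclass's fields before its subclass's and is free to reorder fields within a single class. The following is a hypothetical padded counter written in that style (all class names are made up); it only illustrates the layout pattern, and confirming the actual field offsets would need a tool such as OpenJDK's JOL.

```java
// Padding before the hot field: lives in the topmost superclass.
abstract class CounterLhsPadding {
    long p01, p02, p03, p04, p05, p06, p07;
    long p10, p11, p12, p13, p14, p15, p16, p17;
}

// The hot field gets a class of its own.
abstract class CounterValue extends CounterLhsPadding {
    protected volatile long value;
}

// Padding after the hot field: lives in a subclass.
abstract class CounterRhsPadding extends CounterValue {
    long p20, p21, p22, p23, p24, p25, p26, p27;
    long p30, p31, p32, p33, p34, p35, p36, p37;
}

// Hypothetical padded counter: intended for a single writer thread, so no CAS is used.
final class PaddedCounter extends CounterRhsPadding {
    void increment() { value = value + 1; }
    long get() { return value; }
}
```

Two such counters, each updated by its own thread, should no longer share a cache line, which is the property producerIndex and consumerIndex rely on above.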
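- Fewer locks: CAS plus spinning
Locks can cause thread context switches, which waste resources, so MpscChunkedArrayQueue uses no locks at all: producers claim slots with CAS, and the consumer spins while waiting, much like Disruptor's BusySpinWaitStrategy. Spinning works well when the system is busy, but it also drives CPU usage up, so it is advisable to pin these threads to dedicated CPUs.
- Resizing support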
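MpscChunkedArrayQueue stores data in arrays, so how does it grow? The first idea that comes to mind is probably to allocate a bigger array and copy the old data over, but MpscChunkedArrayQueue takes a different approach that avoids copying entirely.
For example:
Suppose the queue's initial capacity is 4; then the initial buffer array has 4+1 slots. Why +1? Because the last slot stores a reference to the next buffer. Suppose the queue holds 8 elements, e0 to e7; the first two arrays then look like this (e6 and e7 sit in a third array, not shown):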
- buffer (first array)
Array index | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Content | e0 | e1 | e2 | JUMP | next[5] |
- next (second array; its indexes are numbered 5 to 9 to continue from the first)
Array index | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|
Content | e4 | e5 | JUMP | e3 | next |
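As you can see, every buffer array has the same fixed size (earlier versions supported both fixed and growing chunk sizes), namely the capacity given by initialCapacity, rounded up to a power of two. The last slot of each array holds a reference to the next array. When a producer decides to grow, it writes JUMP into its current slot in the old array and stores the element in the same slot of the new array, which is why e3 ends up at index 8 (slot 3 of the second array). When reading, JUMP tells the consumer to continue in the next buffer array: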
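The linking idea can be tried out in a single-threaded sketch. ChunkedQueueSketch below is hypothetical and much simpler than JCTools: there is no thread safety, no maximum capacity, and indexes step by 1. As an assumption chosen to reproduce the table above, a producer links a new chunk when writing would leave the current chunk with no free slot; JCTools makes that decision from producerLimit and consumerIndex instead.

```java
// Hypothetical single-threaded sketch of linked chunks (not JCTools code):
// no thread safety, no maximum capacity, indexes step by 1.
class ChunkedQueueSketch<E> {
    // Marker left in an old chunk: "continue at the same slot of the next chunk".
    private static final Object JUMP = new Object();

    private final int chunkSize;      // power of two, at least 2
    private final int mask;
    private Object[] producerBuffer;
    private Object[] consumerBuffer;
    private long producerIndex;
    private long consumerIndex;
    private int chunksAllocated = 1;

    ChunkedQueueSketch(int chunkSize) {
        if (chunkSize < 2 || Integer.bitCount(chunkSize) != 1)
            throw new IllegalArgumentException("chunkSize must be a power of two >= 2");
        this.chunkSize = chunkSize;
        this.mask = chunkSize - 1;
        // One extra slot at the end holds the reference to the next chunk.
        producerBuffer = new Object[chunkSize + 1];
        consumerBuffer = producerBuffer;
    }

    void offer(E e) {
        if (e == null) throw new NullPointerException();
        int offset = (int) (producerIndex & mask);
        if (producerBuffer[(offset + 1) & mask] != null) {
            // Writing here would leave no free slot in this chunk: link a new chunk
            // instead of copying. The element goes to the same slot of the new chunk.
            Object[] next = new Object[chunkSize + 1];
            chunksAllocated++;
            next[offset] = e;
            producerBuffer[chunkSize] = next;   // last slot links old chunk -> new chunk
            producerBuffer[offset] = JUMP;      // the consumer will follow the link here
            producerBuffer = next;
        } else {
            producerBuffer[offset] = e;
        }
        producerIndex++;
    }

    @SuppressWarnings("unchecked")
    E poll() {
        if (consumerIndex == producerIndex) return null;   // empty
        int offset = (int) (consumerIndex & mask);
        Object e = consumerBuffer[offset];
        if (e == JUMP) {
            consumerBuffer = (Object[]) consumerBuffer[chunkSize]; // follow the link
            e = consumerBuffer[offset];
        }
        consumerBuffer[offset] = null;   // free the slot for the producer
        consumerIndex++;
        return (E) e;
    }

    int chunkCount() { return chunksAllocated; }
}
```

With a chunk size of 4, offering e0 to e7 allocates three chunks and places e3 in slot 3 of the second chunk, matching the layout in the table; no element is ever copied.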
```java
public E poll() { // consume one element
    final E[] buffer = consumerBuffer;
    final long index = consumerIndex;
    final long mask = consumerMask;
    // Elements are read with Unsafe.getObjectVolatile(Object, long offset),
    // so the array index must first be converted into a memory offset
    final long offset = modifiedCalcElementOffset(index, mask);
    Object e = lvElement(buffer, offset);
    if (e == null) {
        // e == null does not necessarily mean the queue is empty: a producer first
        // advances producerIndex and only then writes the element, so check producerIndex
        if (index != lvProducerIndex()) {
            // spin until the element becomes visible
            do {
                e = lvElement(buffer, offset);
            } while (e == null);
        }
        else {
            return null; // the queue really is empty
        }
    }
    if (e == JUMP) { // continue in the next buffer
        final E[] nextBuffer = getNextBuffer(buffer, mask);
        return newBufferPoll(nextBuffer, index);
    }
    // after taking the element, clear its slot in the array
    soElement(buffer, offset, null);
    // indexes advance in steps of 2: the lowest bit of producerIndex is reserved
    // for producers to flag a resize in progress
    soConsumerIndex(index + 2);
    return (E) e;
}
```
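The producer side (not shown above) and the consumer's spin can be condensed into a small fixed-size MPSC queue. SimpleMpscArrayQueue is a hypothetical sketch built on AtomicLong and AtomicReferenceArray instead of Unsafe, with no resizing and indexes stepping by 1: producers claim an index with compareAndSet and publish the element afterwards, and the single consumer spins in exactly the window described in the comment above. The sumWithProducers helper is also hypothetical, added only to exercise the queue.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical bounded MPSC queue: CAS on the producer index, spinning consumer, no locks.
class SimpleMpscArrayQueue<E> {
    private final AtomicReferenceArray<E> buffer;
    private final int mask;
    private final int capacity;
    private final AtomicLong producerIndex = new AtomicLong();
    private final AtomicLong consumerIndex = new AtomicLong();

    SimpleMpscArrayQueue(int capacity) {
        if (capacity < 1 || Integer.bitCount(capacity) != 1)
            throw new IllegalArgumentException("capacity must be a power of two");
        this.capacity = capacity;
        this.mask = capacity - 1;
        this.buffer = new AtomicReferenceArray<>(capacity);
    }

    // Safe to call from any number of producer threads.
    boolean offer(E e) {
        if (e == null) throw new NullPointerException();
        long pIndex;
        do {
            pIndex = producerIndex.get();
            if (pIndex - consumerIndex.get() >= capacity) return false; // full
        } while (!producerIndex.compareAndSet(pIndex, pIndex + 1));    // claim the slot, retry on contention
        // The index is published before the element: poll() may briefly see null here.
        buffer.lazySet((int) (pIndex & mask), e);
        return true;
    }

    // Must only be called from a single consumer thread.
    E poll() {
        long cIndex = consumerIndex.get();
        int offset = (int) (cIndex & mask);
        E e = buffer.get(offset);
        if (e == null) {
            if (cIndex == producerIndex.get()) return null; // really empty
            do {                                           // slot claimed, element not visible yet
                e = buffer.get(offset);
            } while (e == null);
        }
        buffer.lazySet(offset, null);        // free the slot before publishing the new index
        consumerIndex.lazySet(cIndex + 1);
        return e;
    }

    // Demo helper: several producer threads, the calling thread is the only consumer.
    static long sumWithProducers(int producers, int perProducer) throws InterruptedException {
        SimpleMpscArrayQueue<Integer> q = new SimpleMpscArrayQueue<>(64);
        Thread[] threads = new Thread[producers];
        for (int t = 0; t < producers; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 1; i <= perProducer; i++) {
                    while (!q.offer(i)) Thread.yield(); // queue full: back off and retry
                }
            });
            threads[t].start();
        }
        long sum = 0;
        for (long remaining = (long) producers * perProducer; remaining > 0; ) {
            Integer v = q.poll();
            if (v != null) { sum += v; remaining--; }
        }
        for (Thread th : threads) th.join();
        return sum;
    }
}
```

For example, sumWithProducers(4, 1000) has four producers each offer 1 to 1000 and returns 4 × 500500 = 2002000 once the single consumer has drained everything.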
Performance comparison
I found a benchmark online and modified it slightly:
```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
// JCTools (org.jctools:jctools-core); Netty bundles a shaded copy of these classes
import org.jctools.queues.MpscArrayQueue;
import org.jctools.queues.MpscChunkedArrayQueue;
import org.jctools.queues.atomic.MpscLinkedAtomicQueue;

public class TestQueue {
    private static int PRD_THREAD_NUM;      // number of producer threads
    private static int C_THREAD_NUM = 1;    // number of consumer threads (MPSC: exactly one)
    private static int N = 1 << 20;         // total number of items per test
    private static ExecutorService executor;

    public static void main(String[] args) throws Exception {
        System.out.println("Producer\tConsumer\tcapacity \t LinkedBlockingQueue \t ArrayBlockingQueue \t MpscLinkedAtomicQueue \t MpscChunkedArrayQueue \t MpscArrayQueue");
        for (int j = 1; j < 8; j++) {
            PRD_THREAD_NUM = (int) Math.pow(2, j);   // 2, 4, ..., 128 producers
            executor = Executors.newFixedThreadPool(PRD_THREAD_NUM * 2);
            for (int i = 9; i < 12; i++) {
                int length = 1 << i;                 // capacity 512, 1024, 2048
                System.out.print(PRD_THREAD_NUM + "\t\t");
                System.out.print(C_THREAD_NUM + "\t\t");
                System.out.print(length + "\t\t");
                System.out.print(doTest2(new LinkedBlockingQueue<Integer>(length), N) + "/s\t\t");
                System.out.print(doTest2(new ArrayBlockingQueue<Integer>(length), N) + "/s\t\t");
                System.out.print(doTest2(new MpscLinkedAtomicQueue<Integer>(), N) + "/s\t\t");
                System.out.print(doTest2(new MpscChunkedArrayQueue<Integer>(length), N) + "/s\t\t");
                System.out.print(doTest2(new MpscArrayQueue<Integer>(length), N) + "/s");
                System.out.println();
            }
            executor.shutdown();
        }
    }

    private static class Producer implements Runnable {
        int n;
        Queue<Integer> q;

        public Producer(int initN, Queue<Integer> initQ) {
            n = initN;
            q = initQ;
        }

        public void run() {
            while (n > 0) {
                if (q.offer(n)) {   // non-blocking: retry until the offer succeeds
                    n--;
                }
            }
        }
    }

    private static class Consumer implements Callable<Long> {
        int n;
        Queue<Integer> q;

        public Consumer(int initN, Queue<Integer> initQ) {
            n = initN;
            q = initQ;
        }

        public Long call() {
            long sum = 0;
            Integer e = null;
            while (n > 0) {
                if ((e = q.poll()) != null) {
                    sum += e;
                    n--;
                }
            }
            return sum;
        }
    }

    private static long doTest2(final Queue<Integer> q, final int n) throws Exception {
        CompletionService<Long> completionServ = new ExecutorCompletionService<>(executor);
        long t = System.nanoTime();
        for (int i = 0; i < PRD_THREAD_NUM; i++) {
            executor.submit(new Producer(n / PRD_THREAD_NUM, q));
        }
        for (int i = 0; i < C_THREAD_NUM; i++) {
            completionServ.submit(new Consumer(n / C_THREAD_NUM, q));
        }
        // Wait for every consumer; they finish only after all n items have been consumed
        for (int i = 0; i < C_THREAD_NUM; i++) {
            completionServ.take().get();
        }
        t = System.nanoTime() - t;
        return (long) (1000000000.0 * n / t); // throughput, items/sec
    }
}
```
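From the results of this benchmark we can see that:
- The Mpsc*Queue implementations perform best, and their performance is also the most stable;
- At low concurrency, array-based queues beat linked-list-based ones. A likely reason is that an array is allocated contiguously, so loading it makes good use of cache lines and reduces memory reads, whereas linked-list nodes are scattered in memory and random access is expensive;
- At high concurrency, linked-list-based queues beat array-based ones. LinkedBlockingQueue uses separate locks for enqueue and dequeue, so its lock contention should be lower than ArrayBlockingQueue's. MpscLinkedAtomicQueue is unbounded: a producer enqueues by swapping in the new tail with a single atomic getAndSet (the XCHG instruction on x86) and then linking the old tail to it, while the single consumer simply follows next links, which makes it very efficient;
- Compared with MpscArrayQueue, which has a fixed capacity, MpscChunkedArrayQueue adds the ability to grow dynamically.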
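The XCHG-based enqueue described for MpscLinkedAtomicQueue follows the well-known node-based MPSC queue design attributed to Dmitry Vyukov. XchgMpscLinkedQueue below is a hypothetical minimal version of that technique, not the JCTools class: getAndSet swaps in the new tail, the old tail is linked afterwards, and the consumer spins briefly if it observes the gap between those two steps.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical minimal node-based MPSC queue (Vyukov-style), not the JCTools class.
class XchgMpscLinkedQueue<E> {
    private static final class Node<E> {
        E value;
        volatile Node<E> next;
        Node(E value) { this.value = value; }
    }

    private final AtomicReference<Node<E>> producerNode; // tail, shared by all producers
    private Node<E> consumerNode;                        // dummy head, consumer thread only

    XchgMpscLinkedQueue() {
        Node<E> stub = new Node<>(null);
        producerNode = new AtomicReference<>(stub);
        consumerNode = stub;
    }

    // Any thread. The queue is unbounded, so this always succeeds.
    boolean offer(E e) {
        if (e == null) throw new NullPointerException();
        Node<E> node = new Node<>(e);
        Node<E> prev = producerNode.getAndSet(node); // one atomic swap claims the tail
        prev.next = node;                            // then link the old tail to it
        return true;
    }

    // Single consumer thread only.
    E poll() {
        Node<E> next = consumerNode.next;
        if (next == null) {
            if (consumerNode == producerNode.get()) return null; // really empty
            // A producer swapped the tail but has not linked it yet: spin briefly.
            do {
                next = consumerNode.next;
            } while (next == null);
        }
        E e = next.value;
        next.value = null;      // next becomes the new dummy head
        consumerNode = next;
        return e;
    }
}
```

Producers never retry: unlike a CAS loop, getAndSet always succeeds, which is a large part of why this design holds up well as the number of producers grows.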