線(xiàn)程池-執(zhí)行機(jī)制ForkJoinPool官方文檔

1.ForkJoinPool使用說(shuō)明

An ExecutorService for running ForkJoinTasks. A ForkJoinPool 
provides the entry point for submissions from non-ForkJoinTask 
clients, as well as management and monitoring operations.

A ForkJoinPool differs from other kinds of ExecutorService mainly by 
virtue of employing work-stealing: all threads in the pool attempt to 
find and execute tasks submitted to the pool and/or created by other 
active tasks (eventually blocking waiting for work if none exist). This 
enables efficient processing when most tasks spawn other subtasks 
(as do most ForkJoinTasks), as well as when many small tasks are 
submitted to the pool from external clients. Especially when setting 
asyncMode to true in constructors, ForkJoinPools may also be 
appropriate for use with event-style tasks that are never joined.

A static commonPool() is available and appropriate for most 
applications. The common pool is used by any ForkJoinTask that is 
not explicitly submitted to a specified pool. Using the common pool 
normally reduces resource usage (its threads are slowly reclaimed 
during periods of non-use, and reinstated upon subsequent use).

For applications that require separate or custom pools, a 
ForkJoinPool may be constructed with a given target parallelism level; 
by default, equal to the number of available processors. The pool 
attempts to maintain enough active (or available) threads by 
dynamically adding, suspending, or resuming internal worker threads, 
even if some tasks are stalled waiting to join others. However, no such 
adjustments are guaranteed in the face of blocked I/O or other 
unmanaged synchronization. The nested 
ForkJoinPool.ManagedBlocker interface enables extension of the 
kinds of synchronization accommodated.

In addition to execution and lifecycle control methods, this class 
provides status check methods (for example getStealCount()) that are 
intended to aid in developing, tuning, and monitoring fork/join 
applications. Also, method toString() returns indications of pool state 
in a convenient form for informal monitoring.

As is the case with other ExecutorServices, there are three main task 
execution methods summarized in the following table. These are 
designed to be used primarily by clients not already engaged in 
fork/join computations in the current pool. The main forms of these 
methods accept instances of ForkJoinTask, but overloaded forms also 
allow mixed execution of plain Runnable- or Callable- based activities 
as well. However, tasks that are already executing in a pool should 
normally instead use the within-computation forms listed in the table 
unless using async event-style tasks that are not usually joined, in 
which case there is little difference among choice of methods.

用于運(yùn)行ForkJoinTasks的ExecutorService逝段。 ForkJoinPool提供非ForkJoinTask客戶(hù)端提交的入口點(diǎn)骑歹,以及管理和監(jiān)視操作。

ForkJoinPool與其他類(lèi)型的ExecutorService的不同之處主要在于使用了工作竊取:池中的所有線(xiàn)程都嘗試查找和執(zhí)行提交給池的任務(wù)和/或由其他活動(dòng)任務(wù)創(chuàng)建的任務(wù)(如果不存在則會(huì)最終會(huì)阻塞等待) 。當(dāng)大多數(shù)任務(wù)產(chǎn)生子任務(wù)(大多數(shù)ForkJoinTasks都如此),以及從外部客戶(hù)端向池提交許多小任務(wù)時(shí)寒亥,均可以實(shí)現(xiàn)高效處理。特別是在構(gòu)造函數(shù)中將asyncMode設(shè)置為true時(shí)荧关,F(xiàn)orkJoinPools也可能適用于不會(huì)join的事件形式的任務(wù)溉奕。

靜態(tài)commonPool()適用于大多數(shù)應(yīng)用程序。common pool由未顯式提交到指定池的任何ForkJoinTask使用忍啤。使用common pool通常會(huì)減少資源使用(其線(xiàn)程在不使用期間緩慢回收加勤,并在后續(xù)使用時(shí)恢復(fù))。

對(duì)于需要單獨(dú)或自定義池的應(yīng)用程序同波,可以使用目標(biāo)并行度構(gòu)造ForkJoinPool胸竞;默認(rèn)情況下,等于可用處理器的數(shù)量参萄。池嘗試通過(guò)動(dòng)態(tài)添加卫枝、掛起或恢復(fù)內(nèi)部工作線(xiàn)程來(lái)維護(hù)足夠的活動(dòng)(或可用)線(xiàn)程,即使某些任務(wù)停下來(lái)等待join其他任務(wù)也是如此讹挎。但是校赤,面對(duì)阻塞的I/O或其他非可控的同步,不能保證這樣的調(diào)整筒溃。嵌套的ForkJoinPool.ManagedBlocker接口可以擴(kuò)展所容納的同步類(lèi)型马篮。

除了執(zhí)行和生命周期控制方法之外,此類(lèi)還提供狀態(tài)檢查方法(例如getStealCount())怜奖,旨在幫助開(kāi)發(fā)浑测、調(diào)優(yōu)和監(jiān)視fork / join應(yīng)用程序。此外歪玲,方法toString()以方便的形式返回池狀態(tài)迁央。

與其他ExecutorServices的情況一樣,下表總結(jié)了三種主要的任務(wù)執(zhí)行方法滥崩。這些主要用于當(dāng)前池中尚未參與fork / join計(jì)算的客戶(hù)端岖圈。這些方法的主要形式接受ForkJoinTask的實(shí)例,但重載形式也允許混合執(zhí)行基于Runnable或基于Callable的簡(jiǎn)單活動(dòng)钙皮。但是蜂科,已經(jīng)在池中執(zhí)行的任務(wù)通常應(yīng)該使用表中列出的內(nèi)部計(jì)算形式顽决,除非使用非joined的異步事件形式的任務(wù),在這種情況下导匣,這些方法幾乎沒(méi)有區(qū)別才菠。

默認(rèn)情況下,common pool使用默認(rèn)參數(shù)構(gòu)造贡定,但可以通過(guò)設(shè)置三個(gè)系統(tǒng)屬性來(lái)控制:

  • java.util.concurrent.ForkJoinPool.common.parallelism - 并行度赋访,非負(fù)整數(shù)
  • java.util.concurrent.ForkJoinPool.common.threadFactory - the class name of a ForkJoinPool.ForkJoinWorkerThreadFactory
  • java.util.concurrent.ForkJoinPool.common.exceptionHandler - the class name of a Thread.UncaughtExceptionHandler

如果存在SecurityManager且未指定工廠(chǎng),則默認(rèn)池使用生產(chǎn)無(wú)任何權(quán)限線(xiàn)程的線(xiàn)程工廠(chǎng)厕氨。 系統(tǒng)類(lèi)加載器用于加載這些類(lèi)进每。 如果在建立這些設(shè)置時(shí)出現(xiàn)任何錯(cuò)誤汹粤,則使用默認(rèn)參數(shù)命斧。 通過(guò)將parallelism屬性設(shè)置為零和/或使用可能返回null的工廠(chǎng),可以禁用或限制公共池中線(xiàn)程的使用嘱兼。 但是国葬,這樣做可能會(huì)導(dǎo)致永遠(yuǎn)不會(huì)執(zhí)行unjoined的任務(wù)。

實(shí)現(xiàn)說(shuō)明:此實(shí)現(xiàn)將最大運(yùn)行線(xiàn)程數(shù)限制為32767.嘗試創(chuàng)建大于最大數(shù)量的池會(huì)導(dǎo)致IllegalArgumentException芹壕。

僅當(dāng)池關(guān)閉或內(nèi)部資源耗盡時(shí)汇四,此實(shí)現(xiàn)才會(huì)拒絕提交的任務(wù)(即通過(guò)拋出RejectedExecutionException)。

2.實(shí)現(xiàn)說(shuō)明

2.1 Implementation Overview

Implementation Overview

This class and its nested classes provide the main
functionality and control for a set of worker threads:
Submissions from non-FJ threads enter into submission queues.
Workers take these tasks and typically split them into subtasks
that may be stolen by other workers.  Preference rules give
first priority to processing tasks from their own queues (LIFO
or FIFO, depending on mode), then to randomized FIFO steals of
tasks in other queues.  This framework began as vehicle for
supporting tree-structured parallelism using work-stealing.
Over time, its scalability advantages led to extensions and
changes to better support more diverse usage contexts.  Because
most internal methods and nested classes are interrelated,
their main rationale and descriptions are presented here;
individual methods and nested classes contain only brief
comments about details.

此類(lèi)及其內(nèi)部類(lèi)為一組工作線(xiàn)程提供主要功能和控制:來(lái)自非FJ線(xiàn)程的提交進(jìn)入提交隊(duì)列踢涌。 工作線(xiàn)程取走這些任務(wù)并通常將其分成子任務(wù)通孽,子任務(wù)可能被其他線(xiàn)程竊取。 首選規(guī)則優(yōu)先處理自己隊(duì)列的(LIFO或FIFO睁壁,取決于模式)任務(wù)背苦,然后隨機(jī)FIFO竊取其他隊(duì)列中的任務(wù)。 該框架開(kāi)始用作支持樹(shù)結(jié)構(gòu)并行的工具潘明,使用工作竊取行剂。 隨著時(shí)間的推移,其可擴(kuò)展性?xún)?yōu)勢(shì)引發(fā)擴(kuò)展和更改钳降,以更好地支持更多樣化的使用環(huán)境厚宰。 因?yàn)榇蠖鄶?shù)內(nèi)部方法和嵌套類(lèi)是相互關(guān)聯(lián)的,所以它們的主要原理和描述都在這里給出遂填; 單個(gè)方法和嵌套類(lèi)僅包含簡(jiǎn)短注釋铲觉。

2.2 WorkQueues

WorkQueues
==========

Most operations occur within work-stealing queues (in nested
class WorkQueue).  These are special forms of Deques that
support only three of the four possible end-operations -- push,
pop, and poll (aka steal), under the further constraints that
push and pop are called only from the owning thread (or, as
extended here, under a lock), while poll may be called from
other threads.  (If you are unfamiliar with them, you probably
want to read Herlihy and Shavit's book "The Art of
Multiprocessor programming", chapter 16 describing these in
more detail before proceeding.)  The main work-stealing queue
design is roughly similar to those in the papers "Dynamic
Circular Work-Stealing Deque" by Chase and Lev, SPAA 2005
(http://research.sun.com/scalable/pubs/index.html) and
"Idempotent work stealing" by Michael, Saraswat, and Vechev,
PPoPP 2009 (http://portal.acm.org/citation.cfm?id=1504186).
The main differences ultimately stem from GC requirements that
we null out taken slots as soon as we can, to maintain as small
a footprint as possible even in programs generating huge
numbers of tasks. To accomplish this, we shift the CAS
arbitrating pop vs poll (steal) from being on the indices
("base" and "top") to the slots themselves.

Adding tasks then takes the form of a classic array push(task):
   q.array[q.top] = task; ++q.top;

(The actual code needs to null-check and size-check the array,
properly fence the accesses, and possibly signal waiting
workers to start scanning -- see below.)  Both a successful pop
and poll mainly entail a CAS of a slot from non-null to null.

The pop operation (always performed by owner) is:
  if ((base != top) and
       (the task at top slot is not null) and
       (CAS slot to null))
          decrement top and return task;

And the poll operation (usually by a stealer) is
   if ((base != top) and
       (the task at base slot is not null) and
       (base has not changed) and
       (CAS slot to null))
          increment base and return task;

Because we rely on CASes of references, we do not need tag bits
on base or top.  They are simple ints as used in any circular
array-based queue (see for example ArrayDeque).  Updates to the
indices guarantee that top == base means the queue is empty,
but otherwise may err on the side of possibly making the queue
appear nonempty when a push, pop, or poll have not fully
committed. (Method isEmpty() checks the case of a partially
completed removal of the last element.)  Because of this, the
poll operation, considered individually, is not wait-free. One
thief cannot successfully continue until another in-progress
one (or, if previously empty, a push) completes.  However, in
the aggregate, we ensure at least probabilistic
non-blockingness.  If an attempted steal fails, a thief always
chooses a different random victim target to try next. So, in
order for one thief to progress, it suffices for any
in-progress poll or new push on any empty queue to
complete. (This is why we normally use method pollAt and its
variants that try once at the apparent base index, else
consider alternative actions, rather than method poll, which
retries.)

This approach also enables support of a user mode in which
local task processing is in FIFO, not LIFO order, simply by
using poll rather than pop.  This can be useful in
message-passing frameworks in which tasks are never joined.
However neither mode considers affinities, loads, cache
localities, etc, so rarely provide the best possible
performance on a given machine, but portably provide good
throughput by averaging over these factors.  Further, even if
we did try to use such information, we do not usually have a
basis for exploiting it.  For example, some sets of tasks
profit from cache affinities, but others are harmed by cache
pollution effects. Additionally, even though it requires
scanning, long-term throughput is often best using random
selection rather than directed selection policies, so cheap
randomization of sufficient quality is used whenever
applicable.  Various Marsaglia XorShifts (some with different
shift constants) are inlined at use points.

WorkQueues are also used in a similar way for tasks submitted
to the pool. We cannot mix these tasks in the same queues used
by workers. Instead, we randomly associate submission queues
with submitting threads, using a form of hashing.  The
ThreadLocalRandom probe value serves as a hash code for
choosing existing queues, and may be randomly repositioned upon
contention with other submitters.  In essence, submitters act
like workers except that they are restricted to executing local
tasks that they submitted (or in the case of CountedCompleters,
others with the same root task).  Insertion of tasks in shared
mode requires a lock (mainly to protect in the case of
resizing) but we use only a simple spinlock (using field
qlock), because submitters encountering a busy queue move on to
try or create other queues -- they block only when creating and
registering new queues. Additionally, "qlock" saturates to an
unlockable value (-1) at shutdown. Unlocking still can be and
is performed by cheaper ordered writes of "qlock" in successful
cases, but uses CAS in unsuccessful cases.

大多數(shù)操作發(fā)生在工作竊取隊(duì)列中(在嵌套類(lèi)WorkQueue中)。這些特殊形式的Deques只支持四種可能的端操作中的三種 - push吓坚,pop和poll(也稱(chēng)為steal)备燃,進(jìn)一步的限制是push和pop僅從擁有的線(xiàn)程調(diào)用,而poll可以從其他線(xiàn)程調(diào)用凌唬。 (如果你不熟悉并齐,可以參閱Herlihy和Shavit的書(shū)“"The Art of Multiprocessor programming"”第16章)漏麦。主要的工作竊取隊(duì)列設(shè)計(jì)大致類(lèi)似于Chase和Lev撰寫(xiě)的“Dynamic Circular Work-Stealing Deque”和Michael、Saraswat和Vechev的“Idempotent work stealing”况褪。主要差異最終源于GC要求撕贞,這里會(huì)盡快將slot置為null,即使在生成大量任務(wù)的程序中也盡可能保持足夠小的痕跡测垛。為了實(shí)現(xiàn)這一點(diǎn)捏膨,將CAS仲裁pop vs poll(steal)從索引(“base”和“top”)轉(zhuǎn)移到插槽本身。

然后添加任務(wù)采用經(jīng)典數(shù)組push(task)的形式:

q.array[q.top] = task; ++ q.top;

(實(shí)際代碼需要對(duì)數(shù)組進(jìn)行空檢查和大小檢查食侮,正確阻止訪(fǎng)問(wèn)号涯,并signal等待線(xiàn)程開(kāi)始進(jìn)行掃描 - 見(jiàn)下文。)成功的pop和poll主要需要一個(gè)將槽從non-null 轉(zhuǎn)換為 null的CAS锯七。

The pop operation (always performed by owner) is:
  if ((base != top) and
       (the task at top slot is not null) and
       (CAS slot to null))
          decrement top and return task;

And the poll operation (usually by a stealer) is
   if ((base != top) and
       (the task at base slot is not null) and
       (base has not changed) and
       (CAS slot to null))
          increment base and return task;

因?yàn)槲覀円蕾?lài)于引用的CAS链快,所以不需要base或top的標(biāo)記位。它們只是任何基于循環(huán)數(shù)組的隊(duì)列中使用的簡(jiǎn)單整數(shù)(參見(jiàn)ArrayDeque)眉尸。對(duì)索引的更新保證top == base意味著隊(duì)列是空的域蜗,但是當(dāng)push、pop或者poll沒(méi)有完全提交時(shí)噪猾,可能會(huì)使隊(duì)列顯得非空霉祸。 (方法isEmpty檢查部分完成最后一個(gè)元素刪除的情況。)因此袱蜡,單獨(dú)考慮的poll操作不是wait-free丝蹭。一個(gè)小偷無(wú)法成功繼續(xù),直到另一個(gè)正在執(zhí)行的操作(或者坪蚁,如果先前為空奔穿,push)完成。但是迅细,總的來(lái)說(shuō)巫橄,我們至少確保概率無(wú)阻塞。如果嘗試竊取失敗茵典,小偷總是會(huì)選擇一個(gè)不同的隨機(jī)目標(biāo)來(lái)嘗試湘换。因此,為了讓一個(gè)小偷繼續(xù)處理统阿,任何正在進(jìn)行的poll或任何空隊(duì)列的新push都可以完成彩倚。 (這就是為什么我們通常使用方法pollAt及其在base索引上嘗試一次的變體,否則考慮替代操作扶平,而不是重試的方法poll帆离。)

該方法還支持用戶(hù)模式,本地任務(wù)以FIFO順序而不是LIFO進(jìn)行處理结澄,只需使用poll而不是pop哥谷。這在任務(wù)不用joined的消息傳遞框架中非常有用岸夯。然而,兩種模式都不考慮關(guān)聯(lián)性们妥、負(fù)載猜扮、緩存本地等,因此很少在給定的機(jī)器上提供最佳性能监婶,但通過(guò)對(duì)這些因素的平衡可以提供良好的吞吐量旅赢。此外,即使嘗試使用這些信息惑惶,通常也沒(méi)有開(kāi)發(fā)它的基礎(chǔ)煮盼。例如,一些任務(wù)集從緩存關(guān)聯(lián)性中獲益带污,但其他任務(wù)受到緩存污染的傷害僵控。此外,即使需要掃描刮刑,長(zhǎng)期吞吐量通常最好使用隨機(jī)選擇而不是定向選擇策略喉祭,因此在適用的情況下使用足夠質(zhì)量的廉價(jià)隨機(jī)化养渴。各種Marsaglia XorShifts(一些具有不同的移位常數(shù))在使用點(diǎn)處內(nèi)聯(lián)雷绢。

WorkQueues也以類(lèi)似的方式用于提交(submit)到池的任務(wù)。不能將這些任務(wù)混合放入工作線(xiàn)程使用的同一個(gè)隊(duì)列理卑。相反,使用散列形式隨機(jī)將提交隊(duì)列與提交線(xiàn)程相關(guān)聯(lián)。 ThreadLocalRandom探測(cè)器值用作選擇現(xiàn)有隊(duì)列的哈希碼融师,并且在與其他提交者爭(zhēng)用時(shí)隨機(jī)重新定位爆土。本質(zhì)上,提交者的行為類(lèi)似于工作線(xiàn)程宇立,除了他們被限制執(zhí)行提交的本地任務(wù)(或者在CountedCompleters的情況下踪宠,其他具有相同的根任務(wù))。在共享模式下插入任務(wù)需要鎖定(主要是為了在調(diào)整大小的情況下保護(hù))妈嘹,但只使用一個(gè)簡(jiǎn)單的自旋鎖(使用字段qlock)柳琢,因?yàn)橛龅矫﹃?duì)列的提交者會(huì)繼續(xù)嘗試或創(chuàng)建其他隊(duì)列 - 僅在創(chuàng)建和注冊(cè)新隊(duì)列時(shí)會(huì)阻塞。此外润脸,“qlock”在關(guān)必時(shí)設(shè)置為可解鎖值(-1)柬脸。在成功情況中,解鎖仍然可以通過(guò)更便宜的“qlock”有序?qū)懭雭?lái)執(zhí)行毙驯,但在不成功的情況下使用CAS倒堕。

2.3 Management

Management
==========

The main throughput advantages of work-stealing stem from
decentralized control -- workers mostly take tasks from
themselves or each other, at rates that can exceed a billion
per second.  The pool itself creates, activates (enables
scanning for and running tasks), deactivates, blocks, and
terminates threads, all with minimal central information.
There are only a few properties that we can globally track or
maintain, so we pack them into a small number of variables,
often maintaining atomicity without blocking or locking.
Nearly all essentially atomic control state is held in two
volatile variables that are by far most often read (not
written) as status and consistency checks. (Also, field
"config" holds unchanging configuration state.)

Field "ctl" contains 64 bits holding information needed to
atomically decide to add, inactivate, enqueue (on an event
queue), dequeue, and/or re-activate workers.  To enable this
packing, we restrict maximum parallelism to (1<<15)-1 (which is
far in excess of normal operating range) to allow ids, counts,
and their negations (used for thresholding) to fit into 16bit
subfields.

Field "runState" holds lockable state bits (STARTED, STOP, etc)
also protecting updates to the workQueues array.  When used as
a lock, it is normally held only for a few instructions (the
only exceptions are one-time array initialization and uncommon
resizing), so is nearly always available after at most a brief
spin. But to be extra-cautious, after spinning, method
awaitRunStateLock (called only if an initial CAS fails), uses a
wait/notify mechanics on a builtin monitor to block when
(rarely) needed. This would be a terrible idea for a highly
contended lock, but most pools run without the lock ever
contending after the spin limit, so this works fine as a more
conservative alternative. Because we don't otherwise have an
internal Object to use as a monitor, the "stealCounter" (an
AtomicLong) is used when available (it too must be lazily
initialized; see externalSubmit).

Usages of "runState" vs "ctl" interact in only one case:
deciding to add a worker thread (see tryAddWorker), in which
case the ctl CAS is performed while the lock is held.

Recording WorkQueues.  WorkQueues are recorded in the
"workQueues" array. The array is created upon first use (see
externalSubmit) and expanded if necessary.  Updates to the
array while recording new workers and unrecording terminated
ones are protected from each other by the runState lock, but
the array is otherwise concurrently readable, and accessed
directly. We also ensure that reads of the array reference
itself never become too stale. To simplify index-based
operations, the array size is always a power of two, and all
readers must tolerate null slots. Worker queues are at odd
indices. Shared (submission) queues are at even indices, up to
a maximum of 64 slots, to limit growth even if array needs to
expand to add more workers. Grouping them together in this way
simplifies and speeds up task scanning.

All worker thread creation is on-demand, triggered by task
submissions, replacement of terminated workers, and/or
compensation for blocked workers. However, all other support
code is set up to work with other policies.  To ensure that we
do not hold on to worker references that would prevent GC, All
accesses to workQueues are via indices into the workQueues
array (which is one source of some of the messy code
constructions here). In essence, the workQueues array serves as
a weak reference mechanism. Thus for example the stack top
subfield of ctl stores indices, not references.

Queuing Idle Workers. Unlike HPC work-stealing frameworks, we
cannot let workers spin indefinitely scanning for tasks when
none can be found immediately, and we cannot start/resume
workers unless there appear to be tasks available.  On the
other hand, we must quickly prod them into action when new
tasks are submitted or generated. In many usages, ramp-up time
to activate workers is the main limiting factor in overall
performance, which is compounded at program start-up by JIT
compilation and allocation. So we streamline this as much as
possible.

The "ctl" field atomically maintains active and total worker
counts as well as a queue to place waiting threads so they can
be located for signalling. Active counts also play the role of
quiescence indicators, so are decremented when workers believe
that there are no more tasks to execute. The "queue" is
actually a form of Treiber stack.  A stack is ideal for
activating threads in most-recently used order. This improves
performance and locality, outweighing the disadvantages of
being prone to contention and inability to release a worker
unless it is topmost on stack.  We park/unpark workers after
pushing on the idle worker stack (represented by the lower
32bit subfield of ctl) when they cannot find work.  The top
stack state holds the value of the "scanState" field of the
worker: its index and status, plus a version counter that, in
addition to the count subfields (also serving as version
stamps) provide protection against Treiber stack ABA effects.

Field scanState is used by both workers and the pool to manage
and track whether a worker is INACTIVE (possibly blocked
waiting for a signal), or SCANNING for tasks (when neither hold
it is busy running tasks).  When a worker is inactivated, its
scanState field is set, and is prevented from executing tasks,
even though it must scan once for them to avoid queuing
races. Note that scanState updates lag queue CAS releases so
usage requires care. When queued, the lower 16 bits of
scanState must hold its pool index. So we place the index there
upon initialization (see registerWorker) and otherwise keep it
there or restore it when necessary.

Memory ordering.  See "Correct and Efficient Work-Stealing for
Weak Memory Models" by Le, Pop, Cohen, and Nardelli, PPoPP 2013
(http://www.di.ens.fr/~zappa/readings/ppopp13.pdf) for an
analysis of memory ordering requirements in work-stealing
algorithms similar to the one used here.  We usually need
stronger than minimal ordering because we must sometimes signal
workers, requiring Dekker-like full-fences to avoid lost
signals.  Arranging for enough ordering without expensive
over-fencing requires tradeoffs among the supported means of
expressing access constraints. The most central operations,
taking from queues and updating ctl state, require full-fence
CAS.  Array slots are read using the emulation of volatiles
provided by Unsafe.  Access from other threads to WorkQueue
base, top, and array requires a volatile load of the first of
any of these read.  We use the convention of declaring the
"base" index volatile, and always read it before other fields.
The owner thread must ensure ordered updates, so writes use
ordered intrinsics unless they can piggyback on those for other
writes.  Similar conventions and rationales hold for other
WorkQueue fields (such as "currentSteal") that are only written
by owners but observed by others.

Creating workers. To create a worker, we pre-increment total
count (serving as a reservation), and attempt to construct a
ForkJoinWorkerThread via its factory. Upon construction, the
new thread invokes registerWorker, where it constructs a
WorkQueue and is assigned an index in the workQueues array
(expanding the array if necessary). The thread is then
started. Upon any exception across these steps, or null return
from factory, deregisterWorker adjusts counts and records
accordingly.  If a null return, the pool continues running with
fewer than the target number workers. If exceptional, the
exception is propagated, generally to some external caller.
Worker index assignment avoids the bias in scanning that would
occur if entries were sequentially packed starting at the front
of the workQueues array. We treat the array as a simple
power-of-two hash table, expanding as needed. The seedIndex
increment ensures no collisions until a resize is needed or a
worker is deregistered and replaced, and thereafter keeps
probability of collision low. We cannot use
ThreadLocalRandom.getProbe() for similar purposes here because
the thread has not started yet, but do so for creating
submission queues for existing external threads.

Deactivation and waiting. Queuing encounters several intrinsic
races; most notably that a task-producing thread can miss
seeing (and signalling) another thread that gave up looking for
work but has not yet entered the wait queue.  When a worker
cannot find a task to steal, it deactivates and enqueues. Very
often, the lack of tasks is transient due to GC or OS
scheduling. To reduce false-alarm deactivation, scanners
compute checksums of queue states during sweeps.  (The
stability checks used here and elsewhere are probabilistic
variants of snapshot techniques -- see Herlihy & Shavit.)
Workers give up and try to deactivate only after the sum is
stable across scans. Further, to avoid missed signals, they
repeat this scanning process after successful enqueuing until
again stable.  In this state, the worker cannot take/run a task
it sees until it is released from the queue, so the worker
itself eventually tries to release itself or any successor (see
tryRelease).  Otherwise, upon an empty scan, a deactivated
worker uses an adaptive local spin construction (see awaitWork)
before blocking (via park). Note the unusual conventions about
Thread.interrupts surrounding parking and other blocking:
Because interrupts are used solely to alert threads to check
termination, which is checked anyway upon blocking, we clear
status (using Thread.interrupted) before any call to park, so
that park does not immediately return due to status being set
via some other unrelated call to interrupt in user code.

Signalling and activation.  Workers are created or activated
only when there appears to be at least one task they might be
able to find and execute.  Upon push (either by a worker or an
external submission) to a previously (possibly) empty queue,
workers are signalled if idle, or created if fewer exist than
the given parallelism level.  These primary signals are
buttressed by others whenever other threads remove a task from
a queue and notice that there are other tasks there as well.
On most platforms, signalling (unpark) overhead time is
noticeably long, and the time between signalling a thread and
it actually making progress can be very noticeably long, so it
is worth offloading these delays from critical paths as much as
possible. Also, because inactive workers are often rescanning
or spinning rather than blocking, we set and clear the "parker"
field of WorkQueues to reduce unnecessary calls to unpark.
(This requires a secondary recheck to avoid missed signals.)

Trimming workers. To release resources after periods of lack of
use, a worker starting to wait when the pool is quiescent will
time out and terminate (see awaitWork) if the pool has remained
quiescent for period IDLE_TIMEOUT, increasing the period as the
number of threads decreases, eventually removing all workers.
Also, when more than two spare threads exist, excess threads
are immediately terminated at the next quiescent point.
(Padding by two avoids hysteresis.)

Shutdown and Termination. A call to shutdownNow invokes
tryTerminate to atomically set a runState bit. The calling
thread, as well as every other worker thereafter terminating,
helps terminate others by setting their (qlock) status,
cancelling their unprocessed tasks, and waking them up, doing
so repeatedly until stable (but with a loop bounded by the
number of workers).  Calls to non-abrupt shutdown() preface
this by checking whether termination should commence. This
relies primarily on the active count bits of "ctl" maintaining
consensus -- tryTerminate is called from awaitWork whenever
quiescent. However, external submitters do not take part in
this consensus.  So, tryTerminate sweeps through queues (until
stable) to ensure lack of in-flight submissions and workers
about to process them before triggering the "STOP" phase of
termination. (Note: there is an intrinsic conflict if
helpQuiescePool is called when shutdown is enabled. Both wait
for quiescence, but tryTerminate is biased to not trigger until
helpQuiescePool completes.)

工作竊取的主要吞吐量?jī)?yōu)勢(shì)源于分散控制 - 工作線(xiàn)程大多從自己或其他隊(duì)列獲取任務(wù),速度可超過(guò)每秒10億爆价。池本身創(chuàng)建垦巴、激活(啟用掃描和運(yùn)行任務(wù))媳搪、停用、阻塞和終止線(xiàn)程骤宣,所有這些操作都只需要最少的全局信息蛾号。需要全局跟蹤或維護(hù)屬性比較少,因此將它們打包成少量變量涯雅,通常不需要通過(guò)阻塞或鎖來(lái)保持原子性鲜结。幾乎所有原子控制狀態(tài)都保存在兩個(gè)volatile變量中,這些變量最常作為狀態(tài)被讀然钅妗(而非寫(xiě))和一致性檢查精刷。 (另外,字段“config”保持不變的配置狀態(tài)蔗候。)

字段“ctl”包含64位怒允,保存所需的信息用于原子地決定添加、停用锈遥、入隊(duì)(在事件隊(duì)列上)纫事、出隊(duì)和/或重新激活工作工作線(xiàn)程。為了實(shí)現(xiàn)這種打包所灸,將最大并行度限制為(1 << 15)-1(遠(yuǎn)遠(yuǎn)超過(guò)正常工作范圍)丽惶,以允許ids,counts及其negations(用于閾值處理)適合16位子域爬立。

字段“runState”保存鎖狀態(tài)位(STARTED钾唬,STOP等),同時(shí)保護(hù)對(duì)workQueues數(shù)組的更新侠驯。當(dāng)用作鎖時(shí)抡秆,它通常僅用于少量指令(唯一的例外是一次性數(shù)組初始化和不常見(jiàn)的大小調(diào)整),因此在最多短暫自旋后幾乎總是可用吟策。但要謹(jǐn)慎儒士,在自旋后,方法awaitRunStateLock(僅在初始CAS失敗時(shí)調(diào)用)檩坚,需要時(shí)在內(nèi)置monitor(鎖-管程)使用等待/通知機(jī)制來(lái)阻塞着撩。對(duì)于高度爭(zhēng)用的鎖來(lái)說(shuō)這是一個(gè)可怕的想法,但是大多數(shù)池在自旋后沒(méi)有鎖的情況下運(yùn)行效床,因此這可以作為保守的替代方案睹酌。因?yàn)闆](méi)有將內(nèi)部Object用作鎖,所以在可用時(shí)會(huì)使用“stealCounter”(AtomicLong)(它也必須被懶惰地初始化剩檀;請(qǐng)參閱externalSubmit)憋沿。

“runState”與“ctl”的使用僅在一種情況下交互:決定是否添加工作線(xiàn)程(請(qǐng)參閱tryAddWorker),在這種情況下沪猴,在保持鎖定時(shí)執(zhí)行ctl CAS辐啄。

記錄WorkQueues采章。 WorkQueues記錄在“workQueues”數(shù)組中。首次使用時(shí)會(huì)創(chuàng)建該數(shù)組(請(qǐng)參閱externalSubmit)并在必要時(shí)進(jìn)行擴(kuò)展壶辜。在記錄新工作線(xiàn)程和取消終止的線(xiàn)程時(shí)對(duì)數(shù)組的更新由runState鎖保護(hù)悯舟,但該數(shù)組可以同時(shí)并發(fā)讀取并直接訪(fǎng)問(wèn)。還確保數(shù)組引用本身的讀取永遠(yuǎn)不會(huì)過(guò)時(shí)砸民。為了簡(jiǎn)化基于索引的操作抵怎,數(shù)組大小始終是2的冪,并且所有讀取線(xiàn)程必須容忍空槽岭参。工作線(xiàn)程隊(duì)列是奇數(shù)索引反惕。共享(submission)隊(duì)列在偶數(shù)索引處,最多64個(gè)插槽以限制增長(zhǎng)演侯,即使數(shù)組需要擴(kuò)展以添加更多工作線(xiàn)程姿染。以這種方式將它們組合在一起簡(jiǎn)化并加速了任務(wù)掃描。

所有工作線(xiàn)程創(chuàng)建都是按需創(chuàng)建的秒际,由任務(wù)提交悬赏、替換已終止的工作線(xiàn)程和/或?qū)Ρ蛔枞墓ぷ骶€(xiàn)程補(bǔ)償來(lái)觸發(fā)。但是娄徊,所有其他支持代碼都設(shè)置為與其他策略一起使用闽颇。為了確保不會(huì)持有不會(huì)被GC的工作線(xiàn)程引用,所有對(duì)workQueues的訪(fǎng)問(wèn)都是通過(guò)workQueues數(shù)組索引(這里是一些凌亂的代碼結(jié)構(gòu)的一個(gè)來(lái)源)嵌莉。實(shí)質(zhì)上进萄,workQueues數(shù)組充當(dāng)弱引用機(jī)制捻脖。因此锐峭,例如,棧頂部ctl的子字段存儲(chǔ)索引可婶,而不是引用沿癞。

排隊(duì)空閑工作線(xiàn)程。與HPC工作竊取框架不同矛渴,不能讓工作線(xiàn)程在無(wú)法立即找到任何工作時(shí)無(wú)限期地掃描任務(wù)椎扬,不能啟動(dòng)/恢復(fù)工作線(xiàn)程除非有任務(wù)需要處理。另一方面具温,必須在提交或生成新任務(wù)時(shí)迅速采取行動(dòng)蚕涤。在許多用途中,激活工作線(xiàn)程的加速時(shí)間是整體性能的主要限制因素铣猩,在JIT編譯和分配的程序啟動(dòng)時(shí)更加復(fù)雜揖铜。所以我們盡可能地簡(jiǎn)化了這一點(diǎn)。

“ctl”字段原子地維護(hù)活動(dòng)和總工作線(xiàn)程計(jì)數(shù)以及放置等待線(xiàn)程的隊(duì)列达皿,以便在signal時(shí)方便定位天吓』呒纾活動(dòng)計(jì)數(shù)也起到靜止指示符的作用,因此當(dāng)工作線(xiàn)程認(rèn)為不再執(zhí)行任務(wù)時(shí)會(huì)減少龄寞。 “隊(duì)列”實(shí)際上是Treiber棧的一種形式汰规。棧非常適合以最近使用的順序激活線(xiàn)程。這樣可以提高性能和局部性物邑,這些優(yōu)點(diǎn)比如下缺點(diǎn)更重要:容易發(fā)生爭(zhēng)用和無(wú)法釋放工作線(xiàn)程溜哮,除非它是棧頂線(xiàn)程。當(dāng)線(xiàn)程找不到任務(wù)時(shí)色解,在壓入棧后(由ctl的低32位子字段表示)park/unpark工作線(xiàn)程茬射。頂部棧狀態(tài)保存worker的“scanState”字段的值:其索引和狀態(tài),以及除計(jì)數(shù)子字段(也用作版本標(biāo)記)之外的版本計(jì)數(shù)器冒签,防止reiber棧ABA問(wèn)題在抛。

工作線(xiàn)程和池使用字段scanState來(lái)管理和跟蹤工作線(xiàn)程是否處于INACTIVE狀態(tài)(可能是阻塞等待信號(hào)),還是SCANNING任務(wù)(當(dāng)他們都沒(méi)有忙于運(yùn)行任務(wù)時(shí))萧恕。當(dāng)一個(gè)worker被停用時(shí)刚梭,它的scanState字段被設(shè)置,并且被阻止執(zhí)行任務(wù)票唆,即使它必須掃描一次以避免排隊(duì)競(jìng)爭(zhēng)朴读。請(qǐng)注意,scanState更新滯后于隊(duì)列CAS 釋放走趋,因此需要小心使用衅金。排隊(duì)時(shí),scanState的低16位必須保存其池索引簿煌。所以在初始化時(shí)將索引放在那里(參見(jiàn)registerWorker)氮唯,將其保存在那里或在必要時(shí)恢復(fù)它。

內(nèi)存排序姨伟。請(qǐng)參閱Le惩琉、Pop、Cohen和Nardelli的“Correct and Efficient Work-Stealing for
Weak Memory Models”夺荒,文中分析的工作竊取算法中的內(nèi)存排序要求類(lèi)似于此處使用的算法瞒渠。通常需要強(qiáng)于最小排序,因?yàn)橛袝r(shí)必須向工作線(xiàn)程發(fā)出信號(hào)signal技扼,需要Dekker-like的全屏障以避免丟失信號(hào)伍玖。在沒(méi)有昂貴的過(guò)度屏障的情況下安排足夠的排序需要在支持的訪(fǎng)問(wèn)限制的方式之間進(jìn)行權(quán)衡。最核心的操作:從隊(duì)列取取元素和更新ctl狀態(tài)需要全屏障CAS剿吻。使用Unsafe提供的volatile模擬讀操作數(shù)組槽窍箍。其他線(xiàn)程訪(fǎng)問(wèn)WorkQueue base、top和array需要對(duì)這些讀取中的第一個(gè)進(jìn)行volatile加載。使用volatile的base索引的約定仔燕,并始終在其他字段之前讀取它造垛。所有者線(xiàn)程必須確保有序更新,因此寫(xiě)入使用有序內(nèi)部函數(shù)晰搀,除非他們可以搭載其他寫(xiě)入五辽。類(lèi)似的約定和基本原理適用于其他WorkQueue字段(例如“currentSteal”),這些字段僅由所有者寫(xiě)入但由其他人讀取外恕。

創(chuàng)建工作線(xiàn)程杆逗。為了創(chuàng)建一個(gè)worker,預(yù)先增加總計(jì)數(shù)(用作保留)鳞疲,并嘗試通過(guò)其工廠(chǎng)構(gòu)造一個(gè)ForkJoinWorkerThread罪郊。在構(gòu)造時(shí),新線(xiàn)程調(diào)用registerWorker尚洽,在其中構(gòu)造WorkQueue并在workQueues數(shù)組中分配索引(如果需要悔橄,擴(kuò)展數(shù)組)。然后啟動(dòng)該線(xiàn)程腺毫。如果這些步驟之間出現(xiàn)任何異常癣疟,或者從工廠(chǎng)返回null,則deregisterWorker會(huì)相應(yīng)地調(diào)整計(jì)數(shù)和記錄潮酒。如果返回null睛挚,則池繼續(xù)以少于目標(biāo)數(shù)字的worker運(yùn)行。如果發(fā)生異常急黎,則異常通常傳播給某些外部調(diào)用者扎狱。工作線(xiàn)程索引的分配避免了可能發(fā)生的掃描偏差:如果條目是在workQueues數(shù)組的前面開(kāi)始順序打包的。將數(shù)組視為一個(gè)簡(jiǎn)單的二次冪哈希表勃教,根據(jù)需要進(jìn)行擴(kuò)展淤击。 seedIndex增量確保不會(huì)發(fā)生沖突,直到需要調(diào)整大小或者取消注冊(cè)和替換worker荣回,然后保持較低的沖突概率遭贸。不能在此處使用ThreadLocalRandom.getProbe()用于類(lèi)似目的,因?yàn)榫€(xiàn)程尚未啟動(dòng)心软,但是為現(xiàn)有外部線(xiàn)程創(chuàng)建提交隊(duì)列會(huì)這么做。

停止和等待著蛙。排隊(duì)遇到幾個(gè)內(nèi)部競(jìng)爭(zhēng)删铃;最值得注意的是,生成任務(wù)的線(xiàn)程可能會(huì)錯(cuò)過(guò)查看(并發(fā)出信號(hào))另一個(gè)放棄尋找工作但尚未進(jìn)入等待隊(duì)列的線(xiàn)程踏堡。當(dāng)工作線(xiàn)程找不到要偷的任務(wù)時(shí)猎唁,它會(huì)停止并入隊(duì)。通常顷蟆,由于GC或OS調(diào)度诫隅,缺少任務(wù)是暫時(shí)的腐魂。為了減少錯(cuò)誤警報(bào)停止,掃描程序在清除sweep期間計(jì)算隊(duì)列狀態(tài)的校驗(yàn)和逐纬。 (此處和其他地方使用的穩(wěn)定性檢查是快照技術(shù)的概率變體 - 請(qǐng)參閱Herlihy&Shavit蛔屹。)工作線(xiàn)程放棄并嘗試停止僅在掃描期間總和穩(wěn)定后發(fā)生。此外豁生,為避免遺漏信號(hào)兔毒,在成功入隊(duì)后重復(fù)此掃描過(guò)程,直到再次穩(wěn)定甸箱。在這種狀態(tài)下育叁,工作線(xiàn)程在從隊(duì)列中釋放之前不能take/run它看到的任務(wù),因此工作線(xiàn)程本身最終會(huì)嘗試釋放自己或任何后繼者(請(qǐng)參閱tryRelease)芍殖。否則豪嗽,在空掃描時(shí),停用的工作線(xiàn)程在阻塞(通過(guò)park)之前使用自適應(yīng)本地自旋構(gòu)造(請(qǐng)參閱awaitWork)豌骏。注意有關(guān)Thread.interrupts周?chē)鷓ark和其他阻塞的不尋常約定:因?yàn)橹袛鄡H用于警告線(xiàn)程檢查終止昵骤,在阻塞時(shí)總是檢查終止,在任何調(diào)用park之前清除狀態(tài)(使用Thread.interrupted):由于狀態(tài)是通過(guò)用戶(hù)代碼中的其他無(wú)關(guān)的中斷調(diào)用設(shè)置的肯适,因此park不會(huì)立即返回变秦。

信號(hào)和激活。僅當(dāng)看起來(lái)至少有一個(gè)能夠找到并執(zhí)行的任務(wù)時(shí)框舔,才會(huì)創(chuàng)建或激活工作線(xiàn)程蹦玫。在push(由工作線(xiàn)程或外部提交)壓入先前(可能)空隊(duì)列時(shí),工作線(xiàn)程在空閑時(shí)會(huì)被signal刘绣,或者如果存在的數(shù)量少于給定的并行度時(shí)會(huì)創(chuàng)建新線(xiàn)程樱溉。每當(dāng)其他線(xiàn)程從隊(duì)列中移除任務(wù)時(shí)注意到還有其他任務(wù)時(shí),其他線(xiàn)程也會(huì)進(jìn)行signal纬凤。在大多數(shù)平臺(tái)上福贞,信號(hào)(unpark)開(kāi)銷(xiāo)時(shí)間明顯很長(zhǎng),并且發(fā)信號(hào)通知線(xiàn)程和線(xiàn)程實(shí)際開(kāi)始處理之間的時(shí)間可能非常長(zhǎng)停士,因此值得盡可能地從關(guān)鍵路徑減少這些延遲挖帘。此外,由于非活動(dòng)工作線(xiàn)程經(jīng)常重新掃描或自旋而不是阻塞恋技,設(shè)置并清除WorkQueues的“parker”字段以減少對(duì)unpark的不必要調(diào)用拇舀。 (這需要進(jìn)行二次重新檢查以避免錯(cuò)過(guò)信號(hào)。)

修剪工作線(xiàn)程蜻底。要在周期性的缺少使用后釋放資源骄崩,當(dāng)池靜止時(shí)開(kāi)始等待的工作線(xiàn)程將超時(shí)并終止(請(qǐng)參閱awaitWork)如果池在IDLE_TIMEOUT期間保持靜止,則隨著線(xiàn)程數(shù)減少而增加周期,最終刪除所有工作線(xiàn)程要拂。此外抠璃,當(dāng)存在兩個(gè)以上的備用線(xiàn)程時(shí),多余的線(xiàn)程會(huì)立即在下一個(gè)靜止點(diǎn)終止脱惰。 (兩個(gè)填充可以避免滯后現(xiàn)象搏嗡。)

Shutdown和Termination.。調(diào)用shutdownNow會(huì)進(jìn)一步調(diào)用tryTerminate枪芒,其以原子方式設(shè)置runState位彻况。調(diào)用線(xiàn)程以及此后終止的所有其他工作線(xiàn)程,通過(guò)設(shè)置其他線(xiàn)程的(qlock)狀態(tài)舅踪、取消未處理的任務(wù)并喚醒它們來(lái)幫助終止其他線(xiàn)程纽甘,重復(fù)這樣做直到穩(wěn)定。通過(guò)檢查終止是否應(yīng)該開(kāi)始來(lái)調(diào)用非突然shutdown()抽碌。這主要依賴(lài)于“ctl”的活動(dòng)計(jì)數(shù)位 - 每當(dāng)靜止時(shí)悍赢,都會(huì)從awaitWork調(diào)用tryTerminate。但是货徙,外部提交者不參與此共識(shí)左权。因此,tryTerminate清除隊(duì)列(直到穩(wěn)定)以確保缺少飛行中的提交痴颊,以及工作線(xiàn)程在觸發(fā)終止的“停止”階段之前處理它們赏迟。 (注意:如果在啟用關(guān)閉時(shí)調(diào)用了helpQuiescePool,則會(huì)發(fā)生內(nèi)在沖突蠢棱。兩者都等待靜止锌杀,但是在helpQuiescePool完成之前,tryTerminate偏向于不會(huì)觸發(fā)泻仙。)

2.4 Joining Tasks

Joining Tasks
=============

Any of several actions may be taken when one worker is waiting
to join a task stolen (or always held) by another.  Because we
are multiplexing many tasks on to a pool of workers, we can't
just let them block (as in Thread.join).  We also cannot just
reassign the joiner's run-time stack with another and replace
it later, which would be a form of "continuation", that even if
possible is not necessarily a good idea since we may need both
an unblocked task and its continuation to progress.  Instead we
combine two tactics:

  Helping: Arranging for the joiner to execute some task that it
     would be running if the steal had not occurred.

  Compensating: Unless there are already enough live threads,
     method tryCompensate() may create or re-activate a spare
     thread to compensate for blocked joiners until they unblock.

A third form (implemented in tryRemoveAndExec) amounts to
helping a hypothetical compensator: If we can readily tell that
a possible action of a compensator is to steal and execute the
task being joined, the joining thread can do so directly,
without the need for a compensation thread (although at the
expense of larger run-time stacks, but the tradeoff is
typically worthwhile).

The ManagedBlocker extension API can't use helping so relies
only on compensation in method awaitBlocker.

The algorithm in helpStealer entails a form of "linear
helping".  Each worker records (in field currentSteal) the most
recent task it stole from some other worker (or a submission).
It also records (in field currentJoin) the task it is currently
actively joining. Method helpStealer uses these markers to try
to find a worker to help (i.e., steal back a task from and
execute it) that could hasten completion of the actively joined
task.  Thus, the joiner executes a task that would be on its
own local deque had the to-be-joined task not been stolen. This
is a conservative variant of the approach described in Wagner &
Calder "Leapfrogging: a portable technique for implementing
efficient futures" SIGPLAN Notices, 1993
(http://portal.acm.org/citation.cfm?id=155354). It differs in
that: (1) We only maintain dependency links across workers upon
steals, rather than use per-task bookkeeping.  This sometimes
requires a linear scan of workQueues array to locate stealers,
but often doesn't because stealers leave hints (that may become
stale/wrong) of where to locate them.  It is only a hint
because a worker might have had multiple steals and the hint
records only one of them (usually the most current).  Hinting
isolates cost to when it is needed, rather than adding to
per-task overhead.  (2) It is "shallow", ignoring nesting and
potentially cyclic mutual steals.  (3) It is intentionally
racy: field currentJoin is updated only while actively joining,
which means that we miss links in the chain during long-lived
tasks, GC stalls etc (which is OK since blocking in such cases
is usually a good idea).  (4) We bound the number of attempts
to find work using checksums and fall back to suspending the
worker and if necessary replacing it with another.

Helping actions for CountedCompleters do not require tracking
currentJoins: Method helpComplete takes and executes any task
with the same root as the task being waited on (preferring
local pops to non-local polls). However, this still entails
some traversal of completer chains, so is less efficient than
using CountedCompleters without explicit joins.

Compensation does not aim to keep exactly the target
parallelism number of unblocked threads running at any given
time. Some previous versions of this class employed immediate
compensations for any blocked join. However, in practice, the
vast majority of blockages are transient byproducts of GC and
other JVM or OS activities that are made worse by replacement.
Currently, compensation is attempted only after validating that
all purportedly active threads are processing tasks by checking
field WorkQueue.scanState, which eliminates most false
positives.  Also, compensation is bypassed (tolerating fewer
threads) in the most common case in which it is rarely
beneficial: when a worker with an empty queue (thus no
continuation tasks) blocks on a join and there still remain
enough threads to ensure liveness.

The compensation mechanism may be bounded.  Bounds for the
commonPool (see commonMaxSpares) better enable JVMs to cope
with programming errors and abuse before running out of
resources to do so. In other cases, users may supply factories
that limit thread construction. The effects of bounding in this
pool (like all others) is imprecise.  Total worker counts are
decremented when threads deregister, not when they exit and
resources are reclaimed by the JVM and OS. So the number of
simultaneously live threads may transiently exceed bounds.

當(dāng)一個(gè)工作線(xiàn)程等待join另一個(gè)被盜(或總是持有)的任務(wù)時(shí)糕再,可以采取任何一種行動(dòng)。因?yàn)槲覀儗⒃S多任務(wù)復(fù)用到工作池上玉转,所以不能讓它們阻塞(如在Thread.join中)突想。也不能只是將joiner的運(yùn)行時(shí)棧重新分配給另一個(gè)并稍后替換它,這將是一種“continuation”究抓,即使可能也不一定是個(gè)好主意猾担,因?yàn)槲覀兛赡苄枰粋€(gè)未阻塞的任務(wù)并且其continuation也在進(jìn)展。相反漩蟆,我們結(jié)合了兩種策略:

  • 幫助:如果沒(méi)有發(fā)生竊取垒探,安排joiner執(zhí)行一些將要運(yùn)行的任務(wù)。
  • 補(bǔ)償:除非已有足夠的活動(dòng)線(xiàn)程怠李,否則方法tryCompensate()可以創(chuàng)建或重新激活備用線(xiàn)程,以補(bǔ)償阻塞的joiners,直到它們解除阻塞捺癞。

第三種形式(在tryRemoveAndExec中實(shí)現(xiàn))相當(dāng)于幫助假設(shè)的補(bǔ)償器:如果我們可以很容易地告訴補(bǔ)償器要竊取的可能動(dòng)作并執(zhí)行正在被joined的任務(wù)夷蚊,則joining線(xiàn)程可以直接執(zhí)行,而不需要補(bǔ)償線(xiàn)程(雖然以較大的運(yùn)行時(shí)棧為代價(jià)髓介,但該權(quán)衡通常是值得的)惕鼓。

ManagedBlocker擴(kuò)展API無(wú)法使用helping,因此僅依賴(lài)于方法awaitBlocker中的補(bǔ)償唐础。

helpStealer中的算法需要一種“線(xiàn)性幫助”的形式箱歧。每個(gè)工作線(xiàn)程(在字段currentSteal中)記錄它從其他工作線(xiàn)程(或提交)中竊取的最新任務(wù)。它還記錄(在currentJoin中)它當(dāng)前正在joining的任務(wù)一膨。方法helpStealer使用這些標(biāo)記來(lái)嘗試找到一個(gè)工作線(xiàn)程來(lái)幫助(即呀邢,從一個(gè)工作線(xiàn)程中竊取一個(gè)任務(wù)并執(zhí)行它),這可以加速主動(dòng)joined任務(wù)的完成豹绪。因此价淌,如果待加入的任務(wù)沒(méi)有被盜,則joiner執(zhí)行將在其自己的本地deque上的任務(wù)瞒津。這是Wagner&Calder所描述的方法的保守變體“Leapfrogging:a portable technique for implementing efficient futures”蝉衣。不同之處在于:

  • 1)只是在竊取時(shí)維護(hù)工作線(xiàn)程之間的依賴(lài)關(guān)系,而不是使用每個(gè)任務(wù)的簿記巷蚪。這有時(shí)需要對(duì)workQueues數(shù)組進(jìn)行線(xiàn)性?huà)呙枰哉业礁`取者病毡,但通常不會(huì),因?yàn)楦`取者留下提示(可能會(huì)陳舊/錯(cuò)誤)的位置屁柏。這只是一個(gè)提示啦膜,因?yàn)橐粋€(gè)工作線(xiàn)程可能有多次竊取,而提示只記錄其中一個(gè)(通常是最新的)前联。提示可以隔離成本到需要的程度功戚,而不是增加每個(gè)任務(wù)的開(kāi)銷(xiāo)。
  • 2)它是“淺的”似嗤,忽略了嵌套和潛在的循環(huán)相互竊取啸臀。
  • 3)故意racy:域currentJoin僅在主動(dòng)joining時(shí)更新,這意味著在長(zhǎng)期任務(wù)烁落、GC停頓等期間會(huì)丟失鏈中的鏈接(這是正常的乘粒,因?yàn)樵谶@種情況下阻塞通常是個(gè)好主意) 。
  • 4)我們限制了使用校驗(yàn)和查找工作的次數(shù)伤塌,然后回退到暫停工作線(xiàn)程灯萍,必要時(shí)將其替換為另一工作線(xiàn)程。

對(duì)CountedCompleters的幫助操作不需要跟蹤currentJoins:方法helpComplete獲取并執(zhí)行與等待的任務(wù)具有相同根的任何任務(wù)(更喜歡本地pop而非本地poll)每聪。但是旦棉,這仍然需要遍歷完整鏈齿风,因此效率低于使用沒(méi)有顯式j(luò)oin的CountedCompleters。

補(bǔ)償?shù)哪康牟皇菧?zhǔn)確保持在任何給定時(shí)間運(yùn)行的未阻塞線(xiàn)程數(shù)目都是目標(biāo)并行數(shù)绑洛。此類(lèi)的某些先前版本對(duì)任何阻塞的join使用立即補(bǔ)償救斑。但是,在實(shí)踐中真屯,絕大多數(shù)阻塞都是GC和其他JVM或OS活動(dòng)的短暫副作用脸候,會(huì)因替代而變得更糟。目前绑蔫,僅在通過(guò)檢查字段WorkQueue.scanState驗(yàn)證所有聲稱(chēng)活動(dòng)的線(xiàn)程正在處理任務(wù)之后才嘗試補(bǔ)償运沦,這消除了大多數(shù)誤報(bào)。此外配深,在最常見(jiàn)的情況下携添,補(bǔ)償被繞過(guò)(容忍更少的線(xiàn)程),因?yàn)閹?lái)的好處太少:當(dāng)具有空隊(duì)列(因此??沒(méi)有連續(xù)任務(wù))的工作線(xiàn)程阻塞在join并且仍然保留足夠的線(xiàn)程以確绷构荩活躍時(shí)薪寓。

補(bǔ)償機(jī)制可能是有限制的。 commonPool的邊界(請(qǐng)參閱commonMaxSpares)可以更好地使JVM在耗盡資源之前應(yīng)對(duì)編程錯(cuò)誤和濫用澜共。在其他情況下向叉,用戶(hù)可能會(huì)提供限制線(xiàn)程創(chuàng)建的工廠(chǎng)。此池中的限制效果(與所有其他池一樣)是不精確的嗦董。當(dāng)線(xiàn)程取消注冊(cè)時(shí)母谎,工作線(xiàn)程總數(shù)減少,而不是當(dāng)它們退出并且JVM和OS回收資源時(shí)京革。因此奇唤,同時(shí)活動(dòng)線(xiàn)程的數(shù)量可能會(huì)暫時(shí)超過(guò)限制。

2.5 Common Pool

Common Pool
===========

The static common pool always exists after static
initialization.  Since it (or any other created pool) need
never be used, we minimize initial construction overhead and
footprint to the setup of about a dozen fields, with no nested
allocation. Most bootstrapping occurs within method
externalSubmit during the first submission to the pool.

When external threads submit to the common pool, they can
perform subtask processing (see externalHelpComplete and
related methods) upon joins.  This caller-helps policy makes it
sensible to set common pool parallelism level to one (or more)
less than the total number of available cores, or even zero for
pure caller-runs.  We do not need to record whether external
submissions are to the common pool -- if not, external help
methods return quickly. These submitters would otherwise be
blocked waiting for completion, so the extra effort (with
liberally sprinkled task status checks) in inapplicable cases
amounts to an odd form of limited spin-wait before blocking in
ForkJoinTask.join.

As a more appropriate default in managed environments, unless
overridden by system properties, we use workers of subclass
InnocuousForkJoinWorkerThread when there is a SecurityManager
present. These workers have no permissions set, do not belong
to any user-defined ThreadGroup, and erase all ThreadLocals
after executing any top-level task (see WorkQueue.runTask).
The associated mechanics (mainly in ForkJoinWorkerThread) may
be JVM-dependent and must access particular Thread class fields
to achieve this effect.

靜態(tài)初始化后匹摇,靜態(tài)公共池始終存在咬扇。由于它(或任何其他創(chuàng)建的池)不需要使用,我們將初始構(gòu)造開(kāi)銷(xiāo)和占用空間最小化到大約十二個(gè)字段的設(shè)置廊勃,沒(méi)有嵌套分配懈贺。在第一次提交到池期間,大多數(shù)引導(dǎo)都發(fā)生在方法externalSubmit中坡垫。

當(dāng)外部線(xiàn)程提交到公共池時(shí)梭灿,它們可以在join時(shí)執(zhí)行子任務(wù)處理(請(qǐng)參閱externalHelpComplete和相關(guān)方法)。這種調(diào)用者幫助策略使得將公共池并行度級(jí)別設(shè)置為小于可用核心總數(shù)的一個(gè)(或更多)是明智的冰悠,對(duì)于純調(diào)用者運(yùn)行甚至為零堡妒。我們不需要記錄外部提交是否屬于公共池 - 如果不是,外部幫助方法會(huì)快速返回溉卓。否則這些提交者會(huì)阻塞等待完成皮迟,因此在不適用的情況下額外的努力(通過(guò)大量的任務(wù)狀態(tài)檢查)相當(dāng)于在ForkJoinTask.join中阻塞之前的有限自旋等待的奇怪形式搬泥。

作為管理環(huán)境中更合適的默認(rèn)設(shè)置,除非被系統(tǒng)屬性覆蓋万栅,否則當(dāng)存在SecurityManager時(shí)佑钾,我們使用子類(lèi)InnocuousForkJoinWorkerThread的worker西疤。這些worker沒(méi)有設(shè)置權(quán)限烦粒,不屬于任何用戶(hù)定義的ThreadGroup,并且在執(zhí)行任何頂級(jí)任務(wù)后擦除所有ThreadLocals(請(qǐng)參閱WorkQueue.runTask)代赁。相關(guān)的機(jī)制(主要在ForkJoinWorkerThread中)可能依賴(lài)于JVM扰她,并且必須訪(fǎng)問(wèn)特定的Thread類(lèi)字段才能實(shí)現(xiàn)此效果。

2.6 Style notes

Style notes
===========

Memory ordering relies mainly on Unsafe intrinsics that carry
the further responsibility of explicitly performing null- and
bounds- checks otherwise carried out implicitly by JVMs.  This
can be awkward and ugly, but also reflects the need to control
outcomes across the unusual cases that arise in very racy code
with very few invariants. So these explicit checks would exist
in some form anyway.  All fields are read into locals before
use, and null-checked if they are references.  This is usually
done in a "C"-like style of listing declarations at the heads
of methods or blocks, and using inline assignments on first
encounter.  Array bounds-checks are usually performed by
masking with array.length-1, which relies on the invariant that
these arrays are created with positive lengths, which is itself
paranoically checked. Nearly all explicit checks lead to
bypass/return, not exception throws, because they may
legitimately arise due to cancellation/revocation during
shutdown.

There is a lot of representation-level coupling among classes
ForkJoinPool, ForkJoinWorkerThread, and ForkJoinTask.  The
fields of WorkQueue maintain data structures managed by
ForkJoinPool, so are directly accessed.  There is little point
trying to reduce this, since any associated future changes in
representations will need to be accompanied by algorithmic
changes anyway. Several methods intrinsically sprawl because
they must accumulate sets of consistent reads of fields held in
local variables.  There are also other coding oddities
(including several unnecessary-looking hoisted null checks)
that help some methods perform reasonably even when interpreted
(not compiled).

The order of declarations in this file is (with a few exceptions):
(1) Static utility functions
(2) Nested (static) classes
(3) Static fields
(4) Fields, along with constants used when unpacking some of them
(5) Internal control methods
(6) Callbacks and other support for ForkJoinTask methods
(7) Exported methods
(8) Static block initializing statics in minimally dependent order

內(nèi)存排序主要依賴(lài)于Unsafe內(nèi)部機(jī)制芭碍,它還承擔(dān)著明確執(zhí)行空值和邊界檢查的責(zé)任徒役,否則JVM會(huì)隱式執(zhí)行這些檢查。這種方式不夠優(yōu)雅窖壕,但也反映了在競(jìng)爭(zhēng)非常激烈代碼中控制結(jié)果的需要忧勿。所以這些顯式檢查無(wú)論如何都會(huì)以某種形式存在。所有字段在使用前都會(huì)讀入本地瞻讽,如果它們是引用則會(huì)進(jìn)行空值檢查鸳吸。這通常在方法或塊的頭部以類(lèi)似“C”的方式進(jìn)行列表聲明,并在第一次遇到時(shí)使用內(nèi)聯(lián)賦值速勇。數(shù)組邊界檢查通常通過(guò)使用array.length-1掩碼來(lái)執(zhí)行晌砾,依賴(lài)于使用正長(zhǎng)度創(chuàng)建這些數(shù)組的不變量,其本身就是異常檢查烦磁。幾乎所有顯式檢查都會(huì)導(dǎo)致繞過(guò)/返回养匈,而不是異常拋出,因?yàn)樗鼈兛赡苡捎谠陉P(guān)閉期間取消/撤銷(xiāo)而合法地出現(xiàn)都伪。

類(lèi)ForkJoinPool呕乎、ForkJoinWorkerThread和ForkJoinTask之間有很多表示級(jí)耦合。 WorkQueue維護(hù)的數(shù)據(jù)結(jié)構(gòu)的字段由ForkJoinPool管理陨晶,因此可以直接訪(fǎng)問(wèn)猬仁。試圖減少這一點(diǎn)沒(méi)有什么意義,因?yàn)楸硎局腥魏蜗嚓P(guān)的未來(lái)變化都需要伴隨著算法變化珍逸。有幾種方法本質(zhì)上是無(wú)用的逐虚,因?yàn)樗鼈儽仨毨鄯e一組對(duì)局部變量中保存的字段的一致讀取。還有其他奇怪編碼(包括幾個(gè)看似不必要的空值檢查)谆膳,這有助于某些方法即使在解釋?zhuān)ㄎ淳幾g)時(shí)也能合理地執(zhí)行叭爱。

此文件中的聲明順序是(除了少數(shù)例外):

  • 1)靜態(tài)實(shí)用程序
  • 2)嵌套(靜態(tài))類(lèi)
  • 3)靜態(tài)字段
  • 4)字段,以及unpacking其中一些字段時(shí)使用的常量
  • 5)內(nèi)部控制方法
  • 6)對(duì)ForkJoinTask方法的回調(diào)和其他支持
  • 7)導(dǎo)出的方法
  • 8)以最小依賴(lài)順序初始化靜態(tài)變量的靜態(tài)塊
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
  • 序言:七十年代末漱病,一起剝皮案震驚了整個(gè)濱河市买雾,隨后出現(xiàn)的幾起案子把曼,更是在濱河造成了極大的恐慌,老刑警劉巖漓穿,帶你破解...
    沈念sama閱讀 211,743評(píng)論 6 492
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件嗤军,死亡現(xiàn)場(chǎng)離奇詭異,居然都是意外死亡晃危,警方通過(guò)查閱死者的電腦和手機(jī)叙赚,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 90,296評(píng)論 3 385
  • 文/潘曉璐 我一進(jìn)店門(mén),熙熙樓的掌柜王于貴愁眉苦臉地迎上來(lái)僚饭,“玉大人震叮,你說(shuō)我怎么就攤上這事△⑼遥” “怎么了苇瓣?”我有些...
    開(kāi)封第一講書(shū)人閱讀 157,285評(píng)論 0 348
  • 文/不壞的土叔 我叫張陵,是天一觀(guān)的道長(zhǎng)偿乖。 經(jīng)常有香客問(wèn)我击罪,道長(zhǎng),這世上最難降的妖魔是什么贪薪? 我笑而不...
    開(kāi)封第一講書(shū)人閱讀 56,485評(píng)論 1 283
  • 正文 為了忘掉前任媳禁,我火速辦了婚禮,結(jié)果婚禮上古掏,老公的妹妹穿的比我還像新娘损话。我一直安慰自己,他們只是感情好槽唾,可當(dāng)我...
    茶點(diǎn)故事閱讀 65,581評(píng)論 6 386
  • 文/花漫 我一把揭開(kāi)白布丧枪。 她就那樣靜靜地躺著,像睡著了一般庞萍。 火紅的嫁衣襯著肌膚如雪拧烦。 梳的紋絲不亂的頭發(fā)上,一...
    開(kāi)封第一講書(shū)人閱讀 49,821評(píng)論 1 290
  • 那天钝计,我揣著相機(jī)與錄音恋博,去河邊找鬼。 笑死私恬,一個(gè)胖子當(dāng)著我的面吹牛债沮,可吹牛的內(nèi)容都是我干的。 我是一名探鬼主播本鸣,決...
    沈念sama閱讀 38,960評(píng)論 3 408
  • 文/蒼蘭香墨 我猛地睜開(kāi)眼疫衩,長(zhǎng)吁一口氣:“原來(lái)是場(chǎng)噩夢(mèng)啊……” “哼!你這毒婦竟也來(lái)了荣德?” 一聲冷哼從身側(cè)響起闷煤,我...
    開(kāi)封第一講書(shū)人閱讀 37,719評(píng)論 0 266
  • 序言:老撾萬(wàn)榮一對(duì)情侶失蹤童芹,失蹤者是張志新(化名)和其女友劉穎,沒(méi)想到半個(gè)月后鲤拿,有當(dāng)?shù)厝嗽跇?shù)林里發(fā)現(xiàn)了一具尸體假褪,經(jīng)...
    沈念sama閱讀 44,186評(píng)論 1 303
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡,尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 36,516評(píng)論 2 327
  • 正文 我和宋清朗相戀三年近顷,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了生音。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
    茶點(diǎn)故事閱讀 38,650評(píng)論 1 340
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡幕庐,死狀恐怖久锥,靈堂內(nèi)的尸體忽然破棺而出,到底是詐尸還是另有隱情异剥,我是刑警寧澤,帶...
    沈念sama閱讀 34,329評(píng)論 4 330
  • 正文 年R本政府宣布絮重,位于F島的核電站冤寿,受9級(jí)特大地震影響,放射性物質(zhì)發(fā)生泄漏青伤。R本人自食惡果不足惜督怜,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 39,936評(píng)論 3 313
  • 文/蒙蒙 一、第九天 我趴在偏房一處隱蔽的房頂上張望狠角。 院中可真熱鬧号杠,春花似錦、人聲如沸丰歌。這莊子的主人今日做“春日...
    開(kāi)封第一講書(shū)人閱讀 30,757評(píng)論 0 21
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽(yáng)立帖。三九已至眼溶,卻和暖如春,著一層夾襖步出監(jiān)牢的瞬間晓勇,已是汗流浹背堂飞。 一陣腳步聲響...
    開(kāi)封第一講書(shū)人閱讀 31,991評(píng)論 1 266
  • 我被黑心中介騙來(lái)泰國(guó)打工, 沒(méi)想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留绑咱,地道東北人绰筛。 一個(gè)月前我還...
    沈念sama閱讀 46,370評(píng)論 2 360
  • 正文 我出身青樓,卻偏偏與公主長(zhǎng)得像描融,于是被迫代替她去往敵國(guó)和親铝噩。 傳聞我的和親對(duì)象是個(gè)殘疾皇子,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 43,527評(píng)論 2 349

推薦閱讀更多精彩內(nèi)容

  • 為什么需要線(xiàn)程池 對(duì)象復(fù)用思想在編程中有很多應(yīng)用稼稿,不論是線(xiàn)程池還是連接池都是一種對(duì)象復(fù)用的思想薄榛。今天來(lái)談?wù)凧ava...
    zhong0316閱讀 25,852評(píng)論 0 20
  • Java平臺(tái)類(lèi)庫(kù)包含了豐富的并發(fā)基礎(chǔ)構(gòu)建模塊讳窟,例如線(xiàn)程安全的容器類(lèi)以及各種用于協(xié)調(diào)多個(gè)相互協(xié)作的線(xiàn)程控制流的同步工...
    Steven1997閱讀 555評(píng)論 0 0
  • 多線(xiàn)程 1.悲觀(guān)鎖和樂(lè)觀(guān)鎖 http://www.importnew.com/21037.html 悲觀(guān)鎖悲觀(guān)鎖,...
    bigfish1129閱讀 367評(píng)論 0 0
  • 鹿玙—40 最近一段時(shí)間陜西榆林的產(chǎn)婦墜樓事件屢屢登上微博熱搜敞恋,引起了廣泛的關(guān)注丽啡,在最初的網(wǎng)友評(píng)論里,幾乎是一邊倒...
    鹿玙閱讀 232評(píng)論 0 0
  • 九月的風(fēng) 九月的并州 還殘留著夏的氣息 卻帶來(lái)了秋的涼爽 過(guò)了曼妙的夏季 好像參加了一場(chǎng)盛大的party 絢麗 彌...
    楊二妞閱讀 348評(píng)論 0 1