IV. Creating Threads
2. Thread State Protection Mechanism
- As we already know, every thread in Python has an associated PyThreadState object, which holds that thread's state and thread-specific information.
Include\pystate.h
typedef struct _ts {
    /* See Python/ceval.c for comments explaining most fields */
    struct _ts *prev;
    struct _ts *next;
    PyInterpreterState *interp;
    struct _frame *frame;
    int recursion_depth;
    char overflowed; /* The stack has overflowed. Allow 50 more calls
                        to handle the runtime error. */
    char recursion_critical; /* The current calls must not cause
                                a stack overflow. */
    int stackcheck_counter;
    /* 'tracing' keeps track of the execution depth when tracing/profiling.
       This is to prevent the actual trace/profile code from being recorded in
       the trace/profile. */
    int tracing;
    int use_tracing;
    Py_tracefunc c_profilefunc;
    Py_tracefunc c_tracefunc;
    PyObject *c_profileobj;
    PyObject *c_traceobj;
    /* The exception currently being raised */
    PyObject *curexc_type;
    PyObject *curexc_value;
    PyObject *curexc_traceback;
    /* The exception currently being handled, if no coroutines/generators
     * are present. Always last element on the stack referred to be exc_info.
     */
    _PyErr_StackItem exc_state;
    /* Pointer to the top of the stack of the exceptions currently
     * being handled */
    _PyErr_StackItem *exc_info;
    PyObject *dict;  /* Stores per-thread state */
    int gilstate_counter;
    PyObject *async_exc; /* Asynchronous exception to raise */
    unsigned long thread_id; /* Thread id where this tstate was created */
    int trash_delete_nesting;
    PyObject *trash_delete_later;
    /* Called when a thread state is deleted normally, but not when it
     * is destroyed after fork().
     * Pain: to prevent rare but fatal shutdown errors (issue 18808),
     * Thread.join() must wait for the join'ed thread's tstate to be unlinked
     * from the tstate chain. That happens at the end of a thread's life,
     * in pystate.c.
     * The obvious way doesn't quite work: create a lock which the tstate
     * unlinking code releases, and have Thread.join() wait to acquire that
     * lock. The problem is that we _are_ at the end of the thread's life:
     * if the thread holds the last reference to the lock, decref'ing the
     * lock will delete the lock, and that may trigger arbitrary Python code
     * if there's a weakref, with a callback, to the lock. But by this time
     * _PyThreadState_Current is already NULL, so only the simplest of C code
     * can be allowed to run (in particular it must not be possible to
     * release the GIL).
     * So instead of holding the lock directly, the tstate holds a weakref to
     * the lock: that's the value of on_delete_data below. Decref'ing a
     * weakref is harmless.
     * on_delete points to _threadmodule.c's static release_sentinel() function.
     * After the tstate is unlinked, release_sentinel is called with the
     * weakref-to-lock (on_delete_data) argument, and release_sentinel releases
     * the indirectly held lock.
     */
    void (*on_delete)(void *);
    void *on_delete_data;
    int coroutine_origin_tracking_depth;
    PyObject *coroutine_wrapper;
    int in_coroutine_wrapper;
    PyObject *async_gen_firstiter;
    PyObject *async_gen_finalizer;
    PyObject *context;
    uint64_t context_ver;
    /* Unique thread state id. */
    uint64_t id;
    /* XXX signal handlers should also be here */
} PyThreadState;
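- The struct shows that a PyThreadState object stores, among other things, the thread's current PyFrameObject and its thread id.
- Python has an internal mechanism that ensures each thread always runs in its own context, and that mechanism needs access to the information in PyThreadState.
- Looking again at the top of the struct, each PyThreadState is a node in a doubly linked list that chains all PyThreadState objects together: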
struct _ts *prev;
struct _ts *next;
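To make the prev/next chaining concrete, here is a minimal sketch of such a list. All names (toy_ts, toy_interp, toy_link, toy_unlink, toy_count) are hypothetical stand-ins invented for illustration, and the sketch assumes new states are linked at the head of the chain; it is not the CPython implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for PyThreadState:
   only the linking fields and the thread id are kept. */
typedef struct toy_ts {
    struct toy_ts *prev;
    struct toy_ts *next;
    unsigned long thread_id;
} toy_ts;

/* Hypothetical stand-in for the interpreter state that owns the chain. */
typedef struct {
    toy_ts *tstate_head;   /* most recently linked state, or NULL */
} toy_interp;

/* Link a state at the head of the chain (assumed insertion policy). */
static void toy_link(toy_interp *interp, toy_ts *ts)
{
    ts->prev = NULL;
    ts->next = interp->tstate_head;
    if (ts->next != NULL)
        ts->next->prev = ts;
    interp->tstate_head = ts;
}

/* Unlink a state from anywhere in the chain. */
static void toy_unlink(toy_interp *interp, toy_ts *ts)
{
    if (ts->prev != NULL)
        ts->prev->next = ts->next;
    else
        interp->tstate_head = ts->next;
    if (ts->next != NULL)
        ts->next->prev = ts->prev;
    ts->prev = ts->next = NULL;
}

/* Count the states currently linked by walking the next pointers. */
static int toy_count(const toy_interp *interp)
{
    int n = 0;
    for (const toy_ts *p = interp->tstate_head; p != NULL; p = p->next)
        n++;
    return n;
}
```

Because every node carries both pointers, a thread that exits can remove its own state in constant time without walking the chain.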
- Python uses a TSS (Thread Specific Storage) mechanism to manage this per-thread information.
Include\pythread.h
typedef struct _Py_tss_t Py_tss_t; /* opaque */
struct _Py_tss_t {
    int _is_initialized;
    NATIVE_TSS_KEY_T _key;
};
Python\thread.c
Py_tss_t *
PyThread_tss_alloc(void)
{
    Py_tss_t *new_key = (Py_tss_t *)PyMem_RawMalloc(sizeof(Py_tss_t));
    if (new_key == NULL) {
        return NULL;
    }
    new_key->_is_initialized = 0;
    return new_key;
}
- When the runtime is initialized, a separate lock and the TSS key (autoTSSkey) are also set up:
Python\pystate.c
static _PyInitError
_PyRuntimeState_Init_impl(_PyRuntimeState *runtime)
{
    memset(runtime, 0, sizeof(*runtime));
    _PyGC_Initialize(&runtime->gc);
    _PyEval_Initialize(&runtime->ceval);
    runtime->gilstate.check_enabled = 1;
    /* A TSS key must be initialized with Py_tss_NEEDS_INIT
       in accordance with the specification. */
    Py_tss_t initial = Py_tss_NEEDS_INIT;
    runtime->gilstate.autoTSSkey = initial;
    runtime->interpreters.mutex = PyThread_allocate_lock();
    if (runtime->interpreters.mutex == NULL) {
        return _Py_INIT_ERR("Can't initialize threads for interpreter");
    }
    runtime->interpreters.next_id = -1;
    return _Py_INIT_OK();
}
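- TSS comes with a set of API functions for working with per-thread information (PyThread_tss_create, PyThread_tss_set, PyThread_tss_get, PyThread_tss_delete, among others).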
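The idea behind a TSS key can be tried out without building against CPython. The sketch below uses plain POSIX thread keys, which is what I understand the PyThread_tss_* functions to wrap on pthread platforms. The names toy_key, toy_worker and toy_tss_demo are hypothetical, and the code assumes a POSIX system.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One key shared by all threads; each thread gets its own slot behind it. */
static pthread_key_t toy_key;

/* Store this thread's argument under the shared key and read it back. */
static void *toy_worker(void *arg)
{
    /* A fresh thread sees NULL until it stores something itself. */
    if (pthread_getspecific(toy_key) != NULL)
        return NULL;
    if (pthread_setspecific(toy_key, arg) != 0)
        return NULL;
    return pthread_getspecific(toy_key);
}

/* Returns 1 if two threads each saw only their own value
   and the calling thread saw nothing, otherwise 0. */
static int toy_tss_demo(void)
{
    int a = 1, b = 2;
    pthread_t ta, tb;
    void *ra = NULL, *rb = NULL;
    int caller_sees_nothing;

    if (pthread_key_create(&toy_key, NULL) != 0)
        return 0;
    if (pthread_create(&ta, NULL, toy_worker, &a) != 0) {
        pthread_key_delete(toy_key);
        return 0;
    }
    if (pthread_create(&tb, NULL, toy_worker, &b) != 0) {
        pthread_join(ta, NULL);
        pthread_key_delete(toy_key);
        return 0;
    }
    pthread_join(ta, &ra);
    pthread_join(tb, &rb);

    /* The calling thread never stored anything, so its slot is still NULL. */
    caller_sees_nothing = (pthread_getspecific(toy_key) == NULL);
    pthread_key_delete(toy_key);
    return ra == &a && rb == &b && caller_sees_nothing;
}
```

Calling toy_tss_demo() from a program linked with the pthread library should return 1: the same key yields a different value in every thread, which is the property Python relies on to find "the thread state of the thread that is running right now".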
3. From the GIL to the Bytecode Interpreter
- Recall how a thread is created:
Python\pystate.c
static PyThreadState *
new_threadstate(PyInterpreterState *interp, int init)
{
    PyThreadState *tstate = (PyThreadState *)PyMem_RawMalloc(sizeof(PyThreadState));
    if (_PyThreadState_GetFrame == NULL)
        _PyThreadState_GetFrame = threadstate_getframe;
    if (tstate != NULL) {
        ... ...
        if (init)
            _PyThreadState_Init(tstate);
        ... ...
    }
    return tstate;
}
- Now look at the _PyThreadState_Init function:
Python\pystate.c
void
_PyThreadState_Init(PyThreadState *tstate)
{
    _PyGILState_NoteThreadState(tstate);
}
Python\pystate.c
static void
_PyGILState_NoteThreadState(PyThreadState* tstate)
{
    /* If autoTSSkey isn't initialized, this must be the very first
       threadstate created in Py_Initialize(). Don't do anything for now
       (we'll be back here when _PyGILState_Init is called). */
    if (!_PyRuntime.gilstate.autoInterpreterState)
        return;
    /* Stick the thread state for this thread in thread specific storage.
       The only situation where you can legitimately have more than one
       thread state for an OS level thread is when there are multiple
       interpreters.
       You shouldn't really be using the PyGILState_ APIs anyway (see issues
       #10915 and #15751).
       The first thread state created for that given OS level thread will
       "win", which seems reasonable behaviour.
    */
    if (PyThread_tss_get(&_PyRuntime.gilstate.autoTSSkey) == NULL) {
        if ((PyThread_tss_set(&_PyRuntime.gilstate.autoTSSkey, (void *)tstate)
             ) != 0)
        {
            Py_FatalError("Couldn't create autoTSSkey mapping");
        }
    }
    /* PyGILState_Release must not try to delete this thread state. */
    tstate->gilstate_counter = 1;
}
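- _PyGILState_NoteThreadState registers the thread state object in thread-specific storage under the TSS key (autoTSSkey), unless the calling OS thread already has one registered.
- Note that the currently active thread does not necessarily hold the GIL:
- The main thread and the child threads each correspond to a native operating system thread, and OS-level thread scheduling is distinct from Python-level thread scheduling, so the operating system may well switch between the main thread and a child thread.
- Only once all threads have finished their initialization do the OS thread scheduling and Python's thread scheduling line up.
- From then on, Python's thread scheduling forces the currently active thread to release the GIL. That release triggers the OS kernel object used to manage thread scheduling, which in turn causes the operating system to schedule the threads.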
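The link between "releasing the GIL" and "the OS schedules another thread" can be illustrated with a toy lock built from a mutex and a condition variable. This is a simplified sketch under my own assumptions, not CPython's take_gil/drop_gil: there is no timed switch request, and all names (toy_gil, toy_take, toy_drop, toy_run, toy_gil_demo) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Toy GIL: a flag protected by a mutex, plus a condition variable
   that waiting threads sleep on. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int locked;            /* 1 while some thread holds the toy GIL */
} toy_gil;

/* Block until the toy GIL is free, then take it. */
static void toy_take(toy_gil *g)
{
    pthread_mutex_lock(&g->mutex);
    while (g->locked)
        pthread_cond_wait(&g->cond, &g->mutex);   /* the OS parks this thread */
    g->locked = 1;
    pthread_mutex_unlock(&g->mutex);
}

/* Release the toy GIL and signal the condition variable, which lets
   the OS wake one of the waiting threads. */
static void toy_drop(toy_gil *g)
{
    pthread_mutex_lock(&g->mutex);
    g->locked = 0;
    pthread_cond_signal(&g->cond);
    pthread_mutex_unlock(&g->mutex);
}

static toy_gil gil = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 };
static long counter;   /* only touched while holding the toy GIL */

/* Each thread repeatedly takes the toy GIL, does one unit of work, drops it. */
static void *toy_run(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        toy_take(&gil);
        counter++;
        toy_drop(&gil);
    }
    return NULL;
}

/* Run two competing threads; returns the final counter, or -1 on error. */
static long toy_gil_demo(void)
{
    pthread_t t1, t2;
    counter = 0;
    if (pthread_create(&t1, NULL, toy_run, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, toy_run, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

The order in which the two threads interleave is up to the operating system, but since only the holder of the toy GIL touches counter, toy_gil_demo() returns 2000 regardless of that order.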
- Returning to the previous chapter, the child thread now starts competing with the main thread for the GIL:
Modules\_threadmodule.c
static void
t_bootstrap(void *boot_raw)
{
    struct bootstate *boot = (struct bootstate *) boot_raw;
    PyThreadState *tstate;
    PyObject *res;
    tstate = boot->tstate;
    tstate->thread_id = PyThread_get_thread_ident();
    _PyThreadState_Init(tstate);
    PyEval_AcquireThread(tstate);
    tstate->interp->num_threads++;
    res = PyObject_Call(boot->func, boot->args, boot->keyw);
    ... ...
}
- The main thread and the child thread compete for the GIL through PyEval_AcquireThread:
Python\ceval.c
void
PyEval_AcquireThread(PyThreadState *tstate)
{
    if (tstate == NULL)
        Py_FatalError("PyEval_AcquireThread: NULL new thread state");
    /* Check someone has called PyEval_InitThreads() to create the lock */
    assert(gil_created());
    take_gil(tstate);
    if (PyThreadState_Swap(tstate) != NULL)
        Py_FatalError(
            "PyEval_AcquireThread: non-NULL old thread state");
}
- There is a key function here that has not been mentioned before, PyThreadState_Swap:
Python\pystate.c
PyThreadState *
PyThreadState_Swap(PyThreadState *newts)
{
    PyThreadState *oldts = GET_TSTATE();
    SET_TSTATE(newts);
    /* It should not be possible for more than one thread state
       to be used for a thread. Check this the best we can in debug
       builds.
    */
#if defined(Py_DEBUG)
    if (newts) {
        /* This can be called from PyEval_RestoreThread(). Similar
           to it, we need to ensure errno doesn't change.
        */
        int err = errno;
        PyThreadState *check = PyGILState_GetThisThreadState();
        if (check && check->interp == newts->interp && check != newts)
            Py_FatalError("Invalid thread state for this thread");
        errno = err;
    }
#endif
    return oldts;
}
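- Once the child thread has been woken up by Python's thread scheduling mechanism, the first thing it does is call PyThreadState_Swap to set the current thread state object maintained by Python to its own state object.
- The child thread then finishes its initialization and finally enters the interpreter, where it is controlled by Python's thread scheduling mechanism.
- It is worth stressing once more that thread_PyThread_start_new_thread runs in the main thread, whereas everything from t_bootstrap onward runs in the child thread.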
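The swap-and-check step is easy to lose among the locking details, so here it is in isolation. This is a toy sketch with hypothetical names (toy_state, toy_current, toy_swap, toy_acquire, toy_release). It uses a plain global pointer, leaves out the GIL entirely, and returns -1 where PyEval_AcquireThread would call Py_FatalError.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState. */
typedef struct {
    unsigned long thread_id;
} toy_state;

/* Stand-in for the "current thread state" that the interpreter tracks. */
static toy_state *toy_current = NULL;

/* Install newts as the current state and hand back the previous one. */
static toy_state *toy_swap(toy_state *newts)
{
    toy_state *oldts = toy_current;
    toy_current = newts;
    return oldts;
}

/* Mirror of the acquire pattern: after the (omitted) GIL has been taken,
   nobody else may still be registered as current.
   Returns 0 on success, -1 if an old state was still installed. */
static int toy_acquire(toy_state *ts)
{
    if (toy_swap(ts) != NULL)
        return -1;
    return 0;
}

/* Counterpart: clear the current state and return the one that was
   installed, so the caller can check it released its own state. */
static toy_state *toy_release(void)
{
    return toy_swap(NULL);
}
```

A thread that wakes up with the lock in hand therefore expects toy_swap to return NULL; any other result means the previous holder never cleared its state, which is exactly the condition the real code treats as fatal.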