Thanks to Zhihao Tao for his hard work. He spent countless nights and weekends on this document to make things easier for everyone.
If you have any questions, please send an email to zhihao.tao@outlook.com

1. Flow memory handling

Tracking flows in Suricata requires memory. The more flows there are, the more memory is needed.
To keep memory usage under control, there are several options:

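The memcap option sets the maximum number of bytes the flow engine will use.
The hash size option sets the size of the hash table.
The prealloc option covers the following:
- For a packet that does not yet belong to a flow, Suricata creates a new flow. This is a relatively expensive operation.
- The risk is that an attacker can target the engine at exactly this point.
- By making sure a machine receives many packets with different tuples, the attacker forces the engine to create many new flows.
- In this way, an attacker can flood the system.
- To mitigate engine overload, this option tells Suricata to keep a number of flows ready in memory. This makes Suricata less vulnerable to this kind of attack.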

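The flow engine has a management thread that runs independently of packet processing. This thread is called the flow manager. It makes sure that, as far as possible and within the memcap, 10000 flows are kept prepared.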

flow:
  memcap: 33554432              #The maximum amount of bytes the flow-engine will make use of.
  hash_size: 65536              #Flows will be organized in a hash-table. With this option you can set the
                                #size of the hash-table.
  prealloc: 10000               #The amount of flows Suricata has to keep ready in memory.
  emergency_recovery: 30                  #Percentage of 10000 prealloc'd flows.
  prune_flows: 5                          #Amount of flows being terminated during the emergency mode.

1.1 The memcap option

The memcap option sets the maximum number of bytes the flow engine will use.

The default memcap is 32 MB, i.e. 33554432 bytes.

#define FLOW_DEFAULT_MEMCAP      (32 * 1024 * 1024) /* 32 MB */
SC_ATOMIC_SET(flow_config.memcap, FLOW_DEFAULT_MEMCAP);

FLOW_CHECK_MEMCAP checks whether an allocation would push the memory in use past the memcap.

/** \brief check if a memory alloc would fit in the memcap
 *
 *  \param size memory allocation size to check
 *
 *  \retval 1 it fits
 *  \retval 0 no fit
 */
#define FLOW_CHECK_MEMCAP(size) \
    ((((uint64_t)SC_ATOMIC_GET(flow_memuse) + (uint64_t)(size)) <= SC_ATOMIC_GET(flow_config.memcap)))
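
The same check can be written with standard C11 atomics. This is only a minimal sketch: the names demo_memuse, demo_memcap, demo_check_memcap and demo_reserve are hypothetical and do not exist in Suricata, which uses its own SC_ATOMIC wrappers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for flow_memuse and flow_config.memcap. */
static _Atomic uint64_t demo_memuse = 0;
static _Atomic uint64_t demo_memcap = 32u * 1024u * 1024u; /* 32 MB */

/* True when an allocation of `size` bytes would still fit in the memcap. */
bool demo_check_memcap(uint64_t size)
{
    return atomic_load(&demo_memuse) + size <= atomic_load(&demo_memcap);
}

/* Account for `size` bytes only if they fit. The check and the add are two
 * separate atomic steps, so concurrent callers may overshoot the cap a
 * little; this sketch accepts that. */
bool demo_reserve(uint64_t size)
{
    assert(size > 0); /* a zero-byte reservation is a caller bug here */
    if (!demo_check_memcap(size))
        return false;
    atomic_fetch_add(&demo_memuse, size);
    return true;
}
```

With the default 32 MB cap, a reservation of exactly 33554432 bytes succeeds and any further reservation is refused.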

1.1.1 Fast flow allocation

When a new flow arrives and the spare queue has no free flows left, the fast allocation path is taken. (For emergency handling, see the emergency_recovery option.)

static Flow *FlowGetNew(ThreadVars *tv, DecodeThreadVars *dtv, const Packet *p)
{
...
    f = FlowDequeue(&flow_spare_q);
    if (f == NULL) {
        /* If we reached the max memcap, we get a used flow */
        if (!(FLOW_CHECK_MEMCAP(sizeof(Flow) + FlowStorageSize()))) {

If the memcap has not been reached, a new flow is allocated directly.

        } else {
            /* now see if we can alloc a new flow */
            f = FlowAlloc();

1.2 The hash_size option

The hash_size option sets the size of the flow hash table.

The hash seed is a random number.
The default hash size is 65536.

#define FLOW_DEFAULT_HASHSIZE    65536
flow_config.hash_rand   = (uint32_t)RandomGet();
flow_config.hash_size   = FLOW_DEFAULT_HASHSIZE;
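
To show how a seed and a hash size work together, here is a rough sketch that maps a 5-tuple to a bucket index. It is an illustration only: the struct demo_tuple, the function names and the FNV-1a style mixing are assumptions, not Suricata's actual hash function, and unlike a real flow hash this one gives the two directions of a connection different buckets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 5-tuple; field names are illustrative only. */
struct demo_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

/* Mix one 32-bit word into the running hash, byte by byte (FNV-1a style). */
static uint32_t demo_mix(uint32_t h, uint32_t word)
{
    for (int i = 0; i < 4; i++) {
        h ^= (word >> (8 * i)) & 0xffu;
        h *= 16777619u;
    }
    return h;
}

/* Map a tuple to a bucket index in [0, hash_size). The seed plays the role
 * of hash_rand: a different seed gives a different bucket layout, which
 * makes it harder for an attacker to aim many flows at one bucket. */
uint32_t demo_bucket(const struct demo_tuple *t, uint32_t seed, uint32_t hash_size)
{
    assert(hash_size > 0);
    uint32_t h = 2166136261u ^ seed;
    h = demo_mix(h, t->src_ip);
    h = demo_mix(h, t->dst_ip);
    h = demo_mix(h, ((uint32_t)t->src_port << 16) | t->dst_port);
    h = demo_mix(h, t->proto);
    return h % hash_size;
}
```

The index is always reduced modulo hash_size, so a larger table spreads the same flows over more buckets and shortens each bucket's list.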

1.3 The prealloc option

The prealloc option sets the number of flows preallocated in memory.

#define FLOW_DEFAULT_PREALLOC    10000
flow_config.prealloc    = FLOW_DEFAULT_PREALLOC;

1.3.1 Preallocation initialization

/* pre allocate flows */
for (i = 0; i < flow_config.prealloc; i++) {
    if (!(FLOW_CHECK_MEMCAP(sizeof(Flow) + FlowStorageSize()))) {
        SCLogError(SC_ERR_FLOW_INIT, "preallocating flows failed: "
                "max flow memcap reached. Memcap %"PRIu64", "
                "Memuse %"PRIu64".", SC_ATOMIC_GET(flow_config.memcap),
                ((uint64_t)SC_ATOMIC_GET(flow_memuse) + (uint64_t)sizeof(Flow)));
        exit(EXIT_FAILURE);
    }

    Flow *f = FlowAlloc();
    if (f == NULL) {
        SCLogError(SC_ERR_FLOW_INIT, "preallocating flow failed: %s", strerror(errno));
        exit(EXIT_FAILURE);
    }

    FlowEnqueue(&flow_spare_q,f);
}

1.3.2 Preallocation management

The flow manager periodically adjusts the number of preallocated flows.

If there are too few, allocate more:

int FlowUpdateSpareFlows(void)
{
    SCEnter();
    uint32_t toalloc = 0, tofree = 0, len;

    FQLOCK_LOCK(&flow_spare_q);
    len = flow_spare_q.len;
    FQLOCK_UNLOCK(&flow_spare_q);

    if (len < flow_config.prealloc) {
        toalloc = flow_config.prealloc - len;

        uint32_t i;
        for (i = 0; i < toalloc; i++) {
            Flow *f = FlowAlloc();
            if (f == NULL)
                return 0;

            FlowEnqueue(&flow_spare_q,f);
        }

If there are too many, free the excess:

    } else if (len > flow_config.prealloc) {
        tofree = len - flow_config.prealloc;

        uint32_t i;
        for (i = 0; i < tofree; i++) {
            /* FlowDequeue locks the queue */
            Flow *f = FlowDequeue(&flow_spare_q);
            if (f == NULL)
                return 1;

            FlowFree(f);
        }
    }

    return 1;
}
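
The refill-or-trim idea can be tried in isolation with a tiny pool. The sketch below is single-threaded and uses hypothetical types (struct demo_flow, struct demo_pool); the real spare queue is protected by a lock and holds full Flow objects.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical minimal flow object and spare pool (a LIFO list). */
struct demo_flow { struct demo_flow *next; };
struct demo_pool { struct demo_flow *top; uint32_t len; };

void demo_pool_push(struct demo_pool *q, struct demo_flow *f)
{
    f->next = q->top;
    q->top = f;
    q->len++;
}

/* Returns NULL when the pool is empty. */
struct demo_flow *demo_pool_pop(struct demo_pool *q)
{
    struct demo_flow *f = q->top;
    if (f != NULL) {
        q->top = f->next;
        q->len--;
    }
    return f;
}

/* Bring the pool back to `target` spare flows: allocate when short,
 * free when over. Returns 0 if an allocation failed, 1 otherwise. */
int demo_pool_rebalance(struct demo_pool *q, uint32_t target)
{
    while (q->len < target) {
        struct demo_flow *f = malloc(sizeof(*f));
        if (f == NULL)
            return 0;
        demo_pool_push(q, f);
    }
    while (q->len > target)
        free(demo_pool_pop(q));
    assert(q->len == target);
    return 1;
}
```

Calling demo_pool_rebalance with the prealloc value after packet threads have taken flows out of the pool tops it up again; calling it after many flows were returned trims it.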

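1.4 The emergency_recovery option

The emergency_recovery option governs the flow engine's emergency mode, which is entered once the memcap is reached. In this mode the engine uses shorter timeouts, so flows expire more aggressively and more room becomes available for new flows.

Emergency recovery. Emergency recovery is set to 30. This is a percentage of the preallocated flows: once the spare queue holds more than this share of them (30% of 10000 flows), the flow engine returns to normal.
Prune flows. If the shorter timeouts of emergency mode do not have the desired effect, this option is the last resort. It ends some flows even though they have not yet reached their timeout. The prune_flows value is the number of flows terminated each time a new flow is set up.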

#define FLOW_DEFAULT_EMERGENCY_RECOVERY 30
flow_config.emergency_recovery = FLOW_DEFAULT_EMERGENCY_RECOVERY;

1.4.1 Entering emergency mode

Getting a new flow:

static Flow *FlowGetNew(ThreadVars *tv, DecodeThreadVars *dtv, const Packet *p)
{
...
    f = FlowDequeue(&flow_spare_q);
    if (f == NULL) {
        /* If we reached the max memcap, we get a used flow */

Once the memcap is reached, emergency mode is entered and the timeouts are switched to the emergency timeouts.

        if (!(FLOW_CHECK_MEMCAP(sizeof(Flow) + FlowStorageSize()))) {
            /* declare state of emergency */
            if (!(SC_ATOMIC_GET(flow_flags) & FLOW_EMERGENCY)) {
                SC_ATOMIC_OR(flow_flags, FLOW_EMERGENCY);

                FlowTimeoutsEmergency();

                /* under high load, waking up the flow mgr each time leads
                 * to high cpu usage. Flows are not timed out much faster if
                 * we check a 1000 times a second. */
                FlowWakeupFlowManagerThread();
            }

            f = FlowGetUsedFlow(tv, dtv);

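Walk the hash until a flow can be freed.
- Do not prune flows that are in use by a packet or stream message.
- Output a log record.
- flow_prune_idx makes sure we do not start at the top every time, since that would clear out the top of the hash and lead to longer and longer search times under high pressure.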

static Flow *FlowGetUsedFlow(ThreadVars *tv, DecodeThreadVars *dtv)
{
...
        if (SC_ATOMIC_GET(f->use_cnt) > 0) {
            FBLOCK_UNLOCK(fb);
            FLOWLOCK_UNLOCK(f);
            continue;
        }

Remove the flow from the hash and set the FORCED and EMERGENCY flags.

        /* remove from the hash */

        f->flow_end_flags |= FLOW_END_FLAG_FORCED;

        if (SC_ATOMIC_GET(flow_flags) & FLOW_EMERGENCY)
            f->flow_end_flags |= FLOW_END_FLAG_EMERGENCY;

Log the flow, clear its old memory, reset it to the new state, and advance flow_prune_idx.

        /* invoke flow log api */
        if (dtv && dtv->output_flow_thread_data)
            (void)OutputFlowLog(tv, dtv->output_flow_thread_data, f);

        FlowClearMemory(f, f->protomap);

        FlowUpdateState(f, FLOW_STATE_NEW);

        FLOWLOCK_UNLOCK(f);

        (void) SC_ATOMIC_ADD(flow_prune_idx, (flow_config.hash_size - cnt));
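
Why a rotating start index helps can be seen in a simplified model. In this sketch, which is an assumption-laden simplification and not the actual FlowGetUsedFlow logic, each bucket is reduced to a hypothetical boolean "has an evictable flow", and demo_prune_idx stands in for flow_prune_idx.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for flow_prune_idx. */
static _Atomic uint32_t demo_prune_idx = 0;

/* Scan `hash_size` buckets for one that can be evicted, starting where the
 * previous scan stopped rather than at bucket 0. `evictable[i]` is a
 * hypothetical flag standing in for "bucket i holds an unused flow".
 * Returns the chosen bucket, or -1 if none qualifies. */
int64_t demo_find_victim(const bool *evictable, uint32_t hash_size)
{
    assert(hash_size > 0);
    uint32_t start = atomic_load(&demo_prune_idx) % hash_size;
    for (uint32_t n = 0; n < hash_size; n++) {
        uint32_t idx = (uint32_t)(((uint64_t)start + n) % hash_size);
        if (evictable[idx]) {
            /* The next scan resumes right after the bucket just used. */
            atomic_store(&demo_prune_idx, idx + 1);
            return (int64_t)idx;
        }
    }
    return -1;
}
```

Because each call resumes after the previous victim, repeated evictions are spread around the table instead of emptying the first buckets over and over.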

1.4.2 Leaving emergency mode

Get the number of flows in the spare queue:

static TmEcode FlowManager(ThreadVars *th_v, void *thread_data)
{
...
        uint32_t len = 0;
        FQLOCK_LOCK(&flow_spare_q);
        len = flow_spare_q.len;
        FQLOCK_UNLOCK(&flow_spare_q);
        StatsSetUI64(th_v, ftd->flow_mgr_spare, (uint64_t)len);

If the spare flows, as a percentage of the preallocated flows, exceed the configured emergency_recovery value:

            if (len * 100 / flow_config.prealloc > flow_config.emergency_recovery) {
                SC_ATOMIC_AND(flow_flags, ~FLOW_EMERGENCY);

Restore the normal timeouts and leave the emergency state.

                FlowTimeoutsReset();
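
The recovery condition is plain integer arithmetic, so it is easy to work through by hand. The helper below is a hypothetical restatement of the comparison in the excerpt above (demo_can_leave_emergency is not a Suricata function), with the multiplication widened to 64 bits as a precaution.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when spare flows, as a percentage of prealloc, exceed the
 * recovery threshold. Same integer arithmetic as the check above. */
bool demo_can_leave_emergency(uint32_t spare_len, uint32_t prealloc,
                              uint32_t recovery_pct)
{
    assert(prealloc > 0);
    /* Widen first so spare_len * 100 cannot overflow 32 bits. */
    return (uint64_t)spare_len * 100u / prealloc > recovery_pct;
}
```

With prealloc 10000 and emergency_recovery 30, a spare queue of 3000 flows gives 30, which is not greater than 30, so the engine stays in emergency mode. Because the division truncates, the first length that passes is 3100.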

2. Flow manager

2.1 Flow states

A flow can be in different states. Suricata distinguishes between TCP flow states and UDP flow states.

enum FlowState {
    FLOW_STATE_NEW = 0,
    FLOW_STATE_ESTABLISHED,
    FLOW_STATE_CLOSED,
    FLOW_STATE_LOCAL_BYPASSED,
#ifdef CAPTURE_OFFLOAD
    FLOW_STATE_CAPTURE_BYPASSED,
#endif
};

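2.1.1 TCP flow states

New: the period during the three-way handshake.
Established: the state once the three-way handshake is complete.
Closed: the closed state. A flow can end in several ways: through a reset (RST) or through the four-way FIN handshake.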

static void StreamTcpPacketSetState(Packet *p, TcpSession *ssn,
                                           uint8_t state)
{
...
    /* update the flow state */
    switch(ssn->state) {
        case TCP_ESTABLISHED:
        case TCP_FIN_WAIT1:
        case TCP_FIN_WAIT2:
        case TCP_CLOSING:
        case TCP_CLOSE_WAIT:
            FlowUpdateState(p->flow, FLOW_STATE_ESTABLISHED);
            break;
        case TCP_LAST_ACK:
        case TCP_TIME_WAIT:
        case TCP_CLOSED:
            FlowUpdateState(p->flow, FLOW_STATE_CLOSED);
            break;
    }
}

2.1.2 UDP flow states

New: the state right after the flow is created.
Established: packets have been seen in both directions.

void FlowHandlePacketUpdate(Flow *f, Packet *p)
{
...
    if (SC_ATOMIC_GET(f->flow_state) == FLOW_STATE_ESTABLISHED) {
        SCLogDebug("pkt %p FLOW_PKT_ESTABLISHED", p);
        p->flowflags |= FLOW_PKT_ESTABLISHED;

    } else if ((f->flags & (FLOW_TO_DST_SEEN|FLOW_TO_SRC_SEEN)) ==
            (FLOW_TO_DST_SEEN|FLOW_TO_SRC_SEEN)) {
        SCLogDebug("pkt %p FLOW_PKT_ESTABLISHED", p);
        p->flowflags |= FLOW_PKT_ESTABLISHED;

        if (f->proto != IPPROTO_TCP) {
            FlowUpdateState(f, FLOW_STATE_ESTABLISHED);
        }
    }

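Local_bypassed: the state in which packets are sent from only one direction.
If a packet arrives on a capture-bypassed flow more than half of FLOW_BYPASSED_TIMEOUT after the last one seen, the flow is downgraded to Local_bypassed.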

void FlowHandlePacketUpdate(Flow *f, Packet *p)
{
#ifdef CAPTURE_OFFLOAD
    int state = SC_ATOMIC_GET(f->flow_state);

    if (state != FLOW_STATE_CAPTURE_BYPASSED) {
#endif
        /* update the last seen timestamp of this flow */
        COPY_TIMESTAMP(&p->ts, &f->lastts);
#ifdef CAPTURE_OFFLOAD
    } else {
        /* still seeing packet, we downgrade to local bypass */
        if (p->ts.tv_sec - f->lastts.tv_sec > FLOW_BYPASSED_TIMEOUT / 2) {
            SCLogDebug("Downgrading flow to local bypass");
            COPY_TIMESTAMP(&p->ts, &f->lastts);
            FlowUpdateState(f, FLOW_STATE_LOCAL_BYPASSED);
        }
...

2.2 Flow timeouts

How long Suricata keeps a flow in memory is determined by the flow timeout.

flow-timeouts:

  default:
    new: 30                     #Time-out in seconds after the last activity in this flow in a New state.
    established: 300            #Time-out in seconds after the last activity in this flow in an Established
                                #state.
    emergency_new: 10           #Time-out in seconds after the last activity in this flow in a New state
                                #during the emergency mode.
    emergency_established: 100  #Time-out in seconds after the last activity in this flow in an Established
                                #state in the emergency mode.
  tcp:
    new: 60
    established: 3600
    closed: 120
    emergency_new: 10
    emergency_established: 300
    emergency_closed: 20
  udp:
    new: 30
    established: 300
    emergency_new: 10
    emergency_established: 100
  icmp:
    new: 30
    established: 300
    emergency_new: 10
    emergency_established: 100
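
To make the table concrete, here is a minimal sketch of how a timeout could be looked up by protocol, state and mode. All names (demo_state, demo_timeouts, demo_timeout, demo_timed_out) are hypothetical, and the exact boundary of the expiry comparison is an assumption; only the numbers come from the tcp and udp sections above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum demo_state { DEMO_NEW = 0, DEMO_ESTABLISHED, DEMO_CLOSED, DEMO_STATE_MAX };

/* One table per protocol: normal and emergency timeouts in seconds,
 * indexed by state. */
struct demo_timeouts {
    uint32_t normal[DEMO_STATE_MAX];
    uint32_t emergency[DEMO_STATE_MAX];
};

static const struct demo_timeouts demo_tcp = { {60, 3600, 120}, {10, 300, 20} };
/* The udp section has no closed entries; 0 marks "not configured". */
static const struct demo_timeouts demo_udp = { {30, 300, 0}, {10, 100, 0} };

/* Pick the timeout for a state, using the emergency column when asked. */
uint32_t demo_timeout(const struct demo_timeouts *t, enum demo_state s,
                      bool emergency)
{
    assert(s < DEMO_STATE_MAX);
    return emergency ? t->emergency[s] : t->normal[s];
}

/* Assumed rule: a flow has expired once `timeout` seconds have passed
 * since its last activity. */
bool demo_timed_out(uint32_t last_seen, uint32_t now, uint32_t timeout)
{
    return now >= last_seen && now - last_seen >= timeout;
}
```

For an established TCP flow this gives 3600 seconds normally and 300 seconds in emergency mode, which is why emergency mode frees flows so much faster.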

2.2.1 Timeout handling

Flow timeout management is implemented in the flow manager (the FlowManager thread).

static TmEcode FlowManager(ThreadVars *th_v, void *thread_data)
...
        FlowTimeoutHash(&ts, 0 /* check all */, ftd->min, ftd->max, &counters);

It walks the buckets of the flow hash and checks all flows in each bucket.

static uint32_t FlowTimeoutHash(struct timeval *ts, 
...
    if (SC_ATOMIC_GET(flow_flags) & FLOW_EMERGENCY)
        emergency = 1;

    for (idx = hash_min; idx < hash_max; idx++) {
        FlowBucket *fb = &flow_hash[idx];

        counters->rows_checked++;

        int32_t check_ts = SC_ATOMIC_GET(fb->next_ts);
        if (check_ts > (int32_t)ts->tv_sec) {
            counters->rows_skipped++;
            continue;
        }

Before taking the row lock (the bucket lock), make sure the packet pool holds at least 9 packets.

        /* before grabbing the row lock, make sure we have at least
         * 9 packets in the pool */
        PacketPoolWaitForN(9);

        if (FBLOCK_TRYLOCK(fb) != 0) {
            counters->rows_busy++;
            continue;
        }

        /* flow hash bucket is now locked */

        if (fb->tail == NULL) {
            SC_ATOMIC_SET(fb->next_ts, INT_MAX);
            counters->rows_empty++;
            goto next;
        }

        int32_t next_ts = 0;
        /* we have a flow, or more than one */
        cnt += FlowManagerHashRowTimeout(fb->tail, ts, emergency, counters, &next_ts);
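
The next_ts field lets the manager skip rows that cannot contain an expired flow yet. The sketch below models that idea under the assumption that next_ts caches the earliest second at which any flow in the row could time out; struct demo_bucket_row and both functions are hypothetical, and INT32_MAX is used in place of INT_MAX for the empty-row marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bucket row: `next_ts` caches the earliest second at which
 * any flow in the row could time out; INT32_MAX means the row is empty. */
struct demo_bucket_row {
    int32_t  next_ts;
    uint32_t nflows;
};

/* Decide whether the manager has to look inside this row at time `now`.
 * A row whose next_ts lies in the future is skipped. */
bool demo_row_needs_check(const struct demo_bucket_row *row, int32_t now)
{
    return row->next_ts <= now;
}

/* After walking a row, remember when to come back. `earliest` is the
 * smallest expiry time seen among the flows that remain in the row. */
void demo_row_update(struct demo_bucket_row *row, int32_t earliest)
{
    assert(row != NULL);
    row->next_ts = (row->nflows == 0) ? INT32_MAX : earliest;
}
```

An empty row is never checked again until a new flow is added and its next_ts is lowered, which keeps each pass over a mostly idle table cheap.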

Check the flows in the hash bucket.
- Check each flow for timeout based on lastts and its state.
- Before taking the flow lock, make sure the packet pool holds at least 3 packets.

static uint32_t FlowManagerHashRowTimeout(Flow *f, struct timeval *ts,
        int emergency, FlowTimeoutCounters *counters, int32_t *next_ts)
{
...
        /* timeout logic goes here */
        if (FlowManagerFlowTimeout(f, state, ts, next_ts) == 0) {
...
        /* before grabbing the flow lock, make sure we have at least
         * 3 packets in the pool */
        PacketPoolWaitForN(3);

        FLOWLOCK_WRLOCK(f);

Check whether the flow has fully timed out; if so, discard it and put it on the flow_recycle_q queue.

        /* check if the flow is fully timed out and
         * ready to be discarded. */
        if (FlowManagerFlowTimedOut(f, ts, counters) == 1) {
...
            f->flow_end_flags |= FLOW_END_FLAG_TIMEOUT;
...
            FlowEnqueue(&flow_recycle_q, f);

Check the use_cnt reference count and force reassembly of the flow if it is still needed.

static inline int FlowManagerFlowTimedOut(Flow *f, struct timeval *ts, 
...
    if (SC_ATOMIC_GET(f->use_cnt) > 0) {
        return 0;
    }
...
    if (!(f->flags & FLOW_TIMEOUT_REASSEMBLY_DONE) &&
#ifdef CAPTURE_OFFLOAD
            SC_ATOMIC_GET(f->flow_state) != FLOW_STATE_CAPTURE_BYPASSED &&
#endif
            SC_ATOMIC_GET(f->flow_state) != FLOW_STATE_LOCAL_BYPASSED &&
            FlowForceReassemblyNeedReassembly(f, &server, &client) == 1) {
        FlowForceReassemblyForFlow(f, server, client);
        return 0;
    }
...
    return 1;
}

Suricata: flow handling