Graph Intelligence 101

Why

Graphs are everywhere

graph is everywhere

Graph Intelligence helps

graph intelligence helps

It's the right time now!

Gartner predicts that the use of graph technologies in data and analytics innovation will grow from 10% in 2021 to 80% by 2025. We are currently crossing the chasm from early adoption to early mainstream: neither too early nor too late, the timing is just right.


now is the right time

What

How to model a graph

A graph G is an ordered pair G = (V, E) comprising:

  • V, a set of vertices (or nodes)
  • E ? {(x, y) | x, y ∈ V}, a set of edges (or links), which are pairs of nodes

Example:


    directed graph
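In code form, the definition is direct to write down. A minimal sketch (toy node ids; the edge set below is illustrative, not read off the figure):

```python
# G = (V, E): vertices as a set, directed edges as a set of ordered pairs.
V = {0, 1, 2, 3}
E = {(0, 1), (1, 2), (2, 0), (2, 3)}

# Every edge must connect vertices that exist in V.
assert all(u in V and v in V for (u, v) in E)
```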

Different Types of Graphs

  • Are edges directed?
    Directed Graph vs. Undirected Graph
  • Are there multiple types of nodes or multiple types of edges?
    Homogeneous Graph vs. Heterogeneous Graph

How to represent a graph

Different representations lead to different computation patterns.


two computation patterns on graphs
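To make this concrete, here is a small sketch (my own toy example, not from the post) of one directed graph stored three ways: a dense adjacency matrix suits matrix-style computation, while COO and CSR suit edge- and neighbor-centric computation.

```python
# One 4-node directed graph, three layouts (toy example).
import numpy as np

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]      # COO: a plain (src, dst) list

# Dense adjacency matrix: O(N^2) memory, natural for matrix-style compute.
A = np.zeros((4, 4), dtype=np.int8)
for u, v in edges:
    A[u, v] = 1

# CSR: out-edges of node i live in indices[indptr[i]:indptr[i+1]];
# compact, and fast for per-node neighbor scans.
indptr = np.array([0, 1, 2, 4, 4])
indices = np.array([1, 2, 0, 3])
assert list(indices[indptr[2]:indptr[3]]) == [0, 3]   # node 2 -> {0, 3}
```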

How to compute on a graph

As shown in the figure below, computation on a graph proceeds in these steps:

  • Traverse all nodes in the graph, or sample a subset of them, selecting one node at a time; the selected node is called the target node;
  • An L-layer GNN needs to aggregate information from at least the L-hop neighborhood of the target node, so we construct an L-hop ego network around it. The figure shows a 2-hop ego network, where the green nodes are the first hop and the blue nodes are the second hop (a minimal extraction sketch follows the figure);
  • Compute and update the embedding of every node in the ego network. The embeddings use the graph's structural information together with node and edge features.


    ego network
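A minimal sketch of the ego-network extraction step (my own illustration, assuming the graph is given as an adjacency dict; node ids are arbitrary):

```python
# Extract the L-hop ego network around a target node with a plain BFS.
from collections import deque

def ego_network(adj, target, num_hops):
    """Map each node within num_hops of target to its hop distance."""
    hops = {target: 0}
    queue = deque([target])
    while queue:
        u = queue.popleft()
        if hops[u] == num_hops:        # do not expand past the last hop
            continue
        for v in adj.get(u, []):
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops

adj = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2, 5], 5: [4]}
print(ego_network(adj, 1, 2))          # {1: 0, 2: 1, 3: 1, 4: 2}
```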

So how are these embeddings computed and updated? Mainly through Message Passing. Several message-passing paradigms exist, such as GAS (Gather-ApplyEdge-Scatter) and SAGA (Scatter-ApplyEdge-Gather-ApplyVertex). Here we introduce SAGA, which generalizes the pattern fairly comprehensively. Suppose we need to compute and update \vec{x}_1 in the graph below:

graph

  • Scatter
    Propagate messages from source vertices to edges.

    scatter

  • ApplyEdge
    Transform the message along each edge.

    ApplyEdge

  • Gather
    Gather the transformed messages at the destination vertex.

    Gather

  • ApplyVertex
    Transform the gathered output to get the updated vertex state.

    ApplyVertex

The formulas are as follows:


Formula
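The formula itself lives in the image above; as a hedged reconstruction in standard message-passing notation (φ, ψ, and ⊕ are my assumed names for the ApplyEdge network, the ApplyVertex network, and the reduction; they are not symbols taken from the original figure):

```latex
% One SAGA round updating vertex i; \bigoplus is a permutation-invariant
% reduction (sum, mean, max, ...). Notation assumed, not from the original.
\begin{aligned}
\text{Scatter:}     \quad & m_{j \to i} = \vec{x}_j, \qquad (j, i) \in E \\
\text{ApplyEdge:}   \quad & m'_{j \to i} = \phi\left(m_{j \to i},\, e_{ji}\right) \\
\text{Gather:}      \quad & a_i = \bigoplus_{j \in \mathcal{N}(i)} m'_{j \to i} \\
\text{ApplyVertex:} \quad & \vec{x}_i^{\,\text{new}} = \psi\left(\vec{x}_i,\, a_i\right)
\end{aligned}
```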

Analyzing this, we find that in the SAGA pattern, ApplyEdge and ApplyVertex are NN (Neural Network) operations from traditional deep learning, which we can reuse, while Scatter and Gather are operations newly introduced by GNNs. That is: Graph Computing = Graph Ops + NN Ops.

operation characteristics
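To make the Graph Ops / NN Ops split concrete, here is a minimal SAGA-style layer sketch in PyTorch (my own illustration, not the post's code): the two nn.Linear modules are the reusable NN ops, while the fancy indexing and index_add_ are the Scatter/Gather graph ops.

```python
import torch
import torch.nn as nn

class SAGALayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.apply_edge = nn.Linear(in_dim, out_dim)              # NN op
        self.apply_vertex = nn.Linear(in_dim + out_dim, out_dim)  # NN op

    def forward(self, x, edge_index):
        src, dst = edge_index                       # [num_edges] each
        msg = x[src]                                # Scatter (graph op)
        msg = torch.relu(self.apply_edge(msg))      # ApplyEdge
        agg = torch.zeros(x.size(0), msg.size(1), dtype=x.dtype)
        agg.index_add_(0, dst, msg)                 # Gather (graph op)
        return torch.relu(self.apply_vertex(torch.cat([x, agg], dim=1)))

x = torch.randn(4, 8)                               # 4 nodes, 8 features
edge_index = torch.tensor([[0, 1, 2, 2], [1, 2, 0, 3]])
out = SAGALayer(8, 16)(x, edge_index)               # -> [4, 16]
```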

Different graph dataset scales

  • One big graph


    one big graph
  • Many small graphs


    many small graphs

Different graph tasks

  • Node-level prediction
    Predict the category or properties of nodes in the graph


    Node-level Prediction
  • Edge-level prediction
    Predict whether an edge exists between two nodes, as well as the category or properties of that edge


    Edge-level Prediction
  • Graph-level prediction
    Predict the category or properties of the whole graph or a subgraph (typical prediction heads for all three levels are sketched after this list)


    Graph-level Prediction
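All three task levels typically sit on top of the same node embeddings. The sketch below (hypothetical heads in PyTorch, my own illustration) shows the usual pattern: a per-node head, a node-pair head, and a pooled whole-graph head.

```python
import torch
import torch.nn as nn

dim, num_classes = 16, 3
h = torch.randn(10, dim)                      # node embeddings from a GNN

node_head = nn.Linear(dim, num_classes)       # node-level prediction
edge_head = nn.Linear(2 * dim, 1)             # edge-level (link) prediction
graph_head = nn.Linear(dim, num_classes)      # graph-level prediction

node_logits = node_head(h)                            # [10, num_classes]
link_score = edge_head(torch.cat([h[0], h[4]]))       # edge between 0 and 4?
graph_logits = graph_head(h.mean(dim=0))              # mean-pool readout
```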

How

Workflow

general workflow

Take fraud detection as an example:

  • The Tabformer dataset


    Tabformer data
  • workflow


    fraud detection workflow

Software stack

software stack
  • Compute plane


    compute plane
  • Data plane


    data plane

SW Challenges

Graph Sampler

For datasets of many small graphs, full-batch training works most of the time; full-batch training means we train on the whole graph at once.
For datasets built around one large graph, however, many real-world scenarios run into the neighbor explosion problem.

Neighbor Explosion:


neighbor explosion

Graph samplers come to the rescue: sample only a fraction of the target nodes and, for each target node, sample a subgraph of its ego network for training. This is called mini-batch training.
Graph sampling is triggered on every data-loading step, and the number of hops in the sampled graph equals the GNN layer count L. This makes the graph sampler inside the data loader important to overall GNN training performance.

sampling takes a lot of time
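A minimal fan-out sampler sketch (my own illustration, in the spirit of GraphSAGE-style samplers; adj, fanouts, and sample_blocks are assumed names, not an existing API):

```python
import random

def sample_blocks(adj, seeds, fanouts):
    """Sample up to fanouts[l] in-neighbors per node, one block per layer."""
    blocks, frontier = [], set(seeds)
    for fanout in fanouts:
        edges = []
        for u in frontier:
            nbrs = adj.get(u, [])
            for v in random.sample(nbrs, min(fanout, len(nbrs))):
                edges.append((v, u))            # message flows v -> u
        blocks.append(edges)
        frontier = {v for v, _ in edges}        # expand one more hop
    return blocks

adj = {1: [2, 3, 4], 2: [5, 6], 3: [6], 4: [1], 5: [2], 6: [3]}
print(sample_blocks(adj, seeds=[1], fanouts=[2, 2]))  # 2 hops for a 2-layer GNN
```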

Challenge: how to optimize the sampler, both standalone and inside the training pipeline?

When the graph becomes huge (billions of nodes, tens of billions of edges), we face new at-scale challenges:

  • How to store the huge graph across nodes? -> graph partitioning
  • How to build a training system with not only distributed model computation but also distributed graph storage and sampling?
    • How to cut the graph while minimizing cross-partition connections? (The toy sketch after the figure shows the edge-cut objective.)


      graph partition
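A toy sketch of the objective (my own illustration): count the edges whose endpoints land in different parts; partitioners try to minimize this edge cut while keeping the parts balanced.

```python
# Edge cut of a toy 2-way partition.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
part = {0: 0, 1: 0, 2: 1, 3: 1}                  # node -> partition id
edge_cut = sum(part[u] != part[v] for u, v in edges)
print(edge_cut)                                  # 3: (1,2), (3,0), (1,3) cross
```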

A possible GNN distributed training architecture:


GNN distributed training

Scatter-Gather

  • Fuse adjacent graph ops

One common fusion pattern for GCN & GraphSAGE (a toy illustration follows the figure):


Aggregate
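A toy illustration of why this fusion pays off (my own sketch, not the post's kernels): when ApplyEdge is the identity and Gather is a sum, the Scatter/ApplyEdge/Gather chain collapses into a single sparse-dense matmul, so the per-edge message buffer is never materialized.

```python
import numpy as np
from scipy.sparse import csr_matrix

x = np.random.rand(4, 3)
src = np.array([0, 1, 2, 2]); dst = np.array([1, 2, 0, 3])

# Unfused: materialize per-edge messages, then reduce per destination.
messages = x[src]                                   # [num_edges, dim] buffer
unfused = np.zeros_like(x)
np.add.at(unfused, dst, messages)

# Fused "Aggregate": one SpMM with the (dst x src) adjacency matrix.
A = csr_matrix((np.ones(len(src)), (dst, src)), shape=(4, 4))
fused = A @ x

assert np.allclose(unfused, fused)
```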

Challenge:
How to automatically fuse more GNN patterns, with different ApplyEdge and ApplyVertex functions?

  • How to implement the fused Aggregate


    Aggregate implementations

Challenges:
  • The same logical operation leads to different implementations on different graph data structures;
  • Different graph characteristics favor different data structures (e.g., low-degree graphs favor COO, high-degree graphs favor CSR);
  • **How to find the applicable zone for each and hide this complexity from data scientists?** (See the sketch below.)
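A sketch of the first point (my own toy example): the same logical sum-Aggregate becomes a scatter-add over a COO edge list, but a row-wise walk over CSR.

```python
import numpy as np

x = np.arange(12, dtype=np.float32).reshape(4, 3)   # 4 nodes, 3 features

# COO: edge list; irregular scatter-add writes into destinations.
src = np.array([0, 1, 2, 2]); dst = np.array([1, 2, 0, 3])
out_coo = np.zeros_like(x)
np.add.at(out_coo, dst, x[src])

# CSR over destinations: in-neighbors of node i sit in
# indices[indptr[i]:indptr[i+1]]; reads are sequential per row.
indptr = np.array([0, 1, 2, 3, 4])
indices = np.array([2, 0, 1, 2])                    # sources, grouped by dst
out_csr = np.vstack([x[indices[s:e]].sum(axis=0)
                     for s, e in zip(indptr[:-1], indptr[1:])])

assert np.allclose(out_coo, out_csr)
```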

More

  • Inference challenges
    • GNN inference needs full-batch inference; how to make it efficient?
    • Distributed inference for big graphs?
    • Vector quantization for node and edge features?
    • Distilling GNNs into MLPs?
  • SW-HW co-design challenges
    • How to relieve the irregular memory access in scatter-gather?
    • Do we need a dataflow engine for acceleration?

Finishing words

"There is plenty of room at the top" 對技術(shù)人員很重要流纹。但為避免入寶山而空返漱凝,我們更需要建立起技術(shù)架構(gòu)茸炒,這就像是地圖一樣阵苇,只有按圖索驥才能更好地探索和利用好top里的plenty of room绅项。


There is plenty of room at the top

References

  1. Graph + AI: What’s Next? Progress in Democratizing Graph for All
  2. Recent Advances in Efficient and Scalable Graph Neural Networks
  3. Crossing the Chasm – Technology adoption lifecycle
  4. Understanding and Bridging the Gaps in Current GNN Performance Optimizations
  5. Automatic Generation of High-Performance Inference Kernels for Graph Neural Networks on Multi-Core Systems
  6. Understanding GNN Computational Graph: A Coordinated Computation, IO, And Memory Perspective
  7. Graphiler: A Compiler For Graph Neural Networks
  8. Scatter-Add in Data Parallel Architectures
  9. fuseGNN: Accelerating Graph Convolutional Neural Network Training on GPGPU
  10. VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization
  11. NeuGraph: Parallel Deep Neural Network Computation on Large Graphs
  12. Completing a member knowledge graph with Graph Neural Networks
  13. PinnerFormer: Sequence Modeling for User Representation at Pinterest
  14. Gartner and Graph Analytics