Why
Graphs are everywhere
Graph Intelligence helps
It's the right time now!
Gartner predicts that graph technologies will be used in 80% of data and analytics innovations by 2025, up from 10% in 2021. We are currently crossing the chasm from early adopters to the early mainstream: neither too early nor too late, the timing is just right.
What
How to model a graph
A graph G is an ordered pair G = (V, E) comprising:
- V, a set of vertices (or nodes)
- E ⊆ {(x, y) | x, y ∈ V}, a set of edges (or links), which are pairs of nodes
Example:
directed graph
Different Types of Graphs
- Are edges directed? Directed graph vs. undirected graph.
- Are there multiple types of nodes or multiple types of edges? Homogeneous graph vs. heterogeneous graph.
How to represent a graph
Different representations lead to different computation patterns.
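As a rough illustration of how the representation shapes the computation, here is a minimal pure-Python sketch of one small directed graph stored three common ways: adjacency list, COO edge list, and CSR (the toy graph and all names are mine, not from any particular library).

```python
# A minimal sketch of three common graph representations.
# The tiny 4-node directed graph and all names here are illustrative only.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]   # COO: a plain list of (src, dst) pairs
num_nodes = 4

# Adjacency list: node -> list of out-neighbors (natural for per-node traversal)
adj_list = {u: [] for u in range(num_nodes)}
for u, v in edges:
    adj_list[u].append(v)

# CSR: row pointers + packed column indices (natural for row-wise aggregation)
indptr = [0] * (num_nodes + 1)
for u, _ in edges:
    indptr[u + 1] += 1
for i in range(num_nodes):
    indptr[i + 1] += indptr[i]
indices = [v for _, v in sorted(edges)]    # assumes edges grouped/sorted by source

print(adj_list)         # {0: [1, 2], 1: [2], 2: [3], 3: []}
print(indptr, indices)  # [0, 2, 3, 4, 4] [1, 2, 2, 3]
```

The adjacency list favors vertex-centric traversal, COO favors edge-parallel scatter operations, and CSR favors row-wise (per-destination) gathers, which is exactly why the choice of representation steers the computation pattern.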
How to compute on a graph
As shown in the figure below, computation on a graph proceeds as follows:
- Traverse all nodes in the graph, or sample a subset of them. Each time, pick one of these nodes as the target node;
- An L-layer GNN needs to aggregate information from at least the target node's L-hop neighborhood. We therefore construct an L-hop ego network around the target node. The figure shows a 2-hop ego network, where the green nodes are the first hop and the blue nodes are the second hop (a minimal sketch follows below);
- Compute and update the embedding of every node in the ego network. The embeddings use the graph's structural information as well as node and edge features.
ego network
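To make the ego-network construction concrete, here is a minimal pure-Python BFS sketch that collects the L-hop neighborhood of a target node from an adjacency list (the toy graph and function names are illustrative only).

```python
# Minimal sketch: extract the L-hop ego network of a target node.
# `adj` maps each node to its neighbors; all names are illustrative.
def ego_network(adj, target, num_hops):
    frontier = {target}
    visited = {target}
    hops = {0: {target}}
    for hop in range(1, num_hops + 1):
        nxt = set()
        for u in frontier:
            for v in adj.get(u, []):
                if v not in visited:
                    visited.add(v)
                    nxt.add(v)
        hops[hop] = nxt
        frontier = nxt
    return visited, hops   # all ego-network nodes, plus per-hop grouping

adj = {0: [1, 2], 1: [3], 2: [3, 4], 3: [], 4: []}
nodes, hops = ego_network(adj, target=0, num_hops=2)
print(hops)  # {0: {0}, 1: {1, 2}, 2: {3, 4}}
```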
So how are these embeddings computed and updated? Mainly via message passing. Message passing has several computational paradigms, such as GAS (Gather-ApplyEdge-Scatter) and SAGA (Scatter-ApplyEdge-Gather-ApplyVertex). Here we introduce SAGA, which is the more complete formulation. Suppose we need to compute and update the node in the figure below:
- Scatter: propagate the message from the source vertex onto the edge.
- ApplyEdge: transform the message along each edge.
- Gather: gather the transformed messages at the destination vertex.
- ApplyVertex: transform the gathered output to get the updated vertex.
The formulas are as follows:
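As a sketch, the four SAGA steps can be written in standard message-passing notation (the symbols $\phi$, $\oplus$, $\psi$ are my own choice, not tied to a specific framework):

$$
\begin{aligned}
\text{Scatter:} \quad & m_{(u,v)} = h_u \\
\text{ApplyEdge:} \quad & m'_{(u,v)} = \phi\bigl(m_{(u,v)},\, e_{(u,v)}\bigr) \\
\text{Gather:} \quad & a_v = \bigoplus_{u \in \mathcal{N}(v)} m'_{(u,v)} \\
\text{ApplyVertex:} \quad & h'_v = \psi\bigl(h_v,\, a_v\bigr)
\end{aligned}
$$

Here $h_u$ is the embedding of vertex $u$, $e_{(u,v)}$ the feature of edge $(u,v)$, $\oplus$ a permutation-invariant reduction such as sum, mean, or max, and $\phi$, $\psi$ are learnable NN transforms.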
Looking at this more closely, in the SAGA pattern ApplyEdge and ApplyVertex are standard neural network (NN) operations from traditional deep learning that we can reuse, while Scatter and Gather are new operations introduced by GNNs. In other words, Graph Computing = Graph Ops + NN Ops.
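To make the Graph Ops + NN Ops split concrete, here is a minimal NumPy sketch of one SAGA layer, assuming a toy dense edge list, sum as the Gather reduction, and single-layer perceptrons for ApplyEdge and ApplyVertex (all weights and names are illustrative, not any framework's API).

```python
import numpy as np

def saga_layer(h, edges, W_e, W_v):
    """One SAGA message-passing layer on a toy graph.
    h: (N, d) node embeddings; edges: list of (src, dst);
    W_e, W_v: illustrative weights for ApplyEdge / ApplyVertex."""
    N, d = h.shape
    agg = np.zeros((N, d))
    for u, v in edges:
        msg = h[u]                      # Scatter: source embedding onto the edge
        msg = np.tanh(msg @ W_e)        # ApplyEdge: NN transform of the message
        agg[v] += msg                   # Gather: sum-reduce at the destination
    return np.tanh(np.concatenate([h, agg], axis=1) @ W_v)  # ApplyVertex

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
W_e = rng.normal(size=(8, 8))
W_v = rng.normal(size=(16, 8))
print(saga_layer(h, edges, W_e, W_v).shape)  # (4, 8)
```

The per-edge loop (Scatter and Gather) is the graph-op half that real systems replace with sparse formats and fused kernels; the two matrix multiplies with activations are the reusable NN-op half.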
Different graph dataset scales
- One big graph
- Many small graphs
Different graph tasks
- Node-level prediction: predict the class or properties of a node in the graph.
- Edge-level prediction: predict whether an edge exists between two given nodes, and the class or properties of that edge.
- Graph-level prediction: predict the class or properties of the whole graph or a subgraph.
How
Workflow
Take fraud detection as an example:
The Tabformer dataset:
Tabformer data
fraud detection workflow
Software stack
- Compute plane
- Data plane
SW Challenges
Graph Sampler
For many-small-graphs datasets, full-batch training works most of the time. Full-batch training means we can train on the whole graph at once. When it comes to one-big-graph datasets, in many real-world scenarios we run into the neighbor explosion problem.
Neighbor Explosion:
neighbor explosion
The graph sampler comes to the rescue: sample only a fraction of the target nodes and, for each target node, sample a sub-graph of its ego network for training. This is called mini-batch training.
Graph sampling is triggered on every data-loading step, and the number of hops in the sampled graph equals the number of GNN layers L. This makes the graph sampler in the data loader critical to GNN training.
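As a rough sketch of what such a sampler does (pure Python; the fanout list and function names are illustrative, not a specific framework's API):

```python
import random

def sample_subgraph(adj, targets, fanouts):
    """Sample an L-hop sub-graph around `targets`, keeping at most
    fanouts[l] neighbors per node at hop l (mini-batch sampling)."""
    nodes = set(targets)
    sampled_edges = []
    frontier = list(targets)
    for fanout in fanouts:                        # one entry per GNN layer / hop
        nxt = []
        for u in frontier:
            neigh = adj.get(u, [])
            picked = random.sample(neigh, min(fanout, len(neigh)))
            for v in picked:
                sampled_edges.append((v, u))      # message flows v -> u
                if v not in nodes:
                    nodes.add(v)
                    nxt.append(v)
        frontier = nxt
    return nodes, sampled_edges

adj = {0: [1, 2, 3], 1: [4, 5], 2: [5, 6], 3: [6], 4: [], 5: [], 6: []}
print(sample_subgraph(adj, targets=[0], fanouts=[2, 2]))
```

Because this routine runs inside the data loader on every batch, its throughput directly bounds end-to-end training speed, which is what the challenge below is about.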
Challenge: how to optimize the sampler, both standalone and inside the training pipeline?
When the graph becomes huge (billions of nodes, tens of billions of edges), we face new at-scale challenges:
- How to store the huge graph across nodes? -> graph partitioning
- How to build a training system with not only distributed model computation but also a distributed graph store and distributed sampling?
- How to cut the graph while minimizing cross-partition connections? (a small edge-cut sketch follows after this list)
graph partition
- A possible GNN distributed training architecture:
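As a toy illustration of the graph-partition objective mentioned above, the sketch below simply counts cross-partition edges (the edge cut) for a given node-to-partition assignment. Production systems use dedicated partitioners (e.g. METIS-style algorithms); this only shows what is being minimized.

```python
def edge_cut(edges, part):
    """Count edges whose endpoints land in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(edge_cut(edges, part={0: 0, 1: 0, 2: 1, 3: 1}))  # 3 cross-partition edges
```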
Scatter-Gather
- Fuse adjacent graph ops
One common fuse pattern for GCN & GraphSAGE:
Aggregate
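For GCN/GraphSAGE-style layers, where ApplyEdge is at most a scaling by an edge weight and Gather is a sum, the Scatter-ApplyEdge-Gather chain collapses into a single sparse-dense matrix multiply (SpMM). A minimal SciPy sketch under that assumption (toy graph, illustrative names):

```python
import numpy as np
from scipy.sparse import coo_matrix

# Toy graph in COO form; the fused Aggregate is just A @ H (an SpMM),
# instead of materializing per-edge messages (Scatter/ApplyEdge/Gather).
src = np.array([0, 0, 1, 2])
dst = np.array([1, 2, 2, 3])
w = np.ones_like(src, dtype=np.float32)                  # edge weights (e.g. GCN normalization)
A = coo_matrix((w, (dst, src)), shape=(4, 4)).tocsr()    # row = destination, col = source
H = np.random.rand(4, 8).astype(np.float32)              # node features

H_agg = A @ H        # fused Aggregate: weighted sum of neighbor features per destination
print(H_agg.shape)   # (4, 8)
```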
Challenge:
How to fuse more GNN patterns, with different ApplyEdge and ApplyVertex functions, automatically?
- How to implement the fused Aggregate?
Aggregate implementations
Challenge:
- Different graph data structures lead to different implementations of the same logical operation;
- Different graph characteristics favor different data structures (e.g. low-degree graphs favor COO, high-degree graphs favor CSR); a small comparison sketch follows below;
- **How to find the applicable zone for each, and hide this complexity from data scientists?**
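To make the data-structure point concrete, here is the same sum-aggregation written two ways on a toy graph: a COO-style scatter-add and a CSR-style per-destination gather (NumPy; all names are illustrative). Which one wins depends on the degree distribution and the hardware, as noted above.

```python
import numpy as np

H = np.random.rand(4, 8).astype(np.float32)                 # node features
src = np.array([0, 0, 1, 2]); dst = np.array([1, 2, 2, 3])  # COO edge list

# COO: scatter-add per edge (parallel over edges, but write conflicts on dst)
out_coo = np.zeros_like(H)
np.add.at(out_coo, dst, H[src])

# CSR: iterate destinations, gather their in-neighbors contiguously (no write conflicts)
indptr = np.array([0, 0, 1, 3, 4])    # in-edges grouped by destination node
indices = np.array([0, 0, 1, 2])      # source node of each in-edge
out_csr = np.zeros_like(H)
for v in range(4):
    out_csr[v] = H[indices[indptr[v]:indptr[v + 1]]].sum(axis=0)

print(np.allclose(out_coo, out_csr))  # True: same logical op, different implementations
```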
More
- Inference challenges
  - GNN inference needs full-batch inference; how to make it efficient?
  - Distributed inference for big graphs?
  - Vector quantization for node and edge features?
  - GNN distilled into an MLP?
- SW-HW co-design challenges
  - How to relieve irregular memory access in scatter-gather?
  - Do we need a dataflow engine for acceleration?
- …
Finishing words
"There is plenty of room at the top" 對技術(shù)人員很重要流纹。但為避免入寶山而空返漱凝,我們更需要建立起技術(shù)架構(gòu)茸炒,這就像是地圖一樣阵苇,只有按圖索驥才能更好地探索和利用好top里的plenty of room绅项。
References
- Graph + AI: What’s Next? Progress in Democratizing Graph for All
- Recent Advances in Efficient and Scalable Graph Neural Networks
- Crossing the Chasm – Technology adoption lifecycle
- Understanding and Bridging the Gaps in Current GNN Performance Optimizations
- Automatic Generation of High-Performance Inference Kernels for Graph Neural Networks on Multi-Core Systems
- Understanding GNN Computational Graph: A Coordinated Computation, IO, And Memory Perspective
- Graphiler: A Compiler For Graph Neural Networks
- Scatter-Add in Data Parallel Architectures
- fuseGNN: Accelerating Graph Convolutional Neural Network Training on GPGPU
- VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization
- NeuGraph: Parallel Deep Neural Network Computation on Large Graphs
- Completing a member knowledge graph with Graph Neural Networks
- PinnerFormer: Sequence Modeling for User Representation at Pinterest
- Gartner and Graph Analytics