2022.01.27 Supplementary Translation: Deck-Building Optimization, Part 1
Dice Theory and Branch Theory
From YGOrganization
by Inexorably
Approximating a system makes its general rules, lines of logic, and their effects far easier to see than they are in the unapproximated original. To that end, let us look at Yu-Gi-Oh! after making a few approximations of this kind.
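(Translator's note: the author's goal appears to be a programmable algorithm that simulates matchup win rates between meta decks.
Current progress can compute win rates between decks whose games are decided within 3 turns.
This article is Part 1.)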
“Dice Theory”
What does it mean to win? Simply put, winning means that the combination of your decision making, deck building, and luck outweighed the decision making, deck building, and luck of your opponent. In a way, a game can be described, or approximated, as two players each pulling out their [heavily modded, sleeved, 1st edition hobby league ready-microwaved] dice and rolling them. Whoever rolls higher wins.
In this comparison, the numbers on the dice are the results of a formula that combines the three qualities stated above (decision making, deck building, and luck). Note that your opening hand falls under both deck building and luck, so the variance in your opening hands shows up in the die's face values. A die can have any number of sides, just as decks have different numbers of standard plays (for example, +1 Fire Fist vs. Dragon Rulers). In the following examples, I set the number of sides to 10 for every die, because it makes my point easier to illustrate.
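Note 1
Face function: P = F(decision making, deck building, luck)
I read a face value as opening action capacity; see below for details.
Decision making is the player's control and ability to plan a tactical route, i.e., the same hand plays differently in different players' hands.
Deck building determines the probability distribution over opening hands.
Luck is the realized value of the random variable.
What follows is mainly about deck building.
Assume players are locally rational (Rational Actor Models), i.e., they value the resource change of each move accurately.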
Player one has a die with not only holographic, astral-pack edges but also an 8 on every face: ten faces, each showing an 8 (8888888888). However many times he rolls, his average will be 8. This kind of die, consistent and with decent values, can be likened to +1 Fire Fist.
Player two is going for maximum yolo (You Only Live Once!). He just got back from 360 n0sc0ping some pleb in CoD, and is using a minimum-scope, maximum-skill die on which six sides are 9 and four sides are 1 (9999991111). The average value of this die's faces is 5.8.
(Translator's note: a no-scope either kills outright (9) or does no damage at all (1), whereas the all-8 die above is like a spraying AK-47 that always deals some damage (8).)
So, between player one and player two, who is more likely to win? The simple answer, which new players may mistakenly pick, is player one, because on average he rolls an 8, versus player two, who is rolling not only his swagged-out tricycle but also a 5.8. However, player two's die never lands on 5.8; it always lands on either a 9 or a 1. If it lands on a 9, player two wins (9 > 8), and if it lands on a 1, he loses (1 < 8). In other words, player two has the advantage, with a 60% win rate (as the number of games approaches infinity).
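A minimal Python sketch of this comparison, assuming every face is equally likely and a single roll decides the game; the helper names `parse_die` and `head_to_head` are illustrative, not taken from the original. It shows that the die with the lower mean still wins 60% of the time, because only face-against-face comparisons matter.

```python
from fractions import Fraction
from itertools import product


def parse_die(faces: str) -> list[int]:
    """Turn a face string such as "9999991111" into a list of face values."""
    return [int(ch) for ch in faces]


def head_to_head(die_a: list[int], die_b: list[int]) -> tuple[Fraction, Fraction, Fraction]:
    """Exact (P(A wins), P(tie), P(B wins)) for one roll of each die,
    assuming every face is equally likely and the higher roll wins."""
    wins = ties = losses = 0
    for a, b in product(die_a, die_b):
        if a > b:
            wins += 1
        elif a == b:
            ties += 1
        else:
            losses += 1
    total = len(die_a) * len(die_b)
    return Fraction(wins, total), Fraction(ties, total), Fraction(losses, total)


player_one = parse_die("8888888888")
player_two = parse_die("9999991111")

# Means: 8 vs 5.8, yet player two wins 3/5 of single-roll games.
mean_one = Fraction(sum(player_one), len(player_one))
mean_two = Fraction(sum(player_two), len(player_two))
print(float(mean_one), float(mean_two))
print(head_to_head(player_two, player_one))
```

The mean never enters the win calculation; it is printed only to show how misleading it is here.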
Let's swap player one out for player three. Player three is using a consistent die from a past meta, with a 6 on each of its 10 sides (6666666666). Against player two's die, player three will have the same win-loss record as player one, but he will have a 0% win rate against player one. So, let's look at this:
(1). Player one(8888888888) vs Player two(9999991111): 40-60
(2). Player one(8888888888) vs Player three(6666666666): 100-0
(3). Player three(6666666666) vs Player two(9999991111): 40-60
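The three matchups can be reproduced with a short Python sketch, again assuming equally likely faces and a single deciding roll; `win_rate` is a hypothetical helper name. It makes the non-transitivity concrete: identical records against die two say nothing about how dice one and three fare against each other.

```python
from fractions import Fraction


def win_rate(die_a: str, die_b: str) -> Fraction:
    """Exact probability that die_a rolls strictly higher than die_b,
    with every face of each die equally likely."""
    a_faces = [int(ch) for ch in die_a]
    b_faces = [int(ch) for ch in die_b]
    wins = sum(1 for a in a_faces for b in b_faces if a > b)
    return Fraction(wins, len(a_faces) * len(b_faces))


matchups = {
    "(1) one vs two": ("8888888888", "9999991111"),
    "(2) one vs three": ("8888888888", "6666666666"),
    "(3) three vs two": ("6666666666", "9999991111"),
}
for label, (a, b) in matchups.items():
    pa, pb = win_rate(a, b), win_rate(b, a)
    # Convert to float before formatting; older Fraction versions reject "f" specs.
    print(f"{label}: {float(pa * 100):.0f}-{float(pb * 100):.0f}")
```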
Note that player one and player three seem to be equal based on the win rates in (1) and (3).
However, they are obviously not equal when we look at (2), where player one wins 100% of the time. So, what does this mean?
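Note 2
Player1 (8888888888): high ceiling, high stability
Player2 (9999991111): high ceiling, low stability
Player3 (6666666666): low ceiling, high stability
Low stability → 【Variance Based】
High stability → 【Consistency Based】
(These terms appear later in the article.)
An obvious conclusion:
Player2 vs. Player3 tends to produce a clear advantage or disadvantage on the opening (first) roll, i.e., an autowin (difference +3) or an autoloss (difference -5), so the game is decided quickly.
The win ratio can be determined from the probability distribution of opening hands.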
Traditionally, a difference between OCG and TCG deck building has been noted: OCG seems to run builds with more variance between hands (sacrificing consistency for higher power), while TCG traditionally (not that this will never change) runs builds with lower variance between hands, at the cost of some of the higher-power (and less consistent) hands. This extends beyond variance in the consistency sense: 'win-more' has become a buzzword in the TCG, but is there good reason for it? When we look at the game in the approximated dice form above, we see that consistency is not the deciding factor, and neither is a card like Dark Hole (or other cards with a high average value in the format). In the past, Dark Hole was a very dualistic card: good for baiting your opponent into overcommitting and good when you were in a losing position, but suboptimal when you had an established field. The usual argument here is 'if I'm winning, it's fine to have a suboptimal card, because if I start to lose, the card becomes optimal'. However, is this correct, or at least precise enough?
(Translator's note: stability of opening action capacity serves win rate, but the ceiling of action capacity is an equally important factor when building for win rate.)
Let's say that player two adds a Dark Hole, or a similar card, to his deck. The card's value at suboptimal times and at optimal times is averaged over all ten faces, but it is applied differently to different faces of the die. For example, two of the faces with 1s may become 4s, while two of the faces with 9s (when he is in a winning position or has an established field) may become 7s (the hand holds Dark Hole instead of a combo piece). His die now has two 1s, two 4s, two 7s, and four 9s (1144779999).
(Translator's note: Snatch Steal, The Beginning of the End, and similar cards can be analyzed the same way.)
His average roll is now 6, up from 5.8, making this a more consistent deck. Let us examine his win rates.
Player two(1144779999) vs Player one(8888888888): 40-60
Player two(1144779999) vs Player three(6666666666): 60-40
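A Python sketch of the Dark Hole example, assuming the face changes described above (two 1s become 4s, two 9s become 7s) and a single deciding roll with equally likely faces. `apply_dark_hole` is a hypothetical helper that encodes only this one illustrative transform, not a general card-value model.

```python
from fractions import Fraction


def apply_dark_hole(faces: list[int]) -> list[int]:
    """Hypothetical face transform from the example: two of the 1s become 4s
    (Dark Hole rescues a losing position) and two of the 9s become 7s
    (the hand holds Dark Hole instead of a combo piece)."""
    out = sorted(faces)
    ones = [i for i, v in enumerate(out) if v == 1][:2]
    nines = [i for i, v in enumerate(out) if v == 9][:2]
    for i in ones:
        out[i] = 4
    for i in nines:
        out[i] = 7
    return sorted(out)


def mean(faces: list[int]) -> Fraction:
    return Fraction(sum(faces), len(faces))


def win_rate(a: list[int], b: list[int]) -> Fraction:
    """Probability that a rolls strictly higher than b, faces equally likely."""
    return Fraction(sum(1 for x in a for y in b if x > y), len(a) * len(b))


player_one = [8] * 10
player_two = [9] * 6 + [1] * 4
player_three = [6] * 10
with_dark_hole = apply_dark_hole(player_two)

print("".join(map(str, with_dark_hole)))
print(float(mean(player_two)), float(mean(with_dark_hole)))
print(win_rate(player_two, player_one), win_rate(with_dark_hole, player_one))
print(win_rate(player_two, player_three), win_rate(with_dark_hole, player_three))
```

The mean rises from 5.8 to 6, but the win rate against the wall of 8s falls from 3/5 to 2/5 while the rate against the 6s stays at 3/5.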
While his win rate against player three is still 60%, his win rate against player one has dropped 20 percentage points (from 60% to 40%) because he sacrificed power for consistency. 'Power' is a broad word, so how are we defining it? Power is generally used to describe how strong a field can be put up (for example, Abyss-Teus + Aqua Spirit used to be a common power play), as well as how much damage, to both life points and card advantage, something can do (Judgment Dragon is powerful). However, speed and tempo are built into this definition, and they are often overlooked. There is a winning threshold in this game: if you are ahead of your opponent by a certain amount (depending on the point in the game and the matchup), you win.
(Translator's note: a tactical tempo advantage means being a step ahead strategically: predicting the opponent's actions to a degree and adjusting your resource structure accordingly, having enough resources and action capacity to correct most board states, and keeping that lead until the board state ends up in your sweet spot. For example, in the HAT format, Sahabi's Infernity strategy suppressed the tempo of the other strategies in the format.)
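Note 3
As I understand it:
This model is really fitting a curve of 【action capacity】 over the turns.
【Action capacity】 is the upper limit on the ability to change tempo (raising or lowering tempo, and trading off or converting between different kinds of tempo).
【Tempo】 covers life-point tempo, resource tempo (production efficiency and exchange efficiency), and tactical tempo.
Putting this together with the text above, the "ceiling" is the range of tempo change that 【action capacity】 can support, while "stability" is the dispersion of opening 【action capacity】 (measured by the statistical standard deviation, ideally over an infinite sample).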
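A small Python sketch of these two measures on the four example dice, under two simplifying assumptions of this sketch: the "ceiling" is read as the highest face, and "stability" as the population standard deviation of the ten faces (population rather than sample, because the ten faces are the whole distribution). The helper name `profile` is hypothetical.

```python
from statistics import mean, pstdev


def profile(faces: str) -> tuple[float, int, float]:
    """Return (mean, ceiling, spread) for a die given as a face string.
    Ceiling is taken as the highest face; spread is the population std dev."""
    values = [int(ch) for ch in faces]
    return mean(values), max(values), pstdev(values)


dice = {
    "player one": "8888888888",
    "player two": "9999991111",
    "player three": "6666666666",
    "player two + Dark Hole": "1144779999",
}

for name, faces in dice.items():
    m, ceiling, sd = profile(faces)
    # Lower sd = more consistency based; higher sd = more variance based.
    print(f"{name:24} mean={m:.1f} ceiling={ceiling} sd={sd:.2f}")
```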
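Action capacity has two parts, offense and defense, which are described differently.
Offensive action capacity:
shows up as a rise in your own tempo.
Defensive action capacity:
shows up as a loss of tempo for the opponent (the opponent's gain in action capacity is negative, which counts as positive for you).
The return on this investment is collected reactively: the initiative in the exchange lies with the opponent, so it is a kind of pre-investment, and turning passive into active requires 【guiding the exchange】. Defensive action capacity varies more than offensive action capacity; it has to be estimated from known information and is only priced once the actual exchange happens.
The opponent's defensive action capacity is the key input when valuing your own offensive action capacity.
Action capacity = (offensive action capacity - opponent's defensive action capacity) + defensive action capacity = system action capacity + staple action capacity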
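Because of the nature of functional units and their draw rates, we generally split resources into the system (engine) and staples; both fit the framework above.
System components are generally drawn more reliably than staples, so we usually look first at the ceiling of system action capacity, which is also the easiest to analyze.
When system components accumulate faster than they are spent (usually, production efficiency > exchange efficiency), the ceiling of system action capacity gradually rises to its limit; from player 2's point of view, the face value goes from 1 to 9.
System action capacity is mainly about system resources and the mechanisms that activate them.
Opening-hand quality (the resource stock, and whether the activation mechanism was drawn) determines the initial system action capacity. The higher the stability, i.e., the easier it is to assemble the system's functional components, the more easily the system reaches the limit of its action-capacity ceiling early.
Using the definition of system tempo, the 901 Qli deck's system tempo 0 through tempo 3 can be mapped to face values 0 through 3.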
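Big Deck vs. Small Deck is not what fundamentally determines the number of traps.
If offensive action capacity is hard to raise (+1 Fire Fist) or has a trough (Qli before reaching tempo 3), defense is expected to make up the difference; the final goal either way is maximizing the efficiency of hand resources.
Optimizing the resource structure is another way to raise overall action capacity. Some systems have built-in ways to convert offense into defense, e.g., BA's Fire Lake, Qli's Re-qliate, or Divine Dragon Knight Felgrand; otherwise you rely on deck-building precision and card-hoarding strategies.
Combo decks, whose functional units are large, need ample system resources, so their builds cut back on defense, i.e., run few traps.
When resources cannot carry action capacity up to the preset ceiling, resource accumulation can become the core goal, i.e., invest action capacity in resource production efficiency (within resource tempo).
Boundary: once action capacity reaches the preset ceiling, further accumulation is resource overflow, and the focus should shift to pushing tempo, i.e., life-point tempo (trading for LP) and resource exchange efficiency (within resource tempo), cutting the opponent's resources to lower their action capacity.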
(What follows uses the Dice Model to classify meta decks and explain matchups.)
To describe this better, let's give each player three hearts. After each roll, the lower roller loses hearts equal to the difference between the rolls, and the players keep rolling until one dies. What does this do? If one player rolls a 9 and the other a 6, the player who rolled the 6 loses right on the first roll. However, if one player rolls a 9 and the other an 8, and the next roll is 4 vs. 8, the player rolling the 8s will have won: the variance-based player failed to reach the winning threshold in the allotted time. (A better way to describe this would be to have players gain hearts each turn, with the amount given by a non-linear function F(x) of the turn number x.)
(Translator's note: in these examples the variance-based die is player two's 1144779999, with a high ceiling and low stability, facing player three's 6666666666 and then player one's 8888888888, which are consistency-based dice. The winning threshold corresponds to a win condition in tactical terms.)
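(Translator's note: if every game were decided by the opening hand, the 3 hearts could be likened to the 3 games of a Bo3 match.
But the author clearly means turns, so the function F needs some continuity (though not a continuous function in the strict sense), because a strong hand on T1 is unlikely to suddenly become a weak hand on T3. Simply reusing the opening-hand distribution for every turn would also deviate considerably from reality.)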
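The three-heart game can be solved exactly with a short recursion. This Python sketch assumes equally likely faces, that ties cost nothing (so a tied round is simply rerolled), and no hearts regained per turn; the non-linear F(x) is left out because the text does not define it. `hearts_win_prob` is a hypothetical name.

```python
from fractions import Fraction
from functools import lru_cache


def hearts_win_prob(die_a: list[int], die_b: list[int], hearts: int = 3) -> Fraction:
    """Exact probability that A wins the 'hearts' game: both players start with
    `hearts` hearts, each round both dice are rolled, the lower roller loses hearts
    equal to the difference, and play continues until someone reaches 0.
    Ties cost nothing, so they are factored out as a geometric repeat."""
    pairs = [(a, b) for a in die_a for b in die_b]
    n = len(pairs)
    tie = Fraction(sum(1 for a, b in pairs if a == b), n)
    if tie == 1:
        raise ValueError("every roll ties, so the game never ends")

    @lru_cache(maxsize=None)
    def p(ha: int, hb: int) -> Fraction:
        if hb <= 0:
            return Fraction(1)
        if ha <= 0:
            return Fraction(0)
        total = Fraction(0)
        for a, b in pairs:
            if a > b:
                total += p(ha, hb - (a - b))
            elif b > a:
                total += p(ha - (b - a), hb)
        # Condition on the round not being a tie.
        return total / n / (1 - tie)

    return p(hearts, hearts)


player_one = [8] * 10
player_two = [9] * 6 + [1] * 4
player_three = [6] * 10
print(hearts_win_prob(player_two, player_three))
print(hearts_win_prob(player_two, player_one))
```

Against the 6s a single 9 removes all three hearts, so player two keeps his 3/5. Against the 8s a 9 removes only one heart while a 1 is fatal, so he needs three 9s in a row, (3/5)^3 = 27/125: the variance player fails to reach the threshold in time.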
At this point we realize that, as the variance player, we simply have to forgo the lost hands (a 1 or a 4 in Yu-Gi-Oh! is unlikely to be turned into a 9, even through deck building) and find the winning threshold, i.e., the minimum value needed to win in the format.
Once we have done this, we shift from raising our average roll to maximizing the likelihood of reaching that threshold, since there is no difference between rolling an 11 against an 8 and rolling a 17 against an 8.
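(Translator's note: this is probably the core point of the article. Starting from a simple probability model, it unpacks what 'win-more' means and points to one direction for optimizing a build: sacrifice ceiling on strong hands that already clear the threshold, and either give up on the weak hands below it or lift more of them above it, to raise the overall win rate. "Winning by just a little" has become a consensus among strong players in many two-player games, e.g., AlphaGo winning by only half a point in Go, or top fighting-game players downplaying the impact of correctly guessing the opponent's mix-up and instead focusing on gaining an edge in neutral through mind games and reads when they guess wrong.)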
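A Python sketch of the threshold idea, assuming the opponent is the wall of 8s, ties are ignored, and the winning threshold is therefore a roll of 9 or more. The two modified dice are hypothetical: one pushes a 9 up to 17 ("win-more"), the other lifts a 1 up to 9. Both end up with the same mean, but only the second raises the chance of clearing the threshold.

```python
from fractions import Fraction


def p_reach(die: list[int], threshold: int) -> Fraction:
    """Fraction of faces at or above the winning threshold."""
    return Fraction(sum(1 for face in die if face >= threshold), len(die))


def mean(die: list[int]) -> Fraction:
    return Fraction(sum(die), len(die))


# Assumed: against ten 8s, only a 9 or higher wins, so the threshold is 9.
THRESHOLD = 9

base = [9] * 6 + [1] * 4              # player two's die
win_more = [17] + [9] * 5 + [1] * 4   # hypothetical: one 9 pushed up to 17
fix_brick = [9] * 7 + [1] * 3         # hypothetical: one 1 lifted up to 9

for name, die in [("base", base), ("win-more", win_more), ("fix a brick", fix_brick)]:
    print(f"{name:12} mean={float(mean(die)):.1f} P(reach)={p_reach(die, THRESHOLD)}")
```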
Of course, because of the sheer number of possible hands in a 37-card deck, the real plot will not simply show the threshold plus two units at 4, two units at 1, four units at 9, and two units at 7. If we graphed it holistically (for example as a histogram), with a 'true' value assigned to each hand, the number of points would make it look much more like a traditional, continuous graph.
(Translator's note: a 37-card deck was standard in Hoban's era, compressed by running three Upstart Goblins.)
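(Translator's note: here the differences captured by the model start to be treated in finer detail. Earlier, to keep the story simple, every deck's opening distribution was reduced to 10 points, whereas the upper bound on the number of opening hands in a real deck is …, which remains an enormous number even after collapsing copies of the same card.)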
If you are interested, see the author's programming implementation of this model: GitHub - Inexorably/gwentCalc: Calculates the expected value of a deck, given assumed combo / other values.
At this point we would graph different builds and see which has the largest unit distance above the threshold; in effect, which build clears the threshold for the largest percentage of hands. This would of course be extremely tedious to do by hand, so it would be done by computer (if we generalize hands to a degree, this is actually not hard to code).
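A rough Python sketch of that computation, assuming hands are generalized into card categories and judged by a simple rule. The category names, the card counts of the two 40-card builds, and the `good_hand` rule are all hypothetical, and this is not the author's gwentCalc code. It enumerates every category count a 5-card hand can have, weights each by its multivariate hypergeometric probability, and sums the probability of the hands that clear the threshold.

```python
from itertools import product
from math import comb


def hand_distribution(counts: dict[str, int], hand_size: int = 5):
    """Yield (cards_of_each_category_in_hand, probability) for every possible
    opening hand, grouping copies into categories (multivariate hypergeometric)."""
    names = list(counts)
    total = comb(sum(counts.values()), hand_size)
    ranges = [range(min(counts[name], hand_size) + 1) for name in names]
    for drawn in product(*ranges):
        if sum(drawn) != hand_size:
            continue
        ways = 1
        for name, k in zip(names, drawn):
            ways *= comb(counts[name], k)
        yield dict(zip(names, drawn)), ways / total


def share_above_threshold(counts, meets_threshold, hand_size=5):
    """Probability that an opening hand satisfies the threshold rule."""
    return sum(p for hand, p in hand_distribution(counts, hand_size) if meets_threshold(hand))


# Hypothetical rule: a hand clears the threshold with a starter and at most two bricks.
def good_hand(hand):
    return hand["starter"] >= 1 and hand["brick"] <= 2


# Hypothetical 40-card builds.
builds = {
    "greedy": {"starter": 9, "extender": 12, "brick": 19},
    "tighter": {"starter": 12, "extender": 12, "brick": 16},
}
for name, counts in builds.items():
    print(name, round(share_above_threshold(counts, good_hand), 3))
```

Grouping cards into categories is what keeps the enumeration small: the loop runs over category counts rather than over every distinct combination of individual cards.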
Returning from this tangent to the concept of assigning different values to different faces, we have four dice, with the following faces:
Ten 8s (Tier One Consistency)
Ten 6s (Lower Tier Consistency)
Four 1s, Six 9s (Variance)
Two 1s, Two 4s, Two 7s, Four 9s (Variance with Dark Hole type card(s))
Our averages are 8, 6, 5.8, and 6, respectively. We previously examined the variance vs. consistency matchup and saw that it hinged on reaching the winning threshold as fast as possible (using the model with a non-linear number of hearts gained per turn to represent the grind game). In Yu-Gi-Oh!, this shows in how a combo-heavy, aggressive deck will generally either win with the superior power it generates over a limited number of turns, compared to the relatively linear power generation of a deck such as +1 Fire Fist or HAT, or simply fail to win early and then get out-grinded as its average power decreases over the following turns. This is due to Yu-Gi-Oh!'s lack of a restraining mana system: the only thing limiting the combo deck's aggression is how long it takes to convert hand advantage into field advantage. While we are on the subject of Magic, dice interactions describe the interactions between aggro, combo, and control decks relatively well, while letting us compare them more easily to Yu-Gi-Oh! despite the mana system.
Let us examine the consistency mirror and the variance mirror. In the consistency mirror, the win will almost always go to the player with the higher average roll. For example, going back to January 2014, we see decks like Bujins, Blackwings, and Hunders doing abysmally. This illustrates the concept of fairness that has been invoked increasingly in recent times: these decks are consistency-based in their rolls and for the most part do not have high-power variance rolls. Because of this, the winning threshold between these decks and a tier-one consistency deck (+1 Fire Fist, represented by the first die above, ten 8s) is very rarely reached in the early game (the first couple of rolls), and almost without fail the lower-tier consistency deck falls. From this, we see that there is almost no benefit in playing lower-tier consistency-based decks.
In contrast, the variance mirror pits two dice with wide ranges of roll values against each other. This allows the winning threshold to be reached extremely easily: if one person rolls an 8 and the other a 2, the game immediately ends. This concept is exemplified by a matchup like Quickdraw Quasar vs. Karakuri. Essentially, whoever bricks loses immediately unless the other player bricks comparably (for example, rolling a 3 vs. a 2). However, at this point we notice that with high-variance rolls there is little to be gained by increasing your roll power: if your opponent rolls a 2, there is no difference between rolling a 6 and rolling a 9. So, if the meta is mostly composed of variance-based decks, you should restrict your high rolls to the minimum winning threshold and raise your low rolls enough to increase your win rate slightly (for example, by adding Dark Hole type cards).
Referring to a post by Hoban in the ban list thread, we see him note that the format generally goes consistency -> variance in terms of tier-one decks (dice). Of course, over time builds become refined and more consistent at hitting their goal (their winning condition). A consistency deck, which generally has a stable, non-combo-based core, might see slight adjustments to its monster lineup (such as adding Card Car D to Fire Fist, though Card Car D is not a true monster) and standardization of the main deck, such as Fiendish Chain. This is because in a consistency-based meta there is only one 'best die', so that die changes to fight itself better. This represents the first half of the January 2014 format. Then variance is introduced in the form of Mermail. Unlike +1 Fire Fist, Mermail could in fact brick, which made it resemble die (3) or die (4) above (this was accentuated further in games two and three, where the Fire Fist player could draw Macro Cosmos and win unless the Mermail player could out it quickly). However, as noted in the consistency vs. variance section, there is no difference between losing by 10 cards of card advantage and losing by 1 card: both are a loss. Because of this, Mermail's variance-based style let it accept the losses to Macro Cosmos / Dimensional Fissure and focus on the six or so faces with very high values, which were enough to reach the winning threshold over Fire Fist without progressing to the late game, or at least to reach the late game with the correct setup (WATER monsters in the Graveyard for Tidal, and preferably a controller in hand).