Explaining predictive models: the Evidence Counterfactual

Imagine being targeted with an advertisement for this blog. You’d like to know: why did the AI model predict you’d be interested in the Faculty of Business and Economics’ blog, based on the hundreds of web pages you visited? The answer could be: because you visited www.great-business-faculties.com, www.datascience-in-business.org and www.living-in-antwerp.com: if you would not have visited these pages, you’d no longer be targeted with this specific ad. This explanation is an example of an imaginary “Evidence Counterfactual”.

The use of such browsing data, as well as Facebook likes or location data is often used in targeted advertising systems, and beyond. The predictive models are increasingly complex, while end users become more vocal in their wish for transparent and explainable systems.

When being the subject of such predictive models, businesses and citizens might ask: why I am being rejected for credit? Why I am being targeted with this ad? Etc. The European GDPR even provides the “right to obtain meaningful information about the logic involved” to data subjects who are involved in automated decision making.

In this post, you’ll learn more about the Evidence Counterfactual, an increasingly important approach within the “explainable AI” research domain, that helps understand the decisions of predictive systems that use Big Data. The Applied Data Mining research group has developed algorithms to provide such explanations and validated them in a variety of business domains.

Behavioral big data

More and more companies are tapping into a large pool of humanly-generated data, or “behavioral big data”. Think of a person liking Instagram posts, visiting different locations captured by their mobile GPS, searching Google, making online payments, connecting to people on LinkedIn, and so on. All these behavioral traces lead to artificial intelligent (AI) systems with very high predictive performance in a variety of application areas, ranging from finance to risk to marketing.

The goal of these AI systems is to use this data to predict a variable of interest, for example, a person’s personality traits, product interests, creditworthiness, and so on. The model uses a large number of small pieces of evidence to make predictions. Let’s refer to all that data as the “evidence pool“. The pieces of evidence are either “present” or “missing”. All pieces that are present can be used to make predictions.

Tourist or citizen?

To illustrate how behavioral big data can be seen as a “pool of evidence,” imagine a model that uses location data of people in New York City to predict if someone is a tourist or a citizen. Out of all possible places to go to (the “evidence pool”), a person will only visit a relatively small number of places each month.

These are the pieces of evidence that are “present” and are represented by a value of 1 (see Figure 1). All places that are not visited by that person are “missing” and get a corresponding zero value in the data matrix.

In Figure 1, for example, Anna visited 85 places out of the 50,000 possible places used by the model. She visited Times Square and Dumbo, but she did not visit Columbia University, making this a missing piece of evidence. The model decides she’s a tourist.

The intuition behind the Evidence Counterfactual

Explaining how predictive systems make decisions based on big data is challenging. Evidence Counterfactuals helps understand the reasons behind individual model predictions. This explanation approach3 identifies a causal relationship between two events: event A causes Event B, only if we observe a difference in B after changing A while keeping everything else constant.

The Evidence Counterfactual shows a subset of evidence (event A) that causally drives the model’s decision (event B). We imagine two worlds, identical in every way up until the point where the evidence set is present in one world, but not in the other. The first world is the “factual” world, and the unobserved world is the “counterfactual” world. To help clarify this, consider the following:

IF Anna did not visit Times Square and Dumbo, THEN the model’s prediction changes from tourist to NY citizen.

The pieces of evidence {Times Square, Dumbo} are a subset of the evidence of Anna (all the places she visited) and explain the model’s decision. Simply removing Times Square or Dumbo from her visited locations would not change the predicted class. Both locations need to be “removed” (feature value set to zero) to change the model’s decision.

The “factual world” is the one that’s observed and includes all the places Anna visited. The “counterfactual world” that results in a predicted class change is identical to the factual world in every way up until the two locations Times Square and Dumbo.

An important advantage of counterfactuals is that they do not require all features that are used in the model (the “evidence pool”) or all the evidence (e.g., all places Anna visited) to be part of the explanation. This is especially interesting in the context of humanly-generated big data: it allows us to explain predictions using concise and comprehensible explanations.

Computing Evidence Counterfactuals

The huge dimensionality of the behavioral data makes it infeasible to compute counterfactual explanations using a complete search algorithm (this search strategy would check all subsets of evidence as candidate explanations).

Alternatively, a heuristic search algorithm can be used to efficiently find counterfactuals. One existing approach is based on a best-first search and makes use of the model’s scoring function to first consider subsets of evidence that, when removed, reduce the predicted score the most in the direction of the opposite predicted class.

There are at least two weaknesses of this strategy:

  1. for some nonlinear models, removing one feature does not result in a predicted score change, which results in the search algorithm picking a random feature to expand in the first iteration;
  2. the search time is very sensitive to the size of the counterfactual explanation: the more evidence that needs to be removed, the longer it takes the algorithm to find the explanation.

As an alternative to the best-first search, we proposed a search strategy that chooses features according to their overall importance for the predicted score.5 The idea is that the more accurate the importance rankings are, the more likely it is to find a counterfactual explanation starting from removing the top-ranked feature up until a counterfactual explanation is found. The hybrid algorithm LIME-Counterfactual (LIME-C) seems a favorable alternative to the best-first search because of its good overall effectiveness and efficiency.

Other data and models

Evidence Counterfactuals can address various data types, from textual data and tabular data (e.g. standard Excel files) to image data. The issue is to define what it means for evidence to be “present” or “missing.” To compute counterfactuals, we need to define the notion of “removing evidence” or setting evidence to “missing.” In this post, we focused on behavioral big data. For these data, which is very sparse (a lot of zero values in the data matrix), it makes sense to represent evidence that’s “present” by those features (e.g., word or behavior) having a nonzero value.

Key takeaways

  • Predictive systems that are trained from humanly-generated Big Data have high predictive performance, however, explaining them is challenging.
  • Explaining data-driven decisions is important for a variety of reasons (increase trust and acceptance, improve models, gain insights, etc.), and for many stakeholders (data scientists, managers, decision subjects, etc.).
  • The Evidence Counterfactual is an explanation approach that can be applied across many relevant applications and highlights a key subset of evidence that led to a particular model decision.
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
  • 序言:七十年代末竣蹦,一起剝皮案震驚了整個(gè)濱河市犁罩,隨后出現(xiàn)的幾起案子,更是在濱河造成了極大的恐慌娜搂,老刑警劉巖晓铆,帶你破解...
    沈念sama閱讀 218,386評(píng)論 6 506
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現(xiàn)場離奇詭異,居然都是意外死亡属铁,警方通過查閱死者的電腦和手機(jī),發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 93,142評(píng)論 3 394
  • 文/潘曉璐 我一進(jìn)店門躬翁,熙熙樓的掌柜王于貴愁眉苦臉地迎上來焦蘑,“玉大人,你說我怎么就攤上這事盒发±觯” “怎么了?”我有些...
    開封第一講書人閱讀 164,704評(píng)論 0 353
  • 文/不壞的土叔 我叫張陵宁舰,是天一觀的道長拼卵。 經(jīng)常有香客問我,道長蛮艰,這世上最難降的妖魔是什么腋腮? 我笑而不...
    開封第一講書人閱讀 58,702評(píng)論 1 294
  • 正文 為了忘掉前任,我火速辦了婚禮壤蚜,結(jié)果婚禮上即寡,老公的妹妹穿的比我還像新娘。我一直安慰自己仍律,他們只是感情好嘿悬,可當(dāng)我...
    茶點(diǎn)故事閱讀 67,716評(píng)論 6 392
  • 文/花漫 我一把揭開白布。 她就那樣靜靜地躺著水泉,像睡著了一般善涨。 火紅的嫁衣襯著肌膚如雪。 梳的紋絲不亂的頭發(fā)上草则,一...
    開封第一講書人閱讀 51,573評(píng)論 1 305
  • 那天钢拧,我揣著相機(jī)與錄音,去河邊找鬼炕横。 笑死源内,一個(gè)胖子當(dāng)著我的面吹牛,可吹牛的內(nèi)容都是我干的份殿。 我是一名探鬼主播膜钓,決...
    沈念sama閱讀 40,314評(píng)論 3 418
  • 文/蒼蘭香墨 我猛地睜開眼嗽交,長吁一口氣:“原來是場噩夢啊……” “哼!你這毒婦竟也來了颂斜?” 一聲冷哼從身側(cè)響起夫壁,我...
    開封第一講書人閱讀 39,230評(píng)論 0 276
  • 序言:老撾萬榮一對(duì)情侶失蹤,失蹤者是張志新(化名)和其女友劉穎沃疮,沒想到半個(gè)月后盒让,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體,經(jīng)...
    沈念sama閱讀 45,680評(píng)論 1 314
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡司蔬,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 37,873評(píng)論 3 336
  • 正文 我和宋清朗相戀三年邑茄,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片俊啼。...
    茶點(diǎn)故事閱讀 39,991評(píng)論 1 348
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡肺缕,死狀恐怖,靈堂內(nèi)的尸體忽然破棺而出吨些,到底是詐尸還是另有隱情搓谆,我是刑警寧澤,帶...
    沈念sama閱讀 35,706評(píng)論 5 346
  • 正文 年R本政府宣布豪墅,位于F島的核電站,受9級(jí)特大地震影響黔寇,放射性物質(zhì)發(fā)生泄漏偶器。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 41,329評(píng)論 3 330
  • 文/蒙蒙 一缝裤、第九天 我趴在偏房一處隱蔽的房頂上張望屏轰。 院中可真熱鬧,春花似錦憋飞、人聲如沸霎苗。這莊子的主人今日做“春日...
    開封第一講書人閱讀 31,910評(píng)論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽唁盏。三九已至,卻和暖如春检眯,著一層夾襖步出監(jiān)牢的瞬間厘擂,已是汗流浹背。 一陣腳步聲響...
    開封第一講書人閱讀 33,038評(píng)論 1 270
  • 我被黑心中介騙來泰國打工锰瘸, 沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留刽严,地道東北人。 一個(gè)月前我還...
    沈念sama閱讀 48,158評(píng)論 3 370
  • 正文 我出身青樓避凝,卻偏偏與公主長得像舞萄,于是被迫代替她去往敵國和親眨补。 傳聞我的和親對(duì)象是個(gè)殘疾皇子,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 44,941評(píng)論 2 355

推薦閱讀更多精彩內(nèi)容

  • 前言 Google Play應(yīng)用市場對(duì)于應(yīng)用的targetSdkVersion有了更為嚴(yán)格的要求倒脓。從 2018 年...
    申國駿閱讀 64,203評(píng)論 14 98
  • """1.個(gè)性化消息: 將用戶的姓名存到一個(gè)變量中撑螺,并向該用戶顯示一條消息。顯示的消息應(yīng)非常簡單把还,如“Hello ...
    她即我命閱讀 2,898評(píng)論 0 5
  • 我們都是軟弱的人实蓬,所以才會(huì)說謊。我們都是膽小的人吊履,所以才要武裝安皱。我們都是一群笨蛋,所以才會(huì)互相傷害艇炎。
    所羅門的偽證_dc0a閱讀 2,066評(píng)論 0 3
  • 為了讓我有一個(gè)更快速酌伊、更精彩、更輝煌的成長缀踪,我將開始這段刻骨銘心的自我蛻變之旅居砖!從今天開始,我將每天堅(jiān)持閱...
    李薇帆閱讀 1,717評(píng)論 0 2
  • 似乎最近一直都在路上驴娃,每次出來走的時(shí)候感受都會(huì)很不一樣奏候。 1、感恩一直遇到好心人唇敞,很幸運(yùn)蔗草。在路上總是...
    時(shí)間里的花Lily閱讀 1,192評(píng)論 0 1