CRISP data mining process

CRISP-DM (Cross-Industry Standard Process for Data Mining) is a six-phase process model that naturally describes the data science life cycle. It helps you plan, organize, and implement a data science (or machine learning) project.

I: The data mining process

  • Business understanding – What does the business need?
  • Data understanding – What data do we have / need? Is it clean?
  • Data preparation – How do we organize the data for modeling?
  • Modeling – What modeling techniques should we apply?
  • Evaluation – Which model best meets the business objectives?
  • Deployment – How do stakeholders access the results?
[Figure: The Most Common Methodology]

Data science teams that combine a loose implementation of CRISP-DM with overarching team-based agile project management approaches will likely see the best results. Even teams that don’t explicitly follow CRISP-DM can still use the framework diagram to explain the differences between data science and software projects.

II: What are the six CRISP-DM phases?

2.1 Business Understanding

The Business Understanding phase focuses on understanding the objectives and requirements of the project. Aside from the third task, the three other tasks in this phase are foundational project management activities that are universal to most projects.

    1. Determine business objectives: You should first “thoroughly understand, from a business perspective, what the customer really wants to accomplish” (CRISP-DM Guide) and then define business success criteria.
    2. Assess situation: Determine resource availability and project requirements, assess risks and contingencies, and conduct a cost-benefit analysis.
    3. Determine data mining goals: In addition to defining the business objectives, you should also define what success looks like from a technical data mining perspective.
    4. Produce project plan: Select technologies and tools and define detailed plans for each project phase.

2.2 Data Understanding

Next is the Data Understanding phase. Adding to the foundation of Business Understanding, it drives the focus to identify, collect, and analyze the data sets that can help you accomplish the project goals. This phase also has four tasks:

  • Collect initial data: Acquire the necessary data and (if necessary) load it into your analysis tool.

  • Describe data: Examine the data and document its surface properties like data format, number of records, or field identities.

  • Explore data: Dig deeper into the data. Query it, visualize it, and identify relationships among the data.

  • Verify data quality: How clean/dirty is the data?
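
The "describe data" and "verify data quality" tasks can be sketched in a few lines of standard-library Python. This is only an illustration: the records, the field names (`height_cm`, `weight_kg`), and the helper functions `describe` and `quality_report` are assumptions made for the example, not part of CRISP-DM.

```python
from collections import Counter

# Hypothetical sample records; None marks a missing value.
records = [
    {"id": 1, "height_cm": 170.0, "weight_kg": 65.0},
    {"id": 2, "height_cm": None, "weight_kg": 80.0},
    {"id": 3, "height_cm": 182.0, "weight_kg": None},
    {"id": 3, "height_cm": 182.0, "weight_kg": None},  # repeated id
]


def describe(rows):
    """Describe data: return the record count and the field names."""
    fields = sorted({key for row in rows for key in row})
    return {"n_records": len(rows), "fields": fields}


def quality_report(rows, key="id"):
    """Verify data quality: count missing values per field and find repeated keys."""
    missing = Counter()
    for row in rows:
        for field, value in row.items():
            if value is None:
                missing[field] += 1
    key_counts = Counter(row[key] for row in rows)
    duplicates = sorted(k for k, n in key_counts.items() if n > 1)
    return {"missing": dict(missing), "duplicate_keys": duplicates}


summary = describe(records)
report = quality_report(records)
print(summary)
print(report)
```

In a real project a data frame library would usually do this work; the point here is only that the findings (record counts, missing values, duplicates) get written down before modeling starts.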

2.3 Data Preparation

This phase, which is often referred to as “data munging”, prepares the final data set(s) for modeling. It has five tasks:

  • Select data: Determine which data sets will be used and document reasons for inclusion/exclusion.

  • Clean data: Often this is the lengthiest task. Without it, you’ll likely fall victim to garbage-in, garbage-out. A common practice during this task is to correct, impute, or remove erroneous values.

  • Construct data: Derive new attributes that will be helpful. For example, derive someone’s body mass index from height and weight fields.

  • Integrate data: Create new data sets by combining data from multiple sources.

  • Format data: Re-format data as necessary. For example, you might convert string values that store numbers to numeric values so that you can perform mathematical operations.
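
As a minimal sketch of the clean, construct, and format tasks, the snippet below converts numeric strings to floats, drops rows it cannot use, and derives a body mass index. The input rows, the field names, and the choice to drop (rather than impute) bad rows are assumptions made for illustration.

```python
def to_float(value):
    """Format step: convert a numeric string to float, or None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def prepare(rows):
    """Clean and format each row, then construct a BMI attribute."""
    prepared = []
    for row in rows:
        height_cm = to_float(row.get("height_cm"))
        weight_kg = to_float(row.get("weight_kg"))
        # Clean step: this sketch simply drops rows with missing or non-positive values.
        if height_cm is None or weight_kg is None:
            continue
        if height_cm <= 0 or weight_kg <= 0:
            continue
        # Construct step: BMI = weight (kg) / height (m) squared.
        height_m = height_cm / 100
        bmi = round(weight_kg / height_m ** 2, 1)
        prepared.append(
            {"id": row["id"], "height_cm": height_cm, "weight_kg": weight_kg, "bmi": bmi}
        )
    return prepared


# Hypothetical raw input in which numbers arrive as strings
raw = [
    {"id": 1, "height_cm": "170", "weight_kg": "65"},
    {"id": 2, "height_cm": "n/a", "weight_kg": "80"},
    {"id": 3, "height_cm": "200", "weight_kg": "100"},
]
clean = prepare(raw)
print(clean)
```

Whether to drop, impute, or correct a bad value is a project decision; whichever is chosen, it is worth documenting alongside the reasons for including or excluding each data set.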

2.4 Modeling

Here you’ll likely build and assess various models based on several different modeling techniques. Typical tasks, both supervised and unsupervised, include classification and probability estimation, regression, similarity matching, clustering, co-occurrence grouping, profiling, link prediction, data reduction, and causal modeling.

This phase has four tasks:

  • Select modeling techniques: Determine which algorithms to try (e.g. regression, neural net).
  • Generate test design: Depending on your modeling approach, you might need to split the data into training, test, and validation sets.
  • Build model: As glamorous as this might sound, this might just be executing a few lines of code like “reg = LinearRegression().fit(X, y)”.
  • Assess model: Generally, multiple models are competing against each other, and the data scientist needs to interpret the model results based on domain knowledge, the pre-defined success criteria, and the test design.
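
The test design, build, and assess tasks can be sketched end to end as follows. To keep the example runnable without extra installs, it fits a one-variable least-squares line by hand instead of using a library such as scikit-learn. The data, the helper names, and the 25% test fraction are assumptions chosen for illustration.

```python
import random


def train_test_split(pairs, test_fraction=0.25, seed=0):
    """Test design: shuffle (x, y) pairs reproducibly and split into train and test."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(pairs):
    """Build model: ordinary least squares for y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(pairs, slope, intercept):
    """Assess model: average absolute prediction error on the given pairs."""
    return sum(abs(y - (slope * x + intercept)) for x, y in pairs) / len(pairs)


# Hypothetical noise-free data generated from y = 2x + 1
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
train, test = train_test_split(data)
slope, intercept = fit_line(train)
mae = mean_absolute_error(test, slope, intercept)
print(slope, intercept, mae)
```

The error is measured only on the held-out test pairs, which is what lets competing models be compared fairly against the pre-defined success criteria.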

2.5 Evaluation

Whereas the Assess Model task of the Modeling phase focuses on technical model assessment, the Evaluation phase looks more broadly at which model best meets the business needs and what to do next. This phase has three tasks:

  • Evaluate results: Do the models meet the business success criteria? Which one(s) should we approve for the business?
  • Review process: Review the work accomplished. Was anything overlooked? Were all steps properly executed? Summarize findings and correct anything if needed.
  • Determine next steps: Based on the previous two tasks, determine whether to proceed to deployment, iterate further, or initiate new projects.
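
One way to make "evaluate results" concrete is to check each candidate model against explicit business success criteria. The sketch below is purely illustrative: the candidate models, their metric values, and the thresholds are made up for the example.

```python
# Hypothetical candidate models with a technical metric and an operational one
candidates = [
    {"name": "linear", "mae": 4.2, "latency_ms": 5},
    {"name": "forest", "mae": 3.1, "latency_ms": 40},
    {"name": "deep_net", "mae": 2.9, "latency_ms": 250},
]

# Hypothetical business success criteria agreed during Business Understanding
criteria = {"max_mae": 3.5, "max_latency_ms": 100}


def meets_criteria(model, limits):
    """True if the model satisfies every business limit."""
    return (
        model["mae"] <= limits["max_mae"]
        and model["latency_ms"] <= limits["max_latency_ms"]
    )


def evaluate(models, limits):
    """Return the approved models (lowest error first) and a suggested next step."""
    approved = sorted(
        (m for m in models if meets_criteria(m, limits)),
        key=lambda m: m["mae"],
    )
    next_step = "proceed to deployment" if approved else "iterate further"
    return approved, next_step


approved, next_step = evaluate(candidates, criteria)
print([m["name"] for m in approved], next_step)
```

Here the model with the lowest error is rejected because it is too slow for the assumed latency limit, which illustrates the difference between technical assessment and business evaluation.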

2.6 Deployment

A model is not particularly useful unless the customer can access its results. The complexity of this phase varies widely. This final phase has four tasks:

  • Plan deployment: Develop and document a plan for deploying the model.
  • Plan monitoring and maintenance: Develop a thorough monitoring and maintenance plan to avoid issues during the operational phase (or post-project phase) of a model.
  • Produce final report: The project team documents a summary of the project which might include a final presentation of data mining results.
  • Review project: Conduct a project retrospective about what went well, what could have been better, and how to improve in the future.
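
A monitoring plan often includes a check that incoming data still resembles the data the model was built on. The following is a minimal sketch of one such check, assuming a single numeric feature and a simple rule on the shift of the mean. The function name `drift_alert`, the threshold of 3, and the sample values are assumptions for illustration, not a prescribed CRISP-DM method.

```python
from statistics import mean, stdev


def drift_alert(baseline, recent, threshold=3.0):
    """Flag drift when the mean of `recent` lies more than `threshold`
    standard errors (estimated from `baseline`) from the baseline mean."""
    base_mean = mean(baseline)
    standard_error = stdev(baseline) / len(recent) ** 0.5
    if standard_error == 0:
        # Constant baseline: any change in the mean counts as drift.
        return mean(recent) != base_mean
    z = abs(mean(recent) - base_mean) / standard_error
    return z > threshold


# Hypothetical feature values seen at training time and in production
baseline = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
recent_ok = [10, 9, 11, 10]
recent_shifted = [15, 16, 14, 15]

print(drift_alert(baseline, recent_ok))
print(drift_alert(baseline, recent_shifted))
```

A production setup would track many features and the model's own error over time, but even a simple rule like this gives the maintenance plan a concrete trigger for review or retraining.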