Eng: Applications of Data Analysis & KDD Process & CRISP-DM Methodology

Database Analysis & Decision Support

  • Market analysis & management

      • Target marketing, customer relationship management, market basket analysis, cross-selling, market segmentation

  • Risk analysis and management

      • Forecasting, customer retention, improved underwriting, quality control, competitive analysis

  • Fraud detection and management

Other applications

  • Text mining and web analysis

  • Intelligent query answering


Market Analysis & Management

Data sources

  • credit card transactions, loyalty cards, discount coupons, customer complaint calls, social media, plus (public) lifestyle studies

Target marketing

  • find clusters of 'model' customers who share the same characteristics: interests, income level, spending habits, etc.

Determine customer purchasing patterns over time

  • conversion of a single to a joint bank account: marriage, ...

Cross-market analysis

  • associations / correlations between product sales

  • prediction based on the association information

Customer profiling

  • data analytics can tell you what types of customers buy what products (clustering or classification)

Identifying customer requirements

  • identify the best products for different customers

  • use prediction to find what factors will attract new customers

Provide summary information

  • Various multidimensional summary reports

  • Statistical summary information (mean, variance, ...)
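
The associations between product sales mentioned under cross-market analysis are usually measured with support, confidence and lift. The following is a minimal sketch of those three measures for pairs of products; the basket data and the function name `pair_rules` are made up for illustration and do not come from the notes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: each set is the products in one transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def pair_rules(baskets):
    """Return {(a, b): (support, confidence, lift)} for every rule a -> b."""
    n = len(baskets)
    # How many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # How many baskets contain each unordered pair of items.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (x, y), count in pair_counts.items():
        for a, b in ((x, y), (y, x)):
            support = count / n                       # P(a and b)
            confidence = count / item_counts[a]       # P(b | a)
            lift = confidence / (item_counts[b] / n)  # P(b | a) / P(b)
            rules[(a, b)] = (support, confidence, lift)
    return rules


rules = pair_rules(baskets)
print(rules[("butter", "bread")])
```

A lift above 1 suggests the two products sell together more often than chance would predict, which is the kind of association that cross-selling builds on.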


Corporate Analysis & Risk Management

Finance planning and asset evaluation

  • Cash flow analysis and prediction

  • Contingent claim analysis to evaluate assets

  • Cross-sectional and time series analysis (financial-ratio, trend analysis, ...)

Resource planning

  • summarise and compare the resources and spending

Competition

  • Monitor (predict) competitors and market directions

  • group customers into classes and apply a class-based pricing procedure

  • set pricing strategy in a highly competitive market
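
As one simple reading of the trend analysis and cash flow prediction items above, a straight line can be fitted to past periods and extended one step ahead. This is only a sketch: the figures are invented, the helper names `fit_trend` and `forecast` are hypothetical, and real cash flows rarely follow a straight line.

```python
def fit_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; needs >= 2 values.

    Returns (slope, intercept).
    """
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return slope, mean_v - slope * mean_t


def forecast(values, steps_ahead):
    """Extend the fitted line steps_ahead periods past the last observation."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


# Hypothetical cash flow per period.
cash_flow = [100.0, 110.0, 120.0, 130.0]
next_period = forecast(cash_flow, 1)
print(next_period)
```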


Fraud Detection & Management

Applications

  • health care, retail, credit card services, telecommunications (phone card fraud), ...

Approach

  • use historical data to build models of fraudulent behaviour, then use data mining to help identify similar instances.

Examples

  • Auto insurance: detect groups of people who stage accidents to collect on insurance

  • Money laundering: detect suspicious money transactions

  • Medical insurance: detect professional patients, rings of doctors and rings of references
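
The approach above (build a model from historical data, then look for similar instances) can be sketched with a nearest-centroid rule: average the known fraudulent and known normal cases, and flag a new case if it sits closer to the fraud average. The two features, the records and the function names are assumptions chosen for illustration, not a method given in the notes.

```python
import math

# Hypothetical history: (amount, transactions_per_day) with a fraud label.
history = [
    ((20.0, 1.0), False),
    ((35.0, 2.0), False),
    ((25.0, 1.0), False),
    ((900.0, 9.0), True),
    ((1100.0, 11.0), True),
]


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(column) / len(points) for column in zip(*points))


def build_model(history):
    """Summarise historical behaviour as one centroid per class."""
    fraud = [x for x, is_fraud in history if is_fraud]
    normal = [x for x, is_fraud in history if not is_fraud]
    return {"fraud": centroid(fraud), "normal": centroid(normal)}


def looks_fraudulent(model, x):
    """Flag x if it is closer to the fraud centroid than to the normal one."""
    # Features are left unscaled here; real data would normally be
    # standardised first so that one feature does not dominate the distance.
    return math.dist(x, model["fraud"]) < math.dist(x, model["normal"])


model = build_model(history)
print(looks_fraudulent(model, (950.0, 10.0)))
print(looks_fraudulent(model, (30.0, 1.0)))
```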


Other applications

  • Sports

      • Moneyball

  • Astronomy

      • JPL and the Palomar Observatory discovered 22 quasars using data analytics


KDD process: Knowledge Discovery in Databases

Iterative process, not waterfall

Learn the application domain (prior knowledge & goals)

Create target data set: data selection

Data cleaning and preprocessing

Data reduction and transformation

  • Find useful features, dimensionality/variable reduction, invariant representation

Choose functions of data mining: the 'data mining problem'

  • Summarisation, classification, regression, association, clustering

Choose the data mining algorithms

Data mining: find patterns of interest

Pattern evaluation and knowledge presentation

  • Visualisation, transformation, remove redundant patterns, ...

Use of discovered knowledge
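
The KDD steps above can be read as a chain of small functions, each feeding the next. The sketch below runs one pass over a toy data set; the records, field names, age bands and the `min_count` threshold are all invented for illustration, and in practice the chain is revisited iteratively rather than run once.

```python
from collections import Counter

# Hypothetical raw records.
raw = [
    {"customer": "a", "age": "34", "segment": "retail"},
    {"customer": "b", "age": "", "segment": "retail"},
    {"customer": "c", "age": "51", "segment": "business"},
    {"customer": "d", "age": "29", "segment": "retail"},
]


def select(records):
    """Create the target data set: keep only the fields of interest."""
    return [{"age": r["age"], "segment": r["segment"]} for r in records]


def clean(records):
    """Data cleaning: drop records with a missing age."""
    return [r for r in records if r["age"].strip()]


def transform(records):
    """Data reduction: replace the exact age with a coarse band."""
    return [
        {"band": "under40" if int(r["age"]) < 40 else "40plus",
         "segment": r["segment"]}
        for r in records
    ]


def mine(records):
    """Data mining (summarisation): count records per (band, segment)."""
    return Counter((r["band"], r["segment"]) for r in records)


def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns seen at least min_count times."""
    return {p: c for p, c in patterns.items() if c >= min_count}


patterns = evaluate(mine(transform(clean(select(raw)))), min_count=2)
print(patterns)
```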


CRISP-DM methodology: CRoss-Industry Standard Process for Data Mining

Business Understanding

  • Determine business objectives

  • Assess situation

  • Determine data mining goals

  • Produce project plan

Data Understanding

  • Collect initial data

  • Describe data

      • Data description report

  • Explore data

      • What is immediately obvious?

  • Verify data quality

      • What problems are there with the data? Sometimes called a data audit

Data Preparation

  • Select data

      • What pieces of data are needed and why?

  • Clean data

      • Deal with the data quality problems found earlier. Maybe 60+% of the effort.

  • Construct data

      • May need to create new instances and/or attributes.

  • Integrate data

      • May need to combine data from different tables or records into one table or record

  • Format data

      • May need to change the format of the data, e.g. dates, remove illegal characters, ...
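
The clean, construct, integrate and format tasks above might look like this on a small example. The two tables, the accepted date formats and the `is_large` threshold are assumptions made up for this sketch.

```python
from datetime import datetime

# Hypothetical source tables.
customers = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
orders = [
    {"customer_id": 1, "date": "03/01/2024", "amount": "19.99"},
    {"customer_id": 2, "date": "2024-01-05", "amount": "n/a"},
    {"customer_id": 1, "date": "2024-01-09", "amount": "5.00"},
]


def parse_date(text):
    """Format data: accept two assumed input formats, emit ISO dates."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def prepare(orders, customers):
    """Return one cleaned, joined, consistently formatted row per valid order."""
    rows = []
    for order in orders:
        try:
            amount = float(order["amount"])  # clean: skip unparseable amounts
        except ValueError:
            continue
        rows.append({
            "name": customers[order["customer_id"]]["name"],  # integrate
            "date": parse_date(order["date"]),                # format
            "amount": amount,
            "is_large": amount >= 10.0,                       # construct
        })
    return rows


prepared = prepare(orders, customers)
for row in prepared:
    print(row)
```

Dropping the bad row is only one cleaning choice; depending on the data quality findings, imputing or correcting the value may be more appropriate.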

Modelling

  • Select the modelling techniques

      • Considering the assumptions each technique makes

  • Generate test design

      • Work out how you're going to test the model quality and validity

  • Build the model

      • Run the modelling tool on the prepared data to create a model

  • Assess the model

      • Judge the success of the model based on its accuracy, generality, the test design and the success criteria, possibly with assistance from domain experts
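
One common test design is k-fold cross-validation, where every record is held out for testing exactly once. The notes do not prescribe a particular design, so the sketch below is just one option; it assumes the records have already been shuffled, and the function names are hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n records."""
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print(len(train), test)
```

The model would be built on each `train` list and scored with `accuracy` on the matching `test` list; averaging the scores gives an estimate of how well it generalises.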

Evaluation

  • Evaluate results

      • Based on the original business objectives (as opposed to accuracy and generality in the modelling phase)

  • Review process

      • Quality assurance: did the project miss any important factor or task in the business problem?

  • Determine next steps

      • Do you need to do something else, or can you move to deployment?

Deployment

  • Plan deployment

      • Develop a strategy for getting the insights (and possibly the model) into the business

  • Plan monitoring and maintenance

      • How do you maintain the deployed model?

  • Produce final report

      • Describing all the previous steps, and possibly a presentation to the customer

  • Review project

      • Reflect on the entire project. What worked? What didn't? Hints for the future?


Feature Types & their Operations

Data mining methodology
