Database Analysis & Decision Support
  - Market analysis & management
    - Target marketing, customer relationship management, market basket analysis, cross-selling, market segmentation
  - Risk analysis and management
    - Forecasting, customer retention, improved underwriting, quality control, competitive analysis
  - Fraud detection and management
Other applications
  - Text mining and web analysis
  - Intelligent query answering
Market Analysis & Management
Data sources
  - credit card transactions, loyalty cards, discount coupons, customer complaint calls, social media, plus (public) lifestyle studies
Target marketing
  - find clusters of 'model' customers who share the same characteristics: interests, income level, spending habits, etc.
Determine customer purchasing patterns over time
  - conversion of a single to a joint bank account: marriage, ...
Cross-market analysis
  - associations / correlations between product sales
  - prediction based on the association information
Customer profiling
  - data analytics can tell you what types of customers buy what products (clustering or classification)
Identifying customer requirements
  - identify the best products for different customers
  - use prediction to find what factors will attract new customers
Provide summary information
  - Various multidimensional summary reports
  - Statistical summary information (mean, variance, ...)
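The cross-market analysis item above (associations between product sales) can be illustrated with a minimal sketch that computes support and confidence for pairs of items. The baskets, the `pair_rules` helper, and the `min_support` value are all made up for illustration. A real project would more likely use an established algorithm such as Apriori on real transaction data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first; names break ties so the order is stable.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

Support measures how often the pair occurs at all, while confidence measures how often the consequent appears given the antecedent. A rule with high confidence but very low support is usually of little use for prediction.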
Corporate Analysis & Risk Management
Finance planning and asset evaluation
  - Cash flow analysis and prediction
  - Contingent claim analysis to evaluate assets
  - Cross-sectional and time series analysis (financial-ratio, trend analysis, ...)
Resource planning
  - summarise and compare the resources and spending
Competition
  - Monitor (predict) competitors and market directions
  - group customers into classes and apply a class-based pricing procedure
  - set pricing strategy in a highly competitive market
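As one possible reading of "trend analysis" and "cash flow prediction" above, the sketch below fits a straight line to a short cash-flow series by ordinary least squares and extrapolates one period ahead. The figures and the `linear_trend` helper are hypothetical. A linear trend is only the simplest option and ignores seasonality.

```python
from statistics import mean

# Hypothetical monthly net cash flow (in thousands); the values are made up.
cash_flow = [120.0, 125.0, 123.0, 130.0, 134.0, 133.0]


def linear_trend(values):
    """Fit y = intercept + slope * t by least squares, with t = 0, 1, 2, ..."""
    t = list(range(len(values)))
    t_bar, y_bar = mean(t), mean(values)
    covariance = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, values))
    variance = sum((ti - t_bar) ** 2 for ti in t)
    slope = covariance / variance
    return y_bar - slope * t_bar, slope


intercept, slope = linear_trend(cash_flow)
# Extrapolate the fitted line one step beyond the observed series.
next_month = intercept + slope * len(cash_flow)
print(f"trend per month: {slope:.2f}, next month estimate: {next_month:.1f}")
```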
Fraud Detection & Management
Applications
  - health care, retail, credit card services, telecommunications (phone card fraud), ...
Approach
  - use historical data to build models of fraudulent behaviour and use data mining to help identify similar instances.
Examples
  - Auto insurance: detect groups of people who stage accidents to collect on insurance
  - Money laundering: detect suspicious money transactions
  - Medical insurance: detect professional patients and rings of doctors and rings of references
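The approach above (build a model from historical data, then identify similar instances) can be sketched with a nearest-centroid classifier. Everything here is an assumption for illustration: the two features, the labelled history, and the helper names `fit_centroids` and `classify`. Real fraud models need many more features, feature scaling, and far more data.

```python
from math import dist

# Hypothetical labelled history: (amount, transactions in the last hour).
history = [
    ((25.0, 1), "legitimate"),
    ((40.0, 2), "legitimate"),
    ((60.0, 1), "legitimate"),
    ((900.0, 6), "fraud"),
    ((1200.0, 8), "fraud"),
]


def fit_centroids(rows):
    """Model building: average the feature vectors of each class."""
    grouped = {}
    for features, label in rows:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(centroids, features):
    """Assign the label whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = fit_centroids(history)
print(classify(centroids, (1000.0, 7)))  # resembles the historical fraud cases
print(classify(centroids, (35.0, 1)))    # resembles the legitimate cases
```

Because the features are not scaled, the amount dominates the distance in this sketch. That is acceptable for a toy example but would need addressing in practice.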
Other applications
  - Sports
    - Moneyball
  - Astronomy
    - JPL and the Palomar Observatory discovered 22 quasars using data analytics
KDD process: Knowledge Discovery in Databases
Learn the application domain (prior knowledge & goals)
Create target data set: data selection
Data cleaning and preprocessing
Data reduction and transformation
  - Find useful features, dimensionality/variable reduction, invariant representation
Choose functions of data mining: the 'data mining problem'
  - Summarisation, classification, regression, association, clustering
Choose the data mining algorithms
Data mining: search for patterns of interest
Pattern evaluation and knowledge presentation
  - Visualisation, transformation, removing redundant patterns, ...
Use of discovered knowledge
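The KDD steps above can be read as a chain of functions, each consuming the output of the previous one. The sketch below is one hypothetical arrangement: the records, field names, and the functions `select`, `clean`, `transform`, and `mine` are invented for illustration, and the "mining" step is a simple summarisation, not a full algorithm.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
raw = [
    {"customer": "a", "age": 34, "spend": 120.0, "region": "north"},
    {"customer": "b", "age": None, "spend": 80.0, "region": "south"},
    {"customer": "c", "age": 51, "spend": 300.0, "region": "north"},
    {"customer": "d", "age": 29, "spend": 60.0, "region": "south"},
]


def select(records, fields):
    """Data selection: keep only the fields relevant to the goal."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records):
    """Transformation: derive a binary feature from a numeric one."""
    threshold = mean(r["spend"] for r in records)
    return [{**r, "high_spender": r["spend"] > threshold} for r in records]


def mine(records):
    """Data mining (summarisation): mean age per spending group."""
    groups = {}
    for r in records:
        groups.setdefault(r["high_spender"], []).append(r["age"])
    return {group: mean(ages) for group, ages in groups.items()}


target = select(raw, ["age", "spend"])
patterns = mine(transform(clean(target)))
print(patterns)
```

Pattern evaluation and use of the discovered knowledge are left out because they depend on the application domain and its goals.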
CRISP-DM methodology: CRoss-Industry Standard Process for Data Mining
Business Understanding
  - Determine business objectives
  - Assess situation
  - Determine data mining goals
  - Produce project plan
Data Understanding
  - Collect initial data
  - Describe data
    - Data description report
  - Explore data
    - What is immediately obvious?
  - Verify data quality
    - What problems are there with the data? Sometimes called a data audit
Data Preparation
  - Select data
    - What pieces of data are needed and why?
  - Clean data
    - Deal with the data quality problems found earlier. This may take 60% or more of the effort.
  - Construct data
    - May need to create new instances and/or attributes.
  - Integrate data
    - May need to combine data from different tables or records into one table or record
  - Format data
    - May need to change the format of the data, e.g. dates, removing illegal characters, ...
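The clean, construct, integrate and format tasks above can be sketched on two tiny tables. The tables, the accepted date layouts, the set of "legal" characters, and the `large_order` attribute are all assumptions chosen for illustration.

```python
import re
from datetime import datetime

# Hypothetical source tables; names and values are made up for illustration.
customers = {"c1": {"name": "Ann  Lee\t"}, "c2": {"name": "Bob#Ray"}}
orders = [
    {"customer_id": "c1", "date": "03/01/2024", "amount": "19.90"},
    {"customer_id": "c2", "date": "2024-01-05", "amount": "5.00"},
]


def clean_name(text):
    """Clean: replace anything but letters and spaces, collapse whitespace."""
    text = re.sub(r"[^A-Za-z ]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def parse_date(text):
    """Format: accept two assumed layouts and return an ISO 8601 date."""
    for layout in ("%d/%m/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, layout).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def prepare(orders, customers):
    """Integrate both tables into one record per order."""
    rows = []
    for order in orders:
        amount = float(order["amount"])
        rows.append({
            "name": clean_name(customers[order["customer_id"]]["name"]),
            "date": parse_date(order["date"]),
            "amount": amount,
            "large_order": amount >= 10.0,  # constructed attribute
        })
    return rows


prepared = prepare(orders, customers)
for row in prepared:
    print(row)
```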
Modelling
  - Select the modelling techniques
    - Considering the assumptions each technique makes
  - Generate test design
    - Work out how you're going to test the model quality and validity
  - Build the model
    - Run the modelling tool on the prepared data to create a model
  - Assess the model
    - Judge the success of the model, based on its accuracy, generality, the test design and the success criteria, possibly with assistance from domain experts
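One common test design is a holdout split: build the model on one part of the data and assess it on the rest, alongside a trivial baseline. The sketch below assumes a made-up one-feature data set and two hypothetical models, a majority-class baseline and a single-threshold "stump". It is only one of several possible designs; cross-validation is another.

```python
import random
from collections import Counter

# Hypothetical data: (feature value, label), where the label is x >= 50.
rows = [(x, x >= 50) for x in range(0, 100, 5)]


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Test design: shuffle a copy of the rows and split into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, rows):
    """Fraction of rows whose predicted label equals the true label."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


def fit_majority(train):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def fit_stump(train):
    """Model: pick the threshold t that maximises training accuracy."""
    candidates = sorted({x for x, _ in train})
    best = max(candidates, key=lambda t: accuracy(lambda x: x >= t, train))
    return lambda x: x >= best


train, test = holdout_split(rows)
baseline, stump = fit_majority(train), fit_stump(train)
print("baseline test accuracy:", accuracy(baseline, test))
print("stump test accuracy:", accuracy(stump, test))
```

Comparing against the baseline on held-out data gives a rough check on generality: a model that only matches the baseline on the test set has learned little, whatever its training accuracy.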
Evaluation
  - Evaluate results
    - Based on the original business objectives (as opposed to accuracy and generality in the modelling phase)
  - Review process
    - Quality assurance: did the project miss any important factor or task in the business problem?
  - Determine next steps
    - Do you need to do something else, or can you move to deployment?
Deployment
  - Plan deployment
    - Develop a strategy for getting the insights (and possibly model) into the business
  - Plan monitoring and maintenance
    - How do you maintain the deployed model?
  - Produce final report
    - Describing all the previous steps and possibly a presentation to the customer
  - Review project
    - Reflect on the entire project. What worked? What didn't? Hints for the future?
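For the "Plan monitoring and maintenance" task above, one simple option is to track the deployed model's accuracy over a sliding window of recent predictions and flag it for review when accuracy drops. The `AccuracyMonitor` class, the window size, and the threshold are hypothetical choices for illustration; the sketch also assumes that true outcomes eventually become available.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions and flag degradation."""

    def __init__(self, window=50, min_accuracy=0.8):
        self.outcomes = deque(maxlen=window)  # old entries drop off automatically
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether one prediction matched the observed outcome."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        """True once the window is full and accuracy is below the threshold."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.min_accuracy


monitor = AccuracyMonitor(window=4, min_accuracy=0.75)
for predicted, actual in [("a", "a"), ("b", "b"), ("a", "a"), ("b", "b")]:
    monitor.record(predicted, actual)
print(monitor.needs_attention())  # window is full and every prediction matched

for predicted, actual in [("a", "b"), ("b", "a")]:
    monitor.record(predicted, actual)
print(monitor.accuracy(), monitor.needs_attention())
```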
Feature Types & their Operations