2.1 Wavelet Transform
Question 1
Step 1: Compute the pairwise averages of adjacent input samples, divided by √2 (the normalized Haar convention used below).
Step 2: Compute the pairwise differences of the same adjacent samples, also divided by √2 (the detail coefficients).
Step 3: Treat the averages as the new input sequence and repeat Steps 1 and 2 until only a single average remains.
Question 2
With the normalization by √2, each pass maps a pair (a, b) to (a+b)/√2 and (a−b)/√2; the differences are kept and the averages are processed further:

Original signal | 1 | 4 | 2 | 3 | -2 | -1 | 2 | 1 |
---|---|---|---|---|---|---|---|---|
1st iteration | 3.54 | 3.54 | -2.12 | 2.12 | -2.12 | -0.71 | -0.71 | 0.71 |
2nd iteration | 5.00 | 0.00 | 0.00 | -3.00 | | | | |
3rd iteration | 3.54 | 3.54 | | | | | | |
Haar wavelet transform | 3.54 | 3.54 | 0.00 | -3.00 | -2.12 | -0.71 | -0.71 | 0.71 |
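A minimal Python sketch of this procedure (assuming the normalized convention above) reproduces the final row:

```python
import numpy as np

def haar_transform(signal):
    """Normalized 1-D Haar wavelet transform."""
    s = np.asarray(signal, dtype=float)
    details = []
    while len(s) > 1:
        avgs = (s[0::2] + s[1::2]) / np.sqrt(2)   # pairwise averages
        diffs = (s[0::2] - s[1::2]) / np.sqrt(2)  # pairwise differences
        details = list(diffs) + details           # coarser levels go in front
        s = avgs                                  # recurse on the averages
    return list(s) + details

print(np.round(haar_transform([1, 4, 2, 3, -2, -1, 2, 1]), 2))
# expected: [ 3.54  3.54  0.   -3.   -2.12 -0.71 -0.71  0.71]
```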
2.2 Principal Components Analysis
The following script computes the covariance matrix, its eigendecomposition, and the proportion of variance explained by the first two components:

```python
import numpy as np

dimension = 3
data_num = 7
data = [[-1, -2, -3, 1, 2, 3, 1], [-1, -1, -2, 1, 1, 2, 2],
        [1, 4, -2, 1, 2, 1, 4]]
data_mean = [np.mean(data[i]) for i in range(dimension)]
# center the data by subtracting each variable's mean
for i in range(dimension):
    for j in range(len(data[0])):
        data[i][j] -= data_mean[i]
# calculate the covariance matrix from the definition:
# cov(i, j) = sum_k x_ik * x_jk / (n - 1) on the centered data
cov = []
for i in range(dimension):
    for j in range(dimension):
        sum_of_products = 0
        for k in range(data_num):
            sum_of_products += data[i][k] * data[j][k]
        cov.append(sum_of_products / (data_num - 1))
cov_matrix = np.asarray(cov).reshape(dimension, dimension)
print('covariance matrix is:')
print(cov_matrix)
# calculate eigenvalues and eigenvectors with numpy
eig_val, eig_vec = np.linalg.eig(cov_matrix)
for i in range(dimension):
    print('eigenvalue', i, eig_val[i], ' corresponding eigenvector:', eig_vec[:, i])
# sort eigenvalues in descending order
eig_val_sorted = np.sort(eig_val)[::-1]
# proportion = sum of eigenvalues of the first k components / sum of all n eigenvalues
proportion = (eig_val_sorted[0] + eig_val_sorted[1]) / np.sum(eig_val_sorted)
print('proportion of total population variance explained by the first two components: %.4f' % proportion)
```
Running the script prints the covariance matrix, the three eigenvalue/eigenvector pairs, and the proportion of total variance explained by the first two principal components.
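As a sanity check (not part of the original script), NumPy's built-in covariance routine gives the same matrix, since np.cov also treats each row as one variable and normalizes by n − 1:

```python
# `data` is the centered list of rows from the script above;
# np.cov centers again internally, which leaves the result unchanged
print(np.cov(data))
```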
3 Pattern Discovery
Question 1
Step 1: Discretization
Web pages
- Sorted data for web pages: 7, 9, 9, 11, 13, 14, 18, 35, 36, 45
- Partition into 4 equal-frequency (equal-depth) bins:
- web1: 7, 9, 9
- web2: 11, 13
- web3: 14, 18, 35
- web4: 36, 45
Session length
- Find the min and max session length: min = 267, max = 2017
- Calculate the interval width: width = (2017 - 267) / 3 ≈ 583.3
- Partition into 3 equal-width bins:
- len1 (267~850.3): 267, 598, 672
- len2 (850.3~1433.7): 898, 998, 1213, 1345
- len3 (1433.7~2017): 1543, 1702, 2017
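The cut points can be checked with a short numpy sketch (equal-frequency binning, as used for the web pages, would use quantiles, e.g. np.quantile, instead):

```python
import numpy as np

length = np.array([267, 598, 672, 898, 998, 1213, 1345, 1543, 1702, 2017])

# equal-width binning: 3 bins of equal width between min and max
edges = np.linspace(length.min(), length.max(), 4)
print(edges.round(1))                    # [ 267.   850.3  1433.7  2017. ]
print(np.digitize(length, edges[1:-1]))  # 0 = len1, 1 = len2, 2 = len3
```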
Transactions table after discretization
Session ID | Country | Session Length | Web Pages | Buy |
---|---|---|---|---|
1 | NA | len2 | web1 | no |
2 | AS | len3 | web2 | yes |
3 | EU | len1 | web3 | yes |
4 | EU | len2 | web4 | no |
5 | NA | len1 | web1 | no |
6 | AS | len2 | web3 | yes |
7 | AS | len3 | web3 | yes |
8 | AS | len1 | web1 | no |
9 | NA | len3 | web2 | no |
10 | EU | len2 | web4 | yes |
Step 2: Apriori algorithm
1st scan
generate length 1 candidate itemsets
itemset | sup |
---|---|
NA | 0.3 |
AS | 0.4 |
EU | 0.3 |
len1 | 0.3 |
len2 | 0.4 |
len3 | 0.3 |
web1 | 0.3 |
web2 | 0.2 |
web3 | 0.3 |
web4 | 0.2 |
yes | 0.5 |
no | 0.5 |
prune infrequent itemsets (support < min_sup = 0.3)
itemset | sup |
---|---|
NA | 0.3 |
AS | 0.4 |
EU | 0.3 |
len1 | 0.3 |
len2 | 0.4 |
len3 | 0.3 |
web1 | 0.3 |
web3 | 0.3 |
yes | 0.5 |
no | 0.5 |
2nd scan
generate length 2 candidate itemsets
itemset | sup | itemset | sup |
---|---|---|---|
NA,len1 | 0.1 | EU,yes | 0.2 |
NA,len2 | 0.1 | EU,no | 0.1 |
NA,len3 | 0.1 | len1,web1 | 0.2 |
AS,len1 | 0.1 | len2,web1 | 0.1 |
AS,len2 | 0.1 | len3,web1 | 0 |
AS,len3 | 0.2 | len1,web3 | 0.1 |
EU,len1 | 0.1 | len2,web3 | 0.1 |
EU,len2 | 0.2 | len3,web3 | 0.1 |
EU,len3 | 0 | len1,yes | 0.1 |
NA,web1 | 0.2 | len2,yes | 0.2 |
NA,web3 | 0 | len3,yes | 0.2 |
AS,web1 | 0.1 | len1,no | 0.2 |
AS,web3 | 0.2 | len2,no | 0.2 |
EU,web1 | 0 | len3,no | 0.1 |
EU,web3 | 0.1 | web1,yes | 0 |
NA,yes | 0 | web1,no | 0.3 |
NA,no | 0.3 | web3,yes | 0.3 |
AS,yes | 0.3 | web3,no | 0 |
AS,no | 0.1 | | |
prune infrequent itemsets (support < 0.3)
itemset | sup |
---|---|
NA,no | 0.3 |
AS,yes | 0.3 |
web1,no | 0.3 |
web3,yes | 0.3 |
3rd scan
generate length 3 candidate itemsets
No length-3 candidates survive: joining the frequent 2-itemsets could only produce {NA, web1, no} and {AS, web3, yes}, but each contains an infrequent 2-subset ({NA, web1} and {AS, web3} both have support 0.2), so Apriori pruning removes them.
Step 3: Get frequent itemsets
{web3} (support = 0.3)
{web1} (support = 0.3)
{len3} (support = 0.3)
{len1} (support = 0.3)
{NA} (support = 0.3)
{EU} (support = 0.3)
{len2} (support = 0.4)
{AS} (support = 0.4)
{yes} (support = 0.5)
{no} (support = 0.5)
{web3,yes} (support = 0.3)
{web1,no} (support = 0.3)
{NA,no} (support = 0.3)
{AS,yes} (support = 0.3)
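A compact sketch of the whole procedure (the transactions hard-code the discretized table; generic join-and-prune candidate generation over frozensets stands in for the tabular bookkeeping above):

```python
from itertools import combinations

transactions = [
    {"NA", "len2", "web1", "no"},  {"AS", "len3", "web2", "yes"},
    {"EU", "len1", "web3", "yes"}, {"EU", "len2", "web4", "no"},
    {"NA", "len1", "web1", "no"},  {"AS", "len2", "web3", "yes"},
    {"AS", "len3", "web3", "yes"}, {"AS", "len1", "web1", "no"},
    {"NA", "len3", "web2", "no"},  {"EU", "len2", "web4", "yes"},
]
min_sup, n = 0.3, len(transactions)

def sup(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / n

# 1st scan: frequent single items
current = {frozenset([i]) for t in transactions for i in t}
current = {c for c in current if sup(c) >= min_sup}
all_frequent, k = {}, 1
while current:
    all_frequent.update({c: sup(c) for c in current})
    # join step: unite pairs of frequent k-itemsets into (k+1)-candidates;
    # prune step: keep only candidates whose every k-subset is frequent
    candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
    candidates = {c for c in candidates
                  if all(frozenset(s) in current for s in combinations(c, k))}
    current = {c for c in candidates if sup(c) >= min_sup}
    k += 1

# prints the 14 frequent itemsets listed above
for s in sorted(all_frequent, key=lambda s: (len(s), -all_frequent[s])):
    print(set(s), all_frequent[s])
```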
Question 2
Step 1: Discretization
Same as in Question 1.
Step 2: Deduce the ordered frequent items
item | sup |
---|---|
no | 0.5 |
yes | 0.5 |
AS | 0.4 |
len2 | 0.4 |
EU | 0.3 |
NA | 0.3 |
len1 | 0.3 |
len3 | 0.3 |
web1 | 0.3 |
web3 | 0.3 |
ID | Frequent Items |
---|---|
1 | no,len2,NA,web1 |
2 | yes,AS,len3 |
3 | yes,EU,len1,web3 |
4 | no,len2,EU |
5 | no,NA,len1,web1 |
6 | yes,AS,len2,web3 |
7 | yes,AS,len3,web3 |
8 | no,AS,len1,web1 |
9 | no,NA,len3 |
10 | yes,len2,EU |
Step 3: Construct the FP-tree
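A minimal construction sketch, fed with the ordered transactions from Step 2; printing the tree with indentation substitutes for a drawn figure (a full FP-growth implementation would also keep a header table of node-links per item, omitted here):

```python
class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build_fp_tree(ordered_transactions):
    """Insert each transaction as a path from the root, sharing common prefixes."""
    root = Node(None, None)
    for t in ordered_transactions:
        node = root
        for item in t:
            child = node.children.setdefault(item, Node(item, node))
            child.count += 1
            node = child
    return root

def show(node, depth=0):
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

ordered = [
    ["no", "len2", "NA", "web1"], ["yes", "AS", "len3"],
    ["yes", "EU", "len1", "web3"], ["no", "len2", "EU"],
    ["no", "NA", "len1", "web1"], ["yes", "AS", "len2", "web3"],
    ["yes", "AS", "len3", "web3"], ["no", "AS", "len1", "web1"],
    ["no", "NA", "len3"], ["yes", "len2", "EU"],
]
show(build_fp_tree(ordered))
```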
Step 4: Construct the conditional FP-tree for each item
For each item, bottom-up in the order above, collect its conditional pattern base (the prefix paths leading to that item, with their counts) and keep only items whose accumulated count reaches the minimum support count of 3.
Cond. FP-tree on "web3"
- pattern base {yes,EU,len1}: 1, {yes,AS,len2}: 1, {yes,AS,len3}: 1 → conditional FP-tree {yes: 3}
Cond. FP-tree on "web1"
- pattern base {no,len2,NA}: 1, {no,NA,len1}: 1, {no,AS,len1}: 1 → conditional FP-tree {no: 3}
Cond. FP-tree on "len3"
- pattern base {yes,AS}: 2, {no,NA}: 1 → no item reaches count 3; tree is empty
Cond. FP-tree on "len1"
- pattern base {yes,EU}: 1, {no,NA}: 1, {no,AS}: 1 → tree is empty
Cond. FP-tree on "NA"
- pattern base {no,len2}: 1, {no}: 2 → conditional FP-tree {no: 3}
Cond. FP-tree on "EU"
- pattern base {yes}: 1, {no,len2}: 1, {yes,len2}: 1 → tree is empty
Cond. FP-tree on "len2"
- pattern base {no}: 2, {yes,AS}: 1, {yes}: 1 → tree is empty
Cond. FP-tree on "AS"
- pattern base {yes}: 3, {no}: 1 → conditional FP-tree {yes: 3}
Cond. FP-tree on "yes"
- no prefix items in the tree → tree is empty
Cond. FP-tree on "no"
- no prefix items in the tree → tree is empty
Step 5: Determine frequent patterns
{web3} (support = 0.3)
{web3,yes} (support = 0.3)
{web1} (support = 0.3)
{web1,no} (support = 0.3)
{len3} (support = 0.3)
{len1} (support = 0.3)
{NA} (support = 0.3)
{NA,no} (support = 0.3)
{EU} (support = 0.3)
{len2} (support = 0.4)
{AS} (support = 0.4)
{AS,yes} (support = 0.3)
{yes} (support = 0.5)
{no} (support = 0.5)
Question 3
closed frequent patterns (no proper superset has the same support)
{web3,yes}
{web1,no}
{len3}
{len1}
{NA,no}
{EU}
{len2}
{AS}
{AS,yes}
{yes}
{no}
maximal frequent patterns (no proper superset is frequent)
{web3,yes}
{web1,no}
{len3}
{len1}
{NA,no}
{EU}
{len2}
{AS,yes}
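Both lists follow mechanically from the supports in Step 5. Assuming the all_frequent dictionary produced by the Apriori sketch above, they can be computed as:

```python
# closed: no proper superset has the same support
closed = [s for s in all_frequent
          if not any(s < t and all_frequent[t] == all_frequent[s]
                     for t in all_frequent)]
# maximal: no proper superset is frequent at all
maximal = [s for s in all_frequent if not any(s < t for t in all_frequent)]
print([set(s) for s in closed])
print([set(s) for s in maximal])
```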