Association analysis is used to discover relationships between the different products users buy, for example a strong correlation between product A and product B. It is commonly used in recommendation systems for physical stores and online e-commerce: if a customer buys product A, they are likely to also buy product B. By mining large volumes of sales data for product combinations that are frequently bought together, we can understand users' purchasing behavior, recommend related products alongside the ones being sold, and find new growth points for sales.
一胸蛛、數(shù)據(jù)來(lái)源及說(shuō)明
https://tianchi.aliyun.com/dataset/dataDetail?dataId=46&userId=1
From this dataset, this article selects 12,256,906 behavior records from 10,000 randomly sampled users, covering November 18, 2014 to December 18, 2014. Each row of the dataset is one user-behavior record, with 6 columns.
The columns are:
user_id: user identity
item_id: item ID
behavior_type: type of user behavior (click, favorite, add-to-cart, purchase, encoded as 1, 2, 3, 4)
user_geohash: geographic location (contains null values)
item_category: category ID (the category the item belongs to)
time: when the behavior occurred
The data layout is shown below. This analysis uses only the user_id and item_id fields.
二津肛、提出問(wèn)題
1章喉、 用戶(hù)購(gòu)買(mǎi)哪種商品次數(shù)最多
2、 用戶(hù)購(gòu)買(mǎi)的商品中,哪些商品組合關(guān)聯(lián)度高
三秸脱、數(shù)據(jù)清洗和構(gòu)建模型
關(guān)聯(lián)分析算法常用Apriori算法和FP-growth算法
(一) Apriori算法
1落包、Apriori算法基本原理
Apriori算法是經(jīng)典的挖掘頻繁項(xiàng)集和關(guān)聯(lián)規(guī)則的數(shù)據(jù)挖掘算法,可以從大規(guī)模數(shù)據(jù)集中尋找物品間的隱含關(guān)系摊唇。其核心思想是通過(guò)連接產(chǎn)生候選項(xiàng)及其支持度咐蝇,然后通過(guò)剪枝生成頻繁項(xiàng)集。
項(xiàng)集:包含0個(gè)或者多個(gè)項(xiàng)的集合稱(chēng)為項(xiàng)集巷查。在購(gòu)物藍(lán)事務(wù)中有序,每一樣商品就是一個(gè)項(xiàng),一次購(gòu)買(mǎi)行為包含了多個(gè)項(xiàng)岛请,把其中的項(xiàng)組合起來(lái)就構(gòu)成了項(xiàng)集
支持度計(jì)數(shù):項(xiàng)集在事務(wù)中出現(xiàn)的次數(shù)
頻繁項(xiàng)集:經(jīng)常出現(xiàn)在一塊的物品的集合
關(guān)聯(lián)規(guī)則:暗示兩種物品之間可能存在很強(qiáng)的關(guān)系
Support(支持度):表示同時(shí)包含 A 和 B 的事務(wù)占所有事務(wù)的比例旭寿。如果用 P(A) 表示包含 A 的事務(wù)的比例,那么 Support = P(A & B)
Confidence(可信度/置信度):表示包含 A 的事務(wù)中同時(shí)包含 B 的事務(wù)的比例崇败,即同時(shí)包含 A 和 B 的事務(wù)占包含 A 的事務(wù)的比例许师。公式表達(dá):Confidence = P(A & B)/ P(A)
Apriori算法兩個(gè)重要的定律:
定律1:如果一個(gè)集合是頻繁項(xiàng)集,則它的所有子集都是頻繁項(xiàng)集
定律2:如果一個(gè)集合不是頻繁項(xiàng)集僚匆,則它的所有超集都不是頻繁項(xiàng)集
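These definitions can be checked with a tiny worked example. The baskets and helper functions below are illustrative only, not part of the analysis dataset:

```python
# Five hypothetical transactions
transactions = [
    {"milk", "bread"},
    {"milk", "diapers", "beer"},
    {"bread", "diapers"},
    {"milk", "bread", "diapers"},
    {"bread", "beer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """P(A & B) / P(A): of the transactions containing A, how many also contain B."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(support({"milk", "bread"}, transactions))       # 0.4  (2 of 5 baskets)
print(confidence({"milk"}, {"bread"}, transactions))  # 0.666... (2 of the 3 milk baskets)
```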
2微渠、Apriori算法實(shí)現(xiàn)基本過(guò)程如下:
1) 創(chuàng)建初始候選集,篩選候選項(xiàng)1項(xiàng)集
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd

## Method 1: hand-written Apriori
def create_c1(data_set):
    """Create the initial candidate set of 1-itemsets."""
    print('Creating initial candidate 1-itemsets')
    c1 = set()
    for items in data_set:
        for item in items:
            # frozenset() returns an immutable set which, unlike set,
            # is hashable and can be used as a dict key or set member
            item_set = frozenset([item])
            c1.add(item_set)
    return c1
2) Select from the candidate itemsets the frequent itemsets that meet the minimum support, and compute their supports
def generate_freq_supports(data_set, item_set, min_support):
    """Select the frequent itemsets from the candidates and compute supports."""
    print('Filtering frequent itemsets and computing supports')
    freq_set = set()  # the frequent itemsets
    item_count = {}   # occurrence counts, used to compute supports
    supports = {}     # the supports
    # Count each candidate itemset that appears in a transaction
    for record in data_set:
        for item in item_set:
            # issubset() tests whether every element of `item` is in `record`
            if item.issubset(record):
                if item not in item_count:
                    item_count[item] = 1
                else:
                    item_count[item] += 1
    data_len = float(len(data_set))
    # Compute each itemset's support and keep those above the threshold
    for item in item_count:
        if (item_count[item] / data_len) >= min_support:
            freq_set.add(item)
            supports[item] = item_count[item] / data_len
    return freq_set, supports
3) Generate new candidate itemsets from the current frequent itemsets
def generate_new_combinations(freq_set, k):
    """
    Generate new candidate itemsets of size k from the current frequent itemsets.
    Arguments: the set of frequent itemsets freq_set and the target itemset size k.
    """
    print('Generating new combinations')
    new_combinations = set()       # the new candidates
    sets_len = len(freq_set)       # number of frequent itemsets, for pairwise iteration
    freq_set_list = list(freq_set) # a list view, so the sets can be indexed
    for i in range(sets_len):
        for j in range(i + 1, sets_len):
            l1 = list(freq_set_list[i])
            l2 = list(freq_set_list[j])
            l1.sort()
            l2.sort()
            # Merge two itemsets when their first k-2 items agree
            if l1[0:k - 2] == l2[0:k - 2]:
                freq_item = freq_set_list[i] | freq_set_list[j]
                new_combinations.add(freq_item)
    return new_combinations
4) Loop: generate candidate sets and compute their supports
def apriori(data_set, min_support, max_len=None):
    """Generate candidate sets in a loop and compute their supports."""
    print('Looping over candidate generations')
    max_items = 2   # size of the itemsets to generate next
    freq_sets = []  # all frequent itemsets, grouped by size
    supports = {}   # all supports
    # Candidate 1-itemsets
    c1 = set()
    for items in data_set:
        for item in items:
            item_set = frozenset([item])
            c1.add(item_set)
    # Frequent 1-itemsets and their supports
    l1, support1 = generate_freq_supports(data_set, c1, min_support)
    freq_sets.append(l1)
    supports.update(support1)
    if max_len is None:
        max_len = float('inf')
    while max_items and max_items <= max_len:
        # Generate candidates of the next size
        ci = generate_new_combinations(freq_sets[-1], max_items)
        # Keep the frequent ones and their supports
        li, support = generate_freq_supports(data_set, ci, min_support)
        # Continue only while new frequent itemsets keep appearing
        if li:
            freq_sets.append(li)
            supports.update(support)
            max_items += 1
        else:
            max_items = 0
    return freq_sets, supports
5) Mine association rules from the frequent itemsets, keeping those that meet the minimum confidence
def association_rules(freq_sets, supports, min_conf):
    """Generate association rules."""
    print('Generating association rules')
    rules = []
    max_len = len(freq_sets)
    # For every frequent k-itemset contained in a (k+1)-itemset, compute the
    # confidence and keep the rule if it meets the minimum confidence
    for k in range(max_len - 1):
        for freq_set in freq_sets[k]:
            for sub_set in freq_sets[k + 1]:
                if freq_set.issubset(sub_set):
                    frq = supports[sub_set]
                    conf = supports[sub_set] / supports[freq_set]
                    rule = (freq_set, sub_set - freq_set, frq, conf)
                    if conf >= min_conf:
                        print(freq_set, "-->", sub_set - freq_set, 'frq:', frq, 'conf:', conf)
                        rules.append(rule)
    return rules
First, test the algorithm on toy data, with the minimum support set to 0.5 and the minimum confidence set to 0.7:
if __name__ == '__main__':
    # Build the test data
    dic = {'user_id': [111, 111,
                       112, 112, 112, 112,
                       113, 113, 113, 113,
                       114, 114, 114, 114,
                       115, 115, 115, 115],
           'item_id': ['soy milk', 'lettuce',
                       'lettuce', 'diapers', 'wine', 'beets',
                       'soy milk', 'diapers', 'wine', 'orange juice',
                       'lettuce', 'soy milk', 'diapers', 'wine',
                       'lettuce', 'soy milk', 'diapers', 'orange juice']}
    data = pd.DataFrame(dic)
    # Association rules ignore repeat purchases of one item, so drop duplicates
    data = data.drop_duplicates()
    data_set = []
    # Group by user: merge each user's items into one transaction, and filter
    # out users with a single item, which carry no association information
    groups = data.groupby(by='user_id')
    for group in groups:
        if len(group[1]) >= 2:
            data_set.append(group[1]['item_id'].tolist())
    # L holds the frequent itemsets, support_data the supports
    L, support_data = apriori(data_set, min_support=0.5)
    rules = association_rules(L, support_data, min_conf=0.7)
    # print('Association rules:\n', rules)
The output is as follows:
The same analysis can also be done with the mlxtend package:
# Method 2: mlxtend
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

# Build the test data
dic = {'user_id': [111, 111,
                   112, 112, 112, 112,
                   113, 113, 113, 113,
                   114, 114, 114, 114,
                   115, 115, 115, 115],
       'item_id': ['soy milk', 'lettuce',
                   'lettuce', 'diapers', 'wine', 'beets',
                   'soy milk', 'diapers', 'wine', 'orange juice',
                   'lettuce', 'soy milk', 'diapers', 'wine',
                   'lettuce', 'soy milk', 'diapers', 'orange juice']}
data = pd.DataFrame(dic)
# Association rules ignore repeat purchases of one item, so drop duplicates
data = data.drop_duplicates()
data_set = []
# Group by user, as above
groups = data.groupby(by='user_id')
for group in groups:
    if len(group[1]) >= 2:
        data_set.append(group[1]['item_id'].tolist())

# One-hot encode the transactions, mine the frequent itemsets, then the rules
te = TransactionEncoder()
te_ary = te.fit(data_set).transform(data_set)
df = pd.DataFrame(te_ary, columns=te.columns_)
frequent_itemsets = apriori(df, min_support=0.1, use_colnames=True)
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.3)
print('Association rules:\n', rules)
The toy data confirms that the Apriori implementation works. Next, read the Taobao user-behavior data and analyze the associations between purchased item combinations, with the minimum support set to 0.05 and the minimum confidence set to 0.3:
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd

## Method 1, applied to the real data. The function definitions (candidate
## generation, frequent-itemset filtering, apriori, and association_rules)
## are identical to the ones listed above.
if __name__ == '__main__':
    # Read the Taobao user-behavior data
    data = pd.read_csv(r'D:\關(guān)聯(lián)分析\關(guān)聯(lián)分析2\tianchi_mobile_recommend_train_user.zip', encoding='ansi')
    data = data[['user_id', 'item_id']]
    # Association rules ignore repeat purchases of one item, so drop duplicates
    data = data.drop_duplicates()
    data_set = []
    # Group by user: merge each user's items into one transaction, and filter
    # out users with a single item, which carry no association information
    groups = data.groupby(by='user_id')
    for group in groups:
        if len(group[1]) >= 2:
            data_set.append(group[1]['item_id'].tolist())
    # L holds the frequent itemsets, support_data the supports
    L, support_data = apriori(data_set, min_support=0.05)
    rules = association_rules(L, support_data, min_conf=0.3)
    # print('Association rules:\n', rules)
The program ran for more than half an hour without producing a result. Apriori uses several nested for loops, and every update of the frequent itemsets rescans the entire dataset, so it is inefficient on large data. Because of this performance limit, we turn to the FP-growth algorithm to find the frequent itemsets.
(2) The FP-growth algorithm
1. How the FP-growth algorithm works
FP-growth builds on Apriori but uses a more sophisticated data structure to reduce the number of scans, which speeds the algorithm up considerably. FP-growth scans the database only twice, whereas Apriori scans the dataset for every potential frequent itemset to decide whether a given pattern is frequent, so FP-growth is much faster than Apriori.
Pros: generally faster than Apriori.
Cons: harder to implement; performance degrades on some datasets.
Applicable data type: discrete (nominal) values.
This introduces one more metric: lift.
Lift: the ratio of "the proportion of transactions containing A that also contain B" to "the proportion of transactions containing B". As a formula: Lift = (P(A & B) / P(A)) / P(B).
Lift reflects the correlation between A and B in a rule: lift > 1 (and higher) means stronger positive correlation, lift < 1 (and lower) means stronger negative correlation, and lift = 1 means A and B are uncorrelated.
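As a quick sanity check on the formula, with made-up counts (1000 transactions; A in 200 of them, B in 250, A and B together in 80):

```python
# Hypothetical counts illustrating lift
total = 1000
p_a = 200 / total        # P(A) = 0.2
p_b = 250 / total        # P(B) = 0.25
p_ab = 80 / total        # P(A & B) = 0.08
confidence = p_ab / p_a  # P(A & B) / P(A) = 0.4
lift = confidence / p_b  # 0.4 / 0.25 = 1.6 > 1: A and B are positively correlated
print(lift)              # 1.6
```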
2. The basic steps of the FP-growth algorithm:
(Original write-up: https://blog.csdn.net/youhuakongzhi/article/details/87943503)
1) Scan the data once and count every item. Drop the items whose support is below the threshold, put the remaining frequent 1-itemsets into the header table, and sort them by descending support.
2) Scan the data a second time: remove the infrequent items from each transaction and sort the surviving items by descending support. These two scans are all that is needed to build the header table.
3) Build the FP-tree. Insert each sorted transaction into the tree in that order, so higher-ranked items become parent nodes and lower-ranked items their children. When a transaction shares a prefix with an existing path, increment the count of the shared parent node instead of creating a new one. Whenever a new node is created, link it into the node list of the corresponding header-table entry. Once every transaction has been inserted, the FP-tree is complete.
4) Mine the frequent itemsets. Starting from the bottom of the header table and working upward, find each item's conditional pattern base, recursively mine it to obtain that item's frequent itemsets, and return the corresponding node counts.
5) If the number of items per itemset is not limited, return all the frequent itemsets from step 4; otherwise return only those of the requested size.
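Steps 1) and 2) can be sketched in a few lines. The transactions below are a toy dataset often used to illustrate FP-growth, and the minimum support count of 3 is an assumption for illustration:

```python
from collections import Counter

transactions = [["r", "z", "h", "j", "p"],
                ["z", "y", "x", "w", "v", "u", "t", "s"],
                ["z"],
                ["r", "x", "n", "o", "s"],
                ["y", "r", "x", "z", "q", "t", "p"],
                ["y", "z", "x", "e", "q", "s", "t", "m"]]
min_support = 3  # minimum support count

# Step 1: count every item and keep only the frequent ones
counts = Counter(item for t in transactions for item in t)
frequent = {item for item, c in counts.items() if c >= min_support}

# Step 2: strip infrequent items from each transaction and re-sort
# the survivors by descending frequency (ties broken alphabetically)
ordered = [
    sorted((i for i in t if i in frequent), key=lambda i: (-counts[i], i))
    for t in transactions
]
print(ordered[0])  # ['z', 'r'] -- only z (count 5) and r (count 3) survive
```

These ordered transactions are what gets inserted into the FP-tree in step 3.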
Here I use the FP-growth algorithm to mine the frequent itemsets, with the whole algorithm packaged in an fp_growth module.
fp_growth.py:
# encoding: utf-8
"""
fp-growth generates frequent itemsets using the FP-tree data structure;
the whole process scans the dataset only twice.
"""
from collections import defaultdict, namedtuple
# defaultdict fills in a default value for missing keys instead of raising
# KeyError; namedtuple builds tuple subclasses whose fields can be accessed
# by name, which improves readability.


def find_frequent_itemsets(transactions, minimum_support, include_support=False):
    """
    Mine frequent itemsets, yielding each itemset and (optionally) its
    support count.
    """
    items = defaultdict(lambda: 0)  # mapping from items to their supports

    # Load the passed-in transactions and count the support that individual
    # items have.
    for transaction in transactions:
        for item in transaction:
            items[item] += 1

    # Remove infrequent items from the item support dictionary.
    items = dict((item, support) for item, support in items.items()
                 if support >= minimum_support)

    # Build our FP-tree. Before any transactions can be added to the tree, they
    # must be stripped of infrequent items and their surviving items must be
    # sorted in decreasing order of frequency.
    def clean_transaction(transaction):
        transaction = filter(lambda v: v in items, transaction)
        # Materialize the filter object so it can be sorted in place
        transaction_list = list(transaction)
        transaction_list.sort(key=lambda v: items[v], reverse=True)
        return transaction_list

    master = FPTree()
    for transaction in map(clean_transaction, transactions):
        master.add(transaction)

    def find_with_suffix(tree, suffix):
        for item, nodes in tree.items():
            support = sum(n.count for n in nodes)
            if support >= minimum_support and item not in suffix:
                # New winner!
                found_set = [item] + suffix
                yield (found_set, support) if include_support else found_set

                # Build a conditional tree and recursively search for frequent
                # itemsets within it.
                cond_tree = conditional_tree_from_paths(tree.prefix_paths(item))
                for s in find_with_suffix(cond_tree, found_set):
                    yield s  # pass along the good news to our caller

    # Search for frequent itemsets, and yield the results we find.
    for itemset in find_with_suffix(master, []):
        yield itemset
class FPTree(object):
    """
    An FP tree.
    All of the items must be hashable (usable as dict keys or set members).
    """
    Route = namedtuple('Route', 'head tail')

    def __init__(self):
        # The root node of the tree.
        self._root = FPNode(self, None, None)
        # A dictionary mapping items to the head and tail of a path of
        # "neighbors" that will hit every node containing that item.
        self._routes = {}

    @property
    def root(self):
        """The root node of the tree."""
        return self._root

    def add(self, transaction):
        """Add a transaction to the tree."""
        point = self._root
        for item in transaction:
            next_point = point.search(item)
            if next_point:
                # There is already a node in this tree for the current
                # transaction item; reuse it.
                next_point.increment()
            else:
                # Create a new point and add it as a child of the point we're
                # currently looking at.
                next_point = FPNode(self, item)
                point.add(next_point)
                # Update the route of nodes that contain this item to include
                # our new node.
                self._update_route(next_point)
            point = next_point

    def _update_route(self, point):
        """Add the given node to the route through all nodes for its item."""
        assert self is point.tree
        try:
            route = self._routes[point.item]
            route[1].neighbor = point  # route[1] is the tail
            self._routes[point.item] = self.Route(route[0], point)
        except KeyError:
            # First node for this item; start a new route.
            self._routes[point.item] = self.Route(point, point)

    def items(self):
        """
        Generate one 2-tuple for each item represented in the tree. The first
        element of the tuple is the item itself, and the second element is a
        generator that will yield the nodes in the tree that belong to the item.
        """
        for item in self._routes.keys():
            yield (item, self.nodes(item))

    def nodes(self, item):
        """
        Generate the sequence of nodes that contain the given item.
        """
        try:
            node = self._routes[item][0]
        except KeyError:
            return
        while node:
            yield node
            node = node.neighbor

    def prefix_paths(self, item):
        """Generate the prefix paths that end with the given item."""
        def collect_path(node):
            path = []
            while node and not node.root:
                path.append(node)
                node = node.parent
            path.reverse()
            return path
        return (collect_path(node) for node in self.nodes(item))

    def inspect(self):
        print('Tree:')
        self.root.inspect(1)
        print()
        print('Routes:')
        for item, nodes in self.items():
            print('  %r' % item)
            for node in nodes:
                print('    %r' % node)
def conditional_tree_from_paths(paths):
    """Build a conditional FP tree from the given prefix paths."""
    tree = FPTree()
    condition_item = None
    items = set()

    # Import the nodes in the paths into the new tree. Only the counts of the
    # leaf nodes matter; the remaining counts will be reconstructed from the
    # leaf counts.
    for path in paths:
        if condition_item is None:
            condition_item = path[-1].item
        point = tree.root
        for node in path:
            next_point = point.search(node.item)
            if not next_point:
                # Add a new node to the tree.
                items.add(node.item)
                count = node.count if node.item == condition_item else 0
                next_point = FPNode(tree, node.item, count)
                point.add(next_point)
                tree._update_route(next_point)
            point = next_point
    assert condition_item is not None

    # Calculate the counts of the non-leaf nodes.
    for path in tree.prefix_paths(condition_item):
        count = path[-1].count
        for node in reversed(path[:-1]):
            node._count += count
    return tree
class FPNode(object):
    """A node in an FP tree."""

    def __init__(self, tree, item, count=1):
        self._tree = tree
        self._item = item
        self._count = count
        self._parent = None
        self._children = {}
        self._neighbor = None

    def add(self, child):
        """Add the given FPNode `child` as a child of this node."""
        if not isinstance(child, FPNode):
            raise TypeError("Can only add other FPNodes as children")
        if child.item not in self._children:
            self._children[child.item] = child
            child.parent = self

    def search(self, item):
        """
        Check whether this node contains a child node for the given item.
        If so, that node is returned; otherwise, `None` is returned.
        """
        try:
            return self._children[item]
        except KeyError:
            return None

    def __contains__(self, item):
        return item in self._children

    @property
    def tree(self):
        """The tree in which this node appears."""
        return self._tree

    @property
    def item(self):
        """The item contained in this node."""
        return self._item

    @property
    def count(self):
        """The count associated with this node's item."""
        return self._count

    def increment(self):
        """Increment the count associated with this node's item."""
        if self._count is None:
            raise ValueError("Root nodes have no associated count.")
        self._count += 1

    @property
    def root(self):
        """True if this node is the root of a tree; false if otherwise."""
        return self._item is None and self._count is None

    @property
    def leaf(self):
        """True if this node is a leaf in the tree; false if otherwise."""
        return len(self._children) == 0

    @property
    def parent(self):
        """The node's parent."""
        return self._parent

    @parent.setter
    def parent(self, value):
        if value is not None and not isinstance(value, FPNode):
            raise TypeError("A node must have an FPNode as a parent.")
        if value and value.tree is not self.tree:
            raise ValueError("Cannot have a parent from another tree.")
        self._parent = value

    @property
    def neighbor(self):
        """
        The node's neighbor; the one with the same value that is "to the right"
        of it in the tree.
        """
        return self._neighbor

    @neighbor.setter
    def neighbor(self, value):
        if value is not None and not isinstance(value, FPNode):
            raise TypeError("A node must have an FPNode as a neighbor.")
        if value and value.tree is not self.tree:
            raise ValueError("Cannot have a neighbor from another tree.")
        self._neighbor = value

    @property
    def children(self):
        """The nodes that are children of this node."""
        return tuple(self._children.values())  # Python 3: values(), not itervalues()

    def inspect(self, depth=0):
        print(('  ' * depth) + repr(self))
        for child in self.children:
            child.inspect(depth + 1)

    def __repr__(self):
        if self.root:
            return "<%s (root)>" % type(self).__name__
        return "<%s %r (%r)>" % (type(self).__name__, self.item, self.count)
if __name__ == '__main__':
    from optparse import OptionParser
    import csv

    p = OptionParser(usage='%prog data_file')
    p.add_option('-s', '--minimum-support', dest='minsup', type='int',
                 help='Minimum itemset support (default: 2)')
    p.add_option('-n', '--numeric', dest='numeric', action='store_true',
                 help='Convert the values in datasets to numerals (default: false)')
    p.set_defaults(minsup=2)
    p.set_defaults(numeric=False)

    options, args = p.parse_args()
    if len(args) < 1:
        p.error('must provide the path to a CSV file to read')

    transactions = []
    with open(args[0]) as database:
        for row in csv.reader(database):
            if options.numeric:
                transaction = []
                for item in row:
                    transaction.append(int(item))  # Python 3: int(), not long()
                transactions.append(transaction)
            else:
                transactions.append(row)

    result = []
    for itemset, support in find_frequent_itemsets(transactions, options.minsup, True):
        result.append((itemset, support))

    result = sorted(result, key=lambda i: i[0])
    for itemset, support in result:
        print(str(itemset) + ' ' + str(support))
6) Mine association rules from the frequent itemsets, keep those that meet the minimum confidence, and compute support and confidence.
Here I set the minimum support (as a raw count) to 50 and the minimum confidence to 0.3. The code is as follows:
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
import itertools  # building blocks for efficient iteration
import sys, time

sys.path.append(r"D:\關(guān)聯(lián)分析\關(guān)聯(lián)分析2")  # add the module's folder to the path if the import fails
import fp_growth as fpg  # the fp-growth implementation above

start = time.time()

def convertData(data):
    """Convert the data into one list of items per user."""
    # Association rules ignore repeat purchases of one item, so drop duplicates
    data = data.drop_duplicates()
    itemSets = []
    # Group by user: merge each user's items into one transaction, and filter
    # out users with a single item, which carry no association information
    groups = data.groupby(by=['user_id'])
    for group in groups:
        if len(group[1]) >= 2:
            itemSets.append(group[1]['item_id'].tolist())
    return itemSets

def generate_association_rules(patterns, total, min_confidence):
    """
    Generate association rules and compute support, confidence, and lift.
    """
    antecedent_list = []
    consequent_list = []
    support_list = []
    confidence_list = []
    lift_list = []
    count_antecedent = []
    p_antecedent = []
    count_consequent = []
    p_consequent = []
    count_ant_con = []
    for itemset in patterns.keys():
        upper_support = patterns[itemset]  # count(A & B)
        for i in range(1, len(itemset)):
            # itertools.combinations() yields every length-i subsequence of the
            # itemset, in the order the items appear in the input iterable
            for antecedent in itertools.combinations(itemset, i):
                antecedent = tuple(sorted(antecedent))
                consequent = tuple(sorted(set(itemset) - set(antecedent)))
                if antecedent in patterns:
                    lower_support = patterns[antecedent]       # count(A)
                    consequent_support = patterns[consequent]  # count(B)
                    p_lower_support = lower_support / total            # P(A)
                    p_consequent_support = consequent_support / total  # P(B)
                    support = round(float(upper_support) / total, 6)   # Support = P(A & B)
                    confidence = float(upper_support) / lower_support  # Confidence = P(A & B) / P(A)
                    lift = confidence / p_consequent_support           # Lift = (P(A & B) / P(A)) / P(B)
                    if confidence >= min_confidence:
                        antecedent_list.append(list(antecedent))
                        consequent_list.append(list(consequent))
                        support_list.append(support)
                        confidence_list.append(confidence)
                        lift_list.append(lift)
                        count_antecedent.append(lower_support)        # count(A)
                        p_antecedent.append(p_lower_support)          # P(A)
                        count_consequent.append(consequent_support)   # count(B)
                        p_consequent.append(p_consequent_support)     # P(B)
                        count_ant_con.append(upper_support)           # count(A & B)
    rules_col = {'antecedent': antecedent_list,
                 'consequent': consequent_list,
                 'count_antecedent': count_antecedent,
                 'antecedent_support': p_antecedent,
                 'count_consequent': count_consequent,
                 'consequent_support': p_consequent,
                 'count_ant_con': count_ant_con,
                 'support': support_list,
                 'confidence': confidence_list,
                 'lift': lift_list}
    rules = pd.DataFrame(rules_col)
    col = ['antecedent', 'consequent', 'support', 'confidence', 'lift']
    rules = rules[col]
    rules.sort_values(by=['support', 'confidence'], ascending=False, inplace=True)
    return rules
if __name__ == '__main__':
    # Load the data
    data = pd.read_csv(r'D:\關(guān)聯(lián)分析\關(guān)聯(lián)分析2\tianchi_mobile_recommend_train_user.zip', encoding='ansi')
    data = data[['user_id', 'item_id']]
    # Convert to transactions
    dataset = convertData(data)
    total = len(dataset)
    print('Total transactions:', total)
    # find_frequent_itemsets() generates the frequent itemsets and their counts.
    # minimum_support is a count: itemsets appearing at least this many times are kept.
    # include_support=True makes it yield (itemset, support) pairs instead of bare itemsets.
    frequent_itemsets = fpg.find_frequent_itemsets(dataset, minimum_support=50, include_support=True)
    result = []
    for itemset, support in frequent_itemsets:  # materialize the generator into a list
        result.append((itemset, support))
    result = sorted(result, key=lambda i: i[0])
    item_list = []
    itemset_list = []
    support_list = []
    for itemset, support in result:
        # print(str(itemset) + ' ' + str(support))  # itemset and its count
        item_list.append(itemset)          # kept as lists for the frequent-itemset output
        itemset = tuple(sorted(itemset))   # tuples are hashable, needed as dict keys below
        itemset_list.append(itemset)
        support_list.append(support)
    # Map each frequent itemset to its count
    patterns = dict(zip(itemset_list, support_list))
    print('Number of frequent itemsets:', len(patterns))
    # Generate association rules and compute support, confidence, and lift;
    # min_confidence is the minimum confidence
    rules = generate_association_rules(patterns, total, min_confidence=0.3)
    print('Association rules:\n', rules.head())
    print('Total rules:', len(rules))
    ## Write the results to two sheets of one Excel file
    # Frequent itemsets
    sup = {'item_id': item_list, 'frequency': support_list}
    sup = pd.DataFrame(sup)
    sup['support'] = round(sup['frequency'] / float(total), 6)
    sup.sort_values(by=['support'], ascending=False, inplace=True)
    sup_col = ['item_id', 'frequency', 'support']
    sup = sup[sup_col]
    writer = pd.ExcelWriter(r'D:\關(guān)聯(lián)分析\關(guān)聯(lián)分析2\result\fp-growth-result.xlsx')
    sup.to_excel(excel_writer=writer, sheet_name='support', index=False)
    # Association rules
    rules.to_excel(excel_writer=writer, sheet_name='rules', index=False)
    writer.save()  # flush the file to disk
    end = time.time()
    print('Running time: %s Seconds' % (end - start))
Partial screenshots of the run are shown below.
Frequent-itemset results:

IV. Conclusions
Business conclusions:
1. Sorting the frequent itemsets by support, item 112921337 is by far the most popular, well ahead of every other item.
2. Overall, the supports of all item combinations are low, because the platform sells a very large variety of items.
3. Sorting the combinations by descending support, [387911330] → [97655171] and [97655171] → [387911330] have the highest support, and [387911330] → [97655171] also has the highest confidence of the two: 44% of users who buy item 387911330 also buy item 97655171, so these two items are good candidates for bundled selling.
4. The rule with the highest confidence overall says that buyers of item 126902916 are most likely to also buy item 125666923, but this combination occurs with low support.
Technical conclusions:
1. For both Apriori and FP-Growth, choosing a very small minimum support makes the runtime grow sharply; the performance impact is large.
2. For large datasets, prefer FP-Growth when searching for frequent itemsets.
3. When using FP-Growth, start with a relatively large minimum support, typically around 1/10 of the number of transactions, where "transactions" means the transformed transaction dataset, not the rows of the raw table.
4. Choose the minimum support and confidence based on business knowledge to reach sound conclusions. Set the support too high and potentially key or meaningful rules are filtered out; set it too low and too many frequent itemsets and conditional FP-trees are generated, cluttering the results.
5. For data aggregation, avoid for loops where possible, especially ones nested two or more levels deep, which are a performance disaster. Where applicable, DataFrame groupby and pivot_table, and pandas merge, give a large performance boost.
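As a sketch of that last point, the per-user transaction building done with an explicit loop earlier in this article can be written as a single groupby chain (the data below is made up for illustration):

```python
import pandas as pd

data = pd.DataFrame({
    "user_id": [111, 111, 112, 112, 112, 113],
    "item_id": ["A", "B", "A", "C", "D", "E"],
})

# One groupby call collects each user's items into a list, and the .loc
# filter drops single-item users -- no explicit Python loops needed
baskets = (
    data.groupby("user_id")["item_id"]
        .apply(list)
        .loc[lambda s: s.str.len() >= 2]
)
print(baskets.tolist())  # [['A', 'B'], ['A', 'C', 'D']]
```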
Reference: https://blog.csdn.net/youhuakongzhi/article/details/87943503