AB Test
Complete A/B test workflow
1. Define the experiment goal
2. Formulate the experiment hypothesis
3. Design the experiment
4. Write the development requirements document
5. Run the experiment
6. Collect, analyze, and evaluate the data
7. Release the product
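Experiment design process
Choose the metrics to monitor (the core metric is the one this experiment is meant to improve)
Choose the experiment audience (the selected user group)
Determine the sample size
Derived from the feature's estimated lift, the significance level, and the desired statistical power
Determine the experiment duration
Generally no more than two weeks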
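As a rough illustration of the sample-size and duration steps, the sketch below uses the standard normal-approximation formula for comparing two means. The standard deviation, expected lift, and daily traffic are made-up numbers, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(sigma, lift, alpha=0.05, power=0.8):
    """Approximate users needed per group for a two-sided test on a mean.

    sigma: assumed standard deviation of the metric (same in both groups)
    lift:  smallest absolute difference in means worth detecting
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / lift ** 2)


def duration_days(n_per_group, daily_users, groups=2):
    """Days needed to collect the sample, given eligible users per day."""
    return math.ceil(groups * n_per_group / daily_users)


# Illustrative numbers only: sigma = 40, expected lift = 2, 3000 eligible users per day.
n = sample_size_per_group(sigma=40, lift=2)
days = duration_days(n, daily_users=3000)
print(n, days)
```

A smaller expected lift or a noisier metric increases the required sample, which in turn lengthens the experiment. If the computed duration runs well past two weeks, the audience or the target metric may need rethinking.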
Development requirements document
- Experiment purpose
- Experiment hypothesis
- Experiment plan
- Traffic allocation
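One common way to implement traffic allocation is deterministic hashing of the user id. The sketch below is only an assumption about how it could be done, not the setup used in these notes. The function name `assign_group` and the experiment name are hypothetical.

```python
import hashlib


def assign_group(user_id, experiment, treatment_share=0.5):
    """Deterministically map a user to 'control' or 'treatment'.

    Hashing user_id together with the experiment name keeps a user in the
    same group on every visit and decorrelates buckets across experiments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    # First 8 hex digits of the digest -> number in [0, 1)
    bucket = int(hashlib.sha256(key).hexdigest()[:8], 16) / 2 ** 32
    return "treatment" if bucket < treatment_share else "control"


# Hypothetical experiment name and user ids, for illustration only.
groups = [assign_group(uid, "coupon_test") for uid in range(10000)]
print(groups.count("treatment") / len(groups))
```

Because the assignment depends only on the user id and the experiment name, it needs no stored state, and changing `treatment_share` adjusts the traffic split.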
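3. Run the hypothesis test and evaluate the experiment result
- Hypothesis testing (clean the data before testing)
State the null hypothesis H0 and the alternative hypothesis H1
Choose between a two-sided and a one-sided test
Choose a t-test or a z-test based on the sample size (N > 30?)
Compute the t or z statistic and derive the p-value from it
p-value: the probability, assuming H0 is true, of seeing a difference between the treatment and control groups at least as large as the one observed
p-value < 0.05: the difference introduced by the experiment is statistically significant at the 5% level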
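- Points to note
- The user distributions of the treatment and control groups must match
- When two experiments run at the same time, the variables they change must be independent of each other
- Rule out results caused by chance (Type I and Type II errors)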
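The test procedure and the warning about chance results can be sketched together. The code below is a minimal illustration on simulated data. It implements a two-sample z-test with a choice of alternative, then runs many A/A tests, where both groups come from the same distribution, to show that roughly a share `alpha` of them still look significant. All names and numbers are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def z_test(control, treatment, alternative="two-sided"):
    """Two-sample z-test on the means (large samples assumed).

    H0: both groups have the same mean.
    alternative: "two-sided", "larger" (treatment > control),
                 or "smaller" (treatment < control).
    """
    se = math.sqrt(stdev(control) ** 2 / len(control)
                   + stdev(treatment) ** 2 / len(treatment))
    z = (mean(treatment) - mean(control)) / se
    cdf = NormalDist().cdf
    if alternative == "two-sided":
        p = 2 * (1 - cdf(abs(z)))
    elif alternative == "larger":
        p = 1 - cdf(z)
    elif alternative == "smaller":
        p = cdf(z)
    else:
        raise ValueError(f"unknown alternative: {alternative}")
    return z, p


def false_positive_rate(n_tests=1000, n_users=200, alpha=0.05, seed=0):
    """Share of A/A tests (no real difference) that still look significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        a = [rng.gauss(100, 20) for _ in range(n_users)]
        b = [rng.gauss(100, 20) for _ in range(n_users)]
        _, p = z_test(a, b)
        hits += p < alpha
    return hits / n_tests


# Simulated data only: both groups are drawn from the same distribution,
# so every "significant" result here is a Type I error.
print(false_positive_rate())
```

A Type II error is the opposite case, where a real difference is missed. Its probability falls as the sample size grows, which is why the sample size is fixed before the experiment starts.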
Python implementation
import numpy as np
import pandas as pd
ba_sales_data=pd.read_csv("E:/miki/coupang/ba_sales_data.csv")
ba_sales_data
ba_sales_data.info()
ba_sales_data.describe()
# Drop duplicate rows
df_noDup=ba_sales_data.drop_duplicates()
df_noDup
df_noDup.describe()
# Keep only rows with a positive sale_price (drops negative and zero values)
df_noDup_del=df_noDup[df_noDup['sale_price']>0].copy()
df_noDup_del
# Replace the placeholder age 999 with NaN
df_noDup_del['age']=df_noDup_del['age'].replace(999,np.nan)
df_noDup_del.info()
# Save the cleaned data to an Excel file
with pd.ExcelWriter('E:/miki/coupang/df_noDup_del.xlsx') as writer:
    df_noDup_del.to_excel(writer, sheet_name='df_noDup_del', float_format='%.5f')
# Sum the total spend of each user
# so the average spend of the treatment and control groups can be compared
# (assumes the cleaned data has been loaded into a MySQL table named df_noDup_del)
import pymysql
# Open the database connection, then create a cursor from it
conn = pymysql.connect(host='localhost', user='root', password='root', database='miki')
cursor = conn.cursor()
sql = "select user_id,test_option,sum(sale_price) as sp from (select distinct * from df_noDup_del) a group by user_id,test_option order by test_option"
cursor.execute(sql)
myresult = cursor.fetchall()
result = list(myresult)
# Convert the SQL result into a DataFrame
data = pd.DataFrame(result, columns=['uid', 'type', 'sale_price'])
# SUM() may come back as Decimal and the group label as int or str; normalize both
data['sale_price'] = data['sale_price'].astype(float)
data['type'] = data['type'].astype(str)
data
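If the cleaned data is still in memory, the same per-user aggregation can also be sketched directly in pandas, without a database round trip. The toy rows below are invented. Only the column names are taken from the SQL query above.

```python
import pandas as pd

# Toy rows standing in for the cleaned table; the values are made up.
orders = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "test_option": ["0", "0", "1", "0", "0", "0"],
    "sale_price": [10.0, 5.0, 20.0, 7.0, 7.0, 3.0],
})

per_user = (
    orders.drop_duplicates()  # mirrors SELECT DISTINCT *
          .groupby(["user_id", "test_option"], as_index=False)["sale_price"]
          .sum()
          .sort_values("test_option")
          .rename(columns={"user_id": "uid", "test_option": "type"})
)
print(per_user)
```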
# Compute the test statistic (two-sample z-test on the mean spend per user)
from scipy import stats
sales0 = data[data.type=='0'].sale_price
sales1 = data[data.type=='1'].sale_price
n0, n1 = len(sales0), len(sales1)
sales0_mean, sales1_mean = sales0.mean(), sales1.mean()
sales0_std, sales1_std = sales0.std(ddof=1), sales1.std(ddof=1)
z = (sales0_mean - sales1_mean) / np.sqrt(sales0_std**2 / n0 + sales1_std**2 / n1)
# Two-sided p-value
p = 2*stats.norm.sf(abs(z))
# Pooled standard deviation
s = np.sqrt(((n0 - 1) * sales0_std**2 + (n1 - 1) * sales1_std**2) / (n0 + n1 - 2))
# Effect size: Cohen's d
d = abs(sales0_mean - sales1_mean) / s
sales0_mean,sales1_mean,z,p,d
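For reference, the quantities computed above correspond to the formulas below. The notation is introduced here and does not come from the code: \bar{x}_i, s_i, and n_i stand for the mean, sample standard deviation, and size of group i, and \Phi is the standard normal CDF.

```latex
\[
z = \frac{\bar{x}_0 - \bar{x}_1}{\sqrt{\dfrac{s_0^2}{n_0} + \dfrac{s_1^2}{n_1}}},
\qquad
p = 2\,\bigl(1 - \Phi(|z|)\bigr)
\]
\[
s_p = \sqrt{\frac{(n_0 - 1)\,s_0^2 + (n_1 - 1)\,s_1^2}{n_0 + n_1 - 2}},
\qquad
d = \frac{|\bar{x}_0 - \bar{x}_1|}{s_p}
\]
```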
# Compute the sample size needed per group to detect the observed effect size
from statsmodels.stats.power import NormalIndPower
import math
effect_size=d
ztest=NormalIndPower()
num=ztest.solve_power(
    effect_size=effect_size,
    alpha=0.05,
    power=0.8,
    ratio=1,
    alternative='two-sided'
)
# Round up to whole users
print(math.ceil(num))
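To sanity-check a sample size like the one returned by `solve_power`, the relationship can be inverted to estimate the power achieved with a given number of users per group. The sketch below uses the textbook normal approximation with equal group sizes. The function name `approx_power` and the effect size 0.5 are illustrative assumptions, not values from the data.

```python
import math
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with equal groups.

    Uses the usual normal approximation and ignores the (tiny) probability
    of rejecting in the wrong tail.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(effect_size * math.sqrt(n_per_group / 2) - z_alpha)


# Illustrative effect size only, not a value from the data above.
for n in (30, 63, 100):
    print(n, round(approx_power(0.5, n), 3))
```

If the groups actually collected are smaller than the required size, the power falls below the 0.8 target and a non-significant result becomes harder to interpret.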