最近給導(dǎo)師報(bào)告處理WISE數(shù)據(jù)的進(jìn)展時(shí)滩援,對(duì)于處理WISE光變導(dǎo)師希望我對(duì)每一個(gè)觀測(cè)區(qū)間的星等數(shù)值給中值的置信度。
計(jì)算數(shù)據(jù)的中值95%的置信區(qū)間
假設(shè)有一組數(shù)據(jù)如下:
data = [-0.1, -2.4, -0.1, -0.7, -1.4, -0.9, -3.2, -0.2, -0.3, -0.6, -3.2, -5.5]
求中值的置信區(qū)間與求數(shù)據(jù)的均值置信區(qū)間的方法是類似的蛮粮,在這里我不會(huì)討論詳細(xì)的數(shù)學(xué)原理,而是直接給出置信度為95%時(shí),對(duì)應(yīng)的數(shù)據(jù)的上限值與下限值名秀。(有關(guān)python實(shí)現(xiàn)的重點(diǎn)在于代碼)
下限:0.5n-0.98
上限:1+0.5n+0.98
上面公式給出的是理論值,具體應(yīng)用到數(shù)據(jù)上時(shí)藕溅,要對(duì)得到的下限和上限值取整匕得,下限值向上取整,上限值向下取整巾表。
一般情況:
Lower lim = -
*
Upper lim = 1 + +
*
注:n是數(shù)據(jù)個(gè)數(shù),一般要求數(shù)據(jù)點(diǎn)個(gè)數(shù)n>=6汁掠。當(dāng)n<6時(shí)是沒有中值的置信區(qū)間。
計(jì)算上述data的中值95%置信區(qū)間
首先要將原數(shù)據(jù)從小到大排列:
sorted(data)
data1 = [-5.5,-3.2,-3.2,-2.4,-1.4,-0.9,-0.7,-0.6,-0.3,-0.2,-0.1,0.1]
下限值:0.5* 12 - 0.98* = 2.6
上限值:1+0.5* 12 + 0.98* = 10.4
則95%置信區(qū)間對(duì)應(yīng)的數(shù)值是第3個(gè)數(shù)據(jù)和第10個(gè)數(shù)據(jù)集币,即(-3.2考阱,-0.2)
python實(shí)現(xiàn)
#求中值median的置信區(qū)間(confidence interval),95%的CI
#對(duì)于中值的置信區(qū)間CI鞠苟,下限lower limit向上取整乞榨,upper limit向下取整
#要注意python中是從0開始計(jì)數(shù)的,根據(jù)上述就很好理解return語(yǔ)句的含義了当娱。
#其實(shí)可以吧math.ceil() - 1 用math.floor()代替
import math
import numpy as np
import matplotlib.pyplot as plt
#自定義的median_ci函數(shù)是給出某一數(shù)據(jù)95%置信區(qū)間的上限和下限對(duì)應(yīng)的值
def median_ci(data,confidence=0.95):
data1 = sorted(data)
n = len(data1)
ll = 0.5*n - 0.98*math.sqrt(n)
ul = 1 + 0.5*n + 0.98*math.sqrt(n)
l = data1[math.ceil(ll)-1]
u = data1[math.floor(ul) - 1]
return (l,u)
#在自定義的函數(shù)里面已經(jīng)將數(shù)據(jù)從小到大排序了吃既,所以調(diào)用函數(shù)時(shí)用的是數(shù)據(jù)data
l,u = median_ci(data)
print(l,u)
-3.2,-0.2
參考:Confidence Intervals for a Median,Documentation for
Confidence Interval of median or percentiles for a given sample size.