溫莎日記 22

Simulation studies in statistics:

A.?What is a simulation study, and why do one?

B.?Simulations for properties of estimators.

C.?Simulations for properties of hypothesis tests.

Simulation:?A numerical technique for conducting experiments on the computer.?It involves random sampling from probability distributions.

Rationale:?Properties of statistical methods must be established so that the methods may be used with confidence.?Exact analytical derivations of properties are rarely possible.?Large sample case to properties are often possible.

- But what happens in the finite sample size case?

- And what happens when assumptions are violated?

Usual issues - Q&A :?

Is an estimator biased in finite samples??

How does it compare to competing estimators on the basis of bias, precision, etc.?

Does a procedure for constructing a confidence interval for a parameter achieve the advertised nominal level of coverage?

Does a hypothesis testing procedure attain the advertised level of significance or size?

Do different test procedures deliver different power?

\rightarrow ?Monte Carlo simulation approximation:?An estimator or test statistic has a true sampling distribution under a particular set of conditions (finite sample size, true distribution of the data).? Ideally, we could want to know this true sampling distribution in order to address the issues on the previous slide.?But derivation of true sampling distribution is not tractable.?Hence we approximate the sampling distribution of an estimator or a test statistic under a particular set of conditions.

\rightarrow ?A typical Monte Carlo simulation involves the following:?Generate S independent data sets under the conditions of interest;?Compute the numerical value of the test statistic T?from each data set. Hence we have an array of T series in the items. If S is large enough, summary statistics across many T?should be good approximations to the true properties of the estimator under the conditions of interest.

1. A simple example:?

Compare three estimators for the mean μ of a distribution based on a random sample?Y_{1:n}.?The three estimators are:?Sample Mean (T^{(1)} ) ;?Sample Median (T^{(2)} ) ;?Sample 10% trimmed mean (T^{(3)} ).

2. Simulation procedures:

For a particular choice of μ, n and true underlying distribution:?Generate independent draws?Y_{1:n} from the distribution ;?Compute?T^{(1)} ,?T^{(2)} ,??T^{(3)} ?;?Repeat S times. Hence we have,?T_{1:s}^{(1)} ,?T_{1:s}^{(2)} ,?T_{1:s}^{(3)} .?Compute for k = 1, 2, 3, , and then we have the important value in observation:?

\hat{mean} =\frac{1}{S}  \sum_{s=1}^S  T_{s}^{(k)} ?;?\hat{bias} = \hat{mean} - \mu  ?;?\hat{SD}  = \sqrt{\frac{1}{S-1} \sum_{s=1}^S {(T_{s}^{(k)} -\hat{mean} )}^2  } ;

\hat{MSE} =\frac{1}{S}  \sum_{s=1}^S  {(\hat{mean} -\mu )}^2 \approx \hat{SD} ^2 + \hat{bias} ^2.

3. A simulation study:?R code

To compare 3 estimators for location μ through a simulation study.?The 3 estimators are:?Sample Mean ;??Sample Median ;?Sample 10% trimmed mean.?Underlying distribution: N(0,1).?Sample size: 15.?Simulation size: 1000?

> # To compare 3 estimators for location,mu, through simulation study

> # The 3 estimators are sample mean, median, and 10% trimmed mean.

> S=1000 # Simulation size (No.of samples)

> n=15 # Sample size

> mu=1 # Mean of the underlying normal distribution

> sd=1 # Standard deviation of the underlying normal distribution

> meax=numeric(S) # A vector contains all the sample means

> medx=numeric(S) # A vector contains all the sample medians

> trmx=numeric(S) # A vector contains all the sample 10% trimmed means

> stdx=numeric(S) # A vector contains all the sample standard deviation

> set.seeds(1234) # Set the seed number

> for (i in 1:S){

+ x=rnorm(n,mu,sd) # Generate a random sample of size n

+ mean[i]=mean(x) # Compute the mean for the i-th

sample,i.e.T∧(1) i

+ medx[i]=median(x)# Compute the median for the i-th

sample,i.e.T∧(2) i

+ trmx[i]=mean(x,trim=0.1)

+# Compute the 10% trimmed mean for the i-th sample, i.e.T∧(3) i

+ stdx[i]=sd(x)# Compute the standard deviation for the i-th sample

}

> # Compute the mean of “S” sample means, medians, and trimmed means

> simumean=apply(cbind(meax,medx,trmx),2,mean)

> # “2” in the second argument asks the computer to fine “means” for

all the columns in the matrix given in argument 1.

> # Hence “simumean” is a 3×1 vector consists of the average of the “S” means, medians, and the 10% trimmed means.

> # Compute the standard deviation of the “S” sample means, medians, and 10% trimmed means

> simustd=apply(cbind(meax,medx,trmx),2,sd)

> # Compute the bias

> simubias=simumean-rep(mu,3)

> # Compute the MSE (Mean Square Error)

> simumse=simubias∧2+simustd∧2

> ests=c(“Sample mean”,“Sample median”,“Sample 10% trimmed mean”) # column heading in the output

> names=c(“True value”,“No. of simu”,“”MC Mean,“MC Std Deviation”,“MC Bias”,“MC MSE”) # row heading in the output

> sumdat=rbind(rep(mu,3),rep(S,3),simumean,simustd,simbias,simumse)

> dimnames(sumdat)=list(names,ests)

> round(sumdat,4)

statistical computing?

4. Checking coverage probability of confidence interval: Usual 95% confidence interval for μ,

Does the interval achieve the nominal level of coverage 95%?

> #Check the coverage probability of confidence interval

> #We make use of the data obtained from the above simulation study

> t05=qt(0.975,n-1) #Get t {0.025,14}

>coverage=sum((meax-t05*stdx/sqrt(n)<=mu)&(meax+t05*stdx/sqrt(n)>=mu))/S

> #The above statement is equivalent to

> # d=0

> # for(i in 1:S){

> #d=d+meax[i]-t05*stdx[i]/sqrt(n)<=mu)&(meax[i]+t05*stdx[i]/sqrt(n)>=mu))}

> #coverage=d/S

>coverage

[1]0.945


Example: Consider the size and power of the usual t-test for the mean

(i)?Test?H_0: \mu = \mu _0?against?H_a: \mu \neq  \mu _0?

(ii)?To evaluate whether level of test achieves advertised α :?Generate data under H_0: \mu = \mu _0 and calculate the proportion of rejections of?H_0.?

(iii)?Approximation the true probability of rejecting?H_0?when it is true: \propto should be approximately equal to α.

(iv)?To evaluate power:?Generate data under some alternative?\mu \neq  \mu _0?and calculate the?\propto ?of rejections of?H_0.?

(v) Approximate the true probability of rejectingH_0when the alternative is true.

> #Check the size of t-test

>ttests=(meax-mu0)/(stdx/sqrt(n))

>size=sum(abs(ttests)>t05)/S

>size

[1]0.045

> #Check the power of t-test when d=1*sd

> Assume the samples generated from N(μ1, σ) where μ1 = 1, σ = 1.

> Test H0 : μ = μ0 vs H1 : μ = μ1 = μ0 + σ where μ0 = 0.

>ttests=(meax-mu0)/(stdx/sqrt(n))

>power=sum(abs(ttests)>t05)/S

>power

[1]0.954


5.?Simulation study in SAS

Simulation to study the size of t-test

*Generate “ns” normal random samples;

* “mcrep” (M.C. Replications) is the index for the s-th random sample;

data simu;

seed=123;

ns=1000;n=25;mu=0;sigma=1;

do mcrep=1 to ns;

do i=1 to n;

x=rannor(seed);

output; end;

keep mcrep x;

end;

run;

proc sort data=simu;

by mcrep;

run;

*Perform the t-test for each sample.

Output the p-value “probt” in the SAS dataset “outtset”;

proc univariate data=simu noprint mu0=0;

by mcrep;

var x;

output out=outtest probt=p;

run;

* Count how many samples have “probt”<0.05;

data outtest;

set outtest;

reject=(p<0.05);

run;

* “reject” is the number of samples with “probt”<0.05. Hence the

mean of “reject” is the rejection rate,“rejrate”;

proc means data=outtest noprint;

var reject;

output out=results mean=rejrate;

proc print data=results;

var

freq

rejrate;

run;

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末,一起剝皮案震驚了整個(gè)濱河市,隨后出現(xiàn)的幾起案子振坚,更是在濱河造成了極大的恐慌,老刑警劉巖跪削,帶你破解...
    沈念sama閱讀 219,589評論 6 508
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件谴仙,死亡現(xiàn)場離奇詭異,居然都是意外死亡碾盐,警方通過查閱死者的電腦和手機(jī)晃跺,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 93,615評論 3 396
  • 文/潘曉璐 我一進(jìn)店門,熙熙樓的掌柜王于貴愁眉苦臉地迎上來毫玖,“玉大人掀虎,你說我怎么就攤上這事「斗悖” “怎么了烹玉?”我有些...
    開封第一講書人閱讀 165,933評論 0 356
  • 文/不壞的土叔 我叫張陵,是天一觀的道長阐滩。 經(jīng)常有香客問我二打,道長,這世上最難降的妖魔是什么叶眉? 我笑而不...
    開封第一講書人閱讀 58,976評論 1 295
  • 正文 為了忘掉前任,我火速辦了婚禮芹枷,結(jié)果婚禮上衅疙,老公的妹妹穿的比我還像新娘。我一直安慰自己鸳慈,他們只是感情好饱溢,可當(dāng)我...
    茶點(diǎn)故事閱讀 67,999評論 6 393
  • 文/花漫 我一把揭開白布。 她就那樣靜靜地躺著走芋,像睡著了一般绩郎。 火紅的嫁衣襯著肌膚如雪。 梳的紋絲不亂的頭發(fā)上翁逞,一...
    開封第一講書人閱讀 51,775評論 1 307
  • 那天肋杖,我揣著相機(jī)與錄音,去河邊找鬼挖函。 笑死状植,一個(gè)胖子當(dāng)著我的面吹牛,可吹牛的內(nèi)容都是我干的怨喘。 我是一名探鬼主播津畸,決...
    沈念sama閱讀 40,474評論 3 420
  • 文/蒼蘭香墨 我猛地睜開眼,長吁一口氣:“原來是場噩夢啊……” “哼必怜!你這毒婦竟也來了肉拓?” 一聲冷哼從身側(cè)響起,我...
    開封第一講書人閱讀 39,359評論 0 276
  • 序言:老撾萬榮一對情侶失蹤梳庆,失蹤者是張志新(化名)和其女友劉穎暖途,沒想到半個(gè)月后卑惜,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體,經(jīng)...
    沈念sama閱讀 45,854評論 1 317
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡丧肴,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 38,007評論 3 338
  • 正文 我和宋清朗相戀三年残揉,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片芋浮。...
    茶點(diǎn)故事閱讀 40,146評論 1 351
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡抱环,死狀恐怖,靈堂內(nèi)的尸體忽然破棺而出纸巷,到底是詐尸還是另有隱情镇草,我是刑警寧澤,帶...
    沈念sama閱讀 35,826評論 5 346
  • 正文 年R本政府宣布瘤旨,位于F島的核電站梯啤,受9級特大地震影響,放射性物質(zhì)發(fā)生泄漏存哲。R本人自食惡果不足惜因宇,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 41,484評論 3 331
  • 文/蒙蒙 一、第九天 我趴在偏房一處隱蔽的房頂上張望祟偷。 院中可真熱鬧察滑,春花似錦、人聲如沸修肠。這莊子的主人今日做“春日...
    開封第一講書人閱讀 32,029評論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽嵌施。三九已至饲化,卻和暖如春,著一層夾襖步出監(jiān)牢的瞬間吗伤,已是汗流浹背吃靠。 一陣腳步聲響...
    開封第一講書人閱讀 33,153評論 1 272
  • 我被黑心中介騙來泰國打工, 沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留足淆,地道東北人撩笆。 一個(gè)月前我還...
    沈念sama閱讀 48,420評論 3 373
  • 正文 我出身青樓,卻偏偏與公主長得像缸浦,于是被迫代替她去往敵國和親夕冲。 傳聞我的和親對象是個(gè)殘疾皇子,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 45,107評論 2 356

推薦閱讀更多精彩內(nèi)容

  • 《閉上眼睛才能看清楚自己》這本書是香海禪寺主持賢宗法師的人生體悟裂逐,修行心得及講學(xué)錄歹鱼,此書從六個(gè)章節(jié)講述了禪修是什么...
    宜均閱讀 10,028評論 1 25
  • 前言 Google Play應(yīng)用市場對于應(yīng)用的targetSdkVersion有了更為嚴(yán)格的要求。從 2018 年...
    申國駿閱讀 64,234評論 14 98
  • 本文轉(zhuǎn)載自微信公眾號“電子搬磚師”卜高,原文鏈接 這篇文章會以特別形象通俗的方式講講什么是PID弥姻。 很多人看到網(wǎng)上寫的...
    這個(gè)飛宏不太冷閱讀 6,846評論 2 15
  • 《來庭敦,我們說說孤獨(dú)》 1·他們都在寫孤獨(dú) 一個(gè)詩人 如果 不說說 內(nèi)心的孤獨(dú) 不將孤獨(dú) 寫進(jìn)詩里 是不是很掉價(jià)呢 ...
    聽太陽升起閱讀 4,376評論 1 7
  • 自幼貧民窟長大的女子疼进,僥幸多念了兩本書,枉以為可以與人平起平坐秧廉∩」悖可是人生從來都是接力賽,我們卻天真的當(dāng)成了百米沖刺...
    Leeanran閱讀 5,772評論 1 5