溫莎日記 22

Simulation studies in statistics：

A.?What is a simulation study, and why do one?

B.?Simulations for properties of estimators.

C.?Simulations for properties of hypothesis tests.

Simulation:?A numerical technique for conducting experiments on the computer.?It involves random sampling from probability distributions.

Rationale:?Properties of statistical methods must be established so that the methods may be used with confidence.?Exact analytical derivations of properties are rarely possible.?Large sample case to properties are often possible.

- But what happens in the finite sample size case?

- And what happens when assumptions are violated?

Usual issues - Q&A :?

Is an estimator biased in finite samples??

How does it compare to competing estimators on the basis of bias, precision, etc.?

Does a procedure for constructing a confidence interval for a parameter achieve the advertised nominal level of coverage?

Does a hypothesis testing procedure attain the advertised level of significance or size?

Do different test procedures deliver different power?

$\rightarrow$ ?Monte Carlo simulation approximation:?An estimator or test statistic has a true sampling distribution under a particular set of conditions (finite sample size, true distribution of the data).? Ideally, we could want to know this true sampling distribution in order to address the issues on the previous slide.?But derivation of true sampling distribution is not tractable.?Hence we approximate the sampling distribution of an estimator or a test statistic under a particular set of conditions.

$\rightarrow$ ?A typical Monte Carlo simulation involves the following:?Generate S independent data sets under the conditions of interest;?Compute the numerical value of the test statistic T?from each data set. Hence we have an array of T series in the items. If S is large enough, summary statistics across many T?should be good approximations to the true properties of the estimator under the conditions of interest.

1. A simple example:?

Compare three estimators for the mean μ of a distribution based on a random sample? $Y_{1:n}$ .?The three estimators are:?Sample Mean ( $T^{(1)}$ ) ;?Sample Median ( $T^{(2)}$ ) ;?Sample 10% trimmed mean ( $T^{(3)}$ ).

2. Simulation procedures:

For a particular choice of μ, n and true underlying distribution:?Generate independent draws? $Y_{1:n}$ from the distribution ;?Compute? $T^{(1)}$ ,? $T^{(2)}$ ,?? $T^{(3)}$ ?;?Repeat S times. Hence we have,? $T_{1:s}^{(1)}$ ,? $T_{1:s}^{(2)}$ ,? $T_{1:s}^{(3)}$ .?Compute for k = 1, 2, 3, , and then we have the important value in observation:?

$\hat{mean} =\frac{1}{S} \sum_{s=1}^S T_{s}^{(k)}$ ?;? $\hat{bias} = \hat{mean} - \mu$ ?;? $\hat{SD} = \sqrt{\frac{1}{S-1} \sum_{s=1}^S {(T_{s}^{(k)} -\hat{mean} )}^2 }$ ;

$\hat{MSE} =\frac{1}{S} \sum_{s=1}^S {(\hat{mean} -\mu )}^2 \approx \hat{SD} ^2 + \hat{bias} ^2$ .

3. A simulation study:?R code

To compare 3 estimators for location μ through a simulation study.?The 3 estimators are:?Sample Mean ;??Sample Median ;?Sample 10% trimmed mean.?Underlying distribution: N(0,1).?Sample size: 15.?Simulation size: 1000?

> # To compare 3 estimators for location,mu, through simulation study

> # The 3 estimators are sample mean, median, and 10% trimmed mean.

> S=1000 # Simulation size (No.of samples)

> n=15 # Sample size

> mu=1 # Mean of the underlying normal distribution

> sd=1 # Standard deviation of the underlying normal distribution

> meax=numeric(S) # A vector contains all the sample means

> medx=numeric(S) # A vector contains all the sample medians

> trmx=numeric(S) # A vector contains all the sample 10% trimmed means

> stdx=numeric(S) # A vector contains all the sample standard deviation

> set.seeds(1234) # Set the seed number

> for (i in 1:S){

+ x=rnorm(n,mu,sd) # Generate a random sample of size n

+ mean[i]=mean(x) # Compute the mean for the i-th

sample,i.e.T∧(1) i

+ medx[i]=median(x)# Compute the median for the i-th

sample,i.e.T∧(2) i

+ trmx[i]=mean(x,trim=0.1)

+# Compute the 10% trimmed mean for the i-th sample, i.e.T∧(3) i

+ stdx[i]=sd(x)# Compute the standard deviation for the i-th sample

}

> # Compute the mean of “S” sample means, medians, and trimmed means

> simumean=apply(cbind(meax,medx,trmx),2,mean)

> # “2” in the second argument asks the computer to fine “means” for

all the columns in the matrix given in argument 1.

> # Hence “simumean” is a 3×1 vector consists of the average of the “S” means, medians, and the 10% trimmed means.

> # Compute the standard deviation of the “S” sample means, medians, and 10% trimmed means

> simustd=apply(cbind(meax,medx,trmx),2,sd)

> # Compute the bias

> simubias=simumean-rep(mu,3)

> # Compute the MSE (Mean Square Error)

> simumse=simubias∧2+simustd∧2

> ests=c(“Sample mean”,“Sample median”,“Sample 10% trimmed mean”) # column heading in the output

> names=c(“True value”,“No. of simu”,“”MC Mean,“MC Std Deviation”,“MC Bias”,“MC MSE”) # row heading in the output

> sumdat=rbind(rep(mu,3),rep(S,3),simumean,simustd,simbias,simumse)

> dimnames(sumdat)=list(names,ests)

> round(sumdat,4)

statistical computing?

4. Checking coverage probability of confidence interval: Usual 95% confidence interval for μ,

Does the interval achieve the nominal level of coverage 95%?

> #Check the coverage probability of confidence interval

> #We make use of the data obtained from the above simulation study

> t05=qt(0.975,n-1) #Get t {0.025,14}

>coverage=sum((meax-t05*stdx/sqrt(n)<=mu)&(meax+t05*stdx/sqrt(n)>=mu))/S

> #The above statement is equivalent to

> # d=0

> # for(i in 1:S){

> #d=d+meax[i]-t05*stdx[i]/sqrt(n)<=mu)&(meax[i]+t05*stdx[i]/sqrt(n)>=mu))}

> #coverage=d/S

>coverage

[1]0.945

Example: Consider the size and power of the usual t-test for the mean

(i)?Test? $H_0: \mu = \mu _0$ ?against? $H_a: \mu \neq \mu _0$ ?

(ii)?To evaluate whether level of test achieves advertised α :?Generate data under $H_0: \mu = \mu _0$ and calculate the proportion of rejections of? $H_0$ .?

(iii)?Approximation the true probability of rejecting? $H_0$ ?when it is true: $\propto$ should be approximately equal to α.

(iv)?To evaluate power:?Generate data under some alternative? $\mu \neq \mu _0$ ?and calculate the? $\propto$ ?of rejections of? $H_0$ .?

(v) Approximate the true probability of rejecting $H_0$ when the alternative is true.

> #Check the size of t-test

>ttests=(meax-mu0)/(stdx/sqrt(n))

>size=sum(abs(ttests)>t05)/S

>size

[1]0.045

> #Check the power of t-test when d=1*sd

> Assume the samples generated from N(μ1, σ) where μ1 = 1, σ = 1.

> Test H0 : μ = μ0 vs H1 : μ = μ1 = μ0 + σ where μ0 = 0.

>ttests=(meax-mu0)/(stdx/sqrt(n))

>power=sum(abs(ttests)>t05)/S

>power

[1]0.954

5.?Simulation study in SAS

Simulation to study the size of t-test

*Generate “ns” normal random samples;

* “mcrep” (M.C. Replications) is the index for the s-th random sample;

data simu;

seed=123;

ns=1000;n=25;mu=0;sigma=1;

do mcrep=1 to ns;

do i=1 to n;

x=rannor(seed);

output; end;

keep mcrep x;

end;

run;

proc sort data=simu;

by mcrep;

run;

*Perform the t-test for each sample.

Output the p-value “probt” in the SAS dataset “outtset”;

proc univariate data=simu noprint mu0=0;

by mcrep;

var x;

output out=outtest probt=p;

run;

* Count how many samples have “probt”<0.05;

data outtest;

set outtest;

reject=(p<0.05);

run;

* “reject” is the number of samples with “probt”<0.05. Hence the

mean of “reject” is the rejection rate,“rejrate”;

proc means data=outtest noprint;

var reject;

output out=results mean=rejrate;

proc print data=results;

var

freq

rejrate;

run;

最后編輯于：2022.09.01 20:40:16

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者

人面猴
序言：七十年代末，一起剝皮案震驚了整個(gè)濱河市，隨后出現(xiàn)的幾起案子振坚，更是在濱河造成了極大的恐慌，老刑警劉巖跪削，帶你破解...
沈念sama閱讀 219,589評論 6贊 508
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件谴仙，死亡現(xiàn)場離奇詭異，居然都是意外死亡碾盐，警方通過查閱死者的電腦和手機(jī)晃跺，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 93,615評論 3贊 396
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進(jìn)店門，熙熙樓的掌柜王于貴愁眉苦臉地迎上來毫玖，“玉大人掀虎，你說我怎么就攤上這事「斗悖” “怎么了烹玉？”我有些...
開封第一講書人閱讀 165,933評論 0贊 356
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵，是天一觀的道長阐滩。經(jīng)常有香客問我二打，道長，這世上最難降的妖魔是什么叶眉？我笑而不...
開封第一講書人閱讀 58,976評論 1贊 295
?港島之戀（遺憾婚禮）
正文為了忘掉前任，我火速辦了婚禮芹枷，結(jié)果婚禮上衅疙，老公的妹妹穿的比我還像新娘。我一直安慰自己鸳慈，他們只是感情好饱溢，可當(dāng)我...
茶點(diǎn)故事閱讀 67,999評論 6贊 393
惡毒庶女頂嫁案：這布局不是一般人想出來的
文/花漫我一把揭開白布。她就那樣靜靜地躺著走芋，像睡著了一般绩郎。火紅的嫁衣襯著肌膚如雪。梳的紋絲不亂的頭發(fā)上翁逞，一...
開封第一講書人閱讀 51,775評論 1贊 307
城市分裂傳說
那天肋杖，我揣著相機(jī)與錄音，去河邊找鬼挖函。笑死状植，一個(gè)胖子當(dāng)著我的面吹牛，可吹牛的內(nèi)容都是我干的怨喘。我是一名探鬼主播津畸，決...
沈念sama閱讀 40,474評論 3贊 420
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開眼，長吁一口氣：“原來是場噩夢啊……” “哼必怜！你這毒婦竟也來了肉拓？” 一聲冷哼從身側(cè)響起，我...
開封第一講書人閱讀 39,359評論 0贊 276
萬榮殺人案實(shí)錄
序言：老撾萬榮一對情侶失蹤梳庆，失蹤者是張志新（化名）和其女友劉穎暖途，沒想到半個(gè)月后卑惜，有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體，經(jīng)...
沈念sama閱讀 45,854評論 1贊 317
?護(hù)林員之死
正文獨(dú)居荒郊野嶺守林人離奇死亡丧肴，尸身上長有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點(diǎn)故事閱讀 38,007評論 3贊 338
?白月光啟示錄
正文我和宋清朗相戀三年残揉，在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了。大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片芋浮。...
茶點(diǎn)故事閱讀 40,146評論 1贊 351
活死人
序言：一個(gè)原本活蹦亂跳的男人離奇死亡抱环，死狀恐怖，靈堂內(nèi)的尸體忽然破棺而出纸巷，到底是詐尸還是另有隱情镇草，我是刑警寧澤，帶...
沈念sama閱讀 35,826評論 5贊 346
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布瘤旨，位于F島的核電站梯啤，受9級特大地震影響，放射性物質(zhì)發(fā)生泄漏存哲。R本人自食惡果不足惜因宇，卻給世界環(huán)境...
茶點(diǎn)故事閱讀 41,484評論 3贊 331
男人毒藥：我在死后第九天來索命
文/蒙蒙一、第九天我趴在偏房一處隱蔽的房頂上張望祟偷。院中可真熱鬧察滑，春花似錦、人聲如沸修肠。這莊子的主人今日做“春日...
開封第一講書人閱讀 32,029評論 0贊 22
一樁弒父案，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽嵌施。三九已至饲化，卻和暖如春，著一層夾襖步出監(jiān)牢的瞬間吗伤，已是汗流浹背吃靠。一陣腳步聲響...
開封第一講書人閱讀 33,153評論 1贊 272
情欲美人皮
我被黑心中介騙來泰國打工，沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留足淆，地道東北人撩笆。一個(gè)月前我還...
沈念sama閱讀 48,420評論 3贊 373
代替公主和親
正文我出身青樓，卻偏偏與公主長得像缸浦，于是被迫代替她去往敵國和親夕冲。傳聞我的和親對象是個(gè)殘疾皇子，可洞房花燭夜當(dāng)晚...
茶點(diǎn)故事閱讀 45,107評論 2贊 356

溫莎日記 22

推薦閱讀更多精彩內(nèi)容