1. NHANES study design
- NHANES uses a complex, multistage, probability sampling design and oversamples certain subgroups so that enough people from each subgroup are included in the study. This design is also why the later data analysis needs special handling.
- Four-stage sampling: counties, segments (city blocks), households, and individuals.
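2. Sample weights
- Sample weight: reflects the unequal probability with which each individual was selected, a direct consequence of the complex multistage sampling described above. It has to be accounted for in every later analysis.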
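As a rough illustration of what the weight does (a sketch only; the symbols $\pi_i$, $w_i$, and $y_i$ are introduced here and are not NHANES notation): if participant $i$ is selected with probability $\pi_i$, the base weight is about $1/\pi_i$, so each participant stands in for that many people in the population. The released NHANES weights include further adjustments, so this shows only the core idea. A weighted proportion then replaces the simple sample proportion:

```latex
w_i \approx \frac{1}{\pi_i},
\qquad
\hat{p}_{\text{weighted}} = \frac{\sum_i w_i \, y_i}{\sum_i w_i},
\qquad
\hat{p}_{\text{unweighted}} = \frac{1}{n} \sum_i y_i
```

Here $y_i$ is 1 if participant $i$ belongs to the group of interest and 0 otherwise. When an oversampled subgroup has small weights, the two estimates can differ noticeably.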
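3. Calculating sample weights when combining cycles
- Choosing the weight:
  - (1) All variables were collected in the in-home interview: use wtint4yr (note the "int" in the name).
  - (2) Some variables were collected in the MEC (mobile examination center): use wtmec4yr (note the "mec" in the name).
  - (3) Some variables come from a survey subsample: use that subsample's weight. For example, if the analysis includes fasting triglycerides (measured in roughly half of the MEC-examined sample), use wtsaf4yr.
  - (4) Some variables come from the 24-hour dietary recall: use wtdrd1 when the variables come from the day 1 recall only, and wtdr2d when both days of recall are analyzed.
  - In every case, use the combined weight when analyzing multiple cycles. The 4yr weights are provided only for 1999-2002; a single two-year cycle carries the 2yr version instead (for example wtint2yr, used in the 2015-2016 example later on).
- Use the weight that belongs to the variable with the smallest sample.
- Multi-cycle weight calculation: (1) when analyzing the four years 1999-2002 (2 cycles), use the four-year weights provided in the dataset, such as wtint4yr and wtmec4yr; (2) for sample weights from 2001-2002 and later, multiply by the corresponding proportion; see the figure below for details.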
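The rescaling can be sketched in SAS as follows. This is a minimal illustration and not part of the original notes: it assumes eight years of data (1999-2006) stacked into one dataset, and follows the pattern in the NHANES weighting tutorial of giving each cycle a share of the weight proportional to the years it covers. The dataset names `stacked` and `combined`, the new variable `wtmec8yr`, and the weight values in the mock data are all made up for the example. `sddsrvyr` is the data release cycle variable from the demographics file.

```sas
* Mock stacked data - the weight values are invented for illustration only *;
data stacked;
  input seqn sddsrvyr wtmec4yr wtmec2yr;
  datalines;
1 1 20000 .
2 2 30000 .
3 3 . 40000
4 4 . 50000
;
run;

* sddsrvyr: 1 = 1999-2000, 2 = 2001-2002, 3 = 2003-2004, 4 = 2005-2006 *;
data combined;
  set stacked;
  * 1999-2002 contributes 2 of the 4 cycles and uses the four-year weight *;
  if sddsrvyr in (1, 2) then wtmec8yr = (2/4) * wtmec4yr;
  * each later cycle contributes 1 of the 4 cycles and uses its two-year weight *;
  else if sddsrvyr in (3, 4) then wtmec8yr = (1/4) * wtmec2yr;
run;

proc print data=combined;
run;
```

The same pattern applies to the interview or subsample weights: swap in the matching pair of four-year and two-year weight variables and adjust the fractions to the number of cycles being combined.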
4. How results differ with and without sample weights (using counts and percentages as an example)
- Unweighted code and results
* Unweighted interview sample *;
proc freq data = demo order=formatted;
tables ridreth3 / nocum; /* the next 5 lines are formatting only; the full setup code is listed at the end */
format ridreth3 r3ordf. ;
title "Percent of 2015-2016 sample, by race and Hispanic origin";
title2 "Unweighted interview sample";
footnote "Non-Hispanic other includes non-Hispanic persons who reported a race other than white, black, or Asian or who reported multiple races.";
label ridreth3 ="Race and Hispanic origin";
run;
- Weighted code and results
* Weighted with interview sample weight *;
proc freq data = demo order=formatted;
tables ridreth3 / nocum ;
weight wtint2yr; /* the next 5 lines are formatting only; the full setup code is listed at the end */
format ridreth3 r3ordf. ;
title "Percent of 2015-2016 sample, by race and Hispanic origin";
title2 "Weighted with interview weight";
footnote "Non-Hispanic other includes non-Hispanic persons who reported a race other than white, black, or Asian or who reported multiple races.";
label ridreth3 ="Race and Hispanic origin";
run;
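- The weighted and unweighted results differ substantially, and the weighted results match the actual US population distribution much more closely (the "Non-Hispanic white and other" group is certainly the largest in the US). The point of this example is to show why applying the sample weight matters.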
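The `weight` statement in `proc freq` corrects the percentages, but it does not account for the strata and clusters of the sampling design when standard errors are computed. A possible design-based version is sketched below. It is an addition to the original example, and it assumes the `demo` dataset and the `r3ordf` format from the setup code in section 6 already exist; `sdmvstra` and `sdmvpsu` are the design variables kept in that dataset.

```sas
* Design-based percentages with standard errors and confidence limits *;
proc surveyfreq data = demo;
  strata sdmvstra;          * masked variance strata *;
  cluster sdmvpsu;          * masked variance units (PSUs) *;
  weight wtint2yr;          * interview weight for the 2015-2016 cycle *;
  tables ridreth3 / cl;     * cl requests confidence limits for the percentages *;
  format ridreth3 r3ordf.;
  label ridreth3 = "Race and Hispanic origin";
run;
```

The weighted percentages should agree with the weighted `proc freq` output above; what changes is that the standard errors and confidence limits reflect the complex design.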
5. References
https://wwwn.cdc.gov/nchs/nhanes/tutorials/module3.aspx
https://wwwn.cdc.gov/nchs/data/tutorials/module3_examples_SAS_Survey.sas
6. Full setup code
*******************;
** Download data **;
*******************;
** Paths to 2015-2016 data files on the NHANES website *;
* DEMO demographic *;
filename demo_i url 'https://wwwn.cdc.gov/nchs/nhanes/2015-2016/demo_i.xpt';
libname demo_i xport;
* BPX blood pressure exam *;
filename bpx_i url 'https://wwwn.cdc.gov/nchs/nhanes/2015-2016/bpx_i.xpt';
libname bpx_i xport;
* BPQ blood pressure questionnaire *;
filename bpq_i url 'https://wwwn.cdc.gov/nchs/nhanes/2015-2016/bpq_i.xpt';
libname bpq_i xport;
* Download SAS transport files and create temporary SAS datasets *;
data demo;
set demo_i.demo_i(keep=seqn riagendr ridageyr ridreth3 sdmvstra sdmvpsu wtmec2yr wtint2yr ridexprg );
run;
data bpx_i;
set bpx_i.bpx_i;
run;
data bpq_i;
set bpq_i.bpq_i;
run;
** Prepare dataset for hypertension example **;
data bpdata;
merge demo
bpx_i (keep = seqn bpxsy1-bpxsy4 bpxdi1-bpxdi4)
bpq_i (keep = seqn bpq050a);
by seqn;
**Hypertension prevalence**;
** Count Number of Nonmissing SBPs & DBPs **;
n_sbp = n(of bpxsy1-bpxsy4);
n_dbp = n(of bpxdi1-bpxdi4);
** Set DBP Values Of 0 To Missing For Calculating Average **;
array _DBP bpxdi1-bpxdi4;
do over _DBP;
if (_DBP = 0) then _DBP = .;
end;
** Calculate Mean Systolic and Diastolic **;
mean_sbp = mean(of bpxsy1-bpxsy4);
mean_dbp = mean(of bpxdi1-bpxdi4);
** "Old" Hypertensive Category variable: taking medication or measured BP >= 140/90 **;
* as used in NCHS Data Brief No. 289 *;
* variable bpq050a: now taking prescribed medicine for hypertension *;
if (mean_sbp >= 140 or mean_dbp >= 90 or bpq050a = 1) then HTN_old = 100;
else if (n_sbp > 0 and n_dbp > 0) then HTN_old = 0;
** Create Hypertensive Category Variable: "new" definition based on taking medication or measured BP >= 130/80 **;
** From 2017 ACC/AHA hypertension guidelines **;
* Not used in Data Brief No. 289 - provided for reference *;
if (mean_sbp >= 130 or mean_dbp >= 80 or bpq050a = 1) then HTN_new = 100;
else if (n_sbp > 0 and n_dbp > 0) then HTN_new = 0;
* race and Hispanic origin categories for hypertension analysis - generate new variable named raceEthCat *;
select (ridreth3);
when (1,2) raceEthCat=4; * Hispanic ;
when (3) raceEthCat=1; * Non-Hispanic white ;
when (4) raceEthCat=2; * Non-Hispanic black ;
when (6) raceEthCat=3; * Non-Hispanic Asian ;
when (7) raceEthCat=5; * Non-Hispanic other race or Non-Hispanic persons of multiple races *;
otherwise;
end;
* age categories for adults aged 18 and over *;
if 18<=ridageyr<40 then ageCat_18=1;
else if 40 <=ridageyr<60 then ageCat_18=2;
else if 60 <=ridageyr then ageCat_18=3;
* Define subpopulation of interest: non-pregnant adults aged 18 and over who have at least 1 valid systolic OR diastolic BP measure *;
inAnalysis = (ridageyr >=18 and ridexprg ne 1 and (n_sbp ne 0 or n_dbp ne 0)) ;
drop bpxsy1-bpxsy4 bpxdi1-bpxdi4;
run;
**********************************************************************************************;
** Estimates for graph - Distribution of race and Hispanic origin, NHANES 2015-2016 *;
* Module 3, Examples Demonstrating the Importance of Using Weights in Your Analyses *;
* Section "Adjusting for oversampling" *;
**********************************************************************************************;
proc format;
* format to combine and reorder the levels of race and Hispanic origin variable ridreth3 *;
value r3ordf
1,2="3 Hispanic"
3,7="4 Non-Hispanic white and other"
4="1 Non-Hispanic black"
6="2 Non-Hispanic Asian"
;
run;