之前寫過查詢數(shù)據(jù)集中的空變量的宏程序黑毅,一時(shí)沒找到。于是重寫一版灶体,并以此簡單介紹下宏程序的構(gòu)建過程阅签。
這篇文章從功能算法講起,然后編程實(shí)現(xiàn)算法邏輯蝎抽,最后進(jìn)行宏程序的構(gòu)建政钟,宏程序完整代碼在文章第4部分匯總。
希望這篇文章可以對讀者日常SAS編程工作有所幫助樟结。
先展示宏程序輸出的效果:
更多臨床試驗(yàn)SAS編程內(nèi)容养交,歡迎關(guān)注:SAS茶談。
1. 程序算法設(shè)計(jì)
宏程序的構(gòu)思設(shè)計(jì)瓢宦,從最小功能單位開始碎连。對于查詢數(shù)據(jù)集中的空變量,我們從單個(gè)數(shù)據(jù)集的單個(gè)變量的判斷做起驮履。
演示數(shù)據(jù)集使用SASHELP.Class鱼辙,進(jìn)行新增空變量處理。
***test dataset;
data class;
set sashelp.class;
a = "";
b = .;
run;
目前數(shù)據(jù)集中有兩個(gè)完全為空的變量玫镐,宏程序的目的就是把這兩個(gè)變量找出來倒戏。用什么程序語言來表示空變量,這個(gè)需要程序員自己摸索和嘗試恐似。我選用的是杜跷,變量不為空的記錄數(shù)為0。聽起來有些拗口矫夷,但程序?qū)崿F(xiàn)起來比較簡單葛闷。
2. 編程實(shí)現(xiàn)算法
Proc SQL和Data步都能夠?qū)崿F(xiàn)非空記錄數(shù)的統(tǒng)計(jì),但因?yàn)镾QL的聚合函數(shù)跨記錄處理相對方便双藕,我以SQL語句進(jìn)行演示淑趾,先統(tǒng)計(jì)變量Name不為空的記錄數(shù):
***Get number of non-missing records;
proc sql noprint;
create table result1 as
select "CLASS" as Dataset length=50, "NAME" as Var length=50, sum(not missing(name)) as Sum
from class;
quit;
依次類推,我們可以統(tǒng)計(jì)出每個(gè)變量不為空的記錄數(shù)蔓彩,然后依次將各個(gè)結(jié)果縱向拼接在一起治笨。
SQL中縱向拼接的查詢表達(dá)是outer union
,默認(rèn)是按兩個(gè)查詢表依次位置拼接的赤嚼,相同變量拼接需加上關(guān)鍵字corresponding/corr
,語法細(xì)節(jié)參考SAS官方文檔:SAS Help Center: query Expression顺又。
***Get number of non-missing records for all variables;
proc sql noprint;
create table result1 as
select "CLASS" as Dataset length=50, "NAME" as Var length=50, sum(not missing(NAME)) as Sum
from class
outer union corr
select "CLASS" as Dataset length=50, "SEX" as Var length=50, sum(not missing(SEX)) as Sum
from class
outer union corr
select "CLASS" as Dataset length=50, "AGE" as Var length=50, sum(not missing(AGE)) as Sum
from class
outer union corr
select "CLASS" as Dataset length=50, "HEIGHT" as Var length=50, sum(not missing(HEIGHT)) as Sum
from class
outer union corr
select "CLASS" as Dataset length=50, "WEIGHT" as Var length=50, sum(not missing(WEIGHT)) as Sum
from class
outer union corr
select "CLASS" as Dataset length=50, "A" as Var length=50, sum(not missing(A)) as Sum
from class
outer union corr
select "CLASS" as Dataset length=50, "B" as Var length=50, sum(not missing(B)) as Sum
from class;
quit;
再獲取數(shù)據(jù)集中所有變量不為空的記錄數(shù)后更卒,我們只需要篩選記錄數(shù)為0的記錄就可以獲取空變量的信息。為展示方便稚照,我們可以把兩個(gè)變量信息綜合在一起蹂空,這個(gè)可以通過轉(zhuǎn)置后橫向拼接實(shí)現(xiàn)俯萌。
***Display result;
proc transpose data = result1 out=result2 prefix=emp_;
by dataset;
var Var;
where sum = 0;
run;
data result3;
set result2;
length Empvar $2000;
Empvar = catx(", ", of emp_:);
keep dataset empvar;
run;
這樣處理看起來比較簡潔,也方便后續(xù)多數(shù)據(jù)集檢查空變量結(jié)果的拼接上枕。
關(guān)于程序咐熙,多解釋一點(diǎn),SAS中函數(shù)批量處理變量序列時(shí)辨萍,需要在變量序列前添加of
棋恼。如果不批量處理,也可以手動(dòng)輸入每一變量名稱锈玉,例如:
empvar = catx(", ", emp_1, emp_2);
如果不知道輸出變量的數(shù)目爪飘,使用特定的前綴對變量進(jìn)行標(biāo)記,再用函數(shù)批量處理拉背,這個(gè)過程會(huì)變得簡潔許多师崎。
3. 宏程序的構(gòu)建
在手動(dòng)編程將算法實(shí)現(xiàn)后,就可以著手構(gòu)建宏程序椅棺。就我個(gè)人SAS編程經(jīng)驗(yàn)來講犁罩,宏程序的作用主要有兩個(gè):
- 單個(gè)功能的重復(fù)調(diào)用;
- 宏循環(huán)的批量處理两疚。
關(guān)于這兩個(gè)作用床估,讀者可以與自己的宏程序編程經(jīng)歷對照理解,這里就不過多展開鬼雀。
從以上手動(dòng)編程的過程中可以看出顷窒,程序主要“重復(fù)”的地方在于各個(gè)變量不為空記錄的統(tǒng)計(jì)。除了變量名稱源哩,拼接程序完全相同鞋吉。如果我們將需要處理的變量名稱保存到宏變量序列中,就可以通過宏循環(huán)依次進(jìn)行調(diào)用励烦,并通過宏循環(huán)批量構(gòu)建程序谓着。
3.1 生成宏變量(序列)
宏循環(huán)處理的關(guān)鍵,在于循環(huán)次數(shù)的獲取以及變量名稱宏變量序列坛掠。
宏變量的生成常用有2種方法:
- Proc SQL 中的
into :
語句- Data步中的
call symputx
語句
SAS數(shù)據(jù)集的元數(shù)據(jù)信息保存在SAS字典中赊锚,這里我以Data步中的call symputx
語句進(jìn)行舉例。
變量數(shù)目保存到宏變量中:
***Get the number of vairiables;
data tmp1;
set sashelp.vtable;
where libname = "WORK" and memname = "CLASS";
call symputx("nvar", strip(put(nvar,best.)));
run;
%put nvar= &nvar.;
變量名稱保存到宏變量序列中:
***Get variables' name;
data tmp2;
set sashelp.vcolumn;
where libname = "WORK" and memname = "CLASS";
call symputx("var"||strip(put(varnum, best.)), strip(name));
run;
%put var1= &var1.;
%put var7= &var7.;
這里一些讀者可能有這樣的想法:這里完全可以使用一個(gè)Data步屉栓,從SASHELP.vcolumn數(shù)據(jù)集獲取最后一條數(shù)據(jù)的varnum作為宏變量nvar的取值舷蒲。類似這樣的處理:
***Get variables' name and nvar;
data tmp3;
set sashelp.vcolumn end=eof;
where libname = "WORK" and memname = "CLASS";
call symputx("var"||strip(put(varnum, best.)), strip(name));
if eof then call symputx("nvar1", strip(put(varnum, best.)));
%put nvar1= &nvar1.;
run;
以當(dāng)前演示數(shù)據(jù)集來看,這兩種方式處理結(jié)果相同友多。但如果一個(gè)數(shù)據(jù)集中沒有任何變量牲平,這時(shí)候SASHELP.vcolumn中是沒有記錄的,而SASHELP.vtable中是有nvar=0
的記錄域滥。此時(shí)后者無法抓取數(shù)據(jù)集的變量數(shù)纵柿。
如果感興趣蜈抓,讀者可以用以下空數(shù)據(jù)集進(jìn)行測試:
data test;
run;
3.2 宏循環(huán)的實(shí)現(xiàn)
宏循環(huán)需要在宏程序中進(jìn)行,宏程序的構(gòu)建盡可能包含可能的情形昂儒。這里根據(jù)變量數(shù)目進(jìn)行分類處理沟使。
***Temp macro;
%macro check_empty_var;
%if &nvar. = 0 %then %do;
data result;
length Dataset $50 empvar $2000;
dataset = "CLASS";
empvar = "There is no variable in the dataset Class!";
run;
%end;
%else %if &nvar. > 0 %then %do;
%if &nvar. = 1 %then %do;
proc sql noprint;
create table result1 as
select "CLASS" as domain length=50, "&var1." as Var length=50, sum(not missing(&var1.)) as Sum
from class
;
quit;
%end;
%if &nvar. > 1 %then %do;
proc sql noprint;
create table result1 as
select "CLASS" as Dataset length=50, "&var1." as Var length=50, sum(not missing(&var1.)) as Sum
from class
%do i = 2 %to &nvar.;
outer union corr
select "CLASS" as Dataset length=50, "&&var&i." as Var length=50, sum(not missing(&&var&i.)) as Sum
from class
%end;
;
quit;
%end;
*Display result;
proc transpose data = result1 out=result2 prefix=emp_;
by dataset;
var Var;
where sum = 0;
run;
data result;
set result2;
length Empvar $2000;
Empvar = catx(", ", of emp_:);
keep dataset empvar;
run;
%end;
%mend check_empty_var;
%check_empty_var;
以上宏程序運(yùn)行結(jié)果如下,與手動(dòng)編程結(jié)果保持一致渊跋。
3.3 宏參數(shù)的設(shè)置
宏參數(shù)一般有3類:
- 輸入內(nèi)容(變量/數(shù)據(jù)集)
- 輸出內(nèi)容(變量/數(shù)據(jù)集)
- 特定條件
這個(gè)宏程序從簡腊嗡,直接以Reslt數(shù)據(jù)集輸出,不需要額外的篩選條件刹枉,只需設(shè)置輸入數(shù)據(jù)集就好叽唱。為了省事,也不在宏里判斷輸入數(shù)據(jù)集的邏輯庫名稱微宝、數(shù)據(jù)集名稱棺亭,直接定義到宏參數(shù)中。
%macro check_empty_var(libname=WORK, memname=);
...
%mend check_empty_var;
確定宏參數(shù)后蟋软,需要在之前初步宏程序中進(jìn)行內(nèi)容替換镶摘,這樣方便以后對不同參數(shù)對象的處理調(diào)用。
4. 宏程序代碼匯總
完整宏程序需綜合以上內(nèi)容岳守,并盡可能考慮多種可能情況凄敢,以求宏程序運(yùn)行穩(wěn)定。例如湿痢,輸入數(shù)據(jù)集不存在的情況涝缝;數(shù)據(jù)集沒有空變量的情況。
如果輸入數(shù)據(jù)集不存在譬重,最好能在Log中輸出一條Warning
記錄作為提醒拒逮。為避免一些程序文本檢查機(jī)制的誤判,War ning
最好能拆開處理下臀规。
其他處理細(xì)節(jié)就不再展開描述滩援,匯總程序如下:
%macro check_empty_var(libname=WORK, memname=);
***Dataset exists;
%if %sysfunc(exist(&libname..&memname.)) = 1 %then %do;
***Get the number of vairiables;
data _null_;
set sashelp.vtable;
where libname = upcase("&libname.") and memname = upcase("&memname.");
call symputx("nvar", strip(put(nvar, best.)));
run;
%put nvar= &nvar.;
***Macro loop;
%if &nvar. = 0 %then %do;
data result;
length Dataset $50 Empvar $2000;
dataset = upcase("&libname..&memname.");
Empvar= "There is no variable in the dataset %sysfunc(upcase(&libname..&memname.)).";
run;
%end;
%else %if &nvar. > 0 %then %do;
**Get variables name;
data _null_;
set sashelp.vcolumn;
where libname = upcase("&libname.") and memname = upcase("&memname.");
call symputx("var"||strip(put(varnum, best.)), strip(name));
run;
%put var1= &var1.;
%if &nvar. = 1 %then %do;
proc sql noprint;
create table result1 as
select upcase("&libname..&memname.") as Dataset length=50, "&var1." as Var length=50, sum(not missing(&var1.)) as Sum
from &libname..&memname.
;
quit;
%end;
%if &nvar. > 1 %then %do;
proc sql noprint;
create table result1 as
select upcase("&libname..&memname.") as Dataset length=50, "&var1." as Var length=50, sum(not missing(&var1.)) as Sum
from &libname..&memname.
%do i = 2 %to &nvar.;
outer union corr
select upcase("&libname..&memname.") as Dataset length=50, "&&var&i." as Var length=50, sum(not missing(&&var&i.)) as Sum
from &libname..&memname.
%end;
;
quit;
%end;
*Display result;
proc transpose data = result1 out=result2 prefix=emp_;
by dataset;
var Var;
where sum = 0;
run;
data result;
set result2;
length emp_1 $50 Empvar$2000;
if not missing(emp_1) then empvar = catx(", ", of emp_:);
else do;
dataset = upcase("&libname..&memname.");
empvar = "There is no empty variable in the dataset %sysfunc(upcase(&libname..&memname.)).";
end;
keep dataset empvar;
run;
%end;
%end;
***Dataset does not exist;
%if %sysfunc(exist(&libname..&memname.)) ne 1 %then %do;
%put %sysfunc(compress(War ning)): Dataset %sysfunc(upcase(&libname..&memname.)) does not exist. ;
%end;
%mend check_empty_var;
%check_empty_var(libname=work, memname=class);
%check_empty_var(libname=work, memname=Yeteng);
如果數(shù)據(jù)集不存在,最后顯示結(jié)果如下:
5. 總結(jié)
文章梳理了查詢數(shù)據(jù)集中所有空變量宏程序構(gòu)建過程塔嬉,希望能夠?qū)ψx者有所幫助玩徊。單個(gè)數(shù)據(jù)集處理的完成,也方便后續(xù)對多個(gè)數(shù)據(jù)集批量處理谨究。
感謝閱讀恩袱, 歡迎關(guān)注:SAS茶談!
若有疑問胶哲,歡迎評論交流憎蛤!