SAS Base

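Variable names

A name must be at most 32 bytes long. (A letter takes 1 byte; a Chinese character takes 2 bytes.)
It must begin with a letter or an underscore.
It may contain letters, digits, and underscores, but not special characters such as %$!*&#@.
Letters may be lowercase or uppercase; names are not case-sensitive.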
Missing numeric data are represented by a single period (.) and missing character data are represented by blanks.

library name

1 to 8 characters; begins with a letter or an underscore; the remaining characters are letters, digits, or underscores.

Comments

Begin with an asterisk (*) and end with a semicolon (;).
Begin with slash-asterisk (/*) and end with asterisk-slash (*/).

Difference between DATA steps and PROC steps

The DATA statement does three things

Tells SAS that a DATA step is starting.

Names the SAS dataset being created.

Sets the variables used in the DATA step to missing values.

three default windows

1.program editor window
2.log window
3.output window

The basics of using SAS

Prepare the SAS program

Submit it for analysis

Review the resulting log for errors

Examine the output files to view the results of your analysis

Executing the program

Pull down the Locals menu and select Submit.

Click on the run icon on taskbar, which is a picture of a man running.

Push F8.

Highlight text and click on run symbol

Note: a DATA or PROC step is not executed until SAS reaches the next DATA or PROC statement. Use a RUN; statement to force execution.

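Reading a .dat file

DATA NAME;
  /* Start at line 4 of the raw file. DLM=',' only matters for list input; */
  /* the column input below reads by position.                             */
  INFILE 'E:\data\a.dat' FIRSTOBS=4 DLM=',';
  /* V1 in columns 1-5, V2 in columns 6-10, V3 is one character in column 15. */
  INPUT V1 1-5   V2 6-10   V3 $ 15;
RUN;
PROC PRINT DATA=NAME; RUN;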

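INFILE options

Syntax: INFILE 'AAAAA.DAT' options;
FIRSTOBS=n  the line at which to start reading data
OBS=n  the last line to read
MISSOVER  when INPUT reaches the end of a line before every variable has a value, SAS does not read from the next line and sets the remaining variables to missing; without it, SAS automatically continues on the next line
TRUNCOVER  for column or formatted input, when a line is shorter than the declared field, SAS takes whatever characters are available instead of moving to the next line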
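
As a minimal sketch of MISSOVER with list input (the dataset name scores and its variables are hypothetical, and the data are read inline with DATALINES rather than from an external file): the second and third records hold fewer values than the INPUT statement asks for, and MISSOVER leaves the unread variables missing instead of taking values from the following line.

```sas
data scores;
    /* MISSOVER: do not move to the next line when a record runs out of values */
    infile datalines missover;
    input name $ test1 test2 test3;
    datalines;
Ann 90 85 77
Bob 88
Cat 70 65
;
run;

proc print data=scores;
run;
```

Removing MISSOVER from the INFILE statement lets SAS continue onto the next line to fill the remaining variables, which mixes values from different records.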

INPUT Notes

(1) Duplicate formats can be used when variables have the same format. The examples below represent the same formats of variables x1-x5.
INPUT x1 4. x2 4. x3 4. x4 4. x5 4.;
INPUT (x1 x2 x3 x4 x5) (4. 4. 4. 4. 4.);
INPUT (x1-x5) (5*4.);
(2) @@ tells SAS to hold the line of raw data and use it when processing the next
observation. The @@ must be the last entry in the INPUT statement.
(3) @ tells SAS to hold this line of data for possible use by INPUT statements later in the DATA step. The @ must be the last entry in the INPUT statement.
(4) / tells SAS to move to the next line of the raw dataset.
(5) #n tells SAS to skip to the nth line of the raw data for the observation.
(6) @n tells SAS to move to the nth column.
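
Note (2) can be sketched as follows, assuming a hypothetical dataset pairs whose raw lines each hold several x/y pairs. With @@ at the end of the INPUT statement, SAS keeps reading from the same line on each pass of the DATA step, so every pair becomes its own observation.

```sas
data pairs;
    /* @@ holds the line across iterations of the DATA step */
    input x y @@;
    datalines;
1 2 3 4 5 6
7 8
;
run;

proc print data=pairs;
run;
```

This yields four observations, one per x/y pair.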

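Special characters

@40 moves to column 40; @'aa' moves to just after the string aa.
A slash (/) moves to the next line of raw data.
#2 moves to the second line of raw data for the observation.
When one line holds several observations, put @@ at the end of the INPUT statement.
A single @ at the end of the INPUT statement (a trailing at) holds the line for a later INPUT statement, which makes it possible to read only selected records.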
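
A small sketch of the trailing @ used to select records, assuming a hypothetical layout in which the first field of each record is a group label: the first INPUT reads only that label and holds the line, the IF statement discards unwanted records, and the second INPUT reads the rest of the same line.

```sas
data adults;
    input group $ @;                  /* read the first field and hold the line */
    if group = 'child' then delete;   /* skip records that are not wanted */
    input name $ age;                 /* continue reading the held line */
    datalines;
adult Ann 34
child Bob 9
adult Cat 41
;
run;
```

Only the two adult records reach the output dataset, and SAS never spends time parsing the remaining fields of the discarded record.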

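Reading delimited files in a DATA step

DLM=',' specifies a comma delimiter; DLM='09'x specifies a tab delimiter.
DSD ignores delimiters inside quoted strings. For example, in the record Joseph,76,"Red Racers, Washington" the commas outside the quotation marks are treated as delimiters, while the comma inside them is not. DSD also removes the quotation marks from character values and treats two consecutive delimiters as a missing value.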
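
The DSD behaviour can be tried with inline data. This is a sketch under stated assumptions: the dataset name racers, the variable names, and the second record are made up for illustration. The LENGTH statement is there because list input otherwise gives character variables a length of 8.

```sas
data racers;
    /* DSD: comma is the default delimiter, quotes are stripped, ",," is missing */
    infile datalines dsd;
    length name $ 10 team $ 30;
    input name $ age team $;
    datalines;
Joseph,76,"Red Racers, Washington"
Mitchel,,"Blue Bunnies, Oregon"
;
run;

proc print data=racers;
run;
```

In the second record the two adjacent commas leave age missing, and in both records the comma inside the quoted team name stays part of the value.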

Reading Excel data

PROC IMPORT DATAFILE='D:\A.XLS' OUT=A  REPLACE DBMS=XLS; GETNAMES=YES; SHEET="Sheet1"; RUN;
PROC PRINT DATA=A; RUN;

OUT= name of the output dataset
DBMS= file type, XLS or XLSX

Reading a sas7bdat file (a file on the desktop)

data new; set 'C:\Users\sdkyc\Desktop\hsb2.sas7bdat'; run;
proc print data=new; run;

Temporary versus permanent datasets
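
As a brief sketch of the distinction (the libref mylib, the folder path, and the dataset names are hypothetical; sashelp.class is a sample dataset shipped with SAS): a one-level name is stored in the WORK library and disappears when the session ends, while a two-level name that uses a libref assigned by LIBNAME is written to that folder and persists.

```sas
/* One-level name: stored in WORK, deleted when the SAS session ends */
data temp_copy;
    set sashelp.class;
run;

/* Two-level name: stored in the folder behind the libref, kept after the session */
libname mylib 'C:\mydata';   /* hypothetical path; the folder must already exist */
data mylib.perm_copy;
    set sashelp.class;
run;
```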

Variable assignment and arithmetic

IF-THEN, DO, and IF-THEN/ELSE

DO and END form a pair; every action between them is executed.

DATA A;
INFILE 'C:\A.DAT';
INPUT V1 $ V2 V3;
IF V2 = .  THEN   V4='MISSING';
  ELSE IF V2<100  THEN   V4='LOW';
  ELSE IF V2<1000  THEN   V4='MEDIUM';
  ELSE V4 = 'HIGH';
RUN;

IF statements can also be used to build subsets.
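
A sketch of a DO/END group combined with subsetting, assuming the dataset A created above; the output name high_only and the variable flag are hypothetical. When the condition holds, both statements inside the group run; every other observation is deleted.

```sas
data high_only;
    set a;
    if v2 >= 1000 then do;
        /* both statements run together when the condition is true */
        v4 = 'HIGH';
        flag = 1;
    end;
    else delete;   /* drop all other observations, leaving a subset */
run;
```

When no extra actions are needed, a subsetting IF such as `if v2 >= 1000;` keeps the same observations with a single statement.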

Simplifying programs with arrays (ARRAY)

ARRAY array-name <{n}> <$> <length> <elements> <(initialvalues)>;
array-name - is the name of the array.
{n} - is either the dimension of the array, or an asterisk (*) to indicate that the dimension is determined from the number of array elements or initial values.
$ indicates that the array type is character.
length - is the maximum length of elements in the array. For character arrays, the maximum length cannot exceed 200 (the limit in SAS version 6; later releases allow up to 32,767).
elements - are the variables that make up the array and they exist in a dataset or are created before the array definition.
initial-values - are the values to use to initialize some or all of the array elements. Separate these values with commas or blanks.
ARRAY rain {5} janr febr marr aprr mayr;
ARRAY days{7} d1-d7;
ARRAY month{*} jan feb jul oct nov;
ARRAY x{*} _NUMERIC_;
ARRAY qbx{10};
ARRAY meal{3};
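
The payoff of an array is that one loop replaces a statement per variable. The sketch below assumes a hypothetical convention in which 999 codes a missing value in d1-d7; the dataset name cleaned is also made up.

```sas
data cleaned;
    input d1-d7;
    array days{7} d1-d7;
    do i = 1 to 7;
        if days{i} = 999 then days{i} = .;   /* recode 999 to missing */
    end;
    drop i;   /* the loop counter is not needed in the output */
    datalines;
1 2 999 4 5 999 7
;
run;
```

Without the array, the same recode would take seven separate IF statements.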

Link to annotated notes on the individual PROCs

https://stats.idre.ucla.edu/other/annotatedoutput/

PROC CONTENTS returns the descriptor portion of a dataset, not the data itself

PROC MEANS

Prints descriptive statistics; overlaps in functionality with PROC UNIVARIATE.
MAXDEC= number of decimal places
proc means data=a N NMISS MEAN STD STDERR MAXDEC=4; run;

PROC UNIVARIATE t-test sample mean mu0

The Tests for Location table includes a two-tailed t-test; look at the Student's t row. If p < α, the mean of write differs significantly from 30.
proc univariate data = "D:\hsb2" plots normal mu0=30; var write; run;
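Also used to test normality: request the plots and find the Shapiro-Wilk p-value; if it is greater than α, the data are consistent with a normal distribution.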
proc univariate data=a normal plot; var write; run;

1. These tests check the assumption that the data follow a normal distribution.
2. Null hypothesis: the data are normal; alternative hypothesis: the data are not normal.
3. A large p-value (e.g. > 0.05) means we fail to reject the null hypothesis, so the data are consistent with normality.
4. If 6 < sample size < 2001, use Shapiro-Wilk.
5. If sample size > 2000, use the Kolmogorov-Smirnov test.
6. Within the appropriate sample size range, Shapiro-Wilk is more powerful than the Kolmogorov-Smirnov test.
7. A clear departure from skewness = 0 and kurtosis = 0 suggests non-normality.

PROC FREQ TABLES chisq

Used to test whether two variables are associated or independent. Find the chi-square statistic in the output: a large chi-square statistic corresponds to a small p-value. If the p-value is small enough (say < 0.05), we reject the null hypothesis that the two variables are independent and conclude that there is an association between the row and the column variables.
PROC FREQ DATA=CLASSFIT2; TABLES SEX*HT/CHISQ; RUN;

PROC REG

Assumption

a. Normality of errors: The error distribution is normal.
b. Normality of errors is checked by residual analysis: first calculate the residuals, $r = y - \hat{y}$ (observed value minus fitted value), then verify the normality of the residuals using PROC UNIVARIATE or Q-Q plots.
c. Independence: The errors or observations are independent of each other. Example: Apple stock price recorded on 10 consecutive days. Here the 10 observations are not independent.
d. The variables must be numeric.
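
The residual check in (b) can be sketched as follows, assuming a dataset a that contains the numeric variables write and read; the output dataset resid and the variable names residual and predicted are chosen here for illustration.

```sas
proc reg data=a;
    model write = read;
    /* save the residuals (R=) and fitted values (P=) to a new dataset */
    output out=resid r=residual p=predicted;
run;
quit;

/* test the residuals for normality and draw the plots */
proc univariate data=resid normal plot;
    var residual;
run;
```

A large Shapiro-Wilk p-value for residual is consistent with assumption (a).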

PROC ANOVA

Assumption: the sampled populations are normally distributed.
One-way ANOVA: only one factor (a single variable, which may have several levels).
See the PPT.
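
A minimal one-way ANOVA sketch, assuming a dataset a with a numeric response height and a single factor age (the same names used in the PROC GLM examples of these notes). The MEANS statement with the TUKEY option is an optional addition that compares the level means pairwise.

```sas
proc anova data=a;
    class age;               /* the single factor */
    model height = age;      /* response = factor */
    means age / tukey;       /* optional pairwise comparisons of level means */
run;
quit;
```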

PROC GLM contrast

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#glm_toc.htm
1. Question: is the mean height the same at every age? μ1=μ2=μ3=μ4
proc glm data=a; class age; model height=age; run;
2. Question: does the mean height of 11- and 12-year-olds differ from the mean height of 13- to 16-year-olds?

proc glm data=a; class age; 
model height=age;
contrast '11&12 vs. rest' 
age 2 2 -1 -1 -1 -1; run; quit;

PROC CORR

Shows the Pearson correlation coefficients between variables; a negative value indicates negative correlation, a positive value positive correlation.
NOSIMPLE suppresses the descriptive statistics.
proc corr data = "D:\hsb2" pearson nosimple; var read write; run;

PROC TTEST t-test

Assumption: all variables are normally distributed.

Single sample t-test. Example: test whether the mean of score equals 50; if p is less than α, the mean differs significantly from 50.
proc ttest data="D:\hsb2" H0=50; var score; run;
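Dependent group t-test (paired t-test). Example: a group of students each took two exams; is the mean write score the same as the mean read score? If p is less than α, the means differ significantly.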
proc ttest data="D:\hsb2"; paired write*read; run;
Independent group t-test. Example: does sex affect the write score?

If Pr > F in the Equality of Variances table is less than α, the two groups have different variances, so use the Satterthwaite (unequal) method and read its Pr > |t|; otherwise use the Pooled method.
proc ttest data="D:\hsb2"; class sex; var write; run;

PROC NPAR1WAY

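Can be used to run a Wilcoxon test. Example questions:
Are test scores different from 4th grade to 5th grade on the same students?
Does a particular diet drug have an effect on BMI when tested on the same individuals?
The assumptions of the test are:
Data comes from two matched, or dependent, populations.
The data is continuous.
Because it is a non-parametric test, it does not require a particular distribution of the dependent variable; no distributional assumption is made.
It is especially suitable for small sample sizes.
Note: matched-pair questions like these call for the Wilcoxon signed-rank test; the WILCOXON option of PROC NPAR1WAY runs the rank-sum test for independent groups.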
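
A hedged sketch of the two Wilcoxon variants, reusing the "D:\hsb2" dataset and the write, read, and sex variables from the t-test examples; the dataset diffs and the variable diff are hypothetical. For matched data such as the questions above, the signed-rank test appears in the Tests for Location table of PROC UNIVARIATE when it is run on the paired differences. The WILCOXON option of PROC NPAR1WAY gives the rank-sum test, which compares independent groups.

```sas
/* Paired data: signed-rank test on the within-student differences */
data diffs;
    set "D:\hsb2";
    diff = write - read;
run;

proc univariate data=diffs;
    var diff;   /* see the Signed Rank row under Tests for Location */
run;

/* Independent groups: Wilcoxon rank-sum test */
proc npar1way data="D:\hsb2" wilcoxon;
    class sex;
    var write;
run;
```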

one- and two-tail test

P value

If the test is of H0: mean = 0 and the result is p < α, reject H0: the mean is significantly different from 0.
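
The difference between a two-tailed and a one-tailed test can be sketched with the SIDES= option of PROC TTEST, assuming the "D:\hsb2" dataset and the write variable used earlier; the null value 50 is only an example.

```sas
/* Two-tailed (the default): alternative is mean not equal to 50 */
proc ttest data="D:\hsb2" h0=50;
    var write;
run;

/* One-tailed: alternative is mean greater than 50 */
proc ttest data="D:\hsb2" h0=50 sides=u;
    var write;
run;
```

SIDES=L requests the lower-tailed alternative instead. The t statistic is the same in each run; only the p-value changes with the choice of alternative.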

Code template

proc print data= ; run;
