SAS Base

Variable names

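A name must be at most 32 bytes long (one letter is 1 byte; one Chinese character is 2 bytes in a double-byte encoding).
It must begin with a letter or an underscore.
It may contain letters, digits, and underscores, but not special characters such as %$!*&#@.
Letters may be lowercase or uppercase; names are not case-sensitive.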
Missing numeric data are represented by a single period (.) and missing character data are represented by blanks.
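
As a minimal sketch of how missing values look in raw data, the step below reads one numeric and one character variable with list input. The dataset name, variable names, and data lines are made up for illustration; a single period stands for a missing value in both columns.

```sas
/* Hypothetical dataset and values, for illustration only */
data missing_demo;
  input name $ score;
  /* flag rows where the numeric value is missing */
  if score = . then note = 'no score';
  datalines;
Amy 90
Bob .
. 75
;
run;

proc print data=missing_demo; run;
```

In the printed output a missing numeric value shows as a period and a missing character value shows as a blank.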


library name

1-8 characters; it must begin with a letter or an underscore, and the remaining characters must be letters, digits, or underscores.
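
A short sketch of a library name in use, assuming a folder C:\mydata that already exists (the path, the libref mylib, and the dataset names are all hypothetical). A two-level name stores the dataset permanently in that library; a one-level name goes to the temporary WORK library.

```sas
/* 'C:\mydata' is a hypothetical folder; it must already exist */
libname mylib 'C:\mydata';

/* Two-level name: stored permanently in the mylib library */
data mylib.scores;
  x = 1;
run;

/* One-level name: stored in the temporary WORK library */
data scores;
  set mylib.scores;
run;
```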

Comments

Begin with an asterisk (*) and end with a semicolon (;).
Begin with slash-asterisk (/*) and end with asterisk-slash (*/).
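
The two comment styles can be sketched as follows; the DATA step is only there to show a comment sitting next to a statement.

```sas
* This is a statement-style comment, ended by a semicolon;

/* This is a block-style comment.
   It can span several lines and contain semicolons; */

data _null_;
  x = 1;  /* a block comment can follow a statement on the same line */
run;
```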

Differences between DATA steps and PROC steps


The DATA statement does three things

  1. Tells SAS that a DATA step is starting.
  2. Names the SAS dataset being created.
  3. Sets variables used in the DATA step to missing values.

three default windows

1. Program Editor window
2. Log window
3. Output window

The basics of using SAS

  1. Prepare the SAS program
  2. Submit it for analysis
  3. Review the resulting log for errors
  4. Examine the output files to view the results of your analysis

Executing the program

  1. Pull down the Locals menu and select Submit.
  2. Click on the run icon on taskbar, which is a picture of a man running.
  3. Push F8.
  4. Highlight text and click on run symbol
  5. Note: a DATA or PROC step is not executed until SAS reaches the next DATA or PROC statement. Use a RUN; statement to force execution.

Reading a .dat file

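DATA NAME;
/* FIRSTOBS=4 starts reading at line 4. DLM=',' only affects list input, so it has no effect on the column input below. */
INFILE 'E:\data\a.dat' FIRSTOBS=4 DLM=',';
/* Column input: V1 in columns 1-5, V2 in columns 6-10, V3 is one character in column 15. */
INPUT V1 1-5   V2 6-10   V3 $ 15;
RUN;
PROC PRINT DATA=NAME; RUN;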

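INFILE options

Syntax: INFILE 'AAAAA.DAT' XXX; where XXX is one or more of the options below.
FIRSTOBS=n  the line number at which to start reading data
OBS=n  the line number at which to stop reading data
MISSOVER  when SAS reaches the end of a line and some variables still have no values, it sets them to missing instead of reading from the next line; without this option SAS automatically continues on the next line
TRUNCOVER  for column or formatted input when some lines are shorter than others: SAS reads whatever is available for a field that is cut short instead of moving to the next line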
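
The effect of MISSOVER can be sketched with in-stream data, where the second line has only two of the three values. The dataset names and the data are made up for illustration.

```sas
/* Default behavior: SAS continues on the next line to find c */
data flow_default;
  input a b c;
  datalines;
1 2 3
4 5
6 7 8
;
run;

/* MISSOVER: c is set to missing on the short line instead */
data with_missover;
  infile datalines missover;
  input a b c;
  datalines;
1 2 3
4 5
6 7 8
;
run;

proc print data=flow_default; run;
proc print data=with_missover; run;
```

In the first step the short line borrows its third value from the following line, so that line is used up and only two observations result. In the second step every line yields one observation and the short one gets a missing c.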

INPUT Notes

(1) Duplicate formats can be used when variables have the same format. The examples below represent the same formats of variables x1-x5.

INPUT x1 4. x2 4. x3 4. x4 4. x5 4.;
INPUT (x1 x2 x3 x4 x5) (4. 4. 4. 4. 4.);
INPUT (x1-x5) (5*4.);

(2) @@ tells SAS to hold the line of raw data and use it when processing the next
observation. The @@ must be the last entry in the INPUT statement.
(3) @ tells SAS to hold this line of data for possible use by INPUT statements later in the DATA step. The @ must be the last entry in the INPUT statement.
(4) / tells SAS to move to the next line of the raw dataset.
(5) #n tells SAS to skip to the nth line of the raw data for the observation.
(6) @n tells SAS to move to the nth column.
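
Notes (2) and (4) can be sketched with in-stream data. The dataset names, variables, and values below are hypothetical.

```sas
/* @@ : several observations packed onto each raw data line */
data pairs;
  input id score @@;
  datalines;
1 90 2 85 3 77
4 60 5 95
;
run;

/* / : each observation is spread over two raw data lines */
data two_lines;
  input name $ age / height weight;
  datalines;
Amy 12
150 40
Bob 13
160 50
;
run;

proc print data=pairs; run;
proc print data=two_lines; run;
```

The first step produces five observations from two lines; the second produces two observations from four lines.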

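Special characters

@40 moves to column 40; @'aa' moves to just after the string aa.
A slash (/) moves to the next line of raw data.
#2 moves to the second line of raw data for the observation.
For several observations on one line, put @@ at the end of the INPUT statement.
A single @ at the end of the INPUT statement (trailing at) holds the line so that a later INPUT statement can read the rest of it; this can be used to select only part of the data.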
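
One way to use the trailing @ for selecting part of the data is sketched below: read just enough of the line to decide, then read the rest only for the rows that are kept. The variable names and data lines are made up for illustration.

```sas
data girls_only;
  input sex $ @;        /* read sex and hold the line */
  if sex = 'F';         /* subsetting IF: keep only rows where sex is F */
  input name $ score;   /* read the rest of the held line */
  datalines;
F Amy 90
M Bob 85
F Cat 77
;
run;

proc print data=girls_only; run;
```

Rows that fail the subsetting IF are dropped before the second INPUT statement runs, so the remaining fields of those lines are never read.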


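Reading delimited files in a DATA step

DLM=',' specifies a comma as the delimiter; DLM='09'x specifies a tab.
DSD ignores delimiters inside quoted strings. For example, in the record Joseph,76,"Red Racers, Washington" the commas outside the quotation marks are treated as delimiters and the comma inside them is not. DSD also strips the quotation marks from character values and treats two adjacent delimiters as a missing value.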
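
A small sketch of DSD using the record quoted above. The variable names and the second data line are assumptions added for illustration; the second line shows two adjacent commas being read as a missing value.

```sas
data racers;
  infile datalines dsd;          /* with DSD the default delimiter is a comma */
  length name $ 10 team $ 30;    /* make room for the longer team string */
  input name $ score team $;
  datalines;
Joseph,76,"Red Racers, Washington"
Mary,,"Blue Team"
;
run;

proc print data=racers; run;
```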

Reading Excel data

PROC IMPORT DATAFILE='D:\A.XLS' OUT=A  REPLACE DBMS=XLS; GETNAMES=YES; SHEET="Sheet1"; RUN;
PROC PRINT DATA=A; RUN;

OUT= name of the output dataset
DBMS= file type, XLS or XLSX

Reading a sas7bdat file (a file on the desktop)

data new; set 'C:\Users\sdkyc\Desktop\hsb2.sas7bdat'; run;
proc print data=new; run;

Temporary vs. permanent datasets

Variable assignment and arithmetic

IF-THEN DO IF-ELSE

  1. DO and END form a pair; all the actions between them are executed.
DATA A;
INFILE 'C:\A.DAT';
INPUT V1 $ V2 V3;
IF V2 = .  THEN   V4='MISSING';
  ELSE IF V2<100  THEN   V4='LOW';
  ELSE IF V2<1000  THEN   V4='MEDIUM';
  ELSE V4 = 'HIGH';
RUN;
  2. IF can also be used to build subsets.
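
Both points can be sketched in one step: a DO/END group that runs two actions under a single condition, followed by a subsetting IF. The variable names, cutoffs, and data lines are hypothetical.

```sas
data graded;
  input name $ amount;
  length level $ 7 flag $ 3;
  if amount = . then level = 'MISSING';
  else if amount < 100 then do;
    /* both assignments run when the condition is true */
    level = 'LOW';
    flag  = 'YES';
  end;
  else level = 'HIGH';
  /* subsetting IF: keep only rows with a non-missing amount */
  if level ne 'MISSING';
  datalines;
Amy 50
Bob .
Cat 500
;
run;

proc print data=graded; run;
```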

Simplifying programs with arrays (ARRAY)

ARRAY array-name <{n}> <$> <length> <elements> <(initialvalues)>;
array-name - is the name of the array.
{n} - is either the dimension of the array, or an asterisk (*) to indicate that the dimension is determined from the number of array elements or initial values.
$ indicates that the array type is character.
length - is the maximum length of elements in the array. For character arrays, the maximum length cannot exceed 32,767 in current SAS releases (the limit was 200 in SAS version 6).
elements - are the variables that make up the array and they exist in a dataset or are created before the array definition.
initial-values - are the values to use to initialize some or all of the array elements. Separate these values with commas or blanks

ARRAY rain {5} janr febr marr aprr mayr;
ARRAY days{7} d1-d7;
ARRAY month{*} jan feb jul oct nov;
ARRAY x{*} _NUMERIC_;
ARRAY qbx{10};
ARRAY meal{3};
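
To show what an array buys you, here is a sketch that applies the same recode to all five rain variables in a loop. The rule that -9 means missing, and the data lines, are assumptions made for illustration.

```sas
data rain_clean;
  input janr febr marr aprr mayr;
  array rain {5} janr febr marr aprr mayr;
  do i = 1 to 5;
    /* hypothetical rule: the code -9 stands for a missing reading */
    if rain{i} = -9 then rain{i} = .;
  end;
  drop i;   /* the loop index is not needed in the output */
  datalines;
1.2 -9 3.4 0.5 -9
2.0 1.1 -9 0.0 4.2
;
run;

proc print data=rain_clean; run;
```

Without the array the same recode would need one IF statement per variable.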

Link to annotated-output notes for the individual PROCs

https://stats.idre.ucla.edu/other/annotatedoutput/

PROC CONTENTS shows the descriptor portion of a dataset, not the data itself
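
A one-step sketch, using the SASHELP.CLASS sample dataset that ships with SAS as a stand-in for your own data:

```sas
/* Lists variable names, types, lengths, and dataset attributes */
proc contents data=sashelp.class;
run;
```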

PROC MEANS

Outputs descriptive statistics; its functionality overlaps with PROC UNIVARIATE.
MAXDEC= number of decimal places
proc means data=a N NMISS MEAN STD STDERR MAXDEC=4; run;

PROC UNIVARIATE t-test sample mean mu0

The Tests for Location table includes a two-tailed t-test: look at the Student's t row; if p < α, the mean of write differs significantly from 30.
proc univariate data = "D:\hsb2" plots normal mu0=30; var write; run;
Used to test normality and draw plots: find the Shapiro-Wilk p-value; if it is greater than α, the data are consistent with a normal distribution.
proc univariate data=a normal plot; var write; run;

1. These tests check the assumption that the data follow a normal distribution.
2. Null hypothesis: the data are normal; alternative hypothesis: the data are not normal.
3. A large p-value (e.g. > 0.05) means we fail to reject the null hypothesis, so the data are consistent with normality.
4. If 6 < sample size < 2001, use Shapiro-Wilk.
5. If sample size > 2000, use the Kolmogorov-Smirnov test.
6. Within the appropriate sample-size range, Shapiro-Wilk is more powerful than the Kolmogorov-Smirnov test.
7. A marked departure from skewness = 0 and kurtosis = 0 suggests non-normality.


PROC FREQ TABLES chisq

Used to test whether two variables are associated or independent. Find the chi-square statistic in the output; a large chi-square statistic corresponds to a small p-value. If the p-value is small enough (say < 0.05), we reject the null hypothesis that the two variables are independent and conclude that there is an association between the row and the column variables.
PROC FREQ DATA=CLASSFIT2; TABLES SEX*HT/CHISQ; RUN;

PROC REG

Assumptions

a. Normality of errors: the error distribution is normal.
b. Normality of errors is checked by residual analysis: first calculate the residuals (r = y - ŷ, observed minus fitted value), then verify the normality of the residuals using PROC UNIVARIATE or Q-Q plots.
c. Independence: the errors or observations are independent of each other. Counterexample: Apple stock price recorded on 10 consecutive days; here the 10 observations are not independent.
d. The variables must be numeric.
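
The residual analysis described in (b) might look like the sketch below. It uses the SASHELP.CLASS sample dataset and a weight-on-height model purely as a stand-in; the output dataset and variable names (resid_out, resid, pred) are hypothetical.

```sas
/* Fit the regression and save residuals and fitted values */
proc reg data=sashelp.class;
  model weight = height;
  output out=resid_out r=resid p=pred;
run;
quit;

/* Check the residuals for normality (Shapiro-Wilk, plots) */
proc univariate data=resid_out normal plot;
  var resid;
run;
```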

PROC ANOVA

Assumption: the sampled populations are normally distributed.
One-way ANOVA: only one factor (a single variable, which may have several levels).
See the slides (ppt).
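
A minimal one-way ANOVA sketch, assuming the SASHELP.CLASS sample dataset with age as the single factor and height as the response (a stand-in choice, not taken from the slides):

```sas
proc anova data=sashelp.class;
  class age;            /* the one factor; each age is a level */
  model height = age;
run;
quit;
```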

PROC GLM contrast

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#glm_toc.htm
1. Question: is the mean height the same at every age? H0: μ1=μ2=μ3=μ4
proc glm data=a; class age; model height=age; run;
2. Question: does the mean height of 11- and 12-year-olds differ from the mean height of 13- to 16-year-olds?

proc glm data=a; class age; 
model height=age;
contrast '11&12 vs. rest' 
age 2 2 -1 -1 -1 -1; run; quit;

PROC CORR

Shows the Pearson correlation coefficients between variables: a negative value means negative correlation, a positive value means positive correlation.
NOSIMPLE suppresses the descriptive statistics
proc corr data = "D:\hsb2" pearson nosimple; var read write; run;

PROC TTEST t-test

Assumption: all variables are normally distributed.

  1. Single sample t-test. Example: test whether the mean of score equals 50; if p is less than α, it differs significantly.
    proc ttest data="D:\hsb2" H0=50; var score; run;
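  2. Dependent group t-test (paired t-test). Example: a group of students all took two exams; test whether the mean write score equals the mean read score; if p is less than α, they differ significantly.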
    proc ttest data="D:\hsb2"; paired write*read; run;
  3. Independent group t-test. Example: does sex affect the write score?

In the Equality of Variances table, if Pr > F is less than α, the variances of the two sex groups differ, so use the Satterthwaite (unequal) row and read its Pr > |t|; otherwise use the Pooled row.
proc ttest data="D:\hsb2"; class sex; var write; run;

PROC NPAR1WAY

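Can be used for the Wilcoxon test. Example questions:
Are test scores different from 4th grade to 5th grade on the same students?
Does a particular diet drug have an effect on BMI when tested on the same individuals?
The assumptions of this test are:
Data comes from two matched, or dependent, populations.
The data is continuous.
Because it is a non-parametric test it does not require a special distribution of the dependent variable in the analysis.
It is especially suitable for small sample sizes.
Note: the questions above involve matched data, which call for the Wilcoxon signed-rank test; PROC UNIVARIATE reports it (Signed Rank row of Tests for Location) when run on the paired differences. PROC NPAR1WAY with the WILCOXON option performs the rank-sum test, which compares independent groups.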
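
Two hedged sketches follow. The first runs PROC NPAR1WAY with the WILCOXON option on the SASHELP.CLASS sample dataset, comparing height between the two sex groups (independent groups). The second handles matched data by computing the paired difference and passing it to PROC UNIVARIATE; the dataset diffs and its values are made up for illustration.

```sas
/* Independent groups: Wilcoxon rank-sum test */
proc npar1way data=sashelp.class wilcoxon;
  class sex;
  var height;
run;

/* Matched data: signed-rank test on the paired differences */
data diffs;
  input before after;
  diff = after - before;
  datalines;
24.1 23.5
30.2 29.0
27.8 28.1
26.5 25.2
31.0 30.4
;
run;

proc univariate data=diffs;
  var diff;   /* see the Signed Rank row in Tests for Location */
run;
```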

one- and two-tail test

P value

If the test is of H0 = 0 and the result is p < α, reject H0: the mean is significantly different from 0.

Code template

proc print data= ; run;
