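This section covers association analysis, the step that produces the single-marker regression results (P values) we analyze. Its output can be combined with my first part to close the loop. Let's run through the standard workflow.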
Association analysis can be used in many ways, for example:
The basic association test is for a disease trait and is based on comparing allele frequencies between cases and controls (asymptotic and empirical p-values are available). Also implemented are the Cochran-Armitage trend test, Fisher's exact test, different genetic models (dominant, recessive and general), tests for stratified samples (e.g. Cochran-Mantel-Haenszel, Breslow-Day tests), a test for a quantitative trait; a test for differences in missing genotype rate between cases and controls; multilocus tests, using either Hotelling's T² statistic or a sum-statistic approach (evaluated by permutation) as well as haplotype tests. The basic tests can be performed with permutation, described in the following section to provide empirical p-values, and allow for different designs (e.g. by use of structured, within-cluster permutation).
Here I mainly cover one of these: linear and logistic models.
These two features allow for multiple covariates when testing for both quantitative trait and disease trait SNP association, and for interactions with those covariates. The covariates can either be continuous or binary (i.e. for categorical covariates, you must first make a set of binary dummy variables).
The main point is that you can add covariates as controls. This is very flexible, but it may run somewhat slower.
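The manual's note that categorical covariates must first be split into binary dummy variables can be sketched in bash/awk. This is a minimal sketch under assumptions: the input is a whitespace-delimited file with no header, holding FID, IID and one categorical covariate in column 3; `make_dummies` is a hypothetical helper name, and the first level in sorted order is used as the reference, so k levels produce k-1 columns.

```bash
#!/usr/bin/env bash
# make_dummies (hypothetical helper): expand one categorical covariate into
# k-1 binary dummy columns for use with PLINK's --covar.
# Input:  whitespace-delimited, no header: FID IID CATEGORY
# Output: FID IID D2 ... Dk on stdout, where Dj = 1 if CATEGORY is the j-th
#         level in sorted order; level 1 is the reference and gets no column.
make_dummies() {
  local infile="$1"
  local levels
  # Distinct levels in a fixed (C-locale) sort order, space-separated
  levels=$(awk '{ print $3 }' "$infile" | LC_ALL=C sort -u | tr '\n' ' ')
  awk -v levels="$levels" '
    BEGIN { k = split(levels, lv, " ") }
    {
      line = $1 " " $2
      # start at i = 2: lv[1] is the reference level
      for (i = 2; i <= k; i++)
        line = line " " ($3 == lv[i] ? 1 : 0)
      print line
    }' "$infile"
}
```

Usage, with `batch.txt` as a hypothetical input file: `make_dummies batch.txt > covar_dummies.txt`, then pass the result with `--covar covar_dummies.txt` alongside `--linear`. Dropping one reference level avoids perfect collinearity with the intercept.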
The documentation gives the most basic usage as follows:
But here I ran into a problem.
My .bed/.bim/.fam files do not contain phenotype data, so I had to define a separate phenotype file myself.
I use a quantitative trait as the example here.
Generally, you create this file yourself
and then point PLINK to it with --pheno.
--pheno causes phenotype values to be read from the 3rd column of the specified space- or tab-delimited file, instead of the .fam or .ped file. The first and second columns of that file must contain family and within-family IDs, respectively.
So the file has three columns in total:
the first two are the family and within-family IDs, and the third is the phenotype.
Here I use the first principal component (PC1) as the phenotype.
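Because `--pheno` reads the third column, and a PLINK .eigenvec file is laid out as FID, IID, PC1, PC2, ..., passing the .eigenvec file directly already uses PC1 as the phenotype. If you prefer an explicit three-column phenotype file (for example, to use a different PC), here is a minimal sketch; `make_pc_pheno` and the output filename are hypothetical, and the .eigenvec file is assumed to have no header line.

```bash
#!/usr/bin/env bash
# make_pc_pheno (hypothetical helper): write a PLINK --pheno file
# (FID IID value) taking principal component number $2 (default 1)
# from a headerless .eigenvec file laid out as FID IID PC1 PC2 ...
make_pc_pheno() {
  local eigenvec="$1"
  local pc="${2:-1}"
  # PCk sits in column k + 2 because FID and IID come first
  awk -v col="$((pc + 2))" '{ print $1, $2, $col }' "$eigenvec"
}
```

Usage: `make_pc_pheno clean.eigenvec 1 > pc1.pheno`, then `--pheno pc1.pheno`. Alternatively, PLINK's `--mpheno n` reads the n-th phenotype column (column n+2) of the `--pheno` file, so the .eigenvec can be used as-is for other PCs too.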
Let's try it.
It failed: PLINK skipped the analysis with this warning:
Warning: Skipping --linear since # variables >= # samples.
Remember to add --allow-no-sex:
--allow-no-sex is now required if you want to retain phenotype values for missing-sex samples. This is a change from PLINK 1.07; we believe it would be more confusing to continue treating regular and --pheno phenotypes differently, and apologize for any temporary inconvenience we've caused.
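This is by design. Without --allow-no-sex, samples whose sex is missing have their phenotype set to missing, so if many samples lack a sex code, too few samples remain for the regression, which is presumably what the warning above reports.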
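A quick way to confirm that missing sex is the culprit is to count the .fam rows whose sex code (column 5) is neither 1 (male) nor 2 (female). A minimal sketch; `count_missing_sex` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# count_missing_sex (hypothetical helper): report how many samples in a
# PLINK .fam file have an unknown sex code (column 5 not 1 or 2), i.e. the
# samples whose --pheno values are dropped unless --allow-no-sex is given.
count_missing_sex() {
  awk '$5 != 1 && $5 != 2 { n++ }
       END { print n + 0, "of", NR, "samples have unknown sex" }' "$1"
}
```

Usage: `count_missing_sex clean.fam`. If it reports most or all samples, that explains why the regression had too few samples without --allow-no-sex.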
plink --bfile clean --linear --pheno clean_one.eigenvec --allow-no-sex
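Success!
It produces a file with the .assoc.linear extension (plink.assoc.linear, since no --out prefix was given).
This file can be used for plotting;
for the plotting itself, go back to part one.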
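Here is what each column means:
- CHR: chromosome
- SNP: SNP name
- BP: physical position (base pair)
- A1: tested allele (minor allele by default)
- TEST: code for the test, i.e. which model term the row reports (ADD for the additive SNP effect; with --covar, each covariate gets its own row)
- NMISS: number of non-missing individuals included in the analysis
- BETA (or OR): regression coefficient (--linear) or odds ratio (--logistic); for --linear this is the beta value
- STAT: coefficient t-statistic (beta divided by its standard error; the larger its absolute value, the more significant)
- P: asymptotic p-value for the t-statistic, used to judge significance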
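When covariates are included, the file has one row per SNP per term (the ADD row plus one row per covariate), and SNPs that could not be tested have P = NA. Before plotting, it helps to keep only the ADD rows with a numeric P. A minimal sketch, assuming the default nine-column layout listed above (options that add columns, such as --ci, would shift P); `extract_add_rows` is a hypothetical helper, and the output header is chosen to match the column names R's qqman package looks for by default (CHR, BP, P, SNP).

```bash
#!/usr/bin/env bash
# extract_add_rows (hypothetical helper): keep the additive-SNP rows of a
# PLINK .assoc.linear/.assoc.logistic file and reorder them as SNP CHR BP P.
# Columns assumed: CHR SNP BP A1 TEST NMISS BETA(or OR) STAT P
extract_add_rows() {
  awk 'NR == 1 { print "SNP", "CHR", "BP", "P"; next }   # replace the header
       $5 == "ADD" && $9 != "NA" { print $2, $1, $3, $9 }' "$1"
}
```

Usage: `extract_add_rows plink.assoc.linear > for_plot.txt`. awk's default whitespace splitting copes with PLINK's space-padded columns.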
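That's the brief overview.
I also ran into some practical problems.
For example, some of my data came out extremely significant, with P = 0. Then -log10(P) is infinite, and since ylim cannot be infinite, the plot breaks later on. My plot also became inexplicably narrow, which is strange.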
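One possible workaround for P = 0 (an assumption, not the original post's fix) is to replace zeros with a small positive floor before plotting, so -log10(P) stays finite. A minimal sketch: it expects a header line followed by rows laid out as SNP CHR BP P; `cap_zero_p` is a hypothetical helper, and the default floor of 1e-300 is arbitrary. Choosing a floor just below the smallest nonzero P in your data keeps the y axis from stretching needlessly.

```bash
#!/usr/bin/env bash
# cap_zero_p (hypothetical helper): replace P values below a floor (including
# exact zeros) with the floor so -log10(P) is finite when plotting.
# Input:  header line, then rows "SNP CHR BP P" (P in column 4)
# $2:     floor, default 1e-300 (an arbitrary choice; adjust to your data)
cap_zero_p() {
  awk -v pmin="${2:-1e-300}" '
    BEGIN { pmin += 0 }               # force numeric comparison
    NR == 1 { print; next }           # pass the header through
    {
      # NA is left alone; 0 (or anything below the floor) becomes the floor
      if ($4 != "NA" && $4 + 0 < pmin) $4 = pmin
      print
    }' "$1"
}
```

Usage: `cap_zero_p for_plot.txt 1e-50 > for_plot_capped.txt`. Capped points all sit at the same height, so they are best mentioned in the figure legend.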