一唉侄、孟德爾隨機(jī)化(Mendelian Randomization
育勺, MR
)的原理
- 隨機(jī)對(duì)照實(shí)驗(yàn)(RCT)一般被認(rèn)為是明確因果關(guān)系的金標(biāo)準(zhǔn)蕾额。其要求研究對(duì)象隨機(jī)分組早芭,為減少偏倚,常采用單盲诅蝶、雙盲甚至三盲設(shè)計(jì)退个,然而,這卻需要大量的人力物力调炬,且因?yàn)閭惱韱栴}语盈,對(duì)某些因素的研究幾乎不可能。
- 單盲:實(shí)驗(yàn)對(duì)象不知道自己屬于哪一組缰泡。
- 雙盲:實(shí)驗(yàn)對(duì)象和研究者均不知道組別分配刀荒。
三盲:實(shí)驗(yàn)對(duì)象、研究者及數(shù)據(jù)分析人員都不知道分組情況。
-
孟德爾第二定律認(rèn)為缠借,遺傳物質(zhì)獨(dú)立分離资溃,自由組合。即父母的基因在傳遞給子代時(shí)是隨機(jī)分配的烈炭。因此,特定基因變異的分布是隨機(jī)的宝恶,不會(huì)受到環(huán)境因素或行為習(xí)慣的影響符隙。這種隨機(jī)分布類似于隨機(jī)對(duì)照試驗(yàn)中的隨機(jī)分組,使其成為一種“自然實(shí)驗(yàn)”垫毙。
雖然在個(gè)體層面霹疫,一個(gè)人的遺傳變異來自父母即他們的分配不完全隨機(jī),因?yàn)樽劢妫绻涓改笡]有攜帶某種特定的突變丽蝎,則該個(gè)體也不會(huì)攜帶這種突變。而如果擴(kuò)大到群體水平膀藐,遺傳變異在人群中的分布可以被視為隨機(jī)分布屠阻。
二、工具變量(Instrument Variant
, IV
)的選擇
- 獨(dú)立性假設(shè)额各,即IV不通過其他混雜因素與結(jié)局(
outcome
)相關(guān) - 關(guān)聯(lián)性假設(shè)国觉,即IV要與暴露因素(
exposure
)強(qiáng)相關(guān) - 排他性假設(shè),即IV不能直接與結(jié)局(
outcome
)存在相關(guān)性虾啦。
因果推斷:通過分析工具變量與結(jié)果變量之間的關(guān)聯(lián)來推斷暴露因素對(duì)結(jié)果變量的因果關(guān)系麻诀。常用的方法包括雙階段最小二乘法(2SLS)或加權(quán)中值法等。
三傲醉、什么情況下可以選擇做孟德爾隨機(jī)化 蝇闭?
- 不確定traits之間的因果關(guān)系,比如硬毕,到底是抑郁導(dǎo)致肺癌還是肺癌導(dǎo)致抑郁呻引?
- 研究的樣本量足夠大;
- 暴露和結(jié)局存在假設(shè)的因果關(guān)系昭殉。
四苞七、那么,什么情況不能做孟德爾隨機(jī)化呢挪丢?
u宸纭!乾蓬! 僅代表個(gè)人意見
- 潛在的基因-環(huán)境交互效應(yīng)顯著:trait的遺傳力過小惠啄,即
GWAS
顯著結(jié)果(p < 5e-8)可能代表不了這個(gè)trait時(shí)。
五、孟德爾隨機(jī)化的應(yīng)用
這里介紹兩種孟德爾隨機(jī)化應(yīng)用的工具(楊劍老師lab)撵渡,GSMR
和SMR
1融柬、GSMR(Generalised Summary-data-based Mendelian Randomisation)
性狀水平的MR
,確定兩個(gè)大性狀之間的因果關(guān)系趋距,如確定吸煙和肺癌之間的因果關(guān)系粒氧。
與其他的MR
方法利用一個(gè)IV
進(jìn)行分析不同,GSMR
利用與exposure
顯著相關(guān)的多個(gè)IV
進(jìn)行分析节腐,大大提升了MR
分析的power
和準(zhǔn)確性外盯。
If there are multiple independent (or nearly independent) SNPs associated with x and the effect of x on y is causal, then all the x-associated SNPs will have an effect on y through x. (Zhu et al. Nat Commun)
GSMR
已整合到GCTA
軟件中,操作十分便利
gcta --bfile ${REFGeno} \
--gsmr-file exposure.txt outcome.txt \
--gsmr-direction 2 \
--gwas-thresh 5e-08 \
--clump-r2 0.05 \
--gsmr2-beta \
--gsmr-snp-min 10 \
--heidi-thresh 0.01 \
--effect-plot \
--out meta_gsmr \
--thread-num 10
# --bfile plink格式的基因型文件翼雀,可用千人基因組參考文件
# --gsmr-file 需要兩個(gè)list.txt文件饱苟,前面為exposure,后面為outcome狼渊,格式為BMI BMI.gwas.ma
# --gsmr-direction 2 選項(xiàng)有 0 1 2箱熬,分別為正向狈邑,反向及雙向GSMR
# --gsmr-snp-min 10 只有工具變量達(dá)到規(guī)定數(shù)量才進(jìn)行GSMR
# --heidi-thresh 0.01 進(jìn)行herdi test城须,去除LD SNP
BMI.gwas.ma格式如下,header名稱無所謂
SNP A1 A2 freq b se p n
rs1001 A G 0.8493 0.0024 0.0055 0.6653 129850
rs1002 C G 0.03606 0.0034 0.0115 0.7659 129799
rs1003 A C 0.5128 0.045 0.038 0.2319 129830
......
最后官地,篩選有顯著因果關(guān)系的trait
#! /public/home/shilulu/anaconda3/envs/R4.2.0/bin/Rscript
library(data.table)
library(dplyr)
meta_gsmr <- fread("meta_gsmr.gsmr")
meta_gsmr[meta_gsmr == "nan"] <- NA
# p < 0.05 / row number / 2 for 2 dirction gsmr
# remove row contained "nan" (all)
final <- meta_gsmr %>%
filter(p < 0.05 / (2*nrow(meta_gsmr))) %>%
filter(complete.cases(.))
fwrite(final, file="meta_gsmr_noNA_anno.gsmr", sep="\t", na="nan", quote=FALSE)
2酿傍、SMR(Summary-data-based Mendelian Randomisation)
分子水平的MR
,確定與性狀有因果關(guān)系的分子標(biāo)記驱入,如(eQTL
, mQTL
, pQTL
, apaQTL
, vQTL
等)
SMR
通過將分子QTL作為exposure
赤炒,復(fù)雜性狀作為outcome
,來研究分子QTL與復(fù)雜性狀之間的關(guān)聯(lián)亏较,使用HEIDI
進(jìn)行結(jié)果檢驗(yàn)(即從causal variant中去除Linkage
的莺褒,但無法辨別Pleiotropy
的情況)。
一雪情、run SMR between QTL and traits
- running smr, QTL with gwas trait
smr --bfile ${bfile} \
--beqtl-summary ${eQTL} \
--gwas-summary ${gwas} \
--diff-freq-prop 0.5 \
--maf 0.01 \
--cis-wind 2000 \
--out whole_blood \
--thread-num 4 > eqtlsmr.log 2>&1
- filter result遵岩,i.e. exclude the probe that
p_smr > 0.05 / n test
和p_heidi < 0.01
library(dplyr)
args <- commandArgs(TRUE)
infile <- args[1]
outname <- args[2]
smr = read.table(infile, sep="\t", header=TRUE)
# retain p_SMR <= 0.05 / ntest and p_HEIDI >= 0.01
filter_out = filter(smr, p_SMR <= 0.05/nrow(smr) & p_HEIDI >= 0.01)
write.table(filter_out, file=paste0(outname, "_", sprintf("%.2e", 0.05/nrow(smr)), "_pass_smr_0.01_pass_heidi.txt"),
sep="\t", row.names=FALSE, quote=FALSE)
- generate plot file and plot the probe you interest
smr --bfile ${bfile} \
--gwas-summary ${gwas} \
--beqtl-summary ${eQTL} \
--out plot \
--plot \
--probe ENSG00000117601 \
--probe-wind 2000 Kb \
--gene-list ${Genelist} \
--thread-num 10
# --probe ENSG00000117601 通過p_smr 和 p_heidi的QTL probe
# --gene-list smr官網(wǎng)有提供
plot_SMR_image.r
SMR | Yang Lab
source("~/plot_smr/plot_SMR_image.r")
# Read the data file in R:
SMRData = ReadSMRData("ENSG00000117601.txt")
# Plot the SMR results in a genomic region centred around a probe:
pdf('ENSG00000117601.pdf',width = 10,height = 8)
SMRLocusPlot(data=SMRData, smr_thresh=7.98e-6, heidi_thresh=0.05, plotWindow=200, max_anno_probe=16)
dev.off()
# smr_thresh: genome-wide significance level for the SMR test.
# heidi_thresh: threshold for the HEIDI test. The default value is 0.05.
# cis_wind: size of a window centred around the probe to select cis-eQTLs for plot. The default value is 2000Kb.
# max_anno_probe: maximum number of probe names to be displayed on the figure. The default value is 16.
自用SMR
腳本,包括過濾巡通,分python
和R
版本
run_SMR_andfilter.py
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
'''
@File : run_SMR_andfilter.py
@Time : 2024/10/23 20:21:31
@Author : Lulu Shi
@Mails : crazzy_rabbit@163.com
@link : https://github.com/Crazzy-Rabbit
'''
import os
import argparse
import pandas as pd
def run_smr_qtl(gwas, qtl, outname):
SMR = "/public/home/lijunpeng/smr-1.3.1-linux-x86_64/smr-1.3.1"
bfile = "/public/home/lijunpeng/smr-1.3.1-linux-x86_64/g1000_eur/g1000_eur"
os.system(f'{SMR} --bfile {bfile} \
--beqtl-summary {qtl} \
--gwas-summary {gwas} \
--diff-freq-prop 0.5 \
--maf 0.01 \
--cis-wind 2000 \
--out {outname} \
--thread-num 10 > {outname}.log')
def run_filter(smr, outname):
smr_file = pa.read_csv(smr, sep="\t")
smr_tsh = "{:.2e}".format(0.05 / len(smr_file))
flt_file = smr_file[(smr_file['p_SMR'] <= 0.05 / len(smr_file)) & (smr_file['p_HEIDI'] >= 0.01)]
flt_file.to_csv(f"{outname}_{smr_tsh}smr_0.01heidi.txt", sep="\t", index=False, quoting=False)
def main():
parser = argparse.ArgumentParser(description="Run SMR and filter results.")
parser.add_argument("--func", type=str, default="both", help="Function to run: 'run_smr_qtl', 'run_filter', or 'both'")
parser.add_argument("--gwas", type=str, help="GWAS summary data to run smr")
parser.add_argument("--qtl", type=str, help="QTL file to run smr")
parser.add_argument("--smr", type=str, help="SMR file for filtering")
parser.add_argument("--out", type=str, help="Output prefix")
args = parser.parse_args()
gwas = args.gwas
qtl = args.qtl
out = args.out
func = args.func
smr = args.smr
if func == "run_smr_qtl":
if not gwas or not smr:
raise ValueError("When func is 'run_smr_qtl', --gwas and --qtl must be provided.")
run_smr_qtl(gwas, qtl, out)
elif func == "run_filter":
if not smr:
raise ValueError("When func is 'run_filter', --smr must be provided.")
run_filter(smr, out)
else:
if not gwas or not smr:
raise ValueError("When func is 'both' or not provided, --gwas and --qtl must be provided.")
run_smr_qtl(gwas, qtl, out)
run_filter(f"{out}.smr", out)
if __name__ == '__main__':
main()
run_SMRinR_filter.r
用法:run SMR and filter
Rscript run_SMRinR_filter.r --func run_smr_qtl --gwas ${gwas} --qtl ${mQTL} --out ${outdir}/${outprx}
run filter
Rscript run_SMRinR_filter.r --func run_filter --smr ${smr} --out ${outprx}
#***********************************************#
# File : run_SMRinR_filter.r #
# Time : 2024/10/23 20:50:51 #
# Author : Lulu Shi #
# Mails : crazzy_rabbit@163.com #
# link : https://github.com/Crazzy-Rabbit #
#***********************************************#
library(dplyr)
library(data.table)
# r function run smr qtl to trait
run_smr_qtl <- function(gwas, qtl, outname){
SMR="/public/home/lijunpeng/smr-1.3.1-linux-x86_64/smr-1.3.1"
bfile="/public/home/lijunpeng/smr-1.3.1-linux-x86_64/g1000_eur/g1000_eur"
system(paste(SMR, "--bfile", bfile,
"--beqtl-summary", qtl,
"--gwas-summary", gwas,
"--diff-freq-prop 0.5",
"--maf 0.01",
"--cis-wind 2000",
"--out", outname,
"--thread-num 10"), intern=TRUE) -> mylog
writeLines(mylog, paste0(outname, ".log"))
}
# p_SMR & p_HEIDI filter
run_filter <- function(smr, outname){
smr_file = fread(smr)
smr_tsh = sprintf("%.2e", 0.05/nrow(smr_file))
flt_file = smr_file %>% filter(p_SMR <= 0.05/n() & p_HEIDI >= 0.01)
write.table(flt_file, file=paste0(outname, "_", smr_tsh, "smr_0.01heidi.txt"), sep="\t", row.names=FALSE, quote=FALSE)
}
# args <- commandArgs(TRUE)
# gwas <- args[which(args == "--gwas") + 1]
# qtl <- args[which(args == "--qtl") + 1]
# out <- args[which(args == "--out") + 1]
# func <- ifelse("--func" %in% args, args[which(args == "--func") + 1], "all")
library(optparse)
option_list <- list(
make_option(c("--func", type="character"), default="both", help="Function to run: 'run_smr_qtl', 'run_filter', or 'both'"),
make_option(c("--gwas"), type="character", help="GWAS summary data to run smr"),
make_option(c("--qtl"), type="character", help="qtl file to run smr"),
make_option(c("--out"), type="character", help="out prefix"),
make_option(c("--smr"), type="character", help="SMR file for filtering")
)
opt_parser <- OptionParser(option_list=option_list)
opts <- parse_args(opt_parser)
gwas <- opts$gwas
qtl <- opts$qtl
out <- opts$out
func <- opts$func
smr <- opts$smr
if (func == "run_smr_qtl"){
if (is.null(gwas) || is.null(qtl)) {
stop("When func is 'run_smr_qtl', --gwas and --eqtl must be provided.")
}
run_smr_qtl(gwas, qtl, out)
} else if (func == "run_filter"){
if (is.null(smr)) {
stop("When func is 'run_filter', --smr must be provided")
}
run_filter(smr, out)
} else{
if (is.null(gwas) || is.null(qtl)) {
stop("When func is not provided, --gwas and --qtl must be provided.")
}
run_smr_qtl(gwas, qtl, out)
run_filter(paste0(out, ".smr"), out)
}
二尘执、run multi-omics SMR
即三步SMR分析,前兩步和上述1中的腳本一樣宴凉,只是換了QTL文件 \
1 SMR mQTL and trait
2 SMR eQTL and trait
3 SMR between mQTL and eQTL
First, we map the methylome to the transcriptome in cis-regions by testing the associations of DNAm with their neighbouring genes (within 2?Mb of each DNAm probe) using the top associated mQTL as the instrumental variable (Methods).
Next, we prioritize the trait-associated genes by testing the associations of transcripts with the phenotype using the top associated eQTL.
Last, we prioritize the trait-associated DNAm sites by testing associations of DNAm sites with the phenotype using the top associated mQTL.
If the association signals are significant in all three steps, then we predict with strong confidence that the identified DNAm sites and target genes are functionally relevant to the trait through the genetic regulation of gene expression at the DNAm sites. The SMR & HEIDI method assumes consistent LD between two samples. This assumption is generally satisfied by using data from samples of the same ancestry. (Wu et al. Nat Commun)
SMR between mQTL and eQTL 稍微有點(diǎn)不同
smr --bfile ${bfile} \
--beqtl-summary ${mQTL} \
--beqtl-summary ${bqtl} \
--extract-exposure-probe mqtl_probeID.txt \
--extract-outcome-probe eqtl_probeID.txt \
--diff-freq-prop 0.5 \
--thread-num 10 \
--out ${outdir}/m2eqtl > ${outdir}/m2eqtl.log 2>&1
當(dāng)然誊锭,經(jīng)過后來的優(yōu)化,這種三步式SMR
已被集成為了軟件OPERA (Wu et al. Cell Genom)
Generate multi omics plot file using OPERA
$opera --bfile $bfile \
--gwas-summary $gwas \
--beqtl-summary $eQTL \
--besd-flist mqtl_list \
--plot \
--probe ENSG00000072778 \
--probe-wind 2000 \
--gene-list $Genelist \
--out my_plot
plot the probe you interest
source("~/plot_smr/plot_OmicsSMR_xQTL.r")
SMRData = ReadomicSMRData("my_plot.ENSG00000072778.txt")
pdf("test_opera.pdf", height=12, width=10)
omicSMRLocusPlot(data=SMRData, esmr_thresh=3.19e-06, msmr_thresh=5.37e-07, m2esmr_thresh=5.75e-04,
window=300, anno_methyl=TRUE, highlight=NULL, annoSig_only=TRUE,
eprobeNEARBY="ENSG00000072778", mprobeNEARBY=c("cg03613822", "cg12805420", "cg19466160"),
epi_plot=TRUE, funcAnnoFile="~/plot_smr/funcAnno.RData")
dev.off()
# epi_plot=TRUE if you want plot epigenome
# anno_methyl=TRUE if you want the mQTL can annote in plot
where funcAnno.RData and plot_OmicsSMR_xQTL.r has been provided