LD(Linkage Disequilibrium)
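Linkage disequilibrium means that, at a pair of loci or within a genomic region, the observed frequency of a combination of two or more alleles differs significantly from the frequency expected under random association. Put simply, if alleles at two or more loci occur together at a frequency different from what independent, random combination in the population would produce, LD is present.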
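For the simplest case of two loci with two alleles each, this definition is commonly written as below. The symbols $p_A$, $p_B$ (allele frequencies) and $p_{AB}$ (frequency of the haplotype carrying both alleles) are notation introduced here for illustration and do not appear in the original post.

```latex
D = p_{AB} - p_A \, p_B ,
\qquad
r^2 = \frac{D^2}{p_A (1 - p_A) \, p_B (1 - p_B)}
```

As a worked example, with $p_A = 0.5$, $p_B = 0.5$ and $p_{AB} = 0.4$, we get $D = 0.4 - 0.25 = 0.15$ and $r^2 = 0.0225 / 0.0625 = 0.36$. Under random association $p_{AB}$ would equal $p_A p_B = 0.25$, giving $D = 0$ and $r^2 = 0$.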
LD usually arises from physical or genetic linkage between loci. Linkage means that loci on the same chromosome are closely associated genetically, so they tend to be transmitted to offspring together. When loci are linked, alleles at different loci combine non-randomly, which produces LD.
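LD matters for genetics and genomics research. It reveals genetic associations between loci and the structure of the genome, and it provides key information for gene correlation studies, gene mapping, and genome-wide association studies. LD is also related to population genetic evolution and to susceptibility to complex diseases, so it has some value for understanding and predicting related phenotypes and disease risk.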
# Download
git clone https://github.com/BGI-shenzhen/PopLDdecay.git
# Install
cd PopLDdecay
chmod 755 ./configure
./configure
make
Computing LD
~/soft/PopLDdecay/PopLDdecay -InVCF SE.vcf -OutStat ALL_SNP
A simple example
# 1) For a GATK VCF file, run PopLDdecay directly
./bin/PopLDdecay -InVCF SNP.vcf.gz -OutStat Lddecay.stat.gz
# 2) For PLINK [.ped .map] files, first convert PLINK to genotype format, then run PopLDdecay
perl bin/mis/plink2genotype.pl -inPED in.ped -inMAP in.map -outGenotype out.genotype ; ./bin/PopLDdecay -InGenotype out.genotype -OutStat LDdecay.stat.gz
# 3) To calculate LD decay for the subgroup GroupA in a VCF file, put the GroupA sample names into GroupA_sample.list
./bin/PopLDdecay -InVCF <in.vcf.gz> -OutStat <out.stat> -SubPop GroupA_sample.list
Multiple populations
This is a common situation in LD decay analysis. For example, suppose the VCF file contains 50 samples (wild1, wild2, wild3…wild25, cul1, cul2, cul3…cul25).
To compare the LD decay of the two groups (wild vs cultivation), first put the sample names of each group into its own list file; one name per line or all names on one row are both fine.
./bin/PopLDdecay -InVCF In.vcf.gz -OutStat wild.stat.gz -SubPop wildName.list
./bin/PopLDdecay -InVCF In.vcf.gz -OutStat cul.stat.gz -SubPop culName.list
# create muti.list manually
perl bin/Plot_MultiPop.pl -inList muti.list -output OutputPrefix
Note:
A. The <wildName.list> file can look as follows (one name per line or all names on one row are both fine):
wild1
wild2
…
wild25
B. The <muti.list> file has two columns: the file path of each population's result and the population flag, for example:
/ifshk7/BC_PS/Lddecay/wild.stat.gz wild
/ifshk7/BC_PS/Lddecay/cul.stat.gz cultivation
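The list files described in notes A and B can also be generated from the shell. The following is a minimal sketch, assuming the samples really are named wild1..wild25 and cul1..cul25 as in the example, and that both .stat.gz results sit in the current directory (the use of `$PWD` for the paths is an assumption, not something the original post specifies).

```shell
# Sample-name lists, one name per line (names follow the wild1..wild25 / cul1..cul25 example).
for i in $(seq 1 25); do echo "wild$i"; done > wildName.list
for i in $(seq 1 25); do echo "cul$i"; done > culName.list

# muti.list: <path to population result> <population flag>, one population per line.
# Assumption: both .stat.gz files are in the current directory.
printf '%s %s\n' "$PWD/wild.stat.gz" wild "$PWD/cul.stat.gz" cultivation > muti.list
cat muti.list
```

With real data the sample names should be taken from the VCF header instead of being generated with `seq`.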
Plotting
# 2.1 For one Population
perl bin/Plot_OnePop.pl -inFile LDdecay.stat.gz -output Fig
# 2.2 For one Population multi chr # List Format [chrResultPathWay]
perl bin/Plot_OnePop.pl -inList Chr.ResultPath.List -output Fig
# 2.3 For multi Populations # List Format :[Pop.ResultPath PopID]
perl bin/Plot_MultiPop.pl -inList Pop.ResultPath.list -output Fig
perl /public/home/fengting/soft/PopLDdecay/bin/Plot_MultiPop.pl -inList res -output Fig
R code:
LD <- read.table(gzfile("./OUT.bin.gz"),header = T,sep = "\t") # read in the data
pdf('1.pdf')
# the data frame read above is named LD, so plot from LD; distance is converted from bp to kb
plot(LD$Dist/1000, LD$Mean, type = 'b', col = 'blue', main = 'LD decay', xlab = 'Distance (kb)', ylab = 'Mean')
segments(0,0.3368,67,0.3368,lty=4,col="red")
segments(67,0,67,0.3368,lty=4,col="red")
text(67,0.3368,"(67,0.3368881)",pos=4,font=4,cex=1)
dev.off()
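The red guide lines above mark one point on the curve, (67, 0.3368881). The post does not say how that point was chosen. One common convention is the distance at which mean r^2 first falls to half of its maximum, and the sketch below shows how such a point could be located. The input table here is made-up demo data with two whitespace-separated columns (distance in bp, mean r^2); that layout is an assumption and may not match the real PopLDdecay output columns.

```shell
# Hypothetical demo table: distance (bp) and mean r^2. Values are invented for illustration.
cat > demo_ld.tsv <<'EOF'
#Dist	Mean_r2
1000	0.60
20000	0.45
67000	0.29
150000	0.20
EOF

# The file is read twice: pass 1 finds the maximum mean r^2,
# pass 2 prints the first distance whose mean r^2 is at or below half of that maximum.
half_decay=$(awk '!/^#/ {
  if (NR == FNR) { if ($2 > max) max = $2; next }
  if ($2 <= max / 2) { print $1; exit }
}' demo_ld.tsv demo_ld.tsv)

echo "half-decay distance (bp): $half_decay"
```

On the demo table the maximum is 0.60, so the threshold is 0.30 and the first row at or below it is the one at 67000 bp.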