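Research approach
The post "Early diagnostic markers for autism" briefly introduces the basic approach of this type of study.
Statistical analysis
The statistical methods in the original paper, "An Exploratory Examination of Neonatal Cytokines and Chemokines as Predictors of Autism Risk: The Early Markers for Autism Study", are described as follows: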
Partial least squares discriminant analysis (PLS-DA) was performed to examine whether different combinations of multiple cytokines could be used to differentiate between child developmental outcomes. Initially, linear regression analysis was performed on each transformed immune marker individually using the covariates stated above to generate residuals for use in the PLS-DA. Eotaxin-2, epithelial neutrophil-activating protein 78, granulocyte macrophage colony-stimulating factor, eotaxin-1, interferon-γ (IFN-γ), IL-4, monocyte chemoattractant protein 4 (MCP-4), and IL-13 all violated assumptions of linearity in the linear regression model and were therefore excluded from the PLS-DA. The PLS-DA was computed using the web-based MetaboAnalyst software in accordance with the protocol by Xia and Wishart (24). Analysis was performed using leave-one-out cross-validation and prediction accuracy performance measure for determining the number of latent variables. The permutation statistic was performed using prediction accuracy during training with 2000 permutations.
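In plain terms (based on a machine translation of the passage above): PLS-DA was used to test whether combinations of several cytokines can distinguish between child developmental outcomes. First, each transformed immune marker was regressed separately on the covariates, and the residuals were carried into the PLS-DA. Eight markers (eotaxin-2, epithelial neutrophil-activating protein 78, granulocyte macrophage colony-stimulating factor, eotaxin-1, interferon-γ (IFN-γ), IL-4, monocyte chemoattractant protein 4 (MCP-4), and IL-13) violated the linearity assumption of the regression model and were left out of the PLS-DA. The PLS-DA was run in the web-based MetaboAnalyst software, following the protocol by Xia and Wishart (24). The number of latent variables was chosen by leave-one-out cross-validation with prediction accuracy as the performance measure, and the permutation test used prediction accuracy during training with 2000 permutations.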
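The residual step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's actual code: it uses a single hypothetical covariate (the paper adjusts for several, which would need a multiple regression), and the marker names and values are made up.

```python
def residualize(values, covariate):
    """Residuals from a simple linear regression of `values` on one covariate."""
    n = len(values)
    x_mean = sum(covariate) / n
    y_mean = sum(values) / n
    # Ordinary least squares slope and intercept for one predictor.
    sxx = sum((x - x_mean) ** 2 for x in covariate)
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in zip(covariate, values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # What is left of each value after removing the fitted covariate effect.
    return [v - (intercept + slope * x) for x, v in zip(covariate, values)]


# Hypothetical toy data: one covariate and two transformed markers.
covariate = [38.0, 39.0, 40.0, 41.0]
markers = {
    "marker_a": [1.0, 1.5, 2.1, 2.4],
    "marker_b": [0.9, 0.7, 0.8, 0.6],
}

# Each marker is regressed on the covariate separately, as in the quoted text.
residuals = {name: residualize(vals, covariate) for name, vals in markers.items()}
```

The residuals, rather than the raw marker values, would then be the input columns for the PLS-DA, so that group differences are not driven by the covariate.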
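Partial least squares discriminant analysis (PLS-DA)
Partial least squares discriminant analysis (PLS-DA) is a multivariate statistical method used for discriminant analysis. Discriminant analysis is a common statistical approach that decides which class a study object belongs to, based on the values of several observed or measured variables. The idea is to train on the characteristics of samples from different groups (for example, observed samples and control samples), which yields a trained model, and then to check how reliable that model is.
Partial least squares regression (PLS regression) is related to principal component regression. Instead of finding hyperplanes of maximum variance between the response and the independent variables, it finds a linear regression model by projecting both the predictor variables and the observed (response) variables into a new space. Because both X and Y are projected into new spaces, the PLS family of methods is known as bilinear factor models. When Y is categorical, the method is called partial least squares discriminant analysis (PLS-DA).
My understanding: build a linear regression model to predict the class.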
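The "bilinear" structure can be written out in the usual textbook notation. The symbols below are not in the original text and are introduced only for illustration: T and U are the score matrices (the projections of X and Y), P and Q are the loading matrices, and E and F are residuals.

```latex
\begin{aligned}
X &= T P^{\top} + E \\
Y &= U Q^{\top} + F
\end{aligned}
```

The decomposition is chosen so that the covariance between the X scores T and the Y scores U is as large as possible; for PLS-DA, Y holds the class labels coded as numbers (for example 0 and 1).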
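That reading can be made concrete with a minimal sketch, assuming two classes coded as 0 and 1, a single latent variable, and a 0.5 cut-off on the predicted value. It is a dependency-free toy version and not what MetaboAnalyst or ropls does internally; the function names and data are hypothetical, and degenerate inputs (no covariance between X and y) are not handled.

```python
def fit_pls_da(X, y):
    """Fit a one-component PLS regression of 0/1 labels y on the rows of X."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the direction in X with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores: projection of each centered sample onto w.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centered labels on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, q


def predict_class(model, x):
    """Predict 0 or 1 by thresholding the regression output at 0.5."""
    x_mean, y_mean, w, q = model
    score = sum((xj - mj) * wj for xj, mj, wj in zip(x, x_mean, w))
    return 1 if y_mean + q * score > 0.5 else 0


def loo_accuracy(X, y):
    """Leave-one-out cross-validated prediction accuracy."""
    hits = 0
    for i in range(len(X)):
        model = fit_pls_da(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict_class(model, X[i]) == y[i]:
            hits += 1
    return hits / len(X)


# Hypothetical toy data: two well-separated groups, two variables.
X = [[1.0, 2.0], [1.5, 1.8], [0.8, 2.2], [1.2, 2.1],
     [3.0, 0.5], [3.2, 0.9], [2.8, 0.4], [3.1, 0.7]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

print(loo_accuracy(X, y))
```

On data this cleanly separated, every held-out sample should be classified correctly. With real data, the same leave-one-out accuracy would be compared across different numbers of latent variables, as the quoted methods describe.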
How to run PLS-DA in R
ropls: PCA, PLS(-DA) and OPLS(-DA) for multivariate analysis and feature selection of omics data