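This is part 5 of the clinical big-data research literature-sharing series. The article, written by Zhang Zhongheng of Zhejiang University as part of his clinical big-data column, was published in Annals of Translational Medicine and introduces a model-building strategy for logistic regression. It is shared here for learning and discussion only; copyright belongs to the original author.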
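Abstract
Logistic regression is one of the most commonly used models for handling confounding in the medical literature. This article shows how to carry out the purposeful-selection model-building strategy in R. The author focuses on using the likelihood ratio test to check whether deleting a variable has a significant effect on model fit. A deleted variable should also be checked to determine whether it provides an important adjustment for the remaining covariates. Interactions should be examined to clarify complex relationships among covariates and their synergistic effect on the response variable. The model's goodness of fit (GOF) should be checked as well, in other words, how well the fitted model reflects the real data. The Hosmer-Lemeshow GOF test is the most widely used test for logistic regression models.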
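Introduction
Logistic regression is one of the most widely used models in the medical literature for studying the independent effect of variables on a binary (binomial) outcome. However, many studies do not state their model-building strategy explicitly, which undermines the reliability and reproducibility of their results. Several model-building strategies have been reported in the literature, such as purposeful selection of variables, stepwise selection, and best subsets. Which of these is best has not been established; model building is "part science, part statistical methods, and part experience and common sense". The guiding principle is to select as few variables as possible while the resulting (parsimonious) model still reflects the true structure of the data. In this article, the author shows how to perform purposeful selection in R. Variable selection is the first step of model building; the other steps are covered in subsequent articles.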
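The likelihood ratio test at the heart of purposeful selection can be sketched in a few lines. The original article works in R; the block below is only an illustrative stand-in in plain Python, not the author's code. The toy data, the variable names (`x1`, `x2`, `y`) and the helper functions `solve` and `fit_logistic` are all made up for this example. It fits a full model and a reduced model with `x2` deleted, then compares them with the statistic G = 2 × (log-likelihood of the full model − log-likelihood of the reduced model), which is referred to a chi-square distribution with 1 degree of freedom because one variable was removed.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_logistic(X, y, iters=25):
    """Fit a logistic regression by Newton-Raphson.

    X is a list of rows that already include a leading 1.0 for the intercept.
    Returns (coefficients, maximised log-likelihood).
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for row, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, row))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    hess[j][k] += w * row[j] * row[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, row))
        mu = 1.0 / (1.0 + math.exp(-eta))
        loglik += yi * math.log(mu) + (1 - yi) * math.log(1.0 - mu)
    return beta, loglik

# Hypothetical toy data (not from the article): x1 is a continuous
# covariate, x2 is a binary covariate that is a candidate for deletion.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
x2 = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

X_full = [[1.0, a, b] for a, b in zip(x1, x2)]
X_reduced = [[1.0, a] for a in x1]  # same model with x2 deleted

_, ll_full = fit_logistic(X_full, y)
_, ll_reduced = fit_logistic(X_reduced, y)

# Likelihood ratio statistic; clamp tiny negative rounding error to zero.
G = max(2.0 * (ll_full - ll_reduced), 0.0)
# Upper tail of a chi-square with 1 df: P(X > G) = erfc(sqrt(G / 2)).
p_value = math.erfc(math.sqrt(G / 2.0))

print(f"G = {G:.4f}, p = {p_value:.4f}")
```

A large p-value suggests that deleting `x2` does not significantly worsen the fit. As the abstract notes, the deleted variable should still be checked for whether it meaningfully changes the coefficients of the remaining covariates before it is dropped for good.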
The original article is cited under References below.
References
Cite this article as: Zhang Z. Model building strategy for logistic regression: purposeful selection. Ann Transl Med 2016;4(6):111. doi: 10.21037/atm.2016.02.15