Methods of Dealing with Missing Data: Advantages, Disadvantages, Theoretical Approaches and Application of Software

Document Type : Original Article

Authors


آشفته، افشین (1392). بررسی روش­های برخورد با داده­های گمشده. مجله اندیشه آماری، 2، 40-47.
افشاری صفوی، علیرضا؛ کاظم­زاده قره‌چبق، حسین و رضایی، منصور (1394). مقایسه روش الگوریتم EM و روش­های متداول جانهی داده­های گمشده: مطالعه روی پرسشنامه خوددرمانی بیماران دیابتی، مجله تخصص اپیدمیولوژی ایران؛ 11 (3)، 43 – 51.
پورحسینقلی، محمدامین؛ علوی مجد، حمید؛ ابدی، علیرضا و پروانه­وار، سیمین (1384). تحلیل درست‌نمایی ماکسیمم مدل رگرسیون لجستیک در حالتی که داده‌های متغیرهای پیشگو کامل نیستند ولی متغیرهای کمکی وجود دارند، مجله اپیدمیولوژی ایران، 1 (2)،  65 – 72.
رشیدی‌نژاد، آسیه و نواب­پور، حمیدرضا (1389). مقایسه جانهی الگوریتم EM با دو روش جانهی میانگین و نمونه­های جدید در آمارگیری­های پانلی. مجله بررسی‌های آمار رسمی ایران، 21 (1)، 89 – 108.
زائری، فرید؛ اکبرزاده باغبان، علی­رضا؛ کاظم‌زاده، مژگان؛ یاسری، مهدی و عباسی، علی­محمد (1391). انواع گمشدگی در مطالعات طولی و روش­های مبنی بر درست‌نمایی برای تحلیل آنها. مجله علمی دانشگاه علوم پزشکی ایلام، 4، 208 -222.
قاسمی، وحید (1389). مدل­سازی معادله ساختاری در پژوهش­های اجتماعی با کاربرد Amos. تهران: انتشارات جامعه‌شناسان.
 
Allison, P. D. (2005). Imputation of categorical variables with PROC MI. [accessed July 30, 2006]. http://www2.sas.com/proceedings/sugi30/113-30.pdf.
Arbuckle, J. L., & Wothke, W. (1999). AMOS 4.0 user’s guide [Computer software manual]. Chicago: Smallwaters.
Bernaards, C. A.; Belin, T. R. & Schafer, J. L. (2007). Robustness of multivariate normal approximation for imputation of incomplete binary data. Statistics in Medicine, 26, 1368–1382.
BMDP Statistical Software. (1992). BMDP statistical software manual. Los Angeles: University of California Press.
Bryk, A. S.; Raudenbush, S. W., & Congdon, R. T. (1996). Hierarchical linear and nonlinear modeling with the HLM/2L and HLM/3L programs. Chicago: Scientific Software International.
De Leeuw, E. D.; Hox, J. J., & Huisman, M. (2003). Prevention and treatment of item nonresponse. Journal of Official Statistics, 19, 153–176.
Dempster, A. P.; Laird, N. M. & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39 (1), 1–22.
Donneau, A. F.; Mauer, M.; Molenberghs, G. & Albert, A. (2015). Communications in Statistics – Simulation and Computation: A Simulation Study Comparing Multiple Imputation Methods for Incomplete Longitudinal Ordinal Data. Communications in Statistics—Simulation and Computation, 44, 1311-1338.
Fleiss, J. L.; Levin, B. & Paik, M. C. (2002). Statistical Methods for Rates and Proportions, 3rd ed. John Wiley & Sons.
Gellman, A. & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, New York.
Glynn, R. J. & Laird, N. M. (1983). Regression Estimates and MissingData: Complete Case Analysis. Unpublished Manuscript, Department of Biostatistics, Harvard University.
Graham, J. W., & Hofer, S. M. (1991). EMCOV.EXE users 'guide [Computer software manual]. Unpublished manuscript,University of Southern California, Los Angeles.
Graham, J.; Hofer, S., & MacKinnon, D. (1996). Maximizing the usefulness of data obtained with planned missing value patterns: An application of maximum likelihood procedures. Multivariate Behavioral Research, 31, 197–218. Doi: 10.1207/ s15327906mbr3102_3.
Haitovsky, Y. (1968). Missing data in regression analysis. Journal of the Royal Statistical Society: Series B, Methodological, 30, 67–82.
Honaker, J.; King, G.; Blackwell, M.  (2006). Amelia software website. Accessed December 15, 2006]. http://gking.harvard.edu/amela.
Honaker, J. & King, G. (2006). What to do about missing values in time series cross-section data. [Accessed December 17, 2006]. http://gking.harvard.edu/files/abs/pr-abs.shtml.
Horton, N. J.; Lipsitz, S. R., & Parzen, M. (2003). A potential for bias when rounding in multiple imputation. TheAmerican Statistician, 57 (4), 229–232.
Horton, N. J., & Kleinman, K. P. (2007). Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models. The American Statistician, 61(1), 79–90.
Imai, K.; King, G. & Lau, O. (2006). Zelig software website. [Accessed December15, 2006].http://gking.harvard.edu/zelig.
Insightful (2001). S-PLUS (Version 6) [Computer software]. Seattle, WA: Insightful.
Jo¨reskog, K. G., & So¨rbom, D. (2001). LISREL (Version8.5) [Computer software]. Chicago: Scientific Software International.
Kim, J. (2004). Finite sample properties of multiple imputation estimators. Annals of Statistics, 32, 766–783. Doi: 10.1214/009053604000000175.
King, G; Honaker, J.; Joseph, A. & Scheve, K. (2001). Analyzing incomplete political science data: an alternative algorithm for multiple imputation. American Political Science Review, 95, 49–69.
Little, R. J. & Rubin, D. B. (1987). Statistical analysis with missing data. Wiley New York.
Littell, R. C.; Milliken, G. A.; Stroup, W. W., & Wolfinger, R. D. (1996). SAS system for mixed models. Cary, NC: SAS Institute.
Little, R. J. A. & Rubin, D. B. (2002). Statistical analysis with missing data. John Wiley & Sons; New York.
Marwala, T. (2009). Computational Intelligence for Missing Data Imputation, Estimation andManagement:Knowledge Optimization Techniques, South Africa: University of Witwatersrand IGI Global 2009 ISBN 978-1-60566-336-4.
McKnight, P.; McKnight, K.; Sidani, S., & Figueredo, A. (2007). Missing data: A gentle introduction. New York, NY: Guilford Press.
Multilevel Models Project (1996). Multilevel modeling applications—A guide for users of MLn. [Computer softwaremanual]. London: University of London, Institute ofEducation.
Muthe´n, L. K., & Muthe´n, B. O. (1998). Mplus user’sguide [Computer software manual]. Los Angeles: Muthe´n & Muthe´n.
Neale, M. C.; Boker, S. M.; Xie, G., & Maes, H. H. (1999). Mx: Statistical modeling (5th Ed.) [Computer software]. Richmond: Virginia Commonwealth University, Department of Psychiatry.
Nirelli, L. M.; Larsen, M. D.; Croghan, I. T.; Schroeder, D. R.; Offord, K. P. & Hurt, R. D. (2005) Comparison of methods for handling missing data in a collegiate survey of tobacco use Proceedings of the Survey Research Methods Section, American Statistical Association. Alexandria, VA: American Statistical Association.
Peng, C.; Harwell, M.; Liou, S., & Ehman, L. (2006). Advances in missing data methods and implications for educational research. In S. S. Sawilowsky (Ed.), Real data analysis (pp. 31–78). Charlotte, NC: New Information Age.
Peugh, J., & Enders, C. (2004). Missing data in educational research: A review of reporting practices and suggestions for improvement. Review of Educational Research, 74, 525–556. Doi: 10.3102/00346543074004525.
Rubin, D. B. (1987). Multiple Imputation for Nonresponsein Surveys. New York: John Wiley & Sons; 1987.
Robins, J. M., & Rotnitzky, A. (1992). Recovery of information and adjustment for dependent censoring using surrogate markers. Boston: Birkhauser.
Rubin, D. B. (1996). Multiple Imputation after 18+ Years (with discussion), J. A. Stat. Asso, 19, 473-489.
Salkind, N., & Rasmussen, K. (2007). Encyclopedia of measurement and statistics. Thousand Oaks, CA: Sage.
Stata. (2001). Stata user’s guide [Computer software manual]. College Station, TX: Author.
Schafer, J. L. (1997a). Analysis of incomplete multivariate data, Chapman & Hall, New York.
Schafer, J. L. (1997b). Introduction to multiple imputations for missing data problems, viewed 6 May 2002,<www.stat.psu.edu/~jls/asa97/slide7.html>.
Schafer, J. L. (1997). Analysis of Incomplete Multivariate Data. Book number 72 in the Chapman & Hall series Monographs on Statistics and Applied Probability. London.
Schimert, J.; Schafer, J. L.; Westerberg, T.; Fraley, C., & Clarkson, D. (2001). Analyzing missing values in SPLUS. Seattle, WA: Insightful.
Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. Journal of American Statistical Association 82, 528–550.
Templ, M. & Filzmoser, P. (2008). Visualization of missing values using the R-package VIM, Reserach report cs-2008-1, Department of Statistics and Probability Theory, Vienna University of Technology.
Templ, M; Kowarik, A. & Filzmoser, P. (2011). Iterative stepwise regression imputation using standard and robust methods, Computational Statistics & Data Analysis, 55, 2793-2806.
Van Buuren, S. (2012). Flexible Imputation of Missing Data. Chapman & Hall/CRC, Boca Raton, FL.
Von, Hippel P. (2004). Biases in SPSS 12.0 missing value analysis. The American Statistician, 58 (2), 160–164.
Wayman, J. C. (2003). Multiple imputation for missing data: What is it and how can I use it, in Annual Meeting of the American Educational Research Association, Chicago, IL, pp. 2- 16.
Yuan, Y. C. (2000). Multiple imputation for missing data: Concepts and new development. In Proceedings of the Twenty-Fifth Annual SAS Users Group International Conference (Paper No. 267). Cary, NC: SAS Institute.
Young, W.; Weckman, G., & Holland, W. (2011). A survey of methodologies for the treatment of missing values within datasets: Limitations and benefits. Theoretical Issues in Ergonomics Science, 12, 15 – 43.