
Missing data arise in nearly every data-analytic enterprise. Simple ad-hoc techniques for dealing with missing values, such as deleting incomplete cases or replacing missing values with the item mean, can cause a host of (hidden) problems. In this workshop, we will discuss principled methods for treating missing data and how to apply these methods in R. We will cover some basic missing data theory, methods for exploring and quantifying the extent of the missing data problem, and two principled methods for treating the missing data: multiple imputation and full information maximum likelihood. Participants will practice what they learn via practical exercises.
The workshop content will be presented via a combination of short lectures and live R analysis demonstrations. Workshop participants will practice what they learn on the fly by following along with the demonstration scripts and completing in-situ practical exercises. If the schedule permits, participants are also welcome to ask the instructor for advice on their own data analyses.
Participants should install both R and RStudio (the free desktop version) on their computers before the beginning of the course.
Lecturers
Kyle Lang
Target audience
Professionals who seek a master's-level introduction to missing data analysis.
Please note that there are no graded activities included in this course. Therefore, we are not able to provide students with a transcript of grades. You will obtain a certificate upon completion of this course.
Aim of the course
After completing this course, participants can:
- Describe the most important characteristics of a missing data problem and choose appropriate statistics, metrics, or visualizations to quantify/illustrate those characteristics.
- Describe the three missing data mechanisms and their effects on data analyses.
- Describe the fraction of missing information, how it is interpreted, and why it is important.
- Describe the strengths and weaknesses of traditional, ad-hoc missing data treatments.
- Describe multiple imputation (MI): what it is, why it works, and why it is superior to traditional, ad-hoc techniques.
- Describe the steps in an MI-based analysis.
- Describe full information maximum likelihood (FIML): what it is, why it works, and why it is superior to traditional, ad-hoc techniques.
- Compare and contrast the relative strengths and weaknesses of MI and FIML.
- Write basic R scripts to do the following:
  - Explore a missing data problem with appropriate statistics, metrics, and visualizations.
  - Conduct an MI-based analysis.
  - Conduct a FIML-based analysis.
Study load
Approximately 8 classroom hours
Costs
- The tuition fee for PhD students from the Faculty of Social and Behavioural Sciences at Utrecht University will be funded by the Graduate School of Social and Behavioural Sciences.
- The tuition fee for staff of the Faculty of Social and Behavioural Sciences at Utrecht University will be funded by FSBS.
Utrecht Summer School does not offer scholarships for this course.
Contact details
Irma Reyersen | E: ms.summerschool@uu.nl