Missing Data in R

Course code
S007
Course fee
€150
Course Level
Master

Missing data arise in nearly every data analytic enterprise. Simple ad-hoc techniques for dealing with missing values, such as deleting incomplete cases or replacing missing values with the item mean, can cause a host of (hidden) problems. In this workshop, we will discuss principled methods for treating missing data and how to apply these methods in R. We will cover some basic missing data theory, methods for exploring and quantifying the extent of the missing data problem, and two principled methods for treating the missing data: multiple imputation and full information maximum likelihood. Participants will practice what they learn via practical exercises.
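
As a rough illustration of one such hidden problem, the sketch below simulates data in base R and replaces missing values with the observed mean. The simulated variables, the 30% missingness rate, and all object names are assumptions made for this example only; they are not taken from the course materials.

```r
# Simulate two correlated variables (illustrative values, not course data)
set.seed(42)
n <- 1000
x <- rnorm(n)
y <- 0.5 * x + rnorm(n, sd = sqrt(0.75))

# Delete 30% of the y values completely at random
y_obs <- y
y_obs[sample(n, size = 300)] <- NA

# Ad-hoc treatment: replace every missing value with the observed mean
y_mean_imp <- ifelse(is.na(y_obs), mean(y_obs, na.rm = TRUE), y_obs)

# Compare the full data, the observed cases, and the mean-imputed data
results <- c(
  var_full     = var(y),
  var_observed = var(y_obs, na.rm = TRUE),
  var_mean_imp = var(y_mean_imp),
  cor_full     = cor(x, y),
  cor_mean_imp = cor(x, y_mean_imp)
)
print(round(results, 3))
```

Because every imputed value sits exactly at the mean, the variance of the mean-imputed variable is necessarily smaller than the variance of the observed cases, and its correlation with other variables tends to be pulled toward zero.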

The workshop content will be presented through a combination of short lectures and live R analysis demonstrations. Participants will practice what they learn on the fly by following along with the demonstration scripts and completing in-situ practical exercises. If the schedule permits, participants are also welcome to ask the instructor for advice on their own data analyses.

Participants should install both R and RStudio (the free desktop version) on their computers before the beginning of the course.

  • R can be downloaded from the Comprehensive R Archive Network (CRAN).
  • RStudio Desktop can be downloaded from the Posit (formerly RStudio) website.
  • We will not cover basic R usage. Participants should already know how to use R to read and write data, do basic data manipulations, run R functions, and work with the results returned by R functions.

Course director
Kyle Lang

Lecturers

Kyle Lang

Target audience

Professionals who seek a master-level introduction to missing data analysis

Please note that there are no graded activities included in this course. Therefore, we are not able to provide students with a transcript of grades. You will obtain a certificate upon completion of this course.

For an overview of all summer school courses offered by the Department of Methodology and Statistics, please click here.

Aim of the course

After completing this course, participants will be able to:

  1. Describe the most important characteristics of a missing data problem and choose appropriate statistics, metrics, or visualizations to quantify/illustrate those characteristics.
  2. Describe the three missing data mechanisms and their effects on data analyses.
  3. Describe the fraction of missing information, how it is interpreted, and why it is important.
  4. Describe the strengths and weaknesses of traditional, ad-hoc missing data treatments.
  5. Describe multiple imputation (MI): what it is, why it works, and why it is superior to traditional, ad-hoc techniques.
  6. Describe the steps in an MI-based analysis.
  7. Describe full information maximum likelihood (FIML): what it is, why it works, and why it is superior to traditional, ad-hoc techniques.
  8. Compare and contrast the relative strengths and weaknesses of MI and FIML.
  9. Write basic R scripts to do the following:
       • Explore a missing data problem with appropriate statistics, metrics, and visualizations.
       • Conduct an MI-based analysis.
       • Conduct a FIML-based analysis.
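
To give a flavour of the scripts mentioned in the last aim, here is a minimal sketch, not the course's own material. It assumes the built-in airquality data as a stand-in for a participant's data, and it assumes the mice package as one possible MI implementation (the listing does not name the packages used in the course). The regression model is purely illustrative, and the MI part is skipped if mice is not installed.

```r
# airquality ships with R (datasets package) and contains missing values
dat <- airquality

# 1. Explore: how much is missing, and where?
n_missing    <- colSums(is.na(dat))      # missing count per variable
prop_missing <- colMeans(is.na(dat))     # missing proportion per variable
n_complete   <- sum(complete.cases(dat)) # rows without any missing value

# Response patterns across variables: 1 = observed, 0 = missing
observed <- !is.na(dat)
patterns <- table(apply(observed, 1, function(row) {
  paste(as.integer(row), collapse = "")
}))

print(round(prop_missing, 3))
print(sort(patterns, decreasing = TRUE))

# 2. MI-based analysis: impute, analyze, pool (assumes mice is installed)
if (requireNamespace("mice", quietly = TRUE)) {
  library(mice)
  imp    <- mice(dat, m = 5, seed = 2023, printFlag = FALSE) # impute
  fits   <- with(imp, lm(Ozone ~ Wind + Temp))               # analyze each data set
  pooled <- pool(fits)                                       # pool the estimates
  print(summary(pooled))
}
```

The three calls in the second part mirror the usual impute, analyze, pool sequence of an MI-based analysis. A FIML-based analysis is not shown here.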

Study load

Approximately 8 classroom hours

Costs

Course fee: €150
Included: Course + course materials

Extra information about the fee

The tuition fee for PhD students from the Faculty of Social and Behavioural Sciences at Utrecht University will be funded by the Graduate School of Social and Behavioural Sciences.

Utrecht Summer School does not offer scholarships for this course.

Contact details

Irma Reyersen | E: ms.summerschool@uu.nl

Registration

Registration deadline: 18 January 2023