Course

Data Science: Solving Missing Data Problems in R

This 4-day course provides cutting-edge techniques for addressing missing data problems, focusing on the intersection of statistical theory and modern machine learning workflows.

€840

Specifications

Course Level
Master
ECTS credits
1.5 ECTS
Course location(s)
Utrecht, The Netherlands

Description

Missing data often disrupt real-world machine learning and AI applications. Through a mix of lectures, hands-on labs, and practical case studies, you will learn to implement flexible, scalable, and statistically sound solutions using R, with emphasis on real-world applicability.
Led by experts in the field, including developers of the widely used mice package, the course explores contemporary approaches for generating imputations, synthesizing data, integrating imputation into AI workflows, and diagnosing the impact of missing data in statistical models and predictive modeling pipelines.

Most researchers need to deal with incomplete data. Missing data complicate statistical analysis. Simply removing the incomplete records is not a good strategy and can bias the results. Multiple imputation is a general and statistically valid technique for analyzing incomplete data, and it has rapidly become the standard in social and behavioural science research.

This course will explain modern and flexible imputation and data synthesis techniques that are able to preserve salient data features. The course enhances participants’ knowledge of imputation principles and provides flexible hands-on solutions to incomplete data problems. The course discusses principles of missing data theory, outlines a step-by-step approach toward creating high-quality imputations, and provides guidelines on how to report the results. The course will use the authors’ mice package in R to illustrate practical solutions to real data problems. The concepts and methodology illustrated apply equally to other programming languages.

The course materials will follow the book “Flexible Imputation of Missing Data” by Stef van Buuren (2nd edition, Chapman & Hall, 2018) as well as a collection of papers and vignettes by the course team. The book can be read online for free at https://stefvanbuuren.name/fimd/.

Format of the course
We alternate short lectures with hands-on practical sessions and plenary discussion of the practicals. This ensures that we form an interactive group of participants that learns the theory and practice of multiple imputation in bite-size blocks. Each block builds on the previous one. We invite participants to share their own experiences and challenges during these blocks so that we can foster a collaborative learning environment.

Prerequisites
Participants should have a basic knowledge of scripting and programming in R. Participants who have limited experience with R need to have followed a relevant R course beforehand, such as:

  • Winter School Introduction to R (S002)
  • Summer School Data Science: Statistical Programming in R (S24)

or any similar level course elsewhere.

The theory and practice discussed in this course require that participants are familiar with basic statistical concepts and techniques, such as linear modeling, prediction, least squares estimation and hypothesis testing. Participants are requested to bring their own laptop for lab meetings.

Lecturers

Prof. dr. Stef van Buuren (Netherlands Organization for Applied Scientific Research (TNO) and Utrecht University)
 

Target audience

This course is ideal for:

  • Data scientists working on robust imputation strategies for real-world datasets.
  • Applied researchers seeking statistically sound solutions for incomplete data.
  • Machine learners tackling incomplete data in their models.

This course is relevant for anyone who would like to get acquainted with incomplete data theory and the practice of imputation and data synthesis. Participants should have a basic understanding of statistical techniques (such as analysis of variance and (non)linear regression) and the concept of statistical inference. This course is suitable for students at Master level, Advanced Master level and PhD level. A maximum of 50 participants will be admitted to this course. Please note that places are allocated on a first-come, first-served basis.

For an overview of all our summer school courses offered by the Department of Methodology and Statistics please click here

We also offer tailor-made M&S courses and in-house M&S training. If you want to look at the possibilities, please contact Dr. Laurence Frank at pe.dsai@uu.nl. 

Aim of the course

  • To enhance participants’ knowledge of imputation methodology;
  • To get comfortable with flexible solutions to deal with incomplete data using R.
     

Learning goals:

  • Participants will learn to make informed decisions on how to handle incomplete data in a scientifically valid way
  • Participants will be able to implement the approach taken using state-of-the-art R technology

Study load

The course runs for four full days, typically starting at 9:00 and ending at 17:00. Each day includes breaks for coffee and tea, lunch, as well as drinks and snacks.

You will receive a certificate upon course completion. Please be aware that this course does not include graded activities, and therefore we cannot provide a transcript of grades.

Costs

  • Course fee: €840.00
  • Included: Course + course materials + lunch
  • Housing fee: €275
  • Housing provider: Utrecht Summer School

This course has the following fee options, depending on your status:

  • Participants affiliated with an academic organization (MSc, PhD, researchers):  € 840
  • Participants working in a non-academic organization:  € 1000

Please make sure to include which price is applicable when registering for this course. This information can be added in the “Comment” field during the registration process.

For PhD students from the FSBS at UU:
As a PhD student from the Faculty of Social and Behavioural Sciences (FSBS) at Utrecht University, you can attend up to three Winter or Summer School courses funded by the Graduate School of Social and Behavioural Sciences. Of course, you may choose to take as many other courses as you wish at your own expense, using your personal budget.
When registering, please indicate in the “Comment” field that you are a PhD candidate from the FSBS at UU, so that the course fee can be waived.

Additional information

The housing costs do not include a Utrecht Summer School sleeping bag. This is a separate product on the invoice. If you wish to bring your own bedding, please deselect or remove the sleeping bag from your order. 

Application

Please include a short description of your (scientific) background.
