Description
Missing data often disrupt real-world machine learning and AI applications. Through a mix of lectures, hands-on labs, and practical case studies, you will learn to implement flexible, scalable, and statistically sound solutions in R, with an emphasis on real-world applicability.
Led by experts in the field, including developers of the widely used mice package, the course explores contemporary approaches for generating imputations, synthesizing data, integrating imputation into AI workflows, and diagnosing the impact of missing data in statistical models and predictive modeling pipelines.
Most researchers need to deal with incomplete data. Missing data complicate statistical analysis. Simply removing the missing data is not a good strategy and can bias the results. Multiple imputation is a general and statistically valid technique for analyzing incomplete data, and it has rapidly become the standard in social and behavioural science research.
This course explains modern and flexible imputation and data synthesis techniques that preserve salient data features. It enhances participants’ knowledge of imputation principles and provides flexible hands-on solutions to incomplete data problems. The course discusses principles of missing data theory, outlines a step-by-step approach toward creating high-quality imputations, and provides guidelines on how to report the results. The course uses the authors’ mice package in R to illustrate practical solutions to real data problems. The concepts and methodology are equally applicable in other programming languages.
The course materials follow the book “Flexible Imputation of Missing Data” by Stef van Buuren (2nd edition, Chapman & Hall, 2018), as well as a collection of papers and vignettes by the course team. The book can be read online for free at https://stefvanbuuren.name/fimd/.
Format of the course
We alternate short lectures with hands-on practical sessions and plenary discussions of the practicals. This way, participants form an interactive group that learns the theory and practice of multiple imputation in bite-size blocks, each building on the previous one. We invite participants to share their own experiences and challenges during these blocks to foster a collaborative learning environment.
Prerequisites
Participants should have a basic knowledge of scripting and programming in R. Participants with limited experience in R should first complete a relevant R course, such as
or any similar-level course elsewhere.
The theory and practice discussed in this course require that participants are familiar with basic statistical concepts and techniques, such as linear modeling, prediction, least squares estimation and hypothesis testing. Participants are requested to bring their own laptop for the lab sessions.
Data Science specialisation
This course can be taken separately, but is also part of a series of 8 courses in the Summer School Data Science specialisation taught by UU’s department of Methodology & Statistics:
- Data Science: Advanced Techniques for Handling Missing Data in Analysis and Prediction Workflows (this course)
- Data Science: Programming with Python (Course code S17, 7-11 July 2025)
- Data Science: Network Science (Course code S37, 7-11 July 2025)
- Data Science: Statistical Programming with R (Course code S24, 14-18 July 2025)
- Data Science: Applied Text Mining (Course code S42, 14-18 July 2025)
- Data Science: Machine Learning with Python (Course code S70, 21-25 July 2025)
- Data Science: Data Analysis (Course code S31, 2026)
- Data Science: Text Mining with R (Course code S41, 2026)
Students who complete 3 of the 8 courses in the Summer School Data Science specialisation within 5 years (including no more than one text mining course) can obtain a certificate.
Please see here for more information about the full specialisation.
Lecturers
Prof. dr. Stef van Buuren (Netherlands Organization for Applied Scientific Research (TNO) and Utrecht University)
dr. Gerko Vink (Utrecht University)
Target audience
This course is ideal for:
- Data scientists working on robust imputation strategies for real-world datasets.
- Applied researchers seeking statistically sound solutions for incomplete data.
- Machine learning practitioners tackling incomplete data in their models.
This course is relevant for anyone who would like to get acquainted with incomplete data theory and the practice of imputation and data synthesis. Participants should have a basic understanding of statistical techniques (such as analysis of variance and (non)linear regression) and the concept of statistical inference. This course is suitable for students at Master, Advanced Master and PhD level. A maximum of 50 participants will be admitted. Please note that places are allocated on a first-come, first-served basis.
For an overview of all summer school courses offered by the Department of Methodology and Statistics, please click here.
Aim of the course
- To enhance participants’ knowledge of imputation methodology;
- To become comfortable applying flexible solutions for incomplete data in R.
Learning goals:
- Participants will learn to make informed decisions on how to handle incomplete data in a scientifically valid way.
- Participants will be able to implement their chosen approach using state-of-the-art R technology.
Study load
The course runs for five full days, typically from 9:00 to 17:00. Each day includes breaks for coffee and tea, lunch, and drinks and snacks.
You will receive a certificate upon course completion. Please be aware that this course does not include graded activities, and therefore we cannot provide a transcript of grades.
Costs
- Course fee: €730.00
- Included: course + course materials
PhD students from the Faculty of Social and Behavioural Sciences at Utrecht University have the opportunity to attend three Winter/Summer School courses funded by the Graduate School of Social and Behavioural Sciences. Additionally, they may choose to take as many courses as they wish at their own expense from their personal budget.
There are no scholarships available for this course.
We also offer tailor-made M&S courses and in-house M&S training. If you would like to explore the possibilities, please contact us at ms.summerschool@uu.nl.
Application
Please include a short description of your (scientific) background.