Data Science: Solving Missing Data Problems in R

€840

Specifications

20 Jul. - 23 Jul. 2026

Master

1.5 ECTS

Utrecht, The Netherlands

Description

This course has been cancelled and will not take place as scheduled

Missing data often disrupt real-world machine learning and AI applications. Through a mix of lectures, hands-on labs, and practical case studies, you will learn to implement flexible, scalable, and statistically sound solutions using R, with emphasis on real-world applicability.
Led by experts in the field, including developers of the widely used mice package, the course explores contemporary approaches for generating imputations, synthesizing data, integrating imputation into AI workflows, and diagnosing the impact of missing data in statistical models and predictive modeling pipelines.

Most researchers need to deal with incomplete data. Missing data complicate the statistical analysis of data. Simply removing the missing data is not a good strategy and can bias the results. Multiple imputation is a general and statistically valid technique to analyze incomplete data. Multiple imputation has rapidly become the standard in social and behavioural science research.

This course will explain modern and flexible imputation and data synthesis techniques that are able to preserve salient data features. The course enhances participants’ knowledge of imputation principles and provides flexible hands-on solutions to incomplete data problems. The course discusses principles of missing data theory, outlines a step-by-step approach toward creating high-quality imputations, and provides guidelines on how to report the results. The course will use the authors’ mice package in R to illustrate practical solutions to real data problems. The concepts and applications of the illustrated methodology are equally applicable to other programming languages.

The course materials will follow the book “Flexible Imputation of Missing Data” by Stef van Buuren (2nd edition, Chapman & Hall, 2018) as well as a collection of papers and vignettes by the course team. The book can be read online for free at https://stefvanbuuren.name/fimd/.

Format of the course
We alternate short lectures with hands-on practical sessions and plenary discussion of the practicals. This ensures that we form an interactive group of participants that learns the theory and practice of multiple imputation in bite-size blocks. Each block builds up to the next one. We invite participants to share their own experience and challenges during these blocks so that we can foster a collaborative learning environment.

Prerequisites
Participants should have a basic knowledge of scripting and programming in R. Participants who have limited experience with R need to have followed a relevant R course beforehand, such as one of the courses listed under Related courses, or any similar-level course elsewhere.

The theory and practice discussed in this course require that participants are familiar with basic statistical concepts and techniques, such as linear modeling, prediction, least squares estimation and hypothesis testing. Participants are requested to bring their own laptop for lab meetings.

Data Science specialisation
This course can be taken separately, but is also part of a series of seven courses in the Summer School Data Science specialisation taught by UU’s department of Methodology & Statistics:

Data Science: Programming with Python (Course code S17, 06-10 July 2026)
Data Science: Statistical Programming with R (Course code S24, 06-10 July 2026)
Data Science: Network Science (Course code S37, 13-17 July 2026)
Data Science: Applied Text Mining and Natural Language Processing (Course code S42, 13-17 July 2026)
Data Science: Introduction to Machine Learning and Data Analysis in R (Course code S31, 20-24 July 2026)
Data Science: Solving Missing Data Problems in R (This course)
Data Science: Machine Learning with Python (Course code S70, 2027)

Upon completing three of the seven courses in the Summer School Data Science specialisation within five years, participants can obtain a certificate. Please click here for more information about the full specialisation.

Day to Day Documents

S28 Day-to-day 2026.pdf

Lecturers

Prof. dr. Stef van Buuren (Netherlands Organization for Applied Scientific Research (TNO) and Utrecht University)

Target audience

This course is ideal for:

Data scientists working on robust imputation strategies for real-world datasets.
Applied researchers seeking statistically sound solutions for incomplete data.
Machine learners tackling incomplete data in their models.

This course is relevant for anyone who would like to get acquainted with incomplete data theory and the practice of imputation and data synthesis. Participants should have a basic understanding of statistical techniques (such as analysis of variance and (non)linear regression) and the concept of statistical inference. This course is suitable for students at Master level, Advanced Master level and PhD level. A maximum of 50 participants will be admitted to this course. Please note that places are allocated on a first-come, first-served basis.

For an overview of all summer school courses offered by the Department of Methodology and Statistics, please click here.

We also offer tailor-made M&S courses and in-house M&S training. If you want to look at the possibilities, please contact Dr. Laurence Frank at pe.dsai@uu.nl.

Aim of the course

To enhance participants’ knowledge of imputation methodology;
To get comfortable with flexible solutions to deal with incomplete data using R.

Learning goals:

Participants will learn to make informed decisions on how to handle incomplete data in a scientifically valid way
Participants will be able to implement the approach taken using state-of-the-art R technology

Study load

The course runs for four full days, typically starting at 9:00 and ending at 17:00. Each day includes breaks for coffee and tea, lunch, as well as drinks and snacks.

You will receive a certificate upon course completion. Please be aware that this course does not include graded activities, and therefore we cannot provide a transcript of grades.

Costs

Course fee: €840.00
Included: Course + course materials + lunch
Housing fee: €275
Housing provider: Utrecht Summer School

This course has the following fee options, depending on your status:

Participants affiliated with an academic organization (MSc, PhD, researchers): €840
Participants working in a non-academic organization: €1000

Please make sure to include which price is applicable when registering for this course. This information can be added in the “Comment” field during the registration process.

For PhD students from the FSBS at UU:
As a PhD student from the Faculty of Social and Behavioural Sciences (FSBS) at Utrecht University, you can attend up to three Winter or Summer School courses funded by the Graduate School of Social and Behavioural Sciences. Of course, you may choose to take as many other courses as you wish at your own expense, using your personal budget.
When registering, please indicate in the “Comment” field that you are a PhD candidate from the FSBS at UU, so that the course fee can be waived.

Additional information

The housing costs do not include a Utrecht Summer School sleeping bag. This is a separate product on the invoice. If you wish to bring your own bedding, please deselect or remove the sleeping bag from your order.

For more detailed information on student housing in Utrecht please click here.

Application

Please include a short description of your (scientific) background.

Related courses

Introduction to R

Organising institution
Utrecht University

Faculty
Faculty of Social and Behavioural Sciences

19 Jan. - 19 Jan. 2026

Course Level
Master

ECTS credits
0.5 ECTS

Closed

Data Science: Statistical Programming with R

Organising institution
Utrecht University

Faculty
Faculty of Social and Behavioural Sciences

6 Jul. - 10 Jul. 2026

Course Level
Bachelor

ECTS credits
1.5 ECTS

Closed