Data Science: Text Mining with R

In this course, students will learn how to apply text mining methods to text data and analyse the results in a pipeline with statistical learning algorithms.



Course Level
ECTS credits
1.5 ECTS
Course location(s)
Utrecht, The Netherlands


The course has a strong practical, hands-on focus, and students will gain experience in applying text mining to example data from the humanities, social sciences, and healthcare, and in interpreting the results.

From the social sciences to the humanities and healthcare, a large share of today's data is stored as text. However, text is unstructured information, which is difficult to process automatically. Text mining can be applied to create a more structured representation of a text, making its content more accessible to researchers. This course offers a thorough introduction to text mining with R. The course has a strong practical, hands-on focus, and students will gain experience in applying text mining to real data from, for example, the social sciences and healthcare, and in interpreting the results. Through lectures and practicals, students will learn the skills needed to design, implement, and understand their own text mining pipeline. The topics in this course include regular expressions, text preprocessing, text classification and clustering, and word embedding approaches for text data.

The course deals with:

  • Understanding and explaining the fundamental approaches to text mining
  • Understanding and applying current methods for analysing texts
  • Understanding how text is handled, manipulated, preprocessed and cleaned
  • Defining a text mining pipeline given a practical data science problem
  • Implementing generic text mining tools such as regular expressions, text clustering, text classification, sentiment analysis, and word embeddings

The course starts at a very basic level and builds up gradually. By the end of the course, participants will be able to apply core text mining skills in R.

Participants should have a basic knowledge of data science and scripting in R.
Good preparation for this course would be the courses Data Science: Statistical Programming with R and Data Science: Data Analysis.

A good follow-up is our summer course Data Science: Applied Text Mining (course code S42).

Participants are requested to bring their own laptop computer. Software will be available online.

This course can be taken separately, but is also part of a series of 8 courses in the Summer School Data Science specialisation taught by UU’s department of Methodology & Statistics:

  1. Data Science: Programming with Python (Course code S17, 8-12 July 2024)
  2. Data Science: Statistical Programming with R (Course code S24, 8-12 July 2024)
  3. Data Science: Multiple Imputation in Practice (Course code S28, 8-11 July 2024)
  4. Data Science: Data Analysis (Course code S31, 15-19 July 2024)
  5. Data Science: Network Science (Course code S37, 15-19 July 2024)
  6. Data Science: Applied Text Mining (Course code S42, 15-19 July 2024)
  7. Data Science: Machine Learning with Python (Course code S70, 22-26 July 2024)
  8. Data Science: Text Mining with R (this course)

Students who complete 3 of the 8 courses in the Summer School Data Science specialisation within 5 years (counting no more than one text mining course) can obtain a certificate.

Please see here for more information about the full specialisation.

Day to Day Documents

S41 Day to day program 2024_0.pdf

Target audience

This course is for R users who are interested in practical natural language processing and statistical learning on text data. Participants should have a basic knowledge of scripting and programming in R. Participants from a variety of fields, including sociology, psychology, education, human development, marketing, business, biology, medicine, political science, and communication sciences, will benefit from the course.

For an overview of all summer school courses offered by the Department of Methodology and Statistics, please click here.

Aim of the course

The course teaches students the skills needed to understand how basic text mining techniques work, and how to use R for a variety of text analysis tasks across many scientific domains. The skills addressed in this course are:

  • Text mining definitions
  • R environment
  • Regular expressions
  • Preprocessing text data
  • Stemming
  • Text visualisation
  • Text classification
  • K-fold cross-validation
  • Sentiment analysis
  • Text clustering
  • Word embedding
  • Deep learning
  • Keras

Study load

Four full days. A typical course day starts at 09.00 hours and ends at 17.00 hours. There will be breaks for coffee, lunch and tea.

You will receive a certificate upon course completion. Please be aware that this course does not include graded activities, and therefore we cannot provide a transcript of grades.


  • Course fee: €730.00
  • Included: Course + course materials + lunch
  • Housing fee: €200
  • Housing provider: Utrecht Summer School

PhD students from the Faculty of Social and Behavioural Sciences at Utrecht University have the opportunity to attend three Winter/Summer School courses funded by the Graduate School of Social and Behavioural Sciences. Additionally, they may take as many further courses as they wish, paid for from their personal budget.

UU employees of the Faculty of Social and Behavioural Sciences can take this course free of charge. Please complete the form as usual; you will not receive an invoice for this course.

There are no scholarships available for this course.

We also offer tailor-made M&S courses and in-house M&S training. If you would like to explore the possibilities, please contact us at ms.summerschool@uu.nl.

Additional information

The housing costs do not include a Utrecht Summer School sleeping bag. This is a separate product on the invoice. If you wish to bring your own bedding, please deselect or remove the sleeping bag from your order. 


Please include a short description of your (scientific) background, your programming experience, and what you expect or would like to learn from this course.


Related courses