Applications of text mining are everywhere: social media, web search, advertising, emails, customer service, healthcare, marketing, and more. In this course, students will learn how to apply text mining methods to text data and analyse the data in a pipeline with statistical learning algorithms. The course has a strongly practical, hands-on focus, and students will gain experience in using and interpreting text mining on example data from the humanities, social sciences, and healthcare.
Nowadays, from the social sciences to the humanities and healthcare, a major portion of data is contained in text. However, text is unstructured information, which is difficult to process automatically. Text mining can be applied to create a more structured representation of a text, making its content more accessible to researchers. This course therefore offers a thorough introduction to text mining with R. The course has a strongly practical, hands-on focus, and students will gain experience in using text mining on real data from, for example, the social sciences and healthcare domains, and in interpreting the results. Through lectures and practicals, students will learn the skills needed to design, implement, and understand their own text mining pipeline. The topics in this course include regular expressions, text preprocessing, text classification and clustering, and word embedding approaches for text data.
The course deals with the following topics:
The course starts at a very basic level and builds up gradually. At the end of the course, participants will master text mining skills with R. Participants should have a basic knowledge of scripting in R.
This course is part of a series of 5 courses in the Summer School Data Science specialisation taught by UU’s department of Methodology & Statistics. Please see here for more information about the full specialisation. This course can also be taken separately.
Summer School Data Science specialisation:
Upon completing 3 out of 5 courses in the specialisation (no more than one text mining course), students can obtain a certificate. Each course may also be taken separately.
Please note that we may have to change the course depending on COVID-19-related developments. The exact details, including a day-to-day programme, will be communicated 6 weeks prior to the start of the course.
This course is for R users who are interested in practical natural language processing and statistical learning on text data. Participants should have a basic knowledge of scripting and programming in R. Participants from a variety of fields, including sociology, psychology, education, human development, marketing, business, biology, medicine, political science, and communication sciences, will benefit from the course.
A maximum of 80 participants will be admitted to this course. Please note that places are allocated on a first-come, first-served basis.
The course teaches students the skills needed to understand how basic text mining techniques work, and how to use R for a variety of text analyses across many domains of science. The skills addressed in this course are:
For an overview of all our summer school courses offered by the Department of Methodology and Statistics please click here.
Three full days. A typical course day starts at 9.00 and ends at 17.00 with breaks for coffee, lunch and tea.
Please note that there are no graded activities included in this course. Therefore, we are not able to provide students with a transcript of grades. You will obtain a certificate upon completion of this course.
Housing through: Utrecht Summer School.
You can choose between two options for participating in this course, but please note that we may have to change the course depending on COVID-19-related developments:
If you are interested in the campus option, let us know via a message in the application form under ‘Student Comment’.
The on-campus course costs €510; if you participate via the livestream, you receive a €60 discount. Note that if you choose the campus option, you will first be asked to pay the livestream fee (€450). Once we have permission from the university to organise classes on location, we will send a second invoice for the remainder of the fee. This way, you are guaranteed at least a spot in the livestream.
The tuition fee for PhD students of the Faculty of Social and Behavioural Sciences at Utrecht University is funded by the Graduate School of Social and Behavioural Sciences.
To spare the environment, we provide course material in digital form only. It is made available a few days before the start of the course.
Irma Reyersen | E: MS.email@example.com