Data Science: Applied Text Mining

€850

Specifications

Dates: 15 Jul. - 19 Jul. 2024
Course Level: Advanced Master
ECTS credits: 1.5 ECTS
Location: Utrecht, The Netherlands

Description

In this course, students will learn how to apply text mining methods to text data and analyse them in a pipeline with machine learning and deep learning algorithms. The course has a strongly practical, hands-on focus: students will gain experience in applying text mining to real data from the social sciences, humanities, and healthcare, and in interpreting the results.

Given the rapid rate at which text data are being gathered digitally in many scientific domains, there is a growing need for automated tools that can analyze, classify, and interpret these data. Text mining techniques can be applied to create a structured representation of text, making its content more accessible to researchers. Applications of text mining are everywhere: social media, web search, advertising, email, customer service, healthcare, marketing, and more. This course offers an extensive exploration of text mining with Python. Through lectures and practicals, students will learn the skills needed to design, implement, and understand their own text mining pipeline. Topics include text preprocessing, text classification, topic modeling, word embeddings, deep learning models, and responsible text mining.

The course deals with:

Reviewing the fundamental approaches to text mining;
Understanding and applying current methods for analyzing texts;
Defining a text mining pipeline given a practical data science problem;
Implementing all steps in a text mining pipeline: feature extraction, feature selection, model learning, model evaluation;
Understanding and applying state-of-the-art methods in text mining;
Implementing word embedding and advanced deep learning techniques.

The course starts by reviewing the basic concepts of text mining and then moves on to implementing advanced concepts in natural language processing. By the end of the week, participants will have acquired advanced text mining skills in Python.

Participants should have basic knowledge of, and motivation for, scripting and programming in Python.
Our summer courses Data Science: Text Mining with R and Data Science: Programming with Python are good preparation for this course.

Participants are requested to bring their own laptop computer. Software will be available online.

This course can be taken separately, but it is also part of a series of 8 courses in the Summer School Data Science specialisation taught by Utrecht University's (UU) Department of Methodology & Statistics:

Data Science: Programming with Python (Course code S17, 8-12 July 2024)
Data Science: Statistical Programming with R (Course code S24, 8-12 July 2024)
Data Science: Multiple Imputation in Practice (Course code S28, 8-11 July 2024)
Data Science: Data Analysis (Course code S31, 15-19 July 2024)
Data Science: Network Science (Course code S37, 15-19 July 2024)
Data Science: Applied Text Mining (this course; Course code S42, 15-19 July 2024)
Data Science: Machine Learning with Python (Course code S70, 22-26 July 2024)
Data Science: Text Mining with R (Course code S41, 19-22 August 2024)

Students who complete 3 of the 8 courses in the Summer School Data Science specialisation within 5 years (including no more than one text mining course) can obtain a certificate.

Day to Day Documents

S42 Day to day program 2024.pdf

Target audience

This course works best for learners who are comfortable programming in Python, who want to acquire skills in text mining approaches, and who have a basic knowledge of machine learning.

Participants from a variety of fields, including sociology, psychology, education, human development, marketing, business, biology, medicine, political science, and communication sciences, will benefit from the course. A maximum of 80 participants will be admitted to this course, and places are allocated on a first-come, first-served basis.


Aim of the course

The course teaches basic and advanced text mining techniques in Python, applied to a variety of problems from many scientific domains. The skills addressed in this course are:

Python environment;
Preprocessing text and feature extraction;
NLTK, Gensim, spaCy;
Text classification;
Sentiment analysis;
Text clustering;
Topic modeling;
Word embedding;
CBOW vs Skip-gram;
Convolutional neural networks;
Recurrent neural networks;
Attention models;
Responsible text mining;
Transformers and large language models.

Study load

Five full days. A typical course day starts at 9.00 hours and ends at 17.00 hours, with breaks for coffee, lunch and tea.

You will receive a certificate upon course completion. Please be aware that this course does not include graded activities, and therefore we cannot provide a transcript of grades.

Costs

Course fee: €850.00
Included: Course + course materials + lunch
Housing fee: €200
Housing provider: Utrecht Summer School

PhD students from the Faculty of Social and Behavioural Sciences at Utrecht University have the opportunity to attend three Winter/Summer School courses funded by the Graduate School of Social and Behavioural Sciences. Additionally, they may choose to take as many courses as they wish at their own expense from their personal budget.

There are no scholarships available for this course.

We also offer tailor-made M&S courses and in-house M&S training. To explore the possibilities, please contact us at ms.summerschool@uu.nl.

Additional information

The housing costs do not include a Utrecht Summer School sleeping bag. This is a separate product on the invoice. If you wish to bring your own bedding, please deselect or remove the sleeping bag from your order.

Application

Please include a short description of your (scientific) background and of what you expect or would like to learn from this course.

Related courses

Data Science: Text Mining with R
Organising institution: Utrecht University
Faculty: Faculty of Social and Behavioural Sciences
Dates: 19 Aug. - 22 Aug. 2024
Course Level: Master
ECTS credits: 1.5 ECTS
Course fee: €730

Data Science: Statistical Programming with R
Organising institution: Utrecht University
Faculty: Faculty of Social and Behavioural Sciences
Dates: 8 Jul. - 12 Jul. 2024
Course Level: Advanced Bachelor
ECTS credits: 1.5 ECTS
Status: Closed

Data Science: Data Analysis
Organising institution: Utrecht University
Faculty: Faculty of Social and Behavioural Sciences
Dates: 15 Jul. - 19 Jul. 2024
Course Level: Advanced Master
ECTS credits: 1.5 ECTS
Status: Closed