More and more researchers rely on systematic reviews: attempts to synthesize the state of the art in a particular scientific field. However, the world's scientific output doubles roughly every nine years. In this tsunami of new knowledge, there is not enough time to read everything, resulting in costly, abandoned, or error-prone work. Using the newest methods from the field of Artificial Intelligence (AI), you can reduce the number of papers to screen by up to 95%. This summer school course introduces you to the newest technologies to assist you during the screening phase of a systematic review.
Performing a systematic review is a very rigorous process, and it is increasingly resource-intensive due to the ever-growing number of scientific publications to review. Nevertheless, systematic reviews are pivotal not only for scholars, but also for clinicians, policy makers, journalists, and, ultimately, the general public. Developing a search strategy for a systematic review is an iterative process aimed at balancing recall, precision, and quality. That is, including as many potentially relevant and, ideally, high-quality studies as possible (recall, or sensitivity), while at the same time limiting the total number of studies to screen (precision, or specificity). Given how time-consuming and costly rigorous systematic reviews are amid the constant growth of scientific publications, reports, guidelines, and other data sources, recent advances in natural language processing (NLP), text mining, and machine learning have produced new algorithms that can accurately mimic the human labelling effort in a systematic review, faster and more cheaply.
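To make the recall/precision trade-off concrete, here is a minimal sketch with hypothetical numbers (not drawn from any real review): a broad search that retrieves most relevant records typically does so at the cost of very low precision, which is exactly the screening burden that AI-aided tools aim to reduce.

```python
# Hypothetical screening scenario (illustrative numbers only).
relevant_retrieved = 90    # relevant records the search actually found
relevant_total = 100       # all relevant records that exist
retrieved_total = 4500     # all records the search returned

recall = relevant_retrieved / relevant_total       # a.k.a. sensitivity
precision = relevant_retrieved / retrieved_total   # share of hits worth reading

print(f"recall={recall:.2f}, precision={precision:.2f}")
# recall=0.90, precision=0.02
```

With numbers like these, a screener reads roughly 50 records for every relevant one found, which is why screening prioritization pays off.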
Within this course, every day consists of both lectures and do-it-yourself computer labs. The lectures will be provided by a multidisciplinary team of experts from different fields: statistics, systematic reviewing, data science, open science, bibliometrics, and transparent software engineering. For the computer sessions, we have a team ready to help you.
On the first day of the course, we compare the classical manual pipeline for performing a systematic review using the PRISMA steps with the AI-aided approach using screening prioritization. We assume participants are familiar with PRISMA; if not, please read the information on the PRISMA website before the start of the summer school: http://prisma-statement.org/. In the afternoon we will work with the open-source software ASReview (www.asreview.nl) using example datasets, so that you can experience the benefits of active learning. Make sure you have installation rights on your PC!
The second day will be devoted to obtaining the perfect dataset in a systematic way. The basics of searching online databases will be discussed using examples and demos: how to compose a search query and how to obtain the highest-quality data (e.g., complete abstracts). Because of the use of active learning, the size of the dataset can differ from that of a classical systematic review. How does this affect your search? Is there still a need to search multiple databases? How do you process these large datasets? With the same effort, a much larger dataset can be screened, for example the CORD-19 database, containing over 350K papers on the coronavirus. Imagine screening such a database in a couple of days instead of a lifetime!
The third day will be devoted to an in-depth explanation of the different feature-extraction techniques (TF-IDF, word2vec, SBERT) and classifiers (e.g., naive Bayes, SVM, neural networks) that can be used, the query strategies (certainty, uncertainty, random sampling), and the balancing strategies for dealing with the extreme scarcity of relevant papers in the dataset. Although this part of the course is technical, we consider it important to understand how the AI works if you want to use AI-aided tools (and also to answer questions from your supervisors, reviewers, peers, and friends).
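To give a flavour of how these pieces fit together, the sketch below combines one choice from each category: TF-IDF features, a naive Bayes classifier, and certainty-based querying. It is a minimal toy illustration with made-up abstracts and labels, not ASReview's actual implementation.

```python
# Toy active-learning screening loop: TF-IDF + naive Bayes + certainty sampling.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical abstracts; 1 = relevant, 0 = irrelevant (made-up labels).
abstracts = [
    "active learning reduces screening workload in systematic reviews",
    "machine learning classifiers for text categorisation",
    "a clinical trial of a new blood pressure drug",
    "deep learning for image recognition in radiology",
    "screening prioritisation with natural language processing",
    "survey of consumer marketing preferences",
]
true_labels = np.array([1, 1, 0, 0, 1, 0])

X = TfidfVectorizer().fit_transform(abstracts)

# Seed the model with one known relevant and one known irrelevant record.
labeled = [0, 2]
unlabeled = [i for i in range(len(abstracts)) if i not in labeled]

while unlabeled:
    clf = MultinomialNB().fit(X[labeled], true_labels[labeled])
    # Certainty sampling: show the screener the record the model is most
    # confident is relevant, so relevant papers surface early.
    probs = clf.predict_proba(X[unlabeled])[:, list(clf.classes_).index(1)]
    pick = unlabeled[int(np.argmax(probs))]
    print(f"screen record {pick}: label={true_labels[pick]}")
    labeled.append(pick)
    unlabeled.remove(pick)
```

In a real review, the screener's decision would replace `true_labels[pick]`, and the model would be retrained after each decision, which is exactly the loop that lets you stop screening long before reaching the end of the dataset.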
The fourth day is devoted to Open Science. Although sharing the search query and data is part of the PRISMA checklist, actually sharing the complete (meta)data underlying a systematic review, including all labelling decisions, is not standard. Therefore, we will discuss a data-sharing protocol, including the importance of persistent identifiers (DOIs), abstract retrieval, and trusted repositories. Moreover, when using AI-aided tools it is not enough to make the search query and the metadata FAIR (Findable, Accessible, Interoperable, and Reusable): the decisions the AI makes throughout the process should also be made FAIR. Therefore, all settings of the AI and every iteration of the model have to be stored and made human-readable. We will explain this process and demonstrate how this can be done.
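As a rough illustration of what such human-readable storage could look like, the sketch below writes the model settings and per-iteration labelling decisions to a JSON file. The field names and DOIs are hypothetical placeholders, not a formal standard or ASReview's export format.

```python
# Sketch: store AI settings and labelling decisions as human-readable JSON.
import json

project = {
    # Hypothetical configuration of the active-learning model.
    "settings": {
        "feature_extraction": "tfidf",
        "classifier": "naive_bayes",
        "query_strategy": "certainty",
        "balance_strategy": "double",
    },
    # One entry per iteration: which record was shown and how it was labelled.
    # DOIs below are placeholders for illustration only.
    "iterations": [
        {"iteration": 1, "record_doi": "10.1000/example.1", "label": "relevant"},
        {"iteration": 2, "record_doi": "10.1000/example.2", "label": "irrelevant"},
    ],
}

with open("review_log.json", "w") as f:
    json.dump(project, f, indent=2)
```

Because the file records both the configuration and every decision, another researcher can inspect, audit, or re-run the review, which is the point of making the AI's role FAIR.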
The fifth day consists of Q&A sessions and consultations.
Laura Hofstee (coordinator), Rens van de Schoot, Daniel Oberski, Jan de Boer, Bianca Kramer, Felix Weijdema, Sofie van den Brand, Yongchao Terry Ma
Participants from a variety of fields — including psychology, education, human development, public health, prevention science, sociology, marketing, business, biology, medicine, political science, and communication — will benefit from the course.
It helps if you have a concrete plan for carrying out a systematic review so that you can immediately start working with your own data.
Aim of the course
After engaging in the course lectures and discussions as well as completing the hands-on practice activities, participants will be able to carry out their own AI-aided systematic review using active learning.
The tuition fee for PhD students from the Faculty of Social and Behavioural Sciences at Utrecht University will be funded by the Graduate School of Social and Behavioural Sciences.
There are no scholarships available for this course.
Irma Reyersen | E: email@example.com