This document provides all the details needed to access the eRisk 2021 research collection.
Any scientific publication derived from the use of this collection should explicitly refer to the following publications:
David Losada, Fabio Crestani. “A Test Collection for Research on Depression and Language Use”. In Experimental IR Meets Multilinguality, Multimodality, and Interaction: 7th International Conference of the CLEF Association, CLEF 2016, Évora, Portugal, September 5-8, 2016. Bibtex: bibtex file.
Javier Parapar, Patricia Martín, David E. Losada, Fabio Crestani. Overview of eRisk 2021: Early Risk Prediction on the Internet. CLEF 2021, Lecture Notes in Computer Science, 2021. Bibtex: bibtex file.
The eRisk 2021 collection is available for research purposes under proper user agreements.
The self-harm and gambling collections contain textual interactions (posts or comments) from multiple users (self-harm and non-self-harm; gambling and non-gambling). For each subject, a (usually long) history of writings (posts or comments from a social networking site) is available. Each history is stored as an XML file (one per subject) with the following structure:
<INDIVIDUAL>
<ID> ... </ID>
<WRITING>
<TITLE> ... </TITLE>
<DATE> ... </DATE>
<INFO> ... </INFO>
<TEXT> ... </TEXT>
</WRITING>
<WRITING>
<TITLE> ... </TITLE>
<DATE> ... </DATE>
<INFO> ... </INFO>
<TEXT> ... </TEXT>
</WRITING>
....
</INDIVIDUAL>
ID: contains the anonymised id of the subject
TITLE: title of the post if available (if it is a comment then TITLE is empty)
DATE: date of the post or comment
INFO: additional info about the writing (source of the post/comment)
TEXT: body of the post or comment
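The structure above can be read with a standard XML parser. The following is a minimal sketch in Python, not an official loader for the collection: it assumes each subject file is well-formed XML with exactly the tags listed above. The names parse_subject, is_comment and SAMPLE are hypothetical, and the sample values are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Made-up sample that mirrors the documented structure (not real data).
SAMPLE = """<INDIVIDUAL>
<ID> subject0000 </ID>
<WRITING>
<TITLE> An example post </TITLE>
<DATE> date-1 </DATE>
<INFO> example source </INFO>
<TEXT> Body of the post. </TEXT>
</WRITING>
<WRITING>
<TITLE> </TITLE>
<DATE> date-2 </DATE>
<INFO> example source </INFO>
<TEXT> Body of a comment. </TEXT>
</WRITING>
</INDIVIDUAL>"""


def _text(node: ET.Element, tag: str) -> str:
    """Return the stripped text of a child element, or '' if missing or empty."""
    return (node.findtext(tag) or "").strip()


def parse_subject(xml_source: str) -> Tuple[str, List[Dict[str, str]]]:
    """Parse one subject's XML content into (subject id, list of writings)."""
    root = ET.fromstring(xml_source)  # root is expected to be <INDIVIDUAL>
    subject_id = _text(root, "ID")
    writings = [
        {
            "title": _text(w, "TITLE"),
            "date": _text(w, "DATE"),
            "info": _text(w, "INFO"),
            "text": _text(w, "TEXT"),
        }
        for w in root.findall("WRITING")
    ]
    return subject_id, writings


def is_comment(writing: Dict[str, str]) -> bool:
    """A writing with an empty TITLE is a comment, per the field description."""
    return writing["title"] == ""


subject_id, writings = parse_subject(SAMPLE)
print(subject_id, len(writings), [is_comment(w) for w in writings])
```

To load a subject from disk, the file contents can be read into a string and passed to parse_subject. If the files contain characters that are not valid XML, a more tolerant parser or a preprocessing step would be needed.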
The eRisk 2021 collection also contains another dataset with multiple users (for each user, their history of writings is provided) and the responses given by these users to a BDI (Beck’s Depression Inventory) questionnaire. More details about this dataset are available in the eRisk 2021 overview (third task, on measuring the severity of the signs of depression).
This collection can only be used for research purposes. If you are interested in accessing this data, please fill in the following user agreement and send it to ezra.aragon@usc.es.