This document provides the details needed to obtain access to the eRisk 2024 research collection.

Any scientific publication derived from the use of this collection should explicitly refer to the following publications:

Fabio Crestani, David E. Losada, Javier Parapar. “Early Detection of Mental Health Disorders by Social Media Monitoring: The First Five Years of the eRisk Project”. Springer, Studies in Computational Intelligence (SCI, volume 1018), 2022.

Javier Parapar, Patricia Martín, David E. Losada, Fabio Crestani. “Overview of eRisk 2024: Early Risk Prediction on the Internet”. CLEF 2024, Lecture Notes in Computer Science, 2024.

The eRisk 2024 collection is available for research purposes under proper user agreements.

Data

The collection of sentences (Task 1, eRisk 2024) contains a large corpus of TREC-formatted sentences and the corresponding relevance judgments. More details about this dataset are available in the eRisk 2024 overview. The corpus of sentences has the following structure:

<DOC>
<DOCNO> ... </DOCNO>
<PRE> ... </PRE>
<TEXT> ...   </TEXT>
<POST> ... </POST>
</DOC>
....

DOCNO: contains an identifier for the sentence

TEXT: contains the text of the sentence

PRE: contains the preceding sentence

POST: contains the following sentence
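The record layout above can be read with a few lines of Python. This is a minimal sketch, not the organisers' tooling: it assumes each record sits between `<DOC>` and `</DOC>` exactly as shown, and it uses regular expressions rather than an XML parser because TREC-formatted files usually have no single root element. The function name `parse_trec_sentences` and the sample identifier `s_0_1` are hypothetical and chosen only for illustration.

```python
import re
from typing import Dict, Iterator

# One <DOC>...</DOC> record; DOTALL lets a record span several lines.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
# One field inside a record; \1 forces the closing tag to match the opening tag.
FIELD_RE = re.compile(r"<(DOCNO|PRE|TEXT|POST)>(.*?)</\1>", re.DOTALL)


def parse_trec_sentences(raw: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per <DOC> record, keyed by field name (hypothetical helper)."""
    for doc in DOC_RE.finditer(raw):
        yield {name: value.strip() for name, value in FIELD_RE.findall(doc.group(1))}


# Made-up sample record; real identifiers and sentences come from the collection.
sample = """<DOC>
<DOCNO> s_0_1 </DOCNO>
<PRE> First sentence. </PRE>
<TEXT> Second sentence. </TEXT>
<POST> Third sentence. </POST>
</DOC>
"""

records = list(parse_trec_sentences(sample))
print(records[0]["DOCNO"], "|", records[0]["TEXT"])
```

Keeping PRE and POST next to TEXT in the same dict makes it straightforward to build a context window around each sentence if that is useful.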

The anorexia collection (Task 2, eRisk 2024) contains textual interactions (posts or comments) from multiple users (anorexia and non-anorexia). For each subject, a (usually long) history of writings (posts or comments from a social networking site) is available. This history is stored as an XML file (one per subject) with the following structure:

<INDIVIDUAL>
<ID> ... </ID>
<WRITING>
<TITLE> ...   </TITLE>
<DATE> ... </DATE>
<INFO> ... </INFO>
<TEXT> ...  </TEXT>
</WRITING>
<WRITING>
<TITLE> ... </TITLE>
<DATE> ... </DATE>
<INFO> ... </INFO>
<TEXT> ... </TEXT>
</WRITING>
....
</INDIVIDUAL>

ID: contains the anonymised id of the subject

TITLE: title of the post if available (if it is a comment then TITLE is empty)

DATE: date of the writing

INFO: additional info about the writing (source of the post/comment)

TEXT: body of the post or comment
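A subject file with the structure above could be loaded roughly as follows. This is a sketch under stated assumptions: it presumes each file is well-formed XML with `<INDIVIDUAL>` as the root element, and the helper name `parse_subject`, the sample identifier, the date format, and the INFO values are all hypothetical placeholders rather than values taken from the collection.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

WRITING_FIELDS = ("TITLE", "DATE", "INFO", "TEXT")


def parse_subject(xml_text: str) -> Tuple[str, List[Dict[str, str]]]:
    """Return the subject id and the list of writings (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    subject_id = (root.findtext("ID") or "").strip()
    writings = []
    for writing in root.findall("WRITING"):
        # A missing or empty element (e.g. TITLE of a comment) becomes "".
        writings.append(
            {tag: (writing.findtext(tag) or "").strip() for tag in WRITING_FIELDS}
        )
    return subject_id, writings


# Made-up sample; the date format and INFO values are assumptions.
sample_subject = """<INDIVIDUAL>
<ID> subject0001 </ID>
<WRITING>
<TITLE> A post title </TITLE>
<DATE> 2020-01-01 00:00:00 </DATE>
<INFO> example source </INFO>
<TEXT> Body of the post. </TEXT>
</WRITING>
<WRITING>
<TITLE> </TITLE>
<DATE> 2020-01-02 00:00:00 </DATE>
<INFO> example source </INFO>
<TEXT> Body of a comment. </TEXT>
</WRITING>
</INDIVIDUAL>"""

subject_id, writings = parse_subject(sample_subject)
# An empty TITLE marks a comment, per the field description above.
comments = [w for w in writings if w["TITLE"] == ""]
print(subject_id, len(writings), len(comments))
```

To read from disk instead, `ET.parse(path).getroot()` can replace `ET.fromstring`. If a file contains characters that are not valid XML, the parser raises `ET.ParseError`, so a real loader may need some cleaning first.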

The eRisk 2024 collection also contains a third dataset covering multiple users. For each user, it provides their history of writings and the responses they gave to an Eating Disorders questionnaire. More details about this dataset are available in the eRisk 2024 overview (third task).

User agreement

This collection can only be used for research purposes. If you are interested in obtaining access to this data, please fill in the following user agreement and send it to