This document provides all the details needed to access the research collection described in the paper: David Losada, Fabio Crestani. “A Test Collection for Research on Depression and Language Use”. In Experimental IR Meets Multilinguality, Multimodality, and Interaction: 7th International Conference of the CLEF Association, CLEF 2016, Évora, Portugal, September 5-8, 2016. Bibtex: bibtex file

Any scientific publication derived from the use of this collection should explicitly refer to this CLEF 2016 paper.

The collection is available for research purposes under proper user agreements.

# Data

The collection contains textual interactions (posts or comments) from 892 users. Of these, 137 subjects have explicitly declared that they have been diagnosed with depression, and the remaining 755 subjects form a control group. For each subject, a (usually long) history of writings (posts or comments from a social networking site) is available. Each history is stored as an XML file (one per subject) with the following structure:

<INDIVIDUAL>
  <ID> ... </ID>
  <WRITING>
    <TITLE> ... </TITLE>
    <DATE> ... </DATE>
    <INFO> ... </INFO>
    <TEXT> ... </TEXT>
  </WRITING>
  <WRITING>
    <TITLE> ... </TITLE>
    <DATE> ... </DATE>
    <INFO> ... </INFO>
    <TEXT> ... </TEXT>
  </WRITING>
  ...
</INDIVIDUAL>
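
A file with this structure can be read with a standard XML parser. The following is a minimal sketch in Python, not part of the collection's official tooling: the function name `parse_subject`, the dictionary keys, and every value in `sample` are invented for illustration. It also assumes each file is well-formed XML, which may not hold for every file.

```python
import xml.etree.ElementTree as ET


def parse_subject(xml_text):
    """Parse one subject file (given as a string) into (subject_id, writings)."""
    root = ET.fromstring(xml_text)  # the root element is <INDIVIDUAL>
    subject_id = (root.findtext("ID") or "").strip()
    writings = []
    for writing in root.findall("WRITING"):
        writings.append({
            # TITLE is empty for comments, so this may be an empty string.
            "title": (writing.findtext("TITLE") or "").strip(),
            "date": (writing.findtext("DATE") or "").strip(),
            "info": (writing.findtext("INFO") or "").strip(),
            "text": (writing.findtext("TEXT") or "").strip(),
        })
    return subject_id, writings


# Invented sample in the layout shown above (not real collection data).
sample = """<INDIVIDUAL>
<ID> example_subject </ID>
<WRITING>
<TITLE> An example post </TITLE>
<DATE> placeholder date </DATE>
<INFO> placeholder info </INFO>
<TEXT> Body of the post. </TEXT>
</WRITING>
<WRITING>
<TITLE>  </TITLE>
<DATE> placeholder date </DATE>
<INFO> placeholder info </INFO>
<TEXT> Body of a comment. </TEXT>
</WRITING>
</INDIVIDUAL>"""

subject_id, writings = parse_subject(sample)
is_comment = [w["title"] == "" for w in writings]
```

To read a file from disk, pass its contents to `parse_subject`, for example `parse_subject(open(path, encoding="utf-8").read())`, where `path` and the encoding are assumptions about how the files are stored locally.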

ID: contains the anonymised id of the subject

TITLE: title of the post, if available (for a comment, TITLE is empty)