Best Practices for the Documentation of Data Collection
This page contains best practices for the Documentation of Data Collection, including instructions and examples.
What is a Documentation of Data Collection?
Discuss Data sees itself as part of initiatives in the social sciences and humanities that aim to make the collection or production and the analysis of research data transparent, so that the validity of the conclusions drawn from them can be critically assessed. Moreover, Discuss Data aims to give the academic community the chance to benefit from Data Collection efforts made by others. Secondary data analysis can provide substantial benefits to researchers:
- It can save costs, which is especially important for early-stage researchers who lack funding
- It can add further evidence to original data, supporting or challenging one’s own results
- It can allow for comparisons across cases by providing data on additional cases
- It can offer access to unique information, if, for instance, historical data can no longer be found or generated
However, secondary data analysis crucially relies on precise documentation to ensure that the data is not misinterpreted. Therefore, all Data Collections submitted to Discuss Data should be accompanied by such documentation: the Documentation of Data Collection.
The Documentation of Data Collection explains the broader context of the Data Collection, e.g. how, when and where the data was compiled and what the data means. It may also address ethical issues, empirical challenges, and epistemological approaches.
What should be included in the Documentation of Data Collection?
Discuss Data does not offer strict guidelines for the Documentation of Data Collection. As an interdisciplinary forum covering the whole range of quantitative and qualitative methods in the social sciences and the humanities, Discuss Data is open to different ways of documenting data collections. However, Discuss Data aims to develop some minimum requirements for the Documentation of Data Collection in order to allow for a sensible and meaningful discussion and to enable secondary data analysis. In compiling the Documentation of Data Collection, Discuss Data suggests following the FAIR Data principles, which are aimed at making data Findable, Accessible, Interoperable, and Reusable.
In general, Discuss Data expects a Documentation of Data Collection to indicate (in at least a brief description) the methodology of the Data Collection:
- Title of the Data Collection
- Creator / Author of the Data Collection
- Guiding research interest and paradigm (i.e., epistemological approach)
- The sources of the data
- Date of the creation of the Data Collection
- Time period covered by the Data Collection
- Data Collection Methods: Sampling, Data Collection process, instruments used, hardware and software used, scale and resolution, temporal and geographic coverage
- Detailed description of the Data Collection
- Context of the data: Project history, aim, objectives and hypotheses
- Handling of empirical challenges (as encountered during data collection), validity of the collected data, reliability of sources which provided the data, cleaning and quality assurance procedures carried out
- Handling of ethical issues (as appropriate)
- Changes made to data over time since their original creation and identification of different versions of data files
- Information on access and use conditions or data confidentiality
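The elements listed above can be turned into a simple self-check before submission. The following is a minimal sketch in Python, not a Discuss Data tool: the field names and the `missing_elements` helper are hypothetical and were chosen here only to mirror the list above. Ethical issues are treated as optional because the list marks them "as appropriate".

```python
# Hypothetical field names mirroring the elements listed above;
# Discuss Data does not prescribe these identifiers.
EXPECTED_ELEMENTS = [
    "title",
    "creator",
    "research_interest_and_paradigm",
    "sources",
    "creation_date",
    "time_period_covered",
    "collection_methods",
    "detailed_description",
    "context",
    "empirical_challenges",
    "changes_and_versions",
    "access_and_use_conditions",
]

# Listed "as appropriate", so not checked in this sketch.
OPTIONAL_ELEMENTS = ["ethical_issues"]


def missing_elements(documentation):
    """Return the expected elements that are absent or empty."""
    missing = []
    for name in EXPECTED_ELEMENTS:
        value = documentation.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Invented example values, for illustration only.
draft = {
    "title": "Example interview collection",
    "creator": "Jane Doe",
    "sources": "Semi-structured expert interviews",
    "creation_date": "2020-05-01",
    "time_period_covered": "",  # empty values count as missing
}

print(missing_elements(draft))
```

A check like this only confirms that each element has been addressed at all; whether the description is sufficient remains a matter of judgement for the author and the curator.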
See our checklist for a short overview of what should be included in the Documentation of Data Collection.
The Documentation of Data Collection is submitted together with the Data itself and the Metadata and, like the Metadata, is always accessible, even if access to the Data itself is restricted.
Examples
The form of the Documentation of Data Collection crucially depends on the methods used. While a Documentation of Data Collection concerning statistical data should allow the replication of the respective analysis, a replication study of qualitative elite interviews is not feasible, because the passage of time affects the result. If the interview is repeated after some time, the same interview partner may not be able or willing to repeat earlier views, especially if events related to the interview topic have unfolded in unexpected ways.
Best practices also depend on the epistemological approach of the responsible researcher. The neo-positivist tradition usually documents qualitative elite interviews with a list indicating the name and position of the interview partner as well as the time and place of the interview. The questionnaire or interview guide is added, and interview recordings or transcripts are archived. In the interpretive approach, however, the focus is on contextuality, and evidence is seen as co-generated in interaction with the research participants. Accordingly, the Documentation of Data Collection takes the form of field notes describing the setting of an interview, important events prior to the interview that may have affected the persons taking part in the interview process (including the researcher), and any other details deemed relevant by the researcher.
In order to support the quality and standardisation of Documentations of Data Collection, Discuss Data aims to offer guidance for as many methods of data collection as possible. For each method, this is either a reference to an actual Documentation of Data Collection that fulfils the requirements of Discuss Data or a brief discussion, by an experienced researcher, of the requirements and challenges of such documentation. Below you will find examples that may help you draft your own Documentation of Data Collection.
In case you need further help or information, please contact the responsible curator or the Discuss Data team (info@discuss-data.net).
Case studies
The collection of materials of all sorts related to case studies or process tracing as part of a neo-positivist research design should be documented following the rules for active citations, as laid out by Andrew Moravcsik; see
Andrew Moravcsik (2010): Active Citation: A Precondition for Replicable Qualitative Research, in: PS January 2010, pp. 29-35, doi:10.1017/S1049096509990783, available online at https://www.princeton.edu/~amoravcs/library/ps.pdf
Andrew Moravcsik (2014): Transparency: The Revolution in Qualitative Research, in: PS January 2014, pp. 48-53, doi:10.1017/S1049096513001789, available online at https://www.princeton.edu/~amoravcs/library/transparency.pdf
Concerning the collection of materials for case studies based on a social anthropological, ethnographic or interpretivist approach, Discuss Data recommends the transparency standards presented by
Katherine Cramer (2015): Transparent explanations, yes. Public transcripts and fieldnotes, no: Ethnographic research on public opinion, in: Qualitative & Multi-Method Research, 13 (1), pp. 17-20; available online at: https://zenodo.org/record/893069#.W8X0yPaYTGg
Content analysis of mass media reporting
The following documentation of the creation and content analysis of a text corpus of media reporting can serve as a sample which has to be adjusted to individual projects:
Andreas Heinrich, Heiko Pleines et al. (2014): Analysis of mass media reports on export pipelines (Azerbaijan, Kazakhstan, Turkmenistan). Part I. Documentation of the creation of the text corpus, Part II. Full codebook; available online at: https://www.forschungsstelle.uni-bremen.de/UserFiles/file/Pipelines-Caspian_media-list+codebook.pdf
For quantitative computer-assisted text analysis the following instructions are also relevant:
David Romney, Brandon Stewart, Dustin Tingley (2015): Plain text? Transparency in computer-assisted text analysis, in: Qualitative & Multi-Method Research, 13 (1), pp. 32-38; available online at: https://zenodo.org/record/893085#.W8X5o_aYTGg
Concerning the documentation of text analysis following the hermeneutic or interpretivist approach, Discuss Data accepts the transparency standards presented by
Andrew Davidson (2015): Hermeneutics and the question of transparency, in: Qualitative & Multi-Method Research, 13 (1), pp. 43-47; available online at: https://zenodo.org/record/893073#.W8X6f_aYTGg
Content analysis of official documents
For a documentation of the creation of a text corpus of official (i.e. published) documents and analyses (concerning primary as well as secondary sources) all documents or publications should be presented following the rules for active citations:
Andrew Moravcsik (2010): Active Citation: A Precondition for Replicable Qualitative Research, in: PS January 2010, pp. 29-35, doi:10.1017/S1049096509990783, available online at https://www.princeton.edu/~amoravcs/library/ps.pdf
Andrew Moravcsik (2014): Transparency: The Revolution in Qualitative Research, in: PS January 2014, pp. 48-53, doi:10.1017/S1049096513001789, available online at https://www.princeton.edu/~amoravcs/library/transparency.pdf
The content analysis should be documented with the help of a codebook following the example given for content analysis of mass media.
For quantitative computer-assisted text analysis the following instructions are also relevant:
David Romney, Brandon Stewart, Dustin Tingley (2015): Plain text? Transparency in computer-assisted text analysis, in: Qualitative & Multi-Method Research, 13 (1), pp. 32-38; available online at: https://zenodo.org/record/893085#.W8X5o_aYTGg
Concerning the documentation of text analysis following the hermeneutic or interpretivist approach, Discuss Data accepts the transparency standards presented by
Andrew Davidson (2015): Hermeneutics and the question of transparency, in: Qualitative & Multi-Method Research, 13 (1), pp. 43-47; available online at: https://zenodo.org/record/893073#.W8X6f_aYTGg
Content analysis of social media
For quantitative computer-assisted text analysis the following instructions are relevant:
David Romney, Brandon Stewart, Dustin Tingley (2015): Plain text? Transparency in computer-assisted text analysis, in: Qualitative & Multi-Method Research, 13 (1), pp. 32-38; available online at: https://zenodo.org/record/893085#.W8X5o_aYTGg
We are grateful for any suggestions of samples or instructions concerning the documentation of corpus creation and content analysis of social media posts that could be added here as examples.
Interviews
The documentation of expert or elite interviews conducted in a neo-positivist research design should follow the rules for documentation laid out by
Erik Bleich, Robert J. Pekkanen (2015): Data Access, Research Transparency, and Interviews: The Interview Methods Appendix, in: Qualitative & Multi-Method Research, 13 (1), pp. 8-13; available online at: https://zenodo.org/record/892386#.W8XsavaYTGg
If the submitted interview transcripts have not been completely and irreversibly anonymized, scanned consent forms signed by the respondents have to be submitted to Discuss Data during the upload of the Data Collection. For more information, please see our FAQs on informed consent.
Concerning the documentation of interviews conducted in an interpretivist way, Discuss Data accepts the transparency standards presented by
Katherine Cramer (2015): Transparent explanations, yes. Public transcripts and fieldnotes, no: Ethnographic research on public opinion, in: Qualitative & Multi-Method Research, 13 (1), pp. 17-20; available online at: https://zenodo.org/record/893069#.W8X0yPaYTGg
Process tracing
The documentation of data collection for process tracing follows the logic described above for case studies.
Additionally, for a critical discussion of research transparency in the case of process tracing you can consult:
Tasha Fairfield (2015): Reflections on analytic transparency in process tracing research, in: Qualitative & Multi-Method Research, 13 (1), pp. 17-20; available online at: https://zenodo.org/record/893075#.W8bnJfaYTGg
Protest-event databases
The following documentation of the compilation of a protest-event database can serve as a sample which has to be adjusted to individual projects:
Beissinger, Mark R. (2003): Codebook for Disaggregated Event Data. “Mass Demonstrations and Mass Violent Events in the Former USSR, 1987-1992 [Event databases used for the analysis in Nationalist Mobilization and the Collapse of the Soviet State]”, available online at: https://scholar.princeton.edu/mbeissinger/publications/mass-demonstrations-and-mass-violent-events-former-ussr-1987-1992-these, copy at http://www.tinyurl.com/yafecbne
Participant observation
Concerning the documentation of participant observation in social anthropological or ethnographic research, Discuss Data accepts the transparency standards presented by
Katherine Cramer (2015): Transparent explanations, yes. Public transcripts and fieldnotes, no: Ethnographic research on public opinion, in: Qualitative & Multi-Method Research, 13 (1), pp. 17-20; available online at: https://zenodo.org/record/893069#.W8X0yPaYTGg
Representative opinion polls
The documentation of representative opinion polls should follow the guidelines of one of the respected international polling organisations. Discuss Data recommends the following example of guidelines for the documentation of representative opinion polls:
British Polling Council: Statement of Disclosure, available online at http://www.britishpollingcouncil.org/statement-of-disclosure/
Official statistics
If official statistics are used, it is important to give a complete reference to the data source used, including the date of collection and, if relevant, any later revisions to the data. Additionally, the validity of the statistical data should be discussed, with explicit reference to the definitions and collection methods used, highlighting any related inconsistencies over time, between units of analysis or between sources.
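One way to keep these details together is to record each source in a small structured entry. The sketch below is only an illustration in Python under stated assumptions: the `StatisticsSource` class, its field names and the reference format are hypothetical and not prescribed by Discuss Data, and the example values are invented.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class StatisticsSource:
    """Hypothetical record for one official statistics source."""

    publisher: str
    dataset_title: str
    collection_date: date
    definitions: str  # definitions and collection methods used
    revisions: list = field(default_factory=list)  # dates of later revisions, if relevant
    inconsistencies: list = field(default_factory=list)  # over time, between units or sources

    def reference(self):
        """Build a reference string that includes the date of collection."""
        text = (
            f"{self.publisher}: {self.dataset_title}, "
            f"collected {self.collection_date.isoformat()}"
        )
        if self.revisions:
            text += "; revised " + ", ".join(self.revisions)
        return text


# Invented example values, for illustration only.
source = StatisticsSource(
    publisher="Example Statistics Office",
    dataset_title="Regional unemployment rate",
    collection_date=date(2019, 3, 15),
    definitions="Registered unemployed as a share of the labour force",
    revisions=["2020-01-10"],
)

print(source.reference())
```

The `definitions` and `inconsistencies` fields are free text here because the discussion of validity is usually narrative rather than tabular.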
When discussing the collection and validity of statistical data please consider the experiences described by
Francesca Refsum Jensenius (2014): The Fieldwork of Quantitative Data Collection, in: PS April 2014, pp. 402-404, doi:10.1017/S1049096514000298, available online at www.francesca.no/wp-content/2014/04/PS_Jensenius.pdf
Focus groups
We are grateful for any suggestions related to samples or instructions concerning the documentation of focus group discussions.
Network analysis
We are grateful for any suggestions related to samples or instructions concerning the documentation of network analysis.
Your method is missing?
Please write to us at info@discuss-data.net