Using secondary and social media data
Guidance on ethical considerations for using secondary data and data from social media in research projects.
This information is largely adapted from the LEL advice pages in PPLS. Links to the original pages are given in the relevant sections below. Please contact the Informatics ethics committee (inf-ethics@inf.ed.ac.uk) with any questions about the use of secondary data and/or social media data in Informatics research.
Note that, for both secondary data and social media data, using the data is not automatically ethical just because it is legally accessible. Always consider your research question and the people from whom the data were collected. For instance, if the research is conducted on a group considered vulnerable (e.g. a mental health forum), the ethical considerations are much more complex than for research on less vulnerable groups (e.g. football fans).
Secondary data - ethics application may be required
Secondary data is sometimes available through established corpora. If you are using data from an existing corpus, there is typically no need to apply for further ethical approval; however, you should continue to treat any data from human participants in an ethical manner. Considerations include:
- If the data are in the public domain, you must abide by any requirements stated by the corpus provider, including with respect to anonymity, or any other conditions on use.
- Some corpora may require ethical approval, especially corpora that include physical or mental health data, or corpora containing data that could be used to de-anonymise individuals (e.g. when free-text responses are allowed).
- If the data are not in the public domain, you must ensure that your use of the data conforms to any requirements stated by the corpus provider. For example, the data must not be shared in any unauthorised manner (e.g. posted online).
- In either case, if there is reason to suspect that the people who initially provided the data were not aware that it would be used for research purposes, you should carefully consider the ethical implications of your research, including whether you should obtain informed consent.
- The use of secondary data is not automatically ethical just because it is legally accessible.
LEL pages on using corpus data
Social media data - ethics application required
As with corpus data originally collected from live participants, data from social media needs to be used in an ethical manner. Unlike corpus data, information about individuals on social media is not generally created for the purpose of research. Considerations include:
- Whether or not the data is publicly available.
- Read the terms of use provided by the social media site (e.g. Twitter rules).
- You may be able to access a profile or other kinds of social media data on a site because you are a registered user. This is not the same as that information being publicly available.
- You can only use the data available to you as a registered user of a site in accordance with the policy of that site (which users consent to when they join). For example, Facebook allows the collection of information by third parties from its site, but users’ consent must be obtained.
- The use of social media data is not automatically ethical just because it is legally accessible.
LEL pages on using social media data
Non-publicly available data
If the data you want to use is not publicly available (e.g. Facebook), consider:
- Non-publicly available data will typically require registration, and agreement to abide by the conditions stated by the data owner (e.g. Facebook).
- Pay careful attention to terms of use. Please read and ensure that you understand and agree to follow any conditions placed on use of data.
- Please note that if the conditions of data use require you to obtain informed consent, you must submit your project for ethical review following the Informatics Ethics procedure.
Publicly available data
If the data you want to use is publicly available (e.g. Twitter or public Facebook groups) there are still ethical implications of using that data that you need to consider:
- Does the user have a reasonable expectation that their data could be used for research purposes without their consent? For example, the Twitter developer policy prohibits "the use of Twitter data in any way that would be inconsistent with people’s reasonable expectations of privacy". See also this research paper on participant perceptions:
Fiesler and Proferes (2018) Participant Perceptions of Twitter Research Ethics
- If users cannot reasonably expect that their data could be used for research purposes, then the ethically responsible thing to do may be to obtain informed consent even though the data are public. Ethical standards are more stringent than the law.
- Consider carefully whether there is any potential risk (psychological or otherwise) to the participants of their data being used for your project. If so, then discuss other options with an Informatics ethics committee member.
- Consider whether the population you are targeting should be considered vulnerable (e.g., groups organised around mental health issues, or responses to ongoing natural disasters). If so, then discuss options with an Informatics ethics committee member.
- Consider whether it is feasible to use an opt-out method, or to accept a ‘yes’ tweet or email, if obtaining written informed consent in the standard way is not feasible.
- If you are reproducing a post or tweet, it is particularly likely that you should obtain consent (see below), so discuss this with an Informatics ethics committee member.
Contact the Informatics ethics committee
- If possible, when presenting, publishing or sharing, use only aggregated data, i.e. no quoted posts or tweets (and no user IDs).
- If you must use quoted posts or tweets, are there ways you could anonymise the data, or make it harder for them to be de-anonymised? For example:
- Don’t collect or provide usernames alongside posts.
- However, this may not be sufficient, both because of Terms of Service requirements (e.g. Twitter specifically says you must provide names/handles in published quoted tweets) and because quoted text is easily searchable online.
- Don’t include tweets/posts that could be considered sensitive or personal.
- Take particular care if any photos, imagery, or voice recordings are involved.
- Pay careful attention to terms of use. Please read and ensure that you understand and agree to follow any conditions placed on use of data.
- For example, see the note above about Twitter user IDs.
- The Twitter User Development Policy also says “…scraping the Services without the prior consent of Twitter is expressly prohibited”.
Twitter display requirements for tweets
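As one illustration of the advice to publish only aggregated data, the following Python sketch reduces collected posts to per-keyword counts, so that no user IDs or post text appear in the output, and suppresses counts below a threshold so that very small groups are not reported. The post format (dicts with a `text` field), the function name `aggregate_posts`, the substring keyword matching, and the default threshold of 5 are all hypothetical assumptions for illustration, not an Informatics or platform requirement. Aggregation alone does not guarantee anonymity, so the points above still apply.

```python
from collections import Counter


def aggregate_posts(posts, keywords, min_count=5):
    """Count posts mentioning each keyword, returning no user IDs or post text.

    Hypothetical sketch: `posts` is assumed to be an iterable of dicts with a
    "text" field. Any other fields (e.g. "user_id") are never read or stored.
    Counts below `min_count` (an assumed threshold) are suppressed.
    """
    counts = Counter()
    for post in posts:
        text = post["text"].lower()
        for keyword in keywords:
            # Case-insensitive substring match; each post counts once per keyword.
            if keyword.lower() in text:
                counts[keyword] += 1
    # Only keywords and totals leave this function; small counts are dropped.
    return {kw: n for kw, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    posts = [{"user_id": f"u{i}", "text": "Flooding near me"} for i in range(6)]
    posts += [{"user_id": "u7", "text": "Storm coming"},
              {"user_id": "u8", "text": "storm again"}]
    print(aggregate_posts(posts, ["flood", "storm"]))  # {'flood': 6}
```

Here the two "storm" posts fall below the assumed threshold and are left out of the published figures entirely, rather than being reported as a count that could point to identifiable individuals.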
Additional guidance and resources
For additional guidance about using social media data, please have a look at the following papers, contact the Informatics ethics committee, or talk to your supervisor.
- Markham and Buchanan (2012) Ethical Decision-Making and Internet Research
- Social Media Research: A Guide to Ethics (University of Aberdeen)
- D’Arcy & Young (2012) Ethics and social media: Implications for sociolinguistics in the networked public
- Fiesler and Proferes (2018) Participant Perceptions of Twitter Research Ethics
- Zimmer (2010) But the data is already public: on the ethics of research in Facebook
- But it’s already public, right? The ethics of using online data. This article includes a flow chart to advise data journalists on whether or not to publish tweets.