An exploratory study of COVID-19 misinformation on Twitter
Data Collection
In this study, researchers collected two datasets. The first dataset consists of false/partially false tweets from fact-checking websites. The second dataset consists of a random sample of COVID-19 related tweets.
To obtain the first dataset, they first collected 7623 fact-checked news articles from Snopes and Poynter. They then used Beautiful Soup to crawl the articles and scan their anchor tags for tweet IDs, collecting 3053 tweet IDs. Tweepy was then used to fetch the corresponding tweets.
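The anchor-tag step can be illustrated with a minimal sketch. It is not the researchers' code: it swaps Beautiful Soup for Python's standard-library html.parser so that it runs without extra packages, and the URL pattern, the names TweetLinkParser and extract_tweet_ids, and the sample HTML are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser

# Assumed pattern for tweet links: https://twitter.com/<user>/status/<digits>
TWEET_URL = re.compile(
    r"https?://(?:www\.|mobile\.)?twitter\.com/[^/]+/status(?:es)?/(\d+)"
)


class TweetLinkParser(HTMLParser):
    """Collects tweet IDs found in the href attribute of <a> tags."""

    def __init__(self):
        super().__init__()
        self.tweet_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                match = TWEET_URL.match(value)
                # Keep first-seen order and skip IDs already collected.
                if match and match.group(1) not in self.tweet_ids:
                    self.tweet_ids.append(match.group(1))


def extract_tweet_ids(html):
    """Return the unique tweet IDs linked from one article's HTML."""
    parser = TweetLinkParser()
    parser.feed(html)
    return parser.tweet_ids


# Hypothetical article fragment, not taken from Snopes or Poynter.
sample_html = """
<p>The claim appeared <a href="https://twitter.com/someuser/status/1234567890123456789">here</a>
and was discussed <a href="https://example.com/about">elsewhere</a>.
<a href="https://twitter.com/other/status/42?s=20">Another tweet</a></p>
"""

print(extract_tweet_ids(sample_html))
```

The resulting IDs would then be passed to a client such as Tweepy to retrieve the full tweets; that step needs API credentials and is left out of the sketch.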
To obtain the second dataset, they randomly extracted 1000 COVID-19-related tweets per day between January and July through the Twitter API and Twitter4J, obtaining 92095 tweets from the Twitter API and 71000 tweets from Twitter4J.
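A per-day random sample of this kind could look roughly like the sketch below. The summary does not say how the sampling was implemented, so the grouping by a "date" field, the function name sample_per_day, the fixed seed, and the toy records are all assumptions.

```python
import random
from collections import defaultdict


def sample_per_day(tweets, per_day=1000, seed=0):
    """Randomly pick up to `per_day` tweets for each calendar day.

    `tweets` is assumed to be an iterable of dicts with a "date" key;
    this record layout is hypothetical.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_day = defaultdict(list)
    for tweet in tweets:
        by_day[tweet["date"]].append(tweet)

    sampled = {}
    for day in sorted(by_day):
        pool = by_day[day]
        # A day with fewer tweets than the quota contributes all of them.
        sampled[day] = rng.sample(pool, min(per_day, len(pool)))
    return sampled


# Toy data: five tweets on "day-1" and three on "day-2".
toy = [{"id": i, "date": "day-1" if i < 5 else "day-2"} for i in range(8)]
result = sample_per_day(toy, per_day=3)
for day, picked in result.items():
    print(day, len(picked))
```

Capping the sample at the size of the daily pool is one way a collection could end up with fewer than the nominal quota on some days.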
Tweets were categorized as "false," "partially false," "true," or "other" based on the corresponding fact-checks.
Tags
CSCW (Computer-supported cooperative work)
Computing Sciences