Mozdeh

Big Data Text Analysis


Academic Research Twitter on Mozdeh

The Download page provides the standard version of Mozdeh (standard Twitter, YouTube, etc.). This page is about the special Twitter Academic Research programme version.

In January 2021, Twitter announced that it would give academic researchers and students access to the full archive of tweets for research purposes. Whilst the standard free search only returns tweets that are at most 7 days old (as in the standard version of Mozdeh), members of the Twitter Academic Research programme can access all tweets, irrespective of date. The following steps are needed to take advantage of this. This is an experimental version of Mozdeh - please report any bugs to @MikeThelwall or by email to m.thelwall at wlv.ac.uk, including a screenshot and a complete description of what you were doing at the time.

0) Please try out the normal Mozdeh first (from the Download page) and get used to how it works. Decide on this basis whether tweets will be useful for your project.

1) Sign up for the Twitter Academic Research programme. You will need to describe your project to Twitter, and you will need an academic email address and a Twitter account.

2) Wait until you get an email from Twitter announcing that you are part of the programme.

3) Go to the Twitter developer portal and get a bearer token for the Academic Research programme (initial developer portal page [picture] and location of the Bearer token button [picture]). You may need to specify Mozdeh as your app, with the URL http://mozdeh.wlv.ac.uk/.

4) Right-click to download Academic Research Mozdeh (24 Feb 2021 update) to a Windows PC. This is the same as the standard Mozdeh except that it collects tweets through the Academic Research track. It needs a bearer token* for the Academic Research programme from the steps above.

5) Construct a set of queries for your project. These can be simple keywords or something more complex, including country-specific and location-specific (e.g., city) searches (I am not sure how Twitter calculates these - possibly a combination of geolocated tweets and user-declared locations in user profiles; they may not get many results, perhaps <0.1% of matches). Consider adding -is:retweet to every query to avoid retweets and lang:en (or another language code) to specify the language (see the example queries below). Also decide on start and end dates for the tweets. The start date can be as far back as 26 March 2006. Twitter limits the number of tweets, so choosing too wide an interval might mean that you run out of tweets before the collection finishes. You can search old tweets at Twitter.com with the until: keyword (don't use until: in Mozdeh, because it uses a different date specification method) to check the types of tweets you will get and how many, and search Google Trends for a time series of interest in a term to get more insight into trends in popularity.
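
For illustration, here are a few example queries of the kind described above (the topic keywords and country code are invented, but -is:retweet, lang:, quoted phrases and OR are standard Twitter search operators, and place_country: is available on the Academic Research track):

vaccine hesitancy lang:en -is:retweet
"climate change" (flood OR drought) lang:en -is:retweet
vaccine place_country:GB -is:retweet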

6) Start Academic Research Mozdeh, choose a folder for your projects, enter your queries, click the search button, and enter your bearer token and the start and end dates. You might get a strange error message if something is wrong with your query - please read it carefully because it might contain a clue to what is wrong. The title bar reports the date of the oldest tweet collected so far (e.g., 1400 tweets with the oldest from 23 June 2019), so you can see its progress.

7) Wait for Academic Research Mozdeh to finish. It gets a maximum of 500 tweets per second, which is 30,000 tweets per minute or 1,800,000 tweets per hour, so you can estimate how long the collection will take if you know roughly how many tweets to expect (see the worked example below). The current maximum is 10,000,000 tweets per month, which will probably take about 27 hours to download (about 7 hours in theory, but my first test took 27 hours). The computer must be connected to the internet the whole time and must not sleep or hibernate. After data collection, the initial processing may take another 8 hours (or a week if you specify the daily time segmentation option) but does not need the internet.
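
As a rough worked example (the tweet count here is invented): if trial searches suggest about 3,600,000 matching tweets, then 3,600,000 / 1,800,000 tweets per hour gives about 2 hours at the theoretical maximum rate, and probably several times longer in practice.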

8) Answer the questions at the end about processing the data and wait for the tweets to be processed, after which Mozdeh will take you to the analysis screen. The tweets are also saved to a simple text file in the project folder in Mozdeh. The analysis options are the same as for the standard Mozdeh.

Good luck!

* If you are not sure whether your bearer token works, try running the following test command in a Windows Command Prompt after replacing BT with your bearer token (pasting can be enabled in the Command Prompt's properties dialog box, obtained by right-clicking the title bar). If you don't get an error message (e.g., "unauthorised") but instead get JSON (lots of {} brackets and some tweets) then it is working.

curl "https://api.twitter.com/2/tweets/search/all?query=Corona" -H "Authorization: Bearer BT"

Made by the Statistical Cybermetrics Research Group at the University of Wolverhampton during the CREEN and CyberEmotions EU projects.