Mozdeh

Big Data Text Analysis


Downloading Tweets Using Keyword Searches

Before starting with Mozdeh, you will need a set of relevant Twitter queries. The process to build a useful set of queries is:

  1. Brainstorm a set of potentially relevant queries: Create a list of queries that you believe should match relevant tweets. The queries can be keywords or phrases that describe your topic and, as far as possible, do not also describe irrelevant topics. Enclose phrases in quotes and save the queries in a plain text file. It is particularly important to use a plain text file (e.g., in Windows Notepad) because a word processor will change straight quotes to smart quotes, which Twitter Search will not recognise.
  2. Check each query for irrelevant content: remove the query if almost all of its matches are irrelevant, or refine it if only some are. Test each query by submitting it to Twitter Search (https://twitter.com/search-home) and checking the results for false matches. Queries can be modified by adding extra terms specific to the topic or by subtracting terms that match irrelevant topics. For example, if the initial search was tiger for the big cat then the refinement might be tiger cat or even tiger -woods, both of which should return a higher percentage of relevant tweets than the initial query.
  3. Brainstorm for additional queries to add to the set.
  4. Repeat from step 2 until the set of queries seems to be stable and satisfactory.
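
The plain-text requirement in step 1 can also be checked automatically. The following is a minimal Python sketch, not part of Mozdeh: the function names normalize_query and clean_query_lines are hypothetical, and the sketch assumes one query per line. It replaces smart quotes with straight quotes and turns a dash placed directly in front of a term into the plain hyphen used for subtraction.

```python
import re

# Typographic characters that word processors substitute for plain ones.
SMART_TO_STRAIGHT = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def normalize_query(query):
    """Return the query with smart quotes and dash-style minus signs made plain."""
    for smart, straight in SMART_TO_STRAIGHT.items():
        query = query.replace(smart, straight)
    # An en or em dash directly in front of a term was almost certainly meant
    # as the subtraction operator, so swap it for a plain hyphen.
    query = re.sub(r"(^|\s)[\u2013\u2014](?=\S)", r"\1-", query)
    return query.strip()


def clean_query_lines(lines):
    """Normalize each non-empty line and report which input lines were changed."""
    cleaned, changed = [], []
    for number, line in enumerate(lines, start=1):
        original = line.strip()
        if not original:
            continue  # skip blank lines in the query file
        fixed = normalize_query(original)
        cleaned.append(fixed)
        if fixed != original:
            changed.append(number)
    return cleaned, changed


if __name__ == "__main__":
    # Hypothetical queries, written the way a word processor might save them.
    sample = ["\u201cbig cat\u201d tiger", "tiger \u2013woods", "tiger cat"]
    queries, changed_lines = clean_query_lines(sample)
    print(queries)
    print("Lines that needed fixing:", changed_lines)
```

To check a saved query file, pass the lines of the file (opened with UTF-8 encoding) to clean_query_lines and review any line numbers it reports before pasting the queries into Twitter Search or Mozdeh.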

Once this is complete, the queries can be used to gather the data for analysis, as described below.

Try the list of search operators here too, and beware that web search results may not work in the same way as Mozdeh search results. For example, searching online for a username returns some of that user's tweets, but this seems not to happen in Mozdeh, where the from:user command is needed instead.

Gather matching tweets with Mozdeh

Now Mozdeh can be used with your queries. First, Mozdeh must be installed on the computer that will run the analysis. To do this, download the appropriate version from here. Start Mozdeh and follow the instructions about selecting a folder in which to store your data. Mozdeh will ask to create a folder called rss_data on your computer. This folder will be initially empty but you will eventually populate this with new subfolders, one for each Twitter project.

Mozdeh can help with piloting the original queries. Start Mozdeh, enter a name for the pilot test (e.g., Tiger pilot1) and click the Start New Project button.

Then enter the queries in the Data Collection screen. Make sure that queries matching too many irrelevant tweets are rejected, and that queries matching some irrelevant tweets are refined to eliminate most irrelevant tweets. For example, a study to investigate how the words kiss and hug are used in Twitter to express affection started with these two terms as queries (below left). After pilot testing the queries, terms were subtracted from them to remove lots of unwanted matches (below right). For example, subtracting ass removed many instances of the phrase "kiss my ass", removing 104.1 excluded references to the Kiss 104.1 radio station and subtracting bora removed lots of spam related to a TV meme at the time of checking.

Original queries -> Revised queries
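
The subtraction step can be sketched in a few lines of Python. This is an illustration only: subtract_terms is a hypothetical helper, not a Mozdeh function, and it assumes that all three subtracted terms mentioned above were applied to the kiss query, which may not match the revised queries actually used in the study.

```python
def subtract_terms(query, unwanted):
    """Append each unwanted single-word term to the query as a -term exclusion."""
    parts = [query.strip()]
    for term in unwanted:
        term = term.strip()
        if term:  # ignore empty entries
            parts.append("-" + term)
    return " ".join(parts)


if __name__ == "__main__":
    # Assumed for illustration: every exclusion is attached to the kiss query.
    print(subtract_terms("kiss", ["ass", "104.1", "bora"]))
    # With no unwanted terms the query is returned unchanged.
    print(subtract_terms("hug", []))
```

Keeping the unwanted terms in a list next to each query makes it easy to record why each term was subtracted and to rebuild the query if the pilot test is repeated.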

Once the queries are entered, click the Search Twitter Once button. Or, if this is a full-scale study, click the Search Twitter Continually button and then click again to stop it at the end of your data collection period.

You will be taken to a web page asking you to log on to Twitter. This will give you a PIN to enter into Mozdeh, which gives Mozdeh permission to search on your behalf (but not permission to tamper with your account, so this is safe).

When it has finished, it will ask a series of questions – please click OK or give the suggested answer to these questions and then you should get the main search screen. Now follow from the second or third step in the instructions in order to analyse the tweets.

Made by the Statistical Cybermetrics Research Group at the University of Wolverhampton during the CREEN and CyberEmotions EU projects.