Mozdeh

Big Data Text Analysis


Advanced Twitter Data Gathering

  1. Creating queries. Use http://search.twitter.com to construct a set of queries that match the topic, ensuring that the queries have high precision (i.e., give few false matches). This is far more important than ensuring that the queries are comprehensive (i.e., have high recall).
  2. Creating query URLs. Convert the queries into Twitter query URLs and save them, one per line, in a plain text file. Each URL should start with https://api.twitter.com/1.1/search/tweets.json?q=REPLACEME&count=100, where REPLACEME should be replaced by the query. Spaces should be converted to + symbols, and any characters other than standard ASCII letters and numerals should be URL encoded. For exact phrase searches, enclose the phrase in straight quotes (e.g., "Justin Bieber"), making sure that the quotes are not smart 66 or 99 style quotes.
    1. Optionally, the queries can specify that the tweets must be in English by adding &lang=en to the end of each URL. For other languages, replace en with the ISO 639-1 language code from http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes.
    2. Optionally, the queries can specify a geographic location in the form of a circle from which the tweets must originate. The circle is given as the latitude and longitude of its centre, followed by its radius. For example, adding &geocode=54,-3,500km to the end of the URL (latitude,longitude,radius) restricts the results to tweets from a 500km radius circle approximately surrounding the UK and Ireland. See the geocode section of https://dev.twitter.com/docs/api/1/get/search for a little more information. Google Maps is a good source of longitudes and latitudes for places on the earth, or see this list of longitudes and latitudes for countries.
      1. If a geocode is added then the query can be left blank to collect all tweets from the location, irrespective of their content. To do this, remove the REPLACEME text from the URL above and add the geocode information at the end of the URL.
    3. See a list of additional useful operators, such as from:user to get all tweets from a user.
    4. Here are some examples of valid URLs.
      1. https://api.twitter.com/1.1/search/tweets.json?q=bieber&count=100 is a query for tweets containing the keyword bieber.
      2. https://api.twitter.com/1.1/search/tweets.json?q=justin+bieber&count=100 is a query for tweets containing both of the keywords justin and bieber.
      3. https://api.twitter.com/1.1/search/tweets.json?q="justin+bieber"&count=100 is a query for tweets containing the exact phrase justin bieber.
      4. https://api.twitter.com/1.1/search/tweets.json?q=bieber&count=100&lang=en is a query for English tweets containing the keyword bieber.
      5. https://api.twitter.com/1.1/search/tweets.json?q=bieber&count=100&geocode=54,-3,500km is a query for tweets from the UK and Ireland containing the keyword bieber.
      6. https://api.twitter.com/1.1/search/tweets.json?q=bieber&lang=en&count=100&geocode=54,-3,500km is a query for tweets in English and from the UK and Ireland (more precisely, from a 500km radius circle centred on latitude 54 and longitude -3) containing the keyword bieber.
    5. Here is an example of a valid text file of URLs.
    6. For advanced users, more information about query URL construction is available at https://dev.twitter.com/docs/api/1/get/search.
  3. Running the searches. Follow the instructions here, except that, in the data collection wizard, you should check the option to run searches from a file.
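
The URL construction rules in step 2 can be sketched in Python as below. This is only an illustration, not part of Mozdeh: the function names build_query_url and urls_to_text are hypothetical. It relies on the standard library function urllib.parse.quote_plus, which converts spaces to + and percent-encodes other non-alphanumeric characters. Note one difference from the hand-written examples above: straight double quotes come out as %22 (their URL-encoded form) rather than as literal quote characters, and the colon in operators such as from:user comes out as %3A.

```python
from urllib.parse import quote_plus

# Base URL taken from step 2 of the instructions above.
BASE_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_query_url(query="", lang=None, geocode=None, count=100):
    """Build one query URL (hypothetical helper, not part of Mozdeh).

    query   -- search text; may be empty if a geocode is given
    lang    -- optional ISO 639-1 code, e.g. "en"
    geocode -- optional (latitude, longitude, radius) tuple,
               e.g. (54, -3, "500km")
    """
    # quote_plus turns spaces into + and URL-encodes everything that is
    # not a letter, digit or one of "_.-~".
    url = f"{BASE_URL}?q={quote_plus(query)}&count={count}"
    if lang:
        url += f"&lang={lang}"
    if geocode:
        latitude, longitude, radius = geocode
        url += f"&geocode={latitude},{longitude},{radius}"
    return url


def urls_to_text(urls):
    """Join URLs one per line, as the plain text file in step 2 requires."""
    return "\n".join(urls) + "\n"
```

For example, urls_to_text([build_query_url("justin bieber"), build_query_url("bieber", lang="en", geocode=(54, -3, "500km"))]) gives the contents of a two-line file that could be saved as plain text and then selected in the data collection wizard.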
Made by the Statistical Cybermetrics Research Group at the University of Wolverhampton during the CREEN and CyberEmotions EU projects.