Mozdeh

Big Data Text Analysis


Twitter Thick Description Analysis Steps with Mozdeh

A thick description is a method for creating a detailed description of a topic, here as captured by Twitter searches. This page is based on chapter 8 of the following free book.

Thelwall, M. (2013). Webometrics and social web research methods [free in-progress draft copy]. University of Wolverhampton.

Content analysis in Mozdeh

Mozdeh can save a random sample of tweets: click the Save tab (0 below), clear the search text box (1), check the Save Random Matching Tweets to Text File option (2) and click the Boolean Search button (3).

When asked where to save the results, note the location you choose so that you can find the file later. Hint: if you cannot find it, select View all Reports Created So Far from the Analyse menu.
To run a content analysis, load the new file into a spreadsheet. The easiest way to do this is to open a spreadsheet program, then open the text file, select and copy all the text in the text file and paste it into the spreadsheet.
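If you prefer to tally content analysis codes in a script rather than in a spreadsheet, the Python sketch below shows one way to do it. It assumes the saved file is tab-separated with a header row, and that you have added a column of codes (called `Code` here, a hypothetical name) while coding the tweets. Check your own file, because Mozdeh's exact output layout is not described on this page.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Optional


def read_tweet_rows(lines: Iterable[str]) -> list[dict[str, str]]:
    """Read tab-separated lines (header row first) into one dict per tweet.

    Assumes a tab-separated file with a header row; check your own export.
    QUOTE_NONE keeps quotation marks inside tweets as ordinary characters.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


def code_percentages(codes: Iterable[Optional[str]]) -> dict[str, float]:
    """Percentage of coded tweets in each category, most common first.

    Blank or missing codes (tweets not yet coded) are ignored.
    """
    cleaned = ((c or "").strip() for c in codes)
    counts = Counter(c for c in cleaned if c)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: 100 * n / total for code, n in counts.most_common()}


if __name__ == "__main__":
    # Hypothetical sample with an analyst-added "Code" column.
    sample = io.StringIO("Tweet\tCode\nRiot in town\tnews\nlol\tjoke\nPolice out\tnews\n")
    rows = read_tweet_rows(sample)
    print(code_percentages(row["Code"] for row in rows))
```

In practice you would pass an open file object (opened with `newline=""`) to `read_tweet_rows` instead of the in-memory sample.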

Relative word frequency analyses for an entire topic in Mozdeh

Once tweets have been gathered for a topic, it can be useful to identify the words that are unusually common in these tweets, because they may point to important aspects of the topic. This can be achieved in two steps. First, register a generic list of word frequencies that the topic words can be compared against. This should be a plain text file listing words and their frequencies in a large, generic collection of tweets. For example, the following file of word frequencies for a large collection of UK and Ireland tweets could be used (unzip it first). To register the file, click the Load Word Frequency List as Reference Set for Above Results button at the bottom right of the screen (see 1 below). Second, clear the search box (2), check the List by Z score option (3) and click the Calculate Word Frequencies for all Search Matches (slow) button (4); the results will appear in the text box above it (5).
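For readers who want to see what a word frequency list involves, here is a minimal Python sketch. It assumes a reference file with one `word count` pair per line, separated by whitespace (a hypothetical layout; check the file you download), and it counts each word at most once per tweet. Mozdeh's own tokenisation and counting rules may differ, so treat `load_frequency_list`, `topic_frequencies` and the `TOKEN` pattern as illustrative names only.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical tokeniser: an optional # or @ followed by word characters or apostrophes.
TOKEN = re.compile(r"[#@]?[\w']+")


def load_frequency_list(lines: Iterable[str]) -> Counter:
    """Parse 'word<whitespace>count' lines into a Counter, skipping malformed lines."""
    freqs: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # blank line or unexpected layout
        word, count = parts
        if count.isdigit():
            freqs[word.lower()] += int(count)
    return freqs


def topic_frequencies(tweets: Iterable[str]) -> Counter:
    """Count how many tweets contain each word (each word counted once per tweet)."""
    freqs: Counter = Counter()
    for tweet in tweets:
        freqs.update(set(TOKEN.findall(tweet.lower())))
    return freqs
```

Counting tweets containing a word, rather than every occurrence, stops a single repetitive tweet from inflating a word's frequency.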


The z score indicates how statistically significant the difference is between the frequency of each term in the topic-specific collection of tweets and its frequency in the general reference collection, with higher values indicating more significant results. The results can be copied to a spreadsheet or word processor for analysis. To do this, right-click in the results box at the top right of the screen and choose Select All from the menu that appears, then right-click again and choose Copy, and paste the results into your spreadsheet or word processor document.
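As an illustration of the kind of statistic involved, the sketch below applies a standard two-proportion z-test to one word's rate in the topic tweets and in the reference collection, then ranks words by z. This is an assumption for illustration only: Mozdeh's exact formula is not given on this page and may differ, and the function names are hypothetical.

```python
import math


def proportion_z(topic_count: int, topic_total: int,
                 ref_count: int, ref_total: int) -> float:
    """Two-proportion z-test of a word's rate in the topic vs the reference set.

    Positive z means the word is relatively more common in the topic tweets.
    """
    if topic_total <= 0 or ref_total <= 0:
        raise ValueError("totals must be positive")
    p_topic = topic_count / topic_total
    p_ref = ref_count / ref_total
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (topic_count + ref_count) / (topic_total + ref_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / topic_total + 1 / ref_total))
    if se == 0:
        return 0.0
    return (p_topic - p_ref) / se


def rank_by_z(topic: dict[str, int], topic_total: int,
              ref: dict[str, int], ref_total: int) -> list[tuple[str, float]]:
    """Return (word, z) pairs sorted from most to least over-represented."""
    scores = [(w, proportion_z(c, topic_total, ref.get(w, 0), ref_total))
              for w, c in topic.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

For example, a word in 30 of 100 topic tweets but only 10 of 100 reference tweets gets z of about 3.54, well above the conventional 1.96 threshold for 5% significance.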

Relative word frequency analyses for queries within a topic in Mozdeh

Once tweets have been gathered for a topic, it can be useful to identify the words that are unusually common in tweets matching a particular query, because they may point to important aspects of the topic. This can be achieved in four steps: click the List by project chi square option (1), enter the query (2), click the Boolean Search button (3) and click the Calculate Word Frequencies for Search Matches button (4). Words that are unusually common in tweets matching the query, compared to tweets not matching it (as measured by the chi square statistic), will be listed in descending order in the box above (5).
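The comparison between matching and non-matching tweets can be sketched as a 2x2 chi square test for each word (tweets matching or not matching the query, by tweets containing or not containing the word). The sketch assumes a single-word query matched case-insensitively, which is a simplification of Mozdeh's Boolean search, and for illustration it skips the query word itself and any word that is not relatively more common in matching tweets. Mozdeh's own calculation may differ in detail, and all names here are hypothetical.

```python
import re
from typing import Iterable

# Hypothetical tokeniser: an optional # or @ followed by word characters or apostrophes.
TOKEN = re.compile(r"[#@]?[\w']+")


def words_in(text: str) -> set[str]:
    """Distinct lower-cased words in a tweet."""
    return set(TOKEN.findall(text.lower()))


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def rank_query_words(tweets: Iterable[str], query: str) -> list[tuple[str, float]]:
    """Rank words over-represented in tweets containing the one-word query."""
    docs = [words_in(t) for t in tweets]
    q = query.lower()
    matching = [d for d in docs if q in d]
    other = [d for d in docs if q not in d]
    vocab = set().union(*matching) if matching else set()
    results = []
    for word in vocab - {q}:
        a = sum(word in d for d in matching)  # matching tweets with the word
        c = sum(word in d for d in other)     # non-matching tweets with the word
        b, dd = len(matching) - a, len(other) - c
        # Keep only words relatively more common in the matching tweets.
        if other and a / len(matching) <= c / len(other):
            continue
        results.append((word, chi_square_2x2(a, b, c, dd)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```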


The chi square value indicates how statistically significant the difference is between the frequency of each term in the tweets matching the query and its frequency in the remaining tweets in the collection, with higher values indicating more significant results. The results can be copied to a spreadsheet or word processor in the same way as for the whole-topic analysis above.

Creating and saving a time series of tweets in the collection

A time series of the number of tweets in the collection posted in each hour can show how interest changes over time. To create such a graph, select Graph Time Series from the Analyse menu, click in the box in the top left-hand corner (1) and delete any text in it so that it is blank, then click Create Graph with Boolean Search (2). To print the graph, click Show Graph Formatting Options (3) and click Print Graph (4). Before printing, open the printer options in the print dialog box and change the paper layout to Landscape; otherwise part of the graph will be missing. To return to the main search screen, select Search from the Analyse menu.
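The hourly counting behind such a graph can be sketched in Python as follows, assuming you already have the tweet timestamps as `datetime` values in a single time zone. Hours with no tweets are included as zeros so that gaps in activity remain visible; `hourly_counts` is a hypothetical name, not part of Mozdeh.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(timestamps: list[datetime]) -> list[tuple[datetime, int]]:
    """Count tweets per clock hour, including hours with zero tweets."""
    if not timestamps:
        return []
    # Truncate each timestamp to the start of its hour.
    bins = Counter(t.replace(minute=0, second=0, microsecond=0) for t in timestamps)
    series = []
    hour, end = min(bins), max(bins)
    while hour <= end:
        series.append((hour, bins.get(hour, 0)))
        hour += timedelta(hours=1)
    return series
```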


Creating and saving a time series of tweets in the collection that match a query

A time series of how often tweets in the collection match a particular query (e.g., riot) can show how interest related to that query changes over time. To create such a graph, select Graph Time Series from the Analyse menu, click in the box in the top left-hand corner (1 above) and enter your query, then click Create Graph with Boolean Search (2). To print the graph, click Show Graph Formatting Options (3) and click Print Graph (4). Before printing, open the printer options in the print dialog box and change the paper layout to Landscape; otherwise part of the graph will be missing.
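A query time series can be sketched by counting matching tweets per hour. Because raw counts also rise and fall with the overall volume of tweets, the sketch also reports the share of each hour's tweets that match, which can help separate interest in the query from general activity. It assumes a single-word query matched case-insensitively (a simplification of Boolean search) and timestamps in one time zone; all names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical tokeniser: an optional # or @ followed by word characters or apostrophes.
TOKEN = re.compile(r"[#@]?[\w']+")


def hourly_query_series(tweets: list[tuple[datetime, str]], term: str
                        ) -> list[tuple[datetime, int, float]]:
    """Per hour: (hour, matching tweets, share of that hour's tweets matching)."""
    term = term.lower()
    total: Counter = Counter()
    hits: Counter = Counter()
    for when, text in tweets:
        hour = when.replace(minute=0, second=0, microsecond=0)
        total[hour] += 1
        if term in TOKEN.findall(text.lower()):
            hits[hour] += 1
    if not total:
        return []
    series = []
    hour, end = min(total), max(total)
    while hour <= end:
        n = total[hour]  # Counter returns 0 for hours with no tweets
        series.append((hour, hits[hour], hits[hour] / n if n else 0.0))
        hour += timedelta(hours=1)
    return series
```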

Time series scanning

This is only suitable if you have collected at least 50,000 tweets over at least 10 days. See the instructions at the bottom of this page.

 

Made by the Statistical Cybermetrics Research Group at the University of Wolverhampton during the CREEN and CyberEmotions EU projects.