Mozdeh

Big Data Text Analysis


Word Frequency Analyses

There are two types of word frequency analysis: individual query and whole project. For both, first download and install Mozdeh, then collect posts from Twitter, YouTube or elsewhere as normal.

Relative word frequency analyses for individual queries

It can be useful to identify the words that are unusually common in posts that match a particular query because they may point to important aspects of the topic discussed. This can be achieved in the following steps.

  1. [Optional] Click the List by project chi square option (1) to ensure that the terms found are listed in order of decreasing importance to the query, as reflected by a chi square calculation.
  2. Enter the query in the normal search box (2).
  3. Click the Boolean Search button (3) to run a query in Mozdeh as usual.
  4. Click the Calculate Word Frequencies for Search Matches button (4).
  5. Examine the text box (5) to see a list of words that are unusually common in texts matching the query compared to texts not matching the query (as measured by the chi square statistic), listed in descending order.


The chi square value indicates how statistically significant the difference is between the frequency of a term in the topic-specific collection of texts and its frequency in all texts collected, with higher values indicating more significant results. The results can be copied to a spreadsheet or word processor for analysis. To do this, right click in the results box at the top right of the screen, click Select All from the menu that appears, then click Copy from the menu and paste the results into your spreadsheet or word processor document.
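The idea behind this ranking can be illustrated with a short Python sketch. It is not Mozdeh's own code: the exact formula Mozdeh uses is not stated here, so the sketch assumes the standard chi square statistic for a 2x2 table (texts matching or not matching the query, containing or not containing the word). The function names `chi_square_2x2` and `rank_words` are hypothetical.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Standard chi square for a 2x2 table (assumed, not Mozdeh's confirmed formula).

    a: matching texts containing the word
    b: matching texts without the word
    c: non-matching texts containing the word
    d: non-matching texts without the word
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def rank_words(matching, non_matching):
    """Rank words that are proportionally more common in the matching texts.

    Both arguments are lists of strings. Each word is counted once per text,
    using naive whitespace tokenisation (an assumption for illustration).
    """
    match_df = Counter(w for t in matching for w in set(t.lower().split()))
    other_df = Counter(w for t in non_matching for w in set(t.lower().split()))
    results = []
    for word, a in match_df.items():
        c = other_df.get(word, 0)
        b = len(matching) - a
        d = len(non_matching) - c
        # Keep only words over-represented in the matching texts:
        # a / (a + b) > c / (c + d), written without division.
        if a * (c + d) > c * (a + b):
            results.append((word, chi_square_2x2(a, b, c, d)))
    # Highest chi square first, as in the descending list shown by Mozdeh.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


matching = ["vaccine safe", "vaccine trial", "vaccine news"]
non_matching = ["football news", "weather news", "music today"]
ranked = rank_words(matching, non_matching)
for word, score in ranked:
    print(f"{word}\t{score:.2f}")
```

In this toy data, "vaccine" occurs in every matching text and in no other text, so it receives the highest value, while "news" is dropped because it is proportionally more common outside the query matches.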

[Advanced] Relative word frequency analyses for an entire topic in Mozdeh

Once texts have been gathered for a topic, it can be useful to identify the words that are unusually common in the entire project. This can be achieved in two steps.

First, you must create and load a generic list of word frequencies that the topic words can be compared against. This should be a plain text file with a list of words and their frequencies from a common, generic collection of texts. For example, the following file of word frequencies for a large collection of UK and Ireland tweets could be used (unzip first). To register the file, click the Load Word Frequency List as Reference Set for Above Results button on the bottom right of the screen (see 1 below).

Second, clear the search box (2), check the List by Z score or chi square option (3) and click the Calculate Word Frequencies for all Search Matches (slow) button (4). The results will appear in the text box above it (5).


The z score indicates how statistically significant the difference is between the frequency of a term in the topic-specific collection of texts and its frequency in the general collection, with higher values indicating more significant results. The results can be copied to a spreadsheet or word processor for analysis. To do this, right click in the results box at the top right of the screen, click Select All from the menu that appears, then click Copy from the menu and paste the results into your spreadsheet or word processor document.
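As a rough illustration of how such a z score can be computed, the Python sketch below compares a word's proportion in the topic collection with its proportion in the reference collection. It assumes the standard pooled two-proportion z test. Mozdeh's exact calculation is not described here and may differ. The names `z_score` and `rank_by_z` and the example counts are hypothetical.

```python
import math


def z_score(x1, n1, x2, n2):
    """Pooled two-proportion z test (assumed, not Mozdeh's confirmed formula).

    x1, n1: occurrences of the word and total size for the topic collection
    x2, n2: occurrences of the word and total size for the reference collection
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0  # word absent from, or present in, everything: no difference
    return (p1 - p2) / se


def rank_by_z(topic_counts, topic_total, ref_counts, ref_total):
    """Return (word, z) pairs, highest z first."""
    scores = {
        word: z_score(count, topic_total, ref_counts.get(word, 0), ref_total)
        for word, count in topic_counts.items()
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Made-up counts purely for illustration.
topic_counts = {"vaccine": 60, "the": 50}
ref_counts = {"vaccine": 40, "the": 50}
ranked_z = rank_by_z(topic_counts, 100, ref_counts, 100)
for word, z in ranked_z:
    print(f"{word}\t{z:.3f}")
```

A word with the same proportion in both collections scores zero, and a word that is relatively more frequent in the topic collection scores positive, which is why the highest values point to topic-specific terms.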

 

Made by the Statistical Cybermetrics Research Group at the University of Wolverhampton during the CREEN and CyberEmotions EU projects.