Big Data Text Analysis


Topic Modelling with Mozdeh

Topic modelling is an algorithmic approach for extracting topics from a set of documents. Mozdeh does not include topic modelling algorithms, but code and instructions are given below for topic modelling with Mozdeh data (tweets or YouTube comments).

The picture below is the output of a topic model of a set of news-related tweets from one day. Some of the topics relate to individual news stories, but others are quite vague. This seems typical for tweets.


You will need to download and install the statistical software R, then download a program to process your data, and modify one or more lines of this program so that it can find your Mozdeh data. The program will generate a topic model for a Twitter or YouTube project, attempting to find the main topics discussed by the people who posted the texts.
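To give a feel for what such a program does, here is a minimal sketch of fitting a topic model in R. It is not the Mozdeh script: it assumes the CRAN packages tm and topicmodels, uses Latent Dirichlet Allocation (one common topic modelling method, which may or may not be what the Mozdeh script uses), and runs on a few made-up texts rather than real Mozdeh data.

```r
# Assumes the CRAN packages 'tm' and 'topicmodels' are installed, e.g. with
# install.packages(c("tm", "topicmodels")).
library(tm)
library(topicmodels)

# Made-up example texts standing in for tweets or YouTube comments.
texts <- c(
  "The election results were announced tonight after a long count",
  "Voters queued for hours to cast ballots in the election",
  "The candidate thanked voters after the election results",
  "The football team won the cup final with a late goal",
  "Fans celebrated the goal that won the football final",
  "The team lifted the cup in front of cheering fans"
)

# Build a document-term matrix: one row per text, one column per word.
corpus <- VCorpus(VectorSource(texts))
dtm <- DocumentTermMatrix(
  corpus,
  control = list(tolower = TRUE, removePunctuation = TRUE, stopwords = TRUE)
)

# Fit a model with k topics; k = 2 is an arbitrary choice for this toy data.
model <- LDA(dtm, k = 2, control = list(seed = 1))

top_terms <- terms(model, 5)    # five highest-probability words per topic
doc_topics <- topics(model)     # most likely topic for each text
print(top_terms)
print(doc_topics)
```

Each column of top_terms lists the words most associated with one topic; it is up to the reader to decide what, if anything, each topic means. With short texts such as tweets, some topics are likely to be hard to interpret.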


These instructions may take about a day to follow if you are not familiar with R, or about an hour if you are.

Download and install R Studio (free) if you don't already have it. If you are not familiar with R Studio, you will need to find an online tutorial to learn the basics.

Download the Mozdeh topic modelling software for Twitter, YouTube or plain text (whichever matches your data) and load it in R Studio (File > Open > R Script).

Near the top of the script you will need to edit the folder names so that they point to the relevant "raw data" folders on your computer. These will be sub-sub-sub-folders of the folder from which you started Mozdeh.
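Before running the script, it can help to check that a folder path you have typed actually exists. The following is a minimal sketch using only base R; the function name check_raw_data_folder and the demonstration folder are hypothetical and are not part of the Mozdeh script. Note that in R strings a Windows path needs either forward slashes or doubled backslashes.

```r
# Hypothetical helper: confirm that a folder path typed into the script
# exists, and list the files in it, before running any analysis.
check_raw_data_folder <- function(folder) {
  # normalizePath() tidies the path; on Windows, winslash = "/" makes it
  # use forward slashes, which need no escaping in R strings.
  folder <- normalizePath(folder, winslash = "/", mustWork = FALSE)
  if (!dir.exists(folder)) {
    stop("Folder not found: ", folder)
  }
  list.files(folder)
}

# Demonstration with a temporary stand-in folder (not a real Mozdeh project).
demo_folder <- file.path(tempdir(), "demo_project", "raw data")
dir.create(demo_folder, recursive = TRUE, showWarnings = FALSE)
writeLines("example text", file.path(demo_folder, "example.txt"))

found_files <- check_raw_data_folder(demo_folder)
print(found_files)
```

To use it on your own data, replace demo_folder with the path to your own "raw data" folder. If the function stops with "Folder not found", the path in the script needs correcting.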

Then run the code and look for the output in the raw data folder that you specified above.

EXTRA HELP: More detailed notes on topic modelling with R from one of my classes.



Made by the Statistical Cybermetrics Research Group at the University of Wolverhampton during the CREEN and CyberEmotions EU projects.