Big Data Text Analysis

Downloading Reddit posts with Mozdeh

This page describes how to use Mozdeh to download up to 1000 posts from each of one or more subreddits.

  1. Download Mozdeh to a Windows computer. You may need to dismiss security warnings before you can save it.
  2. Start Mozdeh, enter a project name and click New Project.
  3. Select the Reddit tab in the Data Collection interface (see below). Enter a list of subreddit names, one per line, in the large text box near the top. Tick each type of content you want to collect (recent, new links and/or top links). Click Collect Reddit Posts and wait for it to finish. This may take minutes, hours, or days, depending on how many posts need to be downloaded.
  4. Once Mozdeh has finished downloading the posts, it will ask you a series of questions about processing them. Here are the recommended options: (a) Yes to filter out duplicate texts, (b) 1 for the Which language question if your comments are mainly in English, (c) OK to the Preparing project warning, and (d) OK to the indexing options dialog box (accept the defaults). Processing then takes from five minutes to several days, depending on how many comments there are.
  5. When processing finishes, Mozdeh opens a large search and filter interface that can be used to explore the comments. See this page of information about how to search or filter the comments. The posts are also saved to a plain text file inside a folder called raw_data inside the project folder.
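
Because the posts are also saved as plain text in the raw_data folder, they can be checked outside Mozdeh. The following Python sketch is an illustration only and is not part of Mozdeh: it counts the lines in each file found in raw_data. The function name summarize_raw_data is hypothetical, the file names and internal layout of the saved files are not documented on this page, and UTF-8 encoding is an assumption.

```python
from pathlib import Path


def summarize_raw_data(project_folder):
    """Return {file name: line count} for every file in <project_folder>/raw_data.

    Hypothetical helper: only the raw_data folder name comes from the
    instructions above. Nothing is assumed about the file names or format.
    """
    raw_dir = Path(project_folder) / "raw_data"
    if not raw_dir.is_dir():
        raise FileNotFoundError(f"No raw_data folder found in {project_folder}")

    counts = {}
    for path in sorted(raw_dir.iterdir()):
        if path.is_file():
            # UTF-8 is an assumption; errors="replace" keeps the count going
            # if the file contains bytes that do not decode cleanly.
            with path.open(encoding="utf-8", errors="replace") as handle:
                counts[path.name] = sum(1 for _ in handle)
    return counts
```

Calling summarize_raw_data with the path to your own project folder returns a dictionary that maps each file name to its number of lines, which gives a quick check that the download produced data. A line count is not necessarily a post count, since the layout of the saved file is not specified here.
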
Made by the University of Wolverhampton during the CREEN and CyberEmotions EU projects and updated at the University of Sheffield.