Big Data Text Analysis


Social Media Data Analysis

This is an online course about social media data analysis with Mozdeh. It is usually taught face-to-face with a workshop, but this page contains recordings of the presentations as well as copies of the slides and workshop handouts. The workshop starts with Twitter and progresses to YouTube (optional). It begins with data gathering and moves on to more complex analyses using Mozdeh. Live versions normally run twice a year as a Cathie Marsh Institute for Social Research short course, and elsewhere.

Download a zipfile of all presentation slides and workshop handouts.

1. Introduction to social media data analysis video.

2. Methods for gathering tweets video. You might like to try searching for any topics of interest, perhaps trying the until:yyyy-mm-dd command.

3. Gathering, searching and filtering tweets with Mozdeh video. See also a live demo of gathering tweets with Mozdeh video. Now try workshop handout section 1: Downloading, searching and exporting tweets to Excel.

4. [optional] Bonus demo about how to download tweets from a set of users and create networks from them video. Now try workshop handout section 3: Twitter networks [optional].

5. Association mining with Mozdeh video. Now try workshop handout sections 2, 3 and 4: Downloading users’ tweets; Time series, word association, and sentiment analysis; Topic changes over time.

6. General advice for Twitter projects is in workshop handout section 5: Twitter project advice (no video).

7. Gathering and analysing YouTube video comments video. Now try workshop handout sections 6, 7 and 8: Downloading YouTube comments; Downloading YouTube channels; Registering for a key.

8. [optional] Word Association Thematic Analysis for tweets or YouTube comments video (no workshop tasks). A book on this method is due to be published soon. In the meantime, Google Scholar can point you to the full text of the following papers that use it:

Thelwall, M. & Mas-Bleda, A. (2018). YouTube science channel video presenters and comments: Female friendly or vestiges of sexism? Aslib Journal of Information Management, 70(1), 28-46.

Thelwall, M. & Stuart, E. (2019). She’s Reddit: A source of statistically significant gendered interest information? Information Processing & Management, 56(4), 1543-1558.

Thelwall, M. & Thelwall, S. (2020). Covid-19 tweeting in English: Gender differences. El Profesional de la Información, 29(3), e290301.
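For the until:yyyy-mm-dd command mentioned in step 2, the short Python sketch below shows one way to put together a query with the date in the right format. It is only an illustration: the function name build_query is hypothetical, it is not part of Mozdeh, and it assumes that the operator is simply appended to the topic text with a space in between.

```python
from datetime import date
from typing import Optional


def build_query(topic: str, until: Optional[date] = None) -> str:
    """Return a search query, optionally ending with until:yyyy-mm-dd.

    Hypothetical helper for illustration only; not part of Mozdeh.
    """
    query = topic.strip()
    if until is not None:
        # date.isoformat() gives the zero-padded yyyy-mm-dd form.
        query += f" until:{until.isoformat()}"
    return query


if __name__ == "__main__":
    print(build_query("climate change", date(2020, 3, 1)))
```

The resulting text can be pasted into a search box to try out a topic of interest with a date limit.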

Please email m dot thelwall at with any questions. If you are having a technical issue, please include a screenshot of the problem, if possible, and describe what happened just before the problem.

Made by the Statistical Cybermetrics Research Group at the University of Wolverhampton during the CREEN and CyberEmotions EU projects.