Big Data Text Analysis
Word association analysis is introduced elsewhere. This page describes how to apply it to gender comparisons.
How does Mozdeh detect gender? If your texts are Tweets or YouTube comments then Mozdeh matches usernames against a list of common first names that are used at least 90% of the time by either males (e.g., Winston) or females (e.g., Bethany), assigning the appropriate gender. The remaining texts are assigned no gender. If using Twitter, Mozdeh can also detect nonbinary users from they/them pronouns in their Twitter bios (Advanced menu, Identify nonbinary... option). The gendered name lists are for the USA, based on 2011 census information and GenderAPI.com. Email Mike Thelwall for name lists for other countries.
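The name-matching rule described above can be sketched roughly as follows. This is only an illustration, not Mozdeh's actual code: the name counts are invented, and the way the first name is pulled out of a username (the first run of letters) is an assumption.

```python
import re

# Hypothetical name statistics: first name -> (male_count, female_count).
# These figures are invented for illustration, not taken from Mozdeh's lists.
NAME_COUNTS = {
    "winston": (9800, 200),
    "bethany": (50, 9950),
    "alex": (6000, 4000),
}

# A name must be used at least 90% of the time by one gender to be assigned.
THRESHOLD = 0.9


def guess_gender(username, name_counts=NAME_COUNTS, threshold=THRESHOLD):
    """Return "M", "F" or None (no gender assigned) for a username."""
    # Assumption: the first run of letters in the username is the first name.
    tokens = re.findall(r"[A-Za-z]+", username)
    if not tokens:
        return None
    counts = name_counts.get(tokens[0].lower())
    if counts is None:
        return None  # name not in the list
    male, female = counts
    total = male + female
    if total == 0:
        return None
    if male / total >= threshold:
        return "M"
    if female / total >= threshold:
        return "F"
    return None  # name is too evenly split to assign a gender


if __name__ == "__main__":
    for user in ["Winston Smith", "bethany_92", "Alex Jones", "12345"]:
        print(user, "->", guess_gender(user))
```

Names below the threshold, such as the invented entry for "alex", are left unassigned, which is why many texts end up with no gender.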
To find words that are more common in texts by males, females or nonbinaries than in texts by the rest:
1) Clear the search box and all non-gender filters, then select the appropriate gender in the drop-down box (middle, left of the analysis screen).
2) Click Compare Mine associations for search and filters.
The results appear in the box underneath the button. Two examples are below. In the first example, male is the selected gender. To explain the results, the third row shows that the hashtag #peoplesarmy is in 1.5% of male-authored tweets and 0.4% of the remaining tweets (female, nonbinary, unassigned), so #peoplesarmy is a male-associated term. Stars in the final column indicate that a difference is statistically significant. In this case only @paulnuttallukip is statistically significant, so the apparent male association of the remaining terms may be due to chance.
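To make the percentages and the significance test concrete, here is a minimal sketch for a single term. It assumes a plain 2x2 chi-square test, which may differ from the test Mozdeh applies, and the counts are hypothetical, chosen only to reproduce the 1.5% and 0.4% figures.

```python
import math


def compare_term(with_a, total_a, with_b, total_b):
    """Compare how often a term occurs in group A texts vs. group B texts.

    Returns (percent_a, percent_b, chi2, p_value) using a 2x2 chi-square
    test with one degree of freedom and no continuity correction.
    """
    pct_a = 100 * with_a / total_a
    pct_b = 100 * with_b / total_b
    # 2x2 contingency table: rows = group, columns = term present / absent.
    a, b = with_a, total_a - with_a
    c, d = with_b, total_b - with_b
    n = total_a + total_b
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return pct_a, pct_b, chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 3 of 200 male tweets and 4 of 1000 other tweets.
    pct_m, pct_rest, chi2, p = compare_term(3, 200, 4, 1000)
    print(f"male {pct_m:.1f}% vs rest {pct_rest:.1f}%, chi2={chi2:.2f}, p={p:.3f}")
```

With counts this small the gap between 1.5% and 0.4% does not reach the usual 0.05 level, which matches the point that an apparent association may be due to chance. The sketch tests one term in isolation. When thousands of terms are tested together, a correction for multiple comparisons is normally needed as well, and that is not shown here.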
Compare male vs. female for each query and click Compare words matching the above queries (slow).
If you have a small dataset then you can expect few, if any, significant differences (as above for 3000 UKIP tweets). For large datasets there may be many differences (as below for 1.6 million TripAdvisor comments), with statistical significance stars in the final column. For example, the table below shows that males are more likely than others to mention wife, beer and excellent (amongst many other terms) in their TripAdvisor posts.
Comparing the selected gender against the rest is the simplest method for gender comparisons, but not the best. The unknown gender set is likely to include a mix of males, females, nonbinaries and organisations, so comparing specified genders against each other is more powerful.
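A small worked example shows why the unknown set weakens the comparison. All the rates and group sizes below are invented for illustration. The unassigned group is assumed to be an even mix of male and female authors who use the term at the same rates as the identified ones.

```python
# Hypothetical rates at which a term appears in texts by each gender.
male_rate = 0.02     # 2% of male-authored texts
female_rate = 0.005  # 0.5% of female-authored texts

# Hypothetical group sizes; the unknown group is an even male/female mix.
n_female = 1000
n_unknown_male = 1000
n_unknown_female = 1000

# "The rest" = identified females plus everyone with no assigned gender.
rest_with_term = (
    n_female * female_rate
    + n_unknown_male * male_rate
    + n_unknown_female * female_rate
)
rest_total = n_female + n_unknown_male + n_unknown_female
rest_rate = rest_with_term / rest_total

ratio_vs_rest = male_rate / rest_rate      # male vs. the rest
ratio_vs_female = male_rate / female_rate  # male vs. female only

if __name__ == "__main__":
    print(f"rest rate: {100 * rest_rate:.2f}%")
    print(f"male vs rest ratio: {ratio_vs_rest:.1f}")
    print(f"male vs female ratio: {ratio_vs_female:.1f}")
```

Under these assumptions the hidden male authors in the unknown group pull the rate for the rest up towards the male rate, so the male vs. rest contrast is only half as large as the male vs. female contrast.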
To find words that are in more texts by gender A than in texts by gender B:
1) Click the Association Mining Comparisons tab.
2) Within this tab, enter the two queries to be compared in angle brackets (<N>, <F>, <M>) separated by a comma. For example, to compare male-authored texts against female-authored texts, enter <F>,<M>
3) Click Compare words matching the above queries and wait for the processing to finish.
The results are saved to a plain text file (name ending in ) that can be read by copying it to a spreadsheet like Excel.
Made by the Statistical Cybermetrics Research Group at the University of Wolverhampton during the CREEN and CyberEmotions EU projects.