Big Data Text Analysis
Can Mozdeh collect tweets relevant to a topic that is in the future or less than a week old?
Can Mozdeh collect tweets from a set of tweeters, even if they are from a long time ago?
Can Mozdeh collect all tweets relevant to a topic that is over a week old?
Does Mozdeh gather 100% of tweets or just the 1% allowed by Twitter?
Can Mozdeh gather tweets from a list of Twitter IDs?
Does Mozdeh save the tweets into a simple text file that I can analyse myself?
Can I find which country users are from?
Does Mozdeh continue collecting data if the computer/laptop goes into Sleep or Hibernate mode?
What happens to the data if Mozdeh crashes or if there is a power cut?
Can I merge two projects?
Can Mozdeh find the timezones of the tweets?
Can Mozdeh create a subset of tweets based on date?
Can Mozdeh create networks of tweets from a given date range?
Does Mozdeh store the tweet URLs?
Does Mozdeh store the gender/sentiment/country information with the tweets?
Can I preview a Mozdeh project while it is collecting data?
Can I create a copy of my project with different time slicing (day/hour/month)?
Why is REPLY @user in the Mozdeh tweet but not in the original tweet?
Why do I see #NAME in my data?
Why does Mozdeh get truncated retweets, limited to 140 characters?
Why does my YouTube key not work or why has it stopped working?
Can I find nonbinary users?
Can I see user self-descriptions?
Can I count word frequencies per user instead of per text (e.g., for Twitter timeline data)?
Why am I getting a proxy server error?
Why do I always get a file busy error when running Mozdeh?
Yes, but if the topic started before the data collection then it might miss some of the older tweets. Start collecting tweets from before the topic starts, if possible, to minimise the chance of losing any. Continue collecting for a few days after the event too. If there are huge numbers of tweets then you may get a sample rather than all tweets.
Yes - use the Tweeters feature of Mozdeh (after starting Mozdeh and entering a project name, select the Tweeters tab in the data collection wizard and enter a list of the usernames to collect - full instructions here). This will collect approximately the most recent 2000-3200 tweets from each account, no matter how old the tweets are.
No - Mozdeh is restricted by Twitter to tweets that are up to a week old. You would have to buy this data from a Twitter data reseller, such as Gnip or DataSift, or sign up for commercial analytics services, such as Pulsar.
For keyword searches, Mozdeh gathers 100% of tweets provided that (a) they are not older than 1 week, (b) the volume of search results is not so big that Mozdeh is rate limited by Twitter and cannot get them all, and (c) they are not filtered out by Twitter for being spam, duplicates or duplicate retweets. In practice, this means that Mozdeh will get close to 100% of relevant tweets unless you use hundreds of queries or your topic is a major news story or Twitter event that gets tens of thousands of tweets. Twitter sets a 1% limit for its service that supplies all tweets rather than tweets matching a query - this does not apply here.
You can test this yourself by running a pilot test in Mozdeh with a single rare keyword (e.g., thelwall works well) and comparing the tweets that you get from Mozdeh with the Tweets listed by the Twitter search service. For the tweets that are up to 1 week old, there should be a 100% overlap, except that the live Twitter search includes #thelwall results and Mozdeh does not (so if you want them you would have to search for both thelwall and #thelwall in Mozdeh).
Yes. There is a checkbox under the Twitter search button that allows tweets to be downloaded by ID instead of searched for. Follow the usual procedure for searching for tweets and look for this checkbox. The Twitter IDs should be in one or more files, with one ID per line, in a folder that contains no other files.
Yes - to see this file, select Show Reports in Folder from the Analyse menu, then navigate to the main project folder for your project and then open the "raw data" folder. This contains a plain text file with the tweets and can be loaded into a spreadsheet for analysis.
If Mozdeh crashes or there is a power cut then the data is not lost but will be saved in a file called something like TwitterSearches_Tweets.txt and stored within a subfolder of the project folder within moz_data called raw data. So the full path for the file might be something like c:\moz_data\SNP MP test\raw data\TwitterSearches_Tweets.txt. This will contain all of the data except perhaps a few tweets from the last few minutes before the crash. If you restart Mozdeh and select the project then it will process this file as a normal project. See below if you want to merge projects from two or more TwitterSearches_Tweets.txt files due to Mozdeh crashes or power cuts. You can only merge projects after the data collection has finished.
To create a single combined Mozdeh project from more than one set of posts collected by Mozdeh, download Webometric Analyst, start it and close the startup Wizard. From the main search interface, select the Text menu, the Merge files submenu and the option Merge any number of text files (simple consecutive merge, no checking). When asked, reply Yes to the question about ignoring header lines after the first one. The files to merge are inside the raw data subfolder of the main project folders in moz_data. So the full path for a file might be something like c:\moz_data\SNP MP test\raw data\TwitterSearches_Tweets.txt. Select all the different Mozdeh TwitterSearches_Tweets.txt files, one at a time, and choose any name for the merged file. Once the merged file is ready, export it back to Mozdeh using the Webometric Analyst button Convert Twitter Files to Mozdeh Format in the Twitter tab on the main interface. After this, start the new project in Mozdeh and it will process it.
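For readers who want to see what a simple consecutive merge that ignores header lines after the first one amounts to, here is a minimal Python sketch. It is not the Webometric Analyst implementation. It assumes that every input file is UTF-8 text beginning with exactly one header line, and the function name merge_text_files is hypothetical.

```python
from pathlib import Path
from typing import Iterable


def merge_text_files(inputs: Iterable[Path], output: Path) -> int:
    """Concatenate files in order, keeping only the first file's header line.

    Assumption: each input file starts with exactly one header line.
    Returns the number of data (non-header) lines written.
    """
    data_lines = 0
    with output.open("w", encoding="utf-8") as out:
        for file_index, path in enumerate(inputs):
            with path.open("r", encoding="utf-8", errors="replace") as src:
                for line_number, line in enumerate(src):
                    # Make sure a missing final newline does not glue two rows together.
                    if not line.endswith("\n"):
                        line += "\n"
                    if line_number == 0:
                        # Header line: keep it for the first file only.
                        if file_index == 0:
                            out.write(line)
                        continue
                    out.write(line)
                    data_lines += 1
    return data_lines
```

Like the menu option described above, this does no checking for duplicate tweets or mismatched columns. The merged file would still need to be converted to Mozdeh format with Webometric Analyst as described.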
No. When the computer goes into sleep (or hibernate) mode, all programs stop running, including Mozdeh, so it would not be able to collect any data until it is brought out of sleep/hibernate mode. When Mozdeh is woken from sleep it can carry on as normal without a warning that it was asleep.
Mozdeh will work with the screen turned off, so it is safe to configure your computer power management settings to switch the display off after 30 minutes of inactivity as long as “never” is set as the time period before going into sleep/hibernation mode (the “Put the Computer to sleep:” Power Option setting).
From the File menu in Mozdeh, select Add country code to raw tweet file and then select the file of original tweets - time zone information will be added as an extra column, when known.
Creating a subproject with a specified date range is tricky, but you can create a new project from the old data and select a date range for it.
For this one, you will need to use the original project, not a date filtered new project. After opening the original project, select "Make new raw data file with date restrictions" from the data menu and choose the filtered raw data file and the date range you want. This will save a date-restricted copy of the raw data file in the same place as the original one.
Now when you select the network creation option select the new filtered raw data file rather than the original one and you should get a network with data only from these dates.
The original link is not saved anywhere, but if the tweet is still live then you can find it by adding the tweet Entry ID (second column of the raw data file) to the end of the URL https://twitter.com/statuses/ and it should redirect to the standard URL. For example,
https://twitter.com/statuses/862336612591689733 redirects to https://twitter.com/CamOpenAccess/status/862336612591689733
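If you have many IDs, the URLs can be built in bulk. The following Python sketch is one way to do it. It assumes that the raw data file is tab-separated (the separator is not stated above), and both function names are hypothetical.

```python
STATUS_PREFIX = "https://twitter.com/statuses/"


def tweet_url_from_id(entry_id: str) -> str:
    """Build the redirecting URL for a tweet Entry ID."""
    entry_id = entry_id.strip()
    if not entry_id.isdigit():
        raise ValueError(f"Not a numeric tweet ID: {entry_id!r}")
    return STATUS_PREFIX + entry_id


def tweet_url_from_raw_line(line: str) -> str:
    """Take the Entry ID from the second column of one raw data line.

    Assumption: columns are separated by tab characters.
    """
    columns = line.rstrip("\n").split("\t")
    return tweet_url_from_id(columns[1])


print(tweet_url_from_id("862336612591689733"))
```

The URL only resolves if the tweet is still live, as noted above.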
The gender information is only in the interface version, sorry. Once it has been examined in the interface, the gender information is saved in a file called genderinfo.txt in the main project folder, which matches up with the ID numbers, in case this helps. The same is true for sentiment in the "Item and Feed IDs" folder. There is an "Add country code" option in the File menu of Mozdeh, which might be useful.
Yes - click the Make copy of project button towards the bottom of the data collection screen. You will need to start a second copy of Mozdeh, which can run at the same time as the first one, and open the project copy to see it.
Yes but it is a bit tricky.
If a tweet is recorded by Twitter as a reply to @user1 but @user1 is not mentioned in the tweet then Mozdeh adds REPLY @user1 to the start of the tweet at data collection time so that these replies can be included in network analyses.
If you view YouTube video comments in Excel then any line (comment or videoID) starting with a minus sign is interpreted as a "bad formula name" by Excel. To get round this problem, start a new copy of Excel, Right click in the top left hand corner of a worksheet, select format and Text. This converts all cells of the worksheet to expect text and not try to convert anything into a formula. If you copy and paste your Mozdeh data into here then it should no longer produce #NAME anywhere. If you still get #NAME then it is possible that (a) you have previously saved the file with Excel or (b) your computer is configured to process text files through Excel in some way, even though it looks like you are not using Excel. For problem (a) you would have to re-collect the data, but for (b) you might have to try a different computer.
In the 20 June 2020 upgrade, Mozdeh should gather full tweets in all cases. It previously truncated retweets.
YouTube keys suddenly stopping working seems to occur a lot and may be caused by a YouTube glitch. Try logging on to the Google Developer platform, creating a new project, adding the YouTube Data API v3 to the new project and generating credentials for it (in that order). This new key might work. This almost always works for me.
This is possible for Twitter but not for YouTube or other sources.
Versions of Mozdeh from July 2020 onwards can detect the country of Twitter users (see the menu option: Advanced|Get countries of Twitter users...). It does this by retrieving the location information from Twitter (may require a Twitter logon) and matching it against a list of country names in English, and a list of major city names in English. For example, a user location of Wolverhampton, UK would map to UK and a user location of Beijing would match to China. But a user location of My Home, 中国, or Wombourne (small village) would map to None.
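The matching idea can be illustrated with a small Python sketch. This is not Mozdeh's code: the two tiny lookup tables, the rule of splitting the location on commas and the function name country_from_location are all assumptions made for illustration, and Mozdeh's real lists are much longer.

```python
from typing import Optional

# Hypothetical, deliberately tiny lookup tables (English names only).
COUNTRY_NAMES = {"uk": "UK", "united kingdom": "UK", "china": "China"}
CITY_TO_COUNTRY = {"wolverhampton": "UK", "beijing": "China"}


def country_from_location(location: str) -> Optional[str]:
    """Map a free-text user location to a country, or None if nothing matches."""
    # Assumption: locations are comma-separated, e.g. "Wolverhampton, UK".
    parts = [part.strip().lower() for part in location.split(",") if part.strip()]
    # Prefer an explicit country name, checking the last part first.
    for part in reversed(parts):
        if part in COUNTRY_NAMES:
            return COUNTRY_NAMES[part]
    # Otherwise fall back to a known major city name.
    for part in parts:
        if part in CITY_TO_COUNTRY:
            return CITY_TO_COUNTRY[part]
    return None


for example in ["Wolverhampton, UK", "Beijing", "My Home", "中国", "Wombourne"]:
    print(example, "->", country_from_location(example))
```

Because both lists are in English, a location written in another language, a joke location or a place too small to be on the city list produces None, which matches the behaviour described above.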
This is possible for Twitter but not for YouTube or other sources.
Versions of Mozdeh from July 2020 onwards can detect nonbinary Twitter users (see the menu option: Advanced|Identify nonbinary, male, female...; you may need to activate Advanced|Get countries of Twitter users... first). It does this by retrieving the user self-description information from Twitter for all users in the current project (may require a Twitter logon) and searching the display name and self-description fields. Any user reporting they/them pronouns in either field and not she/her or he/him is categorised as nonbinary.
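The pronoun rule can be sketched in Python as follows. This is an illustration of the rule as stated above, not Mozdeh's code. The regular expressions (which allow spaces around the slash and ignore case) and the function name reports_nonbinary_pronouns are assumptions.

```python
import re

# Assumed patterns: pronoun pairs written with a slash, optional spaces, any case.
THEY_THEM = re.compile(r"\bthey\s*/\s*them\b", re.IGNORECASE)
SHE_HER = re.compile(r"\bshe\s*/\s*her\b", re.IGNORECASE)
HE_HIM = re.compile(r"\bhe\s*/\s*him\b", re.IGNORECASE)


def reports_nonbinary_pronouns(display_name: str, description: str) -> bool:
    """True if they/them appears in either field and neither she/her nor he/him does."""
    text = f"{display_name}\n{description}"
    return (
        THEY_THEM.search(text) is not None
        and SHE_HER.search(text) is None
        and HE_HIM.search(text) is None
    )


print(reports_nonbinary_pronouns("Sam (they/them)", "Researcher"))
print(reports_nonbinary_pronouns("Sam", "she/her, they/them"))
```

Under this rule, users who list they/them alongside she/her or he/him are not categorised as nonbinary, and users who do not state pronouns are not detected at all.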
This is possible for Twitter but not for YouTube or other sources.
Versions of Mozdeh from July 2020 onwards can show user descriptions when clicking on a search result. This is only possible after loading the countries of search results (this also downloads the description information). To load countries, see the menu option: Advanced|Get countries of Twitter users.... It does this by retrieving the user self-description information from Twitter for all users in the current project and reporting the self-description fields. After loading countries, load the self-descriptions (see the menu option: Advanced|Report user descriptions...).
Yes but you will need a new project, importing your old data into it and selecting the option, Merge all texts from the same user. Here is how it works for Twitter timelines.
Copy the UserNames_Timelines.txt file into a new folder where it is on its own.
Start Mozdeh, enter a new project name and click the Import data button.
Enter number 1 (Twitter) for the data type and click OK.
Browse for the folder containing one file with all the tweeters' tweets and click OK.
Select Merge all texts from the same user at the bottom of the massive dialog box and click OK.
Click OK for all the other dialog boxes to accept the results.
This should give a new project in which each user has one "megatweet" consisting of all their tweets merged into one. The file vocabulary_items.txt in the project folder should then report the number of users tweeting each word at least once.
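The effect of merging all texts from the same user can be sketched in Python. This is not how Mozdeh is implemented and it does not read Mozdeh's files: it assumes that the tweets are already available as (username, text) pairs, the tokenisation rule is a simplification, and the function name users_per_word is hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

# Assumed tokenisation: runs of word characters, optionally starting with # or @.
WORD = re.compile(r"[#@]?\w+")


def users_per_word(tweets: Iterable[Tuple[str, str]]) -> Counter:
    """Count, for each word, how many distinct users tweeted it at least once."""
    words_by_user: Dict[str, Set[str]] = defaultdict(set)
    for user, text in tweets:
        # Equivalent to merging a user's tweets and recording which words occur.
        words_by_user[user].update(word.lower() for word in WORD.findall(text))
    counts: Counter = Counter()
    for words in words_by_user.values():
        counts.update(words)
    return counts


sample = [("alice", "cats cats dogs"), ("alice", "cats"), ("bob", "Cats birds")]
print(users_per_word(sample).most_common())
```

In the sample, alice uses "cats" three times but is counted once, so the count for "cats" is 2 users rather than 4 occurrences.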
If you get an error message about proxy permissions when running Mozdeh from a work computer, then please either get your network administrator to allow Mozdeh to access the internet or run it from a non-work computer, such as from home, if you can.
If you get a file access error and are running Mozdeh from a network drive, the cloud or storage not on your computer, please try running it from a USB stick attached to your computer because network delays can cause it problems.
Made by the Statistical Cybermetrics Research Group at the University of Wolverhampton during the CREEN and CyberEmotions EU projects.