Big Data Text Analysis
Can Mozdeh collect tweets relevant to a topic that is in the future or less than a week old?
Can Mozdeh collect tweets from a set of tweeters, even if they are from a long time ago?
Can Mozdeh collect all tweets relevant to a topic that is over a week old?
Does Mozdeh gather 100% of tweets or just the 1% allowed by Twitter?
Can Mozdeh gather tweets from a list of Twitter IDs?
Does Mozdeh save the tweets into a simple text file that I can analyse myself?
Does Mozdeh continue collecting data if the computer/laptop goes into Sleep or Hibernate mode?
What happens to the data if Mozdeh crashes or if there is a power cut?
Can I merge two projects?
Can Mozdeh find the timezones of the tweets?
Can Mozdeh create a subset of tweets based on date?
Can Mozdeh create networks of tweets from a given date range?
Does Mozdeh store the tweet URLs?
Does Mozdeh store the gender/sentiment/country information with the tweets?
Can I preview a Mozdeh project while it is collecting data?
Can I create a copy of my project with different time slicing (day/hour/month)?
Why is REPLY @user in the Mozdeh tweet but not in the original tweet?
Why do I see #NAME in my data?
Why does Mozdeh get truncated retweets, limited to 140 characters?
Why does my YouTube key not work or why has it stopped working?
Yes, but if the topic started before the data collection then it might miss some of the older tweets. Start collecting tweets from before the topic starts, if possible, to minimise the chance of losing any. Continue collecting for a few days after the event too. If there are huge numbers of tweets then you may get a sample rather than all tweets.
Yes - use the Tweeters feature of Mozdeh (after starting Mozdeh and entering a project name, select the Tweeters tab in the data collection wizard and enter a list of the usernames to collect - full instructions here). This will collect approximately the 2000-3200 most recent tweets from each account, no matter how old the tweets are.
No - Mozdeh is restricted by Twitter to tweets that are up to a week old. You would have to buy this data from a Twitter data reseller, such as Gnip or DataSift, or sign up for commercial analytics services, such as Pulsar.
For keyword searches, Mozdeh gathers 100% of tweets provided that (a) they are not older than 1 week, (b) the volume of search results is not so big that Mozdeh is rate limited by Twitter and cannot get them all, and (c) they are not filtered out by Twitter for being spam, duplicates or duplicate retweets. In practice, this means that Mozdeh will get close to 100% of relevant tweets unless you use hundreds of queries or your topic is a major news story or Twitter event that gets tens of thousands of tweets. Twitter sets a 1% limit for its service that supplies all tweets rather than tweets matching a query - this does not apply here.
You can test this yourself by running a pilot test in Mozdeh with a single rare keyword (e.g., thelwall works well) and comparing the tweets that you get from Mozdeh with the tweets listed by the Twitter search service. For the tweets that are up to 1 week old, there should be a 100% overlap, except that the live Twitter search includes #thelwall results and Mozdeh does not (so if you want them you would have to search for both thelwall and #thelwall in Mozdeh).
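The overlap check in this pilot test can be done by hand, but with more than a few tweets a short script is easier. The sketch below is only an illustration: it assumes you have already put the tweet IDs from each source into two Python lists (how you extract them is up to you), and the function names are hypothetical, not part of Mozdeh.

```python
def overlap_percentage(mozdeh_ids, twitter_ids):
    """Return the percentage of Twitter search IDs that Mozdeh also collected."""
    mozdeh_set = set(str(i).strip() for i in mozdeh_ids)
    twitter_set = set(str(i).strip() for i in twitter_ids)
    if not twitter_set:
        return 0.0  # nothing to compare against
    return 100.0 * len(mozdeh_set & twitter_set) / len(twitter_set)


def missing_from_mozdeh(mozdeh_ids, twitter_ids):
    """Return the IDs listed by Twitter search that Mozdeh did not collect."""
    mozdeh_set = set(str(i).strip() for i in mozdeh_ids)
    return sorted(str(i).strip() for i in set(twitter_ids) if str(i).strip() not in mozdeh_set)
```

Any IDs returned by missing_from_mozdeh are worth checking by eye: they may be older than 1 week or may match #thelwall rather than thelwall.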
Yes. There is a checkbox under the Twitter search button that makes Mozdeh download tweets by ID instead of searching for them. Follow the usual procedure for searching for tweets and you should see this checkbox. The Twitter IDs should be in one or more files, with one ID per line, stored together in a folder that contains no other files.
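If your IDs are currently in a spreadsheet or a script, a file in this layout can be produced as sketched below. This is a minimal example under stated assumptions: the file name tweet_ids.txt and the function name are hypothetical, the IDs are assumed to be purely numeric, and UTF-8 is assumed to be an acceptable encoding.

```python
from pathlib import Path


def write_id_file(ids, folder, filename="tweet_ids.txt"):
    """Write one ID per line into a folder that holds no other files."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)

    # The folder should contain only ID files, so refuse to mix in other content.
    others = [p.name for p in folder.iterdir() if p.name != filename]
    if others:
        raise ValueError(f"Folder already contains other files: {others}")

    cleaned = [str(i).strip() for i in ids]
    cleaned = [i for i in cleaned if i]  # drop blank entries
    for i in cleaned:
        if not i.isdigit():  # assumption: IDs are numeric
            raise ValueError(f"Not a numeric ID: {i!r}")

    path = folder / filename
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return path
```

For example, write_id_file(["862336612591689733"], "c:/tweet_ids") would create a one-line file in a new folder, which can then be selected in Mozdeh.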
Yes - to see this file, select Show Reports in Folder from the Analyse menu, then navigate to the main project folder for your project and then open the "raw data" folder. This contains a plain text file with the tweets and can be loaded into a spreadsheet for analysis.
If Mozdeh crashes or there is a power cut then the data is not lost but will be saved in a file called something like TwitterSearches_Tweets.txt and stored within a subfolder of the project folder within moz_data called raw data. So the full path for the file might be something like c:\moz_data\SNP MP test\raw data\TwitterSearches_Tweets.txt. This will contain all of the data except perhaps a few tweets from the last few minutes before the crash. If you restart Mozdeh and select the project then it will process this file as a normal project. See below if you want to merge projects from two or more TwitterSearches_Tweets.txt files due to Mozdeh crashes or power cuts. You can only merge projects after the data collection has finished.
To create a single combined Mozdeh project from more than one set of posts collected by Mozdeh, download Webometric Analyst, start it and close the startup Wizard. From the main search interface, select the Text menu, the Merge files submenu and the option Merge any number of text files (simple consecutive merge, no checking). When asked, reply Yes to the question about ignoring header lines after the first one. The files to merge are inside the raw data subfolder of each main project folder in moz_data, so the full path for a file might be something like c:\moz_data\SNP MP test\raw data\TwitterSearches_Tweets.txt. Select all the different Mozdeh TwitterSearches_Tweets.txt files, one at a time, and choose any name for the merged file. Once the merged file is ready, export it back to Mozdeh using the Webometric Analyst button Convert Twitter Files to Mozdeh Format in the Twitter tab on the main interface. After this, start the new project in Mozdeh and it will process it.
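To make clear what a simple consecutive merge that ignores header lines after the first one does, here is a rough sketch of the idea. It is not Webometric Analyst's own code: the function name is hypothetical, and it assumes that every input file starts with exactly one header line and is readable as UTF-8.

```python
def merge_text_files(input_paths, output_path):
    """Append files one after another, keeping only the first file's header line.

    Returns the number of lines written. No duplicate checking is done.
    """
    written = 0
    with open(output_path, "w", encoding="utf-8", newline="") as out:
        for file_index, path in enumerate(input_paths):
            with open(path, "r", encoding="utf-8", newline="") as src:
                for line_index, line in enumerate(src):
                    # Skip the header line of every file except the first.
                    if file_index > 0 and line_index == 0:
                        continue
                    # Make sure a file without a final line break does not
                    # run into the first line of the next file.
                    if not line.endswith("\n"):
                        line += "\n"
                    out.write(line)
                    written += 1
    return written
```

A file merged this way would still need the Convert Twitter Files to Mozdeh Format step before Mozdeh can open it.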
No. When the computer goes into sleep (or hibernate) mode, all programs stop running, including Mozdeh, so it would not be able to collect any data until the computer is brought out of sleep/hibernate mode. When the computer is woken from sleep, Mozdeh carries on as normal, without any warning that it was asleep.
Mozdeh will work with the screen turned off, so it is safe to configure your computer power management settings to switch the display off after 30 minutes of inactivity, as long as “never” is set as the time period before going into sleep/hibernation mode (the “Put the Computer to sleep:” Power Option setting).
From the File menu in Mozdeh, select Add country code to raw tweet file and then select the file of original tweets - time zone information will be added as an extra column, when known.
It is tricky to create a subproject with a specified date range, but you can create a new project from the old data and select a date range for it.
For this one, you will need to use the original project, not a date filtered new project. After opening the original project, select "Make new raw data file with date restrictions" from the data menu and choose the filtered raw data file and the date range you want. This will save a date-restricted copy of the raw data file in the same place as the original one.
Now, when you select the network creation option, select the new filtered raw data file rather than the original one and you should get a network with data only from these dates.
The original link is not saved anywhere, but if the tweet is still live then you can find it by adding the tweet Entry ID (second column of raw data file) to the end of the URL https://twitter.com/statuses/ and it should redirect to the standard URL. For example,
https://twitter.com/statuses/862336612591689733 redirects to https://twitter.com/CamOpenAccess/status/862336612591689733
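If you need the links for many tweets, the same rule can be applied in a short script. This is a minimal sketch: the function name is hypothetical, and it only builds the short URL, leaving the redirect to your browser.

```python
def tweet_url(entry_id):
    """Build the short status URL for a tweet Entry ID from the raw data file."""
    entry_id = str(entry_id).strip()
    if not entry_id.isdigit():
        raise ValueError(f"Not a numeric Entry ID: {entry_id!r}")
    return "https://twitter.com/statuses/" + entry_id


print(tweet_url("862336612591689733"))
```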
The gender information is only in the interface version, sorry. Once it has been examined in the interface, the gender information is saved in a file called genderinfo.txt in the main project folder, which matches up with the ID numbers, in case this helps. The same is true for sentiment in the "Item and Feed IDs" folder. There is an "Add country code" option in the File menu of Mozdeh, which might be useful.
Yes - click the Make copy of project button towards the bottom of the data collection screen. You will need to start a second copy of Mozdeh, which can run at the same time as the first one, and open the project copy to see it.
Yes but it is a bit tricky.
If a tweet is recorded by Twitter as a reply to @user1 but @user1 is not mentioned in the tweet then Mozdeh adds REPLY @user1 to the start of the tweet at data collection time so that these replies can be included in network analyses.
If you view YouTube video comments in Excel then any line (comment or videoID) starting with a minus sign is interpreted as a "bad formula name" by Excel. To get round this problem, start a new copy of Excel, right-click in the top left hand corner of a worksheet, then select Format and Text. This makes all cells of the worksheet expect text, so that Excel does not try to convert anything into a formula. If you copy and paste your Mozdeh data into here then it should no longer produce #NAME anywhere. If you still get #NAME then it is possible that (a) you have previously saved the file with Excel or (b) your computer is configured to process text files through Excel in some way, even though it looks like you are not using Excel. For problem (a) you would have to re-collect the data, but for (b) you might have to try a different computer.
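To see in advance which entries Excel is likely to misread, you could scan the data for cells that start with a minus sign. The sketch below is an illustration only: the function name is hypothetical, and it assumes the data is tab-separated, which you should check against your own file.

```python
def find_minus_cells(lines, delimiter="\t"):
    """Return (line number, column number, cell) for each cell starting with '-'."""
    hits = []
    for line_number, line in enumerate(lines, start=1):
        cells = line.rstrip("\r\n").split(delimiter)
        for column, cell in enumerate(cells, start=1):
            if cell.startswith("-"):
                hits.append((line_number, column, cell))
    return hits
```

Pass it the lines of the file, for example the result of reading the file with readlines(). An empty result suggests that any remaining #NAME entries have another cause, such as the file having already been saved by Excel.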
Mozdeh gets tweets from the free Twitter API, which truncates unmodified retweets to 140 characters. All other tweets should be full length.
This may be caused by a YouTube glitch. Try logging on to the Google Developer platform, creating a new project, adding the YouTube Data API v3 to the new project and generating credentials for it (in that order). This new key might work.