Big Data Text Analysis
Can I use Mozdeh to collect all tweets relevant to a topic that is in the future or less than two weeks old?
Can I use Mozdeh to collect all tweets made by one or more people, even if they are from a long time ago?
Can I use Mozdeh to collect all tweets relevant to a topic that is over two weeks old?
Does Mozdeh gather 100% of tweets or just the 1% allowed by Twitter?
Does Mozdeh save the tweets into a simple text file that I can analyse myself?
Does Mozdeh continue collecting data if the computer/laptop goes into Sleep or Hibernate mode?
What happens to the data if Mozdeh crashes or if there is a power cut?
Can I merge two projects?
Can I find the timezones of the tweets?
Can I create a subset of tweets based on date?
Can I create networks of tweets from a given date range?
Does Mozdeh store the tweet URLs?
Does Mozdeh store the gender/sentiment/country information with the tweets?
Can I preview a Mozdeh project while it is collecting data?
Can I create a copy of my project with different time slicing (day/hour/month)?
Why is REPLY @user in the Mozdeh tweet but not in the original tweet?
Yes, but if the topic started before the data collection then it might miss some of the older tweets. Start collecting tweets from before the topic starts, if possible, to minimise the chance of losing any. Continue collecting for a few days after the event too. If there are huge numbers of tweets then you may get a sample rather than all tweets.
Yes - use the timeline feature of Mozdeh (after starting Mozdeh and entering a project name, select the Timeline tab in the data collection wizard and enter a list of the usernames to collect - full instructions here). This will collect approximately the 3200 most recent tweets from each account, no matter how old the tweets are.
No - Mozdeh is restricted by Twitter to tweets that are up to two weeks old. You would have to buy this data from a Twitter data reseller, such as Gnip or DataSift, or sign up for commercial analytics services, such as Pulsar.
For keyword searches, Mozdeh gathers 100% of tweets provided that (a) they are not older than 1 week, (b) the volume of search results is not so big that Mozdeh is rate limited by Twitter and cannot get them all, and (c) they are not filtered out by Twitter for being spam, duplicates or duplicate retweets. In practice, this means that Mozdeh will get close to 100% of relevant tweets unless you use hundreds of queries or your topic is a major news story or Twitter event that gets tens of thousands of tweets. Twitter sets a 1% limit for its service that supplies all tweets rather than tweets matching a query - this does not apply here.
You can test this yourself by running a pilot test in Mozdeh with a single rare keyword (e.g., thelwall works well) and comparing the tweets that you get from Mozdeh with the tweets listed by the Twitter search service. For the tweets that are up to 1 week old, there should be a 100% overlap, except that the live Twitter search includes #thelwall results and Mozdeh does not (so if you want them you would have to search for both thelwall and #thelwall in Mozdeh).
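The overlap check in this pilot test can be done by hand, but a minimal Python sketch may help if you have many tweets. It assumes you have already gathered two lists of tweet IDs yourself: one from the Mozdeh raw data file and one noted down from the Twitter search results. The function name `overlap_percentage` is hypothetical and is not part of Mozdeh.

```python
def overlap_percentage(mozdeh_ids, reference_ids):
    """Return the percentage of reference tweet IDs that also appear in Mozdeh's data."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference_ids must contain at least one tweet ID")
    # IDs present in both collections
    found = reference & set(mozdeh_ids)
    return 100.0 * len(found) / len(reference)


if __name__ == "__main__":
    # Illustrative IDs only, not real data
    from_mozdeh = ["101", "102", "103"]
    from_twitter_search = ["102", "103", "104", "105"]
    print(overlap_percentage(from_mozdeh, from_twitter_search))
```

Any IDs missing from the Mozdeh list are worth checking individually, for example to see whether they only match the hashtag form of the keyword.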
Yes - to see this file, select Show Reports in Folder from the Analyse menu, then navigate to the main project folder for your project and then open the "raw data" folder. This contains a plain text file with the tweets and can be loaded into a spreadsheet for analysis.
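If you would rather read the file with a script than a spreadsheet, a minimal Python sketch follows. It assumes the raw data file is tab-separated with one tweet per line, which you should confirm by opening your own file first; the function name `load_raw_tweets` and the `delimiter` parameter are hypothetical and not part of Mozdeh.

```python
import csv
from pathlib import Path


def load_raw_tweets(path, delimiter="\t"):
    """Read a raw data file into a list of rows, each row a list of column strings.

    Assumption: columns are separated by `delimiter` (tab by default).
    Quote characters are treated as ordinary text, since tweets often contain them.
    """
    with Path(path).open(newline="", encoding="utf-8", errors="replace") as handle:
        reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        return [row for row in reader]


if __name__ == "__main__":
    import sys

    # Usage: python load_tweets.py <path to your raw data file>
    if len(sys.argv) > 1:
        rows = load_raw_tweets(sys.argv[1])
        print(len(rows), "lines read")
```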
If Mozdeh crashes or there is a power cut then the data is not lost but will be saved in a file called something like TwitterSearches_Tweets.txt and stored within a subfolder of the project folder within moz_data called raw data. So the full path for the file might be something like c:\rss_data\SNP MP test\raw data\TwitterSearches_Tweets.txt. This will contain all of the data except perhaps a few tweets from the last few minutes before the crash. If you restart Mozdeh and select the project then it will process this file as a normal project. See below if you want to merge projects from two or more TwitterSearches_Tweets.txt files due to Mozdeh crashes or power cuts. You can only merge projects after the data collection has finished.
To create a single combined Mozdeh project from more than one set of posts collected by Mozdeh, download Webometric Analyst, start it and close the startup Wizard. From the main search interface, select the Text menu, the Merge files submenu and the option Merge any number of text files (simple consecutive merge, no checking). When asked, reply Yes to the question about ignoring header lines after the first one. These files are inside the raw data subfolder of the main project folders in moz_data. So the full path for the file might be something like c:\moz_data\SNP MP test\raw data\TwitterSearches_Tweets.txt. Select all the different Mozdeh TwitterSearches_Tweets.txt files, one at a time, and choose any name for the merged file. Once the merged file is ready, export it back to Mozdeh using the Webometric Analyst button Convert Twitter Files to Mozdeh Format in the Twitter tab on the main interface. After this, start the new project in Mozdeh and it will process it.
No. When the computer goes into sleep (or hibernate) mode, all programs stop running, including Mozdeh, so it would not be able to collect any data until it is brought out of sleep/hibernate mode. When Mozdeh is woken from sleep it can carry on as normal without a warning that it was asleep.
Mozdeh will work with the screen turned off, so it is safe to configure your computer power management settings to switch the display off after 30 minutes of inactivity as long as “never” is set as the time period before going into sleep/hibernation mode (the “Put the Computer to sleep:” Power Option setting).
From the File menu in Mozdeh, select Add country code to raw tweet file and then select the file of original tweets - timezone information will be added as an extra column, when known.
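For readers curious about what a simple consecutive merge that ignores later header lines does, here is a minimal Python sketch. It is an illustration only and is not Webometric Analyst's own code: the function name `merge_raw_files` is hypothetical, and it assumes each file starts with exactly one header line and is UTF-8 text. The merged file would still need converting to Mozdeh format with Webometric Analyst as described above.

```python
from pathlib import Path


def merge_raw_files(input_paths, output_path):
    """Concatenate text files in the given order, keeping only the first file's header line."""
    with Path(output_path).open("w", encoding="utf-8", newline="") as out:
        for file_index, path in enumerate(input_paths):
            with Path(path).open(encoding="utf-8", errors="replace", newline="") as src:
                for line_number, line in enumerate(src):
                    if file_index > 0 and line_number == 0:
                        continue  # skip the repeated header of later files
                    # make sure a file without a final newline does not run into the next one
                    out.write(line if line.endswith("\n") else line + "\n")
```

No duplicate checking is attempted, matching the "no checking" wording of the menu option.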
It is tricky to create a subproject with a specified date range, but you can create a new project from the old data and select a date range when doing so.
For this one, you will need to use the original project, not a date filtered new project. After opening the original project, select "Make new raw data file with date restrictions" from the data menu and choose the filtered raw data file and the date range you want. This will save a date-restricted copy of the raw data file in the same place as the original one.
Now when you select the network creation option select the new filtered raw data file rather than the original one and you should get a network with data only from these dates.
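Conceptually, a date-restricted copy of the raw data is just a row filter. The Python sketch below illustrates the idea and is not how Mozdeh implements it. Everything specific here is an assumption to check against your own file: that the rows have already been read into lists of columns with a header row first, which column holds the date (`date_column`), and how the date is written (`date_format`, a `strptime` pattern). The function name `filter_rows_by_date` is hypothetical.

```python
from datetime import datetime


def filter_rows_by_date(rows, date_column, date_format, start, end):
    """Keep the header row plus every row whose date lies between start and end (inclusive).

    rows: list of rows (lists of strings), header row first.
    date_column: zero-based index of the date column (assumption: check your file).
    date_format: strptime pattern for that column (assumption: check your file).
    """
    if not rows:
        return []
    kept = [rows[0]]  # always keep the header
    for row in rows[1:]:
        try:
            stamp = datetime.strptime(row[date_column], date_format)
        except (IndexError, ValueError):
            continue  # skip rows with a missing or unparseable date
        if start <= stamp <= end:
            kept.append(row)
    return kept
```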
The original link is not saved anywhere, but if the tweet is still live then you can find it by adding the tweet Entry ID (second column of raw data file) to the end of the URL https://twitter.com/statuses/ and it should redirect to the standard URL. For example,
https://twitter.com/statuses/862336612591689733 redirects to https://twitter.com/CamOpenAccess/status/862336612591689733
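If you need links for many tweets, the same rule is easy to script. This minimal Python sketch only builds the short URL from an Entry ID; it does not follow the redirect or check that the tweet is still live. The function name `tweet_url` is hypothetical.

```python
def tweet_url(entry_id):
    """Build the short status URL for a tweet from its Entry ID."""
    entry_id = str(entry_id).strip()
    if not entry_id.isdigit():
        raise ValueError("Entry ID should contain digits only: %r" % entry_id)
    return "https://twitter.com/statuses/" + entry_id


if __name__ == "__main__":
    # The Entry ID from the example above
    print(tweet_url("862336612591689733"))
```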
The gender information is only in the interface version, sorry. Once it has been examined in the interface, the gender information is saved in a file called genderinfo.txt in the main project folder, which matches up with the ID numbers, in case this helps. The same is true for sentiment in the "Item and Feed IDs" folder. There is an "Add country code" option in the File menu of Mozdeh, which might be useful.
Yes - click the Make copy of project button towards the bottom of the data collection screen. You will need to start a second copy of Mozdeh, which can run at the same time as the first one, and open the project copy to see it.
Yes, but it is a bit tricky.
If a tweet is recorded by Twitter as a reply to @user1 but @user1 is not mentioned in the tweet then Mozdeh adds REPLY @user1 to the start of the tweet at data collection time so that these replies can be included in network analyses.
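The rule can be pictured with a minimal Python sketch. This is an illustration of the behaviour described, not Mozdeh's own code: the function name `add_reply_marker` is hypothetical, and the exact spacing after the marker and the case-insensitive username match are assumptions.

```python
import re


def add_reply_marker(text, reply_to_user):
    """Prefix 'REPLY @user' when the tweet replies to a user it does not mention."""
    if not reply_to_user:
        return text  # not recorded as a reply, so leave the tweet unchanged
    # Match @user as a whole username, so @user1 does not match inside @user12
    mention = re.compile(r"@" + re.escape(reply_to_user) + r"\b", re.IGNORECASE)
    if mention.search(text):
        return text  # the user is already mentioned in the tweet text
    return "REPLY @" + reply_to_user + " " + text


if __name__ == "__main__":
    print(add_reply_marker("Thanks for the link!", "user1"))
```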