It doesn’t take long to amass a huge collection of data when you’re regularly creating new documents, photos, videos, and other content. If you’re like the countless individuals and business owners who realize the power of data, you’re probably making a dedicated effort to keep that information protected. Having a well-thought-out disaster recovery strategy is highly recommended, but did you know that you are often creating “unintentional backups” of your data, copies that survive even after you delete the original? It’s true.
Not all data is sent to that informational graveyard in the sky once you click or tap the delete button. Data is more persistent than you ever imagined, and may reside on desktops, remote servers, and various storage devices in some form long after it’s been wiped at the surface. Whether it’s lurking at the local level or somewhere out in the vast web space, there are quite a few ways to track down that misplaced data.
1. Restoring Deleted Web Pages with Google Cache
Google Search has an arsenal of useful features. The cache is one lesser-known function that website owners find valuable for recovering web content. Google keeps a cached copy of most indexed web pages on its servers. While the feature is designed to help you view pages that are temporarily unavailable, whether due to a busy server or downtime, it can also pull up pages that have been removed by the publisher. Here’s a quick overview of how it works:
1. Enter “site:yourdomain” in Google, replacing “yourdomain” with your own domain name (for example, “site:example.com”).
2. Google will show you a list of all your pages that have been indexed and stored in the cache.
3. Find the page you want to recover and click the “Cached” link beside the listing.
4. If all goes well, you’ll see the version of your page as it looked the moment it was crawled by the Google bot.
5. Right-click the page in your browser and select “View Page Source”.
6. Lastly, copy the HTML from the page source, paste it into a new file, and save that file in the website directory where you want the page to appear.
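The first and last steps above can be sketched in Python. This is a minimal illustration rather than any official tooling: the function names, the domain example.com, and the file paths are hypothetical, and the HTML is assumed to have been copied by hand from the cached page’s source view.

```python
from pathlib import Path
from urllib.parse import urlencode


def build_site_query_url(domain: str) -> str:
    """Build the search URL for the 'site:' query described in step 1."""
    # urlencode percent-escapes the colon in "site:example.com".
    return "https://www.google.com/search?" + urlencode({"q": f"site:{domain}"})


def save_recovered_page(html: str, site_dir: str, relative_path: str) -> Path:
    """Write recovered HTML into the website directory (step 6)."""
    target = Path(site_dir) / relative_path
    # Recreate any missing sub-folders so the page lands at its old path.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(html, encoding="utf-8")
    return target


if __name__ == "__main__":
    print(build_site_query_url("example.com"))
    # Hypothetical HTML and paths, for illustration only.
    saved = save_recovered_page("<html><body>Recovered</body></html>",
                                "recovered_site", "blog/old-post.html")
    print(f"Saved to {saved}")
```

Writing the file with an explicit UTF-8 encoding helps avoid garbled characters when the recovered page contains non-ASCII text.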
Google essentially stores a snapshot of each page’s HTML as it was delivered to the crawler, so with as little info as your domain name, it’s possible to use the cache as an alternative method of recovering lost web pages and content. The same approach can be used with most other search engines that offer cached copies. Keep in mind that the cache holds only the rendered output of a page, so this method generally won’t recover server-side scripts (PHP, ASP, etc.) and may call for additional steps if you’re trying to recover dynamic content stored in a database like MySQL.
2. Enter the Wayback Machine
The Google cache feature offers a very convenient way to recover web content once thought long gone. However, it’s a multi-step process, so if you prefer something a bit simpler, the Wayback Machine may be your huckleberry. The Wayback Machine is an online database that has been archiving web pages, complete with files, graphics and all, since 1996. Simply enter the URL of the page you’re looking for and, if it has been archived, chances are you’ll find it. With nearly 400 billion web pages saved, it’s estimated that this massive archive has hundreds of terabytes worth of data behind it.
Though the two share similarities, the Wayback Machine has an edge on Google’s cache. Google keeps only its most recent snapshot of a page, whereas this database can hold many captures of the same page and tells you exactly when each one was archived. That specificity makes the Wayback Machine a valuable tool for applications beyond restoring a deleted page from your website. A web designer could take a look at the structure of a client’s old website to get a better idea of what worked and what didn’t when creating a new one. Companies can use it to conduct deep competitive analysis of a rival firm’s site structure, content, and SEO strategy. It’s a powerful resource in the right hands.
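For anyone who would rather script the lookup than use the search box, here is a minimal sketch built around the Internet Archive’s public availability endpoint. The function names are hypothetical, and the sample response is an illustrative example of the JSON shape rather than live data. The block only builds the request URL and parses a response, so it runs without a network connection.

```python
import json
from datetime import datetime
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build a lookup URL; timestamp is an optional YYYYMMDD[hhmmss] string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return (snapshot_url, capture_time) from a response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    # Capture timestamps use the form YYYYMMDDhhmmss.
    captured = datetime.strptime(closest["timestamp"], "%Y%m%d%H%M%S")
    return closest["url"], captured


if __name__ == "__main__":
    print(build_availability_url("example.com", "20060101"))
    # Illustrative response body, not fetched from the live service.
    sample = json.dumps({"archived_snapshots": {"closest": {
        "available": True,
        "url": "http://web.archive.org/web/20130919044612/http://example.com/",
        "timestamp": "20130919044612",
        "status": "200"}}})
    print(closest_snapshot(sample))
```

To query the live service, the URL from build_availability_url could be fetched with urllib.request.urlopen and the response body passed to closest_snapshot. The parsed capture time is the detail that lets you pick the version of a page from a particular date.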
3. Meticulous Discovery with Metadata
In the hunt for missing information, sometimes all you have is a small sample of data to work with. If it’s metadata, then you might be in luck. Best described as data about data, metadata is information designed to make specific informational sources easier to understand. As it relates to the web, metadata can be used to describe images, videos, and other content on a website. This information is also taken into consideration to help determine relevance when a user goes to search for something in Google, Bing, or Yahoo!
Metadata is all around us, being created manually and automatically as we interact with technology on a daily basis. When you create and save a file in Microsoft Word, for instance, you also generate various pieces of metadata to go along with it. When a file is deleted, the typical operating system doesn’t erase its contents right away. It merely removes the reference to the file’s location and marks the space as available, which is much faster than overwriting the data. Until that space is reused, the contents and much of the metadata remain on the disk. By building your investigative efforts around metadata such as author name, file permissions, and date created, it’s possible to pinpoint the true location of the lost files on your system and recover them.
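To make the idea concrete, here is a small sketch that reads file system metadata and filters files by it. It assumes the files still exist somewhere on disk (misplaced rather than erased); recovering truly deleted files calls for dedicated recovery tools that read the file system’s own records. The function names are hypothetical, and the modification time stands in for the creation date because creation time is not reported consistently across operating systems.

```python
import os
import stat
from datetime import datetime
from pathlib import Path


def describe_file(path):
    """Collect basic file system metadata for a single file."""
    info = os.stat(path)
    return {
        "name": Path(path).name,
        "size_bytes": info.st_size,
        # Permission string in the style of 'ls -l', e.g. '-rw-r--r--'.
        "permissions": stat.filemode(info.st_mode),
        "modified": datetime.fromtimestamp(info.st_mtime),
    }


def find_by_metadata(root, suffix, modified_after):
    """Find files under root with the given suffix, changed since a date."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(suffix.lower()):
                continue
            full_path = os.path.join(dirpath, name)
            record = describe_file(full_path)
            if record["modified"] >= modified_after:
                matches.append({**record, "path": full_path})
    return matches


if __name__ == "__main__":
    # Hypothetical search: Word documents changed since the start of 2024.
    for hit in find_by_metadata(".", ".docx", datetime(2024, 1, 1)):
        print(hit["path"], hit["modified"], hit["permissions"])
```

Document-level details such as the author name live inside the file itself rather than in the file system, so reading them would take a format-specific parser on top of this.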
Digital Data Dies Hard
Most data is so persistent that unless it’s completely destroyed or someone takes the necessary steps to make sure it’s permanently deleted, it can usually be recovered one way or another. Whether you’re a member of law enforcement, an IT administrator, or just a bumbling computer geek, that persistence is extremely handy when it comes time to break out the forensics tools and go all CSI with your recovery efforts.
For more information on forensic and granular recovery, learn about StorageCraft Granular Recovery for Exchange.
Photo Credit: Danja Vasiliev via Flickr