A Primer on Data Archival and Retention


This is a copy of an article I recently wrote on my site. Original article here.


The world is getting more and more digitized. As our personal data grows in size and importance, we sometimes need software and hardware solutions to maintain and retain it. For those who aren't as familiar with archiving the data on desktop PCs, phones, and other digital devices, this article will hopefully make you aware that your data has a life expectancy and introduce the basic threats to data integrity. This article isn't necessarily intended for archivists, though if you consider yourself one you probably know all of the following information anyway. It's intended more as a guide to personal data protection and archival for those who are less technically familiar and want to make sure they're up to speed with the basics. My anxiety- and coffee-fueled write-ups are also occasionally useful when someone online asks for advice on a subject and I don't feel like writing a long paragraph. You may assume that protecting your data and backing it up is pretty obvious, but just as we have TV PSAs to remind us of the seemingly obvious, so too will I bring you a PSA. It's time we care for our data, because only you can prevent data loss!

First, let's recognize one of the most common and obvious loss vectors in a typical consumer scenario: hardware failure. To mitigate this, you should always have a backup if your data means anything to you. And I assume it does if you're reading this article. Backups help insure against total or partial loss due to accidental or malicious deletion or, more rarely, to what's called bit rot. The term is sometimes used loosely for an OS that gets "cluttered" over time, but here it means data decay: natural wear and tear, or the entropic forces that flip random bits on your hard drive platters or other storage device. This could be due to the influence of neighboring bits, temperature fluctuations within the device, or aliens from outer space. Some of those problems can be mitigated by using a PC with ECC RAM and a checksumming, redundant file system such as ZFS, if you care to delve into using a GNU-Linux OS. But this being only a primer, we'll not get too technical here.

Assuming you don't have a dedicated backup drive that's separate from your usual information acquisition device, i.e., PC, phone, etc., maybe I can influence you with free software that can read your storage device's SMART (health) status. One such SMART reader I can recommend is CrystalDiskInfo, due to its simple interface and its being free and open source. Download it, run it, and if the health status is yellow or red, I'd recommend backing up immediately. The SMART status isn't always telling of your device's health, though it can be a good indicator of gradual and largely predictable drive failure. That said, it's not unheard of for storage devices, hard drives in this case, to have sudden head crashes, so it's always good to have a backup on hand.
In the case of hard drive failure, I've read that many people have had success putting the drive in a moisture-proof bag, freezing it for anywhere from one to twenty-four hours, and then, once it's sufficiently cold, quickly hooking it up to a PC to transfer the data. Anecdotally, I've tried this once but with no luck. In the case of solid state drives, if you want more information about your drive's health, you can also download SSDLife, though I don't have experience with it as I don't have an SSD. Donations are welcome :).
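
Back to bit rot for a moment: flipped bits are silent, so one way to catch them is to record a checksum for every file while the data is known to be good and compare again later. Here's a minimal Python sketch of that idea, using only the standard library. The function names (file_sha256, build_manifest, find_changed) are made up for this illustration and aren't part of any tool mentioned above; it's a rough outline under those assumptions, not a finished backup tool.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_changed(root, manifest):
    """Return files that are missing or whose contents no longer match."""
    root = Path(root)
    changed = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or file_sha256(target) != expected:
            changed.append(rel_path)
    return changed
```

You'd run build_manifest on a folder right after backing it up, save the result somewhere safe, and run find_changed every so often. Keep in mind that a file you edited on purpose will also show up as changed, so this works best on things you don't expect to change, like photos.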

If you're using a phone or other portable device, it's good to frequently back up your data to another medium that you can easily access in the future, such as a USB flash drive, hard drive, solid state drive, or optical media such as CD, DVD, or Blu-ray. If using optical, you may prefer to pay more and buy archival quality discs. Discs stored in temperature- and humidity-controlled rooms with little UV exposure don't degrade much, and what worry there is applies mainly to typical home-burned media. But even in that case, in my experience, as long as you avoid sunlight, extreme temperatures, and extreme humidity, even cheap media can last ten or more years. Part of the resilience of commercial optical media is that it's pressed, so UV rays and moisture don't tend to be as big a factor as with home-burned media. With burned optical media, dyes are chemically changed when "burned". The dye on some of the cheaper media tends not to be as resilient as the dye on archival quality media.

But moving on into the present day, personal optical media storage has almost been made antiquated by storage devices that allow for more storage space. I assume it's also due to optical being largely eclipsed by online distribution for entertainment. Another practical backup solution is to use one of the many free cloud storage sites, which may give you anywhere from one gigabyte to twenty gigabytes or more of storage. Any more and you usually need to pay a typically small monthly fee. If you're using the cloud as backup and uploading sensitive data, I'd recommend encrypting your data before uploading if your storage provider doesn't already do it. Be sure to use a reputable, well-established company and not one that may go under in a year and take your data with it. Understandably, uploading many gigabytes of data isn't practical for everyone; if you have, like me, a highly asymmetrical connection with slow upload speed, you may be better off with a local storage solution.
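
To get a feel for whether a cloud backup is practical on a slow upload link, a bit of arithmetic helps. The Python sketch below estimates upload time from the backup size and the upload speed. The function name upload_hours and the 0.8 efficiency factor are my own assumptions for illustration (real connections rarely sustain their advertised speed), so treat the result as a ballpark figure only.

```python
def upload_hours(size_gb, upload_mbps, efficiency=0.8):
    """Estimate the hours needed to upload size_gb gigabytes.

    size_gb uses decimal gigabytes (1 GB = 8,000 megabits).
    efficiency is an assumed fraction of the advertised upload
    speed that is actually usable.
    """
    if upload_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("speed must be positive and efficiency in (0, 1]")
    megabits = size_gb * 8000
    seconds = megabits / (upload_mbps * efficiency)
    return seconds / 3600
```

For example, upload_hours(500, 5) comes out to roughly 278 hours, or about eleven and a half days of continuous uploading, which is the kind of number that makes a local drive look attractive.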


Another loss vector you should already be aware of is malicious activity. This in itself is a broad vector, Victor (Airplane! reference for the cinematically challenged). The main cause here would likely be the scoundrel that we all know and despise, the virus. You can become infected by doing anything from browsing the web, opening an email attachment, or downloading and executing non-reputable (or, in fewer cases, reputable) software, to plugging in a flash drive that has malicious auto-run software on it. Those are a few of the many, but most likely, ways you could become infected. A good antimalware and antivirus program will help here, as will being cautious in your online activities. Windows Defender and Windows Firewall are decent players in the game, and they're free if you're using Windows. Personally, I also use Malwarebytes Pro as well as TinyWall, which is a more workable supplement to the built-in Windows Firewall. These are pretty decent solutions for Windows users, and if you use GNU-Linux, you're probably already aware of your options, not that you need any of them (pro-GNU-Linux joke if you don't get it, not that it's necessarily true). But with software firewalls, you're better off running only one, as two can interfere with each other, as is occasionally the case with antimalware and antivirus software. A good supplement to a software firewall is a hardware firewall such as the one likely built into your Wi-Fi router or modem. Beyond the firewall itself, a NAT (Network Address Translation) enabled device can help minimize the attack surface of the devices behind it, wired or wireless, by not exposing them directly to the internet. These are all good precautions for keeping your immediate OS data preserved, but anything archival-worthy shouldn't stay attached to your everyday system except while you're backing up to it.

Other malicious programs that can threaten data integrity are rootkits. These embed deep within the OS, usually masking themselves as valid processes or not making themselves known in any way to a cursory inspection of the OS. They can also reside deeper in the machine, such as in your storage drive's MBR (Master Boot Record) or, even more rarely, in the drive's firmware, your PC's BIOS, or other embedded chips where information can be stored. The latter I've only heard of as proof of concept; they're very unlikely to be seen in the wild and are usually less practical to implement. If you suspect you may have a rootkit, or even if you don't, it's sometimes good to run one or more of the many free rootkit removers out there. But before I leave this topic, you may be interested to know that Sony, yes, that one, was responsible for shipping copy-protected music CDs that installed rootkits on users' PCs as a means of DRM (Digital Rights Management), a scandal that came to light in 2005. Since then I'm hoping most companies have wised up and realized that your information is like your house: it shouldn't be intruded upon unless permission is granted.

Last of all, let's not forget user incidents or, more appropriately, accidents. Just as with network security, you may be one of the biggest threats to yourself and not realize it. If you find you've accidentally deleted data of value, you may be in luck. There are many free options out there to undelete and hopefully recover your data. For a recommendation, TechRadar has a good recent review. Assuming you've already emptied your recycle bin, you'll stand a good chance of recovering your data as long as you haven't written much to the drive since, because on a hard drive deleted files usually remain on disk until new data overwrites them. This issue, like most of the issues mentioned previously, shouldn't be an issue if you have current backups. As with most other things, the saying rings true: don't put all your eggs in one basket. If you can, store your data in more than one location, and if you can afford it, a small fireproof, waterproof safe is a good place for one of the copies.
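
The "more than one basket" advice can be sketched in a few lines of Python: copy a file to several destinations and confirm that each copy matches the original byte for byte. The function name back_up and the example folder names are hypothetical, chosen only for this illustration; the standard library calls it relies on (shutil.copy2 and filecmp.cmp) are real. Take it as a starting point under those assumptions rather than a complete backup routine.

```python
import filecmp
import shutil
from pathlib import Path


def back_up(source_file, destination_dirs):
    """Copy source_file into each destination folder and verify each copy.

    Returns the list of verified copy paths. Raises OSError if any
    copy does not match the original byte for byte.
    """
    source_file = Path(source_file)
    copies = []
    for directory in destination_dirs:
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        copy_path = directory / source_file.name
        shutil.copy2(source_file, copy_path)  # copy2 also keeps timestamps
        # shallow=False forces a comparison of the actual file contents
        if not filecmp.cmp(source_file, copy_path, shallow=False):
            raise OSError(f"verification failed for {copy_path}")
        copies.append(copy_path)
    return copies
```

In practice the destinations would be on different physical devices, say an external drive and a network share, since two folders on the same disk are still one basket.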

What I've discussed is just the basics of data protection and backup. As with most subjects in technology and elsewhere, things can get pretty technical if need be. Luckily, my needs in this topic haven't required me to get very technical. I'm not too proud to admit that I consider myself somewhat of a digital packrat. If I come across an article that I find interesting, I feel the sinister soul of entropy trying to corrupt information that someone may have put a lot of work into acquiring, and if relevant information isn't saved appropriately, I feel it as a loss to humanity, since we'd have no means to remember our past triumphs and failures. I do realize that information overload can be counterproductive to a healthy future and that a lot of the information we process and create online may be redundant. But with a packrat mentality, it's not always done with reason; sometimes it's for perhaps selfish, nostalgic purposes. Ever watch the show Hoarders? The guy that collects newspaper articles and books, that's essentially my mentality in digital form. The hoarders who leave food out and have roaches in their house, I'm as disgusted by their houses as most people are, but as long as it's a more biologically healthy clutter, I can relate pretty well to that mentality.

That said, for the sake of the knowledge of future generations, copy that floppy! (I jest, somewhat.)




This article brought to you by a lot of anxiety mixed with a pinch of coffee. I need to quit. *sigh* (Update: I did quit for more than two months, but the anxiety and chest pains, if anything, got worse. Still, I realize I'm fortunate compared to some in that coffee is my only vice.)
