A blog post by our Technical consultant Andries Faassen
For a long time, backup has meant tape. With tape we used a backup schedule and versioning/retention scheme often called "Grandfather, father, son". It was a retention scheme based on keeping copies of your backup at ever-growing intervals. This was done to reduce tape usage and the costs directly associated with it, while still providing a reasonable restore capability.
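To make the idea concrete, here is a minimal sketch of how such a rotation could decide which copies to keep. The counts (7 daily "sons", 4 weekly "fathers", 12 monthly "grandfathers") and the function names are assumptions for illustration only; real backup products implement their own variants of this scheme.

```python
from datetime import date, timedelta


def newest(dates, count):
    """Return the `count` most recent dates from a sorted list."""
    return dates[max(0, len(dates) - count):]


def gfs_keep(backup_dates, sons=7, fathers=4, grandfathers=12):
    """Return the set of backup dates a simple GFS rotation would retain.

    All three counts are illustrative defaults, not values from any product.
    """
    dates = sorted(set(backup_dates))

    # Sons: the most recent backups, one per day.
    keep = set(newest(dates, sons))

    # Fathers: the newest backup in each of the most recent ISO weeks.
    per_week = {}
    for d in dates:
        per_week[tuple(d.isocalendar())[:2]] = d  # later dates overwrite earlier ones
    keep.update(newest(sorted(per_week.values()), fathers))

    # Grandfathers: the newest backup in each of the most recent months.
    per_month = {}
    for d in dates:
        per_month[(d.year, d.month)] = d
    keep.update(newest(sorted(per_month.values()), grandfathers))

    return keep


# One backup per day for all of 2024 (hypothetical data).
all_backups = [date(2024, 1, 1) + timedelta(days=i) for i in range(366)]
kept = gfs_keep(all_backups)
print(f"{len(kept)} of {len(all_backups)} backups retained")
```

With these example numbers, a year of daily backups shrinks to 20 retained copies, and the gaps between them grow the further back you go. That is exactly the tape-saving effect the scheme was designed for.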
What is a backup, if you ask me? In principle, a backup is meant for disaster recovery: it provides a point-in-time copy of all your data, to be used in the event that disaster strikes and your company’s data suddenly disappears overnight. With ever-growing threats such as crypto lockers and data corruption, having a copy of your data in a separate, safe "vault" has become more important than ever.
The same backup system traditionally also gets “abused” as a data archive, but in principle this is not what a disaster recovery backup is intended for. Recently we’ve started to see that trend change, though: the practice of classifying data and archiving it using software tools has gained more and more traction over the last decade. Data is classified better and stored at more appropriate locations.
Getting back to backups: the available techniques for making backups have changed, especially in the last 5 years. The main backup target has moved from writing directly to tape, to staging on disk or a virtual tape library (VTL), to using deduplication pools, which can efficiently hold large numbers of backup copies without needing the 1:1 extra space (or tapes) that was needed before. This has also opened the door to replicating data off-site (to a second location or the cloud) in an efficient manner.
Because of that, I invite everyone reading this blog post who is already using newer techniques such as deduplication to look not only at how or when (the schedule) they back up, but also to take a critical look at their retention policy for backups.
Is the “traditional” Grandfather-father-son retention scheme still the best solution for your data?
Although the “traditional” schema still works on modern storage such as deduplication pools, the corporate landscape with regard to data retention, and the time you need (or are allowed) to keep your data, is rapidly changing.
The “traditional” schema does not allow for the incremental changes necessary in the current corporate landscape. Corporate or government policies now often dictate how long you are required to keep certain data and, maybe even more important, after how long certain data has to be removed. For example, starting in 2018 you will need to honor requests to delete the data you hold about a person.
As mentioned above, your backup should not be used as an archive. Archiving software is much better at performing this task. But what happens when you are using archiving software, and it cleanly deletes the data in its archives as mandated but you have backup copies that are kept for a year or sometimes even indefinitely?
Your company will still be out of compliance, even while using archiving software.
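The arithmetic behind that conclusion is simple enough to sketch. The function names, dates and deadlines below are hypothetical and only illustrate the reasoning; the actual deadlines depend on the policy or regulation that applies to you.

```python
from datetime import date, timedelta


def last_copy_expires(deleted_on, backup_retention_days):
    """Date on which the last backup that may still hold the data expires.

    Assumes a backup taken on the deletion date can still contain the data.
    Returns None when backups are kept indefinitely.
    """
    if backup_retention_days is None:
        return None
    return deleted_on + timedelta(days=backup_retention_days)


def is_really_gone_in_time(deleted_on, must_be_gone_by, backup_retention_days):
    """True only if every backup copy has expired by the deadline."""
    expiry = last_copy_expires(deleted_on, backup_retention_days)
    return expiry is not None and expiry <= must_be_gone_by


# Hypothetical case: the archive deletes a record on 1 January,
# and policy says it must be gone everywhere by 1 March.
deleted = date(2024, 1, 1)
deadline = date(2024, 3, 1)

print(is_really_gone_in_time(deleted, deadline, backup_retention_days=365))   # yearly retention
print(is_really_gone_in_time(deleted, deadline, backup_retention_days=30))    # 30-day retention
print(is_really_gone_in_time(deleted, deadline, backup_retention_days=None))  # kept indefinitely
```

Only the short retention period passes the check. With a year of backup retention, or with backups that never expire, the data outlives the deadline no matter how cleanly the archive deleted it.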
A different schema for keeping backup data
Because of this, and because of new forms of backup storage, we see more and more companies switching to a retention schema that does not keep backup data for very long periods of time. They rely on their archiving tools to keep their data and on the backup for disaster recovery. Backups older than a few weeks hold no value for that purpose, since each full backup copy already holds all of the data (including the archives) each time the backup runs.
That way, you could think of a backup retention schema that does not keep backup data for years but uses the valuable deduplicated storage space more efficiently to keep recent copies of backup data. Some companies determine that they want to be able to restore from the last 6 months, while others go as far as saying that being able to go back 2 months is more than enough. In the event of a disaster you are more likely to restore the copy from last week, not one from 8 months ago.
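A retention schema like this boils down to a single rolling window. The sketch below assumes a 60-day window (roughly the "2 months" mentioned above) and daily backups; the function name and the numbers are illustrative, not a recommendation.

```python
from datetime import date, timedelta


def split_by_window(backup_dates, today, window_days=60):
    """Split backups into those inside the retention window and those to expire.

    `window_days` is an assumed example value; pick whatever your policy dictates.
    """
    cutoff = today - timedelta(days=window_days)
    keep = sorted(d for d in backup_dates if cutoff <= d <= today)
    expire = sorted(d for d in backup_dates if d < cutoff)
    return keep, expire


# One backup per day for all of 2024 (hypothetical data).
all_backups = [date(2024, 1, 1) + timedelta(days=i) for i in range(366)]
keep, expire = split_by_window(all_backups, today=date(2024, 12, 31))

print(f"keep {len(keep)} recent copies, oldest from {keep[0]}")
print(f"expire {len(expire)} older copies")
```

Compared with keeping widely spaced copies for a year or more, every retained copy is recent, which matches how restores actually happen after a disaster.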
This blog post is meant as a bit of food for thought. How does your current retention policy look? Are you keeping a lot of very old backup copies, using valuable deduplication or cloud storage space? Are you currently archiving old data and thus categorizing it to intelligently handle it instead of just storing everything?
In the current age of ever more rapidly expanding data, the common practices of yesteryear might warrant a rethink, and I believe your retention policy might be one of those.