10 OPERATOR’S DIARY
A backup you have never restored is just a hope with a file size.
Everyone has backups. I have lost count of the organisations that told me, with complete confidence, that they were covered, right up until the moment they needed to restore and found that the backup was incomplete, corrupt or had been quietly failing for months. A backup is not disaster recovery. Disaster recovery is the proven ability to get the business running again, where proven is the word doing all the work.
The uncomfortable thing about disaster recovery is that you only find out whether you had it on the day you needed it, which is the worst possible day to be finding out. Everything before that is an assumption, and assumptions are exactly what the bad day exists to test.
01 RPO and RTO, the two numbers the business owns.
Disaster recovery starts with two questions and neither of them is technical. The first is how much data you can afford to lose, measured as time. If everything fell over right now, could you survive losing the last hour of work, the last five minutes or none of it at all? That tolerance is your recovery point objective. The second is how long you can afford to be down before the damage turns serious: an hour, a day, three days. That is your recovery time objective. Everything else in recovery is engineering in service of those two numbers.
The mistake I see most is treating them as figures for the technical team to invent. They are business decisions. Only the business can say what an hour of lost orders or a day of downtime actually costs, and those numbers are what should drive every technical choice that follows. Set them honestly and the rest of disaster recovery becomes a solvable engineering problem. Skip them and you are buying protection against risks nobody ever priced, which is how organisations end up both over-spending and under-protected at the same time.
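To make the two numbers concrete, here is a minimal sketch in Python of holding a backup schedule up against them. The class, the function names and the figures are illustrative assumptions, not taken from any real system. The worst-case model is deliberately simple: it assumes the failure lands just before the running backup completes, so the newest usable copy is the previous one.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Targets the business sets; the names here are illustrative."""
    rpo: timedelta  # most data the business can afford to lose, as time
    rto: timedelta  # longest the business can afford to be down


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta) -> timedelta:
    """Assumed model: the failure lands just before the running backup
    completes, so the newest usable copy was started a full interval
    plus one backup duration ago."""
    return backup_interval + backup_duration


def check_objectives(objectives: RecoveryObjectives,
                     backup_interval: timedelta,
                     backup_duration: timedelta,
                     measured_restore_time: timedelta) -> dict[str, bool]:
    """Compare the schedule, and a restore time measured in a drill,
    against the business targets."""
    loss = worst_case_data_loss(backup_interval, backup_duration)
    return {
        "rpo_met": loss <= objectives.rpo,
        "rto_met": measured_restore_time <= objectives.rto,
    }


# Hypothetical figures: the business tolerates one hour of lost work and
# four hours of downtime; backups run nightly and take thirty minutes;
# the last drill restored everything in three hours.
targets = RecoveryObjectives(rpo=timedelta(hours=1), rto=timedelta(hours=4))
result = check_objectives(
    targets,
    backup_interval=timedelta(hours=24),
    backup_duration=timedelta(minutes=30),
    measured_restore_time=timedelta(hours=3),
)
print(result)  # {'rpo_met': False, 'rto_met': True}
```

In this made-up case the nightly backup cannot honour a one-hour recovery point, however quickly the restore runs. That is the conversation to have with the business before anyone buys anything.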
02 The backup that does not restore.
A backup nobody has ever restored is not a safety net. It is a belief. The failure modes are mundane and common: the job that silently stopped working months ago, the snapshot that completes every night but cannot actually be restored, the database backup missing the one table that mattered. None of these announce themselves. They wait, politely, for the worst day to introduce themselves to you.
Ransomware has made this sharper, because modern attacks go looking for your backups first. An attacker who encrypts your production and your backups together has taken away the one option that would have let you refuse to pay. This is why the old discipline still holds. Keep copies in more than one place, keep at least one of them somewhere the production environment cannot reach and corrupt, then above all test the restore. A restore you have actually performed, end to end and on a realistic schedule, is the only backup you are entitled to trust. The rest is filenames.
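One way to turn "test the restore" into something a machine can check is to record checksums when the backup is taken and compare them against what actually comes back. The sketch below assumes a plain file-level backup; the function names are my own invention, not any backup tool's API. A byte-for-byte match is a floor rather than a ceiling: it does not prove the application starts on the restored data.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def restore_problems(expected: dict[str, str],
                     restored: dict[str, str]) -> list[str]:
    """List every file that is missing or differs after a restore."""
    problems = []
    for name in sorted(expected):
        if name not in restored:
            problems.append(f"missing: {name}")
        elif restored[name] != expected[name]:
            problems.append(f"corrupt: {name}")
    return problems
```

In use, you would store `manifest(source_dir)` alongside the backup, restore into a scratch location, then call `restore_problems(stored_manifest, manifest(scratch_dir))`. Both directory names are hypothetical. An empty list means every file came back intact. Anything else is the finding you wanted in a drill rather than on the worst day.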
03 Recovery is a drill, not a binder.
The plan that sits in a document is worth very little on the day. What matters is whether the people and the systems have rehearsed the recovery itself: failed over to the standby, restored the database from cold, brought the dependencies back in the right order and discovered, in a drill rather than a crisis, that one service quietly depended on another nobody had mapped. Recovery has an order and a thousand small dependencies, and the only honest way to learn them is to run it.
The teams that recover fast are not the ones with the thickest binder. They are the ones for whom the recovery has become a practised routine, almost dull through repetition, because the first time you discover your runbook is wrong should never be while the business is down and the clock is running. A drill turns a plan you hope works into a procedure you know works, and that is the entire difference between the two on the day it counts.
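The ordering problem can at least be written down and checked before the drill. As a rough sketch, assuming you keep a map of which service needs which, Python's standard `graphlib` module will derive a start-up order and refuse a map with a loop in it. The service names are hypothetical. The map only knows what someone has told it, so a check like this complements the drill rather than replacing it.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical service map: each service lists what must be running before it.
DEPENDS_ON = {
    "database": set(),
    "cache": set(),
    "auth": {"database"},
    "api": {"database", "cache", "auth"},
    "web": {"api"},
}


def unmapped_dependencies(depends_on: dict[str, set[str]]) -> set[str]:
    """Services something depends on but nobody has written an entry for."""
    named = set().union(*depends_on.values())
    return named - set(depends_on)


def recovery_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return a start-up order in which every dependency comes first.

    Raises ValueError if the map is incomplete or circular.
    """
    missing = unmapped_dependencies(depends_on)
    if missing:
        raise ValueError(f"unmapped dependencies: {sorted(missing)}")
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as error:
        # graphlib reports the offending cycle as the second argument.
        raise ValueError(f"circular dependency: {error.args[1]}") from error


order = recovery_order(DEPENDS_ON)
print(order)  # one valid order; the database always precedes auth, api and web
```

Several orders can be valid for the same map, so treat the printed list as one acceptable sequence, not the only one.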
The first time you learn your runbook is wrong should be in a drill, never in a disaster.
04 Why recovery lets you run boldly.
Regulation has caught up with what good operators always knew. NIS2 expects essential and important entities to treat business continuity and backup management as a baseline duty,¹ and for financial entities DORA now makes operational resilience, including tested recovery, an explicit and audited requirement.² Both are asking for the same thing: not a promise to recover, but evidence that you have done it and can do it again.
And this is why I find it the opposite of a gloomy subject. Tested recovery is what lets you build without flinching. When you know, because you have proven it rather than assumed it, that you can be back on your feet inside your recovery time objective having lost no more than your recovery point objective of data, the prospect of running ambitious and critical systems stops being frightening. You have already met the worst day in a rehearsal and walked out of it intact. The confidence to push forward comes precisely from knowing, in detail, how you would come back.
QUESTIONS ON THIS PIECE
What readers tend to ask.
01 What is the difference between RPO and RTO?
Recovery point objective (RPO) is how much data you can afford to lose, measured as time: if everything failed now, could you live with losing the last hour of work, the last five minutes or none of it? Recovery time objective (RTO) is how long you can afford to be down before the damage is serious. Both are business decisions, not technical ones, and they should drive every recovery choice that follows.
02 How often should you test a disaster recovery plan?
Often enough that the team has muscle memory and recently enough that the test reflects the current systems. For critical services that usually means a real restore and failover exercise at least once or twice a year, plus a check whenever the architecture changes meaningfully. A plan tested once at launch and never again is a document, not a capability.
03 Why do backups fail when you need them?
Usually because nobody ever restored from them. Backup jobs stop silently, snapshots complete but cannot be restored, or one critical table is missing and no one notices until the worst day. Ransomware makes it sharper by targeting backups directly. The only backup you can trust is one you have actually restored, end to end, on a realistic schedule.
04 Does NIS2 require a disaster recovery plan?
NIS2 requires business continuity, including backup management and crisis management, as part of its baseline risk-management measures, so a tested recovery capability is effectively expected. For financial entities, DORA goes further and makes tested operational resilience an explicit, audited obligation. Both ask the same thing: not a promise to recover, but evidence that you can.
SOURCES
1. Business continuity, including backup management and crisis management, is part of the baseline cybersecurity risk-management measures. NIS2 Directive (EU) 2022/2555, Article 21: EUR-Lex.
2. DORA establishes digital operational resilience requirements, including tested recovery, for financial entities (in force since January 2023, applicable from 17 January 2025). Regulation (EU) 2022/2554: EUR-Lex.