Imageplus

06 · OPERATOR’S DIARY

Your data has debts that come due the moment you ask a question.

Most organisations sitting on a disappointing AI or analytics result do not have a data shortage. They have the opposite. The data is there in abundance and the answer still will not come out cleanly. What stands between the two is something I have learned to call data debt, because it behaves exactly like the financial kind: invisible while the work gets done, expensive the moment the bill is presented.

FIG · 00 · Every transaction leaves a clean record. Data debt is what builds up in the gaps between one system’s records and another’s.

Every time data gets captured in a hurry, named inconsistently, copied into a second system or left undocumented, the organisation quietly takes out a small loan. The work still gets done, so the loan never shows up anywhere. Then someone asks a question that crosses all of it at once and the debt comes due with interest.

01 What data debt really is.

It helps to borrow the software idea first. Technical debt is Ward Cunningham’s metaphor for the future cost of a shortcut taken today: you ship something quick and imperfect to move now, knowing you will pay to fix it later.¹ Data debt is the same bargain applied to the data itself rather than the code around it. A field that means slightly different things in two systems, a customer who exists three times under three spellings, a date stored as free text, a spreadsheet that quietly became the real source of truth: each was a sensible shortcut at the time and each is a balance you will eventually have to service.

The reason it earns its own name is that data debt is far harder to see than the code kind. A messy codebase announces itself daily to the people who work in it. Messy data sits quietly in tables that look perfectly fine row by row and only reveals itself the moment you try to add them up across a whole business.

02 How the debt piles up unseen.

It accumulates wherever data is captured by people who are busy doing something else. A sales team types a company name a little differently each time, so one customer fragments into five across the database. A process that was never given a proper system migrates into a shared spreadsheet, which works beautifully for the team that built it and is opaque to everyone else. Two departments each keep their own version of the truth, both correct in their own terms, quietly disagreeing with each other.

I have spent a great deal of time in discovery work surfacing precisely this. You go looking for the data behind a process and you find the real logic living in one analyst’s workbook, or in three systems that each insist they hold the authoritative customer record. None of it was negligence. It was a hundred reasonable decisions, each made under time pressure, each adding a little to the balance. The bill had simply not been presented yet, because nobody had asked the question that would present it.

03 Why AI makes the bill bigger.

This is the part that has changed and it is why the subject suddenly matters. For years an organisation could carry a great deal of data debt comfortably, because the questions it asked were narrow and a human quietly smoothed over the gaps in every report. Analytics and AI ask the broad questions, across every system at once, at a scale where nobody is smoothing anything. That is precisely when the debt is called in.

A model does not know that two customer records are the same person or that this field means net while that one means gross. It takes your contradictions as facts and returns a confident answer built on them. That is the honest meaning of a line we use a lot: the data is there, the answer is not. The data exists; the answer is being held hostage by the debt sitting between the data and the question. You cannot prompt your way past records that disagree with each other and you cannot build automation anyone will trust on a foundation no one has reconciled.

AI does not forgive data debt. It scales it confidently and hands you the result as an answer.

04 Paying it down without a moratorium.

The instinct, once you can see the debt, is to call a halt and clean everything first. That is almost always the wrong move and it is why so many data-quality initiatives quietly die: they get framed as a vast project with no clear end and no visible return, so they lose to everything more urgent. Debt is not cleared by stopping the business to fix it all at once. It is serviced.

The discipline that works is the one any treasurer would recognise: pay the highest-interest balance first. Find the handful of data problems standing between you and the questions genuinely worth answering, settle those and leave the rest documented and known. A duplicated customer record blocking your most valuable analysis is worth paying off this week. A messy field nobody ever queries can wait, as long as you know it is there. The goal is not pristine data, which no real organisation has. It is data you can trust exactly where trusting it matters, with plain honesty about where you cannot yet.

That is the possibility hiding inside an unglamorous word. An organisation that services its data debt deliberately, a little at a time and worst first, gets to keep saying yes to questions its competitors cannot answer at all, because their data still owes too much to be asked. The answer was always in there. Paying down the debt is simply how you finally get it out.

QUESTIONS ON THIS PIECE

What readers tend to ask.

01 What is data debt?

Data debt is the accumulated future cost of shortcuts in how data was captured, named, copied and documented. Like financial debt, each shortcut is a small loan taken to move faster now and the balance comes due later when you try to use the data for analytics or AI. The work still got done, which is why the debt stays invisible until a broad question forces every record to line up at once.

02 How is data debt different from technical debt?

Technical debt is about the code; data debt is about the data the code runs on. They are the same bargain (move fast now, pay to fix later) applied at different layers. Data debt is harder to spot, because messy code annoys the people working in it every day while messy data sits quietly in tables that look fine row by row and only betray themselves when you add them up.

03 Why does data debt block AI projects?

Because a model treats your data as fact. If the same customer exists three times or a field means different things in two systems, the model builds a confident answer on those contradictions rather than flagging them. AI asks broad questions across every system at scale, with no human smoothing the gaps, so it surfaces and amplifies debt that narrow human-run reporting used to hide. You cannot prompt your way past records that disagree.

04 How do you reduce data debt?

Not with a big-bang cleanup, which tends to stall. Service it like financial debt and pay the highest-interest balance first. Identify the few data problems standing between you and the questions actually worth answering, fix those and leave the rest documented and known. The aim is not pristine data but data you can trust where trusting it matters, with honesty about where you cannot yet.

WRITTEN BY
Philippe Kaivers

Founding partner, Imageplus. Advisory across AI strategy, governance and data, grounded in twenty years of systems in regulated production. Reads a database the way a CFO reads a balance sheet.

ABOUT THIS INSIGHT
Pillar
Operator’s diary
Published
1 June 2026
Read time
7 minutes · 1,180 words
SOURCES
1. Martin Fowler, Technical Debt. On technical debt as the future cost of a deliberate shortcut, the metaphor coined by Ward Cunningham. Data debt applies the same idea to the data rather than the code.
