<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[A.I.N.S.T.E.I.N]]></title><description><![CDATA[Artificial Intelligence Norms, Standards, Trust, Ethics, Integrity & Navigation]]></description><link>https://ainstein.sanjeevaniai.com</link><image><url>https://substackcdn.com/image/fetch/$s_!l2pF!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F195ee19e-3683-4fe4-8927-dad386e22f02_1024x1024.png</url><title>A.I.N.S.T.E.I.N</title><link>https://ainstein.sanjeevaniai.com</link></image><generator>Substack</generator><lastBuildDate>Thu, 09 Apr 2026 01:17:17 GMT</lastBuildDate><atom:link href="https://ainstein.sanjeevaniai.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[A.I.N.S.T.E.I.N.]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[suneeta@sanjeevaniai.com]]></webMaster><itunes:owner><itunes:email><![CDATA[suneeta@sanjeevaniai.com]]></itunes:email><itunes:name><![CDATA[A.I.N.S.T.E.I.N.]]></itunes:name></itunes:owner><itunes:author><![CDATA[A.I.N.S.T.E.I.N.]]></itunes:author><googleplay:owner><![CDATA[suneeta@sanjeevaniai.com]]></googleplay:owner><googleplay:email><![CDATA[suneeta@sanjeevaniai.com]]></googleplay:email><googleplay:author><![CDATA[A.I.N.S.T.E.I.N.]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[A note to my subscribers]]></title><description><![CDATA[When I started A.I.N.S.T.E.I.N., I did not know who would show up.]]></description><link>https://ainstein.sanjeevaniai.com/p/a-note-to-my-subscribers</link><guid 
isPermaLink="false">https://ainstein.sanjeevaniai.com/p/a-note-to-my-subscribers</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Tue, 07 Apr 2026 14:11:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!l2pF!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F195ee19e-3683-4fe4-8927-dad386e22f02_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When I started A.I.N.S.T.E.I.N., I did not know who would show up. Some of you paid. Some of you subscribed quietly. All of you mattered more than you know.</p><p>I am making a simple change today. Everything on A.I.N.S.T.E.I.N. is now free, and everyone who ever paid has free access for life. No action needed on your part.</p><p>I am also stepping back from the regular cadence for now. When I return, it will be with more clarity, more depth, and more of what actually matters to you as a practitioner navigating AI in the real world.</p><p>Thank you for being here early. 
That means everything.</p><p>See you soon.</p><p>Suneeta </p>]]></content:encoded></item><item><title><![CDATA[Full Code, Low Code, No Code: The AI Trust Gap Nobody Is Talking About]]></title><description><![CDATA[The Easier It Is to Deploy AI, the Harder It Is to Know What It Will Do]]></description><link>https://ainstein.sanjeevaniai.com/p/full-code-low-code-no-code-the-ai</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/full-code-low-code-no-code-the-ai</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Mon, 30 Mar 2026 14:03:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ATB5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab9e10c-a1e9-40a2-ad6d-1d1b6ff61dcb_2124x1022.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ATB5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab9e10c-a1e9-40a2-ad6d-1d1b6ff61dcb_2124x1022.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ATB5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab9e10c-a1e9-40a2-ad6d-1d1b6ff61dcb_2124x1022.png 424w, https://substackcdn.com/image/fetch/$s_!ATB5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab9e10c-a1e9-40a2-ad6d-1d1b6ff61dcb_2124x1022.png 848w, https://substackcdn.com/image/fetch/$s_!ATB5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab9e10c-a1e9-40a2-ad6d-1d1b6ff61dcb_2124x1022.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ATB5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab9e10c-a1e9-40a2-ad6d-1d1b6ff61dcb_2124x1022.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ATB5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab9e10c-a1e9-40a2-ad6d-1d1b6ff61dcb_2124x1022.png" width="1456" height="701" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bab9e10c-a1e9-40a2-ad6d-1d1b6ff61dcb_2124x1022.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:701,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3131024,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/191818571?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab9e10c-a1e9-40a2-ad6d-1d1b6ff61dcb_2124x1022.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ATB5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab9e10c-a1e9-40a2-ad6d-1d1b6ff61dcb_2124x1022.png 424w, https://substackcdn.com/image/fetch/$s_!ATB5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab9e10c-a1e9-40a2-ad6d-1d1b6ff61dcb_2124x1022.png 848w, 
https://substackcdn.com/image/fetch/$s_!ATB5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab9e10c-a1e9-40a2-ad6d-1d1b6ff61dcb_2124x1022.png 1272w, https://substackcdn.com/image/fetch/$s_!ATB5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab9e10c-a1e9-40a2-ad6d-1d1b6ff61dcb_2124x1022.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em><strong>Image created by AI</strong></em></p><p>Last week I wrote
about a New York bill that would restrict AI systems from providing professional advice in licensed fields. A few readers asked a sharp follow-up question: does the bill apply differently depending on how the AI system was built?</p><p>The answer might surprise you, and I wil&#8230;</p>
      <p>
          <a href="https://ainstein.sanjeevaniai.com/p/full-code-low-code-no-code-the-ai">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[New York Wants to Silence Your AI Chatbot. Here Is What That Actually Means. ]]></title><description><![CDATA[When Regulators Start Scoring What AI Systems Say, Not What Companies Promise]]></description><link>https://ainstein.sanjeevaniai.com/p/new-york-wants-to-silence-your-ai</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/new-york-wants-to-silence-your-ai</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Tue, 24 Mar 2026 14:03:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FplD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c718fbb-2db8-4b31-bf3a-bb43d0fcb12e_1988x1150.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FplD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c718fbb-2db8-4b31-bf3a-bb43d0fcb12e_1988x1150.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FplD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c718fbb-2db8-4b31-bf3a-bb43d0fcb12e_1988x1150.png 424w, https://substackcdn.com/image/fetch/$s_!FplD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c718fbb-2db8-4b31-bf3a-bb43d0fcb12e_1988x1150.png 848w, https://substackcdn.com/image/fetch/$s_!FplD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c718fbb-2db8-4b31-bf3a-bb43d0fcb12e_1988x1150.png 1272w, 
https://substackcdn.com/image/fetch/$s_!FplD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c718fbb-2db8-4b31-bf3a-bb43d0fcb12e_1988x1150.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FplD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c718fbb-2db8-4b31-bf3a-bb43d0fcb12e_1988x1150.png" width="1456" height="842" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1c718fbb-2db8-4b31-bf3a-bb43d0fcb12e_1988x1150.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:842,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3102752,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/191818105?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c718fbb-2db8-4b31-bf3a-bb43d0fcb12e_1988x1150.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FplD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c718fbb-2db8-4b31-bf3a-bb43d0fcb12e_1988x1150.png 424w, https://substackcdn.com/image/fetch/$s_!FplD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c718fbb-2db8-4b31-bf3a-bb43d0fcb12e_1988x1150.png 848w, 
https://substackcdn.com/image/fetch/$s_!FplD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c718fbb-2db8-4b31-bf3a-bb43d0fcb12e_1988x1150.png 1272w, https://substackcdn.com/image/fetch/$s_!FplD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c718fbb-2db8-4b31-bf3a-bb43d0fcb12e_1988x1150.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em><strong>Image created by AI</strong></em></p><div 
class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://ainstein.sanjeevaniai.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">A.I.N.S.T.E.I.N is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Yesterday I wrote about the shift from measuring organizations to measuring AI systems. Today, the New York State Legislature is proving why that shift is urgent.</p><p>A bill introduced by Senator Kristen Gonzalez would restrict AI systems from providing what lawmakers call &#8220;substantive responses&#8221; in fields that require professional licenses. Medicine, law, engineering, psychology, dentistry, nursing, and other regulated professions where incorrect guidance can cause serious harm.</p><p>Read that again carefully. The bill does not say &#8220;companies must have policies about what their AI says.&#8221; It says AI systems must not provide certain types of responses. The subject of the regulation is the machine, not the organization.</p><p>This is the shift happening in real time.</p><p><strong>What the bill actually does</strong></p><p>The proposal draws a line between general information and professional advice. An AI chatbot can share educational content about, say, symptoms of a condition or how a legal process generally works. What it cannot do is cross into substantive guidance that resembles what a licensed professional would provide. 
It cannot offer what looks like a medical diagnosis, a legal strategy, an engineering recommendation, or a psychological assessment.</p><p>The bill also includes a private right of action. That means individuals can sue companies if their AI systems provide restricted guidance. This is not a regulatory slap on the wrist. This is litigation exposure for every company deploying a customer-facing AI system in a licensed domain.</p><p><strong>Why this matters beyond New York</strong></p><p>If you are thinking &#8220;I do not operate in New York, this does not apply to me,&#8221; think again.</p><p>New York tends to set the template. When New York moved on financial regulation, the rest of the country followed. The same pattern is already forming with AI. Colorado&#8217;s AI Act takes effect in 2026. The EU AI Act becomes fully enforceable in August 2026. The NAIC Model Bulletin on AI in insurance has been adopted by 24 states. NYC Local Law 144 already requires bias audits for automated hiring tools.</p><p>The direction is clear: regulators are moving from governing organizations that use AI to governing what AI systems actually do. And they are doing it jurisdiction by jurisdiction, which means any company deploying AI across state lines will soon face a patchwork of requirements that all ask the same fundamental question: does your AI system stay within its authorized boundaries?</p><p><strong>The measurement problem this creates</strong></p><p>Here is where the data scientist in me gets interested.</p><p>&#8220;Substantive response&#8221; is a fuzzy concept. Where exactly does educational information end and professional advice begin? When does a health chatbot cross from sharing general wellness content into offering what could be interpreted as a diagnosis? When does a legal information tool cross from explaining a process into recommending a strategy?</p><p>These are not binary questions. They are spectrum questions. 
And spectrum questions require quantitative measurement, not policy checklists.</p><p>Think about what an organization would need to demonstrate under this bill. Not that they have a policy saying &#8220;our AI does not give medical advice.&#8221; They would need to demonstrate that their AI system actually stays within bounds, consistently, across thousands of interactions, including edge cases where users push the boundaries with creative phrasing.</p><p>That is a behavioral measurement problem. You cannot solve it by reading the organization&#8217;s policy documents. You solve it by observing what the AI system actually says when real people interact with it. You measure boundary adherence: how often does the system recognize when it is approaching a restricted domain, and how reliably does it pull back?</p><p>This is exactly the kind of observable, quantifiable AI system property that I described yesterday. The policy says the system will not give medical advice. The behavior shows whether it actually does or does not. The gap between those two is where the litigation risk lives.</p><p><strong>What this means for different types of AI deployments</strong></p><p>The bill applies regardless of how the AI system was built, but the risk profile varies significantly.</p><p>Organizations that build their own AI from the ground up have complete control over system prompts, guardrails, and response boundaries. They can engineer precise limits. But they also own 100% of the liability.</p><p>Organizations using low-code platforms like Copilot Studio or LangFlow face a shared responsibility problem. The platform provides underlying model behavior and some guardrails, but the builder configures the use case and the domain scope. When the system drifts into professional advice territory, who is liable? The platform or the builder?</p><p>And then there are the no-code deployments, the custom GPTs, the drag-and-drop chatbot builders. 
This is the highest risk category, and it is not close. The people building on these platforms are often the exact professionals the bill is trying to protect: small healthcare clinics, law offices, dental practices. They deploy an AI chatbot on their website, feed it their documents, and assume the platform handles compliance. It usually does not.</p><p>The gap between how easy it is to deploy AI and how hard it is to govern what it says is widest in the no-code tier. And that gap is exactly where this bill&#8217;s private right of action will land hardest.</p><p><strong>The deeper signal</strong></p><p>Step back from the specifics of this one bill and look at what it represents.</p><p>For decades, professional licensing has been a human-to-human regulatory framework. A doctor is licensed. A lawyer passes the bar. An engineer gets certified. The license attaches to the person, and the person is accountable for what they say.</p><p>AI breaks that model. The chatbot giving health guidance is not a licensed professional. It is not a person. It cannot be sued, sanctioned, or stripped of credentials. So the regulatory framework has to evolve. It has to attach accountability to the system&#8217;s behavior and to the entity that deployed it.</p><p>This bill is one of the first attempts to do that explicitly. It will not be the last. And every attempt will come back to the same core question: can you prove, with data, that your AI system behaves within its authorized boundaries?</p><p>That is not a policy question. That is a measurement question. And it demands the kind of quantitative, reproducible, behavior-based measurement that this newsletter exists to explore.</p><p>More next Tuesday.</p><div><hr></div><p><em>This is part of the &#8220;Before The Number&#8221; series at A.I.N.S.T.E.I.N., exploring what it takes to build quantitative AI governance measurement from first principles. 
If this resonated, share it with someone deploying AI in healthcare, legal, or any licensed profession.</em></p>]]></content:encoded></item><item><title><![CDATA[We Were Measuring the Wrong Thing ]]></title><description><![CDATA[Why AI Governance Has Been Scoring the Organization When It Should Be Scoring the Machine]]></description><link>https://ainstein.sanjeevaniai.com/p/we-were-measuring-the-wrong-thing</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/we-were-measuring-the-wrong-thing</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Mon, 23 Mar 2026 14:02:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FKlb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6586702-647e-4147-bb9d-1f99b51607c7_1204x986.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!FKlb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6586702-647e-4147-bb9d-1f99b51607c7_1204x986.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FKlb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6586702-647e-4147-bb9d-1f99b51607c7_1204x986.png 424w, https://substackcdn.com/image/fetch/$s_!FKlb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6586702-647e-4147-bb9d-1f99b51607c7_1204x986.png 848w, https://substackcdn.com/image/fetch/$s_!FKlb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6586702-647e-4147-bb9d-1f99b51607c7_1204x986.png 1272w, https://substackcdn.com/image/fetch/$s_!FKlb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6586702-647e-4147-bb9d-1f99b51607c7_1204x986.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FKlb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6586702-647e-4147-bb9d-1f99b51607c7_1204x986.png" width="1204" height="986" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b6586702-647e-4147-bb9d-1f99b51607c7_1204x986.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:986,&quot;width&quot;:1204,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:106414,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/191816049?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6586702-647e-4147-bb9d-1f99b51607c7_1204x986.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FKlb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6586702-647e-4147-bb9d-1f99b51607c7_1204x986.png 424w, https://substackcdn.com/image/fetch/$s_!FKlb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6586702-647e-4147-bb9d-1f99b51607c7_1204x986.png 848w, https://substackcdn.com/image/fetch/$s_!FKlb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6586702-647e-4147-bb9d-1f99b51607c7_1204x986.png 1272w, https://substackcdn.com/image/fetch/$s_!FKlb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6586702-647e-4147-bb9d-1f99b51607c7_1204x986.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em><strong>Image created by AI</strong></em></p><p>I owe you an explanation for the silence.</p><p>Two weeks ago, I published &#8220;How to Measure AI Governance&#8221; and laid out the five pillars, the metrics, the frameworks, the KPIs. I meant every word of it. And then I went quiet, because something broke in my own thinking that I could not write aro&#8230;</p>
      <p>
          <a href="https://ainstein.sanjeevaniai.com/p/we-were-measuring-the-wrong-thing">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How to Measure AI Governance]]></title><description><![CDATA[The Five Pillars, the Metrics That Matter, and Why Checklists Are Not Enough]]></description><link>https://ainstein.sanjeevaniai.com/p/how-to-measure-ai-governance</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/how-to-measure-ai-governance</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Mon, 09 Mar 2026 14:02:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!R6zd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04eff63-161f-4894-aaef-037d4b02a2e5_1802x1100.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!R6zd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04eff63-161f-4894-aaef-037d4b02a2e5_1802x1100.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!R6zd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04eff63-161f-4894-aaef-037d4b02a2e5_1802x1100.png 424w, https://substackcdn.com/image/fetch/$s_!R6zd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04eff63-161f-4894-aaef-037d4b02a2e5_1802x1100.png 848w, https://substackcdn.com/image/fetch/$s_!R6zd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04eff63-161f-4894-aaef-037d4b02a2e5_1802x1100.png 1272w, 
https://substackcdn.com/image/fetch/$s_!R6zd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04eff63-161f-4894-aaef-037d4b02a2e5_1802x1100.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!R6zd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04eff63-161f-4894-aaef-037d4b02a2e5_1802x1100.png" width="1456" height="889" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a04eff63-161f-4894-aaef-037d4b02a2e5_1802x1100.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:889,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2070238,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/189697788?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04eff63-161f-4894-aaef-037d4b02a2e5_1802x1100.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!R6zd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04eff63-161f-4894-aaef-037d4b02a2e5_1802x1100.png 424w, https://substackcdn.com/image/fetch/$s_!R6zd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04eff63-161f-4894-aaef-037d4b02a2e5_1802x1100.png 848w, 
https://substackcdn.com/image/fetch/$s_!R6zd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04eff63-161f-4894-aaef-037d4b02a2e5_1802x1100.png 1272w, https://substackcdn.com/image/fetch/$s_!R6zd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04eff63-161f-4894-aaef-037d4b02a2e5_1802x1100.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h6><em>Image by 
AI</em></h6><p></p><blockquote><p>If you cannot measure it, you cannot govern it. </p></blockquote><p>That principle holds true across every regulated industry, from finance to healthcare to cybersecurity, and it holds true for AI.</p><p>Yet when most organizations talk about AI governance today, they are talking about policies, principles, and frameworks. They are talking about what they believe, not what they can prove. And there is a meaningful difference between having an AI ethics policy and being able to demonstrate, with data, that your AI systems are actually governed.</p><p>This essay is a practical guide to bridging that gap. It walks through the core pillars of AI governance measurement, the specific metrics that matter, the frameworks available to structure the work, and the challenges that make this harder than it sounds. If you are a CISO, a Chief AI Officer, a compliance leader, or a founder building in this space, this is the foundation you need.</p><h2>Why Measurement Matters</h2><p>Without measurement, AI governance is a set of intentions. It lives in documents that get written once and reviewed quarterly at best. It gives leadership a sense of comfort without giving them a basis for action.</p><p>Measurement changes that in four concrete ways.</p><ol><li><p>It demonstrates due diligence. When regulators, boards, or the public ask how your AI is managed, measurement gives you evidence rather than assurances. </p></li><li><p>It allows you to identify and mitigate risks before they cause harm, because metrics like model drift detection time and fairness deviation surface problems that narrative assessments miss entirely. </p></li><li><p>It prepares you for the regulatory compliance landscape that is already here, with the EU AI Act requiring specific documentation and measurement for high-risk AI systems. 
</p></li><li><p>And it builds trust with every stakeholder who needs to know that your AI decisions are fair, transparent, and accountable.</p></li></ol><h2>The Five Pillars of AI Governance Measurement</h2><p>Effective AI governance measurement is not about tracking model accuracy or inference speed. Those are performance metrics. Governance measurement focuses on accountability, fairness, transparency, compliance, and safety. These are the five pillars, and each one requires its own set of metrics.</p><h3>Accountability and Ownership</h3><p>This pillar measures who is responsible for your AI systems and their outcomes. It sounds basic, but in my experience, a surprising number of organizations deploy AI systems where no single person owns the governance risk. The model was built by one team, deployed by another, and monitored by no one in particular.</p><p>The qualitative goal is straightforward: every high-risk AI system should have a named business owner who is accountable for its impact. The quantitative metric that tracks this is the percentage of deployed AI systems with a defined, documented business owner. If that number is below 100% for your high-risk systems, you have a governance gap that no policy document can close.</p><h3>Transparency and Explainability</h3><p>This pillar measures how well your AI system&#8217;s decisions can be understood by the humans affected by them. A lending model that denies an application needs to be able to explain why. A hiring algorithm that filters out candidates needs to produce a reason that a human can evaluate.</p><p>The quantitative metric here is the percentage of AI-driven decisions that include a human-interpretable explanation. In practice, this is one of the hardest metrics to improve because many complex models, particularly large language models, are inherently difficult to explain. 
But the measurement itself forces the conversation about where explainability gaps exist and how material those gaps are.</p><h3>Fairness and Bias Mitigation</h3><p>This pillar measures the extent to which your AI systems treat different demographic groups equitably. It is not enough to say &#8220;we care about fairness.&#8221; You need to measure the actual disparity in outcomes across protected groups and track that disparity over time.</p><p>The core metric is the measurable difference in approval rates, error rates, or outcomes between demographic groups. If your lending model approves 78% of applications from one group and 61% from another, that disparity is your fairness metric, and it needs to be monitored continuously, not just checked once before deployment.</p><h3>Risk and Compliance</h3><p>This pillar measures adherence to both internal policies and external regulations. With the EU AI Act, NIST AI RMF, and ISO 42001 all converging on requirements for risk classification and documentation, this pillar is becoming the most operationally urgent.</p><p>The key metrics include the percentage of high-risk AI systems that have completed an Algorithmic Impact Assessment, the percentage of inventoried systems that have undergone formalized risk classification, and the policy adherence rate across all AI projects. These numbers tell you whether your governance framework is actually being followed or whether it exists only on paper.</p><h3>Safety and Security</h3><p>This pillar measures your AI system&#8217;s resilience against attacks, errors, and unintended harm. It includes incident response readiness and the speed at which AI-specific failures are detected and resolved.</p><p>The metrics that matter here are the average time to detect and time to resolve AI-related incidents, including model drift, toxic output, adversarial attacks, and data pipeline failures. 
If your organization cannot tell you how long it takes to detect when a model has drifted from its intended behavior, your safety posture has a blind spot.</p><h2>Key Performance Indicators for AI Governance</h2><p>Beyond the five pillars, there are specific KPIs that give leadership a clear picture of governance health across the organization.</p><p>Program health metrics include AI inventory coverage (the percentage of all AI systems currently cataloged), risk classification completion (the percentage of inventoried systems that have been formally classified by risk level), and policy adherence rate (the percentage of AI projects fully compliant with established guidelines).</p><p>Decision and accountability metrics include decision latency for risk issues (how long it takes to make a material decision on an escalated AI risk), human override rate (how frequently automated decisions are reversed by human reviewers), and governance debt (the number of deferred governance controls that were postponed to speed up deployment).</p><p>Operational integrity metrics include model drift detection time, data lineage visibility (the percentage of models with full source-to-sink tracking), and audit readiness score (the percentage of models with current documentation and version control).</p><p>Ethical impact metrics include explanation coverage and fairness deviation, both of which I discussed in the pillars section above.</p><p>The important thing about these KPIs is that they are specific, measurable, and tied to real governance risk. They are not opinions. They are not traffic lights. They are numbers that a board can track quarter over quarter and that an auditor can verify independently.</p><h2>The Frameworks That Structure This Work</h2><p>Organizations do not need to build their measurement approach from scratch. 
Several established frameworks provide the structure.</p><p>The NIST AI Risk Management Framework provides guidelines for managing risks to improve the trustworthiness of AI systems. NIST has also recently released a preliminary draft Cyber AI Profile (NISTIR 8596) that maps AI considerations directly onto the Cybersecurity Framework 2.0, embedding AI governance into operational security infrastructure rather than treating it as a separate discipline.</p><p>ISO/IEC 42001 is an international standard specifying requirements for establishing, implementing, maintaining, and continually improving an AI management system. As an ISO 42001 Lead Auditor, I work with this framework regularly, and its strength is that it provides a certifiable standard that organizations can be audited against.</p><p>The EU AI Act is the most comprehensive regulatory framework currently in effect, requiring specific measurement and documentation for high-risk AI systems. It is not optional for organizations operating in or selling into the European market, and its requirements are driving measurement adoption globally.</p><p>These frameworks tell you what to measure and why. The challenge is translating their requirements into the specific quantitative metrics I described above, and doing so continuously rather than at a single point in time.</p><h2>The Challenges That Make This Hard</h2><p>If measuring AI governance were easy, every organization would already be doing it. Several factors make it genuinely difficult.</p><p>Concepts like fairness and transparency are contextually dependent. What counts as fair in a lending model may differ from what counts as fair in a hiring algorithm. There is no single universal formula, and measurement requires thoughtful interpretation alongside the numbers.</p><p>Many complex AI models, particularly large language models, are inherently difficult to explain. 
This makes transparency measurement challenging not because the metric is wrong but because the underlying system resists the measurement.</p><p>Standardization is still evolving. While frameworks exist, universally accepted methods for calculating specific metrics like bias are not yet settled. Different tools and approaches can produce different results for the same system.</p><p>Organizations have historically incentivized performance over responsibility. Accuracy and speed get rewarded. Governance measurement introduces a different set of priorities, and that cultural shift is often harder than the technical implementation.</p><p>And finally, data quality and lineage remain fundamental obstacles. You cannot measure governance properly if you do not understand the data your AI systems are trained on, and many organizations have complex or undocumented data flows that make this difficult.</p><h2>Where This Is Heading</h2><p>Every one of these challenges is real, and none of them are reasons to avoid measurement. They are reasons to invest in building the measurement infrastructure now, before regulators require it and before the gap between what your organization claims about its AI governance and what it can actually prove becomes a liability.</p><p>The organizations that solve the measurement problem first will not just be compliant. They will set the standard that others measure against. They will have the data to report to boards, the benchmarks to negotiate with partners, and the scores to prove what checklists never could.</p><p>AI governance measurement is not a nice-to-have. 
It is the infrastructure that makes governance real.</p>]]></content:encoded></item><item><title><![CDATA[Understanding AI Governance Measurement]]></title><description><![CDATA[Why Quantitative Measurement Is No Longer Optional!]]></description><link>https://ainstein.sanjeevaniai.com/p/understanding-ai-governance-measurement</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/understanding-ai-governance-measurement</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Mon, 02 Mar 2026 20:11:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Vu6o!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce04ce4f-853b-45e7-ba35-ccf48d2687e2_1840x1096.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Vu6o!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce04ce4f-853b-45e7-ba35-ccf48d2687e2_1840x1096.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Vu6o!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce04ce4f-853b-45e7-ba35-ccf48d2687e2_1840x1096.png 424w, https://substackcdn.com/image/fetch/$s_!Vu6o!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce04ce4f-853b-45e7-ba35-ccf48d2687e2_1840x1096.png 848w, https://substackcdn.com/image/fetch/$s_!Vu6o!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce04ce4f-853b-45e7-ba35-ccf48d2687e2_1840x1096.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Vu6o!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce04ce4f-853b-45e7-ba35-ccf48d2687e2_1840x1096.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Vu6o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce04ce4f-853b-45e7-ba35-ccf48d2687e2_1840x1096.png" width="1456" height="867" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ce04ce4f-853b-45e7-ba35-ccf48d2687e2_1840x1096.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:867,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2960668,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/189685261?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce04ce4f-853b-45e7-ba35-ccf48d2687e2_1840x1096.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Vu6o!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce04ce4f-853b-45e7-ba35-ccf48d2687e2_1840x1096.png 424w, https://substackcdn.com/image/fetch/$s_!Vu6o!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce04ce4f-853b-45e7-ba35-ccf48d2687e2_1840x1096.png 848w, 
https://substackcdn.com/image/fetch/$s_!Vu6o!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce04ce4f-853b-45e7-ba35-ccf48d2687e2_1840x1096.png 1272w, https://substackcdn.com/image/fetch/$s_!Vu6o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce04ce4f-853b-45e7-ba35-ccf48d2687e2_1840x1096.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h6><em>Image created by 
AI</em></h6><p></p><p>Every critical system in human history eventually got measured. Before FICO, loan officers decided your creditworthiness with a handshake and a gut feeling. Same income, same history, approved at one branch and denied at another. Before s&#8230;</p>
      <p>
          <a href="https://ainstein.sanjeevaniai.com/p/understanding-ai-governance-measurement">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Everyone’s Writing the Rulebook. Nobody’s Building the Instrument.]]></title><description><![CDATA[Why the global rush to govern AI keeps producing frameworks instead of measurement]]></description><link>https://ainstein.sanjeevaniai.com/p/everyones-writing-the-rulebook-nobodys</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/everyones-writing-the-rulebook-nobodys</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Tue, 24 Feb 2026 20:05:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-NCj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F153179cf-3c12-4f59-bd39-c8f569970920_876x866.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-NCj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F153179cf-3c12-4f59-bd39-c8f569970920_876x866.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-NCj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F153179cf-3c12-4f59-bd39-c8f569970920_876x866.png 424w, https://substackcdn.com/image/fetch/$s_!-NCj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F153179cf-3c12-4f59-bd39-c8f569970920_876x866.png 848w, https://substackcdn.com/image/fetch/$s_!-NCj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F153179cf-3c12-4f59-bd39-c8f569970920_876x866.png 1272w, 
https://substackcdn.com/image/fetch/$s_!-NCj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F153179cf-3c12-4f59-bd39-c8f569970920_876x866.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-NCj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F153179cf-3c12-4f59-bd39-c8f569970920_876x866.png" width="876" height="866" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/153179cf-3c12-4f59-bd39-c8f569970920_876x866.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:866,&quot;width&quot;:876,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1148782,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/189055181?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F153179cf-3c12-4f59-bd39-c8f569970920_876x866.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-NCj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F153179cf-3c12-4f59-bd39-c8f569970920_876x866.png 424w, https://substackcdn.com/image/fetch/$s_!-NCj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F153179cf-3c12-4f59-bd39-c8f569970920_876x866.png 848w, https://substackcdn.com/image/fetch/$s_!-NCj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F153179cf-3c12-4f59-bd39-c8f569970920_876x866.png 
1272w, https://substackcdn.com/image/fetch/$s_!-NCj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F153179cf-3c12-4f59-bd39-c8f569970920_876x866.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h6><em>Image created by AI</em></h6><p></p><p>If you are in the governance space, the first three weeks of February 2026 arrived as a cascade, one event after another, each reinforcing the same message: the world is taking AI governance seriously. 
UC Berkeley&#8217;s Center for Long-Term C&#8230;</p>
      <p>
          <a href="https://ainstein.sanjeevaniai.com/p/everyones-writing-the-rulebook-nobodys">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Your AI Agent Has No ID ]]></title><description><![CDATA[Why Agentic AI Needs a Trust Score, Not a Checklist]]></description><link>https://ainstein.sanjeevaniai.com/p/your-ai-agent-has-no-id</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/your-ai-agent-has-no-id</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Mon, 16 Feb 2026 23:33:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!SC5S!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d35d615-5df3-4701-9024-ddd77c5acfe3_1378x1070.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SC5S!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d35d615-5df3-4701-9024-ddd77c5acfe3_1378x1070.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SC5S!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d35d615-5df3-4701-9024-ddd77c5acfe3_1378x1070.png 424w, https://substackcdn.com/image/fetch/$s_!SC5S!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d35d615-5df3-4701-9024-ddd77c5acfe3_1378x1070.png 848w, https://substackcdn.com/image/fetch/$s_!SC5S!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d35d615-5df3-4701-9024-ddd77c5acfe3_1378x1070.png 1272w, 
https://substackcdn.com/image/fetch/$s_!SC5S!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d35d615-5df3-4701-9024-ddd77c5acfe3_1378x1070.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SC5S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d35d615-5df3-4701-9024-ddd77c5acfe3_1378x1070.png" width="1378" height="1070" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d35d615-5df3-4701-9024-ddd77c5acfe3_1378x1070.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1070,&quot;width&quot;:1378,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:895109,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/187929491?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d35d615-5df3-4701-9024-ddd77c5acfe3_1378x1070.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SC5S!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d35d615-5df3-4701-9024-ddd77c5acfe3_1378x1070.png 424w, https://substackcdn.com/image/fetch/$s_!SC5S!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d35d615-5df3-4701-9024-ddd77c5acfe3_1378x1070.png 848w, 
https://substackcdn.com/image/fetch/$s_!SC5S!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d35d615-5df3-4701-9024-ddd77c5acfe3_1378x1070.png 1272w, https://substackcdn.com/image/fetch/$s_!SC5S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d35d615-5df3-4701-9024-ddd77c5acfe3_1378x1070.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h6><em>Image created using 
AI</em></h6><p></p><p>This week, I was invited through one of the world&#8217;s largest expert networks to consult on a topic that stopped me in my tracks: the challenges and solutions for securely deploying autonomous AI agents in business environments, with a particular focus on something called &#8220;verifiable credentials&#8221; for AI agents and &#8220;trusted AI through intent binding.&#8221;</p><p>I have been building AI governance infrastructure for over two years now, and I spent 25 years before that deploying AI and data systems across healthcare, fintech, insurance, legal technology, and education. But this invitation was different. It was not about compliance checklists or policy frameworks. It was about a question that the enterprise world is only now beginning to ask out loud: How do you prove that an AI agent acting on your behalf is actually authorized to do what it is doing?</p><p><em>Let that sink in for a moment.</em></p><p><strong>The Problem Nobody Talks About</strong></p><p>Right now, across every industry you can name, companies are racing to deploy AI agents that can act autonomously. 
These agents book meetings, process claims, triage legal inquiries, approve transactions, generate reports, and make decisions that used to require a human being in the loop. The promise is enormous: speed, scale, consistency, cost reduction. And the technology has reached a point where these agents can genuinely perform.</p><p>But here is what almost nobody is asking: <em>What credentials does that AI agent carry?</em></p><p>When a human employee joins an organization, they go through background checks, they receive role-based access, they sign agreements about what they can and cannot do, and there is a paper trail connecting their identity to their authority. If that employee oversteps their boundaries, there is an audit trail. There is accountability. There is a verifiable chain that connects what they did to what they were authorized to do.</p><p>Now think about an AI agent. It gets deployed with an API key, maybe some prompt instructions, maybe a set of tool permissions configured by a developer who was moving fast to hit a sprint deadline. Where is the verifiable proof of what that agent is authorized to do? Where is the audit trail that connects its actions to a specific human decision about its scope? Where is the credential that another system, a partner, a regulator, a customer, can independently verify without simply trusting the agent&#8217;s own assertions?</p><p>It does not exist. Not in any standardized, quantitative, independently verifiable way.</p><p><strong>Your AI agent has no ID.</strong></p><p><strong>Why Checklists Fail for Autonomous Systems</strong></p><p>The traditional approach to AI governance has been borrowed from software compliance: create a checklist, assess against it annually or quarterly, produce a report, file it away. This works reasonably well for static systems where humans make the final decisions and the AI is just providing recommendations. 
You can audit the model, check the training data, review the outputs, and sign off.</p><p>But autonomous AI agents break this model completely.</p><p>An autonomous agent does not wait for a human to review its output before acting. It chains decisions together. It interprets ambiguous inputs in real time. It interacts with other systems, sometimes other agents, and the scope of its actions can shift based on context in ways that no static checklist can anticipate. A checklist that was accurate on Monday might be meaningless by Wednesday because the agent encountered a scenario that nobody tested for, and it made a judgment call.</p><p>This is not a theoretical concern. I have seen it firsthand. When I deployed an autonomous AI voice agent for a law firm handling workers&#8217; compensation cases, the most dangerous moments were not when the agent got something wrong in a predictable way. They were when callers deviated from expected conversational paths and the agent had to decide, in real time, whether it was authorized to handle the new direction or whether it needed to escalate. A checklist cannot govern that decision. A quantitative, continuously updated trust boundary can.</p><p>The difference matters. 
A checklist says &#8220;this system was compliant when we last checked.&#8221; A trust score says &#8220;this system is operating within its verified boundaries right now, and here is the quantitative evidence.&#8221;</p><p><strong>What Verifiable Credentials for AI Actually Means</strong></p><p>When the consultation framed the topic around &#8220;verifiable credentials for AI agent deployment,&#8221; it pointed at something that I believe will become one of the defining infrastructure layers of the next decade.</p><p>Verifiable credentials for AI agents means that an agent carries cryptographically provable attestations about what it is authorized to do, who authorized it, what compliance standards it has been assessed against, and what boundaries it operates within. Any party interacting with that agent, whether it is another system, a business partner, a regulator, or a customer, can independently verify those claims without having to trust the agent itself.</p><p>Think of it like a digital license. Not a static certificate that was issued once and sits in a drawer, but a living, scored, continuously updated credential that reflects the agent&#8217;s current risk posture and authorization scope. When a partner organization&#8217;s system interacts with your AI agent, it can check that credential and confirm: yes, this agent has been assessed at a risk score of 247 out of 1000, it is authorized for these specific actions, it has been evaluated against ISO 42001 and the EU AI Act, and its last assessment was 14 minutes ago.</p><p><em>That is fundamentally different from saying &#8220;we passed an audit last quarter.&#8221;</em></p><p><strong>The Market Is Moving Faster Than You Think</strong></p><p>Here is what struck me about this invitation, and about the broader pattern I am seeing across the industry right now. This is not a topic that only governance nerds and compliance officers are thinking about. 
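</p><p><em>A concrete sketch:</em> the credential described above can be illustrated in a few lines of Python. This is a minimal sketch, not an implementation; the agent identifier, scope names, 0&#8211;1000 score, and the shared-secret HMAC signature are all illustrative assumptions. A production system would use asymmetric signatures (for example, Ed25519) so that any third party can verify the credential without holding the issuer&#8217;s secret.</p>

```python
import hmac
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical issuer key. Real deployments would use an asymmetric
# keypair so verifiers do not need the issuer's secret.
ISSUER_KEY = b"demo-issuer-key"

def issue_credential(agent_id, scopes, risk_score, frameworks):
    """Issue a tamper-evident credential for an AI agent (illustrative only)."""
    claims = {
        "agent_id": agent_id,
        "authorized_scopes": sorted(scopes),
        "risk_score": risk_score,        # e.g. 247 on a 0-1000 scale
        "assessed_against": frameworks,  # e.g. ISO 42001, EU AI Act
        "issued_at": datetime.now(timezone.utc).isoformat(),
    }
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "signature": signature}

def verify_credential(cred, required_scope):
    """Verify the signature first, then check the requested action is in scope."""
    payload = json.dumps(cred["claims"], sort_keys=True).encode()
    expected = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, cred["signature"]):
        return False  # claims were altered after issuance
    return required_scope in cred["claims"]["authorized_scopes"]

cred = issue_credential(
    "claims-triage-agent",
    ["triage_claim", "request_documents"],
    247,
    ["ISO 42001", "EU AI Act"],
)
print(verify_credential(cred, "triage_claim"))    # True: within authorized scope
print(verify_credential(cred, "approve_payout"))  # False: never authorized
```

<p>The point of the sketch is the verification path: the relying party checks the signature against the claims before trusting any of them, so an agent cannot simply assert its own scope.</p><p>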
Investment firms are actively conducting diligence on companies in the AI security and governance tooling space. Major corporations are paying expert network rates to understand how verifiable trust for AI agents works. Professional services firms are benchmarking API governance frameworks in banking. The money is following the question.</p><p>And the question is the same everywhere: <em>How do we trust AI that acts on its own?</em></p><p>For those of us who have been working in AI governance, this is the moment where the market catches up to the problem. For the past two years, I have heard enterprise leaders say they are &#8220;exploring&#8221; AI governance, that they know it matters but they are not ready to invest. That language is shifting. When investment firms start researching the competitive landscape for AI trust infrastructure, it means capital allocation decisions are being made. When enterprise clients specify &#8220;verifiable credentials&#8221; and &#8220;intent binding&#8221; as the topics they want to discuss, it means they have moved past awareness and into solution design.</p><p><em>The window between &#8220;people are asking the question&#8221; and &#8220;someone owns the answer&#8221; is open right now.</em></p><p><strong>Why Measurement Beats Compliance</strong></p><p>This brings me to the core thesis of everything I write about in this newsletter, and everything I am building.</p><p>The reason checklists and traditional compliance frameworks fail for autonomous AI is not that they are poorly designed. It is that they are qualitative instruments being applied to a quantitative problem. Asking whether an AI agent &#8220;meets&#8221; a compliance standard is like asking whether a bridge &#8220;meets&#8221; safety requirements without measuring the load it can bear. The answer is meaningless without a number.</p><p>What the market needs, and what it is beginning to demand, is quantitative measurement infrastructure for AI trust. 
A scoring system that can express, in a single interpretable number, how much risk an AI agent carries across multiple regulatory frameworks simultaneously. Not a binary pass/fail. Not a subjective assessment. A reproducible, auditable, continuously updated measurement that engineering teams can act on and compliance teams can report on and regulators can verify and business partners can trust.</p><p>This is the science of measurement applied to critical AI systems. It is what I call the work that happens &#8220;before the number,&#8221; the careful, rigorous thinking about what to measure, how to measure it, and why the measurement methodology matters as much as the result.</p><p><strong>What Comes Next</strong></p><p>If you are deploying AI agents in your organization today, or planning to, here is what I would encourage you to think about.</p><p>First, ask yourself whether you can prove, right now, what your AI agent is authorized to do. Not what it was designed to do. Not what the prompt says it should do. Can you prove it, verifiably and quantitatively, to a third party who has no reason to trust your assertions?</p><p>Second, ask yourself how you would know if your AI agent exceeded its authorized scope. Not after the fact, when a customer complains or a regulator asks. Right now. In real time. Do you have a continuous measurement of whether the agent is operating within its trust boundaries?</p><p>Third, ask yourself whether your current governance approach would survive the question: &#8220;Show me the score.&#8221; Not the checklist. Not the policy document. The score. The number that tells me, quantitatively, where this agent falls on the risk spectrum and what that number is based on.</p><p>If you cannot answer those questions today, you are not alone. Almost nobody can. But the market is telling us, loudly and with real dollars behind it, that the window to build this capability is right now.</p><p>That is what I am working on. 
That is what &#8220;Before the Number&#8221; is about. And I will have a lot more to say about it in the weeks ahead.</p><div><hr></div><p><em>Suneeta Modekurty is the Founder and Chief AI Architect of SANJEEVANI AI, where she builds quantitative AI governance infrastructure. She is an ISO 42001 Lead Auditor and holds an O-1A visa for extraordinary ability in AI, bioinformatics, and data science. She publishes &#8220;Before the Number&#8221; on Substack, exploring the science of measurement in critical AI systems.</em></p>]]></content:encoded></item><item><title><![CDATA[What Happens When There's No Number]]></title><description><![CDATA[The invisible cost of governing without measurement]]></description><link>https://ainstein.sanjeevaniai.com/p/what-happens-when-theres-no-number</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/what-happens-when-theres-no-number</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Tue, 10 Feb 2026 15:03:30 GMT</pubDate><enclosure
url="https://substackcdn.com/image/fetch/$s_!w7M4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cfbf39a-15d7-4981-8fd5-7279b250f7e4_1386x1120.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>BEFORE THE NUMBER</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!w7M4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cfbf39a-15d7-4981-8fd5-7279b250f7e4_1386x1120.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!w7M4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cfbf39a-15d7-4981-8fd5-7279b250f7e4_1386x1120.png 424w, https://substackcdn.com/image/fetch/$s_!w7M4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cfbf39a-15d7-4981-8fd5-7279b250f7e4_1386x1120.png 848w, https://substackcdn.com/image/fetch/$s_!w7M4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cfbf39a-15d7-4981-8fd5-7279b250f7e4_1386x1120.png 1272w, https://substackcdn.com/image/fetch/$s_!w7M4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cfbf39a-15d7-4981-8fd5-7279b250f7e4_1386x1120.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!w7M4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cfbf39a-15d7-4981-8fd5-7279b250f7e4_1386x1120.png" width="1386" height="1120" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4cfbf39a-15d7-4981-8fd5-7279b250f7e4_1386x1120.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1120,&quot;width&quot;:1386,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:483264,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/187477850?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cfbf39a-15d7-4981-8fd5-7279b250f7e4_1386x1120.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!w7M4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cfbf39a-15d7-4981-8fd5-7279b250f7e4_1386x1120.png 424w, https://substackcdn.com/image/fetch/$s_!w7M4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cfbf39a-15d7-4981-8fd5-7279b250f7e4_1386x1120.png 848w, https://substackcdn.com/image/fetch/$s_!w7M4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cfbf39a-15d7-4981-8fd5-7279b250f7e4_1386x1120.png 1272w, https://substackcdn.com/image/fetch/$s_!w7M4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cfbf39a-15d7-4981-8fd5-7279b250f7e4_1386x1120.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h6><em>Divergence of Qualitative and Quantitative Measurements Over Time</em></h6><p></p><p>In 1846, a doctor named Ignaz Semmelweis noticed something troubling. In one ward of the Vienna General Hospital, mothers were dying at five times the rate of the ward next door. It was the same hospital, in the same city, during the same year. The only difference was that one ward was staffed by doctors who came directly from performing autopsies, while the other was staffed by midwives who did not.</p><p>Semmelweis did not have germ theory. He did not have a microscope powerful enough to see bacteria. But he had something just as important: he had counted. He had a number. And the number made the invisible visible. When he introduced handwashing with chlorinated lime, mortality dropped from 18% to under 2%. The number did not just describe the problem. It made the solution undeniable.</p><p>But here is the part of the story people forget. The medical establishment rejected his findings for twenty years. They did not reject the findings because the data was wrong. They rejected the findings because the profession had no culture of measurement. Doctors operated on reputation, seniority, and judgment. Introducing a number threatened the entire social order of medicine because it implied that a senior physician&#8217;s intuition could be proven wrong, and worse, that a junior doctor armed with better data could be proven right. The number was not just a scientific tool. It was a direct challenge to the authority structure that governed the profession. And the authority structure pushed back.</p><p>Semmelweis died in an asylum in 1865.
Germ theory was not widely accepted until the 1880s. For two decades, the refusal to accept a quantitative finding cost thousands of lives. Not because the answer was unknown, but because the system was not structured to receive a quantitative answer.</p><p>This essay is about what happens in that gap. It is not about the moment a number arrives, but about the years before it does, when a system that matters is governed by opinion and the cost of that governance failure remains invisible to the very people responsible for it.</p><p>Semmelweis&#8217;s story is dramatic, but it is not unique. When you study the history of measurement across critical systems in lending, aviation, food safety, environmental regulation, and cybersecurity, the same pattern emerges so consistently that it starts to look less like coincidence and more like a law of institutional physics. Every time a critical system operates without quantitative measurement, four specific costs appear. They appear in the same order, with the same dynamics, regardless of the industry, the era, or the technology involved.</p><p><strong>The first cost is structural inconsistency: </strong>Before credit scoring existed, two loan officers at the same bank could evaluate the same applicant and reach opposite conclusions. This was not a flaw in the system. This <em>was</em> the system. Without a shared quantitative reference point, every decision was a fresh act of human judgment, shaped by experience, bias, workload, mood, and variables that nobody tracked because nobody could.</p><p>Fair Isaac Corporation studied this phenomenon in the 1950s and found staggering variance in loan decisions across branches of the same institution. The variance was not small. Outcomes that should have been statistically identical were diverging by double digits. The same applicant, with the same income and the same repayment history, could be approved at one branch in the morning and denied at another branch in the afternoon. 
This was not corruption. It was the natural and predictable result of a system that relied entirely on individual judgment with no standardized measurement to anchor it.</p><p>The same dynamic shows up everywhere measurement is absent. In food safety before HACCP scoring, the same restaurant could pass a health inspection in one county and fail in the county next door, because inspectors had no shared quantitative standard for what constituted a violation. In education before standardized assessments, the same student could be classified as &#8220;gifted&#8221; in one school district and &#8220;average&#8221; in another, because teachers were evaluating against their own internal benchmarks rather than a common metric. In pain management before the numeric pain scale, the same tibial fracture could receive acetaminophen from one nurse and morphine from another, because pain was described in adjectives rather than measured in numbers.</p><p>The deeper problem is not that people made different decisions. It is that nobody could demonstrate the inconsistency was even happening, because there was no consistent metric to compare against. When there is no number, inconsistency becomes invisible. It hides inside the phrase &#8220;professional judgment,&#8221; and it persists precisely because no one can see it.</p><p><strong>The second cost follows directly from the first: accountability disappears. </strong>You cannot hold someone accountable for violating a standard that does not exist in quantifiable terms. You can write a policy that says &#8220;ensure patient safety.&#8221; You can create a governance framework that says &#8220;implement responsible AI practices.&#8221; But until those principles are attached to measurable thresholds, enforcement becomes a matter of interpretation, and interpretation is the enemy of accountability.</p><p>The aviation industry learned this through tragedy. 
Before the adoption of measurable safety metrics such as hours between incidents, defect rates per flight cycle, and standardized checklists with quantified completion rates, airlines assessed their own safety through self-reporting. The standard was &#8220;we follow best practices.&#8221; After a crash, the investigation would inevitably reveal that &#8220;best practices&#8221; meant different things to different maintenance crews, different inspectors, different shifts, and different airports. No one had been lying. There was simply nothing specific enough to be accountable to. The standard existed in prose rather than in numbers, and prose can be interpreted generously by anyone who needs it to be.</p><p>The same dynamic plagued corporate environmental responsibility for decades. When the standard was qualitative and every company could simply declare that it was &#8220;committed to sustainability,&#8221; every company in the world was effectively compliant, because commitment is not measurable. It was only when emissions reporting introduced actual numbers in the form of tons of CO&#8322;, parts per million, and year-over-year change that accountability became possible. This shift did not happen because regulators suddenly became tougher. It happened because there was finally something concrete to hold companies accountable against. You can argue indefinitely about whether a company is &#8220;committed to sustainability.&#8221; You cannot argue with 47,000 metric tons of carbon.</p><p>Measurement does not create accountability on its own. But without measurement, accountability is theater. It carries all the language of oversight, including policies, frameworks, committees, and review boards, but none of the teeth. Because teeth require thresholds, and thresholds require numbers.</p><p><strong>The third cost is the most economically significant, and it is also the hardest to see, because it manifests as things that never happen. 
</strong>When there is no number, markets do not crash. They simply never form in the first place.</p><p>Before credit scores existed, the secondary mortgage market barely existed either. A bank in Ohio could not sell a bundle of loans to an investor in New York, because there was no standardized way to assess the risk of those loans at scale. Every loan had been originated by a local officer, evaluated using local criteria, and documented in local formats. An investor three thousand miles away would have needed to re-underwrite every individual loan in order to assess the bundle, and the cost of that diligence was prohibitive. So the transaction simply did not happen.</p><p>The secondary mortgage market did not emerge because someone invented a clever financial instrument. It emerged because someone gave every borrower a number that an investor three thousand miles away could evaluate in seconds. Credit scoring did not just measure risk. It made risk portable. It created a common language that allowed parties who had never met each other to transact with confidence. A market worth trillions of dollars had been locked inside the absence of a three-digit number.</p><p>The same pattern explains why cyber insurance took decades to mature. Until organizations had quantifiable security postures in the form of scores, ratings, measurable controls, and auditable configurations, underwriters could not price policies with any actuarial confidence. You cannot build an insurance market around &#8220;we think we&#8217;re secure.&#8221; Insurance requires a number that actuaries can model, that underwriters can compare across applicants, and that reinsurers can aggregate into portfolios. The number does not just describe the risk. It enables the market infrastructure that makes risk transferable.</p><p>When there is no number, entire markets remain latent, not because demand is absent, but because there is nothing to transact around. 
Buyers cannot evaluate, sellers cannot differentiate, insurers cannot price, investors cannot compare, and regulators cannot benchmark. The market sits frozen, waiting for a unit of measurement that all participants agree to trust. And the longer that wait continues, the more value remains locked inside the gap.</p><p><strong>The fourth cost is perhaps the most insidious: improvement becomes impossible to prove. </strong>Imagine walking into a hospital board meeting and reporting that patient safety improved this quarter. The first question will be: compared to what? By how much? Measured how? Without a quantitative baseline, improvement is a feeling rather than a fact. You can spend millions on better processes, better training, and better technology and still have absolutely no way to demonstrate that any of it worked.</p><p>This is not hypothetical. It is the precise reason the quality movement in manufacturing stalled for years until statistical process control gave factories a way to measure variation and prove that their interventions were actually reducing defects. W. Edwards Deming did not just advocate for quality. He advocated for measurement, because he understood from decades of experience that without it, quality was a slogan rather than a discipline. His famous observation that you cannot improve what you cannot measure was not a platitude. It was an empirical conclusion about what happens when organizations try to get better without quantitative feedback loops.</p><p>The consequence of unprovable improvement is organizational paralysis. When the finance team asks whether the governance investment is working and the honest answer is &#8220;we believe so but cannot demonstrate it,&#8221; budgets get questioned. When leadership asks whether the new training program reduced risk and the answer is anecdotal rather than measured, confidence erodes. 
And gradually, organizations stop investing in getting better, not because they do not want to improve, but because they have learned that improvement without measurement is indistinguishable from stagnation. The return on investment becomes invisible, and so the investment stops.</p><p>What is striking about these four costs is not that they exist, but that the people inside the system rarely see them clearly. Inconsistency feels like professional judgment. Missing accountability feels like flexibility. Frozen markets feel like the market simply is not ready yet. Unprovable improvement feels like doing the best you can under difficult circumstances. The absence of a number is comfortable. It protects incumbents. It allows vagueness to masquerade as strategy. It lets everyone believe they are above average, because there is no average to measure against.</p><p>This is why measurement is always resisted before it is adopted. Semmelweis was ridiculed. Early credit scoring was called dehumanizing, with critics arguing that a human relationship between banker and borrower could not and should not be reduced to a number. Standardized testing was called reductive. Emissions reporting was called burdensome. Every number that eventually became infrastructure started its life as an inconvenient truth that the existing establishment preferred to ignore.</p><p>And yet, in every single case, once the number arrived and proved its value, the world did not go back. Nobody has argued for returning to gut-feel lending decisions after FICO. Nobody has advocated for removing thermometers from hospitals. Nobody has suggested that airlines stop tracking maintenance defect rates. Nobody has proposed that companies stop reporting emissions in measurable units. The resistance dissolves once the number demonstrates what it can do, because the number does not just measure the system. It <em>reorganizes</em> the system. 
It changes who has authority, what constitutes evidence, how decisions get made, and what accountability looks like. The number becomes the infrastructure that everything else is built upon.</p><p>Which brings us to the present. Today, AI systems are approving loans, diagnosing cancers, screening job applicants, scoring insurance claims, generating legal documents, triaging emergency calls, and making consequential decisions that affect the lives of millions of people across every regulated industry on the planet.</p><p>Ask how well these systems are governed, and the answer in most organizations is qualitative. It takes the form of a maturity model with descriptive levels, a readiness checklist with binary checkboxes, a consultant&#8217;s assessment delivered as a narrative report, or a regulatory framework mapped onto a spreadsheet that was current when it was created and outdated by the time it was presented. The output is language rather than measurement. &#8220;We&#8217;re at level 3.&#8221; &#8220;We&#8217;ve addressed most of the NIST categories.&#8221; &#8220;We&#8217;re working on it.&#8221;</p><p>These are not numbers. They are opinions formatted to look like numbers. And the four costs are already visible for anyone willing to look.</p><p>The inconsistency is already here. Two auditors assessing the same AI system against the same governance framework reach materially different conclusions about its compliance posture. This is not a failure of the auditors. It is a failure of the measurement approach, or more precisely, the absence of one. When the standard is a checklist of qualitative criteria, every assessment is an interpretation, and interpretations diverge.</p><p>The accountability gap is already here. The EU AI Act is in force. ISO 42001 certifications are underway. NIST AI RMF adoption is accelerating. Regulatory enforcement is not theoretical but operational. Yet enforcement against what baseline? 
When a regulator asks an organization to demonstrate its AI governance posture over time, the evidence that exists consists of a policy document from last year, a completed questionnaire, and a consultant&#8217;s letter. None of these constitute the kind of quantitative, auditable, time-series evidence that regulators in every other domain have learned to require.</p><p>The frozen markets are already here. AI insurance is nascent. AI procurement due diligence is a custom exercise every time, with no standardized scoring to streamline vendor evaluation. AI risk assessment in mergers and acquisitions relies on qualitative representations that are difficult to verify and impossible to compare across targets. These markets are not early because demand is low. They are early because there is nothing standardized to transact around. The infrastructure of transaction, including pricing, benchmarking, comparison, and aggregation, requires a number that does not yet exist.</p><p>And the inability to prove improvement is already here. Organizations are spending real money on AI governance by hiring responsible AI teams, building review processes, investing in training, and purchasing tools. But when the board asks whether the organization is better governed than it was last quarter, the honest answer is that nobody knows, because nobody can measure it. There is a belief that things have improved. There are actions that should have made things better. But there is no number that moved.</p><p>Every critical system in human history has eventually been measured. Not because someone decided measurement was philosophically appealing, but because the cost of not measuring became intolerable. The lending industry crossed that threshold in the 1950s. Healthcare crossed it in the 19th century with the advent of lab science and vital signs. 
Aviation crossed it after enough planes fell from the sky and enough investigations revealed that &#8220;best practices&#8221; had been an empty phrase. Cybersecurity crossed it when boards stopped accepting narrative descriptions of risk exposure and started demanding numbers.</p><p>AI governance has not crossed that threshold yet. But every signal suggests it is approaching. The regulatory pressure is building across multiple jurisdictions simultaneously. The liability exposure is growing as AI systems move deeper into consequential decision-making. The number of AI systems in production is compounding faster than governance practices can keep pace. And the gap between what organizations claim about their governance posture and what they can actually demonstrate with evidence is widening every quarter.</p><p>The question is not whether AI governance will get a number. History is unambiguous on this point, because every critical system eventually does. The question is what happens between now and then, how long the gap persists, how much damage accumulates inside it, and who bears the cost of an industry that governed itself by opinion when it could have governed itself by measurement.</p><p>History suggests that cost will be larger than anyone currently inside the gap realizes. It always is.</p><p></p><p></p><p><em>Until next time,</em></p><p><strong>Suneeta Modekurty</strong></p><p>Founder &amp; Chief Architect, METRIS&#8482;</p><p></p><p><em>Before the Number is a publication about the science of measurement in critical systems.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://ainstein.sanjeevaniai.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">A.I.N.S.T.E.I.N is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Before the Number]]></title><description><![CDATA[Why Every Critical System Eventually Gets Measured and Why AI Is Next]]></description><link>https://ainstein.sanjeevaniai.com/p/before-the-number</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/before-the-number</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Tue, 03 Feb 2026 16:57:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!R-Hq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F162e8e82-fca9-4932-9f5a-4f5197c16323_1412x1278.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!R-Hq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F162e8e82-fca9-4932-9f5a-4f5197c16323_1412x1278.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!R-Hq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F162e8e82-fca9-4932-9f5a-4f5197c16323_1412x1278.png 424w, 
https://substackcdn.com/image/fetch/$s_!R-Hq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F162e8e82-fca9-4932-9f5a-4f5197c16323_1412x1278.png 848w, https://substackcdn.com/image/fetch/$s_!R-Hq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F162e8e82-fca9-4932-9f5a-4f5197c16323_1412x1278.png 1272w, https://substackcdn.com/image/fetch/$s_!R-Hq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F162e8e82-fca9-4932-9f5a-4f5197c16323_1412x1278.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!R-Hq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F162e8e82-fca9-4932-9f5a-4f5197c16323_1412x1278.png" width="1412" height="1278" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/162e8e82-fca9-4932-9f5a-4f5197c16323_1412x1278.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1278,&quot;width&quot;:1412,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2414848,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/186755932?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F162e8e82-fca9-4932-9f5a-4f5197c16323_1412x1278.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!R-Hq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F162e8e82-fca9-4932-9f5a-4f5197c16323_1412x1278.png 424w, https://substackcdn.com/image/fetch/$s_!R-Hq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F162e8e82-fca9-4932-9f5a-4f5197c16323_1412x1278.png 848w, https://substackcdn.com/image/fetch/$s_!R-Hq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F162e8e82-fca9-4932-9f5a-4f5197c16323_1412x1278.png 1272w, https://substackcdn.com/image/fetch/$s_!R-Hq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F162e8e82-fca9-4932-9f5a-4f5197c16323_1412x1278.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>Two weeks ago, on January 22, I announced that AI governance now has a score.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://ainstein.sanjeevaniai.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">A.I.N.S.T.E.I.N is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Singapore released the world&#8217;s first agentic AI governance framework. South Korea&#8217;s AI Basic Act became enforceable. And on that same day, we shipped METRIS - a quantitative governance score for AI systems.</p><p>One of the responses that stayed with me came from a reader who wrote: <em>&#8220;The distinction between frameworks and real, measurable governance is so crucial. 
Finally, someone speaks about dynamic AI risk.&#8221;</em></p><p>An AI/ML engineer called METRIS&#8217;s FICO-style approach <em>&#8220;a total game changer for clarity&#8221;</em> and then asked the right question: how does METRIS handle the shifting weight of different regulations as they evolve in real time?</p><p>A data engineer was more direct: <em>&#8220;METRIS needs to be deployed everywhere, all at once. Think of the corporate goofs that could have been avoided.&#8221;</em></p><p>These reactions tell me the market knows it needs a number. But before I explain what METRIS does or how it works, I want to go deeper. I want to share <em><strong>why</strong></em> it needs to exist.</p><p>Because this isn&#8217;t a technology story. It&#8217;s a measurement story. And it starts with a question I&#8217;ve been asking my entire career.</p><h2>The Question</h2><p>I have fought ambiguity my entire professional life.</p><p>Twenty-five years across healthcare, pharma, finance, and technology. In every role, the hardest problems were never technical. They were the ones where two equally qualified people looked at the same situation and reached opposite conclusions. Where the answer depended on who you asked, what day it was, or which office you walked into.</p><p>That kind of ambiguity doesn&#8217;t just slow things down. It erodes trust. It makes fair outcomes impossible. And it creates a world where accountability is always someone else&#8217;s problem.</p><p>So I started asking a question that wouldn&#8217;t leave me alone: <em><strong>How did humanity solve ambiguity before?</strong></em></p><p>The answer, every single time, was the same. We invented a number.</p><h2>Banking Before FICO</h2><p>Before 1989, getting a loan was a conversation. A loan officer looked at you across a desk. They reviewed your application. They considered your employment, your address, your references. 
And then they made a judgment call.</p><p>The same person, with the same income and the same history, could be approved at one branch and denied at another. The process was subjective, inconsistent, and, let&#8217;s be honest, discriminatory. Billions of dollars were allocated based on gut feeling, personal bias, and the quality of a handshake.</p><p>Then Fair, Isaac and Company introduced a three-digit number.</p><p>FICO didn&#8217;t add technology to lending. It removed ambiguity from lending. One number. Same calculation everywhere. A 720 in New York meant the same thing as a 720 in Nebraska. Suddenly, lending decisions were comparable, auditable, and fairer than the alternative.</p><p>Today, nothing moves without a credit score. Mortgages, car loans, insurance premiums, apartment rentals. The number became infrastructure.</p><h2>Medicine Before Lab Tests</h2><p>A doctor feels your forehead. &#8220;You seem warm.&#8221; Another doctor examines you an hour later. &#8220;You seem fine.&#8221; Same patient. Two opinions. Who&#8217;s right?</p><p>Before quantitative diagnostics, medical decisions were based on observation, intuition, and experience. All valuable, but subjective. Two physicians could examine the same patient and reach opposite conclusions. Treatment varied not by disease, but by doctor.</p><p>Then someone measured body temperature. Then blood count. Then glucose levels. Then cholesterol, then hemoglobin A1C, then troponin levels, then genetic markers.</p><p>Numbers didn&#8217;t replace doctors. Numbers gave doctors a shared reality to work from. A blood glucose of 280 means the same thing in Asia, America, or Europe. The measurement created a common language, and with it, accountability.</p><h2>Education Before Standardized Assessment</h2><p>A teacher reads your essay and says &#8220;good.&#8221; Another teacher reads the same essay and says &#8220;mediocre.&#8221; A third says &#8220;brilliant.&#8221; The essay hasn&#8217;t changed. 
The judges have.</p><p>Before rubrics, scores, and standardized marking systems, educational assessment was entirely subjective. You can argue about whether standardized testing is perfect; it isn&#8217;t. But it made assessment comparable. A 92 means the same thing regardless of who graded it. Progress could be tracked. Gaps could be identified. Accountability became possible.</p><h2>The Deeper Pattern</h2><p>If you go far enough back, you find the origin of measurement itself: the moment societies decided that subjective trust wasn&#8217;t enough.</p><p>Signatures. Seals. Notarization. Weights and measures. Currency denominations. Accounting standards. Credit ratings. Safety certifications. Every one of these innovations was born from the same realization: when the stakes are high enough, &#8220;trust me&#8221; is not a system. Measurement is.</p><p>The pattern is always the same. First, a critical human activity operates on judgment alone. Then the stakes get high enough that inconsistency becomes intolerable. Then someone invents a way to measure. And the measurement becomes the new infrastructure, so foundational that within a generation, no one can imagine doing without it.</p><h2>Now Look at AI</h2><blockquote><p><em>AI systems are making lending decisions. Diagnosing patients. Evaluating students. Screening job applicants. Flagging criminal suspects. Approving insurance claims. Moderating speech. Predicting recidivism.</em></p></blockquote><p>These are the exact same domains that humanity spent centuries learning to measure. The lending system has a FICO score. The lab has quantitative diagnostics. The exam has a rubric. The contract has a signature.</p><p>But the AI that is replacing these systems? The AI that is now making these consequential decisions on our behalf?</p><p>Ask &#8220;How governed is this AI system?&#8221; and the answer you get is: a checklist. A policy document. A consultant&#8217;s opinion. A yes-or-no audit. Or worse? 
A shrug.</p><p><em><strong>There is no number.</strong></em></p><h2>The Problem with Binary</h2><p>The AI governance conversation today is stuck in binary. Compliant or not. Pass or fail. And that framing is the source of the paralysis.</p><p>Consider two companies. <br><strong>Company A</strong> has done nothing: no documentation, no fairness testing, no monitoring, no risk assessment. <br><strong>Company B</strong> has documented all its models, implemented bias testing, established human oversight protocols, but hasn&#8217;t yet completed adversarial robustness testing.</p><p>In a binary system, both fail. Same result. Same bucket. Binary made them identical. But they are not identical. One is at 50. The other is at 680. One needs a transformation. The other needs a nudge.</p><p>Binary created a market that is frozen. Companies that haven&#8217;t started say &#8220;we&#8217;re exploring.&#8221; Companies that have invested say &#8220;what&#8217;s the point?&#8221; Companies that passed say &#8220;we&#8217;re done&#8221; and stop paying attention until the next incident.</p><h2>A Score Changes Everything</h2><p>A score does what binary cannot. It makes progress visible. It enables comparison. It creates continuous accountability. It creates a market that moves.</p><p>At 50, you know you&#8217;re early but you&#8217;ve started. At 400, you can see progress. At 680, you know exactly which gaps separate you from 800. At 900, you can prove your posture to your board, your regulator, your customers. And tomorrow, if your score drops because a new regulation kicked in or a model drifted, you see it immediately.</p><p><strong>Governance isn&#8217;t pass/fail. It&#8217;s a score.</strong></p><h2>What&#8217;s Next</h2><p>In my last newsletter, I announced METRIS.</p><p>Today, I wanted to tell you <em><strong>why</strong></em> it exists. Because the founding insight behind METRIS isn&#8217;t technical. It&#8217;s historical. Every critical system eventually gets a number. 
AI&#8217;s turn is now.</p><p>In the coming weeks, I&#8217;ll share: how the scoring engine actually works, what the first assessments are revealing about the state of AI governance in the wild, and how organizations are using their score to move from &#8220;we&#8217;re exploring&#8221; to &#8220;here&#8217;s where we stand.&#8221;</p><p>If you&#8217;re building AI and wrestling with governance, reply to this newsletter. Tell me what&#8217;s broken. What&#8217;s working. What you wish existed. I read everything.</p><p>Every critical system in human history eventually got a number.</p><p>Lending got a credit score. Health got diagnostics. Education got assessments. Security got ratings. Financial health got audited statements.</p><p>AI is now the most consequential system in modern life. And it has no number.</p><p></p><p><strong>Not yet.</strong></p><p></p><p><em><strong>Suneeta Modekurty</strong></em></p><p><em>Founder &amp; Chief Architect, METRIS&#8482;</em></p><p><em>ISO 42001 Lead Auditor | Sanjeevani AI LLC</em></p><p><em>A.I.N.S.T.E.I.N. is a reader-supported publication. Subscribe to follow the METRIS journey.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://ainstein.sanjeevaniai.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">A.I.N.S.T.E.I.N is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[January 22, 2026. Mark Your Calendars. ]]></title><description><![CDATA[AI Governance now has a score]]></description><link>https://ainstein.sanjeevaniai.com/p/january-22-2026-mark-your-calendars</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/january-22-2026-mark-your-calendars</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Fri, 23 Jan 2026 19:46:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!RNEY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16833b50-b8ab-4886-929e-c0380607a106_1404x1300.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RNEY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16833b50-b8ab-4886-929e-c0380607a106_1404x1300.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RNEY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16833b50-b8ab-4886-929e-c0380607a106_1404x1300.png 424w, 
https://substackcdn.com/image/fetch/$s_!RNEY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16833b50-b8ab-4886-929e-c0380607a106_1404x1300.png 848w, https://substackcdn.com/image/fetch/$s_!RNEY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16833b50-b8ab-4886-929e-c0380607a106_1404x1300.png 1272w, https://substackcdn.com/image/fetch/$s_!RNEY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16833b50-b8ab-4886-929e-c0380607a106_1404x1300.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RNEY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16833b50-b8ab-4886-929e-c0380607a106_1404x1300.png" width="1404" height="1300" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/16833b50-b8ab-4886-929e-c0380607a106_1404x1300.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1300,&quot;width&quot;:1404,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2693377,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/185571045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16833b50-b8ab-4886-929e-c0380607a106_1404x1300.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!RNEY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16833b50-b8ab-4886-929e-c0380607a106_1404x1300.png 424w, https://substackcdn.com/image/fetch/$s_!RNEY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16833b50-b8ab-4886-929e-c0380607a106_1404x1300.png 848w, https://substackcdn.com/image/fetch/$s_!RNEY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16833b50-b8ab-4886-929e-c0380607a106_1404x1300.png 1272w, https://substackcdn.com/image/fetch/$s_!RNEY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16833b50-b8ab-4886-929e-c0380607a106_1404x1300.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Not because Singapore launched a framework.</p><p>Not because South Korea&#8217;s AI law went live.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://ainstein.sanjeevaniai.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">A.I.N.S.T.E.I.N is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p><strong>Because METRIS arrived.</strong></p><p>And honestly? The timing couldn&#8217;t be more poetic.</p><p>On the same day that governments finally admitted AI governance needs real infrastructure - not more PDFs - we shipped exactly that.</p><p>Let me explain.</p><h3><strong>What Happened on January 22</strong></h3><p>Singapore released the world&#8217;s first agentic AI governance framework. 27 pages of guidance on how enterprises should govern AI agents.</p><p>South Korea&#8217;s AI Basic Act became enforceable. Mandatory impact assessments. Documentation requirements. Fines for non-compliance.</p><p>Two major economies. Two different approaches. 
Same message:</p><p><strong>The era of &#8220;we&#8217;ll figure out AI governance later&#8221; is over.</strong></p><p>But here&#8217;s what caught my attention in Singapore&#8217;s framework:</p><blockquote><p><em>&#8220;AI risk is no longer static, it is dynamic and behavioral.&#8221;</em></p></blockquote><p>Finally. Someone said it out loud.</p><h3><strong>The Problem With Frameworks</strong></h3><p>Frameworks tell you <em>what</em> to do.</p><p>They don&#8217;t tell you <em>how well</em> you&#8217;re doing it.</p><ul><li><p>&#8220;Implement human oversight&#8221; &#8594; But how do you measure if it&#8217;s meaningful?</p></li><li><p>&#8220;Assess and bound risks&#8221; &#8594; But what&#8217;s your actual risk score?</p></li><li><p>&#8220;Enable accountability&#8221; &#8594; But across which of the 9 regulatory frameworks that apply to you?</p></li></ul><p>We&#8217;ve spent years collecting frameworks like Pok&#233;mon cards. EU AI Act. ISO 42001. NIST AI RMF. Singapore MGF. Korea AI Basic Act.</p><p>And yet - <strong>94% of AI repositories still fail basic governance requirements.</strong></p><p>We know this because we measured it. 2,000+ repositories. 1,429 checkpoints. 9 regulatory frameworks.</p><p>The market is drowning in frameworks.</p><p>What it&#8217;s starving for is <strong>measurement.</strong></p><h3><strong>Enter METRIS</strong></h3><p>METRIS is what we&#8217;ve been building at Sanjeevani AI.</p><p>Not another framework. 
Not another checklist.</p><p><strong>A quantitative risk score for AI governance.</strong></p><p>Think of it like this:</p><ul><li><p>Frameworks tell you to &#8220;be healthy&#8221;</p></li><li><p>METRIS is your blood pressure reading</p></li></ul><p>Here&#8217;s what it does:</p><ul><li><p><strong>0-1000 Risk Score</strong> &#8594; Know exactly where you stand</p></li><li><p><strong>1,429 Checkpoints</strong> &#8594; Mapped across 9 regulatory frameworks</p></li><li><p><strong>Continuous Assessment</strong> &#8594; Not point-in-time audits</p></li><li><p><strong>Bayesian Scoring + Monte Carlo Modeling</strong> &#8594; Because governance isn&#8217;t binary</p></li></ul><p>The Singapore framework calls for &#8220;continuous monitoring&#8221; and &#8220;technical controls throughout the agent lifecycle.&#8221;</p><p>Great. <strong>METRIS is how you actually do that.</strong></p><h3><strong>Why January 22</strong></h3><p>We could have launched any day.</p><p>But when we saw Singapore&#8217;s Davos announcement on the calendar, and Korea&#8217;s enforcement date landing the same day, we knew.</p><p>This was the moment.</p><p>Not to ride their coattails - but to draw a line:</p><p><strong>January 22, 2026 is the day AI governance stopped being a conversation and started being infrastructure.</strong></p><p>They wrote the frameworks.</p><p>We built the measurement layer.</p><h3><strong>What This Means For You</strong></h3><p>If you&#8217;re an enterprise deploying AI - especially agentic AI - here&#8217;s the reality:</p><ol><li><p><strong>Voluntary frameworks become market expectations.</strong> Singapore&#8217;s isn&#8217;t mandatory. It doesn&#8217;t matter. Your customers, partners, and investors will expect you to comply.</p></li><li><p><strong>Mandatory requirements are cascading.</strong> Korea today. EU AI Act implementation ongoing. 
Others will follow.</p></li><li><p><strong>&#8220;We&#8217;re working on governance&#8221; isn&#8217;t an answer anymore.</strong> The question is: <em>What&#8217;s your score?</em></p></li></ol><p>The enterprises that can answer that question - with data, not promises - will own the trust advantage.</p><h3><strong>One Ask</strong></h3><p><em><strong>If you&#8217;re building AI and wrestling with governance, whether you&#8217;re trying to comply with frameworks, preparing for audits, or just trying to figure out where you actually stand, I want to hear from you.</strong></em></p><p>Reply to this email. Tell me what&#8217;s broken. What&#8217;s working. What you wish existed.</p><p>I read everything.</p><p><strong>January 22, 2026. Mark your calendars.</strong></p><p>The day AI governance got a score.</p><p></p><p><strong>Suneeta Modekurty</strong></p><p><em>Founder, SANJEEVANI AI | Creator of METRIS</em></p><ul><li><p><a href="https://sanjeevaniai.com">METRIS - SANJEEVANI AI</a></p></li><li><p><a href="https://www.imda.gov.sg/-/media/imda/files/about/emerging-tech-and-research/artificial-intelligence/mgf-for-agentic-ai.pdf">Singapore Agentic AI Framework (PDF)</a></p></li><li><p><a href="https://fpf.org/blog/south-koreas-new-ai-framework-act-a-balancing-act-between-innovation-and-regulation/">South Korea AI Basic Act Overview</a></p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://ainstein.sanjeevaniai.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">A.I.N.S.T.E.I.N is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The ROI Case Your CFO Can't Ignore]]></title><description><![CDATA[Why data governance isn't a cost center but is a profit protection strategy]]></description><link>https://ainstein.sanjeevaniai.com/p/the-roi-case-your-cfo-cant-ignore</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/the-roi-case-your-cfo-cant-ignore</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Mon, 12 Jan 2026 15:01:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xdTg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab7ddb8-3c28-4c82-b9a2-e1d690f45ff4_1442x1328.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xdTg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab7ddb8-3c28-4c82-b9a2-e1d690f45ff4_1442x1328.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xdTg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab7ddb8-3c28-4c82-b9a2-e1d690f45ff4_1442x1328.png 424w, 
https://substackcdn.com/image/fetch/$s_!xdTg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab7ddb8-3c28-4c82-b9a2-e1d690f45ff4_1442x1328.png 848w, https://substackcdn.com/image/fetch/$s_!xdTg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab7ddb8-3c28-4c82-b9a2-e1d690f45ff4_1442x1328.png 1272w, https://substackcdn.com/image/fetch/$s_!xdTg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab7ddb8-3c28-4c82-b9a2-e1d690f45ff4_1442x1328.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xdTg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab7ddb8-3c28-4c82-b9a2-e1d690f45ff4_1442x1328.png" width="1442" height="1328" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cab7ddb8-3c28-4c82-b9a2-e1d690f45ff4_1442x1328.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1328,&quot;width&quot;:1442,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3410213,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/183372691?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab7ddb8-3c28-4c82-b9a2-e1d690f45ff4_1442x1328.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!xdTg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab7ddb8-3c28-4c82-b9a2-e1d690f45ff4_1442x1328.png 424w, https://substackcdn.com/image/fetch/$s_!xdTg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab7ddb8-3c28-4c82-b9a2-e1d690f45ff4_1442x1328.png 848w, https://substackcdn.com/image/fetch/$s_!xdTg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab7ddb8-3c28-4c82-b9a2-e1d690f45ff4_1442x1328.png 1272w, https://substackcdn.com/image/fetch/$s_!xdTg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcab7ddb8-3c28-4c82-b9a2-e1d690f45ff4_1442x1328.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>S called me last week with a problem.</p><p>&#8220;I used your cost calculator. Ran our numbers. We&#8217;re bleeding almost $800,000 a year on data quality issues.&#8221;</p><p>&#8220;That&#8217;s a big number,&#8221; I said.</p><p>&#8220;That&#8217;s the problem. It&#8217;s too big. When I showed my CFO, she didn&#8217;t see an opportunity. She saw a disaster. Her first question was &#8216;How did we let this get so bad?&#8217; Her second question was &#8216;How much will it cost to fix?&#8217;&#8221;</p><p>S paused.</p><p>&#8220;I didn&#8217;t have an answer.&#8221;</p><h2>The CFO&#8217;s Real Question</h2><p>Here&#8217;s what I&#8217;ve learned after 25 years in this space: executives don&#8217;t fund problems.
They fund solutions with returns.</p><p>Showing your CFO that bad data costs $800,000 annually doesn&#8217;t get you a budget. It gets you a meeting where everyone argues about whose fault it is.</p><p>What gets you a budget is showing that a $150,000 investment returns $500,000 in the first year.</p><p>That&#8217;s not a problem. That&#8217;s a 233% ROI.</p><h2>The Math That Changes the Conversation</h2><p>Let me show you how to reframe the data quality conversation from &#8220;look how broken we are&#8221; to &#8220;look what we can capture.&#8221;</p><p><strong>Step 1: Start with your cost calculator numbers</strong></p><p>Let&#8217;s say your organization identified:</p><ul><li><p>$312,000 in rework costs (data teams cleaning instead of analyzing)</p></li><li><p>$450,000 in failed project costs (3 projects that never delivered)</p></li><li><p>$200,000 in bad decision costs (choices made on flawed data)</p></li><li><p>$12,000 in compliance documentation time</p></li></ul><p>Total: $974,000 annually</p><p><strong>Step 2: Apply realistic reduction rates</strong></p><p>You won&#8217;t eliminate 100% of these costs. 
But industry benchmarks show:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!X1LQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa21ef952-50db-49ea-a3e3-53bb73a3bd5f_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X1LQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa21ef952-50db-49ea-a3e3-53bb73a3bd5f_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!X1LQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa21ef952-50db-49ea-a3e3-53bb73a3bd5f_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!X1LQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa21ef952-50db-49ea-a3e3-53bb73a3bd5f_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!X1LQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa21ef952-50db-49ea-a3e3-53bb73a3bd5f_1408x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!X1LQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa21ef952-50db-49ea-a3e3-53bb73a3bd5f_1408x768.png" width="1408" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a21ef952-50db-49ea-a3e3-53bb73a3bd5f_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1794818,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/183372691?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa21ef952-50db-49ea-a3e3-53bb73a3bd5f_1408x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!X1LQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa21ef952-50db-49ea-a3e3-53bb73a3bd5f_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!X1LQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa21ef952-50db-49ea-a3e3-53bb73a3bd5f_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!X1LQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa21ef952-50db-49ea-a3e3-53bb73a3bd5f_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!X1LQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa21ef952-50db-49ea-a3e3-53bb73a3bd5f_1408x768.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Step 3: Calculate the return</strong></p><p>Using moderate investment on our $974,000 example:</p><ul><li><p>Cost reduction (40%): $389,600 saved annually</p></li><li><p>Investment required: $120,000 (tools, training, dedicated resource)</p></li><li><p>First year net return: $269,600</p></li><li><p>ROI: 225%</p></li><li><p>Payback period: 3.7 months</p></li></ul><p>That&#8217;s the slide your CFO wants to see.</p><h2>The Costs They&#8217;re Not Counting</h2><p>When I work with organizations, I always find costs the calculator missed. These are harder to quantify but often larger:</p><p><strong>Opportunity cost.</strong> Your data scientists spend 50% of their time cleaning data. That&#8217;s not just a labor cost - it&#8217;s innovation that never happened. Products that never launched.
Insights that never surfaced.</p><p><strong>Decision latency.</strong> When leaders don&#8217;t trust the data, they delay decisions. They ask for &#8220;one more analysis.&#8221; They convene committees. In fast-moving markets, slow decisions are expensive decisions.</p><p><strong>Talent attrition.</strong> Good data professionals don&#8217;t stay at organizations with bad data infrastructure. The cost of replacing a senior data scientist - recruiting, onboarding, ramp-up time - runs $150,000 to $300,000.</p><p><strong>Insurance premiums.</strong> This one surprises people. D&amp;O insurance carriers are starting to ask about AI governance. Poor data governance is becoming an underwriting factor. I&#8217;ve seen premiums increase 40-60% for organizations that can&#8217;t demonstrate controls.</p><p><strong>Regulatory exposure.</strong> The EU AI Act penalties can reach &#8364;35 million or 7% of global revenue. GDPR fines have already exceeded &#8364;4 billion total. These aren&#8217;t theoretical risks anymore.</p><h2>Building the Business Case</h2><p>Here&#8217;s the structure that works:</p><p><strong>Page 1: Current State (The Burning Platform)</strong></p><p>Don&#8217;t lead with this, but you need it. Show the calculated costs. Keep it factual, not accusatory. Frame it as &#8220;what we discovered&#8221; not &#8220;what went wrong.&#8221;</p><p><strong>Page 2: The Opportunity (This Is Your Real Opener)</strong></p><p>Lead with the ROI. &#8220;A $120,000 investment yields $389,600 in annual savings - a 225% return with 3.7 month payback.&#8221;</p><p>Now you have attention.</p><p><strong>Page 3: Risk Reduction (The Insurance Policy)</strong></p><p>Beyond cost savings, governance reduces:</p><ul><li><p>Regulatory penalty exposure</p></li><li><p>Reputational risk from AI failures</p></li><li><p>D&amp;O insurance premium increases</p></li><li><p>Audit findings and remediation costs</p></li></ul><p>Quantify what you can. 
Acknowledge what you can&#8217;t but name the risks.</p><p><strong>Page 4: Competitive Advantage (The Growth Angle)</strong></p><p>Organizations with mature data governance:</p><ul><li><p>Launch AI initiatives faster (they trust their data)</p></li><li><p>Win more RFPs (they can answer the governance questions)</p></li><li><p>Attract better talent (professionals want to work with good data)</p></li><li><p>Command premium valuations (acquirers and investors check this now)</p></li></ul><p><strong>Page 5: The Ask (Specific and Staged)</strong></p><p>Don&#8217;t ask for everything at once. Propose phases:</p><ul><li><p>Phase 1: Assessment and quick wins ($30,000, 6 weeks)</p></li><li><p>Phase 2: Foundation building ($60,000, 3 months)</p></li><li><p>Phase 3: Optimization and scaling ($30,000, ongoing)</p></li></ul><p>Smaller initial asks get approved faster.</p><h2>The Conversation Shift</h2><p>S called me again yesterday.</p><p>&#8220;I rebuilt the presentation. Led with the ROI. Showed the 225% return and 4-month payback.&#8221;</p><p>&#8220;And?&#8221;</p><p>&#8220;She asked how fast we could start.&#8221;</p><p>That&#8217;s the shift. Same numbers. Different framing. Completely different outcome.</p><p>Your CFO doesn&#8217;t need to understand data quality. She needs to understand returns. Speak her language, and the budget follows.</p><h2>Your Turn</h2><p>I&#8217;ve built an ROI Calculator that structures this entire conversation. 
It takes your cost calculator outputs and transforms them into CFO-ready business cases.</p><p>Includes:</p><ul><li><p>Investment scenario modeling</p></li><li><p>ROI and payback calculations</p></li><li><p>Risk reduction quantification</p></li><li><p>One-page executive summary template</p></li><li><p>Talking points for the budget conversation</p></li></ul><p>A quick note on how the calculator works.</p><p>The math is simple but powerful:</p><p><strong>Step 1:</strong> Enter your total current cost (from the Data Quality Cost Calculator, or your own estimate)</p><p><strong>Step 2:</strong> The calculator shows three investment scenarios</p><p><strong>Step 3:</strong> It calculates automatically:</p><ul><li><p><strong>Annual Savings</strong> = Your Current Cost &#215; Cost Reduction %</p></li><li><p><strong>Net Return</strong> = Annual Savings - Investment</p></li><li><p><strong>ROI</strong> = Net Return &#247; Investment</p></li><li><p><strong>Payback Period</strong> = Investment &#247; Annual Savings &#215; 12 months</p></li></ul><p><strong>Example:</strong> If your current data quality costs are $974,000 annually and you choose the moderate investment of $120,000:</p><ul><li><p>Annual Savings: $974,000 &#215; 40% = $389,600</p></li><li><p>Net Return: $389,600 - $120,000 = $269,600</p></li><li><p>ROI: $269,600 &#247; $120,000 = <strong>225%</strong></p></li><li><p>Payback: $120,000 &#247; $389,600 &#215; 12 = <strong>3.7 months</strong></p></li></ul><p>That&#8217;s the slide your CFO wants to see.</p><p><strong>Your CFO doesn&#8217;t need to understand data quality. They need to understand returns.
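</strong></p><p>The four formulas are easy to check yourself. Here is a minimal sketch in Python (illustrative only, not the actual calculator; it plugs in the article&#8217;s $974,000 cost and the $120,000 moderate-investment figure used earlier):</p>

```python
def roi_case(current_cost, reduction_rate, investment):
    """Turn an annual data-quality cost into CFO-ready ROI figures."""
    annual_savings = current_cost * reduction_rate     # Annual Savings = Cost x Reduction %
    net_return = annual_savings - investment           # Net Return = Savings - Investment
    roi_pct = net_return / investment * 100            # ROI = Net Return / Investment
    payback_months = investment / annual_savings * 12  # Payback = Investment / Savings x 12
    return annual_savings, net_return, roi_pct, payback_months

savings, net, roi, payback = roi_case(974_000, 0.40, 120_000)
print(f"Savings ${savings:,.0f} | Net ${net:,.0f} | ROI {roi:.0f}% | Payback {payback:.1f} months")
```

<p>The &#8220;three scenarios&#8221; view is just this function called with three different investment-and-reduction pairs.</p><p><strong>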
Same numbers, different framing, completely different outcome.</strong></p><p><strong>Try it yourself.</strong></p><p><strong>Download the &#8220;AI Governance ROI Calculator&#8221;</strong> to transform your cost analysis into a CFO-ready business case with ROI modeling, payback calculations, and executive talking points.</p><p><strong><a href="https://forms.gle/xqmC4P5K4CaS2LZX9">Download AI Governance ROI Calculator</a></strong></p><p><em>Next week: the regulatory reckoning. The EU AI Act, NIST AI RMF, and a wave of state laws are creating deadlines most organizations aren&#8217;t ready for. What happens when &#8220;we&#8217;ll figure it out later&#8221; meets &#8220;the deadline is now&#8221;?</em></p><p><em>Founder of SANJEEVANI AI. ISO/IEC 42001 Lead Auditor. 25+ years in AI, data, and compliance across HealthTech, FinTech, EdTech, and Insurance. Building METRIS, the quantitative AI governance platform.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://ainstein.sanjeevaniai.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">A.I.N.S.T.E.I.N is a reader-supported publication.
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The Anatomy of a Dataset]]></title><description><![CDATA[What You See Versus What's Actually There]]></description><link>https://ainstein.sanjeevaniai.com/p/the-anatomy-of-a-dataset</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/the-anatomy-of-a-dataset</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Mon, 05 Jan 2026 15:01:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!XbO7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd80810aa-a654-4c89-8ff4-4241d0e6edc8_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XbO7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd80810aa-a654-4c89-8ff4-4241d0e6edc8_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XbO7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd80810aa-a654-4c89-8ff4-4241d0e6edc8_1536x1024.png 424w, 
https://substackcdn.com/image/fetch/$s_!XbO7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd80810aa-a654-4c89-8ff4-4241d0e6edc8_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!XbO7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd80810aa-a654-4c89-8ff4-4241d0e6edc8_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!XbO7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd80810aa-a654-4c89-8ff4-4241d0e6edc8_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XbO7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd80810aa-a654-4c89-8ff4-4241d0e6edc8_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d80810aa-a654-4c89-8ff4-4241d0e6edc8_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2077484,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/182204690?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd80810aa-a654-4c89-8ff4-4241d0e6edc8_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!XbO7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd80810aa-a654-4c89-8ff4-4241d0e6edc8_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!XbO7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd80810aa-a654-4c89-8ff4-4241d0e6edc8_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!XbO7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd80810aa-a654-4c89-8ff4-4241d0e6edc8_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!XbO7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd80810aa-a654-4c89-8ff4-4241d0e6edc8_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>S showed up with a laptop this time. Screen already open to a spreadsheet.</p><p>&#8220;I did the lineage exercise you suggested,&#8221; S said. &#8220;Traced one number from the board report all the way back to its source.&#8221;</p><p>&#8220;How did it go?&#8221;</p><p>&#8220;Painful. But useful. I found three places where the data gets transformed and nobody documented why.&#8221;</p><p>&#8220;That&#8217;s progress.&#8221;</p><p>&#8220;But it raised another question.&#8221; S turned the laptop toward me. &#8220;I got to the source dataset. The original table. And I realized I don&#8217;t actually know what&#8217;s in it.&#8221;</p><p>&#8220;What do you mean?&#8221;</p><p>&#8220;I mean I can see the columns. I can see the rows. I can see the values. But I don&#8217;t know if I can trust any of it. I don&#8217;t know what&#8217;s missing. I don&#8217;t know what&#8217;s wrong. I don&#8217;t know what assumptions are baked in. I&#8217;m looking at it, but I&#8217;m not seeing it.&#8221;</p><h3>The Illusion of Understanding</h3><p>This is one of the most common traps in working with data. You open a dataset. It has columns and rows. It has labels that seem clear. Customer_ID. Transaction_Date. Revenue. Status.</p><p>You assume you understand it.</p><p>But the surface of a dataset tells you almost nothing about what&#8217;s actually inside. The labels tell you what someone intended to capture. They don&#8217;t tell you what was actually captured. They don&#8217;t tell you what&#8217;s missing. They don&#8217;t tell you what&#8217;s inconsistent.
They don&#8217;t tell you what&#8217;s changed since the dataset was created.</p><p>&#8220;I pulled some basic statistics,&#8221; S said. &#8220;Row counts. Column types. That kind of thing.&#8221;</p><p>&#8220;That&#8217;s a start. What did you find?&#8221;</p><p>&#8220;Nothing obviously wrong. Which is almost worse. I don&#8217;t know if there&#8217;s nothing wrong, or if I just don&#8217;t know how to look.&#8221;</p><p>&#8220;There&#8217;s a difference between looking at data and profiling data. Most people do the first. Very few do the second.&#8221;</p><h3>What Data Profiling Actually Means</h3><p>Data profiling is the systematic examination of a dataset to understand its structure, content, and quality. It goes far beyond opening a file and scrolling through rows.</p><p>A proper profile examines several dimensions.</p><p><strong>Structure:</strong> How many rows? How many columns? What are the data types? Are the types consistent with what the labels suggest? A column called &#8220;Age&#8221; should contain numbers. A column called &#8220;Email&#8221; should contain text in a specific format. If the types don&#8217;t match expectations, something is wrong.</p><p><strong>Completeness:</strong> Which fields have missing values? How many? Is the missingness random or systematic? If 90 percent of records are missing a field, that field is probably useless. If only records from a certain time period are missing a field, something changed in the data collection process.</p><p><strong>Uniqueness:</strong> Are there duplicates? How do you define a duplicate? Two records with the same customer ID? The same email? The same name and address? The answer depends on context, and getting it wrong can significantly skew any analysis.</p><p><strong>Distribution:</strong> What values actually appear in each column? What&#8217;s the minimum? The maximum? The mean? The median? Are there outliers? A column called &#8220;Age&#8221; should probably have values between 0 and 120. 
If there&#8217;s a value of 999 or -1, that&#8217;s likely a placeholder for missing data, not an actual age.</p><p><strong>Consistency:</strong> Do related fields make sense together? If a record has a &#8220;Signup_Date&#8221; of 2025 but a &#8220;First_Purchase_Date&#8221; of 2019, something is wrong. If a &#8220;State&#8221; field says &#8220;California&#8221; but the &#8220;Zip_Code&#8221; starts with &#8220;100,&#8221; that&#8217;s New York, not California.</p><p><strong>Validity:</strong> Do the values conform to known rules? Email addresses should match a pattern. Phone numbers should have the right number of digits. Status codes should come from a defined list. Dates should be actual dates.</p><p>&#8220;That&#8217;s a lot to check,&#8221; S said.</p><p>&#8220;It is. But most of it can be automated. The problem isn&#8217;t that it&#8217;s hard to do. The problem is that people don&#8217;t do it.&#8221;</p><h3>The Problems Hiding in Plain Sight</h3><p>Let me give you some examples of what profiling reveals that casual inspection misses.</p><p><strong>The 99 Percent Problem</strong></p><p>Let&#8217;s say you have a dataset of customer orders. It has columns for Order_ID, Customer_Name, Product, Amount, and Country. You&#8217;re asked to analyze international sales trends.</p><p>You open the file. You scroll through the Country column. You see &#8220;USA,&#8221; &#8220;Canada,&#8221; &#8220;UK,&#8221; &#8220;Germany,&#8221; &#8220;Australia.&#8221; Looks good. You have international data.</p><p>But when you actually count the values, you discover that 99 percent of the orders are from &#8220;USA.&#8221; The remaining 1 percent is scattered across 47 different countries, with most having fewer than 10 orders each.</p><p>If you&#8217;re trying to understand international buying patterns, this dataset is nearly useless. You don&#8217;t have enough data from any country except the US to draw meaningful conclusions. But you&#8217;d never know this from scrolling. 
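</p><p>Counting is the whole trick, and it takes a few lines. A sketch in plain Python (the toy numbers are invented to mirror the example, not real data):</p>

```python
from collections import Counter

# Toy Country column mirroring the example: 990 US orders plus a thin long tail.
countries = ["USA"] * 990 + [
    "Canada", "UK", "Germany", "Australia", "France",
    "Japan", "Brazil", "India", "Spain", "Italy",
]

counts = Counter(countries)
total = len(countries)
usa_share = counts["USA"] / total                 # 0.99: the 99 percent problem
thin = sum(1 for n in counts.values() if n < 10)  # countries too sparse to analyze

print(f"USA share of orders: {usa_share:.0%}")
print(f"Countries with fewer than 10 orders: {thin}")
```

<p>Scrolling showed five plausible country names; counting shows one usable country.</p><p>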
You&#8217;d only know it from counting.</p><p><strong>The Placeholder Problem</strong></p><p>Let&#8217;s say you have a dataset of employees. It has columns for Employee_ID, Name, Department, Start_Date, and Annual_Salary. You&#8217;re asked to calculate the average salary by department.</p><p>You scroll through the Salary column. You see values like $52,000, $78,000, $95,000, $120,000. The numbers look reasonable for salaries.</p><p>But when you look at the full distribution, you find something strange. About 15 percent of the values are exactly $0. Another 5 percent are exactly $999,999.</p><p>These aren&#8217;t real salaries. The $0 values are probably contractors or interns whose salary wasn&#8217;t entered. The $999,999 values are probably executives whose salary is confidential and someone entered a placeholder instead.</p><p>If you calculate the average salary including these values, you get a number that represents nobody. The zeros pull it down. The 999,999 values pull it up. The result is meaningless.</p><p><strong>The Format Problem</strong></p><p>Let&#8217;s say you have a dataset of transactions. It has columns for Transaction_ID, Customer_ID, Date, and Amount. You&#8217;re asked to analyze sales trends by month.</p><p>You scroll through the Date column. You see values like &#8220;01/02/2024,&#8221; &#8220;15/03/2024,&#8221; &#8220;22/04/2024.&#8221; They look like dates. Good.</p><p>But when you look more carefully, you notice something. Some records show &#8220;01/02/2024.&#8221; Others show &#8220;2024-02-01.&#8221; Others show &#8220;02-01-2024.&#8221;</p><p>Three different formats in the same column. And here&#8217;s the problem: is &#8220;01/02/2024&#8221; January 2nd or February 1st? In the US, it&#8217;s January 2nd. In Europe, it&#8217;s February 1st. In the ISO format, it would be written differently altogether.</p><p>If this data came from multiple sources or multiple time periods, the formats might be mixed. 
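One way to surface a mix like this is to test every value against a set of candidate patterns. A stdlib sketch; the three patterns are assumptions about what the column might contain:

```python
# Test each date string against several candidate formats.
# datetime.strptime raises ValueError when a string doesn't fit a pattern,
# which is how each candidate is checked.
from datetime import datetime

FORMATS = {"%d/%m/%Y": "DD/MM/YYYY", "%m/%d/%Y": "MM/DD/YYYY", "%Y-%m-%d": "ISO"}

def matching_formats(value):
    """Return the label of every candidate format the string parses under."""
    hits = []
    for fmt, label in FORMATS.items():
        try:
            datetime.strptime(value, fmt)
            hits.append(label)
        except ValueError:
            pass
    return hits

for raw in ["01/02/2024", "15/03/2024", "2024-02-01"]:
    print(raw, "->", matching_formats(raw))
# "01/02/2024" matches two formats: without more context, it is ambiguous.
```
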
Without checking, you could be putting January transactions in February and vice versa. Your monthly trend analysis would be wrong, and you&#8217;d never know why.</p><p><strong>The Evolution Problem</strong></p><p>Let&#8217;s say you have a dataset of product sales going back five years. It has columns for Sale_ID, Date, Product_Name, Category, and Revenue. You&#8217;re asked to show how each product category has grown over time.</p><p>You scroll through the Category column. You see &#8220;Electronics,&#8221; &#8220;Clothing,&#8221; &#8220;Home Goods,&#8221; &#8220;Sports.&#8221; Looks consistent.</p><p>But when you filter by year and look at the values, you discover something. In 2020 and 2021, there was one category called &#8220;Electronics.&#8221; In 2022 and 2023, that category was split into &#8220;Consumer Electronics&#8221; and &#8220;Professional Electronics.&#8221; In 2024, they merged it back into just &#8220;Electronics.&#8221;</p><p>The label looks the same in 2020 and 2024, but they mean different things. In 2020, &#8220;Electronics&#8221; included everything. In 2024, &#8220;Electronics&#8221; is the merger of two categories that existed separately for two years.</p><p>If you draw a trend line for &#8220;Electronics&#8221; across five years, the line is lying to you. The dip in 2022-2023 isn&#8217;t a real decline. It&#8217;s just that the category was split. The spike in 2024 isn&#8217;t real growth. It&#8217;s just that they merged it back.</p><p><strong>The Silent Null Problem</strong></p><p>Let&#8217;s say you have a dataset of customer contacts. It has columns for Customer_ID, Name, Email, Phone, and Address. You&#8217;re asked to verify that the database is complete before a marketing campaign.</p><p>You run a quick check for missing values. The system tells you every row has a value in every column. 100 percent complete. 
Great.</p><p>But when you look at the actual values in the Email column, you find entries like &#8220;N/A,&#8221; &#8220;NA,&#8221; &#8220;none,&#8221; &#8220;null,&#8221; &#8220;not provided,&#8221; &#8220;-,&#8221; and what looks like empty space but is actually a single space character.</p><p>These are all different ways that people entered &#8220;we don&#8217;t have this information.&#8221; But because they typed something instead of leaving it blank, the system counts them as complete.</p><p>Your 100 percent completeness is fiction. You might actually be missing email addresses for 20 percent of your customers. If you send a marketing campaign based on this &#8220;complete&#8221; data, 20 percent of your emails go nowhere.</p><p>&#8220;I&#8217;ve seen some of these,&#8221; S said. &#8220;The placeholder thing especially. We have records with birthdates of January 1, 1900.&#8221;</p><p>&#8220;That&#8217;s a classic. January 1, 1900 is the default date in many systems. It gets entered when the real date wasn&#8217;t known. If you calculate average customer age including those records, your average customer is 125 years old. Obviously nonsense.&#8221;</p><h3>The Metadata Gap</h3><p>There&#8217;s another layer that profiling reveals: the gap between what documentation says and what data contains.</p><p>Let me give you an example.</p><p>Let&#8217;s say you have a dataset of support tickets. It has columns for Ticket_ID, Customer_ID, Date_Opened, Date_Closed, Status, and Priority. You&#8217;re asked to analyze how quickly tickets get resolved.</p><p>You find a data dictionary that someone wrote two years ago. It says the Status column contains three possible values: &#8220;Open,&#8221; &#8220;Closed,&#8221; or &#8220;Pending.&#8221;</p><p>You start writing your analysis based on this. Open tickets are active. Closed tickets are resolved. 
Pending tickets are waiting for something.</p><p>But when you actually look at the Status column, you find values like &#8220;Open,&#8221; &#8220;OPEN,&#8221; &#8220;open,&#8221; &#8220;Closed,&#8221; &#8220;closed,&#8221; &#8220;CLOSED,&#8221; &#8220;Pending,&#8221; &#8220;pending,&#8221; &#8220;Resolved,&#8221; &#8220;Cancelled,&#8221; &#8220;Escalated,&#8221; &#8220;On Hold,&#8221; &#8220;Waiting for Customer,&#8221; and &#8220;Duplicate.&#8221;</p><p>The documentation said three values. Reality has fourteen. Some are just capitalization differences. Others are completely new statuses that were added after the documentation was written.</p><p>If you build your analysis assuming only three statuses exist, you&#8217;ll miss or miscategorize a significant portion of your tickets. Your resolution time calculation will be wrong because you didn&#8217;t account for &#8220;Escalated&#8221; or &#8220;On Hold&#8221; tickets.</p><p>Documentation tells you what someone once believed about the data. Profiling tells you what&#8217;s actually there today. These are often very different things.</p><p>&#8220;So the documentation is lies,&#8221; S said.</p><p>&#8220;Not lies. Outdated truths. The documentation was probably accurate when someone wrote it two years ago. But then the support team added new statuses. Someone started entering values in all caps. A new manager created an &#8216;Escalated&#8217; category. And nobody updated the documentation because nobody owns it.&#8221;</p><h3>The Upstream Blindness Problem</h3><p>Here&#8217;s another thing profiling reveals that&#8217;s particularly important if you&#8217;re building AI or machine learning models.</p><p>Let me explain with an example.</p><p>Let&#8217;s say you&#8217;re building a model to predict which customers will cancel their subscription. 
You have a dataset of past customers with columns for Customer_ID, Age, Income, Subscription_Length, Support_Tickets_Filed, and Cancelled (yes or no).</p><p>You train the model. It works well in testing. You deploy it.</p><p>Six months later, the model is making terrible predictions. Customers it said would stay are leaving. Customers it said would leave are staying. What happened?</p><p>You go back and profile the original training data. Here&#8217;s what you find.</p><p>The Age column had placeholder values. About 10 percent of customers had an age of -1, which was entered when age was unknown. The model learned that customers with age -1 had specific cancellation patterns. But -1 isn&#8217;t a real age. It&#8217;s a data entry convention.</p><p>The Income column had duplicates. Some customers appeared multiple times with slightly different information. The model learned those examples more heavily because it saw them more often. But they weren&#8217;t more important. They were just duplicated.</p><p>The Support_Tickets_Filed column changed meaning over time. In the first year of data, it counted all tickets. In the second year, it only counted unresolved tickets. The model learned patterns based on a number that meant different things at different times.</p><p>The model didn&#8217;t know these were data quality issues. The model just saw patterns. And it learned whatever patterns helped it make predictions, including patterns that only existed because of data problems.</p><p>This is how you get models that work in testing but fail in production. They learned the quirks of the training data, not the underlying reality.</p><p>&#8220;So profiling is actually a prerequisite for machine learning,&#8221; S said.</p><p>&#8220;It should be. But teams are under pressure to ship models quickly. Profiling feels like a delay. So they skip it, train on whatever data they have, and hope for the best. Sometimes they get lucky. Often they don&#8217;t. 
And when the model fails, they blame the algorithm instead of examining the data.&#8221;</p><h3>What a Proper Profile Contains</h3><p>Let me describe what a thorough data profile actually looks like, using a concrete example.</p><p>Let&#8217;s say you have a dataset of online orders. It has columns for Order_ID, Customer_ID, Order_Date, Product_ID, Quantity, Unit_Price, Total_Amount, Shipping_Address, and Status.</p><p>A proper profile would examine this dataset across several dimensions.</p><p><strong>Summary statistics for every column</strong></p><p>For Order_ID: How many total orders? Are all IDs unique, or are there duplicates?</p><p>For Quantity: What&#8217;s the minimum value? Maximum? Average? Is the minimum 1, or are there zeros or negative numbers that shouldn&#8217;t exist?</p><p>For Unit_Price: What&#8217;s the range? Is the minimum $0.01 or $0.00? Is the maximum reasonable, or is there a $999,999 placeholder?</p><p>For Status: What values appear? Just &#8220;Shipped&#8221; and &#8220;Delivered&#8221;? Or also &#8220;Cancelled,&#8221; &#8220;Returned,&#8221; &#8220;Processing,&#8221; &#8220;On Hold,&#8221; and twelve other variations?</p><p><strong>Completeness analysis</strong></p><p>Which columns have missing values?</p><p>Maybe Shipping_Address is missing for 5 percent of orders. Are those digital products that don&#8217;t need shipping? Or are they data entry errors?</p><p>Maybe Customer_ID is missing for 2 percent of orders. Are those guest checkouts? Or are they system errors?</p><p>Is the missingness random, or is it concentrated in certain time periods or certain product types?</p><p><strong>Uniqueness analysis</strong></p><p>Order_ID should be unique. Is it? If the same Order_ID appears twice, do both rows have identical information, or different information? Which one is correct?</p><p>Are there duplicate orders where everything is the same except Order_ID? 
That might indicate the same order was entered twice by mistake.</p><p><strong>Distribution analysis</strong></p><p>For Quantity, most orders probably have quantities between 1 and 5. If there&#8217;s an order with quantity 10,000, is that a real bulk order or a data entry error?</p><p>For Total_Amount, what does the distribution look like? Is it clustered around certain price points? Are there outliers like $0.00 orders or $50,000 orders that need investigation?</p><p><strong>Cross-field validation</strong></p><p>Does Quantity times Unit_Price equal Total_Amount? If not, which field is wrong?</p><p>If Status is &#8220;Delivered,&#8221; is there a delivery date somewhere? If Status is &#8220;Cancelled,&#8221; is Total_Amount zero or was the customer still charged?</p><p>Does Order_Date make sense? Are there orders dated in the future? Orders dated before the company existed?</p><p><strong>Temporal analysis</strong></p><p>Are orders distributed evenly over time, or are there gaps? A gap might mean data wasn&#8217;t collected during that period.</p><p>Did the average order value change suddenly at some point? That might indicate a pricing change, or it might indicate a data issue.</p><p>Did new Status values start appearing at a certain date? That might indicate a system change that needs to be accounted for.</p><p>&#8220;That sounds like a lot of work for every dataset,&#8221; S said.</p><p>&#8220;It sounds like more work than it is. Most of this can be automated. You run a profiling tool, and it generates these statistics in seconds. The real work isn&#8217;t generating the numbers. 
It&#8217;s reviewing them and deciding what they mean.&#8221;</p><h3>The Decision That Follows</h3><p>Profiling isn&#8217;t an end in itself. It&#8217;s the foundation for a decision: can you trust this data for your intended purpose?</p><p>Let me give you four scenarios.</p><p><strong>Scenario 1: Green light.</strong></p><p>You profile the order dataset. Everything looks clean. Order_IDs are unique. Quantities and prices are in reasonable ranges. Total_Amount matches Quantity times Unit_Price in 99.9 percent of cases. Missing values are minimal and explainable. Status values match the documentation.</p><p>Decision: You can proceed with confidence.</p><p><strong>Scenario 2: Yellow light with caveats.</strong></p><p>You profile the dataset and find some issues. About 3 percent of orders have a Status of &#8220;Unknown&#8221; that isn&#8217;t in the documentation. About 1 percent have Total_Amount that doesn&#8217;t match Quantity times Unit_Price.</p><p>Decision: You can proceed, but document the limitations. Exclude the &#8220;Unknown&#8221; status orders from your analysis, or treat them as a separate category. Flag the mismatched amounts for investigation but don&#8217;t let them block the whole project.</p><p><strong>Scenario 3: Red light, fixable.</strong></p><p>You profile the dataset and find significant issues. The Order_Date column has three different formats. About 15 percent of Unit_Price values are $0.00, which are probably placeholder values. There are duplicate Order_IDs that need to be resolved.</p><p>Decision: You cannot proceed until these issues are fixed. The date formats need to be standardized. The $0.00 prices need to be investigated and either corrected or excluded. The duplicates need to be resolved. This is data cleaning work that must happen before analysis.</p><p><strong>Scenario 4: Red light, unfixable.</strong></p><p>You profile the dataset and find fundamental problems. 
The data only covers three months, but you need a full year for seasonal analysis. Half the customers are missing demographic information that&#8217;s essential for your segmentation. The Status definitions changed twice during the time period and there&#8217;s no way to reconcile them.</p><p>Decision: This data cannot support your intended purpose. You need different data, or you need to change what you&#8217;re trying to do. This is painful to admit, but better to know now than to build something on a broken foundation.</p><p>&#8220;How often is it that last one?&#8221; S asked.</p><p>&#8220;More often than people want to admit. Projects get approved based on assumptions about data availability. Someone says &#8216;we have two years of customer data&#8217; and everyone assumes it&#8217;s usable. Then someone finally profiles it and discovers half the fields are empty, the definitions changed three times, and it&#8217;s actually only useful for six months of analysis. By that point, there&#8217;s pressure to proceed anyway. That&#8217;s how bad analyses get built. That&#8217;s how bad models get deployed.&#8221;</p><h3>Making This Practical</h3><p>S closed the laptop.</p><p>&#8220;So before I trust any dataset, I should profile it. Understand what&#8217;s actually there. Compare it to what&#8217;s documented. Make an explicit decision about whether it&#8217;s fit for purpose.&#8221;</p><p>&#8220;That&#8217;s exactly right.&#8221;</p><p>&#8220;And if I&#8217;m going to train a model on it, I should be especially careful. Because the model will learn whatever patterns are in there, including patterns that only exist because of data problems.&#8221;</p><p>&#8220;Yes. The model doesn&#8217;t know that -1 means &#8216;missing.&#8217; The model doesn&#8217;t know that January 1, 1900 means &#8216;unknown date.&#8217; The model just sees numbers and finds patterns. 
Your job is to make sure the patterns it finds are real.&#8221;</p><p>&#8220;And most of this profiling can be automated, so it&#8217;s not as slow as it sounds.&#8221;</p><p>&#8220;Correct. The barrier isn&#8217;t technical. There are tools that generate profiles with a few clicks. The barrier is cultural. It&#8217;s making profiling a standard step that happens before anyone builds anything, not an afterthought that happens when something goes wrong.&#8221;</p><h3>Where This Goes Next</h3><p>&#8220;Next week,&#8221; I said, &#8220;we should talk about what happens after you deploy something. You&#8217;ve profiled the data. You&#8217;ve built your model or your report. It&#8217;s working today. But data changes over time. What happens when the data you trained on stops representing the data you&#8217;re seeing? That&#8217;s called drift, and it&#8217;s one of the most common reasons things fail in production.&#8221;</p><p>S nodded. &#8220;Same time?&#8221;</p><p>&#8220;Same time.&#8221;</p><p></p><p><em>Next week: Data Drift. Why yesterday's training data fails tomorrow, and how to detect when your data stops representing reality.</em></p><p></p><p><strong>You can&#8217;t trust what you haven&#8217;t examined. 
And scrolling through rows is not examination.</strong></p><p><strong>Download: &#8220;Dataset Profiling Checklist&#8221; &#8212; A systematic guide to understanding what&#8217;s actually in your data before you build anything on top of it.</strong></p><p><strong><a href="https://forms.gle/TwY4WsXGbt5SoPE16">Download Dataset Profiling Checklist</a></strong></p><p>Founder of SANJEEVANI AI. ISO/IEC 42001 Lead Auditor. 25+ years in AI, data, and compliance across HealthTech, FinTech, EdTech, and Insurance. 
Building METRIS, the quantitative AI governance platform.</p><p></p>]]></content:encoded></item><item><title><![CDATA[The Hidden Cost of "Good Enough" Data]]></title><description><![CDATA[Why Data Quality Problems Don't Announce Themselves Until It's Too Late]]></description><link>https://ainstein.sanjeevaniai.com/p/the-hidden-cost-of-good-enough-data</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/the-hidden-cost-of-good-enough-data</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Mon, 29 Dec 2025 15:02:32 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!sZeL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a2518-9a94-42aa-86d1-703f87088d84_1384x1308.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sZeL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a2518-9a94-42aa-86d1-703f87088d84_1384x1308.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sZeL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a2518-9a94-42aa-86d1-703f87088d84_1384x1308.png 424w, https://substackcdn.com/image/fetch/$s_!sZeL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a2518-9a94-42aa-86d1-703f87088d84_1384x1308.png 848w, https://substackcdn.com/image/fetch/$s_!sZeL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a2518-9a94-42aa-86d1-703f87088d84_1384x1308.png 1272w, 
https://substackcdn.com/image/fetch/$s_!sZeL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a2518-9a94-42aa-86d1-703f87088d84_1384x1308.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sZeL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a2518-9a94-42aa-86d1-703f87088d84_1384x1308.png" width="1384" height="1308" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a50a2518-9a94-42aa-86d1-703f87088d84_1384x1308.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1308,&quot;width&quot;:1384,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2966477,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/182194804?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a2518-9a94-42aa-86d1-703f87088d84_1384x1308.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sZeL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a2518-9a94-42aa-86d1-703f87088d84_1384x1308.png 424w, https://substackcdn.com/image/fetch/$s_!sZeL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a2518-9a94-42aa-86d1-703f87088d84_1384x1308.png 848w, 
https://substackcdn.com/image/fetch/$s_!sZeL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a2518-9a94-42aa-86d1-703f87088d84_1384x1308.png 1272w, https://substackcdn.com/image/fetch/$s_!sZeL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a2518-9a94-42aa-86d1-703f87088d84_1384x1308.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>S came in looking frustrated.</p><p>&#8220;I&#8217;ve been trying to explain this to people at work,&#8221; S said, dropping 
into the chair. &#8220;What we talked about last week. About data being the real problem. About the six dimensions.&#8221;</p><p>&#8220;How&#8217;s that going?&#8221;</p><p>&#8220;They nod. They agree in principle. Then they say &#8216;but our data is good enough.&#8217;&#8221;</p><p>S made air quotes around &#8220;good enough.&#8221;</p><p>&#8220;And you don&#8217;t think it is?&#8221;</p><p>&#8220;I don&#8217;t know. That&#8217;s the problem. Nobody knows. Nobody&#8217;s measuring. They just assume it&#8217;s fine because nothing has visibly exploded.&#8221;</p><p>&#8220;Yet.&#8221;</p><p>&#8220;Yet,&#8221; S repeated. &#8220;So how do I show them what &#8216;good enough&#8217; is actually costing?&#8221;</p><h3>The Invisible Tax</h3><p>Here&#8217;s the thing about data quality problems. They don&#8217;t announce themselves.</p><p>A server crash is obvious. An application error throws a message on the screen. A security breach makes headlines.</p><p>But bad data? Bad data just sits there. Quietly. Doing damage you can&#8217;t see.</p><p>Think of it like a slow leak in your roof. Water drips into the insulation. The wood starts to rot. Mold grows in places you never look. Everything seems fine from the inside. Until one day the ceiling collapses.</p><p>Data quality problems work the same way. They accumulate silently. They compound over time. By the time you notice something is wrong, the damage has spread far beyond the original source.</p><p>This is why &#8220;good enough&#8221; is so dangerous. It feels safe. It feels pragmatic. But it&#8217;s actually a bet. 
You&#8217;re betting that the problems you can&#8217;t see won&#8217;t become problems you can&#8217;t ignore.</p><p>Most organizations lose that bet eventually.</p><h3>Where the Money Actually Goes</h3><p>S leaned forward. &#8220;Okay, but can we put numbers on this? When I say &#8216;data quality costs money,&#8217; people want specifics.&#8221;</p><p>Let&#8217;s get specific.</p><p>Gartner has estimated that poor data quality costs organizations an average of $12.9 million per year. But that number is abstract. It doesn&#8217;t tell you where the money goes.</p><p>Here&#8217;s where it actually goes:</p><p><strong>Rework.</strong> This is the biggest one, and the least visible. Analysts spend time cleaning, correcting, and reconciling data instead of analyzing it. Studies suggest that data professionals spend 40 to 60 percent of their time on data preparation. Think about that. You&#8217;re paying skilled people good salaries, and half their time goes to janitorial work that shouldn&#8217;t be necessary.</p><p><strong>Failed projects.</strong> Data science projects fail at a rate of 80 to 85 percent, depending on which study you read. Many of those failures trace back to data issues. The model couldn&#8217;t be trained because the data was insufficient. The model was trained but didn&#8217;t generalize because the training data didn&#8217;t represent reality. The model worked in testing but failed in production because nobody accounted for data drift. Each failed project represents months of work, significant salary costs, and opportunity cost of what those people could have been doing instead.</p><p><strong>Bad decisions.</strong> This is the hardest to quantify but potentially the most expensive. When decisions are based on incorrect data, the costs show up downstream. A marketing campaign targets the wrong customers. A supply chain forecast misses actual demand. A hiring model screens out qualified candidates. A credit model approves risky borrowers. 
Each of these decisions made sense based on the data available. The data was just wrong.</p><p><strong>Compliance failures.</strong> Regulators are increasingly asking organizations to demonstrate data quality. The EU AI Act requires that training data be &#8220;relevant, representative, free of errors and complete.&#8221; GDPR requires that personal data be accurate and kept up to date. When you can&#8217;t demonstrate compliance, you face fines. When you can demonstrate compliance but the documentation takes weeks to assemble, you face operational costs that add up quickly.</p><p><strong>Eroded trust.</strong> This one doesn&#8217;t show up on any balance sheet, but it might be the most corrosive. When stakeholders stop trusting the numbers, they stop using them. Executives make gut decisions instead of data-driven ones. Analysts hedge their findings with so many caveats that the insights become useless. The entire investment in data infrastructure delivers diminishing returns because nobody believes the output.</p><h3>The Compounding Problem</h3><p>&#8220;That&#8217;s a lot of categories,&#8221; S said. &#8220;But they feel separate. Like different problems that happen to share a cause.&#8221;</p><p>&#8220;They&#8217;re not separate. They feed each other.&#8221;</p><p>This is the part that most people miss. Data quality problems don&#8217;t stay contained. 
They compound.</p><p>Here&#8217;s how it works.</p><p>Bad data enters your system at the point of collection. Maybe a form field wasn&#8217;t validated properly. Maybe an integration dropped some records. Maybe someone made a typo during manual entry.</p><p>That bad data flows into your data warehouse. Now it&#8217;s sitting alongside good data, and there&#8217;s nothing to distinguish them. Your warehouse doesn&#8217;t know which records are accurate and which are garbage.</p><p>An analyst pulls data for a report. They don&#8217;t know about the quality issues because nobody documented them. The report goes to leadership. Leadership makes a decision based on the report.</p><p>The decision turns out to be wrong. But nobody traces it back to the data. Instead, they blame the analyst&#8217;s methodology. Or the model. Or the market conditions. The actual root cause remains hidden.</p><p>Meanwhile, the same bad data is being used to train machine learning models. The models learn the patterns in the data, including the patterns that exist only because of errors. The models get deployed. They make predictions. The predictions inform more decisions.</p><p>Each step amplifies the original problem. A small error at the source becomes a medium error in the warehouse becomes a significant error in the model becomes a costly error in the real world.</p><p>&#8220;So fixing it later is always more expensive than fixing it early,&#8221; S said.</p><p>&#8220;Exponentially more expensive. IBM estimated that data errors caught at the point of entry cost about $1 to fix. The same errors caught during data storage cost about $10 to fix. Caught during analysis, $100. Caught after a decision has been made and implemented, potentially thousands or millions.&#8221;</p><p>&#8220;And most organizations catch them at which stage?&#8221;</p><p>&#8220;Most organizations catch them when something visibly breaks. 
Which is the most expensive stage of all.&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Dfp9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91dcb5c-2bca-4bfb-b805-8cac3626eee8_1364x564.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Dfp9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91dcb5c-2bca-4bfb-b805-8cac3626eee8_1364x564.png 424w, https://substackcdn.com/image/fetch/$s_!Dfp9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91dcb5c-2bca-4bfb-b805-8cac3626eee8_1364x564.png 848w, https://substackcdn.com/image/fetch/$s_!Dfp9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91dcb5c-2bca-4bfb-b805-8cac3626eee8_1364x564.png 1272w, https://substackcdn.com/image/fetch/$s_!Dfp9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91dcb5c-2bca-4bfb-b805-8cac3626eee8_1364x564.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Dfp9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91dcb5c-2bca-4bfb-b805-8cac3626eee8_1364x564.png" width="1364" height="564" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d91dcb5c-2bca-4bfb-b805-8cac3626eee8_1364x564.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:564,&quot;width&quot;:1364,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:311919,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/182194804?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91dcb5c-2bca-4bfb-b805-8cac3626eee8_1364x564.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Dfp9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91dcb5c-2bca-4bfb-b805-8cac3626eee8_1364x564.png 424w, https://substackcdn.com/image/fetch/$s_!Dfp9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91dcb5c-2bca-4bfb-b805-8cac3626eee8_1364x564.png 848w, https://substackcdn.com/image/fetch/$s_!Dfp9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91dcb5c-2bca-4bfb-b805-8cac3626eee8_1364x564.png 1272w, https://substackcdn.com/image/fetch/$s_!Dfp9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91dcb5c-2bca-4bfb-b805-8cac3626eee8_1364x564.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h3>The &#8220;Good Enough&#8221; Trap</h3><p>S sat back. &#8220;So when people say &#8216;our data is good enough,&#8217; what they&#8217;re really saying is...&#8221;</p><p>&#8220;They&#8217;re saying they haven&#8217;t looked closely enough to see the problems. Or they&#8217;ve seen the problems but haven&#8217;t calculated the costs. Or they&#8217;ve calculated the costs but decided to defer them to the future.&#8221;</p><p>&#8220;Which just makes the future costs higher.&#8221;</p><p>&#8220;Exactly. &#8216;Good enough&#8217; is a form of technical debt. And like all technical debt, it accrues interest.&#8221;</p><p>There&#8217;s a psychological component to this too. Once an organization has decided that its data is &#8220;good enough,&#8221; it becomes very hard to revisit that decision. 
Admitting that data quality is a problem means admitting that past decisions made on that data might have been wrong. It means acknowledging that significant investment might be needed to fix things. It means taking responsibility for something that has no obvious owner.</p><p>So people don&#8217;t look. They don&#8217;t measure. They don&#8217;t ask uncomfortable questions. And the problems compound.</p><p>&#8220;How do you break that cycle?&#8221; S asked.</p><p>&#8220;You make the invisible visible. You measure. You quantify. You show people what &#8216;good enough&#8217; is actually costing them in terms they can&#8217;t ignore.&#8221;</p><h3>Making It Visible</h3><p>The first step is to stop treating data quality as a binary. Data is not simply &#8220;good&#8221; or &#8220;bad.&#8221; It exists on a spectrum across multiple dimensions. And each dimension can be measured.</p><p>Last week, we talked about the six dimensions: accuracy, completeness, consistency, timeliness, validity, and uniqueness. Each of these can be expressed as a number.</p><p>Accuracy can be measured by sampling records and verifying them against source documents or real-world facts. If you check 1,000 records and 950 are accurate, your accuracy rate is 95 percent. Is that good enough? It depends on the use case. For a marketing mailing list, maybe. For a medical diagnosis system, absolutely not.</p><p>Completeness can be measured by counting null or missing values in required fields. If a dataset has 100,000 records and 15,000 are missing a critical field, your completeness rate is 85 percent. What decisions are being made with that 15 percent gap? What are those decisions costing?</p><p>Consistency can be measured by comparing how the same entity is represented across different systems. If a customer appears in three databases with three different addresses, you have a consistency problem. 
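</p><p>Checks like these take only a few lines once the rules are written down. Here is a minimal sketch in Python; the field names, the hand-verified sample, and the second system are illustrative stand-ins, not part of any real audit:</p>

```python
# Minimal data-quality measurements (illustrative data, not a real audit).

records = [
    {"id": 1, "email": "a@example.com", "address": "12 Oak St"},
    {"id": 2, "email": None,            "address": "98 Elm Ave"},
    {"id": 3, "email": "c@example.com", "address": None},
    {"id": 4, "email": "d@example.com", "address": "7 Pine Rd"},
]

def completeness(rows, field):
    """Share of rows where `field` is present."""
    return sum(1 for r in rows if r.get(field) is not None) / len(rows)

# Accuracy: spot-check a sample against a trusted source and count matches.
spot_checks = {1: True, 2: True, 3: False, 4: True}
accuracy = sum(spot_checks.values()) / len(spot_checks)

# Consistency: the same entity in two systems should agree on key fields.
crm     = {"cust-17": "12 Oak St"}
billing = {"cust-17": "12 Oak Street"}  # same customer, different spelling
consistency = sum(1 for k in crm if billing.get(k) == crm[k]) / max(len(crm), 1)

print(f"completeness(email): {completeness(records, 'email'):.0%}")
print(f"accuracy (sampled):  {accuracy:.0%}")
print(f"consistency:         {consistency:.0%}")
```

<p>The point is not the code; it is that each dimension collapses to a rate you can track over time.</p><p>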
How many downstream processes are affected by that inconsistency?</p><p>Timeliness can be measured by tracking the age of records relative to their expected refresh rate. If customer data should be updated monthly but hasn&#8217;t been touched in two years, you have a timeliness problem. What decisions are being made based on a reality that no longer exists?</p><p>Validity can be measured by checking records against defined business rules. Emails should match a certain format. Ages should fall within a certain range. Status codes should come from a defined list. Every violation represents a record that might cause problems downstream.</p><p>Uniqueness can be measured by identifying duplicate records. If the same customer appears five times with slightly different information, not only is the data wrong, but any analysis based on customer counts will be overstated.</p><p>&#8220;Once you have these numbers,&#8221; I told S, &#8220;you can start having a different conversation. Instead of &#8216;our data is good enough,&#8217; you can say &#8216;our data is 94 percent accurate, 85 percent complete, and has a 12 percent duplicate rate.&#8217; Those numbers mean something. They can be tracked over time. They can be tied to business outcomes.&#8221;</p><h3>Connecting Quality to Outcomes</h3><p>S picked up on something. &#8220;You said &#8216;tied to business outcomes.&#8217; How do you actually make that connection? How do you go from &#8216;our data has a 12 percent duplicate rate&#8217; to &#8216;this costs us X dollars&#8217;?&#8221;</p><p>This is where it gets interesting. And where most organizations stop short.</p><p>The connection isn&#8217;t always direct, but it can be traced. Let me give you a few examples.</p><p>Duplicate customer records mean inflated marketing costs. If your database says you have 100,000 customers but 12,000 of those are duplicates, you&#8217;re sending 12,000 unnecessary marketing emails. 
You&#8217;re paying for 12,000 extra impressions in advertising campaigns. You&#8217;re overstating your customer base to investors and board members. Each of these has a cost.</p><p>Incomplete address data means failed deliveries. If 15 percent of your customer addresses are missing or outdated, some percentage of physical shipments will fail. Each failed delivery has a direct cost for reshipping plus an indirect cost for customer frustration and potential churn.</p><p>Stale training data means model degradation. If your machine learning model was trained on data that&#8217;s two years old and the world has changed since then, your model&#8217;s predictions are increasingly wrong. Each wrong prediction has a cost. A fraud detection model that misses fraudulent transactions costs you money. A recommendation engine that suggests irrelevant products costs you sales.</p><p>Inconsistent definitions mean misaligned teams. If the marketing team defines &#8220;active customer&#8221; differently than the finance team, they&#8217;re making decisions based on different realities. When those decisions conflict, someone has to clean up the mess. That cleanup has a cost in time, meetings, reconciliation, and delayed execution.</p><p>&#8220;So the framework,&#8221; S said, &#8220;is to measure quality, then trace each quality dimension to specific business processes, then estimate the cost of failures in those processes.&#8221;</p><p>&#8220;That&#8217;s exactly right. It&#8217;s not always precise. Some costs are hard to quantify. But even rough estimates are better than the fiction of &#8216;good enough.&#8217;&#8221;</p><h3>The ROI Question</h3><p>S paused for a moment. &#8220;Okay. So we measure quality. We calculate costs. Then what?&#8221;</p><p>&#8220;Then you can make an actual business case for improvement.&#8221;</p><p>This is what transforms data quality from an IT concern into a strategic priority. 
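</p><p>The arithmetic behind such a case is simple enough to sketch. Every figure below is a placeholder an organization would replace with its own measured rates and traced costs:</p>

```python
# Rough business-case sketch: quality metrics -> annual cost -> ROI.
# All inputs are illustrative placeholders, not benchmarks.

customers        = 100_000
duplicate_rate   = 0.12      # from a uniqueness audit
waste_per_dup    = 85.00     # yearly marketing/ops waste per duplicate record
dup_cost         = customers * duplicate_rate * waste_per_dup

shipments        = 50_000
bad_address_rate = 0.15      # completeness gap on the address field
failure_share    = 0.40      # bad addresses that actually fail delivery
cost_per_failure = 30.00     # reshipping plus support time
ship_cost        = shipments * bad_address_rate * failure_share * cost_per_failure

annual_cost = dup_cost + ship_cost   # what "good enough" costs per year
investment  = 500_000                # proposed remediation budget
reduction   = 0.60                   # expected share of cost eliminated

roi = (annual_cost * reduction - investment) / investment
print(f"annual cost of quality issues: ${annual_cost:,.0f}")
print(f"first-year ROI of remediation: {roi:.0%}")
```

<p>Rough as they are, numbers like these move the conversation from &#8220;good enough&#8221; to a comparable cost and a comparable return.</p><p>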
When you can say &#8220;we&#8217;re losing approximately $2 million per year due to data quality issues, and an investment of $500,000 would reduce that by 60 percent,&#8221; you&#8217;re speaking a language that executives understand.</p><p>The ROI of data quality work is often surprisingly high. The problem is that nobody calculates it. They see the cost of the investment but not the cost of the status quo. So the status quo wins by default.</p><p>&#8220;But calculating it requires measuring first,&#8221; S said.</p><p>&#8220;Which brings us back to where we started. You can&#8217;t manage what you don&#8217;t measure. You can&#8217;t improve what you can&#8217;t quantify. You can&#8217;t make a business case for something you can&#8217;t express in numbers.&#8221;</p><p>&#8220;So the first step is always measurement.&#8221;</p><p>&#8220;The first step is always measurement.&#8221;</p><h3>What Changes This</h3><p>S stood up to go. Then stopped.</p><p>&#8220;Last week you mentioned that this is a governance issue, not just a technical issue. What did you mean by that?&#8221;</p><p>&#8220;I mean that fixing data quality isn&#8217;t just about better tools or more cleaning scripts. It&#8217;s about organizational change. It&#8217;s about who owns data quality, how it&#8217;s measured, how it&#8217;s incentivized, and how it connects to larger business objectives.&#8221;</p><p>&#8220;That sounds harder than just buying software.&#8221;</p><p>&#8220;It is harder. It&#8217;s also the only thing that actually works. You can buy the best data quality tools in the world, but if nobody owns the problem, if there are no standards, if there&#8217;s no accountability, those tools will just generate reports that nobody reads.&#8221;</p><p>&#8220;So how do you actually build that? The governance piece?&#8221;</p><p>&#8220;That&#8217;s a bigger conversation. For next week?&#8221;</p><p>S nodded. 
&#8220;Next week.&#8221;</p><p></p><p><em>Next week: Metrics that matter versus metrics that mislead. Why single numbers are almost always lying to you, and how to build measurement systems that actually drive improvement.</em></p><p></p><p><strong>&#8220;Good enough&#8221; is a bet that the problems you can&#8217;t see won&#8217;t become problems you can&#8217;t ignore. Most organizations lose that bet eventually.</strong></p><p><strong>Download: &#8220;Data Quality Cost Calculator&#8221; &#8212; A simple framework to estimate what data quality issues are costing your organization.</strong></p><p><strong><a href="https://forms.gle/bRnimMncEqYB18HDA">Download Data Quality Cost Calculator</a></strong></p><p></p><p>Founder of SANJEEVANI AI. ISO/IEC 42001 Lead Auditor. 25+ years in AI, data, and compliance across HealthTech, FinTech, EdTech, and Insurance. Building METRIS, the quantitative AI governance platform.</p>]]></content:encoded></item><item><title><![CDATA[Why Every AI Problem Is Actually a Data Problem]]></title><description><![CDATA[The Model Is 10% of the Work. 
The Data Is 90% of the Outcome.]]></description><link>https://ainstein.sanjeevaniai.com/p/why-every-ai-problem-is-actually</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/why-every-ai-problem-is-actually</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Mon, 22 Dec 2025 15:00:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0QrQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19d28391-bd68-4a8b-a61a-8387567c29c1_1276x1440.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0QrQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19d28391-bd68-4a8b-a61a-8387567c29c1_1276x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0QrQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19d28391-bd68-4a8b-a61a-8387567c29c1_1276x1440.png 424w, https://substackcdn.com/image/fetch/$s_!0QrQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19d28391-bd68-4a8b-a61a-8387567c29c1_1276x1440.png 848w, https://substackcdn.com/image/fetch/$s_!0QrQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19d28391-bd68-4a8b-a61a-8387567c29c1_1276x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!0QrQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19d28391-bd68-4a8b-a61a-8387567c29c1_1276x1440.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!0QrQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19d28391-bd68-4a8b-a61a-8387567c29c1_1276x1440.png" width="696" height="785.4545454545455" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/19d28391-bd68-4a8b-a61a-8387567c29c1_1276x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1440,&quot;width&quot;:1276,&quot;resizeWidth&quot;:696,&quot;bytes&quot;:3054589,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/182046772?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19d28391-bd68-4a8b-a61a-8387567c29c1_1276x1440.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0QrQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19d28391-bd68-4a8b-a61a-8387567c29c1_1276x1440.png 424w, https://substackcdn.com/image/fetch/$s_!0QrQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19d28391-bd68-4a8b-a61a-8387567c29c1_1276x1440.png 848w, https://substackcdn.com/image/fetch/$s_!0QrQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19d28391-bd68-4a8b-a61a-8387567c29c1_1276x1440.png 1272w, 
https://substackcdn.com/image/fetch/$s_!0QrQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19d28391-bd68-4a8b-a61a-8387567c29c1_1276x1440.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>S showed up with a new energy this week. Less confused, more frustrated.</p><p>&#8220;I&#8217;ve been reading,&#8221; S said, dropping into the chair across from me. &#8220;Papers. Blog posts. Case studies. Everyone&#8217;s talking about models. 
GPT-this, neural network-that, transformer-whatever.&#8221;</p><p>&#8220;And?&#8221;</p><p>&#8220;And I keep thinking about what we discussed. The Amazon hiring tool. The healthcare algorithm. The court system. In every case, the problem wasn&#8217;t really the model, was it?&#8221;</p><p>I smiled. S was getting there.</p><p>&#8220;The model worked,&#8221; S continued. &#8220;It did exactly what it was designed to do. It found patterns. It made predictions. It scaled decisions.&#8221;</p><p>&#8220;So what was the problem?&#8221;</p><p>S paused. Then, quietly: &#8220;The data.&#8221;</p><h3>The Dirty Secret of AI</h3><p>Here&#8217;s something the AI industry doesn&#8217;t advertise loudly: the model is the easy part.</p><p>Building a neural network? There are frameworks for that. Training an algorithm? There are tutorials. Deploying a prediction engine? Cloud providers will do it for you with three clicks.</p><p>But the data that feeds those models? That&#8217;s where everything breaks.</p><div class="pullquote"><p>A 2021 study by MIT and IBM found something remarkable: improving data quality by just 10% often delivered better model performance than any algorithmic improvement. </p></div><p>Not sometimes. Often.</p><p>Think about that. Companies spend millions on fancier models, more parameters, better architectures. And a methodical data cleanup would have worked better.</p><p>Andrew Ng, one of the most respected names in machine learning, has been saying this for years. He calls it &#8220;data-centric AI&#8221;. This is the recognition that for most real-world problems, the bottleneck isn&#8217;t the algorithm. It&#8217;s the data.</p><p>&#8220;So why does everyone focus on models?&#8221; S asked.</p><p>&#8220;Because models are sexy. Data cleaning is not.&#8221;</p><p>&#8220;That&#8217;s it? Vanity?&#8221;</p><p>&#8220;Partly. Also, models are contained. You can point to a model. You can benchmark it. You can publish a paper about it. Data is messy. 
Data is distributed across systems, teams, years of accumulated decisions. Data doesn&#8217;t fit on a leaderboard.&#8221;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://ainstein.sanjeevaniai.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://ainstein.sanjeevaniai.com/subscribe?"><span>Subscribe now</span></a></p><h3>What &#8220;Data Quality&#8221; Actually Means</h3><p>S leaned forward. &#8220;Okay, so data matters more than models. But what does &#8216;good data&#8217; even mean? How do you measure it?&#8221;</p><p>This is where most conversations go wrong. People treat &#8220;data quality&#8221; as a vague aspiration, like &#8220;innovation&#8221; or &#8220;synergy.&#8221; Something everyone agrees is good but nobody defines.</p><p>In reality, data quality has specific, measurable dimensions. Six of them are widely recognized:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-lQ6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa458abed-b3a5-4f36-b3c5-363b65dfb6b7_1408x736.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-lQ6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa458abed-b3a5-4f36-b3c5-363b65dfb6b7_1408x736.png 424w, https://substackcdn.com/image/fetch/$s_!-lQ6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa458abed-b3a5-4f36-b3c5-363b65dfb6b7_1408x736.png 848w, 
https://substackcdn.com/image/fetch/$s_!-lQ6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa458abed-b3a5-4f36-b3c5-363b65dfb6b7_1408x736.png 1272w, https://substackcdn.com/image/fetch/$s_!-lQ6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa458abed-b3a5-4f36-b3c5-363b65dfb6b7_1408x736.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-lQ6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa458abed-b3a5-4f36-b3c5-363b65dfb6b7_1408x736.png" width="1408" height="736" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a458abed-b3a5-4f36-b3c5-363b65dfb6b7_1408x736.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:736,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:462011,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/182046772?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa458abed-b3a5-4f36-b3c5-363b65dfb6b7_1408x736.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-lQ6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa458abed-b3a5-4f36-b3c5-363b65dfb6b7_1408x736.png 424w, 
https://substackcdn.com/image/fetch/$s_!-lQ6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa458abed-b3a5-4f36-b3c5-363b65dfb6b7_1408x736.png 848w, https://substackcdn.com/image/fetch/$s_!-lQ6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa458abed-b3a5-4f36-b3c5-363b65dfb6b7_1408x736.png 1272w, https://substackcdn.com/image/fetch/$s_!-lQ6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa458abed-b3a5-4f36-b3c5-363b65dfb6b7_1408x736.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>1. Accuracy</strong> Does the data reflect reality? If your database says a customer lives in Chicago but they moved to Denver three years ago, that&#8217;s an accuracy problem. Seems obvious. But in large systems with millions of records, accuracy degrades constantly, and silently.</p><p><strong>2. Completeness</strong> Is the data present? Missing values aren&#8217;t just annoying; they&#8217;re informative in the wrong way. If 80% of your &#8220;income&#8221; field is blank, any model trained on that data will learn to work around the gaps. Sometimes in ways you don&#8217;t expect.</p><p><strong>3. Consistency</strong> Does the same thing mean the same thing everywhere? One system records dates as MM/DD/YYYY. Another uses DD/MM/YYYY. One team defines &#8220;active customer&#8221; as anyone who logged in this year. Another team means anyone who made a purchase. Same label. Different meanings. A model trained on this will learn confusion.</p><p><strong>4. Timeliness</strong> Is the data current? A credit risk model trained on 2019 data might be useless in 2024. The world changed. Customer behavior changed. The economy changed. But the model is still looking for patterns that no longer exist.</p><p><strong>5. Validity</strong> Does the data conform to defined formats and rules? An email field should contain emails. An age field shouldn&#8217;t contain negative numbers. A country code should match a real country. Validity violations often indicate upstream problems such as poorly designed forms, broken integrations, or manual entry errors.</p><p><strong>6. Uniqueness</strong> Is each record represented once? Duplicate records create phantom patterns. If the same customer appears twice with slightly different information, the model might learn that these are two different people with correlated behavior. They&#8217;re not. 
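</p><p>Validity and uniqueness are the easiest of the six to automate. A minimal sketch in Python; the rules and records here are made up for illustration:</p>

```python
import re

# Illustrative validity rules and records (not from any real system).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
STATUSES = {"active", "inactive", "pending"}

rows = [
    {"email": "a@example.com", "age": 34, "status": "active"},
    {"email": "not-an-email",  "age": 29, "status": "active"},   # bad email
    {"email": "b@example.com", "age": -5, "status": "archived"}, # bad age, bad status
    {"email": "a@example.com", "age": 34, "status": "active"},   # exact duplicate
]

def is_valid(row):
    return (bool(EMAIL_RE.match(row["email"]))
            and 0 <= row["age"] <= 120
            and row["status"] in STATUSES)

validity_rate = sum(is_valid(r) for r in rows) / len(rows)

# Uniqueness via exact match; real systems also need fuzzy matching
# to catch the "slightly different information" case described above.
unique_rows = {tuple(sorted(r.items())) for r in rows}
duplicate_rate = 1 - len(unique_rows) / len(rows)

print(f"validity:   {validity_rate:.0%}")   # 2 of 4 rows pass every rule
print(f"duplicates: {duplicate_rate:.0%}")  # 1 of 4 rows is a repeat
```

<p>Every rule violation found this way points at an upstream cause: a form, an integration, a manual process.</p><p>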
It&#8217;s just messy data.</p><p>&#8220;These aren&#8217;t abstract concepts,&#8221; I told S. &#8220;Each one is measurable. Each one has thresholds. Each one can be monitored.&#8221;</p><p>&#8220;But most companies don&#8217;t monitor them.&#8221;</p><p>&#8220;Most companies don&#8217;t even know these dimensions exist. They think of data quality as &#8216;does it load without errors.&#8217; That&#8217;s not quality. That&#8217;s just functioning.&#8221;</p><h3>The Real Cost of &#8220;Good Enough&#8221;</h3><p>S pulled out a notebook. I recognized this: the signal that something was landing.</p><p>&#8220;Can you quantify this? Like, what does bad data actually cost?&#8221;</p><p>&#8220;Gartner estimated that poor data quality costs organizations an average of $12.9 million per year.&#8221;</p><p>&#8220;Average?&#8221;</p><p>&#8220;Average. Some industries are worse. Healthcare. Finance. Insurance. Anywhere decisions are high-stakes and data is distributed across legacy systems.&#8221;</p><p>But the headline number hides the more interesting story: where those costs come from.</p><p><strong>Direct costs:</strong></p><ul><li><p>Rework: Analysts spending 40-60% of their time cleaning data instead of analyzing it</p></li><li><p>Errors: Incorrect decisions based on incorrect data</p></li><li><p>Integration failures: Systems that can&#8217;t talk to each other because data formats don&#8217;t match</p></li></ul><p><strong>Indirect costs:</strong></p><ul><li><p>Missed opportunities: Patterns you can&#8217;t see because the data is too noisy</p></li><li><p>Model failures: Predictions that don&#8217;t generalize because training data didn&#8217;t represent reality</p></li><li><p>Compliance violations: Regulatory penalties when data governance requirements aren&#8217;t met</p></li></ul><p><strong>Hidden costs:</strong></p><ul><li><p>Eroded trust: Stakeholders who stop believing the numbers</p></li><li><p>Decision paralysis: Leaders who won&#8217;t act because 
they don&#8217;t trust the data</p></li><li><p>Technical debt: Workarounds that accumulate until the system becomes unmaintainable</p></li></ul><p>&#8220;The thing about data quality costs,&#8221; I said, &#8220;is that they&#8217;re distributed. No single failure looks catastrophic. It&#8217;s a thousand small problems, each one seemingly tolerable, adding up to systemic dysfunction.&#8221;</p><p>&#8220;Death by a thousand cuts.&#8221;</p><p>&#8220;Exactly. And because no single cut is fatal, nobody prioritizes the bandages.&#8221;</p><h3>The Lifecycle View</h3><p>S flipped to a new page. &#8220;So where does data quality break down? Is there a pattern?&#8221;</p><p>There is. And it follows the data lifecycle.</p><p><strong>At creation:</strong> Garbage enters the system. Bad forms, broken integrations, manual errors, unclear definitions.</p><p><strong>At storage:</strong> Quality degrades over time. Records go stale. Schemas drift. Corruption happens.</p><p><strong>At processing:</strong> Transformations introduce errors. Joins create duplicates. Business logic gets misapplied.</p><p><strong>At analysis:</strong> Models amplify noise. Bias compounds. Overfitting hides problems until production.</p><p><strong>At decision:</strong> Actions based on bad analysis create real-world harm. And often, that harm generates data that feeds back into the system &#8212; creating a loop of dysfunction.</p><p>&#8220;Every stage is a potential failure point,&#8221; I said. &#8220;And most organizations only check at one or two stages, usually the ones closest to the final report.&#8221;</p><p>&#8220;So they catch problems at the end, when it&#8217;s too late.&#8221;</p><p>&#8220;Or they don&#8217;t catch them at all. The decision looks confident. The numbers look precise. Nobody realizes the foundation is sand.&#8221;</p><h3>Why This Is an AI Governance Problem</h3><p>S sat back. &#8220;This feels like a data engineering conversation. 
What does it have to do with AI governance?&#8221;</p><p>&#8220;Everything.&#8221;</p><p>There&#8217;s a connection here that most people miss. When we talk about AI governance, people tend to think about models, algorithms, and outputs. But if you actually read the major AI governance frameworks (the EU AI Act, the NIST AI RMF, ISO/IEC 42001), you&#8217;ll find that data is at the center of all of them. This isn&#8217;t an afterthought. Data is treated as the foundation.</p><p>Take the EU AI Act as an example. Article 10 specifically requires that training, validation, and testing datasets be &#8220;relevant, sufficiently representative, and to the best extent possible, free of errors and complete.&#8221; This is not a suggestion or a best practice. It is a legal requirement for any AI system classified as high-risk. If your data doesn&#8217;t meet these standards, your AI system is not compliant. It&#8217;s that simple.</p><p>The NIST AI Risk Management Framework takes a similar approach. It dedicates an entire function to data-related concerns. The framework talks extensively about measuring data quality, documenting where data comes from, tracking how data changes over time, and monitoring for drift that could affect model performance.</p><p>ISO/IEC 42001, which is the international standard for AI management systems, requires organizations to demonstrate that they have proper data governance practices in place. You cannot get certified without showing that you understand and control the data flowing into your AI systems.</p><p>The message from regulators is clear. If you want to govern AI, you have to start by governing data.</p><p>&#8220;So when regulators come asking about your AI system,&#8221; I said, &#8220;they&#8217;re not just asking about the model. They&#8217;re asking about the data. Where did it come from? How was it validated? 
How do you know it&#8217;s still representative?&#8221;</p><p>&#8220;And most companies can&#8217;t answer those questions.&#8221;</p><p>&#8220;Most companies have never been asked. That&#8217;s about to change.&#8221;</p><h3>The Uncomfortable Question</h3><p>S closed the notebook. Looked at me directly.</p><p>&#8220;Data quality is important. It&#8217;s measurable. It&#8217;s connected to everything downstream. So why isn&#8217;t it treated that way?&#8221; she asked.</p><p>I&#8217;ve thought about this question for years. The answers are unsatisfying but honest:</p><p><strong>Organizational silos.</strong> Data is created by one team, stored by another, processed by a third, analyzed by a fourth. No one owns quality end-to-end.</p><p><strong>Misaligned incentives.</strong> Data engineers are measured on pipeline uptime, not data accuracy. Data scientists are measured on model performance, not data completeness. Executives are measured on outcomes, not the quality of inputs.</p><p><strong>Invisible failures.</strong> When a model makes a bad prediction because of bad data, the root cause is rarely traced back. The model gets blamed. The data stays broken.</p><p><strong>Short-term thinking.</strong> Fixing data quality is expensive upfront and pays off slowly. Shipping a model is fast and visible. Organizations optimize for what they can see.</p><p>&#8220;So the system is designed to ignore the problem,&#8221; S said.</p><p>&#8220;The system is designed to optimize for other things. Data quality is a casualty of those optimizations.&#8221;</p><p>&#8220;Until it isn&#8217;t. Until something breaks badly enough that people notice.&#8221;</p><p>&#8220;Right. And by then, the cost of fixing it is ten times what it would have been to do it right from the start.&#8221;</p><h3>What Changes This</h3><p>S asked the question I was waiting for: &#8220;So what would it actually take to treat data quality as foundational? 
Not as an afterthought?&#8221;</p><p>A few things:</p><p><strong>Measurement:</strong> The first change is to start measuring at all. There&#8217;s an old saying in management: you can&#8217;t manage what you don&#8217;t measure. This applies directly to data quality. Most organizations don&#8217;t measure data quality in any systematic way. They might check it once a year during an audit, or maybe once a quarter when a report is due. But that&#8217;s not real measurement. That&#8217;s occasional inspection. Data quality changes constantly. New records come in. Old records go stale. Systems get updated. People make errors. If you only look once a quarter, you&#8217;re flying blind for three months at a time. What you need is continuous monitoring. That means dashboards that track all six dimensions we discussed: accuracy, completeness, consistency, timeliness, validity, and uniqueness. It also means alerts that notify the right people when any of these dimensions fall below acceptable thresholds. Without this kind of ongoing measurement, you&#8217;re reacting to problems after they&#8217;ve already caused damage.</p><p><strong>Ownership:</strong> The second change is to assign clear ownership. Someone has to be accountable for data quality. And I don&#8217;t mean &#8220;the data team&#8221; as some vague abstraction. When everyone is responsible, no one is responsible. You need a specific person with a name and a title who owns data quality. This person needs real authority to make decisions about data standards, data processes, and data tools. They need a budget to invest in the infrastructure and people required to maintain quality. And there need to be real consequences when data quality fails on their watch. If nobody&#8217;s job depends on data quality, data quality will always lose to whatever has a deadline attached to it.</p><p><strong>Integration: </strong>The third change is to integrate quality checks into every stage of work. 
Data quality can&#8217;t be a separate workstream that happens before the &#8220;real work&#8221; begins. That approach treats quality as a one-time gate you pass through and forget about. Instead, data quality has to be embedded into every stage of the data lifecycle. When data enters a pipeline, there should be automated quality checks. When data feeds a model, there should be validation rules that flag problems before training begins. When data informs a decision, there should be confidence bounds that reflect the quality of the inputs. If the underlying data is shaky, the decision-maker needs to know that. Quality isn&#8217;t a phase. It&#8217;s a continuous practice that runs alongside everything else.</p><p><strong>Incentives: </strong>The fourth change is to align incentives. Until data quality becomes part of how people are evaluated, it will always be deprioritized. People focus on what they&#8217;re measured on. If a data engineer is measured only on pipeline uptime, they will optimize for uptime and ignore quality. If a data scientist is measured only on model accuracy, they will chase accuracy without questioning whether the training data was reliable. To fix this, you need to tie data quality metrics to the things people care about. Include quality scores in performance reviews. Make quality thresholds a requirement for project approvals. Block model deployments if input data doesn&#8217;t meet validation standards. When quality affects promotions, bonuses, and project success, people start paying attention to it.</p><p></p><p>&#8220;That&#8217;s a lot of organizational change,&#8221; S said.</p><p>&#8220;It is. And that&#8217;s why most organizations don&#8217;t do it. They&#8217;d rather deal with the symptoms than address the cause.&#8221;</p><p>&#8220;But the symptoms keep getting worse.&#8221;</p><p>&#8220;They do. And at some point, the cost of ignoring the problem exceeds the cost of fixing it. The smart organizations are figuring that out now. 
Before the regulators force them to.&#8221;</p><h3>Where This Goes Next</h3><p>S stood up to leave. Paused at the door.</p><p>&#8220;So we&#8217;ve established that AI is fundamentally a data problem. That data quality is measurable. That it breaks down at every stage of the lifecycle. That it&#8217;s a governance issue, not just a technical issue.&#8221;</p><p>&#8220;Yes.&#8221;</p><p>&#8220;What I still don&#8217;t understand is: how do you actually measure it? Not in theory. In practice. What are the metrics? What are the thresholds? How do you know when something is &#8216;good enough&#8217; versus when it&#8217;s a risk?&#8221;</p><p>&#8220;That&#8217;s the language of measurement,&#8221; I said. &#8220;And it&#8217;s more nuanced than most people realize.&#8221;</p><p>&#8220;Next week?&#8221; she asked. <br>&#8220;Yes. Next week,&#8221; I said.<br></p><p><br><em>Next week: The difference between a metric that matters and a metric that misleads, and why single numbers are almost always lying to you.</em></p><p></p><p><strong>The data that trains your AI is the foundation everything else stands on. Cracks in that foundation don&#8217;t disappear. 
They propagate.</strong></p><p><strong>Download: &#8220;Data Quality: The 6 Dimensions&#8221; &#8212; A visual guide to what data quality actually means and the questions to ask at each stage of the lifecycle.</strong></p><p><strong><a href="https://forms.gle/9uPpEJLw1kQgyKvK8">Download Data Quality: The 6 Dimensions</a></strong><br><br>Founder of SANJEEVANI AI. ISO/IEC 42001 Lead Auditor. 25+ years in AI, data, and compliance across HealthTech, FinTech, EdTech, and Insurance. Building METRIS, the quantitative AI governance platform.</p>]]></content:encoded></item><item><title><![CDATA[Context ROT: The Hidden Crisis Destroying AI Governance From Within]]></title><description><![CDATA[How Redundant, Obsolete, and Trivial Data Multiplies Every Silo and Undermines Trust]]></description><link>https://ainstein.sanjeevaniai.com/p/context-rot-the-hidden-crisis-destroying</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/context-rot-the-hidden-crisis-destroying</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Sun, 21 Dec 2025 01:14:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!l2pF!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F195ee19e-3683-4fe4-8927-dad386e22f02_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A few days ago, I ran a poll asking the AI governance community a simple question: <strong>What creates the deepest data silos in your organization?</strong></p><p>The responses clustered around two familiar culprits: &#8220;People &amp; Processes&#8221; and &#8220;AI Systems.&#8221; Both received the lion&#8217;s share of votes, and for good reason. 
These are the visible, tangible causes we discuss in boardrooms and document in governance frameworks.</p><p>But here&#8217;s what the conversation is missing and what might be the most dangerous oversight in modern AI governance:</p><p><strong>Redundant, Obsolete, and Trivial (ROT) data is the silent multiplier that transforms manageable silos into governance nightmares.</strong></p><p>This isn&#8217;t just about storage costs or cleanup projects. This is about a fundamental crisis in how we think about data quality, lineage, and trustworthiness in AI systems.</p><h2>The Triple Threat: How Silos Actually Form</h2><p>Before we dive into ROT, let&#8217;s establish the foundation. Data silos in AI governance emerge from three interconnected sources:</p><h3>1. People &amp; Processes: The Distributed Truth Problem</h3><p>Teams create silos organically as they solve problems independently. Marketing builds their customer segmentation model using their version of customer data. Sales forecasts revenue using their CRM export. Product analytics trains churn prediction on their event stream.</p><p>Each team believes (genuinely believes) that their copy represents the source of truth.</p><p>The result? You don&#8217;t have one customer dataset. You have twelve. Each slightly different. Each with different update frequencies. Each with different data quality rules. And when you try to reconcile them, you discover:</p><ul><li><p>Customer #47392 exists in six systems with four different email addresses</p></li><li><p>Last purchase date varies by up to three weeks depending on which system you query</p></li><li><p>Lifetime value calculations differ by 40% across departments</p></li></ul><p>This isn&#8217;t malice. This is organizational entropy.</p><h3>2. 
AI Systems: The Technical Fragmentation</h3><p>Even when teams want to collaborate, AI systems actively work against them.</p><p>Modern ML pipelines fragment data across stages:</p><ul><li><p>Raw data in data lakes</p></li><li><p>Cleaned data in feature stores</p></li><li><p>Training datasets in experiment tracking systems</p></li><li><p>Validation sets in model registries</p></li><li><p>Production inference logs in monitoring platforms</p></li></ul><p>Each system speaks a different schema language. Each has different versioning conventions. Some track lineage; others don&#8217;t. The data scientist who trained your production model six months ago has since left the company, and nobody can definitively say which exact dataset version was used.</p><p>When model drift appears, you&#8217;re left investigating shadows.</p><h3>3. ROT Data: The Exponential Multiplier</h3><p>Now here&#8217;s where it gets truly problematic.</p><p>Both people-driven silos and system-driven fragmentation are difficult problems. But they&#8217;re manageable if you have clean, current, traceable data.</p><p><strong>ROT data makes both problems exponentially worse.</strong></p><h2>Understanding the ROT Phenomenon</h2><p>Let me break down what ROT actually means in an AI governance context:</p><p><strong>Redundant Data:</strong> Multiple copies of the same information across systems, teams, and time periods. Not just duplicates in the traditional sense, but semantically identical data stored in different formats, granularities, and contexts.</p><p><strong>Obsolete Data:</strong> Information that was once valuable but is no longer current, accurate, or relevant. This includes deprecated customer segments, outdated product taxonomies, superseded regulatory classifications, and retired feature definitions.</p><p><strong>Trivial Data:</strong> Low-value noise that clutters systems without contributing meaningful signal. Debug logs that never get analyzed. 
Test datasets that outlived their experiments. Temporary tables that became permanent fixtures.</p><p>The insidious thing about ROT? <strong>It compounds.</strong></p><p>Redundant data creates more opportunities for obsolescence. Obsolete data gets duplicated by automated processes. Trivial data accumulates in the gaps between systems. And before you know it, 60-80% of your AI governance surface area consists of data that actively undermines your objectives.</p><h2>Introducing Context ROT: When Insights Lose Meaning</h2><p>This brings us to what I believe is the core crisis: <strong>Context ROT.</strong></p><p>Context ROT occurs when the data supporting your insights, decisions, and governance metrics becomes so stale, duplicated, or untraceable that the insights themselves lose meaning.</p><p>Let me illustrate with a real scenario I encountered recently:</p><h3>Case Study: The Phantom Drift Incident</h3><p>A financial services company was monitoring their credit risk model using a sophisticated drift detection system. One Tuesday morning, the dashboard lit up red: significant feature drift detected in the &#8220;employment_sector&#8221; variable.</p><p>The model risk team investigated. They pulled the training distribution and compared it to the production distribution. Sure enough, massive shift. The &#8220;retail&#8221; sector had dropped from 18% to 3% of applications.</p><p>Alarm bells rang. Had the economy shifted that dramatically? Was this a data quality issue? Should they retrain the model?</p><p>After three days of investigation, they discovered the truth:</p><p>The &#8220;employment_sector&#8221; taxonomy had been updated 14 months earlier. The new taxonomy split &#8220;retail&#8221; into &#8220;retail_essential&#8221; and &#8220;retail_discretionary.&#8221; The production system was using the new taxonomy. The model monitoring system was comparing against training data that used the old taxonomy.</p><p>There was no drift. 
There was no model degradation. There was <strong>Context ROT.</strong></p><p>The governance dashboard was showing a critical alert based on data that no longer matched reality. The investigation consumed dozens of hours. Trust in the monitoring system eroded. And this wasn&#8217;t an isolated incident. It was one of dozens of similar issues.</p><h2>How ROT Undermines Core AI Governance Principles</h2><p>The impact of ROT extends far beyond false alarms. It systematically degrades every pillar of responsible AI:</p><h3>1. Model Trustworthiness</h3><p><strong>Question:</strong> Can you trust predictions built on stale data?</p><p>If your training dataset includes customer segments that were deprecated 18 months ago, and your model learned patterns from those outdated segments, what exactly is your model predicting?</p><p>When 40% of your training data is redundant (multiple copies of the same transactions from different source systems), you&#8217;re not training on a representative sample. You&#8217;re training on a distorted one where certain patterns are artificially amplified.</p><h3>2. Explainability</h3><p><strong>Question:</strong> How do you explain decisions when you can&#8217;t verify data lineage?</p><p>A regulator asks: &#8220;Why was this loan denied?&#8221;</p><p>Your model says: &#8220;Primary factor was debt-to-income ratio of 0.87&#8221;</p><p>The regulator asks: &#8220;Where did that ratio come from?&#8221;</p><p>You investigate and find three different debt-to-income calculations in your pipeline:</p><ul><li><p>One from the original application</p></li><li><p>One from a credit bureau</p></li><li><p>One from an internal risk model</p></li></ul><p>All three are stored. All three are slightly different. Which one did the model actually use? The lineage is broken because of redundant data storage without proper version control.</p><p><strong>You cannot explain a decision you cannot trace.</strong></p><h3>3. 
Risk Metrics and Compliance</h3><p><strong>Question:</strong> Are your compliance dashboards showing reality or illusion?</p><p>Your model fairness dashboard reports that your approval rates by demographic group are within acceptable bounds. Excellent.</p><p>Except the demographic data feeding that dashboard is 22 months old for 35% of customers. You&#8217;re measuring fairness against obsolete information. Your compliance metric is technically accurate but substantively meaningless.</p><p>This is Context ROT in its purest form.</p><h3>4. Audit Readiness</h3><p><strong>Question:</strong> Can you prove which data version was used for which decision?</p><p>An auditor requests evidence that a specific model decision from six months ago complied with regulations in effect at that time.</p><p>You need to show:</p><ul><li><p>The exact input data used</p></li><li><p>The model version deployed</p></li><li><p>The feature transformations applied</p></li><li><p>The regulatory rules in effect</p></li></ul><p>But your data lake contains seventeen versions of the customer profile table from that time period. Some are incremental snapshots. Some are full loads. Some have conflicting timestamps. Which one was actually used?</p><p>Without definitive answers, you&#8217;re exposed to regulatory risk not because you did anything wrong, but because you can&#8217;t prove you did it right.</p><h2>The Compounding Effect: Why ROT Makes Silos Worse</h2><p>Here&#8217;s the crucial insight that connects everything:</p><p><strong>ROT doesn&#8217;t just coexist with silos. It actively amplifies them.</strong></p><p>When Marketing&#8217;s customer dataset develops ROT, they don&#8217;t clean it up. They create a new, &#8220;cleaner&#8221; copy. Now you have two siloed datasets instead of one.</p><p>When Sales discovers obsolete account records, they don&#8217;t remove them. They add a flag to indicate &#8220;legacy&#8221; status. 
The obsolete data persists, confusing every downstream system.</p><p>When Product&#8217;s feature engineering pipeline generates trivial intermediate tables, those tables get backed up, replicated, and referenced by other teams who assume they&#8217;re important. The trivial becomes permanent.</p><p>Each silo becomes a breeding ground for ROT. Each ROT dataset spawns new silos. The cycle accelerates.</p><p>And the organization&#8217;s ability to govern AI degrades proportionally.</p><h2>Breaking the Cycle: A Framework for ROT Remediation</h2><p>Eliminating ROT isn&#8217;t a weekend cleanup project. It&#8217;s a fundamental shift in how organizations think about data as a governance asset.</p><p>Here&#8217;s a framework I&#8217;ve developed for organizations serious about addressing this:</p><h3>Phase 1: Discovery and Assessment</h3><p><strong>Map the Shadow Data Landscape</strong></p><p>Most organizations have no idea how much ROT exists in their systems. Start with:</p><ul><li><p><strong>Data lineage mapping:</strong> Where does each dataset come from? Where does it go? What transformations occur?</p></li><li><p><strong>Replication audit:</strong> Identify all copies of semantically similar data. Don&#8217;t just look for exact duplicates &#8212; look for different representations of the same business entity.</p></li><li><p><strong>Staleness analysis:</strong> When was each dataset last updated? When was it last <em>meaningfully</em> used (not just queried, but used in a decision)?</p></li><li><p><strong>Value assessment:</strong> What decisions depend on this data? If it disappeared tomorrow, what would break?</p></li></ul><p>This discovery phase often reveals shocking truths. I&#8217;ve worked with organizations where 70% of their &#8220;critical&#8221; data assets hadn&#8217;t been used in production for over a year.</p><h3>Phase 2: Categorization and Prioritization</h3><p>Not all ROT is created equal. 
Prioritize based on:</p><p><strong>High-Risk ROT:</strong></p><ul><li><p>Obsolete data still feeding production models</p></li><li><p>Redundant data creating inconsistent governance metrics</p></li><li><p>Trivial data consuming significant storage or processing resources</p></li></ul><p><strong>Medium-Risk ROT:</strong></p><ul><li><p>Deprecated datasets with unclear retirement dates</p></li><li><p>Duplicate data with conflicting quality rules</p></li><li><p>Low-value data in critical pipelines</p></li></ul><p><strong>Low-Risk ROT:</strong></p><ul><li><p>Archived data properly isolated from production</p></li><li><p>Development/test data clearly labeled</p></li><li><p>Historical data with valid retention justification</p></li></ul><p>Focus remediation efforts on high-risk ROT first.</p><h3>Phase 3: Implementation of Hygiene Infrastructure</h3><p>This is where most organizations fail. They treat ROT as a cleanup project rather than a continuous governance function.</p><p>Sustainable ROT remediation requires infrastructure:</p><p><strong>1. Automated Data Lifecycle Policies</strong></p><p>Define retention schedules based on data category and regulatory requirements. Implement automated archival and deletion workflows. Make the default state &#8220;expiring&#8221; rather than &#8220;permanent.&#8221;</p><p><strong>2. Version Control for Datasets</strong></p><p>Treat datasets like code. Every version should be tagged, traceable, and tied to specific model deployments. When you deprecate a dataset, the deprecation should be tracked as explicitly as the creation.</p><p><strong>3. Canonical Source Designation</strong></p><p>For every business entity (customer, product, transaction), designate one canonical source of truth. All other representations should be clearly marked as derived views with explicit lineage back to the canonical source.</p><p><strong>4. 
Duplication Prevention Controls</strong></p><p>Before creating a new dataset, require teams to search for existing similar datasets. Implement approval workflows for new data stores. Make it harder to duplicate than to reuse.</p><p><strong>5. Quality Gates in Data Pipelines</strong></p><p>Implement automated checks that flag:</p><ul><li><p>Data older than defined freshness thresholds</p></li><li><p>Duplicates across systems</p></li><li><p>Orphaned datasets with no downstream consumers</p></li><li><p>Schemas that don&#8217;t match registered standards</p></li></ul><h3>Phase 4: Cultural Transformation</h3><p>Technology alone won&#8217;t solve this. You need organizational buy-in.</p><p><strong>Data Ownership Accountability</strong></p><p>Assign explicit ownership to every dataset. The owner is accountable for:</p><ul><li><p>Maintaining data quality</p></li><li><p>Preventing unauthorized duplication</p></li><li><p>Executing timely deprecation</p></li><li><p>Documenting lineage and dependencies</p></li></ul><p><strong>Deprecation Workflows</strong></p><p>Make it socially acceptable &#8212; even celebrated &#8212; to sunset old data. Create clear processes for:</p><ul><li><p>Announcing deprecation timelines</p></li><li><p>Migrating downstream dependencies</p></li><li><p>Archiving for compliance</p></li><li><p>Verifying complete removal</p></li></ul><p><strong>Governance Metrics That Matter</strong></p><p>Track and report:</p><ul><li><p>Percentage of production models using deprecated data</p></li><li><p>Time-to-remediation for identified ROT</p></li><li><p>Storage costs attributed to trivial data</p></li><li><p>Lineage completeness scores</p></li></ul><p>Make ROT reduction a KPI for data teams, not just a best practice.</p><h2>The Strategic Imperative: ROT as Infrastructure</h2><p>Here&#8217;s my central thesis:</p><p><strong>Data hygiene is not operational maintenance. 
Data hygiene is governance infrastructure.</strong></p><p>You wouldn&#8217;t build a critical application without version control, testing, and deployment pipelines. You wouldn&#8217;t run production systems without monitoring, logging, and incident response.</p><p>Why would you build AI governance on data infrastructure that lacks basic hygiene controls?</p><p>Organizations that will lead in AI governance over the next decade aren&#8217;t the ones with perfect processes from day one. They&#8217;re the ones who recognize that:</p><ul><li><p><strong>ROT is inevitable</strong> &#8212; entropy always increases without active intervention</p></li><li><p><strong>ROT is measurable</strong> &#8212; you can quantify redundancy, obsolescence, and triviality</p></li><li><p><strong>ROT is addressable</strong> &#8212; with the right infrastructure and incentives</p></li><li><p><strong>ROT reduction is continuous</strong> &#8212; not a project, but a practice</p></li></ul><h2>Looking Forward: The Governance Advantage</h2><p>There&#8217;s a competitive dimension to this that&#8217;s worth acknowledging.</p><p>As AI regulation intensifies globally, from the EU AI Act to emerging US frameworks to sector-specific requirements, the ability to demonstrate clean data lineage, explainable decisions, and trustworthy metrics will become a significant advantage.</p><p>Organizations still struggling with ROT will find compliance prohibitively expensive. Audits will take months instead of weeks. Regulatory inquiries will expose fragility.</p><p>Organizations that have invested in data hygiene as governance infrastructure will move faster, prove compliance easier, and earn greater trust from stakeholders.</p><p><strong>The time to address ROT isn&#8217;t when the auditor asks for evidence. It&#8217;s now.</strong></p><h2>Conclusion: The Residue That Keeps Silos Alive</h2><p>Data silos don&#8217;t form from a single root cause. 
They compound across people, processes, and technology, creating an interconnected web of fragmentation.</p><p>And ROT is the residue that keeps them alive.</p><p>It&#8217;s the organizational plaque that hardens over time, making every governance initiative harder, every audit more painful, and every AI deployment riskier.</p><p>But unlike some governance challenges, ROT is solvable. It requires commitment, infrastructure, and cultural change, but it&#8217;s fundamentally tractable.</p><p>The question is whether your organization will treat it as the critical governance issue it is, or continue relegating it to the backlog as &#8220;technical debt we&#8217;ll address someday.&#8221;</p><p>Because in AI governance, Context ROT isn&#8217;t just a data quality problem.</p><p><strong>It&#8217;s a trust crisis waiting to happen.</strong></p><p></p><p><em>What&#8217;s your experience with ROT in AI systems? Have you seen governance metrics distorted by stale, redundant, or trivial data? I&#8217;d love to hear your stories and strategies in the comments.</em></p>]]></content:encoded></item><item><title><![CDATA[The Silent Architecture of Trust in AI]]></title><description><![CDATA[Data Quality Isn't an IT Problem. 
It's Everything]]></description><link>https://ainstein.sanjeevaniai.com/p/the-silent-architecture-of-trust</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/the-silent-architecture-of-trust</guid><pubDate>Tue, 16 Dec 2025 15:01:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!wCpt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53370699-a2ed-4dd7-9a1f-7a315acce7ec_726x481.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!wCpt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53370699-a2ed-4dd7-9a1f-7a315acce7ec_726x481.jpeg" width="726" height="481" alt=""></figure></div><p>S called me again after last week&#8217;s article.</p><p>&#8220;Okay, I get it now. AI tunes itself to patterns in data. It doesn&#8217;t understand. It just finds the frequency that works. But what happens when the data is... wrong?&#8221;</p><p>That&#8217;s the question that keeps me up at night.</p><h3>The Hiring Tool That Worked Perfectly</h3><p>A company deployed an AI hiring tool. Fast, efficient, everyone impressed.</p><p>Then someone asked: &#8220;Why did it reject this candidate?&#8221;</p><p>They checked the system. The algorithm worked.
The metrics looked accurate.</p><p>But when they traced backwards, they found the real issue: training data from 2019 that excluded entire demographics.</p><p>The AI didn&#8217;t fail. It tuned itself exactly to what it was given. It found a frequency, and that frequency was biased.</p><p>This is what happens when the signal is wrong.</p><h2>Data Is Evidence, Not Truth</h2><p>Here&#8217;s the uncomfortable reality most teams don&#8217;t want to hear: data is not truth. It&#8217;s evidence. And evidence carries assumptions, blind spots, and biases.</p><p>Every row in a dataset represents a choice someone made about what to record and how. Those choices shape what the AI learns.</p><p>In AI systems, data plays two roles:</p><ul><li><p>It teaches the model how to see the world</p></li><li><p>It judges the model&#8217;s decisions later</p></li></ul><p>When both the teacher and the judge are flawed, the system has no way to know it&#8217;s wrong.</p><p>This is why governance must begin before the first line of code. Before the model. Before the architecture. At the data.</p><h2>The Anatomy of Bad Data</h2><p>Most bad data isn&#8217;t intentional. It&#8217;s small, everyday errors that quietly corrupt AI systems.</p><p><strong>Inaccurate Data</strong> A salary entered as $500,000 instead of $50,000. One extra zero. But now every calculation downstream is wrong.</p><p><strong>Incomplete Data</strong> A patient&#8217;s record missing age or medical history. The model can&#8217;t learn accurate patterns from what isn&#8217;t there.</p><p><strong>Inconsistent Data</strong> &#8220;USA&#8221;, &#8220;U.S.&#8221;, and &#8220;United States&#8221; treated as three separate values. The system sees three different countries.</p><p><strong>Outdated Data</strong> Using 2018 customer trends to predict 2025 behavior. The world changed. The model didn&#8217;t.</p><p><strong>Biased Data</strong> A hiring dataset with mostly male candidates. The AI learns to favor what it saw most.
Not because anyone told it to, but because that&#8217;s what the data showed.</p><p>Bad data isn&#8217;t just incorrect. It&#8217;s unverified assumptions treated as fact.</p><h3>The Domino Effect</h3><p>Remember the FM tuning analogy? The system adjusts based on every example it sees.</p><p>Now imagine what happens when flawed data enters the pipeline:</p><ol><li><p>Feature engineering amplifies wrong signals</p></li><li><p>Model tunes to flawed patterns</p></li><li><p>Testing passes because the test data has the same flaws</p></li><li><p>Deployed model reinforces errors in production</p></li><li><p>Retraining locks the bias in as &#8220;truth&#8221;</p></li></ol><p>Each step makes it harder to find the original problem. By the time you notice something&#8217;s wrong, the error has multiplied through the entire system.</p><p>This isn&#8217;t sabotage. It&#8217;s systemic inertia. And it happens more often than anyone wants to admit.</p><h3>The Real Cost</h3><p>McKinsey reports bad data costs enterprises 15&#8211;20% of revenue.</p><p>But the real cost isn&#8217;t money. It&#8217;s justice.</p><p>Predictive policing systems that target the wrong neighborhoods&#8212;because historical arrest data reflected biased enforcement, not actual crime.</p><p>Loan models that reject qualified applicants because missing fields correlated with demographics the system learned to penalize.</p><p>Healthcare algorithms that recommend less care for certain patients because cost data was used as a proxy for health needs.</p><p>These weren&#8217;t evil intentions. They were patterns in data, learned at scale, deployed without adequate governance.</p><p>AI governance isn&#8217;t slowing innovation. It&#8217;s protecting people from systems that don&#8217;t know they&#8217;re causing harm.</p><h3>Governance: The Immune System</h3><p>Think of governance as your AI&#8217;s immune system.
It detects, prevents, and corrects damage before it reaches people.</p><p>Several frameworks guide this:</p><ul><li><p><strong>ISO/IEC 5259</strong> &#8212; Data quality standards</p></li><li><p><strong>ISO/IEC 42001</strong> &#8212; AI Management Systems</p></li><li><p><strong>NIST AI RMF</strong> &#8212; Govern, Map, Measure, Manage</p></li></ul><p>At its core, governance answers one question: &#8220;What data caused this decision, and who is responsible?&#8221;</p><p>If you can&#8217;t answer that, you don&#8217;t have governance. You have hope.</p><h2>Data as Ethical Material</h2><p>This might be the most important shift in thinking: data isn&#8217;t just technical material. It&#8217;s ethical material.</p><p>Every row in your dataset represents a human life: their job application, their loan request, their medical history, their interactions with your product.</p><p>When you train an AI on that data, you&#8217;re encoding decisions that will affect people like them. At scale. Automatically. Without human review.</p><p>Clean data isn&#8217;t a feature. 
It&#8217;s a responsibility.</p><h3>The Loop That Never Ends</h3><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!PGhx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52871097-2016-40ec-b4c8-823c0a465cc6_740x688.png" width="370" height="344" alt=""></figure></div><p>Data &#8594; Model &#8594; Decision &#8594; Feedback &#8594; Retraining &#8594; Data</p><p>This loop runs continuously in every AI system. New data comes in. Models update. Decisions change.</p><p>Governance keeps this loop honest. It asks:</p><ul><li><p>Where did this data come from?</p></li><li><p>What assumptions does it carry?</p></li><li><p>Who does it represent and who does it miss?</p></li><li><p>What decisions will it drive?</p></li><li><p>Who&#8217;s accountable when something goes wrong?</p></li></ul><p>Without governance, the loop runs blind.
And blind systems cause harm they can&#8217;t see.</p><h2>Key Takeaways</h2><p><strong>For Builders:</strong></p><ul><li><p>Audit data before you architect models</p></li><li><p>Track data lineage like you track code</p></li><li><p>Build feedback loops for bias and drift</p></li></ul><p><strong>For Leaders:</strong></p><ul><li><p>Data quality is a governance issue, not an IT task</p></li><li><p>Invest in data stewardship roles</p></li><li><p>Ask &#8220;Where did this data come from?&#8221; before &#8220;What can this model do?&#8221;</p></li></ul><p><strong>For Everyone:</strong></p><ul><li><p>The AI systems affecting your life are only as good as their training data</p></li><li><p>When something feels wrong, it might be a data problem nobody caught</p></li><li><p>Demand transparency, not just explanations</p></li></ul><h3>A Question to Sit With</h3><p>If data is the DNA of AI, what kind of organism are we creating?</p><h4>Next Week</h4><p>We&#8217;ve covered what AI is, how it learns, and why data is the foundation of everything.</p><p>Next week, we go deeper: specific case studies of AI failures that made headlines. Amazon&#8217;s hiring AI. Healthcare algorithms. Credit systems. 
What went wrong, why nobody caught it, and what governance would have changed.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://ainstein.sanjeevaniai.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption"><strong>Subscribe to A.I.N.S.T.E.I.N.</strong> so you don&#8217;t miss it.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p><em>Suneeta Modekurty</em> <br><em>Founder, A.I.N.S.T.E.I.N.</em></p>]]></content:encoded></item><item><title><![CDATA[🚨 FLASH BRIEFING: Trump's AI Executive Order]]></title><description><![CDATA[Your risk exposure just went UP, not down. 
Here's what to do.]]></description><link>https://ainstein.sanjeevaniai.com/p/flash-briefing-trumps-ai-executive</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/flash-briefing-trumps-ai-executive</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Fri, 12 Dec 2025 18:34:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!hPun!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F082eef3b-7e3a-43c3-ac3d-1953c1e4c270_1024x1024.png" length="0" type="image/png"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!hPun!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F082eef3b-7e3a-43c3-ac3d-1953c1e4c270_1024x1024.png" width="1024" height="1024" alt=""></figure></div><h3>The Headlines vs. The Reality</h3><p><strong>What happened:</strong> President Trump signed an executive order creating an &#8220;AI Litigation Task Force&#8221; to challenge state-level AI regulations through lawsuits and funding cuts.</p><p><strong>What the headlines say:</strong> &#8220;Federal government takes over AI oversight&#8221;</p><p><strong>What&#8217;s actually happening:</strong> We just entered a period of &#8216;maximum regulatory uncertainty&#8217;, and your risk exposure probably increased, not decreased.</p><h3>Three Things Most Analysis Is Missing</h3><h5>1.
The EU AI Act Doesn&#8217;t Care About US Politics</h5><p>August 2026 is still coming. If you sell to European customers, process EU citizen data, or have European operations, nothing changed for you today.</p><p>In fact, companies betting on US deregulation while ignoring EU requirements are setting themselves up for a painful collision with &#8364;35 million fines.</p><h5>2. State Laws Don&#8217;t Disappear Overnight</h5><p>The executive order creates a process for challenging state laws, not an immediate preemption. Colorado&#8217;s SB 205, California&#8217;s CPRA AI provisions, Illinois&#8217; BIPA, NYC&#8217;s Local Law 144&#8230; these remain enforceable until successfully challenged in court.</p><p>Timeline reality check:</p><ul><li><p>Commerce Department evaluation: 90 days minimum</p></li><li><p>Litigation process: 12&#8211;36 months per state</p></li><li><p>Appeals: add another 12&#8211;24 months</p></li></ul><p>Translation: You&#8217;re operating under existing state laws for at least 2&#8211;3 more years in many jurisdictions.</p><h5>3. Uncertainty Is the Real Risk</h5><p>The worst position?
Neither fully compliant with current state laws nor prepared for whatever federal framework emerges.</p><p>Companies now face:</p><ul><li><p>Enforcement risk from states racing to act before preemption</p></li><li><p>Litigation risk from the new federal Task Force</p></li><li><p>Reputational risk from policy whiplash</p></li><li><p>International risk from diverging US/EU approaches</p></li></ul><h4>What Smart Companies Are Doing Right Now</h4><p><strong>This week:</strong></p><ul><li><p>Audit current state-law exposure (which laws apply to your AI systems today?)</p></li><li><p>Document your compliance posture (if enforcement accelerates, you want evidence of good faith)</p></li><li><p>Brief leadership on the transition period risks</p></li></ul><p><strong>This month:</strong></p><ul><li><p>Prioritize EU AI Act readiness (the clearest, most enforceable framework)</p></li><li><p>Model financial exposure under multiple regulatory scenarios</p></li><li><p>Identify which compliance investments remain valuable regardless of US policy direction</p></li></ul><p><strong>This quarter:</strong></p><ul><li><p>Build governance infrastructure that transcends any single framework</p></li><li><p>Establish audit trails and documentation practices</p></li><li><p>Develop board-ready risk quantification</p></li></ul><p><strong>Need help auditing your current exposure?</strong> Reply to this email with &#8220;AUDIT&#8221; and I&#8217;ll send you our AI Governance Quick Assessment checklist.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://ainstein.sanjeevaniai.com/p/flash-briefing-trumps-ai-executive?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://ainstein.sanjeevaniai.com/p/flash-briefing-trumps-ai-executive?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h3>The Quantified View</h3><p>Here&#8217;s what our risk models show for a mid-sized company
with AI systems operating across US and EU markets:</p><table><thead><tr><th>Scenario</th><th>12-Month Risk Exposure</th></tr></thead><tbody><tr><td>Pre-executive order baseline</td><td>$1.2M&#8211;$3.8M</td></tr><tr><td>Transition period (current)</td><td>$1.8M&#8211;$5.2M</td></tr><tr><td>Full federal preemption (2027+)</td><td>$0.9M&#8211;$2.1M</td></tr><tr><td>EU-only compliance focus</td><td>$2.4M&#8211;$7.1M</td></tr></tbody></table><p>The counterintuitive finding: Risk exposure <em>increases</em> during the transition period due to enforcement uncertainty, potential state &#8220;enforcement sprints,&#8221; and the complexity of tracking evolving requirements.</p><p>Want to see your organization&#8217;s specific risk numbers?</p><p><a href="https://calendar.google.com/calendar/u/0/appointments/schedules/AcZssZ06VxOY4gLq9NgwfsWOl1RQsuKsE6Ts17tzGuz7uRfa_Ztcislf51HDTQ8tGCEcGNneQZ3h0C-m">Request a METRIS Assessment &#8594;</a></p><h3>The Bottom Line</h3><p>This executive order is a gift to companies who&#8217;ve been dragging their feet on governance, if they use the transition period wisely.</p><p>It&#8217;s a warning to companies who interpret &#8220;deregulation&#8221; as &#8220;do nothing.&#8221;</p><p>And it&#8217;s irrelevant to your EU AI Act obligations.</p><p>The winners in 2026 won&#8217;t be the companies that guessed correctly about US policy direction.
They&#8217;ll be the ones who built governance infrastructure robust enough to adapt to whatever comes.</p><h3>What We&#8217;re Watching</h3><ul><li><p>Commerce Department&#8217;s 90-day evaluation (due mid-March 2026)</p></li><li><p>State attorney general responses (expect California and New York to push back hard)</p></li><li><p>EU reaction to US regulatory divergence</p></li><li><p>Insurance and investor community signals</p></li></ul><p>We&#8217;ll send updates as these develop.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://ainstein.sanjeevaniai.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://ainstein.sanjeevaniai.com/subscribe?"><span>Subscribe now</span></a></p><h4>About This Brief</h4><p>I&#8217;m Suneeta Modekurty, Founder of SANJEEVANI AI and creator of METRIS, a quantitative AI governance platform that transforms compliance assessments into financial risk projections. When regulations shift, our models help you understand what it actually means for your bottom line.</p><p>Three ways to stay ahead:</p><ol><li><p><strong>Follow me on <a href="https://www.linkedin.com/in/smodekurty">LinkedIn</a></strong> for real-time analysis as this evolves</p></li><li><p><strong>Download the technical paper</strong> on Zenodo (DOI: 10.5281/zenodo.17850617)</p></li><li><p><strong>Reply with your biggest compliance question</strong>; I answer every email</p></li></ol><p>&#169; 2025 SANJEEVANI AI LLC | <a href="https://sanjeevaniai.com">sanjeevaniai.com</a></p><p>You received this special edition because you&#8217;re on our AI Governance Intelligence list.
Regular editions continue as usual, every Tuesday at 9:00 am CST.</p>]]></content:encoded></item><item><title><![CDATA[How AI Actually Learns]]></title><description><![CDATA[And Why It Means Data Is Everything]]></description><link>https://ainstein.sanjeevaniai.com/p/how-ai-actually-learns</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/how-ai-actually-learns</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Tue, 09 Dec 2025 15:01:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!VYmZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a2c11aa-3339-4928-8a5d-a5cd73df92d7_1130x1358.png" length="0" type="image/png"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!VYmZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a2c11aa-3339-4928-8a5d-a5cd73df92d7_1130x1358.png" width="1130" height="1358" alt=""></figure></div><p>My friend (S) called me after reading last week&#8217;s article.</p><p>&#8220;Okay, I get that AI learns from examples
instead of following rules. But <em>how</em> does that actually work? How does showing a computer millions of pictures teach it anything?&#8221;</p><p>I remember asking the same question years ago. I had a statistics background, understood regression, could build models. But the idea that a system could learn patterns without explicit programming took time to sink in.</p><p>Let me explain it the way I wish someone had explained it to me.</p><h3>The Training Loop</h3><p>Training an AI system is simpler than most people think.</p><p>You show it an example. It makes a guess. You tell it if the guess was right or wrong. It adjusts slightly. You repeat this millions of times.</p><p>That&#8217;s it. Show, guess, adjust, repeat.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mJjl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F511d4da8-dfcf-4eed-9a4c-d791f2bd04f2_1102x1006.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mJjl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F511d4da8-dfcf-4eed-9a4c-d791f2bd04f2_1102x1006.png 424w, https://substackcdn.com/image/fetch/$s_!mJjl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F511d4da8-dfcf-4eed-9a4c-d791f2bd04f2_1102x1006.png 848w, https://substackcdn.com/image/fetch/$s_!mJjl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F511d4da8-dfcf-4eed-9a4c-d791f2bd04f2_1102x1006.png 1272w, 
https://substackcdn.com/image/fetch/$s_!mJjl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F511d4da8-dfcf-4eed-9a4c-d791f2bd04f2_1102x1006.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mJjl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F511d4da8-dfcf-4eed-9a4c-d791f2bd04f2_1102x1006.png" width="338" height="308.5553539019964" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/511d4da8-dfcf-4eed-9a4c-d791f2bd04f2_1102x1006.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1006,&quot;width&quot;:1102,&quot;resizeWidth&quot;:338,&quot;bytes&quot;:180353,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/180301086?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F511d4da8-dfcf-4eed-9a4c-d791f2bd04f2_1102x1006.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mJjl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F511d4da8-dfcf-4eed-9a4c-d791f2bd04f2_1102x1006.png 424w, https://substackcdn.com/image/fetch/$s_!mJjl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F511d4da8-dfcf-4eed-9a4c-d791f2bd04f2_1102x1006.png 848w, 
https://substackcdn.com/image/fetch/$s_!mJjl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F511d4da8-dfcf-4eed-9a4c-d791f2bd04f2_1102x1006.png 1272w, https://substackcdn.com/image/fetch/$s_!mJjl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F511d4da8-dfcf-4eed-9a4c-d791f2bd04f2_1102x1006.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Take spam detection. You show the system an email. It guesses the email is spam or not spam. You tell it the correct answer. 
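</p><p>The whole loop fits in a few lines of code. Here is a deliberately tiny, hypothetical sketch, not any real spam filter: each "email" is just two made-up keyword counts, and the system shows, guesses, compares against the label, and adjusts slightly, over and over.</p>

```python
# Toy "emails": each is (features, label). The features are invented counts:
# [times "free" appears, times "invoice" appears]. Label 1 = spam, 0 = not spam.
examples = [([3, 0], 1), ([2, 0], 1), ([0, 2], 0),
            ([0, 1], 0), ([4, 1], 1), ([1, 3], 0)]

weights = [0.0, 0.0]  # the system starts "in the static": no pattern yet
bias = 0.0
lr = 0.1              # how far each adjustment turns the dial

def guess(features):
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else 0

# Show, guess, adjust, repeat: many passes over the examples.
for _ in range(100):
    for features, label in examples:
        error = label - guess(features)  # compare the guess to the right answer
        if error != 0:                   # wrong? nudge the dial slightly
            for i, f in enumerate(features):
                weights[i] += lr * error * f
            bias += lr * error

# After tuning, every training example is classified correctly.
print([guess(f) for f, _ in examples])  # -> [1, 1, 0, 0, 1, 0]
```

<p>Nobody wrote a spam rule here. The weights start at zero, and each wrong guess turns the dial a little until the signal comes through.</p><p>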
If it guessed wrong, you fine-tune it like tuning an FM radio to get the station you want. You&#8217;re adjusting the frequency until the signal comes through clear.</p><p>Each training example is another turn of the dial. A million examples means a million tiny adjustments. Eventually, the system locks onto the right frequency and starts recognizing patterns accurately.</p><p>Engineers call this &#8220;<strong>supervised learning</strong>.&#8221; I call it <strong>industrial-scale pattern tuning</strong>.</p><h3>How to Find the Right Frequency</h3><p>Think of it like this.</p><p>When you tune a radio, you&#8217;re searching through noise to find a clear signal. Too far left, static. Too far right, static. You keep adjusting until the music comes through.</p><p>AI training works the same way. The system starts in noise: random guesses, no real pattern recognition. Each training example helps it tune toward the signal. &#8220;Wrong answer? Adjust left. Still wrong? Adjust right. Closer. Closer. There we go.&#8221;</p><p>After millions of these micro-adjustments, the system finds the frequency where patterns become clear. It can now distinguish spam from legitimate email, fraud from normal transactions, disease from healthy tissue.</p><p>The thing is, nobody programs these frequencies directly. The system finds them through repetition. Show enough examples, give enough feedback, and it tunes itself.</p><p>This is powerful. It&#8217;s also why we can&#8217;t always explain exactly what the system learned. It found a frequency that works, but describing that frequency in human terms is harder than it sounds. And it takes an enormous amount of data to get there.</p><h2>Why It Needs So Much Data</h2><p>S asked the follow-up: &#8220;Why millions of examples? Can&#8217;t it learn faster?&#8221;</p><p>The world is messier than training sets assume.</p><p>Most dogs are easy to recognize. 
But what about a chihuahua that looks like a large rat? A mop that resembles a sheepdog? A wolf that could pass for a husky? A blurry photo taken at night?</p><p>If your system only saw a hundred dogs during training, it never encountered these edge cases. It tuned itself to recognize &#8220;typical&#8221; dogs, and it goes static on anything unusual.</p><blockquote><p>Volume matters because <strong>edge cases</strong> matter. More data means more weird situations to tune against. Fewer surprises when the system hits the real world.</p></blockquote><p>You&#8217;ve experienced this yourself, even if you didn&#8217;t know why:</p><ul><li><p>Your phone&#8217;s autocorrect keeps &#8220;fixing&#8221; your name or a word you use often to something wrong. The system was tuned on data that didn&#8217;t include your name or your industry&#8217;s jargon.</p></li><li><p>You travel for work, use your credit card in a new city, and it gets blocked for &#8220;suspicious activity.&#8221; The fraud detection system wasn&#8217;t tuned for your edge case: a sudden location change by a legitimate user.</p></li><li><p>Your voice assistant works fine for your colleague but constantly mishears you. It was tuned on voices that sound like your colleague. Your accent, speech pattern, or tone wasn&#8217;t well-represented in the training data.</p></li></ul><p>These aren&#8217;t bugs. The systems are working exactly as tuned. They just weren&#8217;t tuned on enough edge cases to handle <em>you</em> in <em>that</em> situation.</p><p>I&#8217;ve seen this repeatedly. A system works perfectly in testing, then falls apart in production because the training data was too clean, too narrow, too optimistic about what reality actually looks like. The test environment was a quiet studio recording. 
Reality is a live concert with feedback and crowd noise.</p><p><em>Subscribe for one weekly, practical, jargon-free breakdown of how AI really works</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://ainstein.sanjeevaniai.com/subscribe&quot;,&quot;text&quot;:&quot;Subscribe for Weekly AI Insights&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://ainstein.sanjeevaniai.com/subscribe"><span>Subscribe for Weekly AI Insights</span></a></p><h3>Here&#8217;s the Secret</h3><p>AI doesn&#8217;t &#8220;understand&#8221; anything.</p><p>When a system recognizes a cat, it&#8217;s not thinking &#8220;whiskers, pointy ears, therefore cat.&#8221; It found a frequency where certain pixel patterns correlate with the label &#8220;cat.&#8221; The &#8216;<em>why&#8217;</em> doesn&#8217;t exist for the system. Only the correlation does.</p><p>This is both the power and the danger.</p><p><strong>The power:</strong> AI can tune into patterns humans would never detect. It can process more examples than any person could review in a lifetime. It operates continuously without fatigue.</p><p><strong>The danger:</strong> AI has no common sense. It doesn&#8217;t know that a drawing of a cat isn&#8217;t a real cat. It doesn&#8217;t understand context. It just matches patterns, including patterns that shouldn&#8217;t matter.</p><p>I&#8217;ve sat in meetings where executives said &#8220;our AI understands customer intent.&#8221; No. It&#8217;s <em>tuned</em> to predict customer intent based on historical patterns. That&#8217;s different. Understanding implies reasoning, context, judgment. AI doesn&#8217;t reason. 
It correlates at the frequency it was trained on.</p><h3>AI&#8217;s Strengths and Blind Spots</h3><p>After years working with AI, I&#8217;ve developed a rough rule.</p><p><strong>AI tunes well for:</strong></p><ul><li><p>Pattern recognition at scale (images, text, transactions)</p></li><li><p>Finding anomalies in large datasets</p></li><li><p>Predicting outcomes from historical data</p></li><li><p>Generating content similar to training examples</p></li></ul><p><strong>AI can&#8217;t tune for:</strong></p><ul><li><p>Reasoning about cause and effect</p></li><li><p>Common sense and context</p></li><li><p>Situations that never appeared in training</p></li><li><p>Explaining why it made a decision</p></li></ul><p>There&#8217;s an observation from robotics researcher Hans Moravec: what&#8217;s easy for humans is hard for AI, and vice versa.</p><p>A five-year-old can walk across a messy room, recognize grandma from any angle, and hold a conversation. These &#8220;simple&#8221; tasks are incredibly hard for AI and no amount of tuning gets you common sense.</p><p>But that same five-year-old can&#8217;t analyze a million transactions for fraud or review every medical study published in the last decade. AI tunes for that easily. Humans can&#8217;t.</p><p>AI is a tool with specific strengths. 
Expecting it to replace human judgment misunderstands what tuning actually achieves.</p><p><em>Join a growing community of leaders, founders, and practitioners learning how to build AI systems responsibly.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://ainstein.sanjeevaniai.com/subscribe&quot;,&quot;text&quot;:&quot;Join the Community&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://ainstein.sanjeevaniai.com/subscribe"><span>Join the Community</span></a></p><h2>What This Means for Governance</h2><p>Now, let&#8217;s connect the dots:</p><ul><li><p>AI learns by tuning itself to patterns in data</p></li><li><p>It doesn&#8217;t understand those patterns. It replicates them</p></li><li><p>Whatever patterns exist in your data become the frequency your AI broadcasts on</p></li></ul><p>What if those patterns include bias? What if the data is outdated? What if the data is incomplete, or unrepresentative of the people you&#8217;re serving?</p><p>The AI doesn&#8217;t know. It can&#8217;t know. It tunes to whatever you fed it and treats those patterns as the correct signal.</p><p>This is why AI governance doesn&#8217;t start with the model. It starts with data.</p><p>A perfectly tuned system trained on the wrong data broadcasts the wrong signal very confidently, at scale, with no awareness that anything is off. Does this mean the system is broken? No, the system isn&#8217;t broken. 
It&#8217;s tuned exactly to what it was given.</p><p><em>If you&#8217;re responsible for AI, Data, Risk, or Compliance, get my weekly governance insights.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://ainstein.sanjeevaniai.com/subscribe&quot;,&quot;text&quot;:&quot;Get AI Governance Insights&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://ainstein.sanjeevaniai.com/subscribe"><span>Get AI Governance Insights</span></a></p><h2>The Questions You Should Be Asking</h2><p>Before you trust any AI system, ask:</p><ul><li><p>What data was it tuned on?</p></li><li><p>How old is that data?</p></li><li><p>Who&#8217;s represented in the training set and who&#8217;s missing?</p></li><li><p>Has it been tested on data it never saw during training?</p></li><li><p>What happens when it encounters something completely new?</p></li><li><p>What are the known failure modes?</p></li></ul><p>These aren&#8217;t technical questions. They&#8217;re governance questions. And in my experience auditing systems across EdTech, HealthTech, FinTech, and Insurance, most organizations can&#8217;t answer them.</p><p>They deploy the system. They trust the vendor. They hope for the best.</p><blockquote><p>Hope is not a governance strategy.</p></blockquote><h2>Next Week</h2><p>Now that you understand how AI learns, and how it tunes itself to patterns in data, we&#8217;re ready to look at what happens when that data is wrong.</p><p>Next week: <strong>The Silent Architecture of Trust</strong>, on why data quality isn&#8217;t an IT problem but the foundation of whether your AI works at all. We&#8217;ll look at what bad data actually looks like, how small errors compound into system-wide failures, and what governance means in practice.</p><p>Your AI is only as good as the signal it was tuned on. 
Next week, we examine that signal.</p><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://ainstein.sanjeevaniai.com/p/how-ai-actually-learns?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Knowledge spreads trust, your share helps more teams build responsible, well-governed AI.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://ainstein.sanjeevaniai.com/p/how-ai-actually-learns?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://ainstein.sanjeevaniai.com/p/how-ai-actually-learns?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><p><em>Suneeta Modekurty</em><br><em>Founder, A.I.N.S.T.E.I.N.</em></p>]]></content:encoded></item><item><title><![CDATA[What Actually Is AI? (And Why Your Company Needs to Understand It)]]></title><description><![CDATA[No jargon. No math. 
Just clear answers to the questions you've been afraid to ask.]]></description><link>https://ainstein.sanjeevaniai.com/p/what-actually-is-ai-and-why-your</link><guid isPermaLink="false">https://ainstein.sanjeevaniai.com/p/what-actually-is-ai-and-why-your</guid><dc:creator><![CDATA[A.I.N.S.T.E.I.N.]]></dc:creator><pubDate>Tue, 02 Dec 2025 15:03:31 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Q5Px!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98f56c29-2cb0-4a95-9a67-d3f2a54f134f_1810x1246.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Q5Px!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98f56c29-2cb0-4a95-9a67-d3f2a54f134f_1810x1246.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Q5Px!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98f56c29-2cb0-4a95-9a67-d3f2a54f134f_1810x1246.png 424w, https://substackcdn.com/image/fetch/$s_!Q5Px!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98f56c29-2cb0-4a95-9a67-d3f2a54f134f_1810x1246.png 848w, https://substackcdn.com/image/fetch/$s_!Q5Px!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98f56c29-2cb0-4a95-9a67-d3f2a54f134f_1810x1246.png 1272w, https://substackcdn.com/image/fetch/$s_!Q5Px!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98f56c29-2cb0-4a95-9a67-d3f2a54f134f_1810x1246.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Q5Px!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98f56c29-2cb0-4a95-9a67-d3f2a54f134f_1810x1246.png" width="1456" height="1002" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98f56c29-2cb0-4a95-9a67-d3f2a54f134f_1810x1246.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1002,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3531189,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ainstein.sanjeevaniai.com/i/179532420?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98f56c29-2cb0-4a95-9a67-d3f2a54f134f_1810x1246.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Q5Px!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98f56c29-2cb0-4a95-9a67-d3f2a54f134f_1810x1246.png 424w, https://substackcdn.com/image/fetch/$s_!Q5Px!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98f56c29-2cb0-4a95-9a67-d3f2a54f134f_1810x1246.png 848w, https://substackcdn.com/image/fetch/$s_!Q5Px!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98f56c29-2cb0-4a95-9a67-d3f2a54f134f_1810x1246.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Q5Px!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98f56c29-2cb0-4a95-9a67-d3f2a54f134f_1810x1246.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><strong>Let me tell you about a conversation that inspired this article.</strong></p><p>Yesterday, a friend called me. She&#8217;s been working at a Fortune 500 financial services company for ten years. She is smart, strategic, and manages critical projects that affect thousands of customers. 
We were catching up when she said something that stopped me cold: &#8220;Suneeta, I need to be honest with you. I don&#8217;t really understand what AI is. I use ChatGPT for emails, we have some AI thing in our CRM, and my boss keeps asking about &#8216;AI strategy.&#8217; But if someone asked me to actually explain what&#8217;s happening... I couldn&#8217;t. And neither could my boss.&#8221;</p><p>She paused. &#8220;Is that bad?&#8221;</p><p>Here&#8217;s what I told her: No, it&#8217;s not bad. It&#8217;s normal. And she&#8217;s not alone.</p><p>Most executives I talk to are in the same position. They&#8217;re using AI tools daily. Their companies are making million-dollar decisions about AI systems. But they couldn&#8217;t explain what&#8217;s actually happening under the hood if their job depended on it.</p><p>And increasingly, their jobs might.</p><p>This isn&#8217;t a knowledge gap you can afford to ignore anymore. Not because AI is replacing your job tomorrow (it&#8217;s not). But because your company is already deploying AI systems that make important decisions about customers, about employees, about money, about risk. And if you don&#8217;t understand what AI actually is, you can&#8217;t ask the right questions about whether those systems are working properly.</p><p>So let&#8217;s fix that right now. No jargon. No math. Just clear answers to the questions you&#8217;ve been afraid to ask.</p><h2><strong>What AI Actually Is (The Simple Truth)</strong></h2><p>Here&#8217;s the definition that matters:</p><blockquote><p><strong>AI is software that learns patterns from examples, rather than following explicit rules that we write.</strong></p></blockquote><p>That&#8217;s it. 
That&#8217;s the core difference between AI and every other type of software you&#8217;ve ever used.</p><p>Let me show you what I mean.</p><h3><strong>Traditional Software: Rules We Write</strong></h3><p>Traditional software does exactly what we tell it to do, step by step. A programmer writes explicit instructions:</p><p><em>&#8220;If the email subject line contains the word &#8216;invoice,&#8217; move it to the Accounting folder.&#8221;</em></p><p><em>&#8220;If the transaction amount exceeds $10,000, flag it for review.&#8221;</em></p><p><em>&#8220;If the customer&#8217;s zip code starts with &#8216;902,&#8217; calculate shipping as $15.99.&#8221;</em></p><p>These are rules. Clear, explicit, unambiguous. The software doesn&#8217;t think. It doesn&#8217;t learn. It doesn&#8217;t adapt. It just follows our instructions, perfectly, every single time.</p><p>This works beautifully for tasks where the rules are known, stable, and can be written down explicitly. Calculating taxes. Processing payroll. Routing phone calls. Millions of business processes run this way, reliably, every day.</p><h3><strong>AI: Patterns It Discovers</strong></h3><p>AI works fundamentally differently. Instead of giving it rules, we give it examples. Lots of examples. Then we ask it to figure out the patterns.</p><p><em>&#8220;Here are 10,000 emails that humans labeled as &#8216;accounting-related.&#8217; Figure out what makes something accounting-related.&#8221;</em></p><p><em>&#8220;Here are 50,000 transactions where fraud investigators marked which ones were fraudulent. Figure out what fraud looks like.&#8221;</em></p><p><em>&#8220;Here are 100,000 customer service conversations that humans rated as &#8216;resolved satisfactorily.&#8217; Figure out what makes customers satisfied.&#8221;</em></p><p>The AI system analyzes these examples and identifies patterns: some obvious, some subtle, some that no human would have thought to look for. It learns rules from the data itself. 
Rules we didn&#8217;t write. Rules we might not even understand. Rules that work... until they don&#8217;t.</p><p>This is powerful. It lets us automate tasks that are too complex, too nuanced, or too context-dependent for anyone to write explicit rules. It&#8217;s why AI can recognize faces in photos, transcribe speech with accents, recommend products you didn&#8217;t know you wanted, and increasingly, make decisions that used to require human judgment.</p><h3><strong>A Clearer Analogy</strong></h3><p>Think about teaching a child to recognize dogs.</p><p>You don&#8217;t give them explicit rules: <em>&#8220;Dogs have four legs, fur, a tail, and bark.&#8221;</em> That&#8217;s the traditional software approach. It would fail immediately. What about three-legged dogs? Hairless dogs? Silent dogs? Dogs that look like wolves?</p><p>Instead, you show them dogs. Many dogs. Different breeds, sizes, colors. You point and say &#8220;dog.&#8221; Eventually, through exposure to examples, they learn what &#8220;dog-ness&#8221; is. They can recognize dogs they&#8217;ve never seen before, even unusual ones, because they&#8217;ve learned the pattern.</p><p>That&#8217;s how AI works. Show it enough examples of something, and it learns to recognize that thing even in situations it&#8217;s never encountered before.</p><p>But here&#8217;s the critical part that almost everyone misses: <strong>AI can only learn what&#8217;s in the examples you show it.</strong></p><p>If you only show a child golden retrievers, they might not recognize a chihuahua as a dog. If you only show them dogs in daylight, they might struggle at night. 
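</p><p>To make the contrast concrete, here is a minimal, hypothetical sketch of both approaches side by side. The email subjects and labels are invented for illustration; the &#8220;learning&#8221; is just word counting, which is far simpler than any production system, but the shape is the same: the rule is written by hand, the pattern comes from the labeled examples.</p>

```python
from collections import Counter

# Traditional software: an explicit rule a programmer wrote.
def rule_based(subject):
    return "invoice" in subject.lower()

# AI-style: nobody writes the rule. Humans label examples,
# and the pattern comes from the data itself.
labeled = [("invoice for march", True),
           ("quarterly invoice attached", True),
           ("expense report and receipts", True),
           ("team lunch friday", False),
           ("friday standup notes", False)]

word_scores = Counter()
for subject, is_accounting in labeled:
    for word in subject.split():
        word_scores[word] += 1 if is_accounting else -1

def learned(subject):
    # Sum the learned word scores; positive means "accounting-related".
    return sum(word_scores[w] for w in subject.lower().split()) > 0

# The hand-written rule misses a subject without "invoice"; the learned
# pattern catches it, because "receipts" appeared in a labeled example.
print(rule_based("receipts for approval"))  # False
print(learned("receipts for approval"))     # True
```

<p>The rule only knows what we wrote. The learned scores know whatever the examples happened to contain, for better and for worse.</p><p>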
If the examples you provide are limited, biased, or flawed, the patterns they learn will be limited, biased, or flawed.</p><p>This brings us to the most important concept in understanding AI.</p><h2><strong>Why Data Is the Food AI Eats (And Quality Determines Everything)</strong></h2><p>My friend asked me: &#8220;But if AI is so smart, can&#8217;t it figure out when the data is wrong?&#8221;</p><p>No. It can&#8217;t. And this is where most AI problems begin.</p><h3><strong>AI Has No Independent Knowledge</strong></h3><p>AI doesn&#8217;t &#8220;know&#8221; anything except what its training data taught it. It has no common sense. No life experience. No ability to say &#8220;wait, this seems wrong.&#8221;</p><p>If you train an AI on biased data, it learns the bias. If you train it on incomplete data, it learns incomplete patterns. If you train it on old data, it learns outdated rules. And it applies those patterns with perfect, unwavering confidence.</p><blockquote><p><strong>You are what you eat. AI is what it learns from.</strong></p></blockquote><p>This isn&#8217;t metaphorical, it&#8217;s literal. The quality of your AI system is determined almost entirely by the quality of its training data. Not by the sophistication of its algorithm. Not by how expensive it was. Not by the reputation of the vendor.</p><p><strong>By. The. Data.</strong></p><h3><strong>What Makes Data &#8220;Good&#8221;?</strong></h3><p>Good data has three essential characteristics:</p><p><strong>1. Representative:</strong> It includes diverse examples that reflect the real world the AI will operate in.</p><p>If you&#8217;re building a medical diagnosis AI, your training data needs patients of different ages, races, genders, and geographies. If you only train on data from wealthy urban hospitals, it will fail in rural clinics. If you only train on data from healthy young adults, it will fail with elderly patients.</p><p><strong>2. 
Accurate:</strong> The examples are labeled correctly and measured properly.</p><p>If humans mislabeled things during training, marking non-spam as spam, categorizing inquiries incorrectly, or applying inconsistent standards, the AI learns those errors as truth. Garbage in, garbage amplified out.</p><p><strong>3. Contextual:</strong> It preserves the circumstances under which data was collected.</p><p>Data collected during a pandemic might not reflect normal behavior. Data from one region might not transfer to another. Data from 2020 might not predict 2025. Context matters, and when we strip it away, patterns become misleading.</p><h3><strong>What Makes Data &#8220;Bad&#8221;?</strong></h3><p>Bad data comes in predictable forms:</p><p><strong>Biased:</strong> Systematically over-represents some groups and under-represents others. Your hiring AI trained on historical data where 90% of engineers were male? It just learned that &#8220;good engineer&#8221; correlates with &#8220;male.&#8221;</p><p><strong>Incomplete:</strong> Missing critical information that humans need to make good decisions. Patient records without documented allergies. Credit applications without income verification. Resume databases without actual job performance data.</p><p><strong>Outdated:</strong> Reflects how things used to work, not how they work now. Consumer behavior from 2019 doesn&#8217;t predict consumer behavior post-pandemic. Market dynamics from stable periods don&#8217;t predict crisis periods.</p><p><strong>Inaccurate:</strong> Simply wrong. Typos. Measurement errors. System glitches. The decimal point that shifted. The sensor that drifted. The form field that accepted &#8220;N/A&#8221; as a zip code.</p><p>Each of these flaws gets encoded into the AI&#8217;s understanding of the world. 
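</p><p>A toy example makes the mechanism visible. Below, a hypothetical set of historical labels carries a bias, and the same kind of simple word-scoring scheme dutifully encodes it. Every resume and label here is invented; this illustrates the failure mode, not any real system.</p>

```python
from collections import Counter

# Hypothetical historical outcomes. The labels carry a bias: otherwise
# similar resumes were marked "not hired" when they mention "women's".
history = [("chess club captain python", True),
           ("python systems programming", True),
           ("women's chess club captain python", False),
           ("women's coding society python", False)]

scores = Counter()
for resume, hired in history:
    for word in resume.split():
        scores[word] += 1 if hired else -1

# The system never saw a "gender" field. It still learned the bias,
# because the word "women's" correlates with the biased labels.
print(scores["women's"])  # -2  (penalized)
print(scores["python"])   # 0   (neutral: appears in every resume)
```

<p>Nothing in the code is broken. The flaw lives entirely in the labels, and the learned scores reproduce it faithfully.</p><p>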
And then it makes millions of decisions based on that flawed understanding.</p><h2><strong>When Bad Data Becomes Bad Decisions: Real Stories</strong></h2><p>Let me show you what this looks like in practice. These aren&#8217;t hypothetical scenarios. These are real companies that spent millions on AI systems, deployed them confidently, and watched them fail in ways that made headlines.</p><h3><strong>Amazon&#8217;s Hiring AI: When History Becomes Destiny</strong></h3><p>In 2018, Reuters reported that Amazon had been developing an AI system to automate resume screening.[<a href="https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G" title="Dastin, Jeffrey. &#8220;Amazon scraps secret AI recruiting tool that showed bias against women.&#8221; Reuters, October 10, 2018.">1</a>] <br>The goal was noble: reduce bias in hiring by removing human subjectivity. Let algorithms evaluate candidates objectively, based purely on qualifications.</p><p>The AI was trained on ten years of resumes submitted to Amazon. They were resumes of people who were hired, promoted, and succeeded. The system learned patterns from this data: what successful Amazon employees looked like on paper.</p><p>There was just one problem: For most of that decade, Amazon&#8217;s technical workforce was predominantly male. Not because men were objectively better engineers, but because tech industry hiring had historically skewed male.</p><p>The AI learned this as a pattern. It observed that in the historical data, &#8220;successful technical employee&#8221; strongly correlated with &#8220;male.&#8221; So it began penalizing resumes that indicated female gender. It downgraded graduates of women&#8217;s colleges. It downgraded resumes containing the word &#8220;women&#8217;s&#8221; as in &#8220;women&#8217;s chess club captain.&#8221;</p><p>The algorithm wasn&#8217;t broken. It was doing exactly what it was trained to do: find patterns in historical hiring data and apply them. 
The problem was that historical hiring patterns included bias, and the AI faithfully learned and amplified it.</p><p>Amazon shut the system down. But not before it revealed an uncomfortable truth:</p><blockquote><p><strong>AI doesn&#8217;t eliminate human bias. It automates it. At scale. With mathematical precision.</strong></p></blockquote><h3><strong>Apple Card: When Algorithms Can&#8217;t Explain Themselves</strong></h3><p>In 2019, a tech entrepreneur named David Heinemeier Hansson tweeted that Apple Card had given him 20 times the credit limit his wife received, despite her having a higher credit score.[<a href="https://www.washingtonpost.com/business/2019/11/11/apple-card-algorithm-sparks-gender-bias-allegations-against-goldman-sachs/">2</a>] <br>Other couples reported similar experiences. The pattern was clear and troubling. Apple and Goldman Sachs (the bank behind Apple Card) insisted there was no gender discrimination in their algorithm. Their AI evaluated creditworthiness based on objective factors, they said. Not gender.</p><p>They were probably telling the truth. The algorithm likely didn&#8217;t use gender as an input variable at all.</p><p>But here&#8217;s what happens with AI: Even when you don&#8217;t directly use protected characteristics like gender or race, AI can learn to use proxy variables that correlate with those characteristics. Zip code. Shopping patterns. Transaction history. The AI finds patterns that happen to correlate with gender, even if it never sees the gender field.</p><p>The New York Department of Financial Services launched an investigation. The challenge wasn&#8217;t proving the algorithm was biased; the outcomes spoke for themselves.
The challenge was getting anyone to explain <em>why</em> the algorithm made the decisions it did.</p><p>This revealed another uncomfortable truth: <strong>When AI makes decisions that seem wrong, often nobody can explain why. Not even the people who built it.</strong></p><h3><strong>Healthcare AI: The Cost of Unrepresentative Data</strong></h3><p>A 2019 study published in <em>Science</em> revealed that a healthcare algorithm used by hospitals across the United States was systematically discriminating against Black patients.[<a href="https://www.science.org/doi/10.1126/science.aax2342">3</a>] <br>The algorithm helped doctors decide which patients needed extra medical care, affecting millions of people annually.</p><p>The problem? The algorithm used healthcare costs as a proxy for healthcare needs. It assumed that patients who cost the system more money were sicker and needed more care.</p><p>But Black patients, on average, had lower healthcare costs not because they were healthier, but because they had less access to care due to systemic barriers. The AI learned that &#8220;lower cost = healthier&#8221; and recommended less care for Black patients who were actually just as sick as white patients with higher costs.</p><p>The algorithm worked perfectly from a technical standpoint. It accurately predicted what it was trained to predict: costs. But costs weren&#8217;t the right thing to measure.
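The proxy effect is easy to reproduce. Here is a hedged synthetic sketch (every number is invented) in which a model never sees the protected attribute, yet its decisions still split along group lines because one input correlates with it:

```python
# Synthetic illustration of proxy leakage; all data is made up.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)                      # protected attribute (never shown to the model)
proxy = group + rng.normal(0, 0.5, n)              # input that happens to correlate with group
outcome = (group + rng.normal(0, 0.3, n)) > 0.5    # historically biased approvals

# "Train" the simplest possible model: choose the cutoff on the proxy
# that best reproduces the historical outcomes.
thresholds = np.linspace(proxy.min(), proxy.max(), 200)
accuracies = [((proxy > t) == outcome).mean() for t in thresholds]
best_t = thresholds[int(np.argmax(accuracies))]
pred = proxy > best_t

# The model never saw `group`, yet approval rates differ sharply by group.
print("approval rate, group 0:", pred[group == 0].mean())
print("approval rate, group 1:", pred[group == 1].mean())
```

Removing the protected field does not remove the pattern; the proxy carries it.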
The data reflected historical inequity, and the AI perpetuated it.</p><h3><strong>The Pattern Behind the Failures</strong></h3><p>Look closely at these stories and you&#8217;ll see the same structure:</p><ol><li><p><strong>Well-intentioned deployment:</strong> Nobody set out to build discriminatory systems</p></li><li><p><strong>Training on historical data:</strong> AI learned from how things were done before</p></li><li><p><strong>Successful pattern matching:</strong> The AI correctly identified patterns in that data</p></li><li><p><strong>Problematic real-world outcomes:</strong> Those historical patterns included historical biases</p></li><li><p><strong>Inability to explain or fix quickly:</strong> By the time problems surfaced, the AI was already deployed</p></li></ol><p>This pattern repeats across industries. Predictive policing tools that over-target certain neighborhoods (trained on biased arrest patterns). Loan approval systems that reject qualified applicants from certain zip codes (trained on discriminatory lending history). And countless other cases that never make headlines.</p><p><strong>The AI worked perfectly. The data was the problem.<br></strong></p><h2><strong>Why This Matters to Your Business Right Now</strong></h2><p>After I shared the Amazon and Apple Card stories with my friend, she said: &#8220;Okay, but we&#8217;re not building hiring algorithms or credit systems. We&#8217;re just a regular company.&#8221;</p><p>Then I asked her about their customer service chatbot. Their fraud detection system. Their inventory forecasting tool.</p><p>&#8220;Oh,&#8221; she said. &#8220;Those count as AI?&#8221;</p><p>Yes. 
They do.</p><h3><strong>You&#8217;re Already Using AI (Whether You Realize It or Not)</strong></h3><blockquote><p>Most organizations can&#8217;t produce a complete list of the AI systems they&#8217;re currently using.</p></blockquote><p>Let me help you find them:</p><p><strong>In your customer service:</strong> That chatbot on your website? AI. The email routing system that decides which department handles which inquiry? Probably AI. The &#8220;recommended articles&#8221; in your knowledge base? AI.</p><p><strong>In your finance department:</strong> Fraud detection systems? AI. Expense report anomaly detection? AI. Cash flow forecasting tools? Increasingly AI.</p><p><strong>In your HR systems:</strong> Resume screening tools? AI. Interview scheduling systems that &#8220;optimize&#8221; candidate selection? AI. Performance prediction models? AI.</p><p><strong>In your marketing:</strong> Email subject line optimization? AI. Ad targeting? AI. Product recommendations? AI. Dynamic pricing? AI.</p><p><strong>In your operations:</strong> Inventory forecasting? AI. Logistics optimization? AI. Predictive maintenance alerts? AI.</p><p>These systems are making decisions. Some small, some significant. Some are just recommending options to humans. Others are operating autonomously, making hundreds or thousands of decisions daily.</p><p>And here&#8217;s what should concern you: <strong>Most companies don&#8217;t know what data these systems were trained on, whether that data was any good, or whether the systems are still working as intended.</strong></p><h3><strong>The Risks Are Growing</strong></h3><p>Three forces are converging to make AI governance urgent:</p><p><strong>1.
Regulatory Pressure:</strong> The EU AI Act is now in force, with penalties up to &#8364;35 million or 7% of global revenue for violations.[<a href="https://www.consilium.europa.eu/en/press/press-releases/2024/05/21/artificial-intelligence-ai-act-council-gives-final-green-light-to-the-first-worldwide-rules-on-ai/">4</a>] <br>US states are passing their own AI laws. You will be required to explain how your AI systems make decisions and prove they&#8217;re not discriminatory.</p><p><strong>2. Legal Liability:</strong> Companies are being sued for discriminatory AI decisions in hiring, lending, insurance, and housing. Courts are no longer accepting &#8220;the algorithm did it&#8221; as a defense.</p><p><strong>3. Reputational Risk:</strong> In an age of social media, an AI failure can become a PR crisis in hours. Amazon, Apple Card, and others learned this the expensive way.</p><h3><strong>But Also: The Opportunity</strong></h3><p>Here&#8217;s what most companies miss: <strong>Good AI governance isn&#8217;t just risk mitigation. It&#8217;s competitive advantage.</strong></p><p>Companies that can prove their AI systems are trustworthy, with evidence and not just promises, are winning contracts. They&#8217;re getting better insurance rates. They&#8217;re attracting customers who don&#8217;t trust competitors. They&#8217;re raising capital more easily because investors can see quantified risk management.</p><p>The question isn&#8217;t whether to govern AI. It&#8217;s whether you&#8217;ll do it before your competitors do.</p><h2><strong>Questions You Should Ask Your Team This Week</strong></h2><p>You don&#8217;t need to become a data scientist to govern AI effectively. But you do need to start asking better questions. Here are five that every executive should pose to their teams:</p><p><strong>1. &#8220;What AI systems are we currently using?&#8221;</strong></p><p>Not &#8220;do we use AI&#8221; because you do. The question is <em>where</em>.
Ask for a complete inventory: every tool, system, or platform that uses machine learning, algorithms, or automation to make decisions. Most companies are shocked by how long this list becomes.</p><p><strong>2. &#8220;What decisions do these systems make autonomously vs. recommend to humans?&#8221;</strong></p><p>There&#8217;s a difference between &#8220;AI suggests, human decides&#8221; and &#8220;AI decides, human rubber-stamps.&#8221; Know which is which. The autonomous ones deserve much more scrutiny.</p><p><strong>3. &#8220;What data were they trained on, and when was that data collected?&#8221;</strong></p><p>If the answer is &#8220;we don&#8217;t know&#8221; or &#8220;the vendor won&#8217;t say&#8221;, that&#8217;s a red flag. You&#8217;re making decisions based on patterns learned from data you can&#8217;t verify. If the data is more than 2-3 years old, the patterns might be obsolete.</p><p><strong>4. &#8220;Who&#8217;s monitoring whether these systems still work correctly?&#8221;</strong></p><p>AI doesn&#8217;t break like traditional software. It <em>drifts</em>. Performance degrades gradually as the world changes. If nobody&#8217;s actively monitoring accuracy, fairness, and reliability, you&#8217;re flying blind.</p><p><strong>5. &#8220;What happens when these systems are wrong, and who&#8217;s accountable?&#8221;</strong></p><p>Have you actually defined what &#8220;wrong&#8221; means for each system? What&#8217;s your process when someone disputes an AI decision? Who reviews? Who has authority to override? If these answers aren&#8217;t documented, you don&#8217;t have governance.</p><p>These questions won&#8217;t give you complete answers immediately. That&#8217;s fine. The goal is to start the conversation, to make AI systems visible and accountable rather than invisible and assumed.</p><h2><strong>What&#8217;s Coming in This Series</strong></h2><p>This is the first of many conversations we&#8217;re going to have about AI, data, and governance. 
Over the coming weeks, I&#8217;ll break down everything you need to understand, not to become a technical expert, but to be an informed decision-maker.</p><p><strong>Next week</strong>, we&#8217;ll go deeper into how AI actually learns, the training process, why it needs so much data, and what happens inside these systems when they&#8217;re &#8220;learning patterns.&#8221; You&#8217;ll understand why some AI tasks are easy and others remain impossibly hard.</p><p><strong>In the weeks that follow</strong>, we&#8217;ll explore:</p><ul><li><p>Why data quality determines everything (and how to measure it)</p></li><li><p>Real stories of AI failures and what they teach us</p></li><li><p>The AI systems already running in your business (and how to find them)</p></li><li><p>What &#8220;AI governance&#8221; actually means in practice</p></li><li><p>Why traditional compliance checklists don&#8217;t work</p></li><li><p>How leading companies are measuring AI trustworthiness quantitatively</p></li></ul><p>This isn&#8217;t about fear-mongering. AI is neither savior nor threat. It&#8217;s a tool. An enormously powerful tool. One that can amplify both human wisdom and human error at unprecedented scale.</p><p>The question isn&#8217;t whether to use AI. That ship has sailed. The question is whether you&#8217;ll use it responsibly, whether you&#8217;ll understand it well enough to govern it effectively, and whether you&#8217;ll build systems your stakeholders can actually trust.</p><p>My friend asked if not understanding AI was bad. I said no, at least not yet. But it becomes a problem when you&#8217;re responsible for systems you can&#8217;t explain.
When your company is using AI to make decisions about people&#8217;s jobs, money, or opportunities, and you can&#8217;t answer basic questions about how those decisions get made.</p><p>That&#8217;s a choice you can no longer afford to make.</p><p></p><p><strong>Don&#8217;t miss next week&#8217;s deep dive into how AI actually learns.<br><br></strong><em>Know a colleague making AI decisions without asking these questions? Share this with them before they learn the hard way.<br></em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://ainstein.sanjeevaniai.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share A.I.N.S.T.E.I.N&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://ainstein.sanjeevaniai.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share A.I.N.S.T.E.I.N</span></a></p><p></p><h3><strong>References</strong></h3><ol><li><p>Dastin, Jeffrey. &#8220;Amazon scraps secret AI recruiting tool that showed bias against women.&#8221; <em>Reuters</em>, October 10, 2018. <a href="https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G">https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G</a> <a href="#user-content-fnref-1">&#8617;</a></p></li><li><p>Telford, Taylor. &#8220;Apple Card algorithm sparks gender bias allegations against Goldman Sachs.&#8221; <em>The Washington Post</em>, November 11, 2019. <a href="https://www.washingtonpost.com/business/2019/11/11/apple-card-algorithm-sparks-gender-bias-allegations-against-goldman-sachs/">https://www.washingtonpost.com/business/2019/11/11/apple-card-algorithm-sparks-gender-bias-allegations-against-goldman-sachs/</a> <a href="#user-content-fnref-2">&#8617;</a></p></li><li><p>Obermeyer, Ziad, et al. 
&#8220;Dissecting racial bias in an algorithm used to manage the health of populations.&#8221; <em>Science</em>, Vol. 366, Issue 6464, October 25, 2019, pp. 447-453. <a href="https://www.science.org/doi/10.1126/science.aax2342">https://www.science.org/doi/10.1126/science.aax2342</a> <a href="#user-content-fnref-3">&#8617;</a></p></li><li><p>Council of the European Union. &#8220;Artificial Intelligence Act: Council gives final green light to the first worldwide rules on Artificial Intelligence.&#8221; Press release, May 21, 2024. <a href="https://www.consilium.europa.eu/en/press/press-releases/2024/05/21/artificial-intelligence-ai-act-council-gives-final-green-light-to-the-first-worldwide-rules-on-ai/">https://www.consilium.europa.eu/en/press/press-releases/2024/05/21/artificial-intelligence-ai-act-council-gives-final-green-light-to-the-first-worldwide-rules-on-ai/</a> <a href="#user-content-fnref-4">&#8617;</a></p></li></ol><p></p><p></p><p></p><h3><strong>About the Author</strong></h3><p>Suneeta Modekurty is an ISO/IEC 42001 AIMS Lead Auditor and O-1A visa holder for extraordinary ability in data science. She has 25 years of experience building AI systems across EdTech, HealthTech, FinTech, and Insurance. Connect on <a href="link">LinkedIn</a>.</p><p>&#169; 2025 SANJEEVANI AI</p>]]></content:encoded></item></channel></rss>