Identity: My New Problem Space
Why Identity Might Be the Next Big Problem I Want to Explore.
Big Idea:
Identity Is an Inference Problem (Not a Data Problem)
We live in an age where almost everything about us gets reduced to numbers. Identity, trust, compliance, intent: all of it eventually becomes vectors, probabilities, and mathematical representations. At the lowest level, computers still only understand bits and bytes. Zeros and ones. That hasn’t changed.
What has changed is how convincingly those numbers are turned back into something that feels human. With generative AI and large language models, we now interact with systems that speak in natural language, respond with nuance, and seem to “understand” us. But under the surface, nothing mystical is happening. Those fluent responses are still the result of mathematical transformations over numbers.
Even systems we’ve used for years work this way. When your phone unlocks with your face, it’s not recognizing you as a person. It’s comparing a newly captured scan to a stored mathematical representation and calculating how similar they are. If the similarity crosses a threshold, the system says “yes.” If not, it says “try again.” That’s not certainty. That’s estimation.
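That comparison can be sketched in a few lines. The vectors and the threshold below are purely illustrative (real face-recognition systems use high-dimensional embeddings and carefully tuned thresholds), but the shape of the decision is the same: a similarity score checked against a cutoff.

```python
import math

def cosine_similarity(a, b):
    # How closely two embedding vectors point in the same direction (1.0 = identical direction).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

THRESHOLD = 0.92  # arbitrary for illustration; real systems tune this against error rates

stored_template = [0.31, 0.80, 0.51]  # the enrolled face, as a (tiny) vector
new_scan = [0.30, 0.79, 0.54]         # a freshly captured scan

score = cosine_similarity(stored_template, new_scan)
print("unlock" if score >= THRESHOLD else "try again")
```

Notice that the output is a similarity score, not a verdict. The "yes" only appears once a human-chosen threshold is applied to it.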
This is where the idea of inference really starts to matter.
Inference, in plain terms, is what happens when you try to figure something out that you can’t directly observe. You don’t have facts. You have clues. You have evidence. You have patterns. And from those, you form a belief about what is likely true.
If that word sounds academic, let’s nudge it into plain English. To infer simply means to make an educated guess based on evidence. It’s something we humans do every single day without even trying.
Imagine you’re at a coffee shop and you see someone from behind. They have the same haircut as your friend Dave. They’re wearing that same vintage jacket Dave loves. You infer that it’s Dave. But are you 100% sure? No. You’re maybe 90% sure. You approach with caution until he turns around. That gap between “I think it’s Dave” and “It is definitely Dave” is the inference. It’s probabilistic.
In another scenario, you hear a strange noise at night and infer that someone might be home. You see dark clouds and infer that it might rain. You sense hesitation in a conversation and infer that something is being left unsaid. None of these are facts. They’re judgments under uncertainty.
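This kind of everyday reasoning has a precise mathematical counterpart: updating a belief with Bayes' rule as evidence arrives. The probabilities below are invented for the coffee-shop scenario, but the mechanics are the real thing.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    # P(hypothesis | evidence) via Bayes' rule.
    numerator = prior * likelihood_if_true
    denominator = numerator + (1 - prior) * likelihood_if_false
    return numerator / denominator

# Illustrative numbers: a prior belief that it's Dave, then two clues in sequence.
belief = 0.30                              # base rate: Dave sometimes visits this shop
belief = bayes_update(belief, 0.80, 0.10)  # the haircut matches
belief = bayes_update(belief, 0.70, 0.05)  # that vintage jacket
print(f"P(it's Dave) = {belief:.2f}")
```

Each clue moves the belief, but it never reaches 1.0. Certainty only arrives when Dave turns around, and that is exactly the step most digital systems never get.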
AI systems work the same way.
When a model generates a response, predicts an outcome, flags a transaction, or recommends a product, it isn’t retrieving truth from a database. It’s estimating the most likely result given what it has seen before. That’s why so much of machine learning revolves around probabilities, confidence scores, weights, and thresholds. These systems don’t know. They guess, sometimes very well, and they guess with varying certainty.
This becomes especially clear when you look at identity in the digital world.
For a long time, we pretended identity was simple. An email address represented a person. An account ID represented a user. A business name represented a company. That worked when systems were small, formal, and tightly controlled. But the real world isn’t like that anymore, if it ever was.
One person can have multiple phone numbers. Multiple people can share the same device. A business might operate under several names across different platforms. Payment methods change. Cookies disappear. Signals fragment. Context shifts over time.
So when a system asks, “Is this the same person?” or “Is this the same business?” there is no single field to check. There is no authoritative row in a table that provides the answer. There is only evidence. And evidence is incomplete, noisy, and sometimes contradictory.
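Entity resolution systems make this concrete by scoring how much evidence two records share. The fields, weights, and records below are hypothetical, a minimal sketch of the idea rather than how any particular product works.

```python
# Hypothetical field weights; a real system would learn or tune these.
WEIGHTS = {"email": 0.5, "phone": 0.3, "name": 0.2}

def match_score(record_a, record_b):
    # Accumulate evidence from agreeing fields; a missing field contributes nothing.
    score = 0.0
    for field, weight in WEIGHTS.items():
        a, b = record_a.get(field), record_b.get(field)
        if a is not None and b is not None and a.lower() == b.lower():
            score += weight
    return score

a = {"name": "Dana Co", "email": "ops@dana.co", "phone": None}
b = {"name": "DANA CO", "email": "ops@dana.co", "phone": "555-0101"}
print(match_score(a, b))  # email and name agree; phone is missing on one side
```

The output is a degree of support, not an answer. Whether 0.7 means "same business" is a policy decision layered on top of the evidence.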
At that point, identity stops being a data problem and becomes an inference problem.
Fraud makes this painfully obvious.
There is no such thing as a transaction that arrives already labeled as fraudulent. What exists are behaviors, patterns, relationships, timing anomalies, and historical context. From those signals, systems (and the humans operating them) try to answer a much harder question: “Given this evidence, how likely is malicious intent?”
Intent itself is invisible. You can’t store it. You can’t log it. You can only infer it from how behavior deviates from expectation, and even then, you’re never fully certain. That’s why fraud systems constantly balance false positives and false negatives. That’s why human review still exists. And that’s why rules-based certainty breaks down the moment behavior adapts.
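The false-positive/false-negative tension is easy to see once you sweep a threshold over scored transactions. The scores and labels below are toy data invented for illustration (1 means the transaction really was fraudulent).

```python
# Toy (score, true_label) pairs; 1 = actually fraudulent, 0 = legitimate.
scored = [(0.10, 0), (0.35, 0), (0.55, 1), (0.60, 0), (0.80, 1), (0.95, 1)]

def errors_at(threshold):
    # Block everything at or above the threshold, allow everything below it.
    false_positives = sum(1 for s, y in scored if s >= threshold and y == 0)
    false_negatives = sum(1 for s, y in scored if s < threshold and y == 1)
    return false_positives, false_negatives

for t in (0.3, 0.5, 0.7):
    fp, fn = errors_at(t)
    print(f"threshold {t}: {fp} false positives, {fn} false negatives")
```

Lowering the threshold flags more innocent users; raising it lets more fraud through. There is no setting that eliminates both, which is why these systems manage a tradeoff rather than compute an answer.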
What’s interesting is that many systems still pretend this uncertainty doesn’t exist. They force hard decisions because schemas demand them. A record must merge or not merge. A transaction must be blocked or allowed. A classification must be chosen, even when nothing fits well.
That forced certainty is where things quietly go wrong. Innocent users get flagged. Legitimate businesses get entangled in risk models they don’t understand. Bad actors slip through because their behavior lives just below arbitrary thresholds. Trust erodes, not because systems are wrong, but because they refuse to admit what they don’t know.
Once you start seeing identity as inference, a lot of things click into place. You understand why entity resolution is hard. Why fraud systems are brittle. Why accuracy metrics don’t tell the full story. Why explainability isn’t a “nice to have,” but a requirement. Why trust isn’t a property of a model, but of an entire system.
As signals weaken, with more privacy, less tracking, and more fragmentation, we rely increasingly on indirect, partial evidence. We don’t observe truth directly anymore. We reconstruct it. We estimate it. We revise it over time.
And maybe that’s the real shift happening underneath all the AI progress.
We aren’t building systems that know the world.
We’re building systems that reason about it under uncertainty.
Identity isn’t something those systems store. It’s something they infer, imperfectly, probabilistically, and always subject to change.
Once we accept that, we can stop chasing certainty and start designing for trust instead.
Kudos to the work that [Senzing](https://senzing.com/) is doing on this front.