Zillow Offers didn’t fail because its AI was bad. It failed because its AI was good enough that nobody thought they needed a human anymore.
It was a Monday morning in February 2021.
Zillow’s senior executives gathered to announce a new direction. The Zestimate — their AI valuation model — was accurate enough, they decided, to make it the offer price. Accurate enough to buy homes at scale. Accurate enough to remove the friction of human judgment from the loop entirely.
Nine months later the company had taken losses exceeding $500 million and laid off a quarter of its workforce.
The AI didn’t fail. The decision did.
So what happened?
Zillow is not a reckless company. It isn’t run by people who don’t understand AI. It is one of the most sophisticated technology companies in real estate, with years of investment in its valuation model and genuine confidence in what it had built.
But that confidence was the problem.
The Zestimate worked well in stable markets with high transaction volume and comparable properties. It was accurate enough under the conditions it was trained on. What it couldn’t do was reason about what happened when those conditions changed.
When the pandemic housing market began to cool, the algorithm didn’t know what it didn’t know. It kept buying. It kept pricing. It had no way to model the difference between a market it had been trained on and a market that was shifting beneath it.
Nobody was asking the questions the model couldn’t ask itself: what happens to prices if demand reverses? What happens if sellers start gaming the offer price? What happens if the conditions that made this model accurate stop being the conditions that exist?
There was nobody in the room with the authority to ask those questions on the algorithm’s behalf.
The three layers Zillow had and the one it didn’t
Think of any AI deployment at consequential scale as having three layers.
The first is the sensing layer. The data. In Zillow’s case, transaction records, property attributes, market feeds, comparable sales data. Zillow’s sensing layer was strong: years of investment, proprietary data, a genuine competitive advantage.
The second is the reasoning layer. Not just what the data shows, but what it means, and what happens next under different conditions. The layer that asks: if this market shifts, what does the model stop being able to see? This is causal reasoning — and in 2021 it didn’t exist at the scale Zillow needed. Nobody had it. That’s not a Zillow failure. It’s where the technology was.
The third is the human decision layer. The architecture for what a human actually does with the output. Who has the authority to challenge the model. On what grounds. With what protection. Within what timeframe.
Zillow had the first layer. The second didn’t exist at that time. But the third – the human decision layer – was available to design. And Zillow chose not to design it.
That is the critical detail. Zillow didn’t just fail to design the human layer. It made a deliberate decision to remove human judgment from the loop. The Zestimate became the offer. The algorithm triggered the purchase. The human was not in the room.
That is not a technology failure. That is a design failure.
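To make the third layer concrete, here is a minimal sketch of what a designed hand-off between model output and human authority could look like. It is an illustration only, not a description of anything Zillow built. Every name in it (Valuation, ReviewPolicy, route, the market_drift score, the pricing_lead role) is hypothetical, the thresholds are invented, and it assumes some other system already supplies an uncertainty interval and a score for how far current market conditions sit from the training data.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class Valuation:
    price: float         # the model's point estimate
    low: float           # lower bound of the model's own interval
    high: float          # upper bound of the model's own interval
    market_drift: float  # hypothetical 0..1 score: distance from training conditions


@dataclass(frozen=True)
class ReviewPolicy:
    max_relative_spread: float  # widest interval allowed for an automatic offer
    max_market_drift: float     # highest drift allowed for an automatic offer
    reviewer_role: str          # who holds the authority to challenge the model
    review_window: timedelta    # how long that person has to decide


@dataclass(frozen=True)
class Decision:
    action: str                 # "auto_offer" or "human_review"
    reasons: Tuple[str, ...]    # the grounds for escalation, made visible
    reviewer_role: Optional[str] = None
    review_window: Optional[timedelta] = None


def route(v: Valuation, policy: ReviewPolicy) -> Decision:
    """Send a valuation to a named human whenever the model is outside its comfort zone."""
    if v.price <= 0:
        raise ValueError("price must be positive")
    reasons = []
    spread = (v.high - v.low) / v.price
    if spread > policy.max_relative_spread:
        reasons.append(f"interval spread {spread:.0%} exceeds {policy.max_relative_spread:.0%}")
    if v.market_drift > policy.max_market_drift:
        reasons.append(f"market drift {v.market_drift:.2f} exceeds {policy.max_market_drift:.2f}")
    if reasons:
        return Decision("human_review", tuple(reasons),
                        policy.reviewer_role, policy.review_window)
    return Decision("auto_offer", ())


# Illustrative values only.
policy = ReviewPolicy(max_relative_spread=0.10, max_market_drift=0.30,
                      reviewer_role="pricing_lead", review_window=timedelta(hours=24))
stable = Valuation(price=300_000, low=290_000, high=310_000, market_drift=0.10)
shifting = Valuation(price=300_000, low=255_000, high=345_000, market_drift=0.60)

print(route(stable, policy).action)
print(route(shifting, policy).reasons)
```

The code itself is trivial. What matters is what the policy object forces someone to write down before the build starts: the grounds for a challenge, the person who holds the authority, and the time they have to act. It does not cover the protection that person needs for saying stop, which is an organisational commitment rather than something a function can supply.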
The question nobody asked before the build started
Every organisation deploying AI at consequential decision points will tell you the same thing: we have confidence in our model. We have tested it. We have validated it. We know its accuracy.
What almost nobody asks is: what does the model not know? What are the conditions under which it stops being accurate? And when those conditions arrive, because we know they always do, who in this organisation has the authority, the tools, and the organisational standing to say stop?
In Zillow’s case, the answer was nobody. The model had been trusted so completely that the human override had been engineered out of the system.
Not by accident. By design.
Rich Barton, Zillow’s CEO, said it himself when the shutdown was announced: “The unpredictability in forecasting home prices far exceeds what we anticipated.”
That unpredictability was always there. The model just couldn’t see it. And there was no human layer designed to catch what the model missed.
This isn’t just a Zillow story
Zillow is not an outlier. It is a preview.
The same design failure is being replicated across industries right now, at speed, by sophisticated organisations that have invested heavily in AI and are confident in what they’ve built. Healthcare systems automating coverage decisions. Logistics platforms routing supply chains without human checkpoints. Financial institutions running credit models that nobody is authorised to override.
In each case the story is the same. The sensing layer is built. The AI is running. The human is present … somewhere, in some org chart, with some nominal oversight function. But the interface between machine output and human authority – who can challenge it, on what grounds, with what protection, within what timeframe – has been left to chance.
Or worse, designed out entirely.
The UnitedHealth Medicare reviewer who saw something the AI missed and was fired for acting on it. The ERCOT grid operator with four minutes and thirty-seven seconds and no system designed to help him use his judgment. The insurance adjuster in a client meeting where the room went quiet because nobody had designed the answer to a question that was always going to be asked.
These are not edge cases. They are the same failure, in different industries, at different scales.
What designing the human layer actually requires
The primary locus of control is not at the moment of decision. It is at the moment of design.
That line comes from a piece on AI in military decision-making, but it belongs in every boardroom where an AI deployment is being planned.
Designing the human layer means asking different questions before the build starts. Not just: how accurate is the model? But: what does the model not know, and how will we know when it stops knowing it?
Not just: is there a human in the loop? But: what is that human actually authorised to do when the model is wrong?
Who is this decision for? What does that human need to exercise genuine judgment? What does it mean to challenge the AI’s recommendation, and what protection exists for doing so? How do you make uncertainty visible rather than burying it in a confident output? How do you know the human is adding judgment rather than just absorbing liability?
These are not engineering questions. They are not compliance questions. They are design questions, and they belong at the beginning of every AI deployment, not at the end when the write-downs arrive.
Zillow had the data. It had the model. It had years of investment in a sensing layer that was genuinely impressive.
What it didn’t have was a system designed to know what it didn’t know, and a human layer with the authority to act on that gap.
That is the problem available to solve right now. Not when the market shifts. Not when the lawsuit arrives. Before the build starts.

Has your organisation designed the human layer — or just assumed it’s there?
If you’re not sure, that’s usually your answer: mattsheehan@spatialnext.io
Matt Sheehan
Matt is a geographer and AI strategist with 25 years at the intersection of geospatial intelligence and decision-making. He maps the architecture connecting three layers most organisations haven’t yet seen together: the sensing layer the geospatial industry has built, the causal reasoning layer now arriving, and the human decision layer nobody is designing. The third layer is where most deployments fail, and it is Matt’s primary focus.


