Coding got easier. Winning with AI products didn't. Here's why.
"Coding is largely solved". But that didn't simplify AI product-building. It just moved the difficulty somewhere harder to see.
Boris Cherny, the person running Claude Code at Anthropic, said something recently that stopped me in my tracks:
“I think at this point it’s safe to say that coding is largely solved. At least for the kind of programming that I do, it’s just a solved problem because Claude can do it. And so now we’re starting to think about what’s next, what’s beyond this.”
He wasn’t hedging. He was describing his actual reality: Claude writes the code, reviews it, files bug reports, and decides what to ship next. Everyone on his team codes, too. Even the finance guy.
By the end of this year, he thinks the title “software engineer” might disappear entirely and get replaced by “builder.”
I know we’ve been digging graveyards for many a role lately, but if his claim is even directionally right (and the pace of things suggests it is), it surfaces a question that’s been buried under the AI hype for two years.
When building is no longer the constraint, what actually is?
Signull, one of the sharper product voices on X, shared this tweet on the topic:
It’s funny how he thinks the need isn’t for a product manager. Maybe. But the role being described is pretty much in line with the JD that so many CPOs and Product Leads have been applying to in the past decade - a role that fuses product vision with design sense and deep domain knowledge.
We can keep debating what to call it. The point stands. The product builder/thinker role that everyone wrote off last year might be the most important one going forward. Haven’t we come full circle?
But I digress. Note how both Boris and Signull seem to agree that the next frontier is “deciding what to build”.
Um. I don’t think so.
Here’s where this assessment falls short.
“Knowing what to build” is already being co-piloted.
Top PMs are leveraging AI to absorb the firehose of context and make sense of sales call recordings, Granola notes, Slack threads, support tickets, JIRA backlogs, feedback forms.
AI is now drafting experiments to test, flagging patterns worth investigating, and generating hypotheses faster than any planning cycle ever could. The human might still make the call but will soon be reduced to a mere reviewer. That’s pretty much what all the AI PM courses are selling these days.
Yes. “What to build” will require human nuance but eventually will become another AI-assisted workflow.
The obstacles sit downstream, mostly with what happens AFTER you ship.
Think about this: A 2025 study from MIT’s NANDA initiative, based on 150 interviews with senior leaders and analysis of 300 public AI deployments, found that 95% of enterprise AI pilots fail to deliver any measurable return.
The reason is the gap between a “working AI feature” and an “AI product that people trust, pay for, and keep using”. That gap is full of obstacles that no amount of product judgment, human or AI-assisted, clears automatically.
I foresee headwinds for AI products in the coming year or so. I’ve identified 7 of these for now (certainly not exhaustive). They exist in traditional SaaS too, but I feel AI teams hit them faster, harder, and with much less warning.
In this edition, we’ll cover these structural headwinds that catch AI product teams off guard and propose fixes for each, from real companies figuring this out.
Let’s get into it.
Are AI products already feeling these “headwinds” I speak of?
Let’s talk about Jasper, an AI writing tool that predated ChatGPT.
In 2022, it crossed $75M ARR in under two years, the fastest-growing AI writing product anyone had seen. Then ChatGPT launched, and the underlying capability Jasper was built on became freely available to anyone with a browser.
That story felt like a one-time shock at the time. It wasn’t. Tome followed a different path to the same destination.
Tome was an AI presentation tool that attracted a whopping 20 million users in 18 months and raised $81 million on genuine excitement. But most of those users never paid, and when Microsoft and Google embedded AI directly into PowerPoint and Slides, the core use case disappeared.
By 2024, Tome had abandoned its original product entirely.
What about Pi, the personal AI assistant from Inflection which was arguably the most thoughtfully designed of the three?
It was warm, conversational, and genuinely liked by people who used it. But as ChatGPT, Claude, and Gemini matured, buyers couldn’t articulate why they’d choose Pi over tools they were already paying for. The product didn’t deteriorate. The market around it simply rendered it redundant.
This is the structural reality of building on AI right now.
Each time a foundation model broadens its capabilities, the products built narrowly on top of it lose their footing, and the next generation of products is already making the same bets without knowing it.
Let’s break it down:
Headwind 1: Product teams are shipping more, but customers don’t have “more” attention.
The average enterprise now runs 100+ SaaS apps. Employees spend a meaningful chunk of their week just reorienting between them. And the average organization is managing hundreds of SaaS renewals every year.
Your customers are not waiting for your update. They’re already overwhelmed.
This means every new release you ship is competing for attention against a buyer who’s exhausted from managing the tools they already have. “We shipped something new” is not a compelling reason for them to stop what they’re doing.
The fix isn’t to ship less (although I feel pacing down still has its merits). The real solution is to be more disciplined about what you actually launch.
Some thoughts around this:
→ Build a tiered launch framework.
Tier 1 (new product line, category bet) gets campaigns, PR, and exec involvement.
Tier 2 (meaningful integration or workflow change) gets a targeted push and a customer webinar.
Tier 3 (incremental update) gets a changelog entry and an in-app message.
Tier 4 ships silently. Potloc, a B2B research platform, implemented exactly this. Their Tier 4 updates don’t surface in-app at all, and the result was less noise and more focused energy on what genuinely mattered.
→ Set a quarterly launch committee to decide what crosses each threshold. Don’t let teams self-classify their own releases upward.
→ Announcement effectiveness ≠ adoption. Track these separately. A high-effort campaign that generates no engagement is a signal about your communication, not your product.
The principle is simple: announce less often than you ship.
Save your marketing energy for the things that genuinely move the needle. A release and a launch are not the same thing.
Headwind 2: Margin erosion
Traditional SaaS runs at 75–85% gross margins. Fast-scaling AI SaaS startups are often landing in the 25–40% range, per Bessemer Venture Partners.
And 84% of companies report at least 6% gross margin erosion from AI infrastructure costs as per Mavvrik’s 2025 State of AI Cost Management study.
Every LLM call in your critical path is a variable cost that scales with usage. When you have 100 customers using an AI feature, the cost feels manageable.
But when you have 10,000 customers, each generating 10 inferences a day, you’re potentially burning through tens of thousands of dollars a week in tokens alone. Most CFOs don’t see it coming because it wasn’t in the original budget model.
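To make that arithmetic concrete, here is a back-of-envelope sketch in Python. The token counts per call and the per-million-token prices are illustrative assumptions of mine, not any vendor's actual rates, so swap in your own numbers before drawing conclusions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UsageAssumptions:
    """All values are illustrative assumptions, not real vendor pricing."""
    customers: int
    inferences_per_customer_per_day: int
    input_tokens_per_call: int
    output_tokens_per_call: int
    usd_per_million_input_tokens: float
    usd_per_million_output_tokens: float


def cost_per_call(u: UsageAssumptions) -> float:
    """Token cost of a single inference, in USD."""
    input_cost = u.input_tokens_per_call * u.usd_per_million_input_tokens / 1_000_000
    output_cost = u.output_tokens_per_call * u.usd_per_million_output_tokens / 1_000_000
    return input_cost + output_cost


def cost_per_1000_queries(u: UsageAssumptions) -> float:
    """Token cost of 1,000 inferences, in USD."""
    return 1000 * cost_per_call(u)


def weekly_cost(u: UsageAssumptions) -> float:
    """Total token spend over seven days, in USD."""
    calls_per_week = u.customers * u.inferences_per_customer_per_day * 7
    return calls_per_week * cost_per_call(u)


# Hypothetical starting point: 100 customers, 10 inferences a day each.
early = UsageAssumptions(
    customers=100,
    inferences_per_customer_per_day=10,
    input_tokens_per_call=3_000,
    output_tokens_per_call=800,
    usd_per_million_input_tokens=3.0,
    usd_per_million_output_tokens=15.0,
)
# Same feature, same usage pattern, 10,000 customers.
scaled = replace(early, customers=10_000)

print(f"Cost per 1,000 queries: ${cost_per_1000_queries(early):,.2f}")  # $21.00
print(f"Weekly at 100 customers: ${weekly_cost(early):,.0f}")           # $147
print(f"Weekly at 10,000 customers: ${weekly_cost(scaled):,.0f}")       # $14,700
```

Under these assumptions the per-call cost stays the same, about two cents. Only the volume changes, which is why the bill is easy to ignore at 100 customers and hard to ignore at 10,000.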
Replit is the clearest illustration of how fast this can spiral.
When they rolled out AI coding agents, gross margins swung wildly with usage spikes, ranging from the mid-30s to briefly negative territory in 2024 during a surge in AI usage, before pricing changes were implemented. They eventually restructured toward hybrid pricing that tied revenue to compute usage rather than fixed seats. The business recovered but only after a painful lesson in unit economics.
We hit a version of this at vFairs as well. When we first built an AI content generator for event websites, we launched it as a separate module. Uptake was super low and we had to remind customers about its utility.
Eventually, we baked the AI controls directly into the event site builder. Uptake jumped to > 50% of exposed sessions. That felt like a win.
And then the AI bill went up 5x in a short window.
The adoption flywheel spun, but the cost structure hadn’t been updated to reflect it.
We had to get more intentional fast about which models were being used, for which tasks, and whether each use case actually justified the cost. Switching to lighter models for certain tasks dropped token burn significantly without meaningfully affecting the output quality users experienced.
Some thoughts:
→ Match model to task. Not every use case needs the most capable or expensive model. Use lightweight models for routing, classification, and simple formatting. Reserve frontier models for outputs users will directly evaluate and act on.
→ Bake unit economics into your launch plan before you ship, not after. Run a cost-per-1,000-queries estimate at 10x your current usage before any AI feature goes GA.
→ Cache repeated queries and batch similar requests wherever possible. Two identical queries shouldn’t each incur a full inference cost.
→ Monitor cost-per-user and cost-per-request in real time, and make someone accountable for watching them. I’d even create an n8n trigger to flag when costs go beyond a threshold.
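Two of these ideas, matching model to task and caching repeated queries, can be sketched together in a few lines of Python. This is a minimal sketch, not how we built it at vFairs: the model names, the task categories, and `fake_call_model` are hypothetical placeholders standing in for whatever provider SDK you use.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical model tiers. The names are placeholders, not real model IDs.
LIGHT_MODEL = "light-model"
FRONTIER_MODEL = "frontier-model"

# Assumed mapping from task type to model tier.
# Anything not listed falls back to the frontier model.
TASK_TO_MODEL: Dict[str, str] = {
    "routing": LIGHT_MODEL,
    "classification": LIGHT_MODEL,
    "formatting": LIGHT_MODEL,
}


def pick_model(task_type: str) -> str:
    """Choose the cheapest model tier assumed to be adequate for the task."""
    return TASK_TO_MODEL.get(task_type, FRONTIER_MODEL)


class CachedClient:
    """Wraps a model-calling function with an exact-match response cache."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model
        self._cache: Dict[str, str] = {}
        self.paid_calls = 0  # number of calls that actually reached the model

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        return hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()

    def complete(self, task_type: str, prompt: str) -> str:
        model = pick_model(task_type)
        key = self._key(model, prompt)
        if key not in self._cache:
            # Cache miss: pay for one inference and remember the answer.
            self._cache[key] = self._call_model(model, prompt)
            self.paid_calls += 1
        return self._cache[key]


def fake_call_model(model: str, prompt: str) -> str:
    """Stand-in for a real provider call, so the sketch runs offline."""
    return f"[{model}] response to: {prompt}"


client = CachedClient(fake_call_model)
client.complete("classification", "Is this ticket about billing?")
client.complete("classification", "Is this ticket about billing?")  # served from cache
client.complete("site_copy", "Write a hero headline for a trade show.")
print(client.paid_calls)  # 2
```

The cache here only catches byte-identical prompts sent to the same model. Whether to normalize prompts or expire entries is a separate decision that depends on how stale an answer your users can tolerate.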
As Jacob Jackson, ML leader at Cursor, put it:
“The right way to price is relative to the value being delivered.”
Headwind 3: Uncertainty in Monetization
Most SaaS companies have added AI features. Far fewer have successfully monetized them. OpenView Partners’ benchmarks suggest only around 1 in 7 has made it work. [1]
That gap isn’t because the features were bad. It’s because most teams treated “we added AI” as the pricing story. And that stopped working the moment every competitor could say the same thing.
The teams getting this right are moving toward outcome-based and hybrid models.
Kyle Poyar has an amazing graphic that shows the various models that exist in the market today (credit: Kyle Poyar’s Growth Unhinged):
Some ideas to tackle this:
→ Define your value metric before you name your price.
No, “we added AI” is not a value metric. Resolved queries, hours saved, and revenue influenced are better ones. For example, Zendesk charges $1.50 per successfully resolved interaction, and Intercom’s Fin charges $0.99 per resolved query. Both reduce adoption risk: the customer only pays when the AI actually works.
→ Run willingness-to-pay conversations during product development, not at launch. Five customer calls asking “what would you pay for this outcome?” will save you three pricing architecture overhauls later.
→ Consider a hybrid model (this is the one we were considering): a base subscription covering access, plus a usage or outcome-based component that scales with value delivered. This reduces adoption friction while protecting unit economics.
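Here is a rough Python sketch of how a hybrid model behaves when the AI's success rate changes. The base fee, cost per attempt, and volumes are made-up assumptions. The $0.99 per resolution is borrowed from the Fin figure above purely for illustration.

```python
def hybrid_revenue(base_fee: float, price_per_outcome: float, outcomes: int) -> float:
    """Monthly revenue: flat access fee plus a charge per successful outcome."""
    return base_fee + price_per_outcome * outcomes


def gross_margin(revenue: float, attempts: int, cost_per_attempt: float) -> float:
    """Gross margin when every attempt costs compute, successful or not."""
    if revenue <= 0:
        raise ValueError("revenue must be positive to compute a margin")
    cost = attempts * cost_per_attempt
    return (revenue - cost) / revenue


# Hypothetical account: all four numbers below are assumptions.
BASE_FEE = 500.00            # monthly subscription covering access
PRICE_PER_RESOLUTION = 0.99  # outcome-based component
COST_PER_ATTEMPT = 0.15      # inference cost per AI attempt
ATTEMPTS = 2_000             # AI conversations in the month

for resolved in (1_200, 600):
    revenue = hybrid_revenue(BASE_FEE, PRICE_PER_RESOLUTION, resolved)
    margin = gross_margin(revenue, ATTEMPTS, COST_PER_ATTEMPT)
    print(f"{resolved} resolved: revenue ${revenue:,.2f}, gross margin {margin:.1%}")
# 1200 resolved: revenue $1,688.00, gross margin 82.2%
# 600 resolved: revenue $1,094.00, gross margin 72.6%
```

With these numbers, the base fee cushions the vendor when resolution rates dip, because compute is spent on every attempt while the customer pays only for the ones that worked.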
Also, I’d run a pricing post-mortem after your first renewal cycle. The real check of customer enthusiasm is on repeated payments. If renewal rates are below plan, pricing is almost always one of the first places to look.
There’s no shame in iterating by the way.
Ex: Salesforce launched Agentforce at $2 per conversation in September 2024. By May 2025 they’d moved to $0.10 per action. By summer 2025 they’d added per-user licenses for unlimited usage. Three fundamentally different pricing architectures in eight months.
The lesson isn’t that Salesforce failed. It’s that even the most sophisticated GTM machine in enterprise software struggled to price AI correctly under real market conditions. Prepare for it.
Headwind 4: Messaging is a sea of sameness
Have you noticed how the home pages of many AI products are starting to sound the same?
This was an issue before too but I feel some companies are taking this pursuit of “AI-forwardness” too far.
A VP of Marketing at a 500-person SaaS company captured it nicely in Wynter’s 2025 B2B SaaS Branding Survey:
“There is a massive push towards becoming agentic companies. This is compounded by the fact that everyone is now using the same AI tools to generate copy. Everyone sounds like they do the same thing.”
I mean take a look at these headlines for copywriting tools:
In a market where the feature layer is converging, differentiation has to come from somewhere else.
Three moves from messaging experts:
Name what you replace, not what you do.
Anthony Pierri, who has reviewed 400+ SaaS homepages, argues that most AI products describe themselves when they should be describing the thing they displace. Ex: Loom’s most effective positioning was “Skip the meeting, send a Loom.” The competitive alternative is named in the headline. Buyers immediately understand what changes, and why.
Find the claim your competitor structurally cannot make.
I like how April Dunford frames this: “AI only differentiates when it amplifies an advantage competitors can’t replicate - proprietary data, workflow depth, years of domain-specific training.” Ex: Harvey AI doesn’t say “AI for lawyers.” They position around legal reasoning built specifically for complex queries that general-purpose models approximate badly. A competitor adding an LLM to their legal tool cannot make that claim.
Drop altitude on every AI claim.
Emma Stratton expanded on this: “AI-powered” sits at the highest possible altitude. It applies to everything and describes nothing. The fix is specificity: a named workflow, a named persona, a measurable outcome. The current obsession with “AI-powered, speed, control, and measurable impact” fails this test.
Headwind 5: The trust gap widens fast
AI has a reliability problem.
On complex tasks, leading models still hallucinate in the 10–20% range. In legal research, hallucination rates for general-purpose LLMs run between 58% and 82%, a figure from Stanford’s own benchmarking work. [1]
I like Kushal Chakrabarti’s framing on this one:
…a model can be 97% accurate but only 70% reliable and the gap can bankrupt you.
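One way to see how accuracy and reliability can drift that far apart (my own illustrative reading, not necessarily the calculation behind the quote): if a workflow chains several model steps and every step has to be right, per-step accuracy compounds. The sketch below assumes the steps fail independently, and the step counts are arbitrary.

```python
def end_to_end_reliability(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step_accuracy ** steps


for steps in (1, 5, 12):
    print(steps, round(end_to_end_reliability(0.97, steps), 2))
# 1 0.97
# 5 0.86
# 12 0.69
```

Under that assumption, a 97%-accurate step repeated a dozen times gets the whole task right only about 69% of the time.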
The Air Canada case is worth knowing if you don’t already.
Their chatbot mistakenly promised a bereavement discount that didn’t exist. The customer applied for it, was denied, and sued. Air Canada argued the chatbot was a “separate legal entity” responsible for its own information.
The court didn’t accept that. Air Canada lost, and the ruling made clear that companies are responsible for what their AI tells customers.
The risk is so real that Lloyd’s of London launched dedicated AI chatbot error insurance in May 2025, covering legal fees, court damages, and costs from hallucinations causing customer harm. Imagine that. When the insurance industry starts pricing a risk, it’s no longer theoretical.
The fix is to treat governance as a GTM asset, not a legal formality:
→ Build an eval suite before you ship and run it on every major model or prompt change. Evals aren’t a QA step. They’re how you make defensible claims about reliability. [Hamel Husain’s guide to LLM evals is one of the most practical starting points: hamel.dev/blog/posts/evals]
→ Implement human-in-the-loop controls for high-stakes outputs. Anything customer-facing, contractually significant, or irreversible should have a review step until your reliability benchmarks consistently clear the bar.
→ Build audit trails and transparent data policies from the start. If a procurement team can see exactly how your system handles a bad output, they’re less likely to block the deal.
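For a sense of what the smallest possible eval suite looks like, here is a sketch in Python. Everything in it is hypothetical: the test cases, the canned `fake_model`, the 95% threshold, and the simple phrase-matching check, which a real suite would likely replace with task-specific graders.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    must_contain: List[str]      # phrases a correct answer should include
    must_not_contain: List[str]  # phrases that signal a bad answer


def passes(case: EvalCase, output: str) -> bool:
    """A case passes if all required phrases appear and no forbidden ones do."""
    text = output.lower()
    has_required = all(p.lower() in text for p in case.must_contain)
    has_forbidden = any(p.lower() in text for p in case.must_not_contain)
    return has_required and not has_forbidden


def pass_rate(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of cases the model passes."""
    results = [passes(case, model(case.prompt)) for case in cases]
    return sum(results) / len(results)


def release_gate(rate: float, threshold: float = 0.95) -> bool:
    """Block the release unless the pass rate clears the (assumed) bar."""
    return rate >= threshold


def fake_model(prompt: str) -> str:
    """Stand-in for the real system under test. The policies are invented."""
    canned = {
        "Can I cancel after my event starts?":
            "No. Cancellations must be requested before the event starts.",
        "What is your refund window?":
            "You can get a refund any time within 90 days.",
    }
    return canned.get(prompt, "I'm not sure. Let me connect you to an agent.")


CASES = [
    EvalCase("Can I cancel after my event starts?", ["before the event"], ["any time"]),
    EvalCase("What is your refund window?", ["24 hours"], ["90 days"]),
    EvalCase("Do you price-match competitors?", ["agent"], ["we price-match"]),
]

rate = pass_rate(fake_model, CASES)
print(f"pass rate: {rate:.0%}, ship: {release_gate(rate)}")  # pass rate: 67%, ship: False
```

The point is less the checker and more the habit: the same cases run on every model or prompt change, and the release waits when the number drops.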
Headwind 6: The early traction trap
This is a trap a lot of AI product teams fall into.
Impressive initial engagement gets celebrated as product-market fit, when actually it’s just novelty. Users tried it once, found it interesting, and moved on.
62% of companies are stuck in what McKinsey calls “pilot purgatory.” [3] They’ve trialed AI. Usage just hasn’t deepened into real habit or measurable workflow change.
The vanity metric version of this looks like: “10,000 AI queries this month.” Sounds great. But did those queries change how anyone works? Did they come back? Did they tell someone else?
Lovable got this right. They went from an open-source project with 50,000 GitHub stars to $10M ARR in a few months. But that’s not to say it was a complete overnight success. They too went through a few launch cycles before they got it right.
The core retention lever was simple: they focused on getting users to a tangible outcome (a working app) within minutes. That confidence turned curious users into advocates who shared publicly on LinkedIn and X. That’s what you call meaningful activation, as opposed to “toe-dips”.
Ideas borrowed from experts like Elena Verna:
→ Define activation depth before launch. Not “tried it” but “completed a meaningful outcome at least three times.” Set this as a product metric from day one.
→ Track time-to-value as a core metric. How long does it take a new user to reach their first meaningful outcome? Every hour you shave off that number compounds in retention.
→ Run activation interviews with users who churned after one or two sessions. The gap between first session and churn almost always traces to a specific moment of friction or unmet expectation — and it’s usually fixable.
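The first two metrics are easy to compute once you log the right events. Below is a minimal Python sketch, assuming a flat event log with hypothetical event names (`signed_up`, `outcome_completed`) and the "at least three times" threshold from above.

```python
from datetime import datetime
from statistics import median
from typing import Dict, List, Optional, Tuple

# (user_id, event_name, timestamp). Event names are assumptions for illustration.
Event = Tuple[str, str, datetime]

SIGNUP_EVENT = "signed_up"
MEANINGFUL_EVENT = "outcome_completed"
ACTIVATION_THRESHOLD = 3  # "completed a meaningful outcome at least three times"


def activation_rate(events: List[Event]) -> float:
    """Share of signed-up users who reached the activation threshold."""
    signups = {user for user, name, _ in events if name == SIGNUP_EVENT}
    if not signups:
        return 0.0
    counts: Dict[str, int] = {}
    for user, name, _ in events:
        if name == MEANINGFUL_EVENT:
            counts[user] = counts.get(user, 0) + 1
    activated = [u for u in signups if counts.get(u, 0) >= ACTIVATION_THRESHOLD]
    return len(activated) / len(signups)


def median_time_to_value_minutes(events: List[Event]) -> Optional[float]:
    """Median minutes from signup to first meaningful outcome, if any exist."""
    signup_at: Dict[str, datetime] = {}
    first_outcome_at: Dict[str, datetime] = {}
    for user, name, ts in events:
        if name == SIGNUP_EVENT:
            signup_at.setdefault(user, ts)
        elif name == MEANINGFUL_EVENT:
            if user not in first_outcome_at or ts < first_outcome_at[user]:
                first_outcome_at[user] = ts
    deltas = [
        (first_outcome_at[u] - signup_at[u]).total_seconds() / 60
        for u in first_outcome_at
        if u in signup_at
    ]
    return median(deltas) if deltas else None


# Made-up log: user "a" activates, "b" tries once, "c" never gets an outcome.
EVENTS: List[Event] = [
    ("a", "signed_up", datetime(2025, 1, 6, 9, 0)),
    ("a", "outcome_completed", datetime(2025, 1, 6, 9, 10)),
    ("a", "outcome_completed", datetime(2025, 1, 6, 9, 40)),
    ("a", "outcome_completed", datetime(2025, 1, 7, 14, 0)),
    ("b", "signed_up", datetime(2025, 1, 6, 10, 0)),
    ("b", "outcome_completed", datetime(2025, 1, 6, 10, 30)),
    ("c", "signed_up", datetime(2025, 1, 6, 11, 0)),
]

print(f"activation rate: {activation_rate(EVENTS):.0%}")               # 33%
print(f"median time-to-value: {median_time_to_value_minutes(EVENTS)}")  # 20.0
```

In this toy log, three users "tried it" but only one is activated by the stricter definition, which is exactly the gap a raw query count hides.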
Vanity metrics feel good, but depth metrics tell you if you have something real.
Headwind 7: Sales can’t keep up with product’s pace
Product is shipping weekly. Sales, CS, legal, and support are not keeping pace.
When your GTM teams don’t fully understand what the AI does, or what it can’t do, they oversell, undersell, or get caught flat-footed on objections.
Sales and marketing misalignment is a well-documented problem in B2B, and AI shipping cycles make it worse. Reps are already overwhelmed by the volume of content and context they’re expected to absorb. I’ve been guilty of this myself.
The result? Customers come back a quarter later saying they didn’t know feature X existed.
And this connects back to trust. The Air Canada situation was partly a cross-functional failure. When legal, CS, and product aren’t aligned on what the AI can commit to, gaps appear in customer-facing situations. That’s how a bereavement discount policy error becomes a court case.
We’re working through this ourselves at vFairs. We’ve been experimenting with Slackbots for our sales team, and the uptake has been genuinely promising.
We recently rolled out a pricing bot in beta, built around a specific, high-friction problem: sales reps spending too long crafting custom quotes, with inconsistent outputs. The goal is to test whether just-in-time AI enablement can solve that one friction point well, before we expand the scope. Early signals are encouraging.
Don’t give sales a general AI tool and hope they figure it out.
Some thoughts on how not to lose sales/CS as product teams ship faster:
→ Have internal launches before external ones. For significant releases, slow down and align with product marketing to get sales and CSMs on board. Conduct an internal webinar and have a readiness session. Bi-weekly PM office hours also help.
→ Write a one-page release brief for every meaningful ship. Sales doesn’t need to know about every single configuration. Instead of a 30-page PDF, brief them on what changed, who it’s for, how to demo it in 90 seconds, what objections to expect, and where to get more detail.
→ We’ve established an internal-facing interactive demo and a test instance of our event platform which sales finds convenient to browse and learn from. We’re also exploring creating a library of micro-pitches for every product, solution, or significant use case. I feel sales teams learn quicker from other sales reps.
→ Move from periodic training to just-in-time enablement. We used Relay.app to start automating pre-demo research for our sales teams, summarizing everything from company funding to LinkedIn presence. Also, the Slackbots we rolled out help sales fetch data they need, rather than memorizing everything.
The common thread
Look across all 7 headwinds and one pattern becomes clear.
AI accelerates your ability to create supply: features, outputs, updates, content.
But the demand-side constraints like customer attention, trust, and budget scrutiny haven’t accelerated at the same pace.
Most teams optimise for the supply side i.e. shipping more features. But the leverage has shifted. The teams that will win will be the ones who sort out their launches, win consistent attention, and monetize for profits.
Recap
Attention is finite. Tier your launches. Save your energy for what genuinely moves the needle.
Margin erosion hits you at scale. Match model to task, monitor unit economics in real time, and bake unit economics into your launch plan before you ship.
Price on value delivered. Outcome-based and hybrid models are outperforming flat AI tiers.
Sameness is a real risk. Get your positioning and messaging right so audiences can see how you stand out.
Trust gaps cost real money. Invest in learning and creating an evals culture.
Early traction can lie. Define activation depth. Track meaningful engagement, not query counts.
Your GTM team is often the last to know about your shiny new feature. Just-in-time enablement beats periodic training.
Till next time,
Aatir
---
Sources
[Margin stat] Mavvrik, 2025 State of AI Cost Management. Survey of 372 enterprise organizations. mavvrik.ai
[1] (Headwind 3) OpenView Partners, 2023 SaaS Benchmarks Report. openviewpartners.com
[1] (Headwind 5) Stanford HAI: Dahl et al. (2024), “Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools.” Published in Journal of Empirical Legal Studies. hai.stanford.edu
[3] McKinsey & Company, State of AI Survey (2025). mckinsey.com








