I stopped paying for AI

March 27, 2026 · 5 min read · by the founder

Well into the build of CardCue Pro, I hit a wall that will be familiar to anyone who's shipped an AI-backed app recently: the API bill.

The card-scanning flow is AI-heavy by necessity. A user points their phone at a gift card, and the app has to figure out: what store is this, what's the type, what's the balance, where's the PIN, where's the barcode, does it expire, is it a single-use certificate, is it a punch card or a loyalty card or a straight-up gift card, and, in the edge cases, what language is it in and is the font doing something weird with the zero on the balance.

That's a lot of reading comprehension. The obvious 2023-era solution was: send the image to GPT-4V (or Claude, or Gemini Pro), ask a very specific question, get back structured JSON, populate the form. It worked. It cost a few pennies per scan.

At roughly $0.03 per scan, a user base that scans 30 cards in onboarding costs about $0.90 per user just to get through the first session. For a no-account, no-backend, no-server app with no subscription model, that math does not hold. And worse, sending gift card images, some of which contain PINs and barcodes, to a third-party cloud API was undercutting the whole privacy argument the app was built on. If the data had to leave the phone at all, what was the point of everything else?

So I started looking for a way to do the scanning on-device. Which seemed, at the time, ambitious. On-device AI, until very recently, meant "simpler models that don't quite work as well, with a lot of quantization and a performance hit." You could do OCR on-device. You could even do simple classification. But structured reasoning ("extract this JSON from this messy real-world image") was still a cloud problem.

And then iOS 18.1 shipped.


iOS 18.1 introduced Apple Intelligence, which, past the branding, is a set of genuinely new on-device capabilities. One of them, FoundationModels, is an on-device LLM you can call from Swift. It's small, single-digit billions of parameters, but it's well-tuned for structured reasoning tasks, which is exactly the category the card-scanning problem falls into. You can hand it an image, a Vision-derived text layout, and a prompt like "Extract the following fields from this gift card..." and it returns structured JSON. On-device. In about 400 milliseconds. For free.
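To make "returns structured JSON" concrete, here is a minimal sketch of what consuming that output might look like. It is written in Python rather than Swift to stay framework-neutral, and it does not call FoundationModels at all. The field names, the set of allowed card types, and the sample payload are all hypothetical, chosen only to mirror the questions the scanner has to answer.

```python
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Hypothetical set of card types; the real app's categories may differ.
CARD_TYPES = {"gift", "loyalty", "punch", "certificate"}


@dataclass(frozen=True)
class CardFields:
    """Hypothetical shape of one scan result (field names are assumptions)."""
    store: str
    card_type: str
    balance: Optional[Decimal]
    pin: Optional[str]
    barcode: Optional[str]
    expires: Optional[str]  # kept as the raw string the model returned
    single_use: bool
    language: str


def parse_card_fields(raw: str) -> CardFields:
    """Parse the model's JSON text and reject obviously malformed results."""
    data = json.loads(raw)
    card_type = data["card_type"]
    if card_type not in CARD_TYPES:
        raise ValueError(f"unexpected card_type: {card_type!r}")
    balance = data.get("balance")
    return CardFields(
        store=data["store"],
        card_type=card_type,
        # Go through str() so a numeric balance never passes through float math.
        balance=Decimal(str(balance)) if balance is not None else None,
        pin=data.get("pin"),
        barcode=data.get("barcode"),
        expires=data.get("expires"),
        single_use=bool(data.get("single_use", False)),
        language=data.get("language", "en"),
    )


# Made-up payload standing in for what a model might return.
sample = (
    '{"store": "Example Coffee", "card_type": "gift", "balance": "25.00", '
    '"pin": "1234", "barcode": "0123456789", "expires": null, '
    '"single_use": false, "language": "en"}'
)
card = parse_card_fields(sample)
print(card.store, card.balance)
```

The point of the validation step is that a small model will occasionally return something off-schema, and it is cheaper to reject that at the boundary than to let it populate the form.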

I stopped paying for AI.

30 cards · first session
Cloud AI (GPT-4V, Claude): $0.03 per scan × 30 scans = $0.90 per user · every first session · forever
On-device (FoundationModels): $0.00 per scan × ∞ scans = $0 per user · at infinite scale · forever
Round-trip latency: cloud ≈ 1.5–4 s over LTE · on-device ≈ 400 ms, airplane mode included
The dollar bar and the latency bar both favor local. The privacy bar isn't a bar at all, it's a category.
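If you want to plug in your own numbers, the arithmetic behind the chart is short enough to sketch. The per-scan price and the 30-scan onboarding come from the figures above; the user count is a purely hypothetical input, not a number from CardCue Pro.

```python
from decimal import Decimal


def first_session_cost(price_per_scan: Decimal, scans: int) -> Decimal:
    """Cost of getting one user through onboarding."""
    return price_per_scan * scans


SCANS_IN_ONBOARDING = 30          # from the chart above
HYPOTHETICAL_USERS = 10_000       # assumption, for illustration only

cloud = first_session_cost(Decimal("0.03"), SCANS_IN_ONBOARDING)
on_device = first_session_cost(Decimal("0.00"), SCANS_IN_ONBOARDING)

print(f"cloud, per user:      ${cloud}")                        # $0.90
print(f"on-device, per user:  ${on_device}")                    # $0.00
print(f"cloud, all users:     ${cloud * HYPOTHETICAL_USERS}")   # $9000.00
```

Decimal is used instead of float so the cents come out exact rather than as 0.8999999.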

The dollar math alone is compelling: scanning a card now costs me zero, at infinite scale, forever. But the architectural math is more important. The data never leaves the device. The PIN, the barcode, the image itself: none of it goes to my server (I don't have one), none of it goes to OpenAI or Anthropic, none of it gets logged in a third-party observability platform, none of it sits in a CloudTrail bucket waiting for a future breach. The scan happens inside the Secure Enclave's neighborhood, and the phone throws the image away when it's done.

The On-Device Intelligence press kit gets into the specific framework chain: the Vision framework for OCR, SensitiveContentAnalysis for barcode classification, FoundationModels for the structured extraction, and WeatherKit for the contextual layer. For the purposes of this post, the punch line is: modern iPhones have enough intelligence baked into them that a small team can build, at zero marginal cost, experiences that required a $0.03/scan cloud bill eighteen months ago.


I think about this a lot, because it's a story about how the economics of building an app have shifted overnight and not everyone has noticed yet.

If you are a solo developer today, or a small team, and you are paying for cloud AI for any task that can be described as structured reasoning over one image or a few paragraphs of text, you should reevaluate. Apple has quietly made that category free. It costs you iOS 18.1 as a minimum deployment target, which is a meaningful user-base cut, but it's a cut that shrinks every month as phones update.

If you're a larger team, the math is different: cloud AI has scale advantages, better models, and easier debugging, and you can serve non-Apple platforms. I'm not making a universal claim.

But for a certain category of app (utility apps, privacy-first apps, no-subscription apps, no-account apps), the on-device path is now real in a way it wasn't. And for my specific product, it was the only path that preserved everything else. If Cue had to send card images to OpenAI, it would not be the product I was trying to build. It would be a different, weaker product that happened to have the same icon.


I'll leave you with an image.

When you scan a card in CardCue Pro, there's a moment, about 400 milliseconds, where the phone thinks. The screen briefly shows a subtle shimmer. In that 400ms, an on-device LLM is reading a gift card I just pointed a camera at and extracting eight structured fields from it. Apple's Foundation Model is doing the kind of work that, two years ago, would have required sending an image to a data center in Oregon, paid for with a credit card linked to my apartment.

It costs nothing now. It leaves no trace. It happens entirely inside the device in your hand.

The phone is remarkable, and the last two years of on-device AI development have made this kind of thing not just possible but cheap. Most apps haven't caught up yet. The ones that do, the ones that trust iOS enough to let Apple do the expensive work, are the ones that can finally stop passing the cost of their own convenience onto the user.
