I tried to stop paying for AI.

Well into the build of CardCue Pro, I hit a wall that will be familiar to anyone who's shipped an AI-backed app recently: the API bill.

The card-scanning flow is AI-heavy by necessity. A user points their phone at a gift card, and the app has to figure out: what store is this, what's the type, what's the balance, where's the PIN, where's the barcode, does it expire, is it a single-use certificate, is it a punch card or a loyalty card or a straight-up gift card, and, in the edge cases, what language is it in and is the font doing something weird with the zero on the balance.

That's a lot of reading comprehension. The obvious 2023-era solution was: send the image to GPT-4V (or Claude, or Gemini Pro), ask a very specific question, get back structured JSON, populate the form. It worked. It cost a few pennies per scan.

A few pennies per scan, multiplied across users who each scan 30 cards during onboarding, adds up to real money per user just to get them through the first session. For a solo developer, that line on the invoice has a way of catching the eye. Every scan, a tiny toll. Every new user, a small stack of tolls. I started daydreaming about a version of the app where the intelligence lived in the phone and the toll booth closed.

On-device AI, until very recently, meant "simpler models that don't quite work as well, with a lot of quantization and a performance hit." You could do OCR on-device. You could even do simple classification. But structured reasoning ("extract this JSON from this messy real-world image") was still a cloud problem.

And then Apple shipped FoundationModels, a framework that lets you call an on-device LLM from Swift. The model is small, single-digit billions of parameters, but tuned for structured tasks like extraction, which is exactly the category the card-scanning problem falls into. You hand it a Vision-derived text layout and a prompt like "Extract the following fields from this gift card..." and it returns structured JSON. On-device. In a few hundred milliseconds. For free.
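The on-device path ends in a hand-off: whatever text the model returns has to become typed fields before it can fill a form. Here is a minimal sketch of that last step, assuming the prompt asks for a JSON object with the keys `merchant`, `balance`, `pin`, `barcode`, and `expires`. Those keys, the `CardFields` type, and the `decodeCardFields` helper are hypothetical illustrations, not CardCue Pro's actual code. FoundationModels also offers guided generation straight into Swift types; this sketch leaves that aside to stay close to the JSON framing above.

```swift
import Foundation

// Hypothetical shape of the fields the extraction prompt asks for.
// Optional properties let a card with no PIN or no expiry still decode.
struct CardFields: Codable, Equatable {
    var merchant: String
    var balance: Decimal?   // Decimal, not Double, to avoid rounding money
    var pin: String?
    var barcode: String?
    var expires: String?    // kept as text; date formats on cards vary widely
}

// Turns the model's raw reply into typed fields. Malformed or incomplete
// JSON throws, so the caller can fall back to manual entry instead of
// filling the form with guesses.
func decodeCardFields(from reply: String) throws -> CardFields {
    try JSONDecoder().decode(CardFields.self, from: Data(reply.utf8))
}
```

Missing optional keys decode as `nil`, while a reply with no `merchant` at all fails, which is one way to encode "this field is not negotiable" directly in the type.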

So I did what any founder staring at an API invoice would do. I tried to stop paying for AI.

I built the whole path. Vision reads the text layout, the on-device model turns it into the structured fields, no network call, no toll booth, zero dollars, forever. On my clean test images, it sang. I let myself draft the victory post. You are, in a sense, reading its ghost.

Then I pointed it at my own card drawer.

Thirty real cards. Curved plastic, embossed numbers, a glitter finish or two, balances printed in fonts where the zeros look like tiny planets. The on-device path went zero for thirty. Not "a little worse than the cloud." Zero. Not one card came through with the merchant, the balance, and the barcode all correct. The model that had been so composed on my screenshots fell apart the moment it met the exact junk drawer this app exists to rescue.
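"Zero for thirty" implies a strict scoring rule: a card only counts if the merchant, the balance, and the barcode are all right at once. A minimal sketch of that rule, assuming each drawer card has hand-entered ground truth; the `Card` type, the `drawerScore` function, and the choice to store balances as integer cents are illustrative assumptions, not the app's real test harness.

```swift
// One card's fields, either extracted or hand-labeled. Hypothetical shape.
struct Card {
    var merchant: String
    var balanceCents: Int   // integer cents sidestep floating-point money
    var barcode: String
}

// A card passes only if merchant, balance, and barcode all match the
// hand-labeled truth; partial credit is deliberately not counted.
// A nil prediction means extraction failed outright for that card.
func drawerScore(predicted: [Card?], truth: [Card]) -> (passed: Int, total: Int) {
    precondition(predicted.count == truth.count, "one prediction slot per card")
    var passed = 0
    for (guess, expected) in zip(predicted, truth) {
        guard let guess = guess else { continue }
        // Merchant names are compared case-insensitively; balance and
        // barcode must match exactly, since those are what a register reads.
        let merchantOK = guess.merchant.lowercased() == expected.merchant.lowercased()
        if merchantOK
            && guess.balanceCents == expected.balanceCents
            && guess.barcode == expected.barcode {
            passed += 1
        }
    }
    return (passed, truth.count)
}
```

All-or-nothing scoring is harsher than per-field accuracy, and deliberately so: a card with the right merchant and the wrong balance is still a card the user has to fix by hand.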

I sat with that for a weekend. Then I rolled it back.

The cost column favors the chip. The accuracy column pays the bills. One drawer, one afternoon, one very quiet founder.

Here's the thing I had to admit over that weekend: accuracy is the product promise. The entire pitch of CardCue Pro is that you point your phone at a card and the card is simply in the app, correct, ready to scan at a register. A user whose balance comes in wrong doesn't think "ah, the on-device model needs one more generation of training." They think the app is broken, and honestly, they're right. Free but wrong isn't free. You pay in retyped balances, support emails, and deleted apps.

So here's the architecture that actually ships. Apple's Vision framework reads the card on-device first, and it does a lot: the text layout, the barcode, the sensitive-content check. Then, by default, the photo goes securely to a cloud model, Anthropic's Claude, which does the part the drawer kept proving is hard: the weird fonts, the curved plastic, the balance hiding in the fine print. Claude sends back the structured fields, the app fills in the form, and I get an API bill I have made my peace with. The On-Device Intelligence press kit walks through what runs locally and where the cloud takes over, if you want the frame-by-frame.

And if your line is "nothing goes to the cloud, period," I built the switch for you. Settings has an on-device-only scanning mode: in that mode the photo never leaves your phone, and you'll retype a balance more often. That's the trade, stated plainly, and it's yours to make. If you want the fine print on the default mode instead, the privacy policy spells it out, including that Anthropic can retain a scan image for up to 30 days before deletion and doesn't use it to train models.
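The two modes described above reduce to one routing decision after the on-device Vision pass. A minimal sketch, assuming a Settings value with two cases and two interchangeable extractors; `ScanMode`, `CardExtractor`, `CloudExtractor`, `OnDeviceExtractor`, and `extractor(for:)` are hypothetical names, and the actual model and network calls are stubbed out so the decision logic stands on its own.

```swift
// The user-facing privacy switch from Settings (hypothetical names).
// Vision runs on-device in both modes; only the second stage differs.
enum ScanMode {
    case cloudDefault   // Vision on-device, then the photo goes to the cloud model
    case onDeviceOnly   // the photo never leaves the phone
}

// Anything that can turn a card photo plus Vision's text read into fields.
protocol CardExtractor {
    var sendsPhotoOffDevice: Bool { get }
    func extract(photo: [UInt8], visionText: [String]) -> [String: String]
}

struct CloudExtractor: CardExtractor {
    let sendsPhotoOffDevice = true
    func extract(photo: [UInt8], visionText: [String]) -> [String: String] {
        // Stub: a real app would upload the photo to the cloud model here.
        ["source": "cloud"]
    }
}

struct OnDeviceExtractor: CardExtractor {
    let sendsPhotoOffDevice = false
    func extract(photo: [UInt8], visionText: [String]) -> [String: String] {
        // Stub: a real app would prompt the on-device model with visionText here.
        ["source": "on-device"]
    }
}

// Picks the extractor for the current mode. The on-device-only branch never
// constructs the cloud extractor, so that mode has no code path that uploads.
func extractor(for mode: ScanMode) -> any CardExtractor {
    switch mode {
    case .cloudDefault: return CloudExtractor()
    case .onDeviceOnly: return OnDeviceExtractor()
    }
}
```

Making the on-device-only branch unable to reach the cloud extractor at all, rather than checking a flag right before an upload, keeps the privacy promise a property of the code's structure.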

I haven't deleted the on-device path. It lives behind a flag, and every major model update I point it at the drawer again, the way a parent keeps offering the broccoli. On-device models are improving at a pace that makes me think the dashed little chip in the drawing up top earns its solid outline eventually. The day it reads the whole drawer, I flip the default, stop paying for AI for real, and publish the smuggest follow-up post you have ever read.

Until then, I'm suspicious of my own daydream, and I think other founders should be suspicious of theirs. "We moved it all on-device" is a lovely sentence. It photographs well. But if the trade underneath it is "the user quietly gets worse results so I can stop paying an invoice," that isn't a privacy feature. It's cost-cutting wearing privacy's coat.

I'll leave you with an image.

When you scan a card in CardCue Pro, there's a moment where the phone thinks. A shimmer, a beat, and then the card appears with its store and its balance and its barcode, correct. What's happening in that beat is not a philosophy. It's the best reader I could get my hands on doing the one job the whole app depends on, and me paying a few pennies for it without resentment.

Someday that reader will live entirely in the phone, and the scan will cost nothing, and you won't even notice the week it changes. That's the post I sat down to write in March. This is the one that turned out to be true.

