Yomikomi — Japanese Reading Toolkit

📖 The Problem

Reading Japanese shouldn't feel like detective work

Like many Japanese learners, I relied on Yomitan on desktop, and it's fantastic. But the moment I picked up my phone to read manga or a novel, everything fell apart. My workflow became: spot an unknown word → screenshot → Google Translate → copy the Japanese text → search in the imiwa? app. It was exhausting and completely killed reading immersion.

OCR tools like Mokuro existed, but they required setting up a local environment and were tethered to a computer. I wanted something I could open in Safari on my phone and just... read.

🌱 The Origins

It started with anime subtitles

The seed was planted with an earlier project: subtitles-hook. The idea was simple — load a local video, generate subtitles via Turboscribe, and hover over words while Yomitan does the heavy lifting. It worked beautifully on desktop.

But watching anime on my phone with no Yomitan available made me think: what if the dictionary lived in the browser itself? That question led me down a rabbit hole of WASM, ONNX, and client-side AI models.

subtitles-hook on GitHub

🔬 Earlier Experiments

jp-reader: the first attempt

Before Yomikomi, I built jp-reader — a companion app for reading Japanese text with dictionary lookups baked in. It was convenient, but still required a computer and a running server. Every solution I built kept pulling me back to the same constraint.

jp-reader on GitHub

⚡ The Pivot

Anki decks → Reading companion

Yomikomi actually started as an Anki deck tool. An .apkg file is a ZIP archive that wraps a SQLite database, and I found ways to open it in Node.js, which worked great, but only on the backend. Then I discovered sql.js, a build of SQLite compiled to WebAssembly. Suddenly, I could parse Anki decks entirely in the browser, no server needed.
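
To make the idea concrete, here is a minimal sketch of reading notes from an already-opened Anki collection. It is not Yomikomi's actual code. The names readNotes, AnkiNote, and SqlDatabaseLike are hypothetical; the interface only mirrors the result shape of sql.js's `exec` (an array of `{ columns, values }`), so a real sql.js database should fit it. The sketch relies on Anki keeping all of a note's fields in the `flds` column, joined by the U+001F separator, and its tags as a space-separated string.

```typescript
// Structural type mirroring one result set as returned by sql.js `exec`.
interface QueryResult {
  columns: string[];
  values: unknown[][];
}

// Hypothetical minimal interface: anything with an `exec` of this shape works.
interface SqlDatabaseLike {
  exec(sql: string): QueryResult[];
}

// Hypothetical shape for a parsed note.
interface AnkiNote {
  id: number;
  fields: string[];
  tags: string[];
}

// Anki joins every field of a note into one `flds` string using U+001F.
const FIELD_SEPARATOR = "\u001f";

function readNotes(db: SqlDatabaseLike): AnkiNote[] {
  const results = db.exec("SELECT id, flds, tags FROM notes");
  const first = results[0];
  if (!first) return []; // no result set, so there is nothing to parse
  return first.values.map(([id, flds, tags]) => ({
    id: Number(id),
    fields: String(flds).split(FIELD_SEPARATOR),
    // Tags are space-separated, usually with padding on both ends.
    tags: String(tags).trim().split(/\s+/).filter((t) => t.length > 0),
  }));
}
```

In the browser, `db` would typically come from unzipping the .apkg, taking the collection file out of it, and passing its bytes to `new SQL.Database(bytes)` after `initSqlJs()` resolves. Which collection file to pick depends on the Anki version that exported the deck, so that step is left out here.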

That discovery changed everything. If I could run SQLite in the browser, what else could I run? The answer was PaddleOCR via ONNX Runtime Web — Japanese text recognition, client-side, no cloud, no API keys.
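
As a rough illustration of what client-side inference involves, the sketch below converts RGBA pixels (for example from a canvas `ImageData`) into the planar, normalized float layout that image models commonly expect. It is an assumption-laden example rather than Yomikomi's pipeline: the function name rgbaToChw, the RgbaImage type, the RGB channel order, and the 0.5 mean and standard deviation are all placeholders that would need to match the exported model.

```typescript
// Hypothetical input type: same layout as a canvas ImageData object.
interface RgbaImage {
  width: number;
  height: number;
  data: Uint8ClampedArray | Uint8Array; // RGBA bytes, row-major
}

// Convert interleaved RGBA bytes to planar CHW floats, dropping alpha.
// The mean/std defaults are assumptions; use the values your model was trained with.
function rgbaToChw(image: RgbaImage, mean = 0.5, std = 0.5): Float32Array {
  const { width, height, data } = image;
  const planeSize = width * height;
  if (data.length !== planeSize * 4) {
    throw new Error("pixel buffer does not match width * height * 4");
  }
  const out = new Float32Array(3 * planeSize);
  for (let p = 0; p < planeSize; p++) {
    for (let c = 0; c < 3; c++) {
      const scaled = data[p * 4 + c] / 255; // map 0..255 to 0..1
      out[c * planeSize + p] = (scaled - mean) / std; // channel c occupies its own plane
    }
  }
  return out;
}
```

With ONNX Runtime Web, the returned array could then be wrapped as `new ort.Tensor("float32", data, [1, 3, height, width])` and passed to `session.run(...)` on an `ort.InferenceSession`. The input name and the expected image size depend on the specific model, so they are not shown.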

🖥️ Backend vs. Browser

The server vs. client tradeoff

I first tried running OCR models on a backend server — a Docker container with PaddleOCR or YomiToku. Quality was excellent, but self-hosting requires infrastructure, a VPS, and ongoing maintenance. The app still supports this mode (see the server/ directory in the repo), but I personally stopped using it.

The browser-native approach wins on accessibility: install nothing, pay for nothing, your images never leave your device. Yes, model quality is slightly lower than server-side inference, and yes, large transformer models can crash iOS Safari — these are known limitations I'm actively working on.

🚀 Today

Where Yomikomi stands now

If you're a Japanese learner who wants to read books, manga, or any Japanese text on your phone — without installing extensions, without running servers, without sending your content to Google — Yomikomi might be exactly what you need.

Add it to your iPhone Home Screen via Safari → Share → Add to Home Screen, and it runs like a native app. Dictionary, OCR, tokenizer, translator — all local, all private.
