Buchenberg — MT lab

How it works

Each sentence of a source book is translated into a target language by three different models (Gemma 3 12B, Ministral 3 14B, and NLLB-600M) at various temperatures. Each translation is then back-translated to English and compared with the original sentence using cosine similarity on multilingual embeddings. A fourth model, Gemma 4 31B, acts as a blind judge, rating each candidate on grammar, naturalness, and fidelity. A weighted combination of the two scores selects the winner of this first phase.
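A minimal sketch of the first-phase candidate grid, assuming a caller-supplied `translate(sentence, model, temperature)` function (hypothetical; in practice it would wrap Ollama or an NLLB runtime). The model identifiers and temperature values below are placeholders, since the text only says "various temperatures".

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence

@dataclass
class Candidate:
    model: str
    temperature: float
    phase: int
    text: str

# Hypothetical signature: translate(sentence, model, temperature) -> translated text.
Translator = Callable[[str, str, float], str]

# Placeholder identifiers and temperatures; the real values are not given in the text.
PHASE1_MODELS = ("gemma3-12b", "ministral3-14b", "nllb-600m")
TEMPERATURES = (0.0, 0.4, 0.8)

def phase1_candidates(
    sentence: str,
    translate: Translator,
    models: Sequence[str] = PHASE1_MODELS,
    temperatures: Sequence[float] = TEMPERATURES,
) -> List[Candidate]:
    """Translate one sentence with every (model, temperature) pair."""
    return [
        Candidate(model, temp, 1, translate(sentence, model, temp))
        for model, temp in product(models, temperatures)
    ]
```

Each candidate keeps its model, temperature, and phase, so the origin of every winning sentence stays traceable when the final document is assembled.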

That winner then becomes the anchor for a second phase: self-refinement. Two refine models take the current best translation as a hint and mutate it. This anchored mutation keeps the sentence grammatical while searching for something better. The final winner is chosen across both phases, so the finished document is a hybrid not only of models but of phases: for each sentence, the best translation regardless of which model or phase produced it.
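The cross-phase selection for a single sentence can be sketched as follows. `refine` and `score` are hypothetical callables standing in for the refine-model call and the combined scorer. The phase-1 winner stays in the pool, so phase 2 only wins when a mutation actually scores higher.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Scored:
    model: str
    phase: int
    text: str
    score: float

# Hypothetical callables: refine(source, hint, model) -> mutated translation,
# score(source, translation) -> combined score (higher is better).
Refiner = Callable[[str, str, str], str]
Scorer = Callable[[str, str], float]

def refine_and_select(
    source: str,
    phase1_winner: Scored,
    refine_models: List[str],
    refine: Refiner,
    score: Scorer,
) -> Scored:
    """Run phase 2 from the phase-1 winner and return the best candidate across both phases."""
    pool = [phase1_winner]
    for model in refine_models:
        text = refine(source, phase1_winner.text, model)
        pool.append(Scored(model, 2, text, score(source, text)))
    # max() returns the first maximal element, so a tie keeps the phase-1 winner.
    return max(pool, key=lambda c: c.score)
```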

📐 Back-translation scoring

Translate into the target language, then back to English, and measure the cosine similarity between the original and the round-tripped text. A high score indicates that the meaning survived the round trip.
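A self-contained sketch of the back-translation score. `back_translate` and `embed` are hypothetical callables: the first returns the English round trip, the second a multilingual sentence embedding (the embedding model is not named in the text). Cosine similarity is written out in plain Python to avoid assuming a particular vector library.

```python
import math
from typing import Callable, Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def back_translation_score(
    original: str,
    translation: str,
    back_translate: Callable[[str], str],
    embed: Callable[[str], Sequence[float]],
) -> float:
    """Round-trip the translation to English and compare it with the original sentence."""
    round_trip = back_translate(translation)
    return cosine_similarity(embed(original), embed(round_trip))
```

With real sentence embeddings the score typically lands well below 1.0 even for good translations, so it is most useful for ranking candidates of the same sentence against each other.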

⚖️ LLM judge

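Gemma 4 31B evaluates each candidate translation blindly, without knowing which model produced it, on three axes: grammar, naturalness, and fidelity to the original. The judge score carries 60% of the final ranking weight; back-translation similarity makes up the remaining 40%.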
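A sketch of the blind judge prompt and the weighted ranking. The JSON reply format, the 1-10 scale, and equal weighting of the three axes are assumptions; only the 60% judge share comes from the text, with the remaining 40% going to back-translation similarity.

```python
import json
from statistics import mean

JUDGE_WEIGHT = 0.6              # from the text: the judge carries 60% of the ranking weight
BT_WEIGHT = 1.0 - JUDGE_WEIGHT  # back-translation similarity supplies the rest
AXES = ("grammar", "naturalness", "fidelity")

def build_judge_prompt(source: str, candidate: str) -> str:
    # Blind: only the two texts are shown; no model name, temperature, or phase.
    return (
        "Rate the translation on grammar, naturalness and fidelity, 1-10 each. "
        'Reply only with JSON: {"grammar": g, "naturalness": n, "fidelity": f}.\n'
        f"Original: {source}\nTranslation: {candidate}"
    )

def parse_judge_reply(raw: str, scale_max: float = 10.0) -> float:
    """Average the three axis ratings and scale them into [0, 1] (assumed 1-10 scale)."""
    ratings = json.loads(raw)
    return mean(float(ratings[axis]) for axis in AXES) / scale_max

def combined_score(judge_norm: float, bt_similarity: float) -> float:
    """Weighted ranking score: 60% judge, 40% back-translation similarity."""
    return JUDGE_WEIGHT * judge_norm + BT_WEIGHT * bt_similarity
```

For example, a reply of `{"grammar": 8, "naturalness": 7, "fidelity": 9}` normalizes to 0.8, and with a back-translation similarity of 0.9 the combined score is 0.6 × 0.8 + 0.4 × 0.9 = 0.84.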

🧬 Self-refinement

The first-phase winner is fed back as a hint to two refine models, which mutate it rather than translate from scratch. This anchored mutation keeps grammar intact while exploring better wording: an evolutionary search over the translation itself.
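A sketch of the refine step, assuming a hypothetical `generate(model, prompt)` callable. The prompt wording is illustrative. The `is_anchored` check, which discards rewrites whose character-level similarity to the hint (via `difflib.SequenceMatcher`) falls below an arbitrary threshold, is one possible way to enforce anchoring and is not described in the text.

```python
import difflib
from typing import Callable, List, Sequence

REFINE_TEMPLATE = (
    "Source (English): {source}\n"
    "Current {lang} translation: {hint}\n"
    "Improve the current translation with small edits only. Keep whatever is already "
    "correct and reply with the revised translation alone."
)

def build_refine_prompt(source: str, hint: str, lang: str) -> str:
    return REFINE_TEMPLATE.format(source=source, hint=hint, lang=lang)

def is_anchored(hint: str, candidate: str, min_ratio: float = 0.6) -> bool:
    # Hypothetical guard: the rewrite must stay character-level similar to the hint.
    return difflib.SequenceMatcher(None, hint, candidate).ratio() >= min_ratio

def refine_candidates(
    source: str,
    hint: str,
    lang: str,
    refine_models: Sequence[str],
    generate: Callable[[str, str], str],  # hypothetical: generate(model, prompt) -> text
) -> List[str]:
    """Ask each refine model to mutate the hint; keep only new, anchored rewrites."""
    kept = []
    for model in refine_models:
        text = generate(model, build_refine_prompt(source, hint, lang)).strip()
        # Skip empty replies, unchanged copies of the hint, and rewrites that drift too far.
        if text and text != hint and is_anchored(hint, text):
            kept.append(text)
    return kept
```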

🏆 Sentence-level winner

No single model or phase wins every sentence. The final document combines the best-scoring translation for each sentence across both phases, a hybrid that outperforms any individual model on the combined score.
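Assembling the hybrid document amounts to a per-sentence argmax over all scored candidates. This sketch assumes sentences are rejoined with single spaces and tallies wins by (model, phase), which makes the "no single winner" observation easy to check on a real run.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Scored:
    model: str
    phase: int
    text: str
    score: float

def assemble_document(per_sentence: List[List[Scored]]) -> Tuple[str, Counter]:
    """Pick the top-scoring candidate for each sentence and tally wins by (model, phase)."""
    winners = [max(cands, key=lambda c: c.score) for cands in per_sentence]
    tally = Counter((w.model, w.phase) for w in winners)
    # Assumption: sentences are rejoined with a single space.
    return " ".join(w.text for w in winners), tally
```

Because every sentence takes its maximum, the hybrid's total score can never fall below the total of any single model that has a candidate for every sentence; whenever that model loses at least one sentence, the hybrid is strictly ahead.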

100% open source. All models used are freely available. No proprietary APIs, no cloud translation services. Source books are from Project Gutenberg — public domain, freely distributable. The pipeline runs on commodity hardware with a PostgreSQL backend and Ollama for local and cloud LLM inference.
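On the inference side, a minimal request sketch against Ollama's REST endpoint `/api/generate` on its default local port 11434, using only the standard library. The model tag in the usage line is an example; cloud-hosted models and any authentication they require are not covered here.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_request(
    model: str, prompt: str, temperature: float, url: str = OLLAMA_URL
) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON object instead of streamed chunks
        "options": {"temperature": temperature},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(model: str, prompt: str, temperature: float, timeout: float = 300.0) -> str:
    """Send the request and return the generated text from the "response" field."""
    req = build_generate_request(model, prompt, temperature)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Usage, with a local Ollama server running: `generate("gemma3:12b", "Translate into German: Good morning.", 0.3)`.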