Each sentence of a source book is translated into a target language using three different models — Gemma 3 12B, Ministral 3 14B, and NLLB-600M — at various temperatures. The resulting translations are then back-translated to English and compared to the original using cosine similarity on multilingual embeddings. A fourth model, Gemma 4 31B, acts as a blind judge — rating each candidate on grammar, naturalness, and fidelity. Combining both scores selects a winner for this first phase.
That winner then becomes the anchor for a second phase: self-refinement. Two refine models take the current best translation as a hint and mutate it — an anchored mutation that keeps the sentence grammatical while searching for something better. The final winner is chosen across both phases, so the finished document is a hybrid not only of models but of phases: the best translation for each sentence, regardless of model or phase.