Last week, Mark Dominus published a deep dive into a single Unicode character: U+FDFD, the Arabic ligature for the basmala. The character exists because early font engines could not render Arabic cursive properly. The solution was to encode the entire phrase “بِسْمِ ٱللهِ ٱلرَّحْمَٰنِ ٱلرَّحِيْمِ” as a single codepoint that a font draws as one glyph. The story of how that character came to be involves a twice-mutilated vizier, a vanished Qur’an, a Beirut newspaperman, and an Egyptian physician who taught himself font engineering.
It is a magnificent piece of type history. It is also a warning to anyone building or deploying large language models.
Dominus’s post is a commentary on Saleh’s earlier interactive article about Arabic typesetting. The core problem: Arabic script is always cursive. Letters must connect. Early font engines rendered Arabic with separate, disconnected letters, producing text that looks “grossly wrong” to a reader. The basmala, the phrase that opens 113 of 114 Qur’anic surahs, could not be rendered as a jumble. So Unicode added U+FDFD.
That single codepoint is now drawn by whatever font a platform supplies. Firefox renders it one way. Android renders it differently. Dominus shows magnified images of the Android glyph and notes that Khaled Hosny, designer of the Amiri font, called the Android version “very bad,” criticizing “the bizarre fusion of the letters” and the insertion of “Allah” into the middle of “bismi.”
The technical fix worked. The aesthetic result is uneven. And the visual half of the episode, how the glyph actually looks on a screen, sits almost entirely outside what a text-trained language model ever sees.
Consider what a language model “knows” about U+FDFD. It has seen the Unicode standard. It has seen blog posts like Dominus’s. It has seen the Wikipedia entry. It can recite that the character is called “ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM.” It can explain the four words. It can even describe the calligraphic tradition.
What it cannot do is look at the two renderings Dominus shows and tell you which one is correct. It cannot see the difference between the Firefox glyph and the Android glyph. It cannot evaluate Hosny’s criticism. It cannot distinguish a well-formed ligature from a poorly constructed one.
This is not a niche problem. The basmala is the most visible example, but the issue generalizes. Arabic, Persian, Urdu, and other languages that use Arabic script rely on ligature rendering that varies by font, by platform, and by decade. The training data for language models is overwhelmingly English, and where it includes Arabic, it includes the encoded codepoints, not the glyphs that any particular font engine drew for them. Whether the original page displayed a careful rendering or a crappy one, the model receives the same codepoints and has no way to tell the two apart.
The same blind spot applies to every script that depends on complex shaping. Devanagari, Bengali, Thai, and many others require shaping engines that handle conjuncts, reordering, and contextual forms. The Unicode standard defines the character encoding. The shaping engine, usually HarfBuzz or a platform equivalent, works with the font to decide how the glyphs actually look. Language models train on the encoded text and have no access to the shaped output. They learn the abstract representation, not the visual reality.
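To make the encoded view concrete, here is a minimal Python sketch of what a text pipeline receives for Arabic input. It uses only the standard library; the helper name `describe` and the two sample constants are illustrative choices, not anything from Dominus's post. It prints codepoints and their Unicode names, which is roughly all the information available before a shaping engine and a font get involved.

```python
import unicodedata

def describe(text):
    """Return (codepoint, Unicode name) pairs: the encoded view of a string."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# First word of the basmala, spelled with ordinary letters (vowel marks omitted).
BISM = "\u0628\u0633\u0645"
# The single-codepoint ligature discussed in this article.
LIGATURE = "\uFDFD"

for label, sample in (("spelled out", BISM), ("ligature", LIGATURE)):
    print(label)
    for codepoint, name in describe(sample):
        print(f"  {codepoint}  {name}")
    print(f"  UTF-8 bytes: {len(sample.encode('utf-8'))}")
```

Nothing in either listing says whether the letters will join on screen or how the ligature will be drawn. That decision is made later, by software the model never observes.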
This matters for AI builders in three ways.
First, evaluation. If you benchmark a model on Arabic text tasks, you are testing its ability to manipulate Unicode codepoints, not its understanding of how Arabic actually reads. A model that passes a question about the basmala may still produce output that, when rendered, looks like the disconnected-letter version. The evaluation is measuring the wrong thing.
Second, fine-tuning. If you fine-tune a model on Arabic text scraped from the web, you inherit encoding choices that may be decades old, including presentation-form codepoints such as U+FDFD that were introduced as workarounds for weak font engines. The basmala glyph on Android, which Hosny calls very bad, is the version millions of users see daily. A model trained on text that uses the single codepoint has no way to know how it renders; it learns only that the codepoint is acceptable. It will then generate text that may look fine on one platform and much worse on another.
Third, user experience. For users of Arabic, Persian, or Urdu, a language model that produces typographically correct output is not a nice-to-have. It is a basic requirement. The basmala is a religious phrase. Rendering it incorrectly is not just a bug. It is disrespectful. Dominus makes this explicit: “It would be blasphemous to render this phrase… as a jumble of letters.”
The AI industry has spent the last two years obsessed with capabilities: longer context windows, better reasoning, multimodal understanding. The basmala story is a reminder that the foundations are still shaky. A model that can pass the bar exam but cannot tell whether an Arabic phrase will render correctly has a gap in its understanding that scaling on encoded text alone is unlikely to close.
The solution is not to add more Unicode codepoints. The solution is to integrate shaping-aware evaluation into the training and testing pipeline. If a model cannot distinguish between a well-formed Arabic ligature and a broken one, it should not be deployed for Arabic users. If the training data contains text that depends on legacy presentation-form codepoints or other rendering workarounds, that text should be flagged or filtered.
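One narrow, text-only piece of the flagging step can be sketched in Python. The sketch below marks any string containing codepoints from the Arabic Presentation Forms-A and -B blocks, where U+FDFD lives, on the assumption that such codepoints signal text written around a font engine's limitations. The function names `presentation_forms` and `should_flag` are hypothetical, and this is only a proxy: it inspects the encoding and says nothing about how any font actually draws the result, which would require rendering the text and looking at it.

```python
import unicodedata

# Arabic Presentation Forms-A and -B: compatibility codepoints that hard-code
# a ligature or a positional letter form instead of leaving it to the shaper.
# U+FEFF (the byte order mark) sits at the end of the second block and is
# deliberately excluded.
PRESENTATION_RANGES = ((0xFB50, 0xFDFF), (0xFE70, 0xFEFE))

def presentation_forms(text):
    """Return (index, codepoint, name) for each presentation-form character."""
    hits = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in PRESENTATION_RANGES):
            hits.append((index, f"U+{cp:04X}", unicodedata.name(ch, "<unnamed>")))
    return hits

def should_flag(text):
    """Assumed policy for this sketch: flag a sample if it has any hits."""
    return bool(presentation_forms(text))

if __name__ == "__main__":
    samples = {
        "ligature codepoint": "\uFDFD",
        "ordinary letters": "\u0628\u0633\u0645 \u0627\u0644\u0644\u0647",
    }
    for label, sample in samples.items():
        print(label, should_flag(sample), presentation_forms(sample))
```

Whether a flagged sample should be dropped, normalized, or kept with a label is a policy decision the sketch leaves open. A religious text that uses U+FDFD on purpose is not necessarily bad data.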
Dominus ends his post with a note of anticipation: “I am looking forward to understanding more of them.” The AI industry should share that sentiment. Understanding the basmala means understanding what a language model cannot see. That is the first step to fixing it.