Discussion about this post

hohoda:

the "language tax" framing is sharp. it's invisible at the UI layer, which is exactly why it's easy to declare AI "democratized" while the infrastructure underneath is still heavily English-first.

though i'd push back a bit on the "build your own LLM" path as the solution — that's only accessible to a handful of countries with the capital, compute, and talent (China, France, UAE, maybe Japan). for everyone else, that bar is unreachable.

the more tractable fix might actually be at the tokenizer level — train tokenizers on balanced multilingual corpora rather than English-dominant ones. DeepSeek, for instance, has a tokenizer optimized for Chinese, which changes the economics without requiring every country to build from scratch. curious if you see that as viable, or do you think the data imbalance runs too deep to fix from the tokenizer up?
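a rough sketch of what i mean by "changes the economics" — the same sentence can cost very different token counts depending on what corpus the tokenizer was trained on. the model IDs and sample sentences below are just illustrative (swap in DeepSeek's tokenizer or any other you want to compare):

```python
from transformers import AutoTokenizer

# parallel sentences; translations are approximate, purely for illustration
samples = {
    "en": "The cat sat on the mat.",
    "zh": "猫坐在垫子上。",
}

# gpt2: trained almost entirely on English text
# xlm-roberta-base: trained on ~100 languages with a more balanced corpus
for name in ["gpt2", "xlm-roberta-base"]:
    tok = AutoTokenizer.from_pretrained(name)
    counts = {lang: len(tok.encode(text, add_special_tokens=False))
              for lang, text in samples.items()}
    print(name, counts)  # more tokens = higher API cost, shorter effective context
```

an English-heavy byte-level tokenizer will typically spend two or three tokens per Chinese character, while a tokenizer trained on a balanced multilingual corpus gets much closer to one — that per-token gap is the tax, compounding through pricing and context windows.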

Tsetsy:

This is such an important point. We talk about ‘democratizing AI’ but rarely about the hidden costs baked into the system itself.
