Character encoding: the light comes on...

Apr 27, 2004

Well, after being given a Thai language file for the Forum, a Bulgarian language file for the Ringmaker, and throwing myself at character sets I couldn't read at all... I think I'm finally starting to get the hang of this character encoding business.

I mean, I understood it before, just not how the different character sets work together, or how text ends up displayed depending on whether the current document declares the right encoding.

For instance, why a character would display correctly or incorrectly in UTF-8 was voodoo to me. :) Now I understand!
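To make that concrete, here is a minimal Python sketch (not the PHP from this post) of why the same bytes can display correctly or as garbage. The sample word and the variable names are illustrative assumptions. `cp874` is Python's codec name for windows-874.

```python
# The Thai word for "Thai", stored as windows-874 bytes (one byte per character).
text = "ไทย"
raw = text.encode("cp874")

# Decoded with the encoding it was written in, the text comes back intact.
right = raw.decode("cp874")

# Read as UTF-8 instead, the same bytes are not valid UTF-8 sequences,
# so the decoder substitutes U+FFFD replacement characters.
wrong = raw.decode("utf-8", errors="replace")

# The same word encoded as UTF-8 takes three bytes per Thai character.
utf8 = text.encode("utf-8")

print(len(raw), len(utf8))
print(right == text, wrong == text)
```

The bytes never change. Only the encoding the reader assumes does, which is why a page that declares UTF-8 but contains windows-874 bytes shows replacement characters.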

Anyway, I found a nice script to recode Bulgarian to UTF-8, but I couldn't find the same for windows-874 (Thai).

So after much searching I asked on Usenet, and what do you know? I got pointed to a Perl script that recoded Thai, and from there it was a simple matter to translate it to PHP. If you'd like the function I came up with, you can download it from my PHP page.
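For readers who just want the idea, recoding boils down to decoding the legacy bytes and encoding them again as UTF-8. The sketch below is in Python and is not the PHP function mentioned above. The name `recode_to_utf8` is hypothetical, and using windows-1251 for the Bulgarian sample is an assumption, since the post doesn't say which encoding that file used.

```python
def recode_to_utf8(data: bytes, source_encoding: str = "cp874") -> bytes:
    """Decode bytes from a legacy code page and re-encode them as UTF-8.

    Decoding is strict, so bytes the code page does not define raise
    UnicodeDecodeError instead of being silently mangled.
    """
    return data.decode(source_encoding).encode("utf-8")


# Sample inputs built from known strings so the round trip can be checked.
thai_legacy = "สวัสดี".encode("cp874")        # windows-874 (Thai)
bulgarian_legacy = "Здравей".encode("cp1251")  # assumed windows-1251

thai_utf8 = recode_to_utf8(thai_legacy)
bulgarian_utf8 = recode_to_utf8(bulgarian_legacy, "cp1251")

print(thai_utf8.decode("utf-8"))
print(bulgarian_utf8.decode("utf-8"))
```

Python ships these mapping tables in its standard codecs, so no hand-written lookup table is needed here. A port to a language without a built-in windows-874 table would need its own byte-to-code-point map.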

