Character encoding: the light comes on...

Apr 27, 2004

Well, after being given a Thai language file for the Forum, a Bulgarian language file for the Ringmaker, and throwing myself at character sets I couldn't read at all... I think I'm finally starting to get the hang of this character encoding business.

I mean, I understood it before, just not how the different sets worked together or how they were displayed depending on whether the current document used the right encoding.

For instance, why a character would display correctly or incorrectly in UTF-8 was voodoo to me. :) Now I understand!
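To make the "voodoo" concrete, here is a minimal Python sketch (not the PHP used on this site) of why the same bytes can look fine or look like garbage. It assumes the sample word "สวัสดี" and a hypothetical helper named `show`; `cp874` is Python's name for windows-874.

```python
# The same text, stored as bytes in two different encodings.
text = "สวัสดี"  # sample Thai word, chosen only for illustration
utf8_bytes = text.encode("utf-8")
thai_bytes = text.encode("cp874")  # cp874 is Python's codec name for windows-874


def show(raw: bytes, declared: str) -> str:
    """Decode raw bytes the way a viewer would if the page declared `declared`.

    Hypothetical helper: undecodable bytes become U+FFFD instead of raising.
    """
    return raw.decode(declared, errors="replace")


print(show(utf8_bytes, "utf-8"))   # bytes and declared encoding agree: readable
print(show(thai_bytes, "cp874"))   # bytes and declared encoding agree: readable
print(show(thai_bytes, "utf-8"))   # mismatch: includes U+FFFD replacement characters
print(show(utf8_bytes, "cp1252"))  # mismatch: each Thai letter turns into several Latin symbols
```

The bytes never change in any of these cases. Only the encoding the viewer is told to use changes, which is why a page displays correctly once its declared encoding matches how the text was actually saved.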

Anyway, I found a nice script at PHP.net to recode Bulgarian to UTF-8, but I couldn't find the same for windows-874 (Thai).

So after much searching I asked on Usenet and what do you know? I got pointed to a Perl script that recoded Thai and from there it was a simple matter to translate it to PHP. If you'd like the function I came up with, you can download it from my PHP page.
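For the curious, the general idea of such a recoding function can be sketched in a few lines of Python. This is not the PHP function from the download page, and the names `thai_to_utf8` and `thai_to_utf8_by_hand` are made up for illustration. The first version leans on Python's built-in `cp874` codec; the second spells out the lookup by hand, relying on the fact that the Thai letters in windows-874 (bytes 0xA1 to 0xDA and 0xDF to 0xFB) sit at a fixed offset from the Unicode Thai block starting at U+0E00.

```python
def thai_to_utf8(raw: bytes) -> bytes:
    """Recode windows-874 bytes to UTF-8 using the built-in codec."""
    return raw.decode("cp874").encode("utf-8")


def thai_to_utf8_by_hand(raw: bytes) -> bytes:
    """Same result, with the Thai range mapped explicitly (illustrative only)."""
    out = []
    for b in raw:
        if 0xA1 <= b <= 0xDA or 0xDF <= b <= 0xFB:
            # Thai letters: byte 0xA1 is U+0E01, 0xA2 is U+0E02, and so on.
            out.append(chr(0x0E00 + (b - 0xA0)))
        else:
            # ASCII and the few extra windows-874 symbols: defer to the codec.
            out.append(bytes([b]).decode("cp874"))
    return "".join(out).encode("utf-8")


sample = b"Hi \xca\xc7\xd1\xca\xb4\xd5"  # "Hi " plus a Thai word in windows-874
print(thai_to_utf8(sample).decode("utf-8"))
```

A byte-by-byte table like the second function is roughly how a script without a built-in codec would have to approach it; where a codec is available, the one-line version is the safer choice.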

