View topic - Question about usage of Japanese/Unicode UTF8 in webpage

  • Site Admin
  • Joined: 08 Jul 2001
  • Posts: 8648
  • Location: Paris, France
Question about usage of Japanese/Unicode UTF8 in webpage
Post Posted: Wed Feb 11, 2004 10:35 am
New SMS Power pages should also include game names (and perhaps other information) in Japanese. We probably also want correct accents for Brazilian titles, etc. None of those characters are part of the ASCII standard.

I'm not sure however which is the best way to embed them in a webpage.

The idea is that:
- Japanese characters should display on most browsers/systems
- In case they are not supported (e.g. a missing font), they must not break the rest of the page.

Unicode UTF-8 encoding seems to be the right choice, but I'm not sure whether it is supported by all browsers. I'm curious to see how an old browser would handle such a page, and what kind of garbage would be displayed instead of the Japanese text.
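
As a rough illustration of the "garbage" in question, here is a minimal Python sketch. It assumes the old browser ignores the UTF-8 declaration and falls back to a single-byte Western charset (Latin-1 here); real browsers may choose a different fallback, and `misread_utf8` is a hypothetical helper name.

```python
# Sketch: what a browser that ignores UTF-8 might show for Japanese text.
# Assumption: the fallback charset is Latin-1; other fallbacks give other garbage.

def misread_utf8(text: str, fallback: str = "latin-1") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong charset."""
    return text.encode("utf-8").decode(fallback)

sega = "\u305b\u304c"  # "Sega" in hiragana (two characters)
garbled = misread_utf8(sega)

print(len(sega), "characters become", len(garbled), "garbage characters")
print(repr(garbled))

# Plain ASCII text is unaffected, because UTF-8 encodes ASCII unchanged.
print(misread_utf8("Enduro Racer"))
```

Under that assumption each kana turns into three unrelated characters, while the ASCII parts of the page stay readable.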

Following are two tests:

No encoding specified, using escaped character values. Maxim used this technique for the /scans section. I'm wondering if there's any reason behind that. Maxim?
http://wip.smspower.org/tests/test_charset_1.html

Using UTF8:
http://wip.smspower.org/tests/test_charset_2.html

Do both work in your browser?
Do you get prompted to install a Japanese font? (I can't decide whether it is better to get prompted for it or not, but I suppose that we don't want to bother people who don't have the font installed on their system.)
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14734
  • Location: London
Post Posted: Wed Feb 11, 2004 12:20 pm
Quote
> No encoding specified, using escaped character values. Maxim used this technique for the /scans section. I'm wondering if there's any reason behind that. Maxim?

Because it worked best in my tests; it keeps everything in 7-bit ASCII; and it works regardless of the browser's encoding mode (otherwise you need to rely on META tags to work properly).
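
The escaped-character approach can be sketched in Python as follows. This is only an illustration, not the script used for the /scans section; `to_ncr` is a hypothetical helper built on the standard `xmlcharrefreplace` error handler, and the sample title is made up.

```python
import html

def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a decimal NCR (&#NNNN;).

    Markup characters such as & and < are left untouched.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

title = "\u30bb\u30ac Master System"  # "Sega" in katakana, then ASCII text
escaped = to_ncr(title)

# The result is pure 7-bit ASCII, so the page charset no longer matters.
escaped.encode("ascii")  # would raise UnicodeEncodeError otherwise
print(escaped)

# html.unescape() reverses the conversion, as a browser would when rendering.
print(html.unescape(escaped) == title)
```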

Quote
> http://wip.smspower.org/tests/test_charset_1.html

I see it correctly - crappy Konqueror in KDE/Linux.

Quote
> Using UTF8:
> http://wip.smspower.org/tests/test_charset_2.html

I see it correctly in the title bar but I only see two characters in the page. No prompting, possibly because the fonts are here.

Maxim
  • Site Admin
  • Joined: 08 Jul 2001
  • Posts: 8648
  • Location: Paris, France
Post Posted: Wed Feb 11, 2004 1:14 pm
Quote
> Because it worked best in my tests; it keeps everything in 7-bit ASCII; and it works regardless of the browser's encoding mode (otherwise you need to rely on META tags to work properly).

Well, do you know of any decent browser which does not support them? Those language and encoding things are difficult to figure out, because several browsers seem to perform data recognition (which is contrary to all rules), so even incorrect tags could display proper characters in IE/Mozilla.

Can you elaborate on "it worked best in my tests"?
Using 7-bit ASCII is not such a big problem since conversion scripts could perform it automatically, but it makes Japanese text about 3 times bigger and impossible to manually tweak.
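
The size overhead can be checked with a short Python sketch. The exact ratio depends on what the escaped text is compared against; Shift_JIS is included here only as an assumed example of a legacy Japanese encoding, and `sizes` is a hypothetical helper.

```python
def sizes(text: str) -> dict:
    """Byte counts for the same text in three representations."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "shift_jis": len(text.encode("shift_jis")),
        # Decimal numeric character references, e.g. &#12475;
        "ncr": len(text.encode("ascii", "xmlcharrefreplace")),
    }

sample = "\u30bb\u30ac"  # "Sega" in katakana
result = sizes(sample)

for name, size in result.items():
    print(f"{name}: {size} bytes")
print(f"NCR / UTF-8 ratio: {result['ncr'] / result['utf-8']:.2f}")
```

For this sample the escaped form is a little under three times the UTF-8 size, and four times the Shift_JIS size.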

Quote
> > http://wip.smspower.org/tests/test_charset_1.html

> I see it correctly - crappy Konqueror in KDE/Linux.

> > Using UTF8:
> > http://wip.smspower.org/tests/test_charset_2.html

> I see it correctly in the title bar but I only see two characters in the page.

My fault. Test 1 displays "Enduro Racer" in the title bar and the page. Test 2 displays "Master System" in the title bar and "Sega" in hiragana in the page. That should be OK.

Quote
> No prompting, possibly because the fonts are here.

We'll have to see about that.
Thanks for your testing.
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14734
  • Location: London
Post Posted: Wed Feb 11, 2004 1:24 pm
Quote
> Well, do you know of any decent browser which does not support them? Those language and encoding things are difficult to figure out, because several browsers seem to perform data recognition (which is contrary to all rules), so even incorrect tags could display proper characters in IE/Mozilla.

This is partly the problem. My experience of using a browser set to a non-Western default encoding is that it tends to revert a lot, sometimes despite any tags.

Quote
> Can you elaborate on "it worked best in my tests"?

I was focussing on plain text display at first, because UTF-8 and Unicode files have a header that is supposed to be detected (and that also defines the byte ordering), but that failed a lot. Then I tried using an actual encoding, but I found that it tended to become messed up in the transition from a text editor to HTML via whatever else happened, mainly because Windows' routines would tend to try to convert to GB2312 and it would be destroyed in the process. NCRs (all that &# number ; and &#x hex ; stuff) at least manage to stay where you put them, even if they are unreadable (perhaps a modern HTML editor would help, but Notepad is not that).
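
The file header mentioned above is presumably the byte order mark (BOM); that reading is an assumption. A minimal Python sketch of detecting it, using the constants from the standard `codecs` module, might look like this. `sniff_bom` is a hypothetical helper and UTF-32 is deliberately not handled.

```python
import codecs

# Known BOMs and the encoding each one announces (UTF-32 not covered).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

text = "\u305b\u304c"  # "Sega" in hiragana
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")
without_bom = text.encode("utf-8")

encoding, skip = sniff_bom(with_bom)
print(encoding, with_bom[skip:].decode(encoding) == text)
print(sniff_bom(without_bom))  # no BOM: the reader has to guess or be told
```

When the BOM is missing, as in the second case, a program has nothing to detect, which may be one reason the detection "failed a lot".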

Quote
> Using 7-bit ASCII is not such a big problem since conversion scripts could perform it automatically, but it makes Japanese text about 3 times bigger and impossible to manually tweak.

How are you encoding to UTF-8? I didn't find a good solution (and I am interested because Winamp 5 provisionally accepts UTF-8 so VGM tags should be getting better).

Maxim
  • Site Admin
  • Joined: 08 Jul 2001
  • Posts: 8648
  • Location: Paris, France
Post Posted: Wed Feb 11, 2004 1:39 pm
Quote
> > Using 7-bit ASCII is not such a big problem since conversion scripts could perform it automatically, but it makes Japanese text about 3 times bigger and impossible to manually tweak.

> How are you encoding to UTF-8? I didn't find a good solution (and I am interested because Winamp 5 provisionally accepts UTF-8 so VGM tags should be getting better).

UltraEdit works pretty well and handles various encodings and conversions (including 16-bit and UTF-8 Unicode encodings). The Visual Studio editor also handles Unicode, but I'm unsure whether you can force it to save in a particular encoding.
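
Without such an editor, the same conversion can be scripted. The following Python sketch assumes the source text is in Shift_JIS, which is only an example; `transcode` is a hypothetical helper, and the source encoding must be known in advance.

```python
def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Decode bytes from the source encoding and re-encode them as target."""
    return data.decode(source).encode(target)

# Stand-in for the contents of a legacy file ("Sega" in katakana, Shift_JIS).
legacy = "\u30bb\u30ac".encode("shift_jis")
utf8 = transcode(legacy, "shift_jis")

print(legacy.hex(), "->", utf8.hex())
```

Naming the wrong source encoding either raises `UnicodeDecodeError` or silently produces different characters, so it is worth checking the output by eye.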
  • Site Admin
  • Joined: 08 Jul 2001
  • Posts: 8648
  • Location: Paris, France
UTF-8 links and testing
Post Posted: Wed Feb 11, 2004 3:07 pm
Tests of Unicode file encodings and Japanese/English display
http://boblet.net/test/utf8/

UTF-8 SAMPLER
http://www.columbia.edu/kermit/utf8.html

UTF-8 and Unicode FAQ for Unix/Linux
http://www.cl.cam.ac.uk/~mgk25/unicode.html
(very long doc with historical and technical stuff)

In the end, I still don't know which browsers (old versions, etc.) do not support UTF-8 and Japanese properly.