Friday, May 8, 2009

Finding what font renders Unicode characters in the PUA

Unicode 5 sets aside 137,468 codepoints for private use, in what are known as the Private Use Areas (PUA). These codepoints are reserved for private organizations to do with what they will, which means that no characters have been officially assigned to them.
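To make that number a bit more concrete, here is a small Python sketch that adds up the three private use ranges and checks whether a given codepoint falls inside one of them. The names `PUA_RANGES` and `is_private_use` are my own for illustration, not anything from a standard library.

```python
import unicodedata

# The three private use ranges defined by Unicode (inclusive bounds).
PUA_RANGES = [
    (0xE000, 0xF8FF),      # Private Use Area in the Basic Multilingual Plane
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]


def is_private_use(codepoint: int) -> bool:
    """Return True if the codepoint lies in any private use range."""
    return any(lo <= codepoint <= hi for lo, hi in PUA_RANGES)


# 6,400 + 65,534 + 65,534 codepoints
total = sum(hi - lo + 1 for lo, hi in PUA_RANGES)
print(total)

# Both codepoints mentioned in this post are private use.
print(is_private_use(0xF8FF), is_private_use(0x100084))

# The Unicode database reports the general category "Co" (private use).
print(unicodedata.category(chr(0x100084)))
```

The `unicodedata` check at the end is an alternative to hard-coding the ranges: any codepoint whose general category is `Co` is private use.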

In practice, this means that you need some way of knowing where the document containing the character came from. How to render the character is normally left up to a higher-level protocol to negotiate.

A great example is the character U+F8FF, which is used by a number of fonts. According to Wikipedia, this codepoint renders as the Apple Computer logo in Apple fonts, as the Microsoft logo in Wingdings, and as a euro symbol in the Luxi font (designed for the X Window System). There are a heck of a lot of other fonts that also use this codepoint.

And this was my challenge the other day. I'm currently in the process of troubleshooting a particularly tricky Unicode issue, and I wanted to see what used U+100084. Finding what uses this codepoint was a bit tricky. Then I discovered that fileformat.info has a Unicode font list and a Unicode font search tool. I had to use the local file list, and for some reason I found that my PC had Palatino Linotype installed on it, which is a very expensive font. However, it was the only font that rendered the character. Therefore, I came to the conclusion that the character came from a PDF document, which had the font embedded.

Very useful!
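If you'd rather script this kind of search yourself, here is a rough sketch of one way to do it. It is not how the fileformat.info tool works; it assumes the third-party fontTools package (`pip install fonttools`) and simply asks each font file whether its character map has an entry for the codepoint. The function names `load_cmap` and `fonts_covering` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def load_cmap(path: str) -> Dict[int, str]:
    """Read a font file's codepoint -> glyph name map using fontTools."""
    # Imported here so the rest of the module works without fontTools.
    from fontTools.ttLib import TTFont

    font = TTFont(path)
    try:
        # getBestCmap() returns None if the font has no Unicode cmap.
        return font.getBestCmap() or {}
    finally:
        font.close()


def fonts_covering(
    codepoint: int,
    font_paths: Iterable[str],
    cmap_loader: Callable[[str], Dict[int, str]] = load_cmap,
) -> List[str]:
    """Return the font files whose character map includes the codepoint."""
    matches = []
    for path in font_paths:
        try:
            cmap = cmap_loader(path)
        except Exception:
            # Skip files that cannot be parsed as fonts.
            continue
        if codepoint in cmap:
            matches.append(path)
    return matches
```

You would call it with something like `fonts_covering(0x100084, glob.glob(r"C:\Windows\Fonts\*.ttf"))`, adjusting the folder for your own system. Bear in mind that a match only tells you the font has a glyph at that codepoint, not what the glyph looks like, and that font collections (`.ttc` files) would need extra handling.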

P.S. Please, please, if you are thinking of using a font that requires a private use codepoint, do everyone a favour. Don't.
