There's a loaded question if I ever saw one :)
If you've never been asked the question, here is your likely response: "Well, it's written in .Net, so yeah, I think so." Anyone who has worked on a multilingual app knows this person is in for a world of hurt.
Our flagship application has a UI localized in nine (9) languages, all of them Latin-based scripts with ASCII representations. Those of you with similar experience know the process is fairly straightforward, but it can also be manual and labor-intensive.
You have the translation house of your choice translate all of the strings in your app, and it is then your task to incorporate the translations back into the product. This is where things get real boring real fast. For each language you need to verify that everything displays properly: labels aren't cut off, buttons are grown or shrunk as needed, and portions of the UI are rearranged where necessary so they make sense in context. (I'd be interested in hearing any feedback on automating this process.) Each additional language can also significantly increase the system and quality assurance testing effort.
A request recently came in to our development team to provide an estimate for localizing our UI into an additional twenty-four (24) languages, including Russian, Japanese, Arabic, and Hebrew. Now things get really interesting: we have to support Cyrillic, ideographic (Japanese), and right-to-left (Arabic, Hebrew) scripts. We are estimating (hoping) that the process will be largely similar, and we will verify this with some prototyping. But here is the real kicker: we must also support DATA in 33 languages (today we only support English data).
I mentioned in a previous post that our application has separated itself from its competitors by its raw speed when searching through product catalogs. We achieve that with a proprietary flat-file format and a proprietary index scheme (proprietary meaning we code it ourselves; we haven't invented some super secret s-tree index or anything). Our current format is all ASCII, and we will need to overcome that limitation to introduce Unicode support. We are also entertaining the possibility of switching to an in-process database à la SQL Server CE or SQLite. Some research by our lead developer revealed that Unicode is not always Unicode: the same visible text can be stored as different sequences of characters. This got me digging as well. For instance:
string s1 = "H\u1EAFT"; // Visually represented as HắT
string s2 = "H\u0103\u0301T"; // Visually represented as HắT (the same as s1)
s1.Equals(s2); // false
So visually (i.e. from the user's perspective) s1 and s2 are the same, and the user would expect a search for HắT to return every matching element, regardless of its underlying binary representation. However, String.Equals compares strings ordinally, one UTF-16 code unit at a time, so it reports them as different: s1 stores ắ as the single precomposed character U+1EAF (s1.Length == 3), while s2 stores it as ă (U+0103) followed by a combining acute accent (U+0301) (s2.Length == 4). I'd be lying if I told you I fully understand the details, but the issue is Unicode normalization. .Net 2.0 adds a new method to the String class called Normalize(), which converts a string to one of the standard Unicode normalization forms (Form C by default). Canonically equivalent strings normalized to the same form end up with identical code units, so they can then be compared directly.
s1.Normalize().Equals(s2.Normalize()); // true
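Here is the snippet above expanded into a small console program. It is a sketch that assumes .Net 2.0 or later (for String.Normalize and NormalizationForm); NormalizationDemo and SameText are names I made up for illustration. It prints the two lengths, the ordinal comparison, and the comparison after normalization, and it also shows that Form D goes the other way and fully decomposes s1.

```csharp
using System;
using System.Text;

// Sketch (.Net 2.0 or later): two canonically equivalent spellings of "HắT".
public static class NormalizationDemo
{
    // True when the two strings are canonically equivalent, i.e. identical
    // after both are converted to Normalization Form C (Normalize()'s default).
    public static bool SameText(string a, string b)
    {
        return string.Equals(a.Normalize(NormalizationForm.FormC),
                             b.Normalize(NormalizationForm.FormC),
                             StringComparison.Ordinal);
    }

    public static void Main()
    {
        string s1 = "H\u1EAFT";        // precomposed: U+1EAF
        string s2 = "H\u0103\u0301T";  // U+0103 followed by combining acute U+0301

        Console.WriteLine(s1.Length + " " + s2.Length);  // 3 4
        Console.WriteLine(s1.Equals(s2));                 // False (ordinal comparison)
        Console.WriteLine(SameText(s1, s2));              // True

        // Form C composes s2 into exactly the code units of s1...
        Console.WriteLine(s2.Normalize() == s1);          // True

        // ...while Form D decomposes s1 fully: H, a, U+0306, U+0301, T.
        Console.WriteLine(s1.Normalize(NormalizationForm.FormD).Length);  // 5
    }
}
```

Whichever form you pick, the important part is that both sides of a comparison use the same one.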
I'm not aware of a similar managed implementation in .Net 1.1. However, I believe (untested as of yet) that you could P/Invoke the NormalizeString() function in Normaliz.dll to achieve the same thing. Poking around in string.Normalize with Reflector leads me to believe this is what happens in 2.0.
The fun continues when you move to a database. SQLite's default BINARY collation compares strings with memcmp() (I have never written a line of C in my life, but my searching indicated that memcmp is a byte-by-byte binary comparison). So if you issue SELECT * FROM Product WHERE Description = s1, it will only return rows that were inserted as H\u1EAFT. To use SQLite, you must therefore load all of your data in the same normalization form and normalize your search criteria the same way.
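The same rule would apply to our own flat-file index, so here is a minimal sketch of "normalize on the way in, normalize on the way out". It uses an in-memory Dictionary with ordinal (binary-style) key comparison as a stand-in for a memcmp-based index; it is not SQLite code, and NormalizedIndex, Add and Find are hypothetical names. With SQLite, the equivalent would presumably be calling Normalize() on each parameter value before the INSERT and before the SELECT.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: a lookup table that compares keys like a binary index,
// made Unicode-aware by normalizing every key on insert and every search term
// on lookup. Class and method names are illustrative only.
public class NormalizedIndex
{
    // Ordinal comparison: code unit by code unit, much like a binary collation.
    private readonly Dictionary<string, List<int>> rows =
        new Dictionary<string, List<int>>(StringComparer.Ordinal);

    // Both stored keys and search terms go through the same normalization form.
    private static string Key(string text)
    {
        return text.Normalize(NormalizationForm.FormC);
    }

    public void Add(string description, int rowId)
    {
        string key = Key(description);
        List<int> ids;
        if (!rows.TryGetValue(key, out ids))
        {
            ids = new List<int>();
            rows.Add(key, ids);
        }
        ids.Add(rowId);
    }

    public IList<int> Find(string searchTerm)
    {
        List<int> ids;
        return rows.TryGetValue(Key(searchTerm), out ids) ? ids : new List<int>();
    }

    public static void Main()
    {
        NormalizedIndex index = new NormalizedIndex();
        index.Add("H\u1EAFT", 1);        // inserted precomposed
        index.Add("H\u0103\u0301T", 2);  // inserted decomposed

        // Either spelling of the search term finds both rows.
        Console.WriteLine(index.Find("H\u0103\u0301T").Count);  // 2
    }
}
```

Normalizing once, at write time, keeps every lookup a plain binary comparison, which matters when raw search speed is the selling point. The trade-off is that the stored text no longer keeps the code units it was originally entered with.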
I went on to run the same tests against SQL Server 2000 and was quite surprised by the results. Selecting with either s1 or s2 returned both rows, yet each row came back with the binary representation it was inserted with. So SQL Server stores my data exactly as I inserted it, but its comparisons are smart enough to treat the two strings as equal when I query for them, even though their bytes differ. I found this post, which explains that SQL Server does "the right thing in any index you create on the column into which the insert is done". That led me to believe I was only seeing this behavior because I had an index on the column; however, I still saw the same behavior after dropping the index.
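For contrast with normalizing on write, here is a rough in-memory analogue of the behavior I saw from SQL Server: rows are kept exactly as inserted, and equivalence is decided at comparison time by normalizing both sides to Form C. This is only a sketch of the observable behavior under that assumption, not a description of how SQL Server works internally; NormalizingComparer and StoreAsIsDemo are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: strings are stored untouched, and two strings count as
// equal when their Form C normalizations are identical. This mimics the
// observed behavior only; it is not how SQL Server is implemented.
public sealed class NormalizingComparer : IEqualityComparer<string>
{
    private static string Norm(string s)
    {
        return s == null ? null : s.Normalize(NormalizationForm.FormC);
    }

    public bool Equals(string x, string y)
    {
        return string.Equals(Norm(x), Norm(y), StringComparison.Ordinal);
    }

    public int GetHashCode(string s)
    {
        // Equivalent strings must hash alike, so hash the normalized form.
        return s == null ? 0 : StringComparer.Ordinal.GetHashCode(Norm(s));
    }
}

public static class StoreAsIsDemo
{
    // Returns the stored values, original code units intact, that match the term.
    public static List<string> Select(List<string> table, string term)
    {
        NormalizingComparer comparer = new NormalizingComparer();
        List<string> result = new List<string>();
        foreach (string row in table)
        {
            if (comparer.Equals(row, term))
            {
                result.Add(row);
            }
        }
        return result;
    }

    public static void Main()
    {
        List<string> table = new List<string>();
        table.Add("H\u1EAFT");        // inserted precomposed (3 code units)
        table.Add("H\u0103\u0301T");  // inserted decomposed (4 code units)

        foreach (string row in Select(table, "H\u1EAFT"))
        {
            Console.WriteLine(row.Length);  // prints 3, then 4
        }
    }
}
```

The catch with comparing at query time is cost: every comparison normalizes both strings, so a real index would likely want to normalize its keys once when it is built.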
I now know more about Unicode than I really wanted to (hopefully you do too). The next time someone asks you if your app supports Unicode, just start talking until they say forget it.