Susan Mernit and Andrew Carvin are wondering about the compatibility of syndication technologies like RSS with non-English character sets, particularly those that are written right-to-left (והמבין יבין). The concern is that RSS might not support all the diverse languages across the globe, which might limit the possibilities for syndication in certain parts of the world.
The good news is, there's nothing to worry about. It's best to think in terms of representing character sets, not languages. Without getting too technical, a character (like the letter 'q') is really just a unique, previously agreed upon encoded value. The same goes for 'Д', '楽' or 'ש'. Files, RSS feeds included, are just large, ordered lists of characters, encoded according to some specification, like Unicode, which has pre-established mappings for just about every character set known to exist. Assuming all the various components (web server, XML files) are configured and produced in the proper fashion, there are no barriers to representing any of the world's character sets in XML. That means we can syndicate the words of anyone in the world in whatever character set we like.
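To make the "a character is just an agreed upon value" idea concrete, here is a minimal Python sketch. The one-item feed is made up for illustration and is not taken from any real site; the point is only that the same four characters survive a trip through UTF-8 encoded XML unchanged.

```python
import xml.etree.ElementTree as ET

# Each character is just an agreed-upon number (a Unicode code point),
# which an encoding such as UTF-8 turns into one or more bytes.
chars = ["q", "Д", "楽", "ש"]
for ch in chars:
    print(ch, hex(ord(ch)), ch.encode("utf-8"))

# A hypothetical one-item RSS feed mixing several scripts.
feed = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<rss version="2.0"><channel><title>Example</title>'
    '<item><title>q Д 楽 ש</title></item>'
    '</channel></rss>'
)

# Serialize to bytes (what a web server would send), then parse it back.
raw = feed.encode("utf-8")
root = ET.fromstring(raw)
title = root.find("./channel/item/title").text
print(title)
```

The parser never needs to know which languages are involved. It only needs the encoding declared in the XML to match the bytes actually sent.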
Displaying them is a different issue. US versions of Windows often require additional software to properly display non-English character sets, and when a site's HTML does not say which direction text should run, browsers and other client programs often have to guess how to display non-Latin characters.
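As a rough illustration of the kind of guessing involved, the sketch below picks a direction from the first strongly directional character in a string and writes it into an HTML `dir` attribute. The helper names `guess_direction` and `wrap` are invented for this example, and the heuristic is a simplification, not what any particular browser or aggregator actually does.

```python
import html
import unicodedata

def guess_direction(text):
    """Return 'rtl' or 'ltr' based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # Hebrew-style and Arabic-style letters
            return "rtl"
        if bidi == "L":           # Latin, Cyrillic, CJK and other left-to-right letters
            return "ltr"
    return "ltr"  # no strong character found; assume left-to-right

def wrap(text):
    """Wrap text in a paragraph that states its direction explicitly."""
    return '<p dir="{}">{}</p>'.format(guess_direction(text), html.escape(text))

print(wrap("והמבין יבין"))
print(wrap("Hello, world"))
```

A publisher who emits the `dir` attribute itself saves the client from having to guess at all.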
Perfect examples of this are the feeds produced by the BBC World Service, easily viewed in aggregate via the BBC World Service Blogdigger Group. There are feeds in Czech, Greek, Farsi, Turkish, Vietnamese, Arabic, Russian, Japanese, Chinese and more. Because Blogdigger converts and stores data as Unicode, we can aggregate them all into a single context, index them and search across them in any character set.
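The general idea of converting everything to Unicode before indexing can be sketched in a few lines of Python. This is not Blogdigger's actual code; the headlines, the encodings chosen for them and the `search` helper are all assumptions made up for the example.

```python
# Hypothetical headlines as raw bytes, each paired with its declared encoding.
raw_items = [
    ("Новости дня".encode("koi8_r"), "koi8_r"),      # Russian
    ("Ειδήσεις".encode("iso8859_7"), "iso8859_7"),   # Greek
    ("ニュース".encode("shift_jis"), "shift_jis"),    # Japanese
    ("أخبار".encode("utf-8"), "utf-8"),              # Arabic
]

# Decode everything to Unicode strings once, at the door.
index = [data.decode(enc) for data, enc in raw_items]

def search(term):
    """Return every stored headline that contains the search term."""
    return [text for text in index if term in text]

print(search("Новости"))
print(search("ニュース"))
```

Once every item is a Unicode string, a single index and a single search function cover all of the feeds, regardless of the encoding each one arrived in.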
The end result: anyone, anywhere can put their thoughts out for syndication, whether for consumption by individuals or by services like Blogdigger, helping the individual voice, no matter what language it speaks, be heard worldwide.