Here's an example of a problem search: http://www.labourstart.org/cgi-bin/newsquery1.pl?searchtext=Norfolier&language=no&number=100
It finds 3 news stories in Norwegian containing the term 'Norfolier'. One is recent and is encoded in Unicode (UTF-8). The other two are much older and use the older character encoding (iso-8859-1). If you display the search results page as Unicode, the two older stories appear corrupted. If you display it as iso-8859-1, the recent story has corrupted characters.
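To make the mismatch concrete, here is a minimal sketch in Python (our own code is Perl; Python is used here purely for illustration). The sample character is an assumption chosen for the example, not text taken from the stories themselves.

```python
# Illustration only: one Norwegian character stored in each of the two encodings.
sample = "ø"  # hypothetical sample character

utf8_bytes = sample.encode("utf-8")          # how a recent story stores it
latin1_bytes = sample.encode("iso-8859-1")   # how an older story stores it

# A UTF-8 story shown on an iso-8859-1 page: each byte becomes its own character.
shown_as_latin1 = utf8_bytes.decode("iso-8859-1")

# An iso-8859-1 story shown on a UTF-8 page: the byte is not valid UTF-8,
# so it is replaced with the Unicode replacement character.
shown_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(repr(shown_as_latin1))
print(repr(shown_as_utf8))
```

Whichever encoding the page declares, at least one of the stories is decoded with the wrong one, which is why no single page setting fixes all three results.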
The solution is a bit of Perl code that converts the text to Unicode, but it wasn't working. I've now figured it out, and the search results appear as they should, all in Unicode.
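The general idea behind this kind of conversion can be sketched roughly as follows. This is not the Perl code we actually run; it is a Python illustration that assumes every story is either valid UTF-8 or iso-8859-1. The function name to_unicode and the sample words are made up for the sketch.

```python
def to_unicode(raw: bytes) -> str:
    """Decode a story that may be UTF-8 or iso-8859-1 (assumed to be the only two cases)."""
    try:
        # Strict UTF-8 decoding fails on most iso-8859-1 text with accented letters.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte sequence is valid iso-8859-1, so this fallback never raises.
        return raw.decode("iso-8859-1")


# Hypothetical search results: one recent story and one older one, as raw bytes.
stories = [
    "blåbær".encode("utf-8"),
    "blåbær".encode("iso-8859-1"),
]

# Decode each story, then write the whole results page out as UTF-8.
page = "\n".join(to_unicode(raw) for raw in stories)
page_bytes = page.encode("utf-8")
print(page)
```

Trying UTF-8 first matters: plain ASCII is identical in both encodings, and iso-8859-1 text containing accented letters is very unlikely to form valid UTF-8 by accident, so the fallback only triggers for the older stories.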
The problem will go away once we convert our archive to Unicode, but that's not on the immediate to-do list.
The good news here is that by solving this problem, we open the way to solving it for the labour newswires, which are experiencing the same issue.
This is one of the issues posted on our Techblog, and though we had two responses, neither was helpful. Obviously, at least in the labour movement, we're doing some pioneering things here with multilingualism, and I guess that not a lot of union webmasters have come across these kinds of problems.