Python script works in terminal but not in vi?
Monday, December 29, 2008 11:09:10 AM
First of all, let me show a sample script (tested with Python 2.5.2):
#!/bin/env python
x = u'almo\xe7o'
print x

When I run it in a terminal, I get the expected output:

almoço

But when I run it with the :read!test1.py command inside Vim, I get this error:

Traceback (most recent call last):
  File "./test1.py", line 3, in <module>
    print x
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 4: ordinal not in range(128)
So I did some tests with other scripts, like this:
#!/bin/env python
import sys
x = u'almo\xe7o'
sys.stdout.write(x)

or this:

#!/bin/env python
x = u'almo\xe7o'
y = str(x)
print y
All of them failed. However, these two scripts failed not only inside Vim, but also at the terminal. It is also worth noting that the last one failed on the str(x) call. This was a hint that the problem does not actually lie in the print statement, but in the conversion that happens before printing.
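To see the conversion fail on its own, with no printing involved, here is a minimal sketch. It assumes that the implicit conversion in Python 2 behaves like an explicit encode to ASCII, and it uses the explicit call so that it also runs on Python 3, where str(x) no longer encodes anything. The helper name try_ascii is made up for this illustration.

```python
x = u'almo\xe7o'  # the same text as in the scripts above


def try_ascii(text):
    """Hypothetical helper: encode text as ASCII and report what happens."""
    try:
        return text.encode('ascii')
    except UnicodeEncodeError as error:
        # error.start is the index of the first character that does not fit
        return 'failed at position %d' % error.start


print(try_ascii(u'almoco'))  # plain ASCII text encodes fine
print(try_ascii(x))          # the c-cedilla cannot be represented in ASCII
```

The position reported for x matches the "position 4" in the traceback, which is where the ç sits.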
The answer was found at some forum topic:
if you write a unicode-object (wich is an abstract data
type with no byte representation), it has to be converted to a string -
which you have to do either explicit. Or if you don't do it - it ill be
done automatically, using the system default encoding. Which is ascii,
most of the time.
That explains a lot. The unicode string object is abstract, in the sense that you need to pick an encoding before you can write it anywhere. It makes a lot of sense.
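A small sketch of what "abstract" means in practice: the same text turns into different bytes depending on the encoding you pick. The variable names are mine, and the two codecs are just common examples.

```python
x = u'almo\xe7o'  # abstract text: six characters, no bytes yet

# Each encoding turns the same text into a different byte sequence.
as_utf8 = x.encode('utf-8')      # the c-cedilla becomes two bytes: 0xC3 0xA7
as_latin1 = x.encode('latin-1')  # the c-cedilla becomes one byte: 0xE7

print(len(x), len(as_utf8), len(as_latin1))

# Decoding with the matching codec gives back the original text.
assert as_utf8.decode('utf-8') == x
assert as_latin1.decode('latin-1') == x
```

So there is no single "right" byte form of the text; whoever writes it out has to choose one.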
So, the solution is easy:
x = x.encode("utf-8")
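Put together with the first script, the fix could look like the sketch below. On Python 2, which this post targets, encoding x and then printing it is enough, because sys.stdout accepts byte strings there. The sketch writes to an in-memory byte stream through a hypothetical helper, write_utf8, so that it runs the same way on Python 3; the io.BytesIO stand-in is my assumption for the demonstration.

```python
import io

x = u'almo\xe7o'


def write_utf8(byte_stream, text):
    """Hypothetical helper: encode explicitly, then write raw bytes."""
    byte_stream.write(text.encode('utf-8') + b'\n')


# An in-memory byte stream stands in for the real output stream.
fake_stdout = io.BytesIO()
write_utf8(fake_stdout, x)
print(repr(fake_stdout.getvalue()))
```

In a real script the stream would be sys.stdout on Python 2, or sys.stdout.buffer on Python 3.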
Question: Why does the print statement work inside the terminal?
Answer: It appears to do this conversion automatically, trying to detect the terminal's encoding and encoding the output before printing.
Question: How does print automatically encode strings? How does it work?
Answer: I don't know. I would gladly accept answers for this. If you know, leave a comment. Even more important, if you know where it is documented (in the official documentation), please point it out!
Question: Why is the behavior inside Vim different from the behavior in the terminal?
Answer: This *might* be related to locale settings. In my terminal, the locale is set to LANG=en_US.UTF-8. If, on the other hand, I set LC_ALL=C, then the first script also fails at the terminal. This means that the locale is relevant to this automatic conversion.
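A quick way to compare the two environments is to print what Python reports about its output stream in each of them. The sketch below is my own addition, not something from the forum topic. It assumes the relevant facts are whether stdout is a terminal, which encoding (if any) is attached to it, and what the locale reports. On Python 2, sys.stdout.encoding is None when the output goes to a pipe or a file rather than a terminal, which is the situation under :read! in Vim, so that may be what makes the difference there. The function name describe_stream is made up.

```python
import locale
import sys


def describe_stream(stream):
    """Hypothetical helper: collect the facts that may affect encoding."""
    return {
        'isatty': stream.isatty(),
        'encoding': getattr(stream, 'encoding', None),
        'locale_encoding': locale.getpreferredencoding(),
    }


# The report itself is plain ASCII, so printing it is safe everywhere.
info = describe_stream(sys.stdout)
print(info)
```

Running this once in the terminal, once through :read! in Vim, and once with LC_ALL=C should show which of the three values changes between the cases.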
That's all, folks! It took me a few hours to discover all of this, and I hope this information is helpful to other people. See ya!
Update on 2009-04-03: A friend had a similar issue, and his solution was:
The solution was DAMN simple, you might be disappointed. Just replaced the default sys.stdout with a custom one that is always UTF-8 (no matter what happens):
import codecs, sys
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
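To see what this wrapper does without touching the real sys.stdout, the sketch below wraps an in-memory byte stream instead. codecs.getwriter('utf-8') returns a stream writer class; instantiating it around a byte stream gives an object whose write() accepts text and passes UTF-8 bytes on to the underlying stream. Using io.BytesIO as a stand-in is my choice for the demonstration.

```python
import codecs
import io

raw = io.BytesIO()                       # stand-in for a byte-oriented stdout
writer = codecs.getwriter('utf-8')(raw)  # same kind of wrapping as in the update

writer.write(u'almo\xe7o\n')             # text goes in ...
print(repr(raw.getvalue()))              # ... UTF-8 bytes come out
```

Note that on Python 3 the object to wrap would be sys.stdout.buffer rather than sys.stdout, since sys.stdout already expects text there.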