Sofeng's Blog

Notes on Python, Django, and web development on Ubuntu Linux

    

Python UnicodeEncodeError: 'ascii' codec can't encode character

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' 
in position 0: ordinal not in range(128)

If you've ever gotten this error, Django's smart_str function might be able to help. I found it in James Bennett's article, Unicode in the real world. He gives a very good explanation of Python's Unicode strings and bytestrings, their use in Django, and Django's Unicode utilities for working with non-Unicode-friendly Python libraries. Here are my notes from his article as they apply to the above error. Much of the wording is directly from James Bennett's article.

This error occurs when you pass a Unicode string containing non-ASCII characters (code points 128 and above) to something that expects an ASCII bytestring. In Python 2, the default encoding used to convert a Unicode string to a bytestring is ASCII, "which handles exactly 128 (English) characters". This is why trying to convert a character such as u'\xa1' (code point 161) produces the error.
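To make the cause concrete, here is a minimal sketch that triggers the same failure by encoding to ASCII explicitly instead of relying on the implicit conversion. The helper name try_ascii is my own, made up for illustration, and the snippet is written so that it should run on both Python 2.6+ and Python 3.3+.

```python
# Sketch: reproduce the failure by asking for ASCII explicitly.
# try_ascii is a hypothetical helper, not part of Python or Django.

def try_ascii(value):
    """Return the ASCII bytes for value, or None if it cannot be encoded."""
    try:
        return value.encode('ascii')
    except UnicodeEncodeError:
        # Raised when any code point in value is 128 or above.
        return None

plain = try_ascii(u'abc')       # every code point is below 128
inverted = try_ascii(u'\xa1')   # code point 161 is outside ASCII

print(plain is not None)   # True
print(inverted is None)    # True
```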

The good news is that you can encode Unicode strings into bytestrings using encodings other than ASCII. Django's smart_str function, in the django.utils.encoding module, converts a Unicode string to a bytestring using a default encoding of UTF-8.
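For readers without Django at hand, the conversion described above can be approximated with the built-in encode method. This is only a sketch of the idea, not Django's implementation: to_utf8_bytes is a hypothetical name, and it handles just the two cases shown (text and bytes).

```python
# Sketch of the idea only; to_utf8_bytes is a hypothetical helper,
# not Django's smart_str.

def to_utf8_bytes(value, encoding='utf-8'):
    """Encode a Unicode string to bytes; pass existing bytestrings through."""
    if isinstance(value, bytes):
        # Already a bytestring (this is str on Python 2), so leave it alone.
        return value
    return value.encode(encoding)

encoded = to_utf8_bytes(u'\xa1')

# U+00A1 takes two bytes in UTF-8: 0xC2 0xA1.
print(encoded == b'\xc2\xa1')  # True
```

Passing a bytestring through unchanged avoids a second, implicit ASCII round trip, which is where the original error tends to come from.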

Here is an example using the built-in function, str:

a = u'\xa1'
print str(a) # this throws an exception

Results:

Traceback (most recent call last):
  File "unicode_ex.py", line 3, in <module>
    print str(a) # this throws an exception
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' in position 0: ordinal not in range(128)

Here is an example using smart_str:

from django.utils.encoding import smart_str

a = u'\xa1'
print smart_str(a)

Results:

¡

Definitions

  • Unicode string: sequence of Unicode characters
  • Python bytestring: a series of bytes which represent a sequence of characters. Its default encoding is ASCII. This is the "normal", non-Unicode string in Python <3.0.
  • encoding: a code that pairs a sequence of characters with a series of bytes
  • ASCII: an encoding which handles 128 characters (English letters, digits, punctuation, and control characters)
  • UTF-8: a popular encoding used for Unicode strings which is backwards compatible with ASCII for the first 128 characters. It uses one to four bytes for each character.
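The UTF-8 definition can be checked with a short sketch. The sample characters are my own choice for illustration, picked so that each needs a different number of bytes; the snippet should run on both Python 2.6+ and Python 3.3+.

```python
# Sketch: UTF-8 uses one to four bytes per character and matches
# ASCII for the first 128 characters. Sample characters are arbitrary.

samples = [
    u'A',            # U+0041, inside the ASCII range
    u'\xa1',         # U+00A1, inverted exclamation mark
    u'\u20ac',       # U+20AC, euro sign
    u'\U0001F600',   # U+1F600, outside the Basic Multilingual Plane
]

byte_lengths = [len(ch.encode('utf-8')) for ch in samples]
print(byte_lengths)  # [1, 2, 3, 4]

# For pure-ASCII text, the UTF-8 bytes and the ASCII bytes are identical.
same_bytes = u'abc'.encode('utf-8') == u'abc'.encode('ascii')
print(same_bytes)  # True
```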

2 Comments


#1 Arthur Buliva commented, on November 20, 2008 at 9:19 a.m.:

A simpler way to do this is:

print unicode(u'\xa1').encode("utf-8")


#2 sofeng commented, on November 21, 2008 at 9:08 a.m.:

Arthur, thanks for the tip. I'm not sure what differences the Django utility functions have. I will have to look into this further. For other readers, here is the documentation for encode: http://www.python.org/doc/2.5.2/lib/string-methods.html
