[albatross-users] A better page system

Andrew McNamara andrewm at object-craft.com.au
Thu Jun 5 13:00:38 EST 2003


>Andrew> Put another way, if your application only uses characters in
>Andrew> the range 0-127, the unicode version of albatross works
>Andrew> identically to the old version. However, if you are using
>Andrew> foreign character sets (accented characters, etc) with
>Andrew> characters in the range 128-255, your application will need to
>Andrew> be changed (but the results are much cleaner).
>
>You probably need to provide an example here.

Difficult to do without scaring people off... 8-)

My first response when I encountered all the grief that unicode appears
to create was "bugger this for a joke". But the reality is that unicode
isn't the problem - the problem is the mess that existed before.

Whereas you might previously have got away with outputting a variable
containing an accented character (that's an umlaut-a if this doesn't make
it through e-mail):

    >>> import albatross 
    >>> ctx = albatross.SimpleContext('.')
    >>> ctx.locals.name = 'Häring'
    >>> albatross.Template(ctx, '<magic>', '''<al-value expr="name">''').to_html(ctx)
    >>> ctx.flush_content()
    Häring

Now you will get a traceback:

    >>> import albatross 
    >>> ctx = albatross.SimpleContext('.')
    >>> ctx.locals.name = 'Häring'
    >>> albatross.Template(ctx, '<magic>', '''<al-value expr="name">''').to_html(ctx)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/usr/local/lib/python2.3/site-packages/albatross/template.py", line 358, in to_html
        self.content.to_html(ctx)
      File "/usr/local/lib/python2.3/site-packages/albatross/template.py", line 152, in to_html
        item.to_html(ctx)
      File "/usr/local/lib/python2.3/site-packages/albatross/tags.py", line 1038, in to_html
        ctx.write_content(escape(value))
      File "/usr/local/lib/python2.3/site-packages/albatross/tags.py", line 20, in escape
        text = unicode(text)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

This is because Python assumes that all standard (byte) strings use the
'ascii' character set, rather than the browser-default iso-8859-1
character set. So when it converts a standard string to a unicode string,
it uses the 'ascii' codec rather than the 'iso-8859-1' codec, and any byte
above 127 makes the conversion fail.
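To make the codec difference concrete, here is a small Python 3 sketch (the
sessions above are Python 2.3, where the str-to-unicode conversion happened
implicitly; Python 3 makes you call decode() yourself). The helper name
try_decode is made up for illustration. As an aside, the byte 0xc3 in the
traceback is the first byte of 'ä' in UTF-8, so that session was apparently
typed in a UTF-8 terminal; in iso-8859-1 'ä' is the single byte 0xe4. Either
way the 'ascii' codec rejects it, because it only accepts bytes 0-127.

```python
# Python 3 sketch: the same bytes succeed or fail depending on the codec.

raw = 'Häring'.encode('iso-8859-1')  # 'ä' becomes the single byte 0xe4


def try_decode(data, codec):
    """Hypothetical helper: return decoded text, or None if the codec rejects it."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


print(try_decode(raw, 'ascii'))       # None: 0xe4 is not in range(128)
print(try_decode(raw, 'iso-8859-1'))  # Häring

# For bytes 0-127 both codecs agree, which is why applications that only
# use those characters see no change.
low = bytes(range(128))
print(low.decode('ascii') == low.decode('iso-8859-1'))  # True
```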

The "right" answer, if you are dealing with international character sets,
is probably to work with unicode throughout:

    >>> ctx.locals.name = u'Häring'
    >>> albatross.Template(ctx, '<magic>', '''<al-value expr="name">''').to_html(ctx)
    >>> ctx.flush_content()
    Häring
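A rough Python 3 sketch of the "unicode throughout" idea: keep values as
text inside the application and, where byte strings can't be avoided,
decode them with a named codec before escaping, rather than relying on an
implicit 'ascii' default. to_text and render_value are hypothetical names
for illustration only; they are not part of Albatross, whose own escaping
lives in albatross/tags.py as the traceback shows.

```python
import html


def to_text(value, encoding='iso-8859-1'):
    """Hypothetical helper: return value as str.

    Byte strings are decoded with an explicit codec; anything else is
    converted with str().
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


def render_value(value, encoding='iso-8859-1'):
    """Hypothetical stand-in for an <al-value>-style tag: to text, then escape."""
    return html.escape(to_text(value, encoding), quote=True)


print(render_value('Häring'))                       # Häring
print(render_value('Häring'.encode('iso-8859-1')))  # Häring
print(render_value('<b>&</b>'))                     # &lt;b&gt;&amp;&lt;/b&gt;
```

The point of the design is that the codec choice is made once, at the
boundary, instead of being guessed every time a value is rendered.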
    
When you accept strings from other systems, you probably need to decode
them explicitly:

    >>> import sys
    >>> ctx.locals.name = sys.stdin.readline().decode('iso-8859-1')
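In Python 3 the same explicit decode applies, but sys.stdin is already a
text stream, so the raw bytes come from sys.stdin.buffer instead. A minimal
sketch, assuming the external system sends iso-8859-1; read_line_as_text is
a hypothetical name, and io.BytesIO stands in for the real input so the
example runs on its own:

```python
import io


def read_line_as_text(stream, encoding='iso-8859-1'):
    """Hypothetical helper: read one line of bytes and decode it explicitly.

    `stream` is any binary file-like object, e.g. sys.stdin.buffer or an
    io.BytesIO. The trailing newline is stripped for convenience.
    """
    return stream.readline().decode(encoding).rstrip('\r\n')


# Simulate an external system that sends iso-8859-1 bytes.
incoming = io.BytesIO('Häring\n'.encode('iso-8859-1'))
print(read_line_as_text(incoming))  # Häring
```

If a whole stream uses one encoding, wrapping it once with
io.TextIOWrapper(stream, encoding='iso-8859-1') does the same job without
decoding each line by hand.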

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/


