PEP:
Title: Traceback internationalization
Version: $Revision$
Last-Modified: $Date$
Author: Mariano Reingart <reingart EN gmail PUNTO com>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 15-May-2010
Post-History:
Abstract
The idea is to provide a standard mechanism to translate exception
and traceback messages to languages other than English.
To not reinvent the wheel, this proposal is based on i18n (gettext),
to ease translation and colaboration between language communities,
use compatible tools that already exists, favor correctness with
online review, and has proven to be useful in other projects with
similar goals (like error messages in PostgreSQL).
Also, a method to control LC_MESSAGES to get back to English must
be provided, as some code depends on exception messages (i.e.
doctests), preferably changeable at runtime, so this libraries are
self-aware of this issues and no manual intervention is required.
Rationale
English isn't the first language of some users (developers), so
they may not be confortable with messages in that language [1]
(and some don't understand it at all). Ask them to learn a
second language (mostly foreign) to use Python is, at least,
not practical.
This is specially an issue when Python is used as a First
Programming Language for teaching to non-English speakers in
almost any educational level, even worse in other areas not
related directly with computer sciences.
Although a workaround may be developed (ie. a wiki page with
translated errors, as we did in Argentina [2]), that users are
often blocked when they found an English message, losing they
concentration on their work, having to waste time finding the
translation (if it exists) or asking to the teacher or mailing
lists.
In the other side, advanced users often prefer original messages
in English, and they are required in some scenarios like bug
reporting or doctests, so a method must be provided to change the
messages language (preferably at runtime).
Using standards tools for i18n (gettext) will ease translation
providing a common framework that already is prepared for
different language rules, with colaborative online applications
like Pootle[3] to automate translation and review process, tending
to a high quality result.
Other projects have chose this way some time ago, citing PostgreSQL
as an example[4], where, at runtime, you can choose error messages
language using LC_MESSAGES[5]. Indeed, we are using Pootle and
other tools to translate PostgreSQL related projects to Spanish in
a collaborative way [6].
Finally, this proposal will address current misbehavior with
locale.LC_MESSAGES category (according the Python Standard Library
Documentation of locale module) [7]:
Locale category for message display. Python currently does not
support application specific locale-aware messages. Messages
displayed by the operating system, like those returned by
os.strerror() might be affected by this category.
Usage
Setting the desired locale (ie. 'es_AR') in LC_MESSAGES category
will enable internationalization of tracebacks and exceptions, and
setting 'C' locale will get back to untranslated original messages:
Examples:
Python 3.4.0a0 (default:8f0d5ecca524+, Oct 28 2012, 00:46:34)
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 1/0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
>>> import locale
>>> locale.setlocale(locale.LC_MESSAGES,'es_AR.utf8')
'es_AR.utf8'
>>> 1/0
Traza de rastreo (llamada más reciente última):
Archivo "<stdin>", lÃnea 1, en <module>
ZeroDivisionError: división por cero
>>> locale.setlocale(locale.LC_MESSAGES,'C')
'C'
>>> 1/0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
By default, LC_MESSAGES should be 'C' locale, to prevent any
misunderstanding.
The user that needs translated messages could easily add a
line or setting LC_MESSAGES in his desired language:
import locale; locale.setlocale(locale.LC_MESSAGES,'es_AR.utf8')
Caveats
Internationalization uses UTF-8 to be able to handle special
characters like accents. This should not be a problem in Python 3
but some functions may be revised like PyUnicode_FromFormatV() [9]
Special care must be taken with positional placeholders like in:
"name '%.200s' is not defined". If there is more than one
placeholder, using printf special format specifiers (ie. %2$s %1$s)
or an alternate string formatting system should be required
in order to allow to change their position in the string (this may
be required by some languages rules in some contexts).
Reference Implementation
A proof of concept is attached to issue #16344 [10] for Python 3.3+
Original -obsolete- version (for python 2.x) can be downloaded from
Python Argentina Wiki [8]
It defines a Py_GETTEXT macro that is called from PyErr_SetString
and PyErr_Format (errors.c) and tb_displayline, PyTraceBack_Print
(traceback.c).
A new subdirectory called Locale stores localized message files,
but this could be installed in a standard system directory (i.e.
/usr/share/locale) as a special domain called "python" is used to
not interfere with python modules / libraries / packages already
using gettext.
Some steps are required to set up internationalization correctly:
1. locale.bind_textdomain_codeset("python", "utf8") should be
called in pythonrun.c to initialize encoding (preventing nested
unicode exceptions if internationalization is not correctly)
2. locale.bindtextdomain("python", sysconfig._safe_realpath("Locale"))
should be called in site.py to specify the locale directory
(not needed if a standard directory is used, this would be
platform dependent)
3. locale.setlocale(locale.LC_MESSAGES,'es_AR.utf8') should be
executed by the end user to finally enable internationalization
Although it is just a proof of concept, final version shouldn't be
much different than this, as internationalization points are
well-known so just 2 C files were modified.
In order to keep the change small, and in order to not bother other
developers with new special issues, this approach needs a custom tool
for messages recollection from source files, similar to pygettext.py,
but scanning C files for PyErr_Format or PyErr_SetString messages.
Looking for messages in .py files would be a little more difficult,
as it would have to look where exceptions are raised.
None of both tools were developed for this draft.
References
[1] http://wiki.python.org/moin/BeginnersGuide
[2] http://python.org.ar/pyar/MensajesExcepcionales
[3] http://translate.sourceforge.net/wiki/pootle/index
[4] http://www.postgresql.org/docs/8.2/static/nls-translator.html
[5] http://www.postgresql.org/docs/8.2/static/locale.html
[6] http://pootle.arpug.com.ar/pootle
[7] http://docs.python.org/library/locale.html
[8] http://python.org.ar/pyar/TracebackInternationalizationProposal?action=AttachFile&do=view&target=python_traceback_i18n_proof_of_concept.diff
[9] http://bugs.python.org/issue16343
[10] http://bugs.python.org/issue16344
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
Attachment moin wiki code: