PEP: Title: Traceback internationalization Version: $Revision$ Last-Modified: $Date$ Author: Mariano Reingart <reingart EN gmail PUNTO com> Status: Draft Type: Standards Track Content-Type: text/plain Created: 15-May-2010 Post-History: Abstract The idea is to provide a standard mechanism to translate exception and traceback messages to languages other than English. To not reinvent the wheel, this proposal is based on i18n (gettext), to ease translation and colaboration between language communities, use compatible tools that already exists, favor correctness with online review, and has proven to be useful in other projects with similar goals (like error messages in PostgreSQL). Also, a method to control LC_MESSAGES to get back to English must be provided, as some code depends on exception messages (i.e. doctests), preferably changeable at runtime, so this libraries are self-aware of this issues and no manual intervention is required. Rationale English isn't the first language of some users (developers), so they may not be confortable with messages in that language [1] (and some don't understand it at all). Ask them to learn a second language (mostly foreign) to use Python is, at least, not practical. This is specially an issue when Python is used as a First Programming Language for teaching to non-English speakers in almost any educational level, even worse in other areas not related directly with computer sciences. Although a workaround may be developed (ie. a wiki page with translated errors, as we did in Argentina [2]), that users are often blocked when they found an English message, losing they concentration on their work, having to waste time finding the translation (if it exists) or asking to the teacher or mailing lists. In the other side, advanced users often prefer original messages in English, and they are required in some scenarios like bug reporting or doctests, so a method must be provided to change the messages language (preferably at runtime). Using standards tools for i18n (gettext) will ease translation providing a common framework that already is prepared for different language rules, with colaborative online applications like Pootle[3] to automate translation and review process, tending to a high quality result. Other projects have chose this way some time ago, citing PostgreSQL as an example[4], where, at runtime, you can choose error messages language using LC_MESSAGES[5]. Indeed, we are using Pootle and other tools to translate PostgreSQL related projects to Spanish in a collaborative way [6]. Finally, this proposal will address current misbehavior with locale.LC_MESSAGES category (according the Python Standard Library Documentation of locale module) [7]: Locale category for message display. Python currently does not support application specific locale-aware messages. Messages displayed by the operating system, like those returned by os.strerror() might be affected by this category. Usage Setting the desired locale (ie. 'es_AR') in LC_MESSAGES category will enable internationalization of tracebacks and exceptions, and setting 'C' locale will get back to untranslated original messages: Examples: Python 3.4.0a0 (default:8f0d5ecca524+, Oct 28 2012, 00:46:34) [GCC 4.6.3] on linux Type "help", "copyright", "credits" or "license" for more information. >>> 1/0 Traceback (most recent call last): File "<stdin>", line 1, in <module> ZeroDivisionError: division by zero >>> import locale >>> locale.setlocale(locale.LC_MESSAGES,'es_AR.utf8') 'es_AR.utf8' >>> 1/0 Traza de rastreo (llamada más reciente última): Archivo "<stdin>", línea 1, en <module> ZeroDivisionError: división por cero >>> locale.setlocale(locale.LC_MESSAGES,'C') 'C' >>> 1/0 Traceback (most recent call last): File "<stdin>", line 1, in <module> ZeroDivisionError: division by zero By default, LC_MESSAGES should be 'C' locale, to prevent any misunderstanding. The user that needs translated messages could easily add a line or setting LC_MESSAGES in his desired language: import locale; locale.setlocale(locale.LC_MESSAGES,'es_AR.utf8') Caveats Internationalization uses UTF-8 to be able to handle special characters like accents. This should not be a problem in Python 3 but some functions may be revised like PyUnicode_FromFormatV() [9] Special care must be taken with positional placeholders like in: "name '%.200s' is not defined". If there is more than one placeholder, using printf special format specifiers (ie. %2$s %1$s) or an alternate string formatting system should be required in order to allow to change their position in the string (this may be required by some languages rules in some contexts). Reference Implementation A proof of concept is attached to issue #16344 [10] for Python 3.3+ Original -obsolete- version (for python 2.x) can be downloaded from Python Argentina Wiki [8] It defines a Py_GETTEXT macro that is called from PyErr_SetString and PyErr_Format (errors.c) and tb_displayline, PyTraceBack_Print (traceback.c). A new subdirectory called Locale stores localized message files, but this could be installed in a standard system directory (i.e. /usr/share/locale) as a special domain called "python" is used to not interfere with python modules / libraries / packages already using gettext. Some steps are required to set up internationalization correctly: 1. locale.bind_textdomain_codeset("python", "utf8") should be called in pythonrun.c to initialize encoding (preventing nested unicode exceptions if internationalization is not correctly) 2. locale.bindtextdomain("python", sysconfig._safe_realpath("Locale")) should be called in site.py to specify the locale directory (not needed if a standard directory is used, this would be platform dependent) 3. locale.setlocale(locale.LC_MESSAGES,'es_AR.utf8') should be executed by the end user to finally enable internationalization Although it is just a proof of concept, final version shouldn't be much different than this, as internationalization points are well-known so just 2 C files were modified. In order to keep the change small, and in order to not bother other developers with new special issues, this approach needs a custom tool for messages recollection from source files, similar to pygettext.py, but scanning C files for PyErr_Format or PyErr_SetString messages. Looking for messages in .py files would be a little more difficult, as it would have to look where exceptions are raised. None of both tools were developed for this draft. References [1] http://wiki.python.org/moin/BeginnersGuide [2] http://python.org.ar/pyar/MensajesExcepcionales [3] http://translate.sourceforge.net/wiki/pootle/index [4] http://www.postgresql.org/docs/8.2/static/nls-translator.html [5] http://www.postgresql.org/docs/8.2/static/locale.html [6] http://pootle.arpug.com.ar/pootle [7] http://docs.python.org/library/locale.html [8] http://python.org.ar/pyar/TracebackInternationalizationProposal?action=AttachFile&do=view&target=python_traceback_i18n_proof_of_concept.diff [9] http://bugs.python.org/issue16343 [10] http://bugs.python.org/issue16344 Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End:
Attachment moin wiki code: