Department of Engineering

IT Services

Internationalization

Internationalization (also spelled "Internationalisation", abbreviated to "i18n" because there are 18 letters between the first "i" and the last "n", and sometimes called "NLS", Native Language Support) partly concerns support for other natural languages. This document steps through how to support more than one natural language in Python programs and in C++ programs that use wxWidgets on Linux.

Python

"Internationalizing your programs and modules" and "How to Translate Python Applications with the GNU gettext Module" explain the process. The stages below work on our local Linux system.

In a new folder, create a file called demo.py containing

import gettext
gettext.install('demo','.')
message1 = _('APPLES')
message2='PEARS'
print(message1,"and",message2)
with open(message1, 'w') as fp:
    fp.write(message1)

If you run it, it will print "APPLES and PEARS", and create a file called "APPLES" containing the word "APPLES". The gettext.install line will look for language files appropriate to this program, and it will look in the current folder for them ('.' is unix shorthand for the current folder). At the moment there are no language files. We'll create them soon.
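
The fallback behaviour described above (no language files, so the original strings are used) can be checked in isolation. The following is a minimal sketch, assuming an empty temporary folder in place of the project folder; the helper name load_translator is made up for this illustration.

```python
import gettext
import tempfile


def load_translator(domain, localedir):
    """Return a gettext function for `domain`, falling back to the original strings."""
    # fallback=True gives back a do-nothing translation object when no .mo file
    # is found, so untranslated strings pass through unchanged.
    translation = gettext.translation(domain, localedir=localedir, fallback=True)
    return translation.gettext


# An empty temporary folder stands in for a project with no language files yet.
with tempfile.TemporaryDirectory() as empty_dir:
    _ = load_translator("demo", empty_dir)
    print(_("APPLES"), "and", "PEARS")
```

Unlike gettext.install, this keeps the translation function in a local name rather than installing _ globally, which can be easier to follow in larger programs.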

The program uses 2 strings - message1 and message2. Note that 'APPLES' is enclosed in _(...) but 'PEARS' isn't. This will make only 'APPLES' language-dependent. Now run

xgettext --from-code=utf-8 -o demo.po demo.py

This extracts all the strings marked for translation from "demo.py", and puts them in a file called "demo.po". You can use a program called "poedit" to add the translations to this file, but any text editor (e.g. gedit) will also work. In the file you'll see (amongst other things)

msgid "APPLES"
msgstr ""
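
As a rough illustration of why only "APPLES" ends up in the file, the sketch below collects the literal strings passed to _() in a piece of Python source. It uses Python's ast module and is not how xgettext itself is implemented; the names marked_strings and DEMO_SOURCE are invented for this example.

```python
import ast


def marked_strings(source):
    """Return the literal strings passed to _() in Python source text."""
    found = []
    for node in ast.walk(ast.parse(source)):
        is_underscore_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "_"
        )
        if is_underscore_call and node.args:
            first = node.args[0]
            # Only plain string literals count; variables and expressions are skipped.
            if isinstance(first, ast.Constant) and isinstance(first.value, str):
                found.append(first.value)
    return found


# A shortened copy of demo.py, used only as input text here.
DEMO_SOURCE = """
import gettext
gettext.install('demo', '.')
message1 = _('APPLES')
message2 = 'PEARS'
"""

print(marked_strings(DEMO_SOURCE))
```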

msgstr is going to be the translation of "APPLES". Let's set it to be "BANANAS". Save the file. Now run

msgfmt.py demo.po

to create a machine-readable file called "demo.mo". This file needs to go in a place applicable to the language you want to choose. If you type

    echo $LANG

you'll find out what your default language is. Mine is "en_US.UTF-8" so I'm going to choose that. Create a folder called "en_US.UTF-8", and inside that create a folder called "LC_MESSAGES". Copy the "demo.mo" file into "en_US.UTF-8/LC_MESSAGES".
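
The folder layout is always <language>/LC_MESSAGES/<program name>.mo. If you'd rather script the copy than do it by hand, something like the following sketch should work; install_catalog is a hypothetical helper, not part of gettext.

```python
from pathlib import Path
import shutil


def install_catalog(mo_file, language, domain, localedir="."):
    """Copy a compiled catalog to <localedir>/<language>/LC_MESSAGES/<domain>.mo."""
    target_dir = Path(localedir) / language / "LC_MESSAGES"
    # Create the language folder and LC_MESSAGES inside it if they are missing.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{domain}.mo"
    shutil.copyfile(mo_file, target)
    return target


# Example call for the demo program (run in the folder containing demo.mo):
# install_catalog("demo.mo", "en_US.UTF-8", "demo")
```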

Now if you run your program it will seek language strings in a file appropriate to your language and the name of your program. Because gettext.install was given '.', it looks under the current folder; without that argument it would look in Python's default system folder instead (typically /usr/share/locale/, where you'll see system folders for many languages). It should find the file you copied, print "BANANAS and PEARS" and create a file called BANANAS containing the text BANANAS.
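
If the translation doesn't appear, it can help to ask Python which file it would pick. The sketch below uses gettext.find for that, with a temporary folder and an empty placeholder file standing in for a real catalog (gettext.find only checks that the file exists); make_placeholder_catalog is a made-up helper.

```python
import gettext
import tempfile
from pathlib import Path


def make_placeholder_catalog(localedir, language, domain):
    """Create an empty <domain>.mo so that the lookup has something to find."""
    folder = Path(localedir) / language / "LC_MESSAGES"
    folder.mkdir(parents=True)
    mo_file = folder / f"{domain}.mo"
    mo_file.touch()  # an empty file is enough for gettext.find()
    return mo_file


localedir = tempfile.mkdtemp()
make_placeholder_catalog(localedir, "en", "demo")

# Only a general "en" catalog exists, but asking for en_US.UTF-8 still finds it.
found = gettext.find("demo", localedir=localedir, languages=["en_US.UTF-8"])
print(found)

# No French catalog exists, so the lookup returns None.
print(gettext.find("demo", localedir=localedir, languages=["fr"]))
```

For the demo program, the equivalent check from the project folder would be gettext.find("demo", "."), which uses the language set in your environment.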

If you want to use another language (French, say) you can set it just for a single run of your program by typing

LANG=fr python demo.py

C++

poedit is installed to help identify and translate the text in your C/C++ program that needs to be internationalized (an alternative, Lokalize, is installed on some systems). If you're writing a wxWidgets application you may find their Internationalization document useful. To see internationalization in action:

  • Download the multilang.cc demo code and save it as "multilang.cc". It's a minimal wxWidgets program with multi-language support. Some points to note are
    • It creates a locale - a place to store language-related information
    • It uses AddCatalogLookupPathPrefix to define an extra location for the language files (there are several system folders where the program looks by default already, but folders added in this way will be searched first)
    • It tries to load in a "catalog" (a file containing words and their translations) called "foo.mo"
    • Strings are surrounded by _( ... ) for reasons that will become clear later.
  • Compile using
    g++  -c `wx-config --cxxflags` multilang.cc
    g++  -o multilang multilang.o `wx-config --libs --gl_libs`  -lglut -lGL -lGLU
    
  • If you run this without having created a "foo.mo" file first, you should see an "AddCatalog failed" error message, and a window with a menu and 2 labelled buttons.
  • Next, create a file of words to translate. If you type
      xgettext -d foo -s --keyword=_  -o foo.po multilang.cc 
    

    you will create a file called foo.po. Then type

      poedit foo.po
    
    (poedit is a free program that you might not have on your system). It will show you the words that xgettext has extracted from your source code. If you click on "button 1" then type something into the bottom panel, the text will appear in the right column and will appear instead of "button 1" in the application. We're not going to translate "button 1" into a foreign language. Instead, type "cabbage" into the bottom panel. Then "Save". You will get a warning message because you've left some words untranslated - don't worry. As well as updating the foo.po file, poedit has also created a foo.mo file. If you "Close" poedit and you're using version 1.4.6 it might display an assert error message. Just continue. Create an "en" folder and put the "foo.mo" file into it. If you run your ./multilang program now, you'll see that the first button is now labelled "cabbage". You can substitute the other strings in the same way.
    If you don't have poedit, one alternative is to edit the foo.po file using a text editor then convert it using
       msgfmt foo.po -o foo.mo
    

Usually you would produce a file for each natural language that you wanted to support, putting each in a place where files in that language would be sought. Then the user's choice of language will determine which file of substitute words will be read.

Where the language files are

On Linux, typing

 
locale -a

shows you which locales are installed. If your program isn't finding the expected files, it could be that it's looking in the wrong places, or the locale files aren't installed. It's rather complicated. Where it looks depends partly on the language setting you have. When I type

  echo $LANG

I get en_GB.UTF-8, meaning that I'm set up to use the GB dialect of the en[glish] language, encoded using UTF-8. If programs can't find language files this specific, they'll look for more general files. This may mean the program having to look in many places. If your program is called multilang and you run

 strace ./multilang

you can find out where language files are being sought (in amongst a lot of other information). I get the following (where '.' means "the current folder")

  • ./en_GB.UTF-8/LC_MESSAGES
  • .
  • ./en_GB.UTF-8
  • /usr/local/share/locale/en_GB.UTF-8/LC_MESSAGES
  • /usr/share/locale/en_GB.UTF-8/LC_MESSAGES
  • /usr/share/locale
  • /usr/share/locale/en_GB.UTF-8
  • ./en_GB/LC_MESSAGES
  • .
  • ./en_GB
  • /usr/local/share/locale/en_GB/LC_MESSAGES/
  • /usr/share/locale/en_GB/LC_MESSAGES
  • /usr/share/locale
  • /usr/share/locale/en_GB/
  • ./en/LC_MESSAGES
  • .
  • ./en
  • /usr/local/share/locale/en/LC_MESSAGES
  • /usr/share/locale/en/LC_MESSAGES/
  • /usr/share/locale
  • /usr/share/locale/en
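
The list above follows a pattern: the full locale name is tried first, then the name without the encoding, then the language alone. The sketch below reproduces just that ordering of names. It is inferred from the strace output shown here, not taken from the wxWidgets source, and fallback_names is a hypothetical function.

```python
def fallback_names(lang):
    """List locale names from most to least specific."""
    names = [lang]
    without_encoding = lang.split(".", 1)[0]  # en_GB.UTF-8 -> en_GB
    if without_encoding not in names:
        names.append(without_encoding)
    language_only = without_encoding.split("_", 1)[0]  # en_GB -> en
    if language_only not in names:
        names.append(language_only)
    return names


print(fallback_names("en_GB.UTF-8"))
```

Each name is then combined with each search folder (the current folder, /usr/local/share/locale and /usr/share/locale in the output above) to give the places that are tried.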

LANG is only one of several environment variables that affect the language. LC_MESSAGES is the variable that determines the language used for standard system messages. If LC_ALL is set, it overrides the value of LC_MESSAGES; otherwise, if LC_MESSAGES is not set, it defaults to the value of the LANG environment variable.
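
The precedence can be sketched as a short function. The order used below is the one that Python's gettext module documents (it also consults LANGUAGE, a GNU gettext variable, before the others); the function name effective_message_language is invented for this example, and the real module additionally splits the value on ':' to allow a list of languages.

```python
def effective_message_language(environ):
    """Return the first non-empty setting, in the order Python's gettext checks them."""
    for name in ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG"):
        value = environ.get(name)
        if value:
            return value
    return None


# LC_ALL wins over LC_MESSAGES and LANG when it is set.
print(effective_message_language({"LANG": "en_GB.UTF-8", "LC_ALL": "fr_FR.UTF-8"}))
```

Passing os.environ instead of a hand-written dictionary shows which setting applies in your current shell.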