Skip to content

Internationalization (i18n) and localization (l10n)

Babel

Marking strings for translation

In order to translate strings, they have to be marked. This ensures that they will be picked up by the extraction process later on. Only strings which are visible in the UI have to be translated.

Python

Warning

Do not use f-strings for translatable text. Interpolation is not supported for these types of strings (see Issue).

Instead, use format:

- _(f"Hi {username}")
+ _("Hi {username}".format(username=username))

Marking strings in python is done via the Flask-BabelEx package. First, make sure to import the package and the *gettext function. We usually import it as _ to be consistent throughout different packages.

from flask_babelex import lazy_gettext as _

Now strings can be marked for translation:

error_message = _("Invalid scheme")

Certain data, identifiers or keywords do not have to be translated. This can be achieved with string interpolation:

_("Invalid {scheme}").format(scheme)

Adding notes for translators helps them to better understand the context where this string will be used:

# NOTE: This is displayed in ....
_("Invalid {scheme}").format(scheme)

Jinja Templates

The gettext function is made available in jinja templates as well:

<p>
    {{ _('Only published versions are displayed.') }}
</p>

JavaScript

Marking strings in JavaScript is done via the i18next module. First, make sure to import i18next from the module.

import { i18next } from "@translations/invenio_app_rdm/i18next";

Now strings can be marked for translation:

<p>{i18next.t("Only published versions are displayed.")}</p>

For strings containg HTML/React nodes, the Trans component is the way to go (full documentation):

import { Trans } from '@translations/i18next';

const title = record.title;
<Trans>
    The record {{ title }} can <b>only</b> be accessed by <b>users specified</b> in the permissions.
</Trans>

For more complex strings, custom variables can be passed to the Trans component as well.

<Trans
    defaults="On <bold>{{ date }}</bold> the record and the files will automatically be made publicly accessible. Until then, the record and the files can <bold>only</bold> be accessed by <bold>users specified</bold> in the permissions."
    values={{ date: fmtDate }}
    components={{ bold: <b /> }}
/>

This will interpolate the nodes from the values and components properties and is a good approach, if the nodes are to be used multiple times.

Singular & Plural

Sometimes, certain words or phrases in a sentence change based on a number or the amount of objects. There is also support for this, by passing the count property.

const records = [{ title: "Record 1" }, { title: "Record 2" },];

<Trans
    i18nKey="numberOfRecords"
    count={records.length}>
    There are {{ count: records.length }} records.
</Trans>

This will generate the following two ids for translation:

{
    "numberOfRecords": "There is one record.",
    "numberOfRecords_plural": "There are {{ count }} records."
}

Given that you must use the reserved property count, the translation is different when having to translate sentences that have multiple (nested) plurals, e.g. 2 views, 1 download. In this case, you will have to (see official documentation here):

  1. in your code, define the translated string as a key only, with all the params:
    i18next.t("viewsAndDownloads", { views: uniqueViews, downloads: uniqueDownloads })
    
  2. make sure that the translation keys are defined, with interpolation, using the count reserved variable:
    {
        "viewsAndDownloads": "$t(views, {\"count\": {{views}} }), $t(downloads, {\"count\": {{downloads}} })",
        "views": "{{count}} view",
        "views_plural": "{{count}} views",
        "downloads": "{{count}} download",
        "downloads_plural": "{{count}} downloads"
    }
    

Lazy vs non-lazy

Coming soon

Storage of strings and timestamps

Strings should be stored in UTF-8. Timestamps should be stored in UTC and localized when displayed.

EDTF localization

The EDTF (Extended Date Time Format) is used when dealing with imprecise dates and for their region specific display.

Python

Here we make use of the babel-edtf module. It allows to pass the locale to the format_edtf function. As always, make sure to import it before you use it.

from babel_edtf import format_edtf

format_edtf('2020-01', locale='en')
'Jan 2020'

It also allows to pass a certain format (one of: short, medium, long, full):

format_edtf('2020-01/2020-09', format='long', locale='en')
'January – September 2020'

Extracting Strings

After strings have been marked for translation, it is now time to extract them. Instructions on how to do so can be found in the respective package in the file located at .tx/config. Depending on the package, only Python or JavaScript may be available for translation.

Python

Configuration

  • In the root folder of your package update the .tx/config file:
[main]
host = https://www.transifex.com

# python conf
[<parent_namespace_of_your_transifex_projext>.<package_name>-messages]
file_filter = <package_name>/translations/<lang>/LC_MESSAGES/messages.po
source_file = <package_name>/translations/messages.pot
source_lang = en
type = PO
  • also in the root folder add babel.ini if not there:
# Extraction from Python source files
[python: **.py]
encoding = utf-8

# Extraction from Jinja2 templates
[jinja2: **/templates/**.html]
encoding = utf-8
extensions = jinja2.ext.autoescape, jinja2.ext.with_
  • also in the root folder add these configuration line in setup.cfg if not there:
[compile_catalog]
directory = <package_name>/translations/
use-fuzzy = True
  • remember to add this line for MANIFEST.in
recursive-include <package_name> *.po *.pot *.mo
  • also the entypoint in setup.py
    entry_points={
    ....
            'invenio_i18n.translations': [
            'messages = <package_name>',
        ],
    },
  • remember to add your new language to the instance configuration by modifying I18N_LANGUAGES variable, e.g. I18N_LANGUAGES = [('de', _('German'))]
Usage

In the root folder of the package run

python setup.py extract_messages

This command will gather all the marked strings from .py and jinja template files, group them by their ID and place everything in the source file specified in the config (usually at <package_name>/translations/messages.pot).

Push translation sources (.pot and .po) to transifex. It will add the missing strings to the translation awaiting list.

tx push -s -t

NOTE: When there is no resource for this package on transifex, one will be created automatically on the first push.

For the next step you need to have some translations done on the Transifex website. Otherwise next step will not be successfully done, because there will be no translated content to work with. Translations are stored under the namespace defined in your .tx/config file.

Pull translations from Transifex:

tx pull -l <lang>

JavaScript

Configuration

Navigate to the <package_name>/theme/assets/semantic-ui/translations/<package_name> folder. Create following folder and file structure:

|-messages/
           |-index.js
|-scripts/
           |-compileCatalog.js
           |-initCatalog.js
|-i18next.js
|-i18next-scanner.config.js
|-package.json

in the root folder of your package update the .tx/config file:

[main]
host = https://www.transifex.com

# React conf
[<parent_namespace_of_your_transifex_projext>.<package_name>-messages-ui]
file_filter = <package_name>/assets/semantic-ui/translations/<package_name>/messages/<lang>/messages.po
source_file = <package_name>/assets/semantic-ui/translations/<package_name>/translations.pot
source_lang = en
type = PO

where:

  • i18next.js is your translation configuration
  • i18next-scanner.config.js is translation string scanner configuration
  • compileCatalog.js is a script transforming Transifex *.po output translations into .json files used by React
  • initCatalog.js is a script initializing a new language configuration
  • package.json configuration and dependencies of the translation mini-app

example content of above files available here:

Usage

Install i18n dev dependencies via (dependencies from package.json)

npm install

Now run to add a new language

npm run init_catalog lang <your_new_lang_two_letters_abbreviation>

Now run

npm run extract_messages

This command will gather all the marked strings from .js files, group them by their ID and place everything in the source file specified in the config (usually at <package_name>/theme/assets/semantic-ui/translations/<package_name>/translations.pot).

Update the ./messages/index.js file to add the configuration of your new language to the React translation configuration map

import TRANSLATE_<lang> from './<lang>/translations.json'
export const translations = {
...rest,
<lang>: { translation: TRANSLATE_<lang> }
}

If you don't already have it, install the python transifex client. You will need also the transifex.com account and your private API key

pip install transifex-client

Push translation sources (.pot and .po) to transifex. It will add the missing strings to the translation awaiting list.

tx push -s -t

NOTE: When there is no resource for this package on transifex, one will be created automatically on the first push.

For the next step you need to have some translations done on the Transifex website. Otherwise next step will not be successfully done, because there won't be no translated content to work with. Translations are stored under the namespace defined in your .tx/config file Pull translations from Transifex:

tx pull -l <lang>

You can use -f to overwrite existing .po files. After this step you should see new .po files pulled to package_name>/theme/assets/semantic-ui/translations/<package_name>/messages/<lang>/ folder. If there are no new files, the translations were empty.

Compile the .json files for React to use.

npm run compile_catalog
npm run compile_catalog lang <lang> # for specific language

Invenio-CLI

In addition, if you want to translate strings from your invenio.cfg file or the code in your instance's site folder you can do so with the invenio-cli. Note that this workflow also enables you to overwrite translations from any other existing module's string.

Extract messages catalog

In your instance there is a translations folder. The first step would be to extract the strings to a message catalog:

invenio-cli translations extract

Running this command will generate the corresponding .pot file:

/translations
 |- babel.ini
 |- messages.pot

Absolute Paths in Localization Files

Absolute paths in messages.po and messages.pot are comments for human reading context. They don't affect the compiled .mo file or app behavior.

Initialize a message catalog for a locale

The second step is to initialize the catalog for the specific locale of your choice. These locales are the keys of the I18N_LANGUAGES configuration variable. For example, for German would be:

# invenio.cfg file
I18N_LANGUAGES = [
    ('de', _('German')),
]
invenio-cli translations init -l de

Running this command will produce the corresponding .po files:

/translations
 |- babel.ini
 |- messages.pot
 |- de/LC_MESSAGES
     |- messages.po

Compile the catalogs

Finally, you need to compile the catalogs. That can be achieved with the translations command. However, this process is also a part of the setup process so it will be executed when running invenio-cli services setup (or its containerized counter part).

invenio-cli translations compile

Running this command will produce the corresponding .mo files:

/translations
 |- babel.ini
 |- messages.pot
 |- de/LC_MESSAGES
     |- messages.po
     |- messages.mo