The final obstacle to wiki tech comm, localization

I recently finished reading Sarah Maddox’s splendid book on technical communication in a wiki environment, Confluence, Tech Comm, Chocolate. It answered most of my concerns about doing tech comm on a wiki, except for one: localization.

So why would you want to use a wiki for tech comm, anyway?

Authoring environment that is available everywhere. It’s browser-based. No need for expensive individual tools (except for graphics, videos etc.).

Turn developers (or even your customers) into authors. It would be a long shot to get developers to contribute directly in FrameMaker or DITA XML. However, they can easily edit wiki pages, and technical writers can even receive notifications of all such activity and fix content structure, grammar, and spelling.

Instant feedback. Editing pages or commenting on them on a wiki enables rapid feedback on documentation.

Things wikis can do nowadays

State-of-the-art documentation suites come with a plethora of features designed to make content creation and reuse as simple as possible. These features have not usually been associated with wikis, which are at their core web pages with an edit button. However, plenty of things are possible on wiki platforms nowadays.

In the examples below, I will focus on the capabilities of Confluence, as I have used that platform the most and Sarah’s book was focused on it as well. Her book details the use of many features in depth (for Confluence 4.0) and Atlassian has also published some details in their own documentation.

Templates and themes. Through the use of templates and themes, it is possible to standardize content (help the developers author content) and customize appearances (think OEM products and different brands, for example).

Tables of contents, FAQs, indices. These can be created automatically with various macros.

Content reuse. With the Include Page and Excerpt Include macros, it is possible to reuse entire pages or parts of pages in a Confluence environment. Whether this solution is comprehensive enough is questionable, though, especially in highly customized, modular environments.

Multi-format publishing. Through the use of various plugins, material from Confluence can be published, for example, as PDF, XML, Word, or various ebook formats (Sarah’s book was written in Confluence!).

Offline use. A wiki can even be taken offline nowadays (Appfire’s Firestarter).

Version control and controlled publishing. If the edit-publish cycle is too simplistic for your needs, there is also a plugin for improved version management (Scroll Versions by K15t Software).

The problem: localization

So far, so good, but now we are starting to run into trouble. The problems come in two varieties: there are technical issues on how to carry out localization on a wiki, and there are philosophical issues on what sort of localization matches a wiki mindset.

Technical issues with localization in a wiki environment: selecting the content to be translated

The world is increasingly about customization, small batch sizes, and unique features. Companies retain their profitability through modularization and then customization of small parts of the whole. This presents a challenge for documentation, one that is traditionally addressed by component content management systems (CCMS).

In order to reduce translation costs, only the needed content should be translated, nothing else. Therefore, there needs to be a way to select the needed standard and customized modules, compare them to the existing translated content, and send only the changed or previously untranslated content to translation. A wiki is in trouble with all of these requirements.

Version control and establishing a link between language versions. In order to determine whether a page has already been translated, it needs to be linked to the same page in other languages in some way. Furthermore, the link needs to be version-sensitive, so that any updated content also goes to the translation workflow. There is no need to include any parts of the old target language version; that material is already stored in the translation memory, so if it is established that the version is old, the new source language version can be sent to translation as is.
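
To make this concrete, here is a minimal sketch of what such a version-sensitive link could look like, assuming a hypothetical record that stores the source page id and the source version each translation was made from. Confluence does not maintain this link out of the box; the data structures are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class TranslationLink:
    source_page_id: int   # page in the source language
    source_version: int   # source version the translation was made from
    language: str         # e.g. "de", "fi"
    target_page_id: int   # the translated page

def pages_needing_translation(source_versions, links, language):
    """source_versions maps source page id to its current version number.
    Returns the ids of pages whose translation is missing or out of date."""
    translated_from = {link.source_page_id: link.source_version
                       for link in links if link.language == language}
    return [page_id for page_id, current in source_versions.items()
            if translated_from.get(page_id, 0) < current]

# Page 100 moved to version 5 after its German translation was made from
# version 4, so it is flagged; page 101 is still up to date.
links = [TranslationLink(100, 4, "de", 900), TranslationLink(101, 2, "de", 901)]
print(pages_needing_translation({100: 5, 101: 2}, links, "de"))  # -> [100]
```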

Compiling module sets. Links between language versions are the first step. However, there also needs to be a flexible way to compile the module set that needs to be translated. Here, the “one parent” principle becomes an issue: in a DITA environment, for example, various offerings could be modeled as different DITA maps that can easily use the same modules. There is no similar concept in a wiki: it cannot be based on parent-child relationships (only one parent for each child), nor can it be based on labels only (no way to arrange the content). This issue actually affects the use of a wiki for technical communication even in a single language.
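
As a thought experiment, a DITA-map-like “module set” could be kept outside the wiki’s page hierarchy as a simple ordered list of page ids per offering, so that the same page can belong to several offerings. The structure below is purely hypothetical; nothing like it exists in Confluence itself, so it would have to live in a plugin or an external file.

```python
# Hypothetical "maps" kept outside the wiki's parent-child hierarchy:
# each offering is an ordered list of page ids, and the same page id can
# appear in several offerings.
offerings = {
    "product-a-standard":   [100, 101, 102, 250],
    "product-a-customer-x": [100, 101, 310, 250, 311],  # reuses 100, 101, 250
}

def modules_to_translate(offering_pages, already_translated):
    """Ordered subset of an offering that still lacks a translation."""
    return [page_id for page_id in offering_pages
            if page_id not in already_translated]

print(modules_to_translate(offerings["product-a-customer-x"], {100, 101, 250}))
# -> [310, 311]
```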

Translation of include libraries. If the documentation reuses content from general include libraries, there needs to be a way to determine which libraries are used in order to translate only the needed libraries. Depending on how the libraries are designed, it may be unavoidable to translate some unneeded content as well (multiple excerpts on a single page, of which only some are used), or there needs to be an even more complex way to determine the reused content for translation.
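
One way to determine the reused content would be to scan the storage format of the selected pages for include-type macros and collect the referenced library pages. The sketch below assumes the storage-format macro names include and excerpt-include and uses a deliberately simplified snippet; a real implementation should use a namespace-aware XML parser rather than a regular expression.

```python
import re

# Simplified, illustrative storage-format fragment.
STORAGE_XML = """
<ac:structured-macro ac:name="excerpt-include">
  <ac:parameter ac:name="">
    <ac:link><ri:page ri:content-title="Safety notices"/></ac:link>
  </ac:parameter>
</ac:structured-macro>
<p>Normal page content.</p>
<ac:structured-macro ac:name="include">
  <ac:parameter ac:name="">
    <ac:link><ri:page ri:content-title="Legal boilerplate"/></ac:link>
  </ac:parameter>
</ac:structured-macro>
"""

def referenced_library_pages(storage_xml):
    """Return the titles of pages pulled in via include-type macros."""
    # A regex keeps this sketch short; real code should parse the XML.
    macro_pattern = re.compile(
        r'ac:name="(?:include|excerpt-include)".*?ri:content-title="([^"]+)"',
        re.DOTALL)
    return sorted(set(macro_pattern.findall(storage_xml)))

print(referenced_library_pages(STORAGE_XML))
# -> ['Legal boilerplate', 'Safety notices']
```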

Technical issues with localization in a wiki environment: exporting content

Wiki and CAT tools – getting wiki content ready for translation. Basically all technical communication is nowadays translated with computer-assisted translation (CAT) tools. What makes these great is the translation database (called a translation memory) that is built up from all the translations made with the tool, usually at the sentence or paragraph level. This ensures that if the exact same content is translated again, the translation will be the same, and hardly any time is spent on it, which saves a great deal of cost.
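
The cost-saving mechanism is easy to illustrate: segments that already exist in the translation memory are reused as they are, and only the new segments go to the translator. The sketch below shows exact matches only; real translation memories also handle fuzzy matches.

```python
# Toy translation memory: source segment -> target segment.
translation_memory = {
    "Tighten all four bolts.": "Kiristä kaikki neljä pulttia.",
}

def pretranslate(segments):
    """Split segments into reused (already in the memory) and new ones."""
    reused, new = {}, []
    for segment in segments:
        if segment in translation_memory:
            reused[segment] = translation_memory[segment]
        else:
            new.append(segment)
    return reused, new

reused, new = pretranslate(["Tighten all four bolts.", "Mount the pump."])
print(len(reused), "reused,", len(new), "still to translate")
# -> 1 reused, 1 still to translate
```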

Now, in order to use CAT tools, the content needs to be in a format the tool understands. Luckily, CAT tools can be used with any structured content where the translatable segments can be unambiguously identified. For example, any XML format where either elements or attributes can always be identified as translatable or non-translatable content works fine. As Confluence content is nowadays stored in XML format, it can be processed directly in a CAT tool. All that is needed is the XML schema and knowledge of the translatable elements for creating a configuration file. (Hint to Atlassian: you could provide such files, or at least the specific information needed to create them; now that would be customer service.)
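
In practice, the CAT tool configuration boils down to “extract text from these elements, leave those subtrees alone”. Here is an illustrative sketch against a simplified, invented storage-format document; the actual element names and the set of subtrees to skip would come from the real schema.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for an exported page; not the real Confluence schema.
SIMPLIFIED_EXPORT = """<page xmlns:ac="urn:example:ac">
  <title>Installing the pump</title>
  <body>
    <p>Mount the pump on a level surface.</p>
    <ac:structured-macro ac:name="code">
      <ac:plain-text-body>pump --install --force</ac:plain-text-body>
    </ac:structured-macro>
    <p>Tighten all four bolts.</p>
  </body>
</page>"""

TRANSLATABLE = {"title", "p"}                 # translatable elements (assumed)
SKIP = {"{urn:example:ac}structured-macro"}   # subtrees to leave untouched

def translatable_segments(xml_text):
    """Collect translatable text, skipping non-translatable subtrees."""
    segments = []
    def walk(element):
        if element.tag in SKIP:
            return
        local_name = element.tag.split("}")[-1]
        if local_name in TRANSLATABLE and element.text and element.text.strip():
            segments.append(element.text.strip())
        for child in element:
            walk(child)
    walk(ET.fromstring(xml_text))
    return segments

print(translatable_segments(SIMPLIFIED_EXPORT))
# -> ['Installing the pump', 'Mount the pump on a level surface.',
#     'Tighten all four bolts.']
```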

Identifying non-translatable elements. So, a simple export out of the system for translation and we’re good to go. Almost, but not quite. In most authoring environments, it is possible to define non-translatable content, even mid-sentence, and the CAT tool then prevents translating it. This is a feature that is currently missing from wiki environments. Then again, this is not exactly a show-stopper, as there are workarounds, such as determining a set of terminology (e.g. product names) that should never be translated, and incorporating this information into the translation environment. This does not ensure that pieces of code, variables etc. are not translated, but makes things a bit better. And let’s face it, even many huge companies do not tag non-translatable content anyway, even when they have the tools to do so.
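
The terminology workaround can also be backed by a simple quality check that verifies the protected terms survive translation unchanged. The term list below is invented for the example; in practice it would live in the CAT tool’s termbase.

```python
# Invented examples of terms that must never be translated.
PROTECTED_TERMS = ["AcmePump 3000", "config.yaml", "--force"]

def missing_protected_terms(source_segment, target_segment):
    """Protected terms present in the source but missing from the target."""
    return [term for term in PROTECTED_TERMS
            if term in source_segment and term not in target_segment]

source = "Run AcmePump 3000 with the --force option."
target = "Suorita AcmePump 3000 valitsimella --force."
print(missing_protected_terms(source, target))  # -> [] (all terms intact)
```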

What exactly is in the export file. There is another complication. If the export file is meant for backup purposes, it may include all sorts of material that should not be translated, such as comments and previous versions of the page. In other cases, these may be useful. The export file should be customizable so that, in the minimum configuration, only the page contents of the current version are included.
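
Conceptually, the minimum configuration amounts to trimming the export down to the current body of each page before it goes to translation. The element names in this sketch (“page”, “version”, “comment”, “body”) are invented; a real Confluence space export has a different structure.

```python
import xml.etree.ElementTree as ET

# Invented export structure with a comment and two versions of one page.
EXPORT = """<export>
  <page id="100">
    <comment>Please review this section.</comment>
    <version number="1"><body>Old text.</body></version>
    <version number="2"><body>Current text.</body></version>
  </page>
</export>"""

def strip_for_translation(export_root):
    """Keep only the body of the current version of each page."""
    trimmed = ET.Element("translation-package")
    for page in export_root.iter("page"):
        current = max(page.iter("version"), key=lambda v: int(v.get("number")))
        slim = ET.SubElement(trimmed, "page", id=page.get("id"))
        slim.append(current.find("body"))
    return trimmed

root = ET.fromstring(EXPORT)
print(ET.tostring(strip_for_translation(root), encoding="unicode"))
# -> <translation-package><page id="100"><body>Current text.</body></page></translation-package>
```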

Technical issues with localization in a wiki environment: importing content

Importing translated content as new pages linked to the source pages, not overwriting the source pages. After the translator and their CAT tool have worked their magic, the result is a translated XML file. All the data except for the translatable elements remains identical to the original version. The translation then needs to be imported back into the system with two considerations in mind: it must not overwrite the original (the XML file cannot contain data that makes the wiki think this is a re-import of the original, or there needs to be a workaround to determine the correct language version), and it must be linked to the correct version of the original page.
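
Here is a sketch of what the import step needs to record, using a hypothetical NewPage structure and labels to carry the link back to the source page and version; a real implementation would go through Confluence’s import or remote APIs, and the space and label naming is invented for the example.

```python
from dataclasses import dataclass

@dataclass
class NewPage:
    space_key: str        # e.g. a per-language space such as "DOC-DE"
    title: str
    body_storage: str     # translated storage-format XML
    labels: tuple         # e.g. ("lang-de", "source-100", "source-version-5")

def build_translated_page(source_page_id, source_version, language,
                          translated_title, translated_body):
    """Create a new page record instead of overwriting the source page."""
    return NewPage(
        space_key=f"DOC-{language.upper()}",
        title=translated_title,
        body_storage=translated_body,
        labels=(f"lang-{language}",
                f"source-{source_page_id}",
                f"source-version-{source_version}"),
    )

page = build_translated_page(100, 5, "de", "Installation der Pumpe",
                             "<p>Montieren Sie die Pumpe ...</p>")
print(page.labels)  # -> ('lang-de', 'source-100', 'source-version-5')
```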

Philosophical issues with localization in a wiki environment: source language and the extent of localization

In a traditional technical writing environment, there is one source language. All the other languages are translated from this single source language, and any localization is done during translation to make them a better match for the local culture and conventions.

However, this can change radically in a wiki environment: if people can contribute to the various language versions of the page, how is the source language determined? If the documentation was originally written in English, what is to be done if a number of important additions are made to the German version?

While the changes to language versions could be implemented back to the original version, and then translated into the other languages, the languages would inevitably get out of sync. When a number of changes have been made to the German, Spanish, and Finnish versions, all of them would need to be translated into English and from there to each of the other languages. This becomes hopelessly complicated.

One solution to this issue is simply designating one language as the source and preventing the others from being edited. Unfortunately, this also prevents correcting the other versions easily, but this is somewhat inevitable as any corrections would need to be verified and implemented in the CAT tool in order to use them in future versions as well. Very difficult.

This approach also prevents deep localization of the content that a wiki environment would otherwise be capable of. Who says that the language versions need to be exact copies of one another? Perhaps there are special needs in some market areas that are not present in others! A wiki could, in theory, enable deep localization to the exact area-specific needs, but whether there is a feasible way to accomplish this and also keep the important parts of the content in sync between language versions, I cannot say. If there is, it will change technical communication to the wiki way for good; no other means will be able to compete.

Conclusions

Wikis are getting close to the state where they are ready to be the first choice for technical communication. In some applications, they already are.

If you are doing technical communication in a single language for standard products, a wiki is already the best possible choice. However, there are challenges when it comes to multiple languages, especially if they are combined with modular products. The level of complexity of your application determines whether a wiki is a viable option in these cases.

Picture: People from UP There Everywhere, a global cloud-based advertising agency, collaborating (not necessarily on a wiki) (CC)

Author: Ville Kilkku

I run my own consultancy business, so if you find the ideas on this blog intriguing, contact me at consulting@kilkku.com or call me at +358 50 588 5043 and we can discuss how I can help you solve your business problems. I am currently based in Tornio, Finland, but work globally. Google+

  • Hallo Ville

    Awesome post! Thanks so much for sharing all this detailed knowledge. I don’t have any experience in translating content yet, although I’m keen to get a project started.

    I absolutely love your point about wikis enabling deep localisation of content in a way that no other tool does. And your points about linking the translated content to the original are very interesting.

    On the point about identifying non-translatable elements: It is possible to do this in Confluence. You could define a “user macro”, which would wrap a bit of text in a tag that identifies it as “not to be translated”. But I suspect that, as you’ve pointed out too, people probably wouldn’t go to all the trouble of tagging the content, even if the tool allows it.

    And thanks too for linking to my book. I’m so glad you’ve enjoyed it. 🙂

    Cheers, Sarah

  • Hey Ville,

    very interesting read!

    Let me add my two cents:

    In the past we did i18n projects with the Scroll DocBook Exporter:
    1. The customer exported the pages they needed to be translated to DocBook and sent the results to the translation company.
    2. The translated DocBook file was imported into a new space with a custom-built DocBook importer.

    This had a few shortcomings, such as no fixed relationship between source and target pages and translation in one direction only, but the customer was happy with those limitations.

    IMHO “deep localisation” would mean implementing the translation and translation management functionality in Confluence – very hard. Also, existing translation tools that lead to a significant reduction of translation cost, like a translation memory, would, to my knowledge, require choosing one source language.

    Deep localisation might be suitable for community projects that do not require strict governance (which most commercial translation projects do).

    Cheers,
    -Stefan

    • Thank you for your contribution!

      You are right that some localization capabilities can be built on wiki platforms in their current state, for example through your plugins. However, while these may be adequate in some use cases, they are far from adequate in all of them. Furthermore, as you mentioned in your example, a custom-built importer was needed to make even this rather simple workflow possible.

      Regarding translation memories, you are correct in that translation memories are unidirectional. However, a company can have multiple memories at their disposal, and the related terminology management tools are multidirectional.

      So if I have content in English, German, and Finnish, I would have translation memories for EN-DE and EN-FI, but I could also add memories for DE-EN and FI-EN. I could also add memories for DE-FI and FI-DE if needed, although that is not likely, as the workflow would usually be from the other languages to English and from there to all the rest.

      I would also have one terminology database EN-DE-FI. This terminology database is a key asset when translating from one of the other languages to the primary language, as the translation memory is much smaller in that direction.

      On the subject of deep localization, I do not think that extensive translation management would be needed in the wiki platform itself, although I do admit that some form of management would be required from the wiki as well.

      One solution that would move things in this direction would be the ability to mark “core” and “non-core” content, and then prevent core content from being modified in translated versions (to ensure that it matches the original) while allowing users to modify and add non-core content. For new versions, the core content would then be updated and the local non-core content flagged for review.

      OK, I fully admit that this is more daydreaming than reality at this point, and sketchy at that, but then again, tech comm localization has never had the need, or the chance, to ponder such a subject, as the previous publishing model was always one-way. The possible two-way interaction that a wiki offers can bring forth new paradigms in localization as well.

      • I definitely agree regarding your solution for how to keep the different versions in sync. Although, as you mentioned, it’s not essential for all the information to be exact copies across all languages, there will almost always be “core” content that needs to be maintained for various reasons (legal, consistency/brand etc.) and it would be up to the source language “manager” to control this.