What is CAT?

Computer-assisted translation (CAT) is a form of translation where human translators use software language translation tools to assist them in the translation of digital content from one language to another. Computer-assisted translation should not be confused with machine-translation technology.

CAT Tools at WHP

WHP works with most major commercial and open-source CAT tools. As a tool-agnostic translation agency, we are free to choose the most suitable technology for our clients or to use a particular technology that a client requests.

Our engineers are experts in creating custom filters and rules so that only what needs to be translated in your files is translated, either by configuring the CAT tools or by creating preparation tasks that make your files compatible with computer-assisted translation tools.

Why use CAT tools?

CAT tools provide comprehensive features designed to improve the productivity, consistency and quality of human translation. They offer a suite of interactive translation features such as spell-checking, terminology look-up, concordance, translation memory, markup validation, quality assurance, auto-formatting of numbers, dates and times, regular-expression support and more, all within a translation editing suite designed to help human translators manually edit a source-language text into a new language.

CAT tools use customizable rules and filters to pre-process common desktop publishing, documentation, spreadsheet, software and markup file formats into editable translation file formats. Filtering exposes only the text that actually requires translation. It is a crucial point in the translation process: filter errors can introduce translation errors in target files (translation of non-translatable code or markup) and generate false statistics for cost and time analysis. Document formatting, non-translatable images, layout and similar elements are excluded from the translation file format so that work concentrates on translating the source text into the required target language(s). WHP translation engineers are experts in creating translation rules and filters to correctly process the plethora of source formats that exist.
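As a rough illustration of what a filter does, the sketch below pulls only the visible text out of an HTML fragment and ignores the content of elements that should not be translated. It is a minimal sketch using Python's standard `html.parser` module, not the filter engine of any particular CAT tool; the class name, the function name and the `SKIPPED` rule set are assumptions made for this example.

```python
from html.parser import HTMLParser


class TranslatableTextFilter(HTMLParser):
    """Collect visible text and skip the content of non-translatable elements."""

    # Hypothetical rule set: element names whose content must not be translated.
    SKIPPED = {"script", "style", "code"}

    def __init__(self):
        super().__init__()
        self.units = []        # translatable text units, in document order
        self._skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.units.append(text)


def extract_translatable(markup):
    """Return the list of text units a translator would see."""
    parser = TranslatableTextFilter()
    parser.feed(markup)
    parser.close()
    return parser.units
```

With this rule set, `extract_translatable("<p>Hello <b>world</b></p><code>x = 1</code>")` returns `["Hello", "world"]`. If `code` were missing from `SKIPPED`, the snippet `x = 1` would be sent for translation and counted in the word statistics, which is the kind of filter error described above.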

The translation editing process in the majority of CAT tools is performed by overwriting the source text with the target translation(s). The source text requiring translation is broken down into small translatable units known as "segments". Segments are created using several customizable parameters, including punctuation (full stops, colons, exclamation marks and question marks), paragraph returns and document structure (headers, titles, subtitles, list elements, etc.).
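The punctuation and paragraph rules described above can be sketched in a few lines. This is a deliberately naive example: the `SENTENCE_BREAK` rule and the `segment` function are assumptions for illustration, and real CAT tools add exception lists (abbreviations, ordinals, and so on) that this sketch does not handle.

```python
import re

# Hypothetical rule: break after ., !, ? or : when followed by whitespace.
SENTENCE_BREAK = re.compile(r"(?<=[.!?:])\s+")


def segment(text):
    """Split text into segments on paragraph returns, then on punctuation."""
    segments = []
    for paragraph in text.splitlines():
        for piece in SENTENCE_BREAK.split(paragraph.strip()):
            if piece:  # ignore empty paragraphs
                segments.append(piece)
    return segments
```

For example, `segment("The fox jumped. Did the cat move?\nWarning: it did not!")` returns `["The fox jumped.", "Did the cat move?", "Warning:", "it did not!"]`. An abbreviation such as "e.g." would be split incorrectly by this rule, which is why segmentation rules need to be customizable.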

Clean and accurate segmentation is critical to the successful use of translation memory, the key technology behind the majority of CAT tools. Translation memory technology interactively stores source text segments with the equivalent translated target language segments in a database, building a bilingual map of the document being translated. Each new segment that a translator edits during the translation process prompts the CAT tool to look up the translation memory database and retrieve previous translations of identical or similar source segments. If there are "matches", either identical or similar "fuzzy" matches, the CAT tool will propose them to the translator or auto-insert them into the editor for acceptance or modification. For "fuzzy" matches, the translation memory technology makes a percentage assessment based on the number of word differences it finds between the source segments.

Example translation memory segment entry:

EN: The very quick brown fox jumped over the lazy cat!
FR: Très rapide, le renard brun a sauté par-dessus le chat paresseux !

Now suppose that, working in the CAT environment, we open a new EN source sentence for translation, such as:

"The very quick red fox jumped over the lazy cat!"

The translation memory would automatically propose the translation:

"Très rapide, le renard brun a sauté par-dessus le chat paresseux !"

but with a penalty of 10% (due to a single word difference, "brown|red", in a 10-word sentence), making the proposed translation a 90% match based on the probabilistic record linkage used in the majority of TM tool match lookups.
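The 90% figure can be reproduced with a simple word-level comparison. The sketch below uses a word-level edit distance as a stand-in for the matching described above; it is an assumption made for illustration, since commercial tools use their own scoring, which may also weigh tags, punctuation and case. The function names are hypothetical.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two lists of words."""
    previous = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        current = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            current.append(min(previous[j] + 1,          # delete a word
                               current[j - 1] + 1,       # insert a word
                               previous[j - 1] + cost))  # keep or replace
        previous = current
    return previous[-1]


def match_percentage(new_source, stored_source):
    """Score a stored segment against a new one: 100 minus a per-word penalty."""
    new_words = new_source.split()
    stored_words = stored_source.split()
    longest = max(len(new_words), len(stored_words))
    if longest == 0:
        return 100
    distance = word_edit_distance(new_words, stored_words)
    return round(100 * (1 - distance / longest))
```

Calling `match_percentage("The very quick red fox jumped over the lazy cat!", "The very quick brown fox jumped over the lazy cat!")` returns `90`: one replaced word out of ten gives a 10% penalty.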

The CAT tool will also show the two English phrases with the difference highlighted in the new phrase, helping the translator make the required change to the proposed fuzzy match and turn it into a perfect translation:

"Très rapide, le renard rouge a sauté par-dessus le chat paresseux !"

Example of translation memory match lookup:

EN: The very quick red fox jumped over the lazy cat!
EN: The very quick brown fox jumped over the lazy cat!
FR: Très rapide, le renard brun a sauté par-dessus le chat paresseux ! 90%

Once the change to the proposed translation is accepted, the new source and target segment pair is added to the translation memory:

EN: The very quick red fox jumped over the lazy cat!
FR: Très rapide, le renard rouge a sauté par-dessus le chat paresseux !
EN: The very quick brown fox jumped over the lazy cat!
FR: Très rapide, le renard brun a sauté par-dessus le chat paresseux !
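The lookup-then-store cycle of this example can be sketched as a small in-memory class. This is only an illustration: the `TranslationMemory` class, its method names and the 75% threshold are assumptions, and the score comes from the standard library's `difflib.SequenceMatcher` applied to word lists rather than from any tool's actual matching engine.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Minimal in-memory translation memory (illustrative only)."""

    def __init__(self):
        self._entries = {}  # source segment -> target segment

    def add(self, source, target):
        """Store or overwrite the translation of a source segment."""
        self._entries[source] = target

    def lookup(self, source, threshold=75):
        """Return (score, stored_source, target) tuples, best match first."""
        matches = []
        for stored_source, target in self._entries.items():
            ratio = SequenceMatcher(None, source.split(),
                                    stored_source.split()).ratio()
            score = round(ratio * 100)
            if score >= threshold:  # hypothetical fuzzy-match cut-off
                matches.append((score, stored_source, target))
        return sorted(matches, reverse=True)


tm = TranslationMemory()
tm.add("The very quick brown fox jumped over the lazy cat!",
       "Très rapide, le renard brun a sauté par-dessus le chat paresseux !")

new_source = "The very quick red fox jumped over the lazy cat!"
proposals = tm.lookup(new_source)  # one fuzzy match, scored 90

# The translator corrects the proposal and the new pair is stored.
tm.add(new_source,
       "Très rapide, le renard rouge a sauté par-dessus le chat paresseux !")
```

After the second `add`, looking up the red-fox sentence again returns its own translation first with a score of 100, followed by the brown-fox entry at 90.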

Pros for using CAT tools

CAT tools dramatically improve a human translator's daily work rate and improve overall translation quality, provided the tools are pre-loaded with approved and accurate terminology and translation memory material for matching and concordance. Interchange formats exist to exchange translation memories, terminology databases and, in some cases, segmentation rules between computer-assisted translation tools, for portability and scalability and to avoid lock-in.
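To give an idea of what such an interchange file looks like, the sketch below writes segment pairs to a minimal TMX document (the XML format for exchanging translation memories) using Python's standard `xml.etree.ElementTree`. The `to_tmx` function and the header values such as `example-exporter` are placeholders chosen for this example, and the output has not been validated against any particular CAT tool's importer.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_tmx(pairs, source_lang="en", target_lang="fr"):
    """Serialize (source, target) segment pairs as a minimal TMX 1.4 string."""
    root = ET.Element("tmx", version="1.4")
    ET.SubElement(root, "header", {
        "creationtool": "example-exporter",  # hypothetical tool name
        "creationtoolversion": "0.1",        # hypothetical version
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(root, "body")
    for source, target in pairs:
        unit = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((source_lang, source), (target_lang, target)):
            variant = ET.SubElement(unit, "tuv", {XML_LANG: lang})
            ET.SubElement(variant, "seg").text = text
    return ET.tostring(root, encoding="unicode")
```

Each `tu` element holds one translation unit, with one `tuv` per language and the segment text inside `seg`. Segments containing inline formatting would also need inline tag elements, which this sketch leaves out.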

Cons for using CAT tools

Computer-assisted translation tools are not free and there are many of them: several leading technologies are not 100% inter-compatible. Even with open formats such as XLIFF, TMX, TBX and SRX, proprietary inline tagging held within TM segments reduces leveraging and portability between tools. Tools often use different segmentation rules; SRX was designed to move segmentation rules from tool to tool, but few tool vendors have implemented the standard.

For the translation of high-level marketing content, localization companies will offer transcreation services instead of translation. This means that there will not be a full correspondence between the source and the target sentence. Using a translation memory in this context could even generate quality issues, although the transcreator might still benefit from other assets such as a term base.