Terminology extraction for global content management
Abstract
The role of terminology in content management has often been underrated. The information industry has identified term extraction as an area requiring focus. Term extraction benefits both content authoring and translation. Supplying key product terms to translation services several weeks before translation begins reduces translation time, improves translation quality, and saves effort (and thus money) by reducing duplicated work. Preparing the key terms in time is difficult without some automation. This paper describes how a terminology extraction tool was proposed, designed, developed, and deployed. The tool extracts nouns and noun groups, excludes non-translatable terms and known product terms, and displays a context for each extracted item. Extraction relies on a full parse of the text by a broad-coverage parser. The tool is available to users through a Web server.
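The filtering step summarized above (keep nouns and noun groups, drop non-translatable and already-known product terms, show one context per remaining item) can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool described in this paper: it assumes the noun and noun-group candidates have already been produced by a parser (the parsing itself is not shown), and the names `Candidate` and `extract_terms`, the case-insensitive matching, and the choice of keeping the first sentence seen as the context are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical input record: a noun or noun group assumed to come from a parser,
    # together with the sentence in which it occurred.
    text: str
    sentence: str


def extract_terms(candidates, non_translatable, known_terms):
    """Return (term, context) pairs for candidates not on either exclusion list.

    Matching is case-insensitive (an assumption of this sketch). Each term is
    reported once, with the first sentence in which it was seen as its context.
    """
    excluded = {t.lower() for t in non_translatable} | {t.lower() for t in known_terms}
    results = {}  # insertion-ordered: term key -> (surface form, context sentence)
    for cand in candidates:
        key = cand.text.lower()
        if key in excluded or key in results:
            continue  # skip excluded terms and repeats of terms already kept
        results[key] = (cand.text, cand.sentence)
    return list(results.values())
```

For example, with a non-translatable list containing a trademark and a known-term list containing an existing product term, only the new noun group survives, paired with its first context:

```python
cands = [
    Candidate("print queue", "Open the print queue."),
    Candidate("AcmeOS", "Restart AcmeOS."),
    Candidate("control panel", "Open the control panel."),
    Candidate("Print Queue", "Clear the Print Queue."),
]
extract_terms(cands, non_translatable={"AcmeOS"}, known_terms={"control panel"})
```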