Publication
Proceedings of SPIE - The International Society for Optical Engineering
Paper

Layout and language: An efficient algorithm for detecting text blocks based on spatial and linguistic evidence

Abstract

Accurately detecting the areas of a plain text document that consist of contiguous text is an important preprocessing step for many applications. This paper introduces a novel method that combines spatial and linguistic knowledge to provide an accurate initial analysis of the document. This initial analysis may then be extended to provide a complete analysis of the text areas in the document.