End-to-end videotext recognition for multimedia content analysis
Abstract
Videotext refers to text superimposed on still images and video frames, and a videotext-based Multimedia Description Scheme has recently been adopted into the MPEG-7 standard as one of the normative media content description interfaces. Much previous work, including our own, has concentrated on automatically locating and extracting text from video frames. Very little research has focused on reliably recognizing the segmented text. Several properties of videotext make accurate recognition difficult, even for commercial OCR systems: low resolution, unconstrained font styles and sizes, and poor separation between characters, which often results from video compression and decoding. This paper describes a novel end-to-end video character recognition system that addresses these problems. Its components are new character attributes emphasizing macro shapes, a Support Vector Machine-based character classifier, videotext object synthesis, font context analysis, and temporal contiguity analysis. We present results from experiments with real video data that demonstrate the strengths of this system.