Publication
IEEE TC
Paper

An Autonomous Reading Machine

Abstract

An unconventional approach to character recognition is developed. The resulting system is based solely on the statistical properties of the language; it can therefore read printed text with no previous training or a priori information about the structure of the characters. The known letter-pair frequencies of the language are used to identify the printed symbols in the following manner. First, the scanned characters are partitioned into distinct groups of similar patterns by means of a distance measure. Each class (at most 26 are permitted) is assigned an arbitrary label, and an intermediate tape, containing these temporary labels of the symbols in the original sequence, is generated. In the second phase of the program, the matrix of bigram frequencies of the labels is compared to a frequency matrix obtained from a large sample of English text. The labels are then assigned alphabetic symbols in such a way that the correspondence between the two matrices is maximized. The method is tested on a 100 000-character data set comprising four markedly different fonts. Copyright © 1968 by The Institute of Electrical and Electronics Engineers, Inc.
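
The second phase described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the procedure used in the paper: the function names (`bigram_matrix`, `correspondence`, `best_assignment`, `decode`) are hypothetical, "correspondence" is taken to be the element-wise product sum of the two matrices, and the assignment is found by exhaustive search, which is only feasible for a tiny alphabet (the abstract does not say how the maximization over up to 26 classes is carried out). The clustering phase is skipped; the label tape is simply given.

```python
from itertools import permutations


def bigram_matrix(seq, symbols):
    """Relative frequency of each adjacent symbol pair in seq."""
    index = {s: i for i, s in enumerate(symbols)}
    n = len(symbols)
    counts = [[0] * n for _ in range(n)]
    for a, b in zip(seq, seq[1:]):
        counts[index[a]][index[b]] += 1
    total = max(len(seq) - 1, 1)
    return [[c / total for c in row] for row in counts]


def correspondence(label_m, ref_m, perm):
    """Assumed similarity measure: sum of products of matched entries.

    perm[i] is the index of the alphabet symbol assigned to label i.
    """
    n = len(perm)
    return sum(
        label_m[i][j] * ref_m[perm[i]][perm[j]]
        for i in range(n)
        for j in range(n)
    )


def best_assignment(label_m, ref_m):
    """Exhaustive search over all label-to-symbol assignments (toy sizes only)."""
    n = len(label_m)
    return max(
        permutations(range(n)),
        key=lambda p: correspondence(label_m, ref_m, p),
    )


def decode(label_seq, labels, alphabet, ref_text):
    """Replace arbitrary labels by the alphabet symbols that best match ref_text."""
    label_m = bigram_matrix(label_seq, labels)
    ref_m = bigram_matrix(ref_text, alphabet)
    perm = best_assignment(label_m, ref_m)
    mapping = {labels[i]: alphabet[perm[i]] for i in range(len(labels))}
    return "".join(mapping[x] for x in label_seq)


# Toy data: a four-symbol "language" and a tape of arbitrary labels for it.
reference = "aabacadabbcab"
tape = reference.translate(str.maketrans("abcd", "WZXY"))
decoded = decode(tape, "WXYZ", "abcd", reference)
```

In this toy case the reference statistics come from the same text that was labeled, so the two matrices match exactly under the correct assignment. With real scanned text the label matrix would only approximate the English bigram matrix, and a search far cheaper than trying every permutation would be needed.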
