Dictionaries, dictionary grammars and dictionary entry parsing
Abstract
We identify two complementary p.ro.cesses in. the conversion of machine-readable dmUonanes into lexical databases: recovery of the dictionary structure from the typographical markings which persist on the dictionary distribution tapes and embody the publishers' notational conventions; followed by making explicit all of the codified and ellided information packed into individual entries. We discuss notational conventions and tape formats, outline structural properties of dictionaries, observe a range of representational phenomena particularly relevant to dictionary parsing, and derive a set of minimal requirements for a dictionary grammar formalism. We present a general purpose dictionary entry parser which uses a formal notation designed to describe the structure of entries and performs a mapping from the flat character stream on the tape to a highly structured and fully instantiated representation of the dictionary. We demonstrate the power of the formalism by drawing examples from a range of dictionary sources which have been processedand converted into lexical databases.