Production Trees: A Compact Representation of Parsed Programs
Abstract
Abstract syntax trees were devised as a compact alternative to parse trees, because parse trees are known to require excessive amounts of storage to represent parsed programs. However, the savings that abstract syntax trees actually achieve have never been precisely described because the necessary analysis has been missing. Without it, one can only measure particular cases that may not adequately represent all the possible behaviors. We introduce a data structure, production trees, that are more compact than either abstract syntax trees or parse trees. Further, we develop the necessary analysis to characterize the storage requirements of parse trees, abstract syntax trees, and production trees and relate the size of all three to the size of the program's text. The analysis yields the parameters needed to characterize these storage behaviors over their entire range. We flesh out the analysis by measuring these parameters for a sample of “C” programs. For these programs, production trees were from 1/15 to 1/23 the size of the corresponding parse tree, l/2.7 the size of a 1990 abstract syntax tree, and averaged only 2.83 times the size of the program text. © 1990, ACM. All rights reserved.