Generalizing recognition of an individual dialect in program analysis and transformation
Abstract
We present a novel method for generalizing the recognition capability of a tool from one dialect of a language to other dialects. The framework defines a base-dialect-specific map layer that represents program geography as recognized and unrecognized program zones in terms of an annotated preprocessor token stream. Error call stacks generated during front-end syntax analysis are also stored in the map, where they are used to invoke programmable error-handling and repair transformations that obtain recognizable constructs from unrecognized ones. The process is iterative: for each transformation, reversion-identifying edits are also stored in the map, providing a handle on the unchanged original form and preserving its semantics. To support unconstrained variation of a dialect from the recognized base dialect, the map's edit components and all transformations are defined on text form, for which a novel datatype that minimizes transformation conflict is defined denotationally. This datatype, anchored text, gives primacy to locations in the initial sources, independently of the code located there, which results in greater commutativity of software transformations. The approach is evaluated in a case study that generalizes a sequential dialect of C to UPC (Unified Parallel C). Copyright 2007 ACM.
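As an illustration only, the idea of anchoring edits to locations in the initial sources can be sketched as follows. This is a minimal sketch under stated assumptions, not the denotational definition given in the paper: the names `Edit` and `AnchoredText`, the offset-based representation, and the rule that overlapping edits are rejected are all hypothetical choices made for this example. It shows why edits keyed on original offsets can be applied in either order with the same result, and how a stored edit can be reverted to recover the original form.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edit:
    """Hypothetical edit record: replaces a span of the ORIGINAL source."""
    start: int  # inclusive offset in the original source
    end: int    # exclusive offset in the original source
    text: str   # replacement text


@dataclass
class AnchoredText:
    """Hypothetical sketch: the original text plus edits keyed on original offsets."""
    original: str
    edits: dict = field(default_factory=dict)  # (start, end) -> Edit

    def apply(self, edit: Edit) -> None:
        if not (0 <= edit.start <= edit.end <= len(self.original)):
            raise ValueError("edit lies outside the original source")
        key = (edit.start, edit.end)
        if key in self.edits:
            raise ValueError("an edit is already anchored at this span")
        for other in self.edits.values():
            # Assumption of this sketch: overlapping spans are treated as conflicts.
            if edit.start < other.end and other.start < edit.end:
                raise ValueError("edit overlaps an earlier edit")
        self.edits[key] = edit

    def revert(self, start: int, end: int) -> None:
        # Dropping the stored edit restores the original text of that span.
        del self.edits[(start, end)]

    def render(self) -> str:
        out, pos = [], 0
        # Edits are ordered by original location, not by the order they were applied.
        for e in sorted(self.edits.values(), key=lambda e: (e.start, e.end)):
            out.append(self.original[pos:e.start])
            out.append(e.text)
            pos = e.end
        out.append(self.original[pos:])
        return "".join(out)
```

For instance, with the source `shared int x; upc_barrier;`, removing the span holding `shared ` and replacing the span holding `upc_barrier` (with a made-up placeholder such as `barrier()`) renders to the same text regardless of which edit is applied first, because neither edit shifts the offsets the other is anchored to.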