Track: XML and Web Data
Paper Title:
A High-Performance Interpretive Approach to Schema-Directed Parsing
Authors:
Abstract:
XML delivers key advantages in interoperability due to its
flexibility, expressiveness, and platform-neutrality. As XML has
become a performance-critical aspect of the next generation of
business computing infrastructure, however, it has become increasingly
clear that XML parsing often carries a heavy performance penalty, and
that current, widely-used parsing technologies are unable to meet the
performance demands of an XML-based computing infrastructure. Several
efforts have been made to address this performance gap through the use
of grammar-based parser generation. Although generated parsers are
significantly faster than conventional parsers, adoption of the
technology has been hindered by the complexity of compiling and
deploying the generated parsers. Through careful analysis of the
operations required for parsing and validation, we have devised a set
of bytecodes specialized for XML parsing and validation. These
bytecodes are designed to retain the benefits of the fine-grained
composition of parsing and validation that makes existing compiled
parsers fast, while remaining coarse-grained enough to minimize
interpreter overhead. This technique of using an interpretive,
validating parser balances the need for performance against the
requirements of simple tooling and robust, scalable infrastructure.
We demonstrate our approach with a specialized schema compiler that
generates bytecodes, which in turn drive an interpretive parser.
With almost as little tooling and deployment complexity as a
traditional interpretive parser, the bytecode-driven parser usually
performs within 20% of the fastest fully compiled solutions.
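
As an illustration only, the following minimal Python sketch shows the
general shape of the technique: a schema compiler emits a linear
sequence of coarse-grained bytecodes, and a small interpreter loop
executes them against the input, parsing and validating in a single
pass. The opcode names (START, TEXT_INT, TEXT_STR, END, HALT), the
nested-tuple schema representation, and the restricted XML subset (no
attributes, namespaces, comments, or entities, and a fixed element
sequence) are assumptions made for this example and are not the
instruction set or schema language described in this paper.

```python
# Hypothetical opcodes for this sketch; not the paper's instruction set.
START, TEXT_INT, TEXT_STR, END, HALT = range(5)


def compile_schema(schema):
    """Flatten a nested (name, content) schema into a linear bytecode list.

    content is "int", "str", or a list of child (name, content) pairs.
    """
    code = []

    def emit(name, content):
        code.append((START, name))
        if content == "int":
            code.append((TEXT_INT, None))
        elif content == "str":
            code.append((TEXT_STR, None))
        else:
            for child_name, child_content in content:
                emit(child_name, child_content)
        code.append((END, name))

    emit(*schema)
    code.append((HALT, None))
    return code


def _skip_ws(text, pos):
    """Advance pos past any whitespace characters."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def _expect(text, pos, tag):
    """Match a literal tag at pos (after optional whitespace) or fail."""
    pos = _skip_ws(text, pos)
    if not text.startswith(tag, pos):
        raise ValueError(f"expected {tag!r} at offset {pos}")
    return pos + len(tag)


def run(code, text):
    """Interpret the bytecode over the input in one combined pass."""
    pos, values = 0, []
    for op, arg in code:
        if op == START:
            pos = _expect(text, pos, f"<{arg}>")
        elif op == END:
            pos = _expect(text, pos, f"</{arg}>")
        elif op in (TEXT_INT, TEXT_STR):
            end = text.find("<", pos)
            if end < 0:
                raise ValueError(f"unterminated text at offset {pos}")
            raw = text[pos:end]
            # int() raises ValueError on non-numeric text, which serves as
            # the validation failure for integer-typed content.
            values.append(int(raw) if op == TEXT_INT else raw)
            pos = end
        elif op == HALT:
            if _skip_ws(text, pos) != len(text):
                raise ValueError(f"trailing content at offset {pos}")
    return values


# Example schema: <order> containing an integer <id> and a string <item>.
ORDER_SCHEMA = ("order", [("id", "int"), ("item", "str")])
ORDER_CODE = compile_schema(ORDER_SCHEMA)
```

In this sketch, each bytecode both consumes input and checks it against
the schema, so no separate validation pass is needed, and a new schema
requires only a new bytecode list rather than newly compiled and
deployed parser code. Calling run(ORDER_CODE, document) returns the
typed text values in document order and raises ValueError on the first
schema violation.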