CS-282 Course Project:

Parsing Mathematics Typeset in TeX


Eylon Caspi . . . Prof. Richard Fateman



Abstract

The recognition of mathematical notation by a computer is made difficult by the two-dimensional nature of the parsing problem as well as by the richness and ambiguity of the notation. Parsing mathematics typeset in TeX constitutes a simplified, idealized 2D recognition problem, allowing the recognition engine to concentrate more on semantic understanding. TeX is an attractive input form for document recognition because many published works are available in TeX form. It is also attractive as a linearized, intermediate form emitted by a mathematically oriented graphical user interface, as in IBM TechExplorer. A multi-pass mathematics recognition engine is described, designed with the intent of transcribing formulas from the electronic reference A Table of Integrals, Series, and Products into LISP statements suitable for a computer algebra system. The engine is currently capable of transcribing 154 of 210 integral and summation formulas in the domain of real scalar calculus.


Report


People


The Course

This report describes a course project for Richard Fateman's graduate course, "CS-282: Algebraic Algorithms," in U.C. Berkeley's Computer Science department during the fall 1997 semester. The course was dedicated to exploring theoretical, algorithmic, and practical issues in the construction of symbolic computer algebra systems such as Mathematica, Maple, Macsyma, and Axiom.


Links




Last updated: 10/2/00
Comments to: eylon@cs.berkeley.edu