CS-282
Course Project:
Parsing Mathematics Typeset in TeX
Eylon Caspi
Prof. Richard Fateman
Abstract
The recognition of mathematics notation by a computer is made
difficult by the two-dimensional nature of the parsing problem as well
as by the richness and ambiguity of the notation. Parsing mathematics
typeset in TeX constitutes a simplified, idealized 2D recognition
problem, allowing the recognition engine to concentrate more on
semantic understanding. Choosing TeX as an input form for mathematics
is immediately desirable for document recognition because of the
availability of many published works in TeX form. It is also
desirable as a linearized, intermediate form emitted by a
mathematically oriented graphical user interface, as in IBM
TechExplorer. A multi-pass mathematics recognition engine is
described, designed with the intent of transcribing formulas from the
electronic reference A Table of Integrals, Series, and
Products into LISP statements suitable for a computer algebra
system. The engine is currently capable of transcribing 154 of 210
integral and summation formulas in the domain of real, scalar
calculus.
Report
People
The Course
This report describes a course project for Richard Fateman's graduate
course, "CS-282: Algebraic Algorithms," at U.C. Berkeley's Computer
Science department, in the fall 1997 semester.
The course was dedicated to exploring theoretical, algorithmic,
and practical issues in the construction of symbolic computer
algebra systems such as Mathematica, Maple,
Macsyma, and Axiom.
Links
Last updated: 10/2/00
Comments to:
eylon@cs.berkeley.edu