Track: Semantic Web
Paper Title:
Deriving Knowledge from Figures for Digital Libraries
Authors:
Abstract:
Figures in digital documents contain important information, yet current
digital libraries do not summarize or index the information within
figures for document retrieval. We present a system for the automatic
categorization of figures and the extraction of data from 2-D plots.
A machine-learning-based method categorizes figures into a set of
predefined types based on image features. An automated algorithm
extracts data values from solid line curves in 2-D plots. The semantic
type of each figure and the data values extracted from 2-D plots can be
integrated with the textual information in documents to provide more
effective document retrieval services for digital library users.
Experimental evaluation demonstrates that our system can produce results
suitable for real-world use.