- Poster Title:
- Web Page Classification with Heterogeneous Data Fusion
- Authors:
- Zenglin Xu (The Chinese University of Hong Kong)
- Irwin King (The Chinese University of Hong Kong)
- Michael R. Lyu (The Chinese University of Hong Kong)
- Abstract:
- Web pages contain more than plain text: they carry rich contextual and structural information, such as the title, metadata, and anchor text, each of which can be regarded as a separate data source or representation. Because these heterogeneous data sources differ in dimensionality and form of representation, simply putting them together does not substantially improve classification performance. We observe that, via a kernel function, data sources of different dimensionality and type can all be mapped into a common format, the kernel matrix, which serves as a generalized similarity measure between web pages. On this basis, we employ a kernel learning approach to fuse these heterogeneous data sources. Experimental results on a collection from the ODP database validate the advantages of the proposed method over any single data source and over the uniformly weighted combination of the heterogeneous data sources.
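- Illustrative sketch (not the authors' method): the key observation is that sources of different dimensionality (e.g. a 3-dimensional title vector and a 2-dimensional anchor-text vector) all yield an n-by-n kernel matrix over the same n pages, so they can be combined as K = sum_i mu_i K_i with nonnegative weights mu_i. The poster does not specify its kernel learning optimization here, so the sketch below substitutes a simple, hypothetical heuristic: each cosine-normalized linear kernel is weighted in proportion to its kernel-target alignment with the training labels. The toy feature vectors, labels, and all function names are assumptions made for illustration only.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def linear_kernel(features: Sequence[Sequence[float]]) -> Matrix:
    """Gram matrix K[i][j] = <x_i, x_j> for one data source (any dimensionality)."""
    n = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j])) for j in range(n)]
            for i in range(n)]


def normalize_kernel(K: Matrix) -> Matrix:
    """Cosine normalization K[i][j] / sqrt(K[i][i] * K[j][j]), giving every source a unit diagonal."""
    n = len(K)
    return [[K[i][j] / math.sqrt(K[i][i] * K[j][j]) if K[i][i] > 0 and K[j][j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def frobenius(A: Matrix, B: Matrix) -> float:
    """Frobenius inner product <A, B>_F = sum_ij A[i][j] * B[i][j]."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def alignment(K: Matrix, labels: Sequence[int]) -> float:
    """Kernel-target alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F) for labels in {-1, +1}."""
    Y = [[yi * yj for yj in labels] for yi in labels]
    return frobenius(K, Y) / math.sqrt(frobenius(K, K) * frobenius(Y, Y))


def combine(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of same-sized kernel matrices: K = sum_i w_i * K_i."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


def alignment_weights(kernels: Sequence[Matrix], labels: Sequence[int]) -> List[float]:
    """Hypothetical heuristic: weights proportional to each kernel's (clipped) alignment, summing to 1."""
    scores = [max(alignment(K, labels), 0.0) for K in kernels]
    total = sum(scores)
    if total == 0.0:
        return [1.0 / len(kernels)] * len(kernels)  # fall back to uniform weights
    return [s / total for s in scores]


# Toy data (assumed): four pages, two sources of different dimensionality.
title = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]  # 3-dim title features
anchor = [[2, 0], [1, 0], [0, 1], [0, 2]]             # 2-dim anchor-text features
labels = [1, 1, -1, -1]                               # two page categories

# Both sources become 4x4 kernel matrices despite their different feature sizes.
kernels = [normalize_kernel(linear_kernel(src)) for src in (title, anchor)]
weights = alignment_weights(kernels, labels)
K = combine(kernels, weights)
uniform_K = combine(kernels, [0.5, 0.5])

print("weights:", weights)
print("alignment (weighted):", alignment(K, labels))
print("alignment (uniform): ", alignment(uniform_K, labels))
```

On this toy data the anchor-text kernel separates the two classes more cleanly, so it receives the larger weight, and the weighted combination aligns better with the labels than the uniform one. A learned combination of this kind is what distinguishes kernel fusion from the uniformly weighted baseline; a full kernel learning method would instead optimize the weights jointly with the classifier.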