Developing Semantic Web applications requires handling the RDF data model in a programming language. Although the majority of current software is developed in the object-oriented paradigm, programming in RDF is currently triple-based. For relational databases, object-oriented APIs have long been available: frameworks such as Hibernate [2] or ADO.Net provide an automatic mapping from relational databases to object-oriented programming.
Partially inspired by such object-relational mappings, the development of an object-oriented RDF API has been suggested several times [15,16]. Such an API would map RDF Schema (RDFS) classes to programming classes, RDF resources to programming objects and RDF predicates to methods on those objects, so that application code could use Person.firstName instead of Resource.getProperty(http://xmlns.com/foaf/0.1/firstName).
In this paper we present an architecture and implementation of such an object-oriented RDF API. In section 2 we examine the differences between the object-oriented paradigm and the RDF model and explain why techniques used in object-relational mapping approaches are not sufficient. We present our solution architecture in section 3 and analyse the suitability of scripting languages for our mapping architecture. Section 4 introduces our implementation ActiveRDF, while section 5 illustrates the integration of ActiveRDF with the web application framework Ruby on Rails. We evaluate our work in section 6, discuss related approaches in section 7 and conclude in section 8.
The conceptual model and semantics of RDF Schema differ substantially from the object-oriented paradigm, more so than the relational paradigm does. In this section we examine these differences and explain why existing mapping approaches do not suffice for Semantic Web data. Although the exact meaning of ``object-oriented'' varies [1], we will consider typical object-oriented features and focus mostly on Java.
The semantics of classes and instances in RDF Schema is open-world and description logics-based while object-oriented type systems are closed-world and constraint-based [10]. This fundamental semantic difference causes six mismatches:
For relational databases several object-relational mappings exist, such as Java Data Objects and Hibernate for Java, ADO.Net for C# and ActiveRecord for Ruby. Most of these mappings follow the Active Domain Object or Active Record pattern [8, p. 160] which abstracts the database, simplifies data access and ensures data consistency.
Although the mapping frameworks differ in how they solve the impedance mismatch between the relational model, which is normalised for fast data retrieval, and the object-oriented model, which captures real-world objects as closely as possible, the general mapping is the same in all frameworks.
Tables are mapped to classes; table columns are mapped as attributes in the class, except for foreign keys which are mapped to object relationships; and every tuple in the relational model is mapped to an object. Intersection tables, which are introduced in the relational model to capture many-to-many relations, are mapped to object relationships (is-a, has-a relations).
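The general mapping can be illustrated with a minimal Ruby sketch. The table contents, the `Company` and `Person` classes and the loader methods are hypothetical and only serve to show tables becoming classes, columns becoming attributes, a foreign key becoming an object relationship, and tuples becoming objects; no particular mapping framework is implied.

```ruby
# Hypothetical tables: column names plus rows (tuples).
COMPANY_TABLE = { columns: [:id, :name], rows: [[10, 'Acme']] }
PEOPLE_TABLE  = { columns: [:id, :first_name, :employer_id],
                  rows: [[1, 'John', 10], [2, 'Mary', 10]] }

# Each table is mapped to a class, each plain column to an attribute.
Company = Struct.new(:id, :name)
# The foreign key employer_id is mapped to an object relationship (employer).
Person = Struct.new(:id, :first_name, :employer)

# Every tuple becomes one object.
def load_companies(table)
  table[:rows].map { |row| Company.new(*row) }
end

def load_people(table, companies)
  by_id = companies.map { |company| [company.id, company] }.to_h
  table[:rows].map do |id, first_name, employer_id|
    Person.new(id, first_name, by_id[employer_id])
  end
end
```

With this sketch, `load_people(PEOPLE_TABLE, load_companies(COMPANY_TABLE))` yields objects whose `employer` attribute is a `Company` object rather than a raw key.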
To apply the general mapping methodology to RDF data, adjustments are required to address the six identified mismatches listed above. Existing approaches do not address these mismatches since they do not occur in relational data:
We take the second approach: our solution is based on object-oriented scripting languages, where the mismatch between the object-oriented paradigm and RDF is smaller than with compiled object-oriented languages.
This section introduces scripting languages, explains their suitability and introduces our architecture for an object-oriented RDF API.
Dynamic, general-purpose scripting languages such as Perl, Python, and Ruby are typically interpreted, use dynamic typing, have strong meta-programming capabilities (which enable the programmer to alter the semantics of the language) and allow runtime introspection [13]. Through dynamic typing and meta-programming, scripting languages enable us to implement a domain-specific language for RDF(S) data and alleviate the discussed mismatches as follows:
In summary, dynamic scripting languages offer the properties required for a virtual and flexible API for RDF(S) data. Our arguments apply equally well to any dynamic Turing-complete language with these capabilities.
The general principle of our architecture is to represent RDF resources through transparent proxy objects. Each proxy object represents one RDF resource but does not contain any state. All methods (manipulations) on the proxy object are translated into (read or write) queries related to the proxy's RDF resource. Transparent proxy objects are simpler to implement than rich objects that copy the state and data of an RDF resource. Since rich objects often offer better performance, caching data in such rich objects can be implemented as an extension but requires a cache-management policy.
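The proxy principle can be sketched as follows. This is a minimal illustration under stated assumptions, not the ActiveRDF implementation: `TripleStore` is a hypothetical in-memory stand-in for a real RDF data source, and the `get` and `set` method names are invented for the example. The point is that the proxy holds only a URI and a reference to the store, so two proxies for the same resource always see the same data.

```ruby
# Hypothetical in-memory triple store standing in for a real RDF data source.
class TripleStore
  def initialize
    @triples = []
  end

  def add(subject, predicate, object)
    @triples << [subject, predicate, object]
  end

  # Read query: all objects for a given subject and predicate.
  def objects(subject, predicate)
    @triples.select { |s, p, _| s == subject && p == predicate }.map { |t| t[2] }
  end

  # Write query: replace all values of a predicate for a subject.
  def replace(subject, predicate, object)
    @triples.reject! { |s, p, _| s == subject && p == predicate }
    add(subject, predicate, object)
  end
end

# Stateless proxy: it keeps the resource URI and a store reference, no data.
class ResourceProxy
  attr_reader :uri

  def initialize(uri, store)
    @uri = uri
    @store = store
  end

  # Every read is translated into a query on the store.
  def get(predicate)
    @store.objects(@uri, predicate)
  end

  # Every write is translated into an update on the store.
  def set(predicate, value)
    @store.replace(@uri, predicate, value)
  end
end
```

Because no state is copied into the proxy, no cache invalidation is needed; the price is one query per method call.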
Our architecture consists of four layers, as shown in Fig. 1, that incrementally abstract RDF data into objects.
The object manager maps RDF data to objects and data manipulation to methods. For example, when the application calls a find method or when a new person is created, the mapping layer translates this operation into a query on the data source. The object manager also creates object-oriented classes from RDF Schema classes if schema information is available.
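Creating classes from schema information at runtime can be sketched with Ruby's `Class.new` and `const_set`. The `SCHEMA` hash, the `Generated` module and the helper names are hypothetical, and the sketch assumes at most one superclass per class; RDF Schema permits several, which Ruby's single inheritance cannot express directly.

```ruby
# Hypothetical container module for the classes created at runtime.
module Generated; end

# Hypothetical schema: class name => superclass name (nil for a root class).
SCHEMA = { 'Agent' => nil, 'Person' => 'Agent' }

# Define one Ruby class per schema class, creating superclasses first.
def define_class(name, schema, namespace)
  return namespace.const_get(name, false) if namespace.const_defined?(name, false)

  parent_name = schema[name]
  parent = parent_name ? define_class(parent_name, schema, namespace) : Object
  namespace.const_set(name, Class.new(parent))
end

def define_classes(schema, namespace)
  schema.each_key { |name| define_class(name, schema, namespace) }
end

define_classes(SCHEMA, Generated)
```

No source files are generated: the classes exist only in the running interpreter and can be created again whenever the schema changes.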
Developers can augment the object-oriented classes with custom methods to provide additional behaviour. Such methods can be overridden in subclasses to define specific behaviour: for example, a toString method might return different results for different kinds of objects. In typical object-oriented systems, the definition in the most specific class is used when multiple method definitions are given.
However, given the multiple inheritance in RDF Schema and the possibility of multiple membership, an additional resolution strategy must be used for methods that are defined multiple times, in classes that have no inheritance relation to each other. Possible solutions are to execute the first-found method definition, to select the most applicable method through a more refined distance-measure, to let the developer explicitly indicate the definition to use, or to raise an error.
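Two of the resolution strategies named above, first-found and raising an error, can be sketched as follows. The `CUSTOM_METHODS` registry, the `resolve` method and the `AmbiguousMethodError` class are hypothetical names chosen for this illustration.

```ruby
class AmbiguousMethodError < StandardError; end

# Hypothetical registry: class name => { method name => implementation }.
CUSTOM_METHODS = {
  'Person'   => { to_label: ->(uri) { "person <#{uri}>" } },
  'Employee' => { to_label: ->(uri) { "employee <#{uri}>" } },
  'Document' => {}
}

# Pick the implementation of method_name for a resource with the given types.
def resolve(types, method_name, strategy: :first_found)
  candidates = types.select { |type| CUSTOM_METHODS.fetch(type, {}).key?(method_name) }
  return nil if candidates.empty?

  case strategy
  when :first_found
    # Use the definition of the first class that provides the method.
    CUSTOM_METHODS[candidates.first][method_name]
  when :raise_error
    if candidates.size > 1
      raise AmbiguousMethodError, "#{method_name} is defined in #{candidates.join(', ')}"
    end
    CUSTOM_METHODS[candidates.first][method_name]
  else
    raise ArgumentError, "unknown strategy: #{strategy}"
  end
end
```

With first-found, the result depends on the order in which the types are listed, which is why this strategy is pragmatic rather than principled.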
The query engine provides an abstract query API that is independent of a specific data source and query language. It is used by the object manager to construct queries for each object manipulation.
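An abstract query object of this kind might look like the sketch below. The `Query` class and its method names are assumptions made for illustration and are not taken from ActiveRDF; the sketch serialises to a SPARQL SELECT string, but the same object could be serialised to another query language by a different method.

```ruby
# Hypothetical abstract query: built independently of any query language.
class Query
  def initialize
    @variables = []
    @patterns = []
    @distinct = false
  end

  def select(*variables)
    @variables.concat(variables)
    self
  end

  def distinct
    @distinct = true
    self
  end

  def where(subject, predicate, object)
    @patterns << [subject, predicate, object]
    self
  end

  # One possible serialisation: a SPARQL SELECT query.
  def to_sparql
    keyword = @distinct ? 'SELECT DISTINCT' : 'SELECT'
    vars = @variables.map { |v| "?#{v}" }.join(' ')
    body = @patterns.map { |pattern| pattern.map { |t| term(t) }.join(' ') + ' .' }.join(' ')
    "#{keyword} #{vars} WHERE { #{body} }"
  end

  private

  # Symbols are variables, http(s) strings are IRIs, everything else a literal.
  def term(value)
    case value
    when Symbol then "?#{value}"
    when Integer then value.to_s
    when /\Ahttps?:/ then "<#{value}>"
    else value.to_s.inspect
    end
  end
end
```

The builder methods return `self`, so calls can be chained, for example `Query.new.distinct.select(:name).where(uri, predicate, :name)`.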
The federation manager manages the collection of available data sources, distributes the queries over some or all of these sources and collects their results. The federation manager should, when querying multiple data sources, consolidate the results [7]: similar objects that are identified differently in the different data sources should be merged before the results are returned.
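The distribute, collect and consolidate steps can be sketched as follows. `ListSource`, `FederationManager` and the explicit `same_as` table are hypothetical; in particular, consolidation is reduced here to rewriting known aliases to a canonical URI, which is a strong simplification of the consolidation discussed in [7].

```ruby
# Hypothetical data source: an array of triples queried by pattern,
# where nil in the pattern acts as a wildcard.
class ListSource
  def initialize(triples)
    @triples = triples
  end

  def query(pattern)
    @triples.select do |triple|
      pattern.zip(triple).all? { |wanted, actual| wanted.nil? || wanted == actual }
    end
  end
end

class FederationManager
  # same_as maps alias URIs to a canonical URI (assumed to be known upfront).
  def initialize(sources, same_as = {})
    @sources = sources
    @same_as = same_as
  end

  def query(pattern)
    @sources.flat_map { |source| source.query(pattern) }                        # distribute and collect
            .map { |triple| triple.map { |term| @same_as.fetch(term, term) } } # consolidate identifiers
            .uniq                                                              # merge duplicates
  end
end
```

Two sources that describe the same person under different URIs then yield a single merged result.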
Adapters provide access to a specific RDF data store by translating generic RDF operations to a store-specific API. Such store-specific adapters are necessary because there is no general standardised query language that provides create, read, update, and delete access.
As such, adapters are responsible for translating queries from the federation manager into a query language supported by their data source and for executing them. Each adapter must implement a simple API, which allows new adapters to be added easily.
Adapters do not necessarily wrap RDF data sources; they could also wrap ``legacy'' sources such as desktop application data (as in the Aperture architecture [14]) or relational databases (as in the D2R system [5]), as long as they expose their query results as RDF.
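A minimal sketch of such an adapter contract, and of an adapter wrapping a non-RDF source, is given below. The `Adapter` base class, its single `query` method, the `AddressBookAdapter` and the `urn:example:contact:` URI scheme are all hypothetical; the actual adapter API of ActiveRDF is not specified in this section.

```ruby
# Hypothetical minimal adapter contract: one method that answers a
# triple pattern (nil = wildcard) with an array of triples.
class Adapter
  def query(_pattern)
    raise NotImplementedError, "#{self.class} must implement query"
  end
end

# Adapter over a "legacy" source: an address book held as Ruby hashes,
# exposed as FOAF triples.
class AddressBookAdapter < Adapter
  FOAF = 'http://xmlns.com/foaf/0.1/'

  def initialize(contacts)
    @contacts = contacts
  end

  def query(pattern)
    triples.select do |triple|
      pattern.zip(triple).all? { |wanted, actual| wanted.nil? || wanted == actual }
    end
  end

  private

  # Translate every contact record into RDF-style triples on demand.
  def triples
    @contacts.flat_map do |contact|
      uri = "urn:example:contact:#{contact[:id]}"
      [[uri, FOAF + 'name', contact[:name]],
       [uri, FOAF + 'mbox', "mailto:#{contact[:email]}"]]
    end
  end
end
```

Since callers only depend on `query`, a new data source can be supported by adding one subclass.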
We have implemented the presented architecture in our Ruby library ActiveRDF, which provides a virtual API for managing RDF data in an object-oriented manner. We have reported on an initial implementation earlier [11]. Since then, ActiveRDF was completely re-implemented according to the architecture described above. ActiveRDF is currently implemented in around 600 lines of code; the adapters are written in on average 160 lines of code.
The object manager offers a virtual API to manipulate RDF. This virtual API can be divided into three parts: mapping RDF(S) resources into objects, instance-level methods for manipulating these resources, and class-level methods for searching resources.
In ActiveRDF every object can be a member of many classes. Since Ruby does not allow such multiple membership, we override the built-in Ruby behaviour. All built-in methods that use the class of an object are overridden to rely on the rdf:type(s) in the data source.
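Overriding a class-dependent built-in can be sketched for `is_a?` and `kind_of?`. The `Resource` class and the `type_lookup` callable are assumptions for illustration; the callable stands in for a query that reads the rdf:type values of a resource from the data source.

```ruby
# Hypothetical domain classes.
class Person; end
class Employee; end
class Document; end

class Resource
  # type_lookup is any callable that returns the classes of a URI,
  # standing in for a query on rdf:type in the data source.
  def initialize(uri, type_lookup)
    @uri = uri
    @type_lookup = type_lookup
  end

  # All classes of this resource, looked up on every call.
  def types
    @type_lookup.call(@uri)
  end

  # Consult the data source first, then fall back to Ruby's own check.
  def is_a?(klass)
    types.any? { |type| type <= klass } || super
  end
  alias kind_of? is_a?
end
```

A resource typed as both `Person` and `Employee` then answers `true` to `is_a?` for either class, although its Ruby class is `Resource`.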
Apart from the virtual API, developers can augment the domain model with custom methods. As discussed in section 3.2.1, a search strategy is needed to resolve multiple (clashing) method definitions in classes: as a pragmatic solution our current implementation uses the first-found definition.
Domain-specific methods such as john.age or john.name are not generated but provided virtually: the object manager catches their invocation and translates the method call into a query. Without the object manager's interference, Ruby would raise a NoMethodError. Such meta-programming caters for flexibility: as we do not generate the API but ``pretend'' it based on the data available at runtime, we do not need to recompile or regenerate the API when the data changes.
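Catching such invocations can be sketched with Ruby's `method_missing` hook. The `VirtualResource` class is hypothetical and queries a plain array of triples instead of a data source; it matches a method name against the local part of each predicate URI, and it falls back to Ruby's default behaviour (a `NoMethodError`) when no data is found.

```ruby
class VirtualResource
  # triples is an array of [subject, predicate, object] arrays that
  # stands in for the data source.
  def initialize(uri, triples)
    @uri = uri
    @triples = triples
  end

  # Called by Ruby for every method that is not defined on the object.
  def method_missing(name, *args)
    values = lookup(name.to_s)
    return super if values.empty? # default behaviour: NoMethodError

    values.size == 1 ? values.first : values
  end

  def respond_to_missing?(name, include_private = false)
    !lookup(name.to_s).empty? || super
  end

  private

  # Values of all predicates whose local part (after the last '/' or '#')
  # equals the method name.
  def lookup(local_name)
    @triples.select { |s, p, _| s == @uri && p.split(%r{[/#]}).last == local_name }
            .map { |triple| triple[2] }
  end
end
```

Adding a triple with a new predicate makes a new method available immediately, without regenerating any code.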
To prevent clashes between similarly-named classes in different libraries we map the RDF namespaces onto the namespace mechanism provided by Ruby. Listing 1 shows how to register a namespace abbreviation for the FOAF namespace and how to create an instance of FOAF::Person.
In this example, the object manager transparently catches the method calls john.knows and friend.name and translates each into a query. Part of this translation is determining the full URI of the predicate for ``knows'' and ``name'', which is straightforward when the local part is unique, but ambiguous when different predicates have the same local part. As discussed in Sect. 2, the schema definition cannot be used to determine which predicate might apply to a certain resource, since the schema does not constrain usage of predicates to classes. For example, every resource can use foaf:name; the resource then simply becomes of type foaf:Person. One might be tempted to use the schema definition and class hierarchy to limit this ambiguity and to find the most relevant property for a resource, but the RDF(S) notion of ``domain'' does not cater for this.
Developers can still use an ambiguous but convenient shorthand, as in Listing 3, but are not guaranteed the desired results since the first matching predicate will be used. Instead, they can explicitly specify the predicate through its namespace, as in Listing 4.
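The difference between the two forms of predicate resolution can be sketched as follows. The registries and the method names `expand_shorthand` and `expand_qualified` are hypothetical and do not reproduce the listings referred to above; the sketch only shows why the shorthand depends on the order in which predicates are found, whereas the qualified form does not.

```ruby
# Hypothetical registry of namespace abbreviations.
NAMESPACES = {
  foaf: 'http://xmlns.com/foaf/0.1/',
  dc: 'http://purl.org/dc/elements/1.1/'
}

# Predicates assumed to occur in the data source, in discovery order.
KNOWN_PREDICATES = [
  'http://xmlns.com/foaf/0.1/title',
  'http://purl.org/dc/elements/1.1/title'
]

# Shorthand: the first predicate with a matching local part wins.
def expand_shorthand(local_name, predicates = KNOWN_PREDICATES)
  predicates.find { |uri| uri.end_with?('/' + local_name, '#' + local_name) }
end

# Explicit form: the namespace abbreviation makes the result unambiguous.
def expand_qualified(prefix, local_name, namespaces = NAMESPACES)
  namespaces.fetch(prefix) + local_name
end
```

Here `expand_shorthand('title')` returns the FOAF predicate only because it happens to be listed first; reversing the list changes the answer.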
Each such resource manipulation is translated into a query. Invocations that change attribute values are handled similarly, but generate update queries instead of read queries.
Listing 5 demonstrates the dynamic finders. The first shows a search returning all resources named ``John'', the second all thirty-year-olds named ``John''. These finders allow a resource to be located through one or more conjunctive clauses. If the developer requires more complicated queries, the Query API can be used.
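How such finders can be provided without being defined in advance is sketched below. The `ResourceFinder` class and the `find_by_..._and_...` naming pattern are assumptions made for this illustration and are not a reproduction of Listing 5; the finder parses the method name into attribute names and evaluates the conjunction against an array of triples.

```ruby
class ResourceFinder
  # triples is an array of [subject, predicate, object] arrays.
  def initialize(triples)
    @triples = triples
  end

  # Handles find_by_<attr> and find_by_<attr>_and_<attr>... at runtime.
  def method_missing(name, *values)
    match = name.to_s.match(/\Afind_by_(.+)\z/)
    return super unless match

    attributes = match[1].split('_and_')
    unless attributes.size == values.size
      raise ArgumentError, "expected #{attributes.size} values, got #{values.size}"
    end

    conditions = attributes.zip(values)
    subjects.select do |subject|
      # Conjunctive clauses: every attribute must have the requested value.
      conditions.all? { |attribute, value| value?(subject, attribute, value) }
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?('find_by_') || super
  end

  private

  def subjects
    @triples.map(&:first).uniq
  end

  # True if the subject has a predicate with this local part and value.
  def value?(subject, attribute, value)
    @triples.any? do |s, p, o|
      s == subject && p.split(%r{[/#]}).last == attribute && o == value
    end
  end
end
```

Attribute names that themselves contain `_and_` cannot be expressed in this naming scheme, which is one reason a separate query API remains useful.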
Listing 6 shows some typical queries. The first query counts the number of distinct predicates used in the dataset, the second one returns all distinct foaf:names of the earlier defined John, and the third one finds all resources mentioning ``apple''.
We have implemented adapters for generic SPARQL endpoints and for the RDF data stores Sesame [6], Jena [17], YARS [9], and Redland [3]. We have also implemented proof-of-concept adapters to desktop application data such as the Evolution email address book (exposed as FOAF data).
We have further developed rdflite, a simple and light-weight RDF store (and adapter) based on SQLite with support for full-text search. We distribute rdflite as an adapter for ActiveRDF to enable simple prototyping without installing a fully-fledged RDF store.
Ruby on Rails is a rapid application development framework for web applications, following the model-view-controller paradigm. Developers are presented with default models, views, and controllers and can adjust these to their domain. The model is usually provided by an existing database, the controller implements the business logic in Ruby code and the view is specified using HTML with Ruby code embedded.
Ruby on Rails has two main strengths: on the one hand it provides default application logic for the generic parts of web applications and several helper methods for data manipulation and JavaScript effects, relieving developers of these tasks. On the other hand, since Ruby on Rails is targeted towards web applications that operate on relational databases, it integrates the business logic with the domain data using the ActiveRecord object-relational mapping: database tables serve as domain models and database tuples become Ruby instances.
We have designed ActiveRDF such that it can serve as a data layer in Ruby on Rails, replacing or augmenting the default ActiveRecord layer. As such, it provides a solution for rapid development of Semantic Web applications, leveraging the large and vibrant community of Ruby on Rails developers with their extensions and plug-ins. We have developed several web applications using ActiveRDF and Ruby on Rails; we will briefly describe two of them:
Using ActiveRDF, the integration of Rails with RDF data was straightforward and the development effort was minimal. Most development time was actually dedicated to supporting different browsers for the views. The models themselves are automatically provided as virtual models, the controller (with all application logic) contains around 250 lines of code, and the views contain around 200 lines of HTML, Ruby and JavaScript code.
To allow navigation in arbitrary RDF datasets we have developed a faceted metadata browser. Faceted browsing is a data exploration technique for large datasets. BrowseRDF extends this technique for complete graph-based data and adds algorithms to rank facets automatically based on facet entropy [12].
Again, using ActiveRDF the development effort was minimal once the formal model and the algorithms had been developed: the models are automatically provided, the controller contains around 300 lines of code, and the views contain around 250 lines of HTML, Ruby and JavaScript code.
BrowseRDF currently uses the rdflite data store, but the data source abstraction in ActiveRDF allows us to easily switch to a more scalable RDF store such as YARS or Sesame for larger datasets.
We evaluate ActiveRDF in two ways: a quantitative evaluation to indicate possible performance overhead of our library and a qualitative evaluation to indicate the possible ease of use and increased productivity in software development. For practical reasons we have not measured productivity increase directly (for example, by comparing the task completion speed of several similarly qualified programmers with and without ActiveRDF); instead, an indication is given by the relatively few lines of code needed for the applications presented in section 5.
For quantitative evaluation, we compared query execution on Sesame (using various queries and various datasets) using the curl HTTP client (which shows the time needed by the data store for query answering), the Sesame Java API and ActiveRDF. We evaluated nine queries (ranging from selecting all triples to joins over two resources, see Fig. 2) using five different datasets (ranging from 2,500 to 50,000 triples). Each test was first run once to warm up the server and then repeated ten times. The tests were run on a server with two 1994 MHz AMD Opteron 246 processors and 2 GB RAM.
Fig. 3 shows the average response time (including result parsing in Java and ActiveRDF) of each query using curl, Java, and ActiveRDF on a logarithmic scale. It can be seen that for most queries ActiveRDF adds little overhead. On some queries ActiveRDF seems to perform faster than curl, which is probably due to random hardware variations and measurement difficulties in those small response time ranges.
For queries #3, #4 and #5, however, the overhead of ActiveRDF is substantial. Because these queries return large amounts of XML results, we suspected the performance to be influenced by the Ruby XML parser. Fig. 4 therefore shows the average response time for the same queries but with the JSON result format instead of XML: indeed the response time is on average halved for queries #3 (from to ), #4 (from to ) and #5 (from to ); note that the graphs are in logarithmic scale.
Many RDF APIs exist currently (in various programming languages). Some provide access to one specific RDF store, such as the Jena API [17] or the Sesame API [6], and some are agnostic to the underlying data store, such as RDF2Go. Most of these APIs are generic and triple-based, offering methods such as getStatement and getObject. These are exactly the APIs that we want to abstract from.
The development of an object-oriented API has been attempted in Java in RdfReactor, Elmo and Jastor. These approaches ignore the flexible and semi-structured nature of RDF data and instead: (i) assume the existence of a schema, because they rely on the RDF Schema to generate corresponding classes, (ii) assume the stability of the schema, because they require manual regeneration and recompilation if the schema changes and (iii) assume the conformance of RDF data to such a schema, because they do not allow objects with different structure than their class definition.
We have presented ActiveRDF, an object-oriented library for RDF data written in Ruby. We have analysed why the techniques used in traditional object-relational mapping approaches are not sufficient for the Semantic Web and RDF in particular. Based on a careful examination we have chosen to implement ActiveRDF in an object-oriented scripting language. Among the advantages of these languages are the dynamic typing of objects, which maps well onto the RDF(S) class membership; meta-programming, which allows us to implement the multiple inheritance of RDF(S); and a relaxation of strict object conformance to class definitions.
ActiveRDF is light-weight and implemented in around 600 lines of code. It can be used with generic SPARQL endpoints, on popular RDF data stores, and with desktop application data. We have designed ActiveRDF such that it can serve as a data layer in Ruby on Rails, replacing or augmenting the default ActiveRecord layer, and providing a solution for rapid development of Semantic Web applications.
We have shown that ActiveRDF adds little performance overhead for most queries, and that the remaining overhead can probably be decreased by carefully considering the parsing implementation. With its higher abstraction level and integration with Ruby on Rails, ActiveRDF allows the development of Semantic Web applications in relatively few lines of code.