Model Names, Namespaces, and Aliases in SADL

Last revised 02/05/2015.

The Basics

One of the important capabilities of OWL is to allow one ontology to import another using owl:imports. For this to be possible, each OWL ontology must have an identifier that can be used in the owl:imports statement. Consistent with Semantic Web identification, an ontology is identified by a URI.

SADL model files are converted to OWL model files when the model is saved, provided there are no fatal errors in translating to OWL. Since the OWL model is identified by URI, the SADL model file requires that a valid URI be given after the "uri" keyword as the first non-comment statement in the model. This URI serves as the model name in SADL and can be used to import this model into another model. (This is importing a model by model name; a SADL file can also be imported into another SADL file using the SADL file name.) The model's URI becomes the base URI of the OWL model file. In this sense the model URI identifies an XML namespace in which the concepts of the model are defined. It is also the URI that is used in another OWL file to import this OWL file. (We are currently using OWL 1; OWL 2 modifies the import mechanism by introducing a version IRI and rules about accessing the current versus a previous ontology version.)

A valid URI for a model name is required to have the following two parts:

A Scheme: this is the first element of the URI preceding the colon, e.g., in "http://com.ge.research" the scheme is "http". A SADL model name requires an "http" scheme.
The Path: this is the rest of the URI and the format depends upon the scheme. In a path, the slash character ("/") is reserved for delimiting substrings whose relationship is hierarchical.
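
The two requirements above can be checked mechanically. The following is a minimal Python sketch, not part of SADL itself; the function name is_valid_model_name is hypothetical, and the check covers only the two parts listed here (an "http" scheme and a non-empty remainder).

```python
from urllib.parse import urlsplit

def is_valid_model_name(uri: str) -> bool:
    """Return True if uri has an "http" scheme and a non-empty remainder.

    Hypothetical helper: it mirrors the two requirements listed above and
    is not the validation code used by the SADL-IDE.
    """
    parts = urlsplit(uri)
    # The scheme is the element of the URI preceding the first colon.
    if parts.scheme != "http":
        return False
    # The rest of the URI (authority plus path) must not be empty.
    return bool(parts.netloc + parts.path)

print(is_valid_model_name("http://sadl.imp/shapes"))  # True
print(is_valid_model_name("file:/models/shapes.owl"))  # False
```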

In general, a valid URI may contain a hash character ("#"). The hash is reserved as a delimiter to separate the URI of a whole from a fragment identifier. A fragment identifier represents "a part of, fragment of, or a sub-function within, an object" (see http://www.w3.org/Addressing/URL/4_2_Fragments.html). In the case of our generated OWL model, we will use the model name as the part of the URI preceding the hash ("#") and the name of a particular concept (class, property, or instance) as the fragment identifier, which we elsewhere also call the local name. For example, the following SADL model generates the OWL model shown in RDF/XML format.

uri "http://sadl.imp/shapes" version "$Revision: 1.3 $ Last modified on $Date: 2015/02/05 13:33:00 $".

Shape is a class, described by area with a single value of type float.

<rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns="http://sadl.imp/shapes#"
        xmlns:owl="http://www.w3.org/2002/07/owl#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xml:base="http://sadl.imp/shapes">
        <owl:Ontology rdf:about="http://sadl.imp/shapes#">
                <owl:versionInfo>$Revision: 1.3 $ Last modified on $Date: 2009/03/06 14:37:54 $</owl:versionInfo>
                <rdfs:comment xml:lang="en">This ontology was created from a SADL file 'shapes.sadl' and should not be edited.</rdfs:comment>
        </owl:Ontology>
        <owl:Class rdf:ID="Shape"/>
        <owl:FunctionalProperty rdf:ID="area">
                <rdfs:domain rdf:resource="http://sadl.imp/shapes#Shape"/>
                <rdfs:subPropertyOf rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
                <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#float"/>
                <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#DatatypeProperty"/>
        </owl:FunctionalProperty>
</rdf:RDF>

Whether to include the hash ("#") on the end of the URI for the whole (the model in this case) is somewhat unclear: "If the fragment-id is void, the hash sign may be omitted: A void fragment-id with or without the hash sign means that the URL refers to the whole object." (ibid.) We will take the approach of not including the hash when referring to the model as a whole by model name, such as when referencing the model in an owl:imports statement or when the model is mapped in the policy file from the model name (public URI) to the actual model location (alternate URL). However, when referring to the namespace of the model, as in the third line of the OWL model above, the hash will be included. When a SADL model imports an OWL file, the inclusion of the hash is beyond our control, and we will attempt to handle properly whatever approach is used in the imported OWL model(s).
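
The convention just described (no hash for the model name, a trailing hash for the namespace, and the local name as the fragment) can be illustrated with a short Python sketch. The function names are hypothetical and the code is only an illustration of the convention, not SADL's implementation.

```python
def model_name(uri: str) -> str:
    """Model name: everything before the hash, so the hash is never included."""
    return uri.partition("#")[0]

def namespace(uri: str) -> str:
    """Namespace: the model name followed by the hash."""
    return model_name(uri) + "#"

def split_concept(uri: str) -> tuple[str, str]:
    """Split a concept URI into (model name, local name)."""
    base, _, local_name = uri.partition("#")
    return base, local_name

print(model_name("http://sadl.imp/shapes#"))           # http://sadl.imp/shapes
print(namespace("http://sadl.imp/shapes"))             # http://sadl.imp/shapes#
print(split_concept("http://sadl.imp/shapes#Shape"))   # ('http://sadl.imp/shapes', 'Shape')
```

Normalizing both forms through model_name is one way to tolerate imported OWL files that write the ontology URI with or without the trailing hash.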

A concept in an OWL model is identified by its URI. Because URIs can be rather long and can make models hard to read, the base URI (the namespace) may be replaced with a prefix, also known in SADL as an alias for the model (namespace). The combination of the prefix and the localname or fragment, separated by a colon, is called a Qualified Name or QName. The OWL model above uses the QNames owl:Ontology, rdf:about, owl:versionInfo, etc.

An alias for the name of an imported model allows a shortened, although still explicit, reference to imported concepts. For example, if another model imported the model shown above, we might wish to reference the Shape class. If no other import contains a concept with local name "Shape", then just referencing Shape is unambiguous and is allowed in SADL. It is conceivable, however, that other imports might include other concepts with local name "Shape". To disambiguate we can use an alias, e.g., "shapes". Then the reference would be "shapes:Shape". For example,

Circle is a type of shapes:Shape.
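
As an illustration of how a QName relates to a full URI, and of why an unqualified local name can be ambiguous, here is a small Python sketch. The second imported namespace, "http://sadl.imp/drawing#", and both function names are invented for the example; this is not how SADL resolves names internally.

```python
# Prefix (alias) -> namespace. "shapes" is the alias used in the text above.
PREFIXES = {
    "shapes": "http://sadl.imp/shapes#",
    "owl": "http://www.w3.org/2002/07/owl#",
}

# Hypothetical imports: namespace -> local names defined in that model.
IMPORTED = {
    "http://sadl.imp/shapes#": {"Shape"},
    "http://sadl.imp/drawing#": {"Shape", "Canvas"},  # invented second import
}

def expand_qname(qname: str, prefixes: dict[str, str]) -> str:
    """Expand prefix:localname into a full concept URI."""
    prefix, sep, local_name = qname.partition(":")
    if not sep:
        raise ValueError(f"{qname!r} has no prefix")
    return prefixes[prefix] + local_name

def candidates(local_name: str, imported: dict[str, set[str]]) -> list[str]:
    """All concept URIs that an unqualified local name could refer to."""
    return sorted(ns + local_name for ns, names in imported.items()
                  if local_name in names)

print(expand_qname("shapes:Shape", PREFIXES))  # http://sadl.imp/shapes#Shape
print(candidates("Canvas", IMPORTED))          # one candidate: unambiguous
print(candidates("Shape", IMPORTED))           # two candidates: an alias is needed
```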

There are two ways to provide an alias for a model name (namespace). A global alias can be provided in the aliased model itself. For example, the first line of the model above could be

uri "http://sadl.imp/shapes" alias shapes version "$Revision: 1.3 $ Last modified on $Date: 2015/02/05 13:33:00 $".

This will create a global alias in the Jena policy file, ont-policy.rdf (located in the OwlModels folder of a SADL-IDE project).

When a global alias is given as part of a model name, imports of that model do not need an alias. For example, the model above could then be imported with either of these statements:

import "Decl1.sadl".

import "http://sadl.imp/shapes".

It is also possible to specify a local alias, for use only in the importing model, on the import statement line. For example, if the model above has no alias we could import it with either of these statements:

import "Decl1.sadl" as shapes.

import "http://sadl.imp/shapes" as shapes.

This creates a local prefix for that model which can be used to unambiguously refer to concepts from the imported ontology, but only within the importing model. The difficulty with using local aliases occurs when there are indirect imports, e.g., model A imports model B with local alias "b", and model B imports model C with local alias "c". If model A uses a concept from model C, its prefix is ambiguous if only local aliases are used, and there can even be conflicts. For example, suppose model C imports model D with alias "b". Then "b" denotes model B within model A but model D within model C.
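
The indirect-import scenario above can be sketched in Python to show how a conflict could be detected. The table below encodes the local aliases from the example (A imports B as "b", B imports C as "c", C imports D as "b"); the function name and the detection strategy are assumptions for illustration, not a description of what the SADL-IDE does.

```python
# Local alias tables: importing model -> {local alias: imported model}.
IMPORTS = {
    "A": {"b": "B"},
    "B": {"c": "C"},
    "C": {"b": "D"},
    "D": {},
}

def alias_conflicts(root: str, imports: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Return aliases that name more than one model in root's import closure."""
    targets_by_alias: dict[str, set[str]] = {}
    visited: set[str] = set()
    stack = [root]
    while stack:
        model = stack.pop()
        if model in visited:
            continue  # guard against repeated or cyclic imports
        visited.add(model)
        for alias, target in imports.get(model, {}).items():
            targets_by_alias.setdefault(alias, set()).add(target)
            stack.append(target)
    return {alias: sorted(targets)
            for alias, targets in targets_by_alias.items()
            if len(targets) > 1}

print(alias_conflicts("A", IMPORTS))  # {'b': ['B', 'D']}
```

A global alias declared in the aliased model itself avoids this situation, because every importer then sees the same prefix for the same namespace.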

Note that a template is available if content assistance is requested (control-space). The import template will either show the model names (http scheme) of known models or the actual SADL file name (file scheme), depending upon the setting in the SADL preferences (Window -> Preferences -> Sadl).

Ontology Annotations

Just as concepts within a model can be given rdfs:label and rdfs:comment annotations, so can the model itself. As with model concepts, this is done with "alias" and "note" within parentheses. This use of "alias" within parentheses is different from the alias described above: it creates an rdfs:label, whereas the "alias" keyword outside parentheses creates an XML namespace prefix. Here is an example of a model URI with annotations:

uri "http://sadl.org/TestSadlIde/OntologyComments" alias Name
    version "$Revision: 1.3 $ Last modified on $Date: 2015/02/05 13:33:00 $"
    (alias "This is a label for the ontology")
    (note "This is a comment on the ontology", "This is a second comment")
    (note "This is a third comment").

Jena TDB, Default and Named Graphs

The approach described above works well when each SADL model file generates an OWL model file. The mapping between an actual OWL model file, identified by a URL (file or http scheme), and the model's URI is captured in the Jena policy file, ont-policy.rdf. However, models stored in "flat files" have limited scalability. Therefore one of the OWL model format options available in the SADL-IDE is "Jena TDB". Jena TDB is a triple store capable of efficiently handling much larger models. It supports named graphs and, as long as no inference is required, supports execution of SPARQL queries against the triple store without having to create an in-memory OWL model. All of this facilitates greater performance and scalability.

When Jena TDB is chosen as the OWL model format in the SADL-IDE, each model of the project is added to the TDB repository as a named graph. This applies both to SADL models with generated OWL and to OWL files that were not generated from SADL but are imported, directly or indirectly, by a SADL model. The TDB repository is created in a "TDB" subfolder of the OwlModels folder of the project. The Jena policy file is used to find and add models to the TDB repository. It is also used to find the rule files if a SADL file contains rules.

Jena TDB has the concept of a "default graph" as well as named graphs. The SADL-IDE using the OWL model format "Jena TDB" only creates named graphs. However, it is possible to configure TDB so that the union of all the named graphs is available to the query engine as the default graph (see http://jena.apache.org/documentation/tdb/datasets.html). When querying from the SADL environment, the matching of queries to the appropriate TDB graph is handled automatically.
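
The relationship between named graphs and a union default graph can be pictured with a toy model in plain Python. This is only a conceptual sketch: it does not use or imitate the Jena TDB API, and the second graph name and all of the triples are invented for the example.

```python
# A toy dataset: graph name (model URI) -> set of (subject, predicate, object).
named_graphs = {
    "http://sadl.imp/shapes": {
        ("shapes:Shape", "rdf:type", "owl:Class"),
    },
    "http://sadl.imp/circles": {  # hypothetical second model
        ("circles:Circle", "rdfs:subClassOf", "shapes:Shape"),
    },
}

def union_default_graph(graphs: dict[str, set[tuple[str, str, str]]]) -> set[tuple[str, str, str]]:
    """Default graph defined as the union of every named graph."""
    union: set[tuple[str, str, str]] = set()
    for triples in graphs.values():
        union |= triples
    return union

default_graph = union_default_graph(named_graphs)
print(len(default_graph))  # 2
```

A query against a single named graph sees only that model's triples, while a query against the union default graph sees the triples of every model in the repository.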

Cached Inferred Models

When a Jena Rules Engine-based reasoner is used in SADL, inference requires that the OWL model be loaded into memory before any reasoning/rule processing can occur. For large models and/or extensive reasoning, this can be quite time and computing resource intensive. In those situations in which the instance data is not dynamic, it can be a significant boost to performance to cache the resulting inferred model for reuse in querying. In those instances it is beneficial to capture the inferred model in a Jena TDB repository so that queries can be executed against the triple store without having to create an in-memory model. When an inferred model is cached in its own TDB repository, the model is stored as the default graph and the URI associated with the model is the namespace identified as the namespace of the [first] input data.

Caching of inferred models is only reasonable if there is some mechanism by which we can determine that the cached model is not made obsolete by new instance data. Several approaches come to mind.

Record the timestamp of the input data and compare this timestamp to that of the TDB repository to determine if the data is newer than the cached model. This is a viable option when the data is in a file whose timestamp is available to the reasoner plug-in for comparison. For example, a SADL test case would fall in this category as the test data is in an OWL file on the local file system so the date of the input data file can be compared with that of the cached inferred model.
Another approach is to save a hash of the serialized input data used to create the cached model. Whenever new data is received, a hash of it is also generated and compared to the saved hash to see if the instance data is different. If the hash values are the same, the cached inferred model may be used. Multiple input data sets could be handled by including the hash of the previous data set in each subsequent input data serialization before hashing. Note that for large amounts of input data, serializing and hashing could be expensive.
Leave clearing of the cache up to the client and use the cached inferred model until the client clears the cache.
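
The first two approaches above can be sketched in a few lines of Python. The function names are hypothetical, the choice of SHA-256 is an assumption (the text does not name a hash function), and the sketch ignores how the cached model itself is stored.

```python
import hashlib
import os

def is_stale_by_timestamp(input_path: str, cache_path: str) -> bool:
    """True if the input data was modified after the cached model was written."""
    return os.path.getmtime(input_path) > os.path.getmtime(cache_path)

def chained_hash(serialized: bytes, previous_hash: str = "") -> str:
    """Hash one serialized data set, folding in the previous data set's hash.

    Including the previous hash means the final value depends on every
    data set received so far and on their order.
    """
    digest = hashlib.sha256()  # assumption: any stable hash would do
    digest.update(previous_hash.encode("ascii"))
    digest.update(serialized)
    return digest.hexdigest()

# Hash recorded when the cached inferred model was created.
cached = chained_hash(b"second data set", chained_hash(b"first data set"))

# Later: the same data arrives again, so the cached model may be reused.
incoming = chained_hash(b"second data set", chained_hash(b"first data set"))
print(incoming == cached)  # True
```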

Using a TDB Default Graph as a Named Model in SADL

The previous section introduced the concept of a cached inferred model in a dedicated TDB repository default graph. But what if we wished to access a TDB repository default graph in a more persistent manner? How could we map from a URI or set of URIs to the TDB repository? The default graph can contain concepts from multiple namespaces. To be able to query this default graph, we must have some way to identify the repository (and its default graph) as the "target" of the query. The question probably only makes sense in the context of having multiple TDB repositories, each with a default graph, and wanting to execute a query against the "correct" repository. It seems that the solution is to have a mapping from one or more URIs to the TDB repository, which can be thought of as having an actual file-based URL. One approach might be to have the actual URL associated with one or more URIs in the Jena policy file be the folder containing the repository. Then the query mechanism could check, for a given query graph name or names, the policy file for the actual URL and if that URL is a TDB repository, execute the query against a Dataset opened on the repository.