[DL] Announcing the W3C SHACL language for constraints on RDF data
Peter F. Patel-Schneider
pfpschneider at gmail.com
Thu Oct 8 17:20:35 CEST 2015
The W3C RDF Data Shapes Working Group has just published its first technical
document. This document
http://www.w3.org/TR/2015/WD-shacl-20151008/
defines the form and meaning of the SHACL language.
SHACL is a language for defining constraints on RDF data and for determining
whether the data in an RDF graph satisfies those constraints. SHACL groups
constraints into shapes, which also control where the constraints apply. For
example, SHACL
can be used to check whether all the nodes in an RDF graph that have
rdf:type foaf:Person have a foaf:name that is a string and that all of their
values for foaf:parent have rdf:type foaf:Person.
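To make the foaf:Person example concrete, here is a minimal sketch in plain Python. It is not SHACL syntax and not how a SHACL processor works. The tuple-based graph, the Literal wrapper, and the check_persons helper are all invented for illustration, and it assumes that "a foaf:name that is a string" means at least one string value.

```python
from typing import NamedTuple

class Literal(NamedTuple):
    """Hypothetical marker that tells literal values apart from IRIs."""
    value: object

RDF_TYPE = "rdf:type"
PERSON = "foaf:Person"

def objects(graph, s, p):
    """All objects of triples with the given subject and predicate."""
    return [o for (s2, p2, o) in graph if s2 == s and p2 == p]

def check_persons(graph):
    """Return (node, problem) pairs for every node typed foaf:Person."""
    problems = []
    persons = sorted(s for (s, p, o) in graph if p == RDF_TYPE and o == PERSON)
    for node in persons:
        names = objects(graph, node, "foaf:name")
        # Assumption: one string-valued name is enough.
        if not any(isinstance(n, Literal) and isinstance(n.value, str) for n in names):
            problems.append((node, "no string foaf:name"))
        for parent in objects(graph, node, "foaf:parent"):
            if PERSON not in objects(graph, parent, RDF_TYPE):
                problems.append((node, f"parent {parent} is not a foaf:Person"))
    return problems

graph = {
    ("ex:alice", RDF_TYPE, PERSON),
    ("ex:alice", "foaf:name", Literal("Alice")),
    ("ex:alice", "foaf:parent", "ex:bob"),
    ("ex:bob", "foaf:name", Literal("Bob")),  # ex:bob has no rdf:type
}
print(check_persons(graph))
```

Here ex:alice is reported because her parent ex:bob is not typed foaf:Person, while ex:bob himself is never checked because he is not typed foaf:Person either.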
The document is at the First Public Working Draft stage and is thus in no
way a finished product. This is the time in the W3C process when comments
from outside the working group have the best chance of producing large
changes to SHACL, so if you care about constraints over RDF data you should
read the document and send in comments to the working group.
Here are some interesting aspects of SHACL.
Dual purpose of SHACL shapes
A SHACL shape bundles up one or more constraints plus some control
information. For example, the control information on an ex:Human shape
might say that all nodes that have an rdf:type value that is connected to
foaf:Person via a chain of rdfs:subClassOf links are to be validated against
the shape. The ex:Human shape might have several constraints, for example,
that these nodes have to have a value for foaf:name that is a string and
that all of their values for foaf:parent have rdf:type foaf:Person.
SHACL shapes are also used in SHACL constraints. For example, a constraint
might say that all values for foaf:friend have to match the ex:Human shape.
These uses of shapes ignore the control information, so there is no
requirement that the values being checked here have to have a type related
to foaf:Person.
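The two roles of a shape can be sketched as follows. This is only an illustration in plain Python, not SHACL itself: in_scope stands for the control information of the ex:Human shape, has_name is a stand-in for its constraints, and friends_match_human stands for a constraint that refers to the shape. All function names are hypothetical, and treating a direct rdf:type of foaf:Person as a chain of length zero is an assumption.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def superclasses(graph, cls):
    """cls plus everything reachable from it via rdfs:subClassOf links."""
    seen, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for (s, p, o) in graph:
            if s == current and p == SUBCLASS and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen

def in_scope(graph, node, scope_class):
    """Control information: is node typed by scope_class or by a subclass?"""
    return any(
        scope_class in superclasses(graph, o)
        for (s, p, o) in graph
        if s == node and p == RDF_TYPE
    )

def has_name(graph, node):
    """Stand-in for the constraints of the hypothetical ex:Human shape."""
    return any(s == node and p == "foaf:name" for (s, p, o) in graph)

def friends_match_human(graph, node):
    """Shape used inside a constraint: friends are checked, scope is ignored."""
    return all(has_name(graph, o) for (s, p, o) in graph
               if s == node and p == "foaf:friend")

graph = {
    ("ex:Student", SUBCLASS, "foaf:Person"),
    ("ex:ann", RDF_TYPE, "ex:Student"),
    ("ex:ann", "foaf:name", "Ann"),
    ("ex:ann", "foaf:friend", "ex:rex"),
    ("ex:rex", "foaf:name", "Rex"),  # no rdf:type at all
}
print(in_scope(graph, "ex:ann", "foaf:Person"))  # selected for validation
print(in_scope(graph, "ex:rex", "foaf:Person"))  # not selected
print(friends_match_human(graph, "ex:ann"))      # ex:rex is still checked
```

In this sketch ex:rex is never selected for validation on his own, yet he is still checked against the shape's constraints when he appears as a friend of ex:ann.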
Relationship between SHACL and RDFS
SHACL uses various RDF and RDFS vocabulary and notions. However, SHACL
does not abide by the RDF or RDFS semantics of this vocabulary and these
notions. For example, SHACL does not perform any inferences based on
rdfs:subPropertyOf, rdfs:domain, or rdfs:range. Sometimes, SHACL does not
even perform rdfs:subClassOf inferences.
SHACL also uses rdfs:subClassOf for inheritance between templates, which are
not classes.
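The difference between asserted data and RDFS inference can be illustrated with rdfs:domain. The sketch below is plain Python over tuples, with hypothetical helper names. It only contrasts what is stated in a graph with what an RDFS reasoner would add, and makes no claim about how a SHACL implementation is structured.

```python
RDF_TYPE = "rdf:type"

graph = {
    ("foaf:knows", "rdfs:domain", "foaf:Person"),
    ("ex:ann", "foaf:knows", "ex:bob"),
}

def asserted_types(graph, node):
    """Types stated directly in the graph; no inference is applied."""
    return {o for (s, p, o) in graph if s == node and p == RDF_TYPE}

def rdfs_domain_types(graph, node):
    """Types that rdfs:domain entailment would add for the node."""
    return {
        d
        for (prop, p, d) in graph if p == "rdfs:domain"
        for (s, p2, o) in graph if s == node and p2 == prop
    }

print(asserted_types(graph, "ex:ann"))     # empty: nothing is asserted
print(rdfs_domain_types(graph, "ex:ann"))  # what RDFS would infer
```

Under RDFS semantics ex:ann would count as a foaf:Person. A check that looks only at asserted types, as described above for SHACL, would not treat her as one.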
Recursion in SHACL
SHACL permits shapes to refer back to themselves. For example, in SHACL one
can say that the nodes that match an ex:FriendlyHuman shape have to have all
their friend values also match the ex:FriendlyHuman shape.
There are some issues related to recursive shapes that the working group has
not finalized. If Joe is a friend of Jill and Jill is a friend of Joe, does
Joe match the human shape? If humans have to have all of their friends not
match the human shape, does Joe match the human shape?
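One way to see why these questions are open is to enumerate every assignment of "matches" and "does not match" that is consistent with the recursive rule. The brute-force sketch below is purely illustrative. It is not a semantics proposed by the working group, and the function name and the friends dictionary are made up.

```python
from itertools import combinations

friends = {"ex:Joe": ["ex:Jill"], "ex:Jill": ["ex:Joe"]}

def consistent_assignments(friends, negated):
    """Enumerate every set of matching nodes that agrees with the rule.

    negated=False: a node matches iff all of its friends match.
    negated=True:  a node matches iff none of its friends match.
    """
    nodes = sorted(friends)
    found = []
    for size in range(len(nodes) + 1):
        for chosen in combinations(nodes, size):
            matching = set(chosen)

            def rule(node):
                hits = [f in matching for f in friends[node]]
                return not any(hits) if negated else all(hits)

            # Keep the assignment only if every node agrees with the rule.
            if all((n in matching) == rule(n) for n in nodes):
                found.append(matching)
    return found

print(consistent_assignments(friends, negated=False))
print(consistent_assignments(friends, negated=True))
```

For the first question there are two consistent answers: nobody matches, or both Joe and Jill match. For the negated version there are also two: only Jill matches, or only Joe matches. In neither case does the rule alone determine whether Joe matches.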
Reporting results in SHACL
The main operations in SHACL validate whether certain nodes in an RDF graph
(called focus nodes) satisfy the constraints in SHACL shapes. These
operations produce validation results which document how focus nodes do
not match the constraints. Validation results have a severity level, either
information, warning, or violation. Violation-level results are considered
failures to satisfy the constraint; the other levels are not. These results
are encoded into an RDF graph that is available for perusal when the
operation finishes.
Not matching a constraint that is embedded in another constraint may or may
not contribute to a validation result from the embedding constraint. For
example, a negation constraint produces a violation-level result precisely
when the embedded constraint matches.
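The reporting behaviour described above can be sketched in a few lines of Python. The Result class, the min_count_name constraint, and the negation wrapper are hypothetical names, not SHACL vocabulary. The sketch also assumes that an embedded constraint "matches" when it produces no violation-level results.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Result:
    focus_node: str
    severity: str  # "information", "warning", or "violation"
    message: str

def conforms(results):
    """Only violation-level results count as failures."""
    return not any(r.severity == "violation" for r in results)

def min_count_name(graph, node, severity="violation"):
    """Hypothetical constraint: the node needs at least one foaf:name."""
    if any(s == node and p == "foaf:name" for (s, p, o) in graph):
        return []
    return [Result(node, severity, "missing foaf:name")]

def negation(constraint):
    """Report a violation exactly when the embedded constraint matches."""
    def negated(graph, node):
        if conforms(constraint(graph, node)):
            return [Result(node, "violation", "embedded constraint matched")]
        return []  # the embedded failure is not passed on
    return negated

graph = {("ex:ann", "foaf:name", "Ann")}
no_name = negation(min_count_name)
print(min_count_name(graph, "ex:rex"))  # one violation: ex:rex has no name
print(no_name(graph, "ex:rex"))         # no results: the failure is absorbed
print(no_name(graph, "ex:ann"))         # one violation: ex:ann has a name
```

Note how the failure of the embedded constraint for ex:rex does not surface as a result of the negation, while a warning-level result on its own would not count as a failure at all.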
Relationship between SHACL and SPARQL
Much of the meaning of SHACL constructs is specified by a translation to
SPARQL. However, the meaning of SHACL recursion cannot be fully specified
in SPARQL. Extending a SPARQL implementation to handle recursion in SHACL
may be difficult. If recursion were not allowed, the entire meaning of SHACL
could be specified via a simple translation to SPARQL.
The translations from SHACL to SPARQL used in the document depend on other
extensions of SPARQL, such as maintenance of the identity of blank nodes and
pre-binding of certain variables. If recursion were removed from SHACL,
these extensions could be avoided by using a different translation.
Using SPARQL and other languages in SHACL
SHACL shapes can directly use SPARQL SELECT queries. These queries
implement constraints by building up SPARQL results that are turned into
SHACL validation results. The way that this works is completely specified.
It requires that an RDF encoding of the shapes be available during
validation and that particular SPARQL extensions be present.
Other languages can also be directly used in SHACL shapes. The way that
this works is not specified.