RML Logical Views

Draft Community Group Report

Latest published version:
https://w3id.org/rml/lv/spec/
Latest editor's draft:
https://w3id.org/rml/lv/spec
Editors:
(Skemu)
(Ghent University – imec – IDLab)
Authors:
(Skemu)
(Ghent University – imec – IDLab)
(Free University of Bozen-Bolzano)
This Version
https://w3id.org/rml/lv/spec/20250114/
Previous Version
https://w3id.org/rml/lv/spec/20250114/
Website
https://github.com/kg-construct/rml-lv/

Abstract

RML logical views is an extension of the RDF Mapping Language (RML) that increases the language's capability to construct RDF datasets from nested input data, to join data sources (also across data hierarchies), and to handle data sources that mix source formats. It does so by allowing the specification of a logical view: a flattened, source format-agnostic view over one or more existing data sources. Additionally, it provides a mechanism to express relationships between data sources, as well as additional information about their fields, through structural annotations.

This document describes RML logical views through definitions and examples.

The version of this document is DRAFT.

Status of This Document

This specification was published by the Knowledge Graph Construction Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

This is an early draft, yet efforts are made to keep things stable.

1. Overview

This section is non-normative.

1.1 Document conventions

We assume readers have basic familiarity with RDF and RML concepts.

In this document, examples assume the following namespace prefix bindings unless otherwise stated:

Prefix Namespace
rml: http://w3id.org/rml/
xsd: http://www.w3.org/2001/XMLSchema#
: http://example.org/

The examples are contained in color-coded boxes. We use the Turtle syntax [Turtle] to write RDF.

1.2 Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MAY, MUST, and SHOULD in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Problem

This section is non-normative.

RML Logical Views aims to resolve challenges such as handling the hierarchy of nested data, joining data more flexibly (also across data hierarchies), and handling data sources that mix source formats.

2.1 Nested data structures

References to nested data structures, like JSON or XML, may return multiple values. These values can be composite: they may again contain multiple values. RML-Core defines mapping constructs that produce results by combining the results of other mapping constructs in a specific order. For example, a triples map combines the results of a subject map and a predicate-object map in that order. Another example is a template expression, which combines character strings and zero or more reference expressions in declared order. When mapping constructs produce multiple results, the combining mapping constructs apply an n-ary Cartesian product over the sets of results, maintaining the order of the mapping constructs. In the case of nested data structures, this may generate results that do not match the source hierarchy, i.e. that do not follow the root-to-leaf paths in the source data, since values are combined irrespective of that hierarchy.
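This example is non-normative. As a minimal sketch of the effect described above, the following Python snippet evaluates two references against a single iteration over the document root and combines the results with a Cartesian product. The input document and the names `people`, `name`, `items` and `type` are hypothetical, and the snippet does not reproduce the behaviour of any particular RML processor.

```python
from itertools import product

# Hypothetical nested input: each person owns their own items.
doc = {
    "people": [
        {"name": "alice", "items": [{"type": "sword"}, {"type": "shield"}]},
        {"name": "bob", "items": [{"type": "flower"}]},
    ]
}

# Evaluating two references independently against the document root
# yields two flat value lists; the nesting is no longer visible.
names = [p["name"] for p in doc["people"]]
types = [i["type"] for p in doc["people"] for i in p["items"]]

# Combine the two result lists with a Cartesian product, in declared order.
combined = list(product(names, types))

# Pairs that follow the root-to-leaf paths of the source data.
hierarchical = [(p["name"], i["type"]) for p in doc["people"] for i in p["items"]]

# Pairs produced by the product that do not exist in the source hierarchy.
spurious = [pair for pair in combined if pair not in hierarchical]
```

The product yields six pairs, of which only three follow the source hierarchy; the remaining pairs, such as bob paired with sword, combine values from different branches.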

Furthermore, there is varying expressiveness in data source expression and query languages, and many languages have limited support for hierarchy traversal. For example, JSONPath has no operator to refer to an ancestor in the document hierarchy.

This limits the ability of RML-Core to map nested data.

2.2 Mixed data formats

Data in one format can contain multiple or composite values stored in another format, e.g. a CSV dataset could contain columns containing JSON values. To define the expected form of references to input data, RML-Core employs the notion of a reference formulation, which is a property of every logical source. However, a logical source is currently limited to a single reference formulation, meaning that mixed-format data can only be referenced using a query language that supports just one of the formats.
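This example is non-normative. The following Python sketch shows why a single reference formulation falls short for mixed-format data, assuming a hypothetical CSV dataset whose `items` column holds a JSON array. The column names and values are invented for illustration.

```python
import csv
import io
import json

# Hypothetical CSV data whose "items" column holds a JSON array.
raw = 'name,items\nalice,"[{""type"": ""sword""}, {""type"": ""shield""}]"\n'

rows = list(csv.DictReader(io.StringIO(raw)))

# A CSV reference alone returns the cell as one opaque string.
cell = rows[0]["items"]

# Reaching the values inside the cell requires a parser for a second format,
# which a logical source with one reference formulation cannot express.
types = [item["type"] for item in json.loads(cell)]
```

Here the CSV reader plays the role of the first reference formulation and the JSON parser that of the second.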

2.3 Joining of data sources

RML-Core restricts join operations to referencing object maps. Since a referencing object map can only generate an object that is an IRI or blank node subject as specified by a parent triples map, it is not possible to combine data from two sources in one term, to use data from a join in a position other than the object, or to generate a literal using data from a join. Moreover, RML-Core cannot declare join operations correctly across hierarchies.

3. Records

A record is created using an iterator or an expression. Depending on the source type, records might take different forms: for tabular data sources, a record might be a row or a cell; for tree-structured sources like XML, a record might be a node; for document-structured sources like JSON, a record might be a document or property value.

A record MUST have a string representation. It MAY be possible to derive other records from a record using an expression.

For a given record, the evaluation of an expression against it MUST either result in an ordered sequence of records, called the expression values, or throw an error. An expression MUST be valid for the given reference formulation.

3.1 Extending the logical source

A logical iteration MUST have a string representation.

3.2 Record sequences

A record sequence is an ordered sequence of sets of key-value pairs, where each key is a string and each value a record. A record sequence MUST have a finite set of keys that appear in each set in the sequence. In any particular set in a record sequence, the value of a key MAY be a null value.
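This example is non-normative. As a minimal sketch, a record sequence can be modelled in Python as a list of dictionaries that all share the same keys, with `None` standing in for a null value. The function name `is_record_sequence` and the sample keys are hypothetical.

```python
from typing import Any, Dict, List, Optional

# A record is treated as opaque here; it only needs a string representation.
Entry = Dict[str, Optional[Any]]


def is_record_sequence(entries: List[Entry]) -> bool:
    """Check that every entry exposes exactly the same set of keys."""
    if not entries:
        return True
    keys = set(entries[0])
    return all(set(entry) == keys for entry in entries)


# Every entry has the keys "name" and "age"; a value may be null.
sequence = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": None},
]

# Not a record sequence: the entries do not share one set of keys.
ragged = [{"name": "alice"}, {"age": 30}]
```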

An iterator defines a record sequence from the iterator's logical source, called the iterator record sequence. This record sequence has two keys:

4. Logical views

A logical view is a type of abstract logical source that is derived from another abstract logical source by defining fields with data from said abstract logical source.

A logical view (rml:LogicalView) is represented by a resource that MUST contain:

A logical view (rml:LogicalView) has an implicit default reference formulation (rml:referenceFormulation) and logical iterator (rml:iterator), which MUST NOT be overwritten.

Property Domain Range
rml:viewOn rml:LogicalView rml:LogicalSource
rml:field rml:LogicalView or rml:Field rml:Field
rml:leftJoin rml:LogicalView rml:LogicalViewJoin
rml:innerJoin rml:LogicalView rml:LogicalViewJoin

5. Fields

A field gives a name to data derived from the abstract logical source on which the logical view is defined.

A field (rml:Field) is represented by a resource that MUST contain:

Property Domain Range
rml:fieldName rml:Field xsd:string
rml:field rml:LogicalView or rml:Field rml:Field

There are two types of fields: an expression field and an iterable field.

An expression field (rml:ExpressionField) is a type of expression map. Consequently, an expression field MUST have an expression.

An iterable field (rml:IterableField) is a type of iterable. Consequently, an iterable field MUST have a reference formulation and a logical iterator. If no reference formulation is declared for a field, the reference formulation of the field's parent is implied.

5.1 Field parents

A field MUST have a parent that is either an abstract logical source or another field. The parent relation MUST NOT contain cycles: it is tree-shaped with a logical view as its root. The transitive parents of a field, i.e. the field's parent, the parent of the field's parent, and so on, are called the field's ancestors.
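This example is non-normative. The following Python sketch computes the ancestors of a field from a parent relation and rejects cyclic relations. The function name `ancestors` and the dictionary-based encoding of the parent relation are assumptions made for illustration only.

```python
from typing import Dict, List


def ancestors(field: str, parent_of: Dict[str, str]) -> List[str]:
    """Return the transitive parents of `field`, nearest first.

    `parent_of` maps each field to its parent; the root (the logical view)
    has no entry. Raises ValueError if the parent relation contains a cycle.
    """
    chain: List[str] = []
    seen = {field}
    current = field
    while current in parent_of:
        current = parent_of[current]
        if current in seen:
            raise ValueError(f"cycle in parent relation at {current!r}")
        seen.add(current)
        chain.append(current)
    return chain


# Hypothetical view with an iterable field "item" and two nested fields.
parents = {"item": "view", "type": "item", "weight": "item"}
```

With this relation, `ancestors("type", parents)` returns the field `item` followed by the root `view`.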

5.2 Field names

A field MUST have a declared name that is an alphanumerical string. Fields with the same parent MUST have different declared names. If a field's parent is another field, we distinguish between the field's declared name and the field's name. A field's name is the concatenation of the name of the parent field, a dot ., and the field's declared name.
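This example is non-normative. The naming rule can be sketched in Python as follows. The function name `field_name` is hypothetical, and the use of `str.isalnum` as the test for an alphanumerical string is an assumption, since it also accepts non-ASCII letters and digits.

```python
from typing import Optional


def field_name(declared_name: str, parent_field_name: Optional[str] = None) -> str:
    """Derive a field's name from its declared name and its parent field's name."""
    # Assumption: "alphanumerical" is approximated by str.isalnum().
    if not declared_name.isalnum():
        raise ValueError(f"declared name is not alphanumerical: {declared_name!r}")
    if parent_field_name is None:
        # The parent is not a field, so the name is the declared name.
        return declared_name
    return f"{parent_field_name}.{declared_name}"


item = field_name("item")
item_type = field_name("type", item)
```

A field declared as `type` under a field named `item` therefore has the name `item.type`.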

5.3 Field record sequences and records

A field defines a record sequence, called the field record sequence, that is obtained by consecutively applying the field's expression (in case of an expression field) or the field's reference formulation and a logical iterator (in case of an iterable field) on the parent records, the parent records being the records in the record sequence defined by the field's parent. For a given field, the field record sequence has these keys and corresponding values:

5.4 Field reference formulations

For the evaluation of the expression of an expression field (rml:ExpressionField) on the records of the field's parent, the parent's reference formulation is used. Consequently, the parent of an expression field MUST be an iterable, i.e. an abstract logical source or iterable field.

The default reference formulation of an iterable field (rml:IterableField) is the reference formulation of the field's parent. If the iterable field's parent is an expression field, a reference formulation MUST be declared for the iterable field. Declaring a new reference formulation, i.e. a reference formulation that is different from the reference formulation of the field's parent, is only allowed when the field's parent is an expression field.
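This example is non-normative. The rules for iterable fields stated above can be sketched as a small Python function. The function name, its parameters, and the string labels used for reference formulations are hypothetical.

```python
from typing import Optional


def effective_formulation(
    declared: Optional[str],
    parent_formulation: str,
    parent_is_expression_field: bool,
) -> str:
    """Resolve the reference formulation of an iterable field."""
    if parent_is_expression_field:
        # Under an expression field, a reference formulation must be declared.
        if declared is None:
            raise ValueError("a reference formulation must be declared")
        return declared
    # Otherwise a declared formulation may only repeat the parent's one.
    if declared is not None and declared != parent_formulation:
        raise ValueError("a new reference formulation is not allowed here")
    return parent_formulation


# An iterable field directly under a CSV source inherits CSV.
inherited = effective_formulation(None, "CSV", False)
# An iterable field under an expression field may switch to JSONPath.
switched = effective_formulation("JSONPath", "CSV", True)
```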

Note some columns in the table below have been shortened for brevity.

5.5 Using field names in triples maps

A field reference is a reference expression that references a defined field. A field reference MUST be either the defined field name of an expression field, which yields the records in the field record sequence, or the defined field name of a field followed by the string .#, which yields the index key of the position of the current entry in the field record sequence. A field reference is a special type of reference expression for which no reference formulation need be defined.

A field reference can be used in expression maps just as any other reference expression.
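This example is non-normative. The following Python sketch illustrates how field references could be resolved, assuming that each entry of a field record sequence is a dictionary holding the record under the field name and its position under the field name followed by `.#`. The 0-based index, the function names, and the sample data are assumptions.

```python
from typing import Any, Callable, Dict, Iterable, List

Entry = Dict[str, Any]


def field_entries(
    name: str, expression: Callable[[Any], Iterable[Any]], parent_record: Any
) -> List[Entry]:
    """Apply a field's expression to one parent record.

    Each resulting record is stored under the field name, and its position
    under the field name followed by ".#" (0-based here, an assumption).
    """
    return [
        {name: value, f"{name}.#": index}
        for index, value in enumerate(expression(parent_record))
    ]


def resolve(reference: str, entry: Entry) -> Any:
    """Look up a field reference such as "item" or "item.#" in an entry."""
    if reference not in entry:
        raise KeyError(f"undefined field reference: {reference!r}")
    return entry[reference]


person = {"name": "alice", "items": ["sword", "shield"]}
entries = field_entries("item", lambda record: record["items"], person)
pairs = [(resolve("item.#", e), resolve("item", e)) for e in entries]
```

In this sketch no reference formulation is involved in `resolve`: a field reference is a plain lookup by name.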

6. Logical view joins

A logical view join (rml:LogicalViewJoin) is an operation that extends the logical iteration of one logical view (the child logical view) with fields derived from another logical view (the parent logical view).

A logical view join (rml:LogicalViewJoin) MUST contain:

The logical view in the subject position of the join property fulfills the role of child logical source in the join condition(s) of the logical view join, and is referred to as the child logical view.

Property Domain Range
rml:parentLogicalView rml:LogicalViewJoin rml:LogicalView
rml:joinCondition rml:LogicalViewJoin rml:Join
rml:field rml:LogicalViewJoin rml:ExpressionField

6.1 Join types

The join property specifies the join type of the logical view join, i.e. a left join or an inner join.

A left join (rml:leftJoin) is the equivalent of a left (outer) join in SQL, where the child logical view is the left part of the join, and the parent logical view is the right part of the join. If any of the join conditions evaluates to false, the fields from the logical view join in the extended logical iteration contain a null value.

An inner join (rml:innerJoin) is the equivalent of an inner join in SQL. If any of the join conditions evaluates to false, the logical iteration is removed from the child logical view.
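This example is non-normative. The difference between the two join types can be sketched in Python as follows, modelling logical iterations as dictionaries and join conditions as pairs of field names compared for equality. The function `join_views`, its parameters, and the sample views are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Iteration = Dict[str, Any]


def join_views(
    child: List[Iteration],
    parent: List[Iteration],
    conditions: List[Tuple[str, str]],
    fields: Dict[str, str],
    inner: bool = False,
) -> List[Iteration]:
    """Extend child iterations with fields taken from matching parent iterations.

    `conditions` holds (child field, parent field) pairs that must all be equal.
    `fields` maps each new field name to the parent field it is derived from.
    """
    result: List[Iteration] = []
    for c in child:
        matches = [p for p in parent if all(c[cf] == p[pf] for cf, pf in conditions)]
        for p in matches:
            result.append({**c, **{new: p[src] for new, src in fields.items()}})
        if not matches and not inner:
            # Left join: keep the child iteration, fill joined fields with null.
            result.append({**c, **{new: None for new in fields}})
    return result


people = [{"name": "alice"}, {"name": "bob"}, {"name": "tobias"}]
ages = [{"name": "alice", "age": 30}, {"name": "bob", "age": 25}]

left = join_views(people, ages, [("name", "name")], {"age": "age"})
inner = join_views(people, ages, [("name", "name")], {"age": "age"}, inner=True)
```

The left join keeps the iteration for `tobias` with a null `age`, whereas the inner join removes it.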

6.2 Logical view join examples

6.2.1 Left join

6.2.2 Inner join

6.2.3 Two left joins

7. Structural Annotations

Structural annotations provide a mechanism to express relationships between logical views, as well as additional information about fields.

Each logical view MAY have zero or more structural annotation properties (rml:structuralAnnotation), connecting the logical view to a structural annotation object (i.e., of type rml:StructuralAnnotation).

Property Domain Range
rml:structuralAnnotation rml:LogicalView rml:StructuralAnnotation

The following structural annotations MAY be defined:

All structural annotations of a logical view lv MUST have an on fields property (rml:onFields), linking the structural annotation to a list of field names occurring in lv. Intuitively, the on fields property specifies the fields in lv that are involved in the structural annotation. The semantics of this involvement depends on the specific annotation.

Property Domain Range
rml:onFields rml:StructuralAnnotation rdf:List

7.1 Invariance Principle

Structural annotations provide additional information about the data that might be used by the RML processor to optimize the KG construction process. If this additional information is incorrect, then the RML processor might either fail or produce wrong results. When using structural annotations, users should make sure that the following invariance principle is satisfied:

For any source instances, the RDF graph produced by the RML engine using an RML mapping with annotations and the RDF graph produced using the same RML mapping with the annotations removed MUST be the same.

We emphasize that RML engines might exploit structural annotations, but they could also ignore them entirely. It is the responsibility of the user to make sure that the annotations provided are indeed correct (that is, that the data complies with the annotations). Sanity checks MAY be provided by the RML engines themselves, but this is not mandatory. Note that providing wrong annotations to an engine that takes annotations into account, for instance to apply optimizations, could result in a violation of the invariance principle, with unpredictable results.

7.2 IriSafe

An IriSafe structural annotation (rml:IriSafeAnnotation) on fields F indicates that the content of each field in F is IRI safe, that is, each field in F contains only characters that are in the iunreserved production of RFC 3987.
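This example is non-normative. As a sketch of a sanity check an engine or user could run, the following Python function tests whether a string consists only of characters from the iunreserved production of RFC 3987 (ALPHA, DIGIT, "-", ".", "_", "~" and ucschar). The function name `is_iri_safe` is hypothetical.

```python
import string

# ALPHA / DIGIT / "-" / "." / "_" / "~"
_ASCII_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

# ucschar ranges from RFC 3987.
_UCSCHAR_RANGES = (
    [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
    # Planes 1 to 13: %x10000-1FFFD up to %xD0000-DFFFD.
    + [(plane << 16, (plane << 16) + 0xFFFD) for plane in range(1, 14)]
    + [(0xE1000, 0xEFFFD)]
)


def is_iri_safe(value: str) -> bool:
    """Return True if every character of `value` matches iunreserved."""
    return all(
        ch in _ASCII_UNRESERVED
        or any(lo <= ord(ch) <= hi for lo, hi in _UCSCHAR_RANGES)
        for ch in value
    )
```

For instance, `is_iri_safe("alice-01")` holds, while a value containing a space or a slash is not IRI safe.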

7.3 PrimaryKey

A PrimaryKey structural annotation (rml:PrimaryKeyAnnotation) on fields (f1, ..., fn) imposes two conditions:

Each logical view MAY specify AT MOST ONE PrimaryKey annotation.

7.4 Unique

The Unique structural annotation (rml:UniqueAnnotation) is analogous to the notion of UNIQUE constraints in databases. Specifically, a Unique annotation on fields (f1, ..., fn) imposes the following condition:

Note that every PrimaryKey annotation is, as a matter of fact, also a Unique annotation.
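This example is non-normative. The following Python sketch checks a Unique annotation against the logical iterations of a view, modelled as dictionaries. It assumes the behaviour of UNIQUE constraints in SQL, where combinations containing a null value are not compared; that treatment of nulls, the function name, and the sample data are assumptions.

```python
from typing import Any, Dict, List, Sequence, Tuple


def unique_violations(
    iterations: List[Dict[str, Any]], fields: Sequence[str]
) -> List[Tuple[int, int]]:
    """Return index pairs of iterations that share the same values on `fields`."""
    first_seen: Dict[Tuple[Any, ...], int] = {}
    violations: List[Tuple[int, int]] = []
    for index, iteration in enumerate(iterations):
        key = tuple(iteration[f] for f in fields)
        if None in key:
            # Assumption: combinations containing null are skipped, as in SQL.
            continue
        if key in first_seen:
            violations.append((first_seen[key], index))
        else:
            first_seen[key] = index
    return violations


data = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "alice"},
    {"id": 1, "name": "bob"},
]
```

On this data the field list `("id",)` is violated by the first and third iterations, whereas `("id", "name")` is not violated.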

7.5 NotNull

The NotNull structural annotation (rml:NotNullAnnotation) is analogous to the notion of NOT NULL constraints in databases. Specifically, a NotNull annotation on fields F imposes that each field in F does not contain NULL values.
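This example is non-normative. A NotNull annotation can be checked with a sketch such as the following, in which `None` stands in for a null value. The function name `null_violations` and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple


def null_violations(
    iterations: List[Dict[str, Any]], fields: Sequence[str]
) -> List[Tuple[int, str]]:
    """Return (iteration index, field name) pairs where a field is null."""
    return [
        (index, field)
        for index, iteration in enumerate(iterations)
        for field in fields
        if iteration.get(field) is None
    ]


data = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": None},
]
```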

7.6 ForeignKey

The ForeignKey structural annotation (rml:ForeignKeyAnnotation) is analogous to the notion of a foreign key constraint in databases. Specifically, a ForeignKey annotation on fields (f1, ..., fn), target view lv, and target fields (tf1, ..., tfn) imposes the following conditions:

The target view is a logical view specified through the property rml:targetView, whereas the target fields are an RDF list of field names specified through the property rml:targetFields. These two properties are specified as follows:

Property Domain Range
rml:targetView rml:InclusionAnnotation rml:LogicalView
rml:targetFields rml:InclusionAnnotation rdf:List

Therefore, each ForeignKey annotation MUST specify (in addition to the inherited rml:onFields property):

7.7 Inclusion

The Inclusion structural annotation (rml:InclusionAnnotation) is analogous to the notion of an inclusion dependency in databases. Specifically, an Inclusion annotation on fields (f1, ..., fn), target view lv, and target fields (tf1, ..., tfn) imposes the following condition:

As with the ForeignKey annotation, the target view MUST be a logical view specified through the property rml:targetView, whereas the target fields MUST be an RDF list of field names specified through the property rml:targetFields.
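This example is non-normative. Assuming the usual database meaning of an inclusion dependency, the following Python sketch reports the iterations of a view whose values on the annotated fields do not occur on the target fields of the target view. Skipping combinations that contain a null value is an assumption borrowed from SQL, and the function name and sample views are hypothetical.

```python
from typing import Any, Dict, List, Sequence


def inclusion_violations(
    source: List[Dict[str, Any]],
    on_fields: Sequence[str],
    target: List[Dict[str, Any]],
    target_fields: Sequence[str],
) -> List[int]:
    """Return indexes of source iterations with no counterpart in the target view."""
    if len(on_fields) != len(target_fields):
        raise ValueError("field lists must have the same length")
    available = {tuple(t[f] for f in target_fields) for t in target}
    violations: List[int] = []
    for index, iteration in enumerate(source):
        key = tuple(iteration[f] for f in on_fields)
        if None in key:
            # Assumption: combinations containing null are not checked.
            continue
        if key not in available:
            violations.append(index)
    return violations


orders = [{"person": "alice"}, {"person": "carol"}, {"person": None}]
people = [{"name": "alice"}, {"name": "bob"}]
missing = inclusion_violations(orders, ["person"], people, ["name"])
```

Only the second iteration of `orders` is reported, because `carol` does not occur in the `name` field of `people`.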

Therefore, each Inclusion annotation MUST specify (in addition to the inherited rml:onFields property):

A. References

A.1 Normative references

[RFC2119]
Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc2119
[RFC8174]
Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words. B. Leiba. IETF. May 2017. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc8174
[RML-Core]
RML-Core. W3C. 16 February 2024. Draft Community Group Report. URL: https://w3id.org/rml/core/spec
[Turtle]
RDF 1.1 Turtle. Eric Prud'hommeaux; Gavin Carothers. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/turtle/