RML Logical Views

Abstract

RML logical views is an extension of the RDF Mapping Language (RML) that increases the language's capability to construct RDF datasets from nested input data, to join data sources (also across data hierarchies), and to handle data sources that mix source formats. It does so by allowing the specification of a logical view: a flattened, source format-agnostic view over one or more existing data sources. Additionally, it provides a mechanism to express relationships between data sources, as well as additional information about their fields, through structural annotations.

This document describes RML logical views through definitions and examples.

The version of this document is DRAFT.

Prefix	Namespace
`rml:`	http://w3id.org/rml/
`xsd:`	http://www.w3.org/2001/XMLSchema#
`:`	http://example.org/

This section is non-normative.

RML Logical Views aims to resolve challenges such as handling hierarchy of nested data, more flexible joining (also across data hierarchies), and handling data sources that mix source formats.

References to nested data structures, like JSON or XML, may return multiple values. These values can be composite: they may again contain multiple values. RML-Core defines mapping constructs that produce results by combining the results of other mapping constructs in a specific order. For example, a triples map combines the results of a subject map and a predicate-object map in that order. Another example is a template expression, which combines character strings and zero or more reference expressions in declared order. When mapping constructs produce multiple results, the combining mapping constructs will apply an n-ary Cartesian product over the sets of results, maintaining the order of the mapping constructs. In the case of nested data structures, this may cause the generation of results that do not match the source hierarchy, i.e. do not follow the root-to-leaf paths in the source data, since values are combined irrespective of it.

Furthermore, there is varying expressiveness in data source expression and query languages, and many languages have limited support for hierarchy traversal. For example, JSONPath has no operator to refer to an ancestor in the document hierarchy.

This limits the ability of RML-Core to map nested data.
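The effect can be sketched in Python. This is an illustration only: the record mirrors the nested JSON data used in this document's examples, and the helper name `combine` is an assumption, not part of RML-Core. Evaluating two references independently and combining their results with a Cartesian product yields pairs that never occur together in the source.

```python
from itertools import product

# A record shaped like the nested JSON data used in this document's examples.
alice = {
    "name": "alice",
    "items": [
        {"type": "sword", "weight": 1500},
        {"type": "shield", "weight": 2500},
    ],
}

# Results of two independent references against the same record.
types = [item["type"] for item in alice["items"]]      # $.items[*].type
weights = [item["weight"] for item in alice["items"]]  # $.items[*].weight


def combine(*result_sets):
    """Combine result sets with an n-ary Cartesian product, keeping their order."""
    return list(product(*result_sets))


pairs = combine(types, weights)

# Pairs that follow a root-to-leaf path in the source data.
in_source = [(item["type"], item["weight"]) for item in alice["items"]]

# Pairs produced by the product that do not exist in the source hierarchy.
spurious = [pair for pair in pairs if pair not in in_source]
print(spurious)  # [('sword', 2500), ('shield', 1500)]
```

Two of the four combinations attach a weight to the wrong item, which is the mismatch with the source hierarchy described above.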

Example 1

It is not possible to declare the construction of the output triples below from the data source below with RML-Core.

{
  "people": [
    {
      "name": "alice",
      "items": [
        {
          "type": "sword",
          "weight": 1500
        },
        {
          "type": "shield",
          "weight": 2500
        }
      ]
    },
    {
      "name": "bob",
      "items": [
        {
          "type": "flower",
          "weight": 15
        }
      ]
    }
  ]
}

:person/alice/item/sword :hasName "sword" ;
  :hasWeight 1500 .
:person/alice/item/shield :hasName "shield" ;
  :hasWeight 2500 .
:person/bob/item/flower :hasName "flower" ;
  :hasWeight 15 .

Data in one format can contain multiple or composite values stored in another format, e.g. a CSV dataset could contain columns containing JSON values. To define the expected form of references to input data RML-Core employs the notion of a reference formulation that is a property of every logical source. However, currently a logical source is limited to having a single reference formulation, meaning mixed format data can only be referenced using a query language that supports just one of the formats.

Example 2

It is not possible to declare the construction of the output triples below from the data source below with RML-Core.

name,item  
alice,"{""type"":""sword"",""weight"": 2500}" 
alice,"{""type"":""shield"",""weight"": 1500}"  
bob,"{""type"":""flower"",""weight"": 15 }"

:person/alice :hasItem "sword" , "shield" .
:person/bob :hasItem "flower" .

RML-Core restricts join operations to referencing object maps. Since a referencing object map can only generate an object that is an IRI or blank node subject as specified by a parent triples map, it is not possible to combine data from two sources in one term, use data from a join on another position than the object, or generate a literal using data from a join. Moreover, RML-Core cannot declare join operations correctly across hierarchies.

Example 3

It is not possible to declare the construction of the output triples below from the two data sources below with RML-Core.

name,id
alice,123
bob,456
tobias,789

name,item_type
alice,sword
alice,shield
bob,flower

:person/123 :hasItem "sword", "shield" . 
:person/456 :hasItem "flower" .

A record is created using an iterator or an expression. Depending on the source type, records might take different forms: for tabular data sources, a record might be a row or a cell; for tree-structured sources like XML, a record might be a node; for document-structured sources like JSON, a record might be a document or property value.

A record MUST have a string representation. It MAY be possible to derive other records from a record using an expression.

Example 4

name,birthyear
alice,1995
bob,1999
tobias,2005

:csvSource a rml:LogicalSource ;
  rml:source :csvFile ;
  rml:referenceFormulation rml:CSV .

The default iterator for CSV files is row-based iteration, skipping the header row. Therefore, in this example, the iterator of the logical source :csvSource defines three records:

alice,1995

and

bob,1999

and

tobias,2005

Example 5

<People>
  <Person name="cindy">
    <Friends>
      <Person name="dave" />
      <Person name="edmund" />
    </Friends>
  </Person>
  <Person name="fred">
    <Friends>
    </Friends>
  </Person>
</People>

:xmlSource a rml:LogicalSource ;
  rml:source :xmlFile ;
  rml:referenceFormulation rml:XPath ;
  rml:iterator "/People/Person" .

In this example, the XPath expression /People/Person is used as the logical iterator. The sequence of records it defines consists of the Person nodes on the first level in the document, which have the following string representations:

<Person name="cindy">
  <Friends>
    <Person name="dave"/>
    <Person name="edmund"/>
  </Friends>
</Person>

and

<Person name="fred">
  <Friends>
  </Friends>
</Person>

Example 6

{
  "people": [
    {
      "name": "alice",
      "items": [
        {
          "type": "sword",
          "weight": 1500
        },
        {
          "type": "shield",
          "weight": 2500
        }
      ]
    },
    {
      "name": "bob",
      "items": [
        {
          "type": "flower",
          "weight": 15
        }
      ]
    }
  ]
}

:jsonSource a rml:LogicalSource ;
  rml:source :jsonFile ;
  rml:referenceFormulation rml:JSONPath ;
  rml:iterator "$.people[*]" .

In this example, the JSONPath expression $.people[*] is used as the iterator. The sequence of records it defines consists of the elements of the people array in the document, which have the following string representations:

{
  "name": "alice",
  "items": [
    {
      "type": "sword",
      "weight": 1500
    },
    {
      "type": "shield",
      "weight": 2500
    }
  ]
}

and

{
  "name": "bob",
  "items": [
    {
      "type": "flower",
      "weight": 15
    }
  ]
}

For a given record, the evaluation of an expression against it MUST either result in an ordered sequence of records, called the expression values, or throw an error. An expression MUST be valid for the given reference formulation.

Note

Example 9

Continuing Example 6, the reference $.items[*] would define the length-two sequence {"type": "sword", "weight": 1500}, {"type": "shield", "weight": 2500} from the record { "name": "alice", "items": [{"type": "sword", "weight": 1500}, {"type": "shield", "weight": 2500}]} and the length-one sequence {"type": "flower", "weight": 15} from the record { "name": "bob", "items": [{"type": "flower", "weight": 15}]}.

The reference $.type can be used to refer to data inside the records in the sequences defined by $.items[*]. Doing so would define the three length-one sequences "sword", "shield" and "flower".

The reference $.nonsense would define empty sequences of records, since the nonsense attribute does not occur in the data. The reference [unbalanced would give an error, since it is not a valid relative JSONPath expression.
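The three outcomes above (a non-empty sequence, an empty sequence, an error) can be sketched in Python. The function `evaluate` is hypothetical and supports only the two reference shapes used in this example, `$.key` and `$.key[*]`; it is not a JSONPath implementation.

```python
import re


def evaluate(reference, record):
    """Evaluate a tiny JSONPath subset ($.key and $.key[*]) against one record.

    Returns an ordered list of expression values, or raises ValueError when
    the reference is not valid within this subset.
    """
    match = re.fullmatch(r"\$\.(\w+)(\[\*\])?", reference)
    if match is None:
        raise ValueError(f"invalid reference: {reference!r}")
    key, wildcard = match.groups()
    if not isinstance(record, dict) or key not in record:
        return []  # nothing selected: an empty sequence, not an error
    value = record[key]
    if wildcard:
        # [*] selects the elements of an array, in document order.
        return list(value) if isinstance(value, list) else []
    return [value]


alice = {
    "name": "alice",
    "items": [
        {"type": "sword", "weight": 1500},
        {"type": "shield", "weight": 2500},
    ],
}

items = evaluate("$.items[*]", alice)                   # length-two sequence
types = [evaluate("$.type", item) for item in items]    # one length-one sequence per item
missing = evaluate("$.nonsense", alice)                 # empty sequence
```

Calling `evaluate("[unbalanced", alice)` raises `ValueError` in this sketch, mirroring the error case.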

A logical iteration MUST have a string representation.

Issue 1

A record sequence is an ordered sequence of sets of key-value pairs, where each key is a string and each value a record. A record sequence MUST have a finite set of keys that appear in each set in the sequence. In any particular set in a record sequence, the value of a key MAY be a null value.

An iterator defines a record sequence from the iterator's [logical source], called the iterator record sequence. This record sequence has two keys:

An index key # with as corresponding values the position of the current entry in the sequence defined by the iterator.
A key <it> with as corresponding values the records in the sequence defined by the iterator.

Example 10

{
  "people": [
    {
      "name": "alice",
      "items": [
        {
          "type": "sword",
          "weight": 1500
        },
        {
          "type": "shield",
          "weight": 2500
        }
      ]
    },
    {
      "name": "bob",
      "items": [
        {
          "type": "flower",
          "weight": 15
        }
      ]
    }
  ]
}

:jsonSource a rml:LogicalSource ;
  rml:source :jsonFile ;
  rml:referenceFormulation rml:JSONPath ;
  rml:iterator "$.people[*]" .

`#`	`<it>`
0	`{ "name": "alice", "items": [ { "type": "sword", "weight": 1500 }, { "type": "shield", "weight": 2500 } ] }`
1	`{ "name": "bob", "items": [ { "type": "flower", "weight": 15 } ] }`
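As a minimal sketch in Python, the iterator record sequence of this example can be built by pairing every record with its position. The function name `iterator_record_sequence` is an assumption made for illustration; the iterator `$.people[*]` is emulated with plain dictionary access.

```python
def iterator_record_sequence(records):
    """Pair each record produced by an iterator with its position in the sequence."""
    return [{"#": index, "<it>": record} for index, record in enumerate(records)]


document = {
    "people": [
        {
            "name": "alice",
            "items": [
                {"type": "sword", "weight": 1500},
                {"type": "shield", "weight": 2500},
            ],
        },
        {"name": "bob", "items": [{"type": "flower", "weight": 15}]},
    ]
}

# Emulates the iterator $.people[*] on the document.
sequence = iterator_record_sequence(document["people"])
for entry in sequence:
    print(entry["#"], entry["<it>"]["name"])
```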

A logical view is a type of abstract logical source that is derived from another abstract logical source by defining fields with data from said abstract logical source.

A logical view (rml:LogicalView) is represented by a resource that MUST contain:

exactly one view on property (rml:viewOn), whose value is an abstract logical source (rml:AbstractLogicalSource),
at least one field property (rml:field), whose value is a field (rml:Field).
zero or more join properties (rml:leftJoin, rml:innerJoin), whose value is a logical view join (rml:LogicalViewJoin).

A logical view (rml:LogicalView) has an implicit default reference formulation (rml:referenceFormulation) and logical iterator (rml:iterator), which MUST NOT be overwritten.

Property	Domain	Range
`rml:viewOn`	`rml:LogicalView`	`rml:AbstractLogicalSource`
`rml:field`	`rml:LogicalView` or `rml:Field`	`rml:Field`
`rml:leftJoin`	`rml:LogicalView`	`rml:LogicalViewJoin`
`rml:innerJoin`	`rml:LogicalView`	`rml:LogicalViewJoin`

A field gives a name to data derived from the abstract logical source on which the logical view is defined.

A field (rml:Field) is represented by a resource that MUST contain:

exactly one field name property (rml:fieldName), that specifies the name of the field
zero or more field properties (rml:field), to describe nested fields, also of the type rml:Field

Property	Domain	Range
`rml:fieldName`	`rml:Field`	`xsd:string`
`rml:field`	`rml:LogicalView` or `rml:Field`	`rml:Field`

There are two types of fields: an expression field and an iterable field.

An expression field (rml:ExpressionField) is a type of expression map. Consequently, an expression field MUST have an expression.

An iterable field (rml:IterableField) is a type of iterable. Consequently, an iterable field MUST have a reference formulation and a logical iterator. If no reference formulation is declared for a field, the reference formulation of the field's parent is implied.

A field MUST have a parent that is either an abstract logical source or another field. The parent relation MUST NOT contain cycles: it is tree-shaped with a logical view as its root. The transitive parents of a field, i.e., the field's parent, the parent of the field's parent, etcetera, are fittingly called the field's ancestors.

A field MUST have a declared name that is an alphanumerical string. Fields with the same parent MUST have different declared names. If a field's parent is another field, we distinguish between the field's declared name and the field's name. A field's name is the concatenation of the name of the parent field, a dot ., and the field's declared name.
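The naming rule can be sketched in Python as follows. The function `qualified_names` and the `(declared_name, children)` tuple layout are assumptions made for this illustration. Python's `str.isalnum` is used as a stand-in for "alphanumerical" and also accepts non-ASCII letters and digits.

```python
def qualified_names(fields, parent_name=None):
    """Return the names of all fields in a tree of (declared_name, children) pairs.

    A nested field's name is its parent's name, a dot, and its declared name.
    """
    names = []
    seen = set()
    for declared_name, children in fields:
        if not declared_name.isalnum():
            raise ValueError(f"declared name is not alphanumerical: {declared_name!r}")
        if declared_name in seen:
            raise ValueError(f"duplicate declared name under one parent: {declared_name!r}")
        seen.add(declared_name)
        name = declared_name if parent_name is None else f"{parent_name}.{declared_name}"
        names.append(name)
        names.extend(qualified_names(children, name))
    return names


# A field "name" and a field "item" with nested fields "type" and "weight".
view_fields = [("name", []), ("item", [("type", []), ("weight", [])])]
print(qualified_names(view_fields))  # ['name', 'item', 'item.type', 'item.weight']
```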

Example 11

In this example a field with declared name "name" is declared on the logical source from Example 10 and added to the logical view. The parent of the field with declared name "name" is the logical source :jsonSource.

:jsonView a rml:LogicalView ;
  rml:viewOn :jsonSource ;
  rml:field [
    a rml:ExpressionField ;
    rml:fieldName "name" ;
    rml:reference "$.name" ;
  ] .

#	`<it>`	name.#	name
0	`{ "name": "alice", "items": [ { "type": "sword", "weight": 1500 }, { "type": "shield", "weight": 2500 } ] }`	0	alice
1	`{ "name": "bob", "items": [ { "type": "flower", "weight": 15 } ] }`	0	bob

Note

A field defines a record sequence, called the field record sequence, that is obtained by consecutively applying the field's expression (in case of an expression field) or the field's reference formulation and a logical iterator (in case of an iterable field) on the parent records, the parent records being the records in the record sequence defined by the field's parent. For a given field, the field record sequence has these keys and corresponding values:

An index key {fieldName}.# with as values the position of the current entry in the field record sequence.
A key {fieldName} with as values the records in the field record sequence.

Example 12

In this example a field with declared name "item" is added to the logical view from Example 11. Additionally, a nested field "type" and a nested field "weight" are added to the "item" field.

:jsonView a rml:LogicalView ;
  rml:viewOn :jsonSource ;
  rml:field [
    a rml:ExpressionField ;
    rml:fieldName "name" ;
    rml:reference "$.name" ;
  ] ;
  rml:field [
    a rml:IterableField ;
    rml:fieldName "item" ;
    rml:iterator "$.items[*]" ;
    rml:field [
      a rml:ExpressionField ; 
      rml:fieldName "type" ;
      rml:reference "$.type" ;
    ] ;
    rml:field [
      a rml:ExpressionField ; 
      rml:fieldName "weight" ;
      rml:reference "$.weight" ;
    ] ;
  ] .

Note some columns in the table below have been shortened for brevity.

#	`<it>`	name	item.#	item	item.type	item.weight
0	`{...}`	alice	0	`{...}`	sword	1500
0	`{...}`	alice	1	`{ "type": "shield", "weight": 2500 }`	shield	2500
1	`{...}`	bob	0	`{ "type": "flower", "weight": 15 }`	flower	15
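The table can be reproduced with a short Python sketch that emulates the iterator and references of this example with plain dictionary access. This is an illustration under that assumption, not a normative algorithm, and it does not address parent records for which a nested iterator yields no records.

```python
document = {
    "people": [
        {
            "name": "alice",
            "items": [
                {"type": "sword", "weight": 1500},
                {"type": "shield", "weight": 2500},
            ],
        },
        {"name": "bob", "items": [{"type": "flower", "weight": 15}]},
    ]
}

rows = []
for index, person in enumerate(document["people"]):       # iterator $.people[*]
    for item_index, item in enumerate(person["items"]):   # iterable field "item": $.items[*]
        rows.append({
            "#": index,
            "<it>": person,
            "name": person["name"],          # expression field "name": $.name
            "item.#": item_index,
            "item": item,
            "item.type": item["type"],       # nested field "type": $.type
            "item.weight": item["weight"],   # nested field "weight": $.weight
        })

for row in rows:
    print(row["#"], row["name"], row["item.#"], row["item.type"], row["item.weight"])
```

Each row keeps the values of a person together with the values of exactly one of that person's items, so the root-to-leaf paths of the source are preserved.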

For the evaluation of the expression of an expression field (rml:ExpressionField) on the records of the field's parent, the parent's reference formulation is used. Consequently, the parent of an expression field MUST be an iterable, i.e. an abstract logical source or iterable field.

The default reference formulation of an iterable field (rml:IterableField) is the reference formulation of the field's parent. If the iterable field's parent is an expression field, a reference formulation MUST be declared for the iterable field. Declaring a new reference formulation, i.e. a reference formulation that is different from the reference formulation of the field's parent, is only allowed when the field's parent is an expression field.

Example 13

In this example a logical view is defined on a logical source with reference formulation rml:JSONPath. The field with declared name "items" is evaluated using this reference formulation. The nested field with declared name "item" has a declared reference formulation rml:CSV and CSV row as implicit iterator. Its records are a sequence of logical iterations defined by its iterator. The nested fields with declared name "type" and "weight" are evaluated using the reference formulation rml:CSV from their parent field with declared name "item".

{
  "people": [
    {
      "name": "alice",
      "items": "type,weight\nsword,1500\nshield,2500"
    },
    {
      "name": "bob",
      "items": "type,weight\nflower,15"
    }
  ]
}

:mixedJSONSource a rml:LogicalSource ;
  rml:source :mixedJSONFile ;
  rml:referenceFormulation rml:JSONPath ;
  rml:iterator "$.people[*]" .

:mixedJSONView a rml:LogicalView ;
  rml:viewOn :mixedJSONSource ;
  rml:field [ 
    a rml:ExpressionField ;
    rml:fieldName "items" ;
    rml:reference "$.items" ;
    rml:field [ 
      a rml:IterableField ; 
      rml:fieldName "item" ;
      rml:referenceFormulation rml:CSV ;
      rml:field [ 
        a rml:ExpressionField ; 
        rml:fieldName "type" ;
        rml:reference "type" ;
      ] ;
      rml:field [ 
        a rml:ExpressionField;
        rml:fieldName "weight" ;
        rml:reference "weight" ;
      ] ;
    ] ; 
  ] .

Note some columns in the table below have been shortened for brevity.

#	<it>	items	items.#	item	item.#	item.type	item.type.#	item.weight	item.weight.#
0	`{...}`	type,weight\nsword,1500\nshield,2500	0	sword,1500	0	sword	0	1500	0
0	`{...}`	type,weight\nsword,1500\nshield,2500	0	shield,2500	1	shield	0	2500	0
1	`{...}`	type,weight\nflower,15	0	flower,15	0	flower	0	15	0
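A minimal Python sketch of the same idea: the outer level is handled as JSON-like data, and the value of the "items" field is then parsed as CSV. The JSON document is written here as an already parsed Python structure, and the standard `csv` module stands in for the rml:CSV reference formulation; both are assumptions of this sketch.

```python
import csv
import io

# The "people" array of the data source, already parsed from JSON.
people = [
    {"name": "alice", "items": "type,weight\nsword,1500\nshield,2500"},
    {"name": "bob", "items": "type,weight\nflower,15"},
]

rows = []
for index, person in enumerate(people):       # JSONPath iterator: $.people[*]
    items = person["items"]                   # JSONPath reference: $.items
    # Switch formats: iterate over the CSV rows inside the JSON string value.
    for item_index, item in enumerate(csv.DictReader(io.StringIO(items))):
        rows.append({
            "#": index,
            "items": items,
            "item.#": item_index,
            "item.type": item["type"],        # CSV reference: type
            "item.weight": item["weight"],    # CSV reference: weight (a string here)
        })

for row in rows:
    print(row["#"], row["item.#"], row["item.type"], row["item.weight"])
```

Note that values read through the CSV parser are strings in this sketch, so the weights are "1500", "2500" and "15".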

A field reference is a reference expression that references a defined field. A field reference MUST be a defined field name of an expression field, to obtain the records in the field record sequence, or a defined field name followed by the string .# of a field, to obtain the index key of the position of the current entry in the field record sequence. A field reference is a special type of reference expression for which no reference formulation need be defined.

A field reference can be used in expression maps just as any other reference expression.

Example 14

:jsonView a rml:LogicalView ;
  rml:viewOn :jsonSource ;
  rml:field [
    a rml:ExpressionField ;
    rml:fieldName "name" ;
    rml:reference "$.name" ;
  ] ;
  rml:field [
    a rml:IterableField ;
    rml:fieldName "item" ;
    rml:iterator "$.items[*]" ;
    rml:field [
      a rml:ExpressionField ;
      rml:fieldName "type" ;
      rml:reference "$.type" ;
    ] ;
    rml:field [
      a rml:ExpressionField ;
      rml:fieldName "weight" ;
      rml:reference "$.weight" ;
    ] ;
  ] .

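Note some columns in the table below have been shortened for brevity.

#	`<it>`	name	item.#	item	item.type	item.weight
0	`{...}`	alice	0	`{ "type": "sword", "weight": 1500 }`	sword	1500
0	`{...}`	alice	1	`{ "type": "shield", "weight": 2500 }`	shield	2500
1	`{...}`	bob	0	`{ "type": "flower", "weight": 15 }`	flower	15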

:triplesMapPerson a rml:TriplesMap ;
  rml:logicalSource :jsonView ;
  rml:subjectMap [
    rml:template "http://example.org/person/{name}" ;
  ] ;
  rml:predicateObjectMap [
    rml:predicate :hasName ;
    rml:objectMap [
      rml:reference "name" ;
    ] ;
  ] ;
  rml:predicateObjectMap [
    rml:predicate :hasItem ;
    rml:objectMap [
      rml:parentTriplesMap :triplesMapItem ;
    ] ;
  ] .

:triplesMapItem a rml:TriplesMap ;
  rml:logicalSource :jsonView ;
  rml:subjectMap [
    rml:template "http://example.org/person/{name}/item/{item.#}" ;
  ] ;
  rml:predicateObjectMap [
    rml:predicate :hasName ;
    rml:objectMap [
      rml:reference "item.type" ;
    ] ;
  ] ;
  rml:predicateObjectMap [
    rml:predicate :hasWeight ;
    rml:objectMap [
      rml:reference "item.weight" ;
    ] ;
  ] .

<http://example.org/person/alice> :hasName "alice" ;
  :hasItem <http://example.org/person/alice/item/0> ,
           <http://example.org/person/alice/item/1> .

<http://example.org/person/bob> :hasName "bob" ;
  :hasItem <http://example.org/person/bob/item/0> .

<http://example.org/person/alice/item/0> :hasName "sword" ;
  :hasWeight 1500 .

<http://example.org/person/alice/item/1> :hasName "shield" ;
  :hasWeight 2500 .

<http://example.org/person/bob/item/0> :hasName "flower" ;
  :hasWeight 15 .
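How the item subjects above follow from the logical iterations can be sketched in Python. The helper `render` is hypothetical and only substitutes field references into a template. It omits the IRI-safe encoding that a template expression would apply to referenced values, which makes no difference for the values in this example.

```python
import re


def render(template, row):
    """Fill each {field reference} in a template with its value from one logical iteration."""
    return re.sub(r"\{([^}]+)\}", lambda m: str(row[m.group(1)]), template)


# Logical iterations of :jsonView, restricted to the fields used by :triplesMapItem.
rows = [
    {"name": "alice", "item.#": 0, "item.type": "sword", "item.weight": 1500},
    {"name": "alice", "item.#": 1, "item.type": "shield", "item.weight": 2500},
    {"name": "bob", "item.#": 0, "item.type": "flower", "item.weight": 15},
]

template = "http://example.org/person/{name}/item/{item.#}"
subjects = [render(template, row) for row in rows]
for subject, row in zip(subjects, rows):
    print(f'<{subject}> :hasName "{row["item.type"]}" ; :hasWeight {row["item.weight"]} .')
```

Because both `{name}` and `{item.#}` are read from the same logical iteration, each subject is combined only with the type and weight of its own item.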

A logical view join (rml:LogicalViewJoin) is an operation that extends the logical iteration of one logical view (the child logical view) with fields derived from another logical view (the parent logical view).

A logical view join (rml:LogicalViewJoin) MUST contain:

exactly one parent logical view property (rml:parentLogicalView), whose value is a logical view (rml:LogicalView) that supplies the additional fields, fulfills the role of the parent logical source in the join condition(s) of the logical view join, and is referred to as parent logical view.
at least one join condition property (rml:joinCondition), whose value is a join condition.
at least one field property (rml:field), whose value is an expression field (rml:ExpressionField). This field SHOULD only contain field references that can be evaluated on the parent logical view.

The logical view in the subject position of the join property fulfills the role of child logical source in the join condition(s) of the logical view join, and is referred to as child logical view.

Property	Domain	Range
`rml:parentLogicalView`	`rml:LogicalViewJoin`	`rml:LogicalView`
`rml:joinCondition`	`rml:LogicalViewJoin`	`rml:Join`
`rml:field`	`rml:LogicalViewJoin`	`rml:ExpressionField`

The join property specifies the join type of the logical view join, i.e. a left join or an inner join.

A left join (rml:leftJoin) is the equivalent of a left (outer) join in SQL, where the child logical view is the left part of the join, and the parent logical view is the right part of the join. If any of the join conditions evaluates to false, the fields from the logical view join in the extended logical iteration contain a null value.

An inner join (rml:innerJoin) is the equivalent of an inner join in SQL. If any of the join conditions evaluates to false, the logical iteration is removed from the child logical view.
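The difference between the two join types can be sketched in Python over lists of logical iterations. The function `join` and its parameters are assumptions of this sketch; join conditions are modelled as equality between a child field and a parent field.

```python
def join(child_rows, parent_rows, condition, fields, kind="left"):
    """Extend child rows with fields taken from matching parent rows.

    condition: list of (child_key, parent_key) pairs that must all be equal.
    fields: mapping from new field name to the parent key it is read from.
    kind: "left" keeps unmatched child rows with null (None) fields,
          "inner" removes them.
    """
    result = []
    for child in child_rows:
        matches = [
            parent for parent in parent_rows
            if all(child[c] == parent[p] for c, p in condition)
        ]
        for parent in matches:
            result.append({**child, **{name: parent[key] for name, key in fields.items()}})
        if not matches and kind == "left":
            result.append({**child, **{name: None for name in fields}})
    return result


csv_rows = [
    {"name": "alice", "birthyear": "1995"},
    {"name": "bob", "birthyear": "1999"},
    {"name": "tobias", "birthyear": "2005"},
]
json_rows = [
    {"name": "alice", "item.type": "sword", "item.weight": 1500},
    {"name": "alice", "item.type": "shield", "item.weight": 2500},
    {"name": "bob", "item.type": "flower", "item.weight": 15},
]
condition = [("name", "name")]
fields = {"item_type": "item.type", "item_weight": "item.weight"}

left = join(csv_rows, json_rows, condition, fields, kind="left")
inner = join(csv_rows, json_rows, condition, fields, kind="inner")
print(len(left), len(inner))  # 4 3
```

With the left join, the iteration for tobias is kept with null values for the joined fields; with the inner join it is removed.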

Example 15

In this example a logical view with fields built with data from the logical source from Example 4 is joined with the logical view from Example 12. In case of a left join (as in the example), this results in 4 logical iterations in the logical view. If an inner join had been used, the logical view would have only 3 logical iterations.

:csvView a rml:LogicalView ;
  rml:viewOn :csvSource ;
  rml:field [
    a rml:ExpressionField ; 
    rml:fieldName "name" ;
    rml:reference "name" ;
  ] ;
  rml:field [
    a rml:ExpressionField ;
    rml:fieldName "birthyear" ;
    rml:reference "birthyear" ;
  ] ;
  rml:leftJoin [
    rml:parentLogicalView :jsonView ;
    rml:joinCondition [
      rml:parent "name" ;
      rml:child "name" ;
    ] ; 
    rml:field [
      a rml:ExpressionField ; 
      rml:fieldName "item_type" ;
      rml:reference "item.type" ;
    ] ;
    rml:field [
      a rml:ExpressionField ;
      rml:fieldName "item_weight" ;
      rml:reference "item.weight" ;
    ] ;
  ] .

Example 16

When an inner join is used, the resulting logical view has only 3 logical iterations.

:csvView a rml:LogicalView ;
  rml:viewOn :csvSource ;
  rml:field [
    a rml:ExpressionField ;
    rml:fieldName "name" ;
    rml:reference "name" ;
  ] ;
  rml:field [
    a rml:ExpressionField ;
    rml:fieldName "birthyear" ;
    rml:reference "birthyear" ;
  ] ;
  rml:innerJoin [
    rml:parentLogicalView :jsonView ;
    rml:joinCondition [
      rml:parent "name" ;
      rml:child "name" ;
    ] ; 
    rml:field [
      a rml:ExpressionField ;
      rml:fieldName "item_type" ;
      rml:reference "item.type" ;
    ] ;
    rml:field [
      a rml:ExpressionField ;
      rml:fieldName "item_weight" ;
      rml:reference "item.weight" ;
    ] ;
  ] .

Example 17

In this example a second logical view join is added to the logical view from Example 15. The parent logical view of this second join is derived from logical source :additionalCsvSource with the input data below.

name,id
alice,123
bob,456
tobias,789

:additionalCsvView a rml:LogicalView ;
  rml:viewOn :additionalCsvSource ;
  rml:field [
    a rml:ExpressionField ;
    rml:fieldName "name" ;
    rml:reference "name" ;
  ] ;
  rml:field [
    a rml:ExpressionField ;
    rml:fieldName "id" ;
    rml:reference "id" ;
  ] . 

:csvView a rml:LogicalView ;
  rml:viewOn :csvSource ;
  rml:field [
    a rml:ExpressionField ;
    rml:fieldName "name" ;
    rml:reference "name" ;
  ] ;
  rml:field [
    a rml:ExpressionField ;
    rml:fieldName "birthyear" ;
    rml:reference "birthyear" ;
  ] ;
  rml:leftJoin [
    rml:parentLogicalView :jsonView ;
    rml:joinCondition [
      rml:parent "name" ;
      rml:child "name" ;
    ] ; 
    rml:field [
      a rml:ExpressionField ;
      rml:fieldName "item_type" ;
      rml:reference "item.type" ;
    ] ;
    rml:field [
      a rml:ExpressionField ;
      rml:fieldName "item_weight" ;
      rml:reference "item.weight" ;
    ] ;
  ] ; 
  rml:leftJoin [
    rml:parentLogicalView :additionalCsvView ;
    rml:joinCondition [
      rml:parent "name" ;
      rml:child "name" ;
    ] ; 
    rml:field [
      a rml:ExpressionField ;
      rml:fieldName "id" ;
      rml:reference "id" ;
    ] ;
  ] .

#	<it>	name.#	name	birthyear.#	birthyear	item_type.#	item_type	item_weight.#	item_weight	id.#	id
0	(row)	0	alice	0	1995	0	sword	0	1500	0	123
0	(row)	0	alice	0	1995	1	shield	1	2500	0	123
1	(row)	0	bob	0	1999	0	flower	0	15	0	456
2	(row)	0	tobias	0	2005	null	null	null	null	0	789

Structural annotations provide a mechanism to express relationships between logical views, as well as additional information about fields.

Each logical view MAY have zero or more structural annotation properties (rml:structuralAnnotation), connecting the logical view to a structural annotation object (i.e., of type rml:StructuralAnnotation).

Property	Domain	Range
`rml:structuralAnnotation`	`rml:LogicalView`	`rml:StructuralAnnotation`

The following structural annotations MAY be defined:

Unique annotation (rml:UniqueAnnotation)
ForeignKey annotation (rml:ForeignKeyAnnotation)
NotNull annotation (rml:NotNullAnnotation)
IriSafe annotation (rml:IriSafeAnnotation)
PrimaryKey annotation (rml:PrimaryKeyAnnotation)
Inclusion annotation (rml:InclusionAnnotation)

All structural annotations of a logical view lv MUST have an on fields property (rml:onFields), linking the structural annotation to a list of field names occurring in lv. Intuitively, the on fields property specifies the fields in lv that are involved in the structural annotation. The semantics of this involvement depends on the specific annotation.

Property	Domain	Range
`rml:onFields`	`rml:StructuralAnnotation`	`rdf:List`

Structural annotations provide additional information about the data that might be used by the RML processor to optimize the KG construction process. If this additional information is incorrect, then the RML processor might either fail or produce wrong results. When using structural annotations, users should make sure that the following invariance principle is satisfied:

For any source instances, the RDF graph produced by the RML engine using an RML mapping with annotations, and the same RML mapping where annotations have been removed, MUST be the same.

We emphasize that RML engines might exploit structural annotations, but they could also ignore them entirely. It is the responsibility of the user to make sure that the annotations provided are indeed correct (that is, the data complies with the annotations). Sanity checks MAY be provided by the RML engines themselves, but this is not mandatory. Note that providing wrong annotations to an engine that takes annotations into account, for instance for applying optimizations, could result in a violation of the invariance principle, with unpredictable results.

An IriSafe structural annotation (rml:IriSafeAnnotation) on fields F indicates that the content of each field in F is IRI safe, that is, each field in F does not contain any character that is not in the iunreserved production in RFC3987.
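A sanity check for this annotation could look like the following Python sketch. The function names are assumptions; the character classes follow the iunreserved production of RFC3987 (ALPHA, DIGIT, "-", ".", "_", "~" and ucschar).

```python
# ucschar ranges from RFC3987.
UCSCHAR_RANGES = (
    [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
    # Planes 1 to 13: %x10000-1FFFD up to %xD0000-DFFFD.
    + [(plane * 0x10000, plane * 0x10000 + 0xFFFD) for plane in range(1, 14)]
    + [(0xE1000, 0xEFFFD)]
)


def is_iunreserved(char):
    """Return True if a single character matches the iunreserved production."""
    if char.isascii():
        return char.isalnum() or char in "-._~"
    code = ord(char)
    return any(low <= code <= high for low, high in UCSCHAR_RANGES)


def is_iri_safe(value):
    """Return True if every character of a field value is iunreserved."""
    return all(is_iunreserved(char) for char in value)


print(is_iri_safe("alice"))    # True
print(is_iri_safe("a sword"))  # False: contains a space
print(is_iri_safe("a/b"))      # False: "/" is a reserved character
```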

A PrimaryKey structural annotation (rml:PrimaryKeyAnnotation) on fields (f1, ..., fn) imposes two conditions:

no duplicate record sequences are present over the list of fields (f1, ..., fn);
no NULL value is admitted in any of the fields f1, ..., fn.

Each logical view MAY specify AT MOST ONE PrimaryKey annotation.
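The two conditions can be checked on the logical iterations of a view with a Python sketch such as the one below. The function names are hypothetical, and NULL is represented by Python's `None`.

```python
def has_no_duplicates(rows, fields):
    """Return True if no combination of values over the given fields occurs twice."""
    seen = set()
    for row in rows:
        key = tuple(row[field] for field in fields)
        if key in seen:
            return False
        seen.add(key)
    return True


def has_no_nulls(rows, fields):
    """Return True if none of the given fields contains a NULL (None) value."""
    return all(row[field] is not None for row in rows for field in fields)


def satisfies_primary_key(rows, fields):
    """Check both PrimaryKey conditions: no duplicates and no NULL values."""
    return has_no_duplicates(rows, fields) and has_no_nulls(rows, fields)


rows = [
    {"name": "alice", "birthyear": "1995"},
    {"name": "bob", "birthyear": "1999"},
    {"name": "tobias", "birthyear": "2005"},
]
print(satisfies_primary_key(rows, ["name"]))  # True

# Adding a second "alice" violates the first condition.
print(satisfies_primary_key(rows + [{"name": "alice", "birthyear": "2001"}], ["name"]))  # False
```

An engine is not required to run such a check; the sketch only illustrates what it means for the data to comply with the annotation.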

Example 18

Consider the following CSV file containing birth years of people:

name,birthyear
alice,1995
bob,1999
tobias,2005
lukas,1986

Now, assume that we know:

Attribute name in the CSV is "UNIQUE" and "NOT NULL";

Such a constraint naturally corresponds to the notion of PRIMARY KEY from the world of relational databases. This fact could be valuable information for the RML engine, especially in the virtual setting. However, note that constraints cannot be expressed on CSV files.

We can exploit the mechanism of structural annotations to inform the RML engine about the existence of this constraint. We work out an example here.

First, we specify the logical source corresponding to the CSV file:

:csvSource a rml:LogicalSource ;
  rml:source :csvFile ;
  rml:referenceFormulation rml:CSV .

We are now ready to specify our logical view and associated rml:PrimaryKeyAnnotation.

:csvView a rml:LogicalView ;
  rml:viewOn :csvSource ;
  rml:field [
    a rml:ExpressionField ;
    rml:fieldName "name" ;
    rml:reference "name" ;
  ] ;
  rml:field [
    a rml:ExpressionField ;
    rml:fieldName "birthyear" ;
    rml:reference "birthyear" ;
  ] ;
  rml:structuralAnnotation [
    a rml:PrimaryKeyAnnotation ;
    rml:onFields ("name") ;
  ] .

The Unique structural annotation (rml:UniqueAnnotation) is analogous to the notion of UNIQUE constraints in databases. Specifically, a Unique annotation on fields (f1, ..., fn) imposes the following condition:

no duplicate record sequences are present over the list of fields (f1, ..., fn).

Note that every PrimaryKey annotation is, as a matter of fact, also a Unique annotation.

The NotNull structural annotation (rml:NotNullAnnotation) is analogous to the notion of NOT NULL constraints in databases. Specifically, a NotNull annotation on fields F imposes that each field in F does not contain NULL values.

Note

The ForeignKey structural annotation (rml:ForeignKeyAnnotation) is analogous to the notion of foreign key constraint in databases. Specifically, a ForeignKey annotation on fields (f1, ..., fn) , target view lv, and target fields (tf1,...,tfn) imposes the following conditions:

each NULL-free record sequence over the list of fields (f1, ..., fn) occurs also as a record sequence in (tf1,...,tfn);
Target view lv defines a Unique annotation on fields (tf1,...,tfn).
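The first condition can be sketched in Python as a check over the logical iterations of both views. The function name `satisfies_inclusion` is an assumption, and NULL is represented by `None`. The second condition concerns the annotations declared on the target view rather than the data, so it is not checked here.

```python
def satisfies_inclusion(rows, fields, target_rows, target_fields):
    """Check that every NULL-free value combination over `fields` in `rows`
    also occurs over `target_fields` in `target_rows`."""
    targets = {tuple(row[field] for field in target_fields) for row in target_rows}
    for row in rows:
        key = tuple(row[field] for field in fields)
        if None in key:
            continue  # only NULL-free combinations are constrained
        if key not in targets:
            return False
    return True


# Logical iterations of a target view (people) and a referencing view (items).
people = [{"name": "alice"}, {"name": "bob"}, {"name": "tobias"}]
items = [
    {"name": "alice", "item.type": "sword"},
    {"name": "alice", "item.type": "shield"},
    {"name": "bob", "item.type": "flower"},
]

print(satisfies_inclusion(items, ["name"], people, ["name"]))  # True
print(satisfies_inclusion(people, ["name"], items, ["name"]))  # False: tobias has no items
```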

The target view is a logical view specified through the property rml:targetView, whereas the target fields are an RDF list of field names specified through the property rml:targetFields. These two properties are specified as follows:

Property	Domain	Range
`rml:targetView`	`rml:InclusionAnnotation`	`rml:LogicalView`
`rml:targetFields`	`rml:InclusionAnnotation`	`rdf:List`

Therefore, each ForeignKey annotation MUST specify (in addition to the inherited rml:onFields property):

Exactly one rml:targetView property
Exactly one rml:targetFields property.

Example 19

Consider the following JSON file containing information about warriors.

{
  "people": [
    {
      "name": "alice",
      "items": [
        {
          "type": "sword",
          "weight": 1500
        },
        {
          "type": "shield",
          "weight": 2500
        }
      ]
    },
    {
      "name": "bob",
      "items": [
        {
          "type": "flower",
          "weight": 15
        }
      ]
    }
  ]
}

Now, assume that we know:

Attribute name in the CSV is "UNIQUE" and "NOT NULL";
All warriors in our domain are contained in the CSV file.

The first constraint naturally corresponds to the notion of PRIMARY KEY from the world of relational databases. This fact could be valuable information for the RML engine, especially in the virtual setting. However, note that constraints cannot be expressed on CSV files.

The second constraint naturally corresponds to a FOREIGN KEY constraint from the world of relational databases. However, since the values involved are spread across different (and diverse) sources, it obviously cannot be expressed as such. This constraint, too, could provide valuable information for the RML engine.

We can exploit the mechanism of structural annotations to inform the RML engine about the existence of such "relational-like" constraints. We work out an example here.

First, we need to specify the logical sources. The logical source corresponding to the CSV:

:csvSource a rml:LogicalSource ;
  rml:source :csvFile ;
  rml:referenceFormulation rml:CSV .

The logical source corresponding to the JSON:

:jsonSource a rml:LogicalSource ;
  rml:source :jsonFile ;
  rml:referenceFormulation rml:JSONPath ;
  rml:iterator "$.people[*]" .

We are now ready to specify our logical views, and associated structural annotations. The first logical view is the one corresponding to :csvSource.

:csvView a rml:LogicalView ;
  rml:viewOn :csvSource ;
  rml:field [
    a rml:ExpressionField ;
    rml:fieldName "name" ;
    rml:reference "name" ;
  ] ;
  rml:field [
    a rml:ExpressionField ;
    rml:fieldName "birthyear" ;
    rml:reference "birthyear" ;
  ] ;
  rml:structuralAnnotation [
    a rml:PrimaryKeyAnnotation ;
    rml:onFields ("name") ;
  ] .

Now, we declare the logical view corresponding to :jsonSource. Note that this view contains an rml:ForeignKeyAnnotation:

:jsonView a rml:LogicalView ;
  rml:viewOn :jsonSource ;
  rml:field [
    a rml:ExpressionField ;
    rml:fieldName "name" ;
    rml:reference "$.name" ;
  ] ;
  rml:field [
    a rml:IterableField ;
    rml:fieldName "item" ;
    rml:iterator "$.items[*]" ;
    rml:field [
      a rml:ExpressionField ;
      rml:fieldName "type" ;
      rml:reference "$.type" ;
    ] ;
    rml:field [
      a rml:ExpressionField ;
      rml:fieldName "weight" ;
      rml:reference "$.weight" ;
    ]
  ] ;
  rml:structuralAnnotation [
    a rml:ForeignKeyAnnotation;
    rml:onFields ("name");
    rml:targetView :csvView;
    rml:targetFields ("name")
  ].

The Inclusion structural annotation (rml:InclusionAnnotation) is analogous to the notion of inclusion dependency in databases. Specifically, an Inclusion annotation on fields (f1, ..., fn) , target view lv, and target fields (tf1,...,tfn) imposes the following condition:

each NULL-free record sequence over the list of fields (f1, ..., fn) occurs also as a record sequence in (tf1,...,tfn);

As for ForeignKey annotation, the target view MUST be a logical view specified through the property rml:targetView, whereas the target fields MUST be an RDF list of field names specified through the property rml:targetFields.

Therefore, each Inclusion annotation MUST specify (in addition to the inherited rml:onFields property):

Exactly one rml:targetView property
Exactly one rml:targetFields property.

Note

RML Logical Views

Abstract

Status of This Document

1. Overview

1.1 Document conventions

1.2 Conformance

2. Problem

2.1 Nested data structures

2.2 Mixed data formats

2.3 Joining of data sources

3. Records

3.1 Extending the logical source

3.2 Record sequences

4. Logical views

5. Fields

5.1 Field parents

5.2 Field names

5.3 Field record sequences and records

5.4 Field reference formulations

5.5 Using field names in triples maps

6. Logical view joins

6.1 Join types

6.2 Logical view join examples

6.2.1 Left join

6.2.2 Inner join

6.2.3 Two left joins

7. Structural Annotations

7.1 Invariance Principle

7.2 IriSafe

7.3 PrimaryKey

7.4 Unique

7.5 NotNull

7.6 ForeignKey

7.7 Inclusion

A. References

A.1 Normative references
