Linked Data Event Streams

Living Standard

This version:
https://w3id.org/ldes/specification
Issue Tracking:
GitHub
Editor:
Pieter Colpaert

Abstract

A Linked Data Event Stream is a collection of immutable objects (such as version objects, sensor observations or archived representations). Each object is described in RDF.

1. Introduction

A Linked Data Event Stream (LDES) (ldes:EventStream) is a collection (rdfs:subClassOf tree:Collection) of immutable objects, each object being described using a set of RDF triples ([rdf-primer]).

This specification uses the TREE specification for its collection and fragmentation (or pagination) features, which in turn is compatible with other specifications such as [activitystreams-core], [VOCAB-DCAT-2], [LDP] or Shape Trees. For the specific compatibility rules, read the TREE specification.

Note: Once a client has processed a member, it should never have to process it again. A Linked Data Event Stream client can thus keep a list of (or cache) already processed member IRIs. A reference implementation of a client is available as part of the Comunica framework on NPM and GitHub.
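The bookkeeping described in the note above can be sketched in Python as follows. This is a minimal illustration, not the reference client: the `MemberTracker` class and the dictionary layout of a member (an `"iri"` key) are assumptions made for this example.

```python
from typing import Callable, Iterable


class MemberTracker:
    """Remembers which member IRIs were already processed (hypothetical helper)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def process_new(self, members: Iterable[dict], handle: Callable[[dict], None]) -> int:
        """Call `handle` once per unseen member and return how many were new."""
        new_count = 0
        for member in members:
            iri = member["iri"]  # assumed key holding the member IRI
            if iri in self._seen:
                continue  # members are immutable, so a known IRI can be skipped
            handle(member)
            self._seen.add(iri)
            new_count += 1
        return new_count


tracker = MemberTracker()
processed: list[str] = []
page = [
    {"iri": "http://example.org/Observation1"},
    {"iri": "http://example.org/Observation2"},
]
first = tracker.process_new(page, lambda m: processed.append(m["iri"]))
# Fetching the same page again yields no new work.
second = tracker.process_new(page, lambda m: processed.append(m["iri"]))
```

A long-running client would persist the set of seen IRIs between runs; the in-memory set here only demonstrates the idea.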

The base URI for LDES is https://w3id.org/ldes#, and the preferred prefix is ldes:. Other prefixes are used following prefix.cc.

ex:C1 a ldes:EventStream ;
      ldes:timestampPath sosa:resultTime ;
      tree:shape ex:shape1.shacl ;
      tree:member ex:Observation1 .

ex:Observation1 a sosa:Observation ;
                sosa:resultTime "2021-01-01T00:00:00Z"^^xsd:dateTime ;
                sosa:hasSimpleResult "..." .

The ldes:EventStream instance SHOULD have these properties:

The ldes:EventStream instance MAY have these properties:

ex:C2 a ldes:EventStream ;
      ldes:timestampPath dcterms:created ;
      ldes:versionOfPath dcterms:isVersionOf ;
      tree:shape ex:shape2.shacl ;
      tree:member ex:AddressRecord1-version1 .

ex:AddressRecord1-version1 dcterms:created "2021-01-01T00:00:00Z"^^xsd:dateTime ;
                           adms:versionNotes "First version of this address" ;
                           dcterms:isVersionOf ex:AddressRecord1 ;
                           dcterms:title "Streetname X, ZIP Municipality, Country" .

Note: When you need to change an earlier version of an ldes:EventStream, there are two options: create a new version of the object with a new shape that is backward compatible, and add the new version of that object again as a member on the stream, or replicate and transform the entire collection into a new ldes:EventStream. You can indicate that the new ldes:EventStream is derived from another ldes:EventStream.

Note: In Example 1, we consider the Observation object to be an immutable object, so we can use the existing identifiers. In Example 2, however, we still had to create version IRIs in order to be able to link to immutable objects.

2. Fragmenting and pagination

The focus of an LDES is to allow clients to replicate the history of a dataset and efficiently synchronize with its latest changes. Linked Data Event Streams MAY be fragmented when their size becomes too big for a single HTTP response. Fragmentations MUST be described using the features in the TREE specification. All relation types from the TREE specification MAY be used.

ex:C1 a ldes:EventStream ;
      ldes:timestampPath sosa:resultTime ;
      tree:shape ex:shape1.shacl ;
      tree:member ex:Observation1, ... ;
      tree:view <?page=1> .

<?page=1> a tree:Node ;
          tree:relation [
              a tree:GreaterThanOrEqualToRelation ;
              tree:path sosa:resultTime ;
              tree:node <?page=2> ;
              tree:value "2020-12-24T12:00:00Z"^^xsd:dateTime
          ] .
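A client replicating a fragmented LDES follows the tree:relation links from the view to every reachable tree:Node. The sketch below shows that traversal over an in-memory stand-in for HTTP responses. The `PAGES` dictionary, its keys, and the `replicate` function are hypothetical and only mirror the shape of the example above.

```python
from collections import deque

# Hypothetical stand-in for fetched pages: URL -> members and linked nodes.
PAGES = {
    "?page=1": {"members": ["ex:Observation1"], "relations": ["?page=2"]},
    "?page=2": {"members": ["ex:Observation2", "ex:Observation1"], "relations": []},
}


def replicate(view: str, pages: dict) -> list[str]:
    """Visit every node reachable from `view` and collect each member once."""
    queue = deque([view])
    visited_nodes = set()
    seen_members = set()
    members: list[str] = []
    while queue:
        url = queue.popleft()
        if url in visited_nodes:
            continue
        visited_nodes.add(url)
        page = pages[url]  # a real client would issue an HTTP GET here
        for member in page["members"]:
            if member not in seen_members:
                seen_members.add(member)
                members.append(member)
        queue.extend(page["relations"])  # follow each tree:node link
    return members


result = replicate("?page=1", PAGES)
```

This sketch follows every relation unconditionally. A client that only needs part of the stream could use the relation type and tree:value to prune nodes, which is left out here.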

A tree:importStream MAY be used to describe a publish-subscribe interface to subscribe to new members in the LDES.

Note: A one-dimensional fragmentation based on the creation time of the immutable objects is probably going to be the most interesting and highest-priority fragmentation for an LDES, as only the latest page, once replicated, should be subscribed to for updates. However, it may happen that a time-based fragmentation cannot be applied. For example, the backend system on which the LDES has been built may not receive the events at the time they were created, due to human error (forgetting to indicate that a change was made), external systems or just latency. Applying a time-based fragmentation in that situation defeats caching, because pages keep changing. Instead, in the spirit of an LDES’s goal, the publisher should publish the events in the order they were received by the backend system (that order never changes), trying to give as many pages as possible an HTTP Cache-Control: public, max-age=604800, immutable header.

Note: As in the TREE specification's example on “searching through a list of objects ordered in time”, a search form can optionally make a one-dimensional feed of immutable objects more searchable.

3. Retention policies

By default, an LDES MUST keep all data that has been added to the tree:Collection (or ldes:EventStream) as defined by the TREE specification. It MAY add a retention policy in which the server indicates data will be removed from the server. Third parties SHOULD read retention policies to understand what subset of the data is available in this tree:View, and MAY archive these members.

In the LDES specification, three types of retention policies are defined which can be used with a ldes:retentionPolicy with an instance of a tree:View as its subject:

  1. ldes:DurationAgoPolicy: a time-based retention policy in which data generated before a specified duration is removed

  2. ldes:LatestVersionSubset: a version subset based on the latest versions of an entity in the stream

  3. ldes:PointInTimePolicy: a point-in-time retention policy in which data generated before a specific time is removed

Different retention policies MAY be combined. When policies are used together, a server MUST store the members as long as they are not all matched.
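One way to read the combination rule is that a member may only be dropped once every policy on the view allows its removal. The sketch below encodes that reading; it is an interpretation, not normative text. The `must_keep` function and the modelling of a policy as a predicate that returns True when removal is allowed are assumptions made for this example.

```python
from typing import Callable, Iterable

# A policy is modelled as a predicate returning True when it allows removal.
Policy = Callable[[dict], bool]


def must_keep(member: dict, policies: Iterable[Policy]) -> bool:
    """Keep the member unless every policy allows its removal."""
    policies = list(policies)
    if not policies:
        return True  # without a retention policy all data is kept
    return not all(policy(member) for policy in policies)


def older_than_2023(member: dict) -> bool:
    """Hypothetical time-based policy."""
    return member["year"] < 2023


def superseded(member: dict) -> bool:
    """Hypothetical version-based policy."""
    return not member["latest"]


old_but_latest = {"year": 2020, "latest": True}
old_and_superseded = {"year": 2020, "latest": False}
```

With both policies active, `old_but_latest` stays on the server because the version-based policy does not allow its removal, while `old_and_superseded` may be dropped.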

3.1. Time-based retention policies

A time-based retention policy can be introduced as follows:

ex:C3 a ldes:EventStream ;
      ldes:timestampPath prov:generatedAtTime ;
      tree:view <> .

<> ldes:retentionPolicy ex:P1 .

ex:P1 a ldes:DurationAgoPolicy ;
      tree:value "P1Y"^^xsd:duration . # Keep 1 year of data

A ldes:DurationAgoPolicy uses a tree:value with an xsd:duration-typed literal to indicate how long ago, relative to the current time, the timestamp of a member may lie for the member to be kept. The timestamp is indicated by the ldes:timestampPath, which MAY be redefined in the policy itself.
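The check a server performs for a time-based policy can be sketched as a comparison against a cutoff computed from the current time. The function name `kept_by_duration_ago` is hypothetical, the duration is assumed to be already converted to a `timedelta` (Python's standard library has no xsd:duration parser), and treating a timestamp exactly on the cutoff as kept is also an assumption.

```python
from datetime import datetime, timedelta, timezone


def kept_by_duration_ago(timestamp: datetime, duration: timedelta, now: datetime) -> bool:
    """True when the member's timestamp is no older than `duration` before `now`."""
    return timestamp >= now - duration


# "P1Y" is approximated as 365 days for this illustration.
ONE_YEAR = timedelta(days=365)
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
recent = datetime(2023, 6, 1, tzinfo=timezone.utc)
old = datetime(2022, 6, 1, tzinfo=timezone.utc)
```

Passing `now` explicitly keeps the function deterministic; a server would supply its own clock.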

3.2. Version-based retention policies

To indicate that only the 2 most recent versions of an object, referred to using dcterms:isVersionOf, are kept:

ex:C2 a ldes:EventStream ;
      ldes:timestampPath dcterms:created ;
      ldes:versionOfPath dcterms:isVersionOf ;
      tree:view <> .

<> ldes:retentionPolicy ex:P2 .

ex:P2 a ldes:LatestVersionSubset;
      ldes:amount 2 ;
      #If different from the Event Stream, this can optionally be overwritten here
      ldes:timestampPath dcterms:created ;
      ldes:versionOfPath dcterms:isVersionOf .

A ldes:LatestVersionSubset MUST define the predicate ldes:amount and MAY redefine the ldes:timestampPath and/or ldes:versionOfPath. It MAY also define a compound version key using ldes:versionKey (see example below) instead of the simpler ldes:versionOfPath. The ldes:amount has an xsd:nonNegativeInteger datatype and indicates how many versions to keep; it defaults to 1. The ldes:versionKey is an rdf:List of SHACL property paths indicating objects that MUST be concatenated together to find the key on which versions are matched. When the ldes:versionKey is set to an empty path (), all members MUST be seen as a version of the same thing.

For sensor datasets, the version key may get more complex, grouping observations by both the observed property and the sensor that made the observation.

ex:C1 a ldes:EventStream ;
      tree:view <> .

<> ldes:retentionPolicy ex:P3 .

ex:P3 a ldes:LatestVersionSubset;
      ldes:amount 2 ; 
      ldes:versionKey ( ( sosa:observedProperty ) ( sosa:madeBySensor ) ) .
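The selection that a ldes:LatestVersionSubset describes can be sketched as grouping members by their version key and keeping the most recent ones per group. The `latest_version_subset` function, the dictionary layout of the members, and the use of `resultTime` as timestamp are assumptions made for this example. A tuple of values stands in for the concatenated key.

```python
from collections import defaultdict


def latest_version_subset(members, version_key, timestamp_path, amount=1):
    """Keep the `amount` most recent members for every distinct version key."""
    groups = defaultdict(list)
    for member in members:
        # An empty version key yields the same tuple for every member,
        # so all members are treated as versions of the same thing.
        key = tuple(member[path] for path in version_key)
        groups[key].append(member)
    kept = []
    for group in groups.values():
        # Timestamps share one UTC format here, so string order is time order.
        group.sort(key=lambda m: m[timestamp_path], reverse=True)
        kept.extend(group[:amount])
    return kept


observations = [
    {"id": "o1", "observedProperty": "temp", "madeBySensor": "s1", "resultTime": "2021-01-01T00:00:00Z"},
    {"id": "o2", "observedProperty": "temp", "madeBySensor": "s1", "resultTime": "2021-01-02T00:00:00Z"},
    {"id": "o3", "observedProperty": "temp", "madeBySensor": "s1", "resultTime": "2021-01-03T00:00:00Z"},
    {"id": "o4", "observedProperty": "temp", "madeBySensor": "s2", "resultTime": "2021-01-01T00:00:00Z"},
]
kept = latest_version_subset(observations, ["observedProperty", "madeBySensor"], "resultTime", amount=2)
kept_ids = sorted(m["id"] for m in kept)
```

With an amount of 2, the oldest observation of sensor `s1` falls outside the subset, while the single observation of sensor `s2` is kept.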

3.3. Point-in-time retention policies

A point-in-time retention policy can be introduced as follows:

ex:C4 a ldes:EventStream ;
      ldes:timestampPath prov:generatedAtTime ;
      tree:view <> .

<> ldes:retentionPolicy ex:P4 .

ex:P4 a ldes:PointInTimePolicy ;
      ldes:pointInTime "2023-04-12T00:00:00"^^xsd:dateTime . # Keep data after April 12th, 2023

A ldes:PointInTimePolicy uses a ldes:pointInTime with an xsd:dateTime-typed literal to indicate the point in time after which data is kept.
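Unlike the duration-based policy, a point-in-time policy compares against a fixed moment, so the result does not depend on the server clock. A minimal sketch follows; the `kept_by_point_in_time` function and the member layout are hypothetical, and UTC is assumed because the example literal carries no time zone.

```python
from datetime import datetime, timezone


def kept_by_point_in_time(members, timestamp_path, point_in_time):
    """Return the members whose timestamp is not before `point_in_time`."""
    return [m for m in members if m[timestamp_path] >= point_in_time]


# The example literal has no time zone; UTC is assumed for this illustration.
point_in_time = datetime(2023, 4, 12, tzinfo=timezone.utc)
members = [
    {"id": "m1", "generatedAtTime": datetime(2023, 4, 11, 23, 59, tzinfo=timezone.utc)},
    {"id": "m2", "generatedAtTime": datetime(2023, 4, 12, tzinfo=timezone.utc)},
    {"id": "m3", "generatedAtTime": datetime(2023, 5, 1, tzinfo=timezone.utc)},
]
kept_ids = [m["id"] for m in kept_by_point_in_time(members, "generatedAtTime", point_in_time)]
```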

4. Derived collections

We will extend the spec with multiple best practices on how to annotate that your newly published collection is derived from an LDES.

First, we discuss versioned LDESes. A versioned LDES allows an object in an ldes:EventStream to change while the history of events is maintained. A change is expressed by adding a new tree:member to the ldes:EventStream, supported by metadata on both the ldes:EventStream and each tree:member.

Secondly, version materializations are defined that use a versioned LDES as a basis. This technique makes it possible to create snapshots in time of a versioned LDES. Here we define a snapshot as a tree:Collection of the most recent versions of all objects in the versioned LDES.

4.1. Versioning

A versioned LDES is defined with two properties: ldes:versionOfPath and ldes:timestampPath.

A versioned LDES with one member.

dct:isVersionOf is used as property for ldes:versionOfPath, which indicates that ex:resource1v0 is a version of ex:resource1.

dct:issued is used as property for ldes:timestampPath, which indicates that ex:resource1v0 was issued in the LDES at "2021-12-15T10:00:00.000Z".

ex:ES a ldes:EventStream;
    ldes:versionOfPath dct:isVersionOf;
    ldes:timestampPath dct:issued;
    tree:member ex:resource1v0.

ex:resource1v0
    dct:isVersionOf ex:resource1;
    dct:issued "2021-12-15T10:00:00.000Z"^^xsd:dateTime;
    dct:title "First version of the title".

A versioned LDES with two members which are both versions of the same object.

ex:ES a ldes:EventStream;
    ldes:versionOfPath dct:isVersionOf;
    ldes:timestampPath dct:issued;
    tree:member ex:resource1v0, ex:resource1v1.
    
ex:resource1v0
    dct:isVersionOf ex:resource1;
    dct:issued "2021-12-15T10:00:00.000Z"^^xsd:dateTime;
    dct:title "First version of the title".
    
ex:resource1v1 
    dct:isVersionOf ex:resource1;
    dct:issued "2021-12-15T12:00:00.000Z"^^xsd:dateTime;
    dct:title "Title has been updated once".

Two hours after ex:resource1v0 was created, the title of ex:resource1 was changed. This change can be seen by the creation of ex:resource1v1, which is the newest version of ex:resource1.

4.2. Version Materializations

A version materialization can be defined only if the original LDES defines both ldes:versionOfPath and ldes:timestampPath.

A version materialization replaces the subject of a member with its ldes:versionOfPath IRI, and selects a certain version of this object. It also translates created-style timestamp predicates to modified-style predicates.

In this example, an event stream with two versions of the same object is materialized until 2020-10-05T12:00:00Z:

ex:ES1 a ldes:EventStream ;
    ldes:versionOfPath dcterms:isVersionOf ;
    ldes:timestampPath dcterms:created ;
    tree:member [
        dcterms:isVersionOf <A> ;
        dcterms:created "2020-10-05T11:00:00Z"^^xsd:dateTime ;
        owl:versionInfo "v0.0.1" ;
        rdfs:label "A v0.0.1"
    ], [
        dcterms:isVersionOf <A> ;
        dcterms:created "2020-10-06T13:00:00Z"^^xsd:dateTime ;
        owl:versionInfo "v0.0.2" ;
        rdfs:label "A v0.0.2"
    ] .

This results in the snapshot below:

ex:ES1v1 a tree:Collection ; # the members are no longer immutable
         ldes:versionMaterializationOf ex:ES1;
         ldes:versionMaterializationUntil "2020-10-05T12:00:00Z"^^xsd:dateTime;
         tree:member <A> .

<A> rdfs:label "A v0.0.1" ;
    dcterms:modified "2020-10-05T11:00:00Z"^^xsd:dateTime .

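A version materialization is thus a tree:Collection instance that has two predicates set: ldes:versionMaterializationOf, which refers to the LDES that was materialized, and ldes:versionMaterializationUntil, which gives the timestamp up to which the LDES was materialized.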

Note: We see ldes:versionMaterializationUntil as mainly useful for historical and static datasets that deliberately will not be updated to the latest state of the LDES.
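The materialization step described in this section can be sketched as follows: per versioned object, pick the most recent version that is not newer than the chosen moment, use the versioned IRI as subject, and rename the timestamp predicate. The `materialize` and `parse` functions, the dictionary layout of the members, and the default target predicate `dcterms:modified` are assumptions made for this example.

```python
from datetime import datetime


def parse(value: str) -> datetime:
    """Parse an xsd:dateTime lexical form that ends in Z (UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def materialize(members, version_of_path, timestamp_path, until, modified_path="dcterms:modified"):
    """Return one materialized object per versioned IRI, as of `until`."""
    latest = {}
    for member in members:
        created = parse(member[timestamp_path])
        if created > until:
            continue  # this version did not exist yet at the chosen moment
        subject = member[version_of_path]
        if subject not in latest or created > latest[subject][0]:
            latest[subject] = (created, member)
    snapshot = {}
    for subject, (_, member) in latest.items():
        # Copy the remaining properties and drop the versioning metadata.
        properties = {
            key: value
            for key, value in member.items()
            if key not in (version_of_path, timestamp_path)
        }
        # The created-style timestamp becomes a modified-style timestamp.
        properties[modified_path] = member[timestamp_path]
        snapshot[subject] = properties
    return snapshot


members = [
    {"dcterms:isVersionOf": "A", "dcterms:created": "2020-10-05T11:00:00Z", "rdfs:label": "A v0.0.1"},
    {"dcterms:isVersionOf": "A", "dcterms:created": "2020-10-06T13:00:00Z", "rdfs:label": "A v0.0.2"},
]
snapshot = materialize(members, "dcterms:isVersionOf", "dcterms:created", parse("2020-10-05T12:00:00Z"))
```

The sketch copies every remaining property of the selected version unchanged. Which properties a publisher carries over into the snapshot is a design choice that this example does not settle.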

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[ACTIVITYSTREAMS-CORE]
James Snell; Evan Prodromou. Activity Streams 2.0. URL: https://w3c.github.io/activitystreams/core/
[LDP]
Steve Speicher; John Arwe; Ashok Malhotra. Linked Data Platform 1.0. URL: https://www.w3.org/2012/ldp/hg/ldp.html
[RDF-PRIMER]
Frank Manola; Eric Miller. RDF Primer. 10 February 2004. REC. URL: https://www.w3.org/TR/rdf-primer/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[VOCAB-DCAT-2]
Riccardo Albertoni; et al. Data Catalog Vocabulary (DCAT) - Version 2. URL: https://w3c.github.io/dxwg/dcat/