The Smart Data Specification for Semantically Describing Streams (SDS)

Living Standard

This version:
https://w3id.org/sds/specification
Issue Tracking:
GitHub
Editors:
Pieter Colpaert
Arthur Vercruysse

Abstract

A Semantically Described Stream is a stream of Records that have undergone Steps that are part of a Plan. These transformations are described using RDF. This metadata model helps determine the lineage of Records on a Stream.

1. Overview

The latest OWL encoding of the SDS Ontology can be found here.

A Semantically Described Stream (SDS) consists of two parts: the SDS description and the SDS data. The description captures the lineage of the records on the data stream.

1.1. SDS Description

An sds:Stream is an entity (rdfs:subClassOf p-plan:Entity) that carries (sds:carries) a stream of data records (sds:Record) and is generated (prov:wasGeneratedBy) as the result of an activity (p-plan:Activity). The stream provides a new dataset (sds:dataset), consisting of the aforementioned records, for the next activity.

This specification uses the P-Plan specification to describe the activities from which streams (sds:Stream) are generated. P-Plan also makes it possible to determine the lineage of the data, which is a crucial part of a Linked Data Event Stream (LDES). An LDES exposes a stream of RDF members, but these members often do not originate from RDF data; they may originate from, for example, sensor observations.
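
As a minimal sketch of how such lineage could be written down, the following Turtle links one stream to the activity that generated it, and that activity to the step and plan it executed. All ex: resources are hypothetical names chosen for illustration and are not defined by this specification.

```turtle
@prefix p-plan: <http://purl.org/net/p-plan#> .
@prefix prov:   <http://www.w3.org/ns/prov#> .
@prefix sds:    <https://w3id.org/sds#> .
@prefix ex:     <http://example.org/> .   # hypothetical namespace

# The plan and one of its steps
ex:plan a p-plan:Plan .

ex:mapStep a p-plan:Step ;
  p-plan:isStepOfPlan ex:plan .

# The activity that executed the step
ex:mapRun a p-plan:Activity ;
  p-plan:correspondsToStep ex:mapStep .

# The stream produced by that activity
ex:mappedStream a sds:Stream ;
  prov:wasGeneratedBy ex:mapRun ;
  sds:carries sds:Member .
```

Following prov:wasGeneratedBy from a stream to its activity, and prov:used from that activity to the stream it consumed, walks back through the transformations that were applied.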

1.2. SDS Data

An sds:Record instance describes a data point. It links the data to the stream that generated (or used) that data, and it can carry other properties created by the process that generates the stream. The sds:shape property of the SDS description is optional; when it is present, all instances of sds:Record MUST conform to the shape it specifies.
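
This specification does not prescribe what such a shape looks like. One possible reading, sketched below, is a SHACL node shape that is evaluated against the record nodes themselves. The names ex:RecordShape and ex:sampleStream are hypothetical, and the use of sh:targetClass to select the records is an assumption made for this sketch.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix sds: <https://w3id.org/sds#> .
@prefix ex:  <http://example.org/> .   # hypothetical namespace

# Assumed shape: every record has exactly one stream and one payload
ex:RecordShape a sh:NodeShape ;
  sh:targetClass sds:Record ;
  sh:property [
    sh:path sds:stream ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:nodeKind sh:IRI
  ] , [
    sh:path sds:payload ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] .

# The stream description points at the shape
ex:sampleStream a sds:Stream ;
  sds:carries sds:Record ;
  sds:shape ex:RecordShape .
```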

1.3. Overview of used ontologies

Prefix Namespace
dcat http://www.w3.org/ns/dcat#
sds https://w3id.org/sds#
sh http://www.w3.org/ns/shacl#
p-plan http://purl.org/net/p-plan#
prov http://www.w3.org/ns/prov#
rdfs http://www.w3.org/2000/01/rdf-schema#
tree https://w3id.org/tree#

2. Smart Data Stream vocabulary

The base URI for the Smart Data Specification’s vocabulary is https://w3id.org/sds# and the preferred prefix is sds:.

The vocabulary consists of two main classes: sds:Stream, a subclass of p-plan:Entity that represents a stream, and sds:Record, which contains the actual data.

2.1. SDS Description

A Stream (sds:Stream) extends Entity (p-plan:Entity) and contains the additional properties sds:dataset, sds:carries and sds:shape.

2.2. SDS Data

A Record (sds:Record) contains the following properties: sds:stream and sds:payload.

A Member (sds:Member) subclasses sds:Record and has the same properties. Only the range of sds:payload is different: for a Member it is an IRI.

A Member can also have the following property: sds:bucket.

Buckets (sds:Bucket) can be used to create a tree:Collection and contain the following properties: sds:relationType, sds:relationBucket, sds:relationValue and sds:relationPath.

2.3. Properties overview

Property Domain Range
sds:dataset sds:Stream dcat:Dataset
sds:carries sds:Stream rdfs:Class
sds:shape sds:Stream sh:NodeShape
sds:stream sds:Record sds:Stream
sds:payload sds:Record any
sds:payload sds:Member IRI
sds:bucket sds:Member sds:Bucket
sds:relationType sds:Bucket tree:Relation
sds:relationBucket sds:Bucket sds:Bucket
sds:relationValue sds:Bucket any
sds:relationPath sds:Bucket IRI
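
To make the table more concrete, the following sketch uses the stream, record and member properties once each. All ex: resources are hypothetical; the relation properties of sds:Bucket are left out of this sketch.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix sds:  <https://w3id.org/sds#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Stream-level properties
ex:stream a sds:Stream ;
  sds:dataset [ a dcat:Dataset ] ;
  sds:carries sds:Member ;
  sds:shape ex:Shape .

# A member on that stream: the payload is an IRI, the bucket is optional
[] a sds:Member ;
  sds:stream ex:stream ;
  sds:payload ex:item1 ;
  sds:bucket ex:bucket1 .

ex:bucket1 a sds:Bucket .
```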

3. Examples

This section is non-normative.

3.1. SDS Description

When creating an sds:Stream, the user should create a p-plan:Plan indicating the expected transformations on the data stream.

The example covers a plan that starts from a CSV file and transforms it to RDF with an RML Mapper. The generated RDF is bucketized based on the foaf:label property and is finally exposed as an LDES.

@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix p-plan: <http://purl.org/net/p-plan#> .
@prefix prov:   <http://www.w3.org/ns/prov#> .
@prefix sds:    <https://w3id.org/sds#> .
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
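# Prefixes used by the activity and data examples further below
@prefix void:   <http://rdfs.org/ns/void#> .
@prefix rml:    <http://semweb.mmlab.be/ns/rml#> .
@prefix ldes:   <https://w3id.org/ldes#> .
@prefix tree:   <https://w3id.org/tree#> .
@prefix ex:     <http://example.org/> .
@prefix :       <#> .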

<somePlan> a p-plan:Plan;
  rdfs:comment "An epic plan to map a CSV file to an LDES".


# Specify p-plan variables 

<csvLocationVar> a p-plan:Variable;
  p-plan:isVariableOfPlan <somePlan>;
  rdfs:comment "Location of the CSV file".
  
:csvLocation#1 rdfs:subClassOf <csvLocationVar>.

<rmlConfigVar> a p-plan:Variable;
  p-plan:isVariableOfPlan <somePlan>;
  rdfs:comment "Location of RML config file".

:rmlConfig#1 rdfs:subClassOf <rmlConfigVar>.

<bucketConfigVar> a p-plan:Variable;
  p-plan:isVariableOfPlan <somePlan>;
  rdfs:comment "Location of bucketizer config file".
  
:bucketConfig#1 rdfs:subClassOf <bucketConfigVar>.

<ldesServerConfigVar> a p-plan:Variable;
  p-plan:isVariableOfPlan <somePlan>;
  rdfs:subClassOf <LdesConfig>.
  
:ldesServerConfig#1 rdfs:subClassOf <ldesServerConfigVar>.

<streamVar> a p-plan:Variable;
  p-plan:isVariableOfPlan <somePlan>;
  rdfs:subClassOf <Channel>.

:stream#1 rdfs:subClassOf <streamVar>.
:stream#2 rdfs:subClassOf <streamVar>.
:stream#3 rdfs:subClassOf <streamVar>.


# Specify steps in the plan

<readStep> a p-plan:Step;
  p-plan:hasInputVar :csvLocation#1;
  p-plan:isStepOfPlan <somePlan>;
  p-plan:hasOutputVar :stream#1.

<rmlStep> a p-plan:Step;
  p-plan:hasInputVar :rmlConfig#1, :stream#1;
  p-plan:hasOutputVar :stream#2;
  p-plan:isPrecededBy <readStep>;
  p-plan:isStepOfPlan <somePlan>;
  rdfs:comment "Map CSV rows to RDF with RML".

<bucketStep> a p-plan:Step;
  p-plan:hasInputVar :bucketConfig#1, :stream#2;
  p-plan:hasOutputVar :stream#3;
  p-plan:isPrecededBy <rmlStep>;
  p-plan:isStepOfPlan <somePlan>;
  rdfs:comment "Add geospatial bucketization".

<ldesStep> a p-plan:Step;
  p-plan:hasInputVar :ldesServerConfig#1, :stream#3;
  p-plan:isPrecededBy <bucketStep>;
  p-plan:isStepOfPlan <somePlan>;
  rdfs:comment "Expose LDES".

Note: This p-plan:Plan does not explicitly describe how to connect these steps together; see the Connector Architecture for this.

Activities can be generated after running the steps of the plan. Each activity is linked to its corresponding p-plan:Step.

<readCsv> a p-plan:Activity;
  p-plan:correspondsToStep <readStep>;
  prov:used [
    a void:Dataset;
    void:dataDump <file:///data/input.csv>;
  ].

<csvStream> a sds:Stream;
    prov:wasGeneratedBy <readCsv>; 
    sds:carries sds:Record; 
    p-plan:correspondsToVariable :stream#1; 
    sds:dataset [ a  dcat:Dataset ]. 

<rmlProc> a p-plan:Activity; 
  p-plan:correspondsToStep <rmlStep>; 
  prov:used <csvStream>, [
      rml:Location "somewhere";
      p-plan:correspondsToVariable :rmlConfig#1;
  ]; 
  prov:startedAtTime "1650886052".  

<rmlStream> a sds:Stream;
  prov:wasGeneratedBy <rmlProc>;
  sds:carries sds:Member; 
  sds:shape <sh>;
  p-plan:correspondsToVariable :stream#2;
  sds:dataset [ a dcat:Dataset ].

<bucketization> a p-plan:Activity;
  p-plan:correspondsToStep <bucketStep>;
  prov:used <rmlStream>, [
      ldes:bucketType ldes:subject;
      ldes:propertyPath ex:x;
  ];
  prov:startedAtTime "1650889052".

<bucketizedStream> a sds:Stream;
    prov:wasGeneratedBy <bucketization>;
    sds:carries sds:Member;
    sds:shape <sh>;
    p-plan:correspondsToVariable :stream#3;
    sds:dataset [ a dcat:Dataset ].

<ldesServer> a p-plan:Activity;
  p-plan:correspondsToStep <ldesStep>;
  prov:used <bucketizedStream>, [
    ldes:view "abc.com/epicLDES";
    p-plan:correspondsToVariable :ldesServerConfig#1
  ];
  prov:startedAtTime "1650889452".

3.2. SDS Data

[] sds:payload "a,v,d"^^ex:csvRow;
   sds:stream <csvStream>.


ex:sample1 ex:x 1;
           ex:y 2. 

# The rmlStream contains an sds:Member whose payload is ex:sample1
[] sds:payload ex:sample1;
   sds:stream <rmlStream>.


<bucket1> sds:relation [
  sds:relationType tree:GreaterThanRelation ;
  sds:relationBucket <bucket2> ;
  sds:relationValue 1;
  sds:relationPath ex:x 
] .

# The bucketizedStream also contains this member ex:sample1,
# now with an added sds:bucket property pointing to <bucket2>
[] sds:stream <bucketizedStream>;
   sds:payload ex:sample1;
   sds:bucket <bucket2>.

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119