Abstract

In this paper we describe cLODg2 (conference Linked Open Data generator - version 2), a tool to collect, refine and produce Linked Data about scientific conferences with their associated publications, participants and events. Conference metadata collected from different unstructured and semi-structured resources must be expressed with appropriate vocabularies to be exposed as Linked Data. cLODg2 facilitates this task by providing a one-click workflow to generate data that is ready to be integrated into the ScholarlyData.org dataset. cLODg2 is an open source project that aims to foster the publication of scholarly Linked Open Data and to encourage collaborative efforts in this direction between researchers and publishers.

Introduction

ScholarlyData is the evolution of the Semantic Web Dog Food (SWDF) dataset. The SWDF corpus was the first considerable effort to offer comprehensive semantic descriptions of conference events, collecting Linked Data about papers, people, organizations, and events related to academic conferences.

A comprehensive description of ScholarlyData is available separately (see References); in this paper we provide technical details about cLODg2, the open source tool that supports data generation for ScholarlyData. cLODg2 (conference Linked Open Data generator - version 2) provides a one-click process for the conference metadata publication workflow. cLODg2 has been used to refactor the SWDF dataset and to gather and publish new conference metadata. The tool provides an easy process to generate Linked Data that can be directly added to the ScholarlyData dataset.

cLODg2 - publishing Conference Semantic Data

The main goal of cLODg2 is to facilitate the generation of conference Linked Data that can be readily integrated into the ScholarlyData dataset. ScholarlyData is the evolution of the SWDF dataset and is based on the Conference Ontology, an improvement of the Semantic Web Conference (SWC) Ontology that adopts best ontology design practices. The necessary steps to add conference data to ScholarlyData are: (i) data acquisition, (ii) Linked Data generation, (iii) Linked Data enrichment and (iv) Linked Data publication.

The Data acquisition step, to be done by the user, consists of acquiring metadata about the conference, generally exported from a conference management system. We currently support data acquisition from CSV files. Additionally, Linked Data represented with the SWC ontology can be used as initial input.
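As an illustration of the kind of input this step yields, the following minimal Python sketch loads a CSV export into paper records. It is not part of cLODg2: the column names (`paper_id`, `title`, `authors`), the semicolon-separated author convention and the `load_papers` helper are all assumptions made for the example, and real conference management exports will differ.

```python
import csv
import io

# Hypothetical column names; real conference-management exports will differ.
REQUIRED_COLUMNS = {"paper_id", "title", "authors"}


def load_papers(csv_text):
    """Parse CSV text into a list of paper records, checking required columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    papers = []
    for row in reader:
        papers.append({
            "paper_id": row["paper_id"].strip(),
            "title": row["title"].strip(),
            # Assumed convention: author names separated by semicolons.
            "authors": [a.strip() for a in row["authors"].split(";") if a.strip()],
        })
    return papers


# Invented sample data, used only to exercise the sketch.
SAMPLE = "paper_id,title,authors\n12,An Example Paper,Ada Lovelace; Alan Turing\n"

if __name__ == "__main__":
    for paper in load_papers(SAMPLE):
        print(paper)
```

Checking the header before reading any rows makes a malformed export fail early, before any Linked Data is generated from it.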

Starting from the provided input, cLODg2 performs two sequential steps: Linked Data generation and Linked Data enrichment. The accompanying figure shows the system architecture, including all accessed services and technologies, modelled as a UML activity diagram. The initialisation step merely consists of configuring a property file to point to (i) the collected CSV files containing the input data and (ii) the D2RQ mapping that will be used to convert the CSV files to RDF. A D2RQ mapping for EasyChair data is provided by default, but expert users can change it to import ad hoc CSV files.
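The initialisation and conversion described above can be pictured with a small Python sketch. It is a stand-in under stated assumptions, not the tool's implementation: cLODg2 delegates the conversion to a D2RQ mapping, whereas this example emits N-Triples by hand. The property keys (`csv.dir`, `d2rq.mapping`), the file paths, the `http://example.org/conf/` base URI and the helper functions are hypothetical; only the `rdfs:label` property URI is a real vocabulary term.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical property file content; the actual keys used by cLODg2 may differ.
EXAMPLE_PROPERTIES = """
# input locations
csv.dir = ./csv
d2rq.mapping = ./mapping/easychair.ttl
"""


def parse_properties(text):
    """Parse simple Java-style `key=value` lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def escape_literal(value):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def paper_to_ntriples(paper_id, title, base_uri):
    """Return N-Triples lines describing one paper (label only, for illustration)."""
    subject = f"<{base_uri}paper/{paper_id}>"
    return [f'{subject} <{RDFS_LABEL}> "{escape_literal(title)}" .']


if __name__ == "__main__":
    print(parse_properties(EXAMPLE_PROPERTIES))
    for triple in paper_to_ntriples("12", "An Example Paper", "http://example.org/conf/"):
        print(triple)
```

Keeping the input locations in a property file, as the paper describes, means the same conversion can be pointed at a different export or mapping without code changes.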

Figure: cLODg2 architecture represented as a UML activity diagram.

The Linked Data generation activity is composed of the following steps:

The Linked Data enrichment activity is composed of the following actions:

The Linked Data publication step, which is the last action in the cLODg2 workflow, has to be done by the user and consists of submitting the produced data to ScholarlyData.org.

Conclusions

This paper describes cLODg2, a tool to collect, refine and produce Linked Data describing scientific conferences and their publications, participants and events. The main contribution of this work is an open source tool that supports the production of conference and scholarly metadata that is ready to be integrated into the ScholarlyData dataset with minimal user effort. Future work will mainly focus on addressing data quality and reducing duplicates and misspellings in the data.

References

  1. A. G. Nuzzolese, A. L. Gentile, V. Presutti, and A. Gangemi. Conference Linked Data: Our Web Dog Food has gone gourmet. In Proc. of ISWC2016 Resource Track, page to appear, 2016.

  2. A. G. Nuzzolese, A. L. Gentile, V. Presutti, and A. Gangemi. Semantic web conference ontology - a refactoring solution. In The Semantic Web: ESWC 2016 Satellite Events, page to appear. Springer, 2016.

  3. K. Möller, T. Heath, S. Handschuh, and J. Domingue. Recipes for semantic web dog food: The ESWC and ISWC metadata projects. In Proc. of ISWC’07/ASWC’07, pages 802–815, Berlin, Heidelberg, 2007. Springer.

  4. C. Bizer and R. Cyganiak. D2R Server - Publishing Relational Databases on the Semantic Web. In Proc. of ISWC2006 Poster&Demo, 2006.

  5. C. Bizer and A. Seaborne. D2RQ - Treating Non-RDF Databases as Virtual RDF Graphs. In Proc. of ISWC2004 posters, 2004.

SWDF: http://data.semanticweb.org

https://github.com/anuzzolese/cLODg2

Among others, it has been used for the ESWC conference since 2014: http://2016.eswc-conferences.org

http://w3id.org/scholarlydata

http://data.semanticweb.org/ns/swc/swc_2009-05-09.html

Refer to http://w3id.org/scholarlydata/ontology/conference-ontology.owl to obtain the OWL source code and to http://goo.gl/4lOHSk to obtain the HTML documentation of the Conference Ontology.

A simplified example of such data, exported from http://www.easychair.org, can be found at https://github.com/anuzzolese/cLODg2/tree/master/csv_samples.

Example dump at https://github.com/AnLiGentile/cLODg/tree/master/resources/swdf_samples.

http://hsqldb.org.

http://www.sparontologies.net.

http://www.ontologydesignpatterns.org/ont/dul/d0.owl.

https://www.w3.org/TR/vocab-org.

http://purl.org/co.

http://orcid.org.

https://www.doi.org.

http://members.orcid.org/api/introduction-orcid-public-api.

http://www.crossref.org/guestquery.