ORCA

Production-ready project (January 2019 - Ongoing)

ORCA is a Crawler Analysis Benchmark for Data Web Crawlers

About the project

ORCA is a benchmark for Data Web crawlers, i.e., crawlers that focus on gathering structured data. The main idea of ORCA is to generate a synthetic Data Web for which the ground truth is known, so that the data a crawler collects can be compared against what was actually published. It is based on the HOBBIT benchmarking platform and supports distributed crawler implementations.
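To illustrate why a known ground truth is useful, the following is a minimal sketch of how crawled data could be scored against it. It assumes that the score is recall over RDF triples, i.e., the fraction of ground-truth triples that the crawler returned. The function name `recall`, the `Triple` type, and the example triples are hypothetical and are not taken from the ORCA code base.

```python
from typing import Set, Tuple

# A triple is modelled as (subject, predicate, object) strings.
# This representation is an assumption made for illustration only.
Triple = Tuple[str, str, str]


def recall(ground_truth: Set[Triple], crawled: Set[Triple]) -> float:
    """Return the fraction of ground-truth triples found by the crawler."""
    if not ground_truth:
        # Nothing was published, so nothing could have been missed.
        return 1.0
    return len(ground_truth & crawled) / len(ground_truth)


# Hypothetical synthetic Data Web: four triples published across servers.
ground_truth = {
    ("http://example.org/a", "http://example.org/linksTo", "http://example.org/b"),
    ("http://example.org/b", "http://example.org/linksTo", "http://example.org/c"),
    ("http://example.org/c", "http://example.org/linksTo", "http://example.org/d"),
    ("http://example.org/d", "http://example.org/label", "node d"),
}

# Hypothetical crawler output: three correct triples and one unrelated triple.
crawled = {
    ("http://example.org/a", "http://example.org/linksTo", "http://example.org/b"),
    ("http://example.org/b", "http://example.org/linksTo", "http://example.org/c"),
    ("http://example.org/c", "http://example.org/linksTo", "http://example.org/d"),
    ("http://example.org/x", "http://example.org/label", "not in ground truth"),
}

score = recall(ground_truth, crawled)
print(f"recall = {score:.2f}")
```

Because the benchmark generates the synthetic Data Web itself, the set of published triples is fully known, which is what makes a comparison like this possible. On the real Web it generally is not.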

Available Adapters for Crawlers

Server Types

  • Dump file
  • Dereferencing Server
  • HTML Webserver with embedded RDFa
  • SPARQL endpoint
  • CKAN

All server types are accessed over HTTP.

Publications

Proceedings of the 15th IEEE International Conference on Semantic Computing (ICSC), 2021

ORCA -- a Benchmark for Data Web Crawlers

By Michael Röder, Geraldo de Souza Jr., Denis Kuchelev, Abdelmoneim Amer Desouki, Axel-Cyrille Ngonga Ngomo