Production-ready project (January 2019 - Ongoing)
ORCA is a Crawler Analysis Benchmark for Data Web Crawlers
ORCA is a benchmark for Data Web crawlers, i.e., crawlers that focus on gathering structured data. The main idea of ORCA is to generate a synthetic Data Web for which the ground truth is known. It is based on the HOBBIT benchmarking platform and supports distributed crawler implementations.
All servers in the synthetic Data Web communicate over HTTP.
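Because the ground truth of the synthetic Data Web is known, a crawler's output can be scored directly against it. The following is a minimal sketch of that idea, not ORCA's actual implementation: it assumes both the ground truth and the crawled data are available as sets of (subject, predicate, object) triples, and it computes the fraction of ground-truth triples the crawler retrieved. The function name `recall` and the `example.org` URIs are hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple

# A triple is modelled as a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


def recall(ground_truth: Set[Triple], crawled: Set[Triple]) -> float:
    """Return the fraction of ground-truth triples found in the crawled set."""
    if not ground_truth:
        # Assumption: an empty ground truth yields a score of 0.0.
        return 0.0
    return len(ground_truth & crawled) / len(ground_truth)


# Hypothetical ground truth of a tiny synthetic Data Web.
ground_truth: Set[Triple] = {
    ("http://example.org/a", "http://example.org/linksTo", "http://example.org/b"),
    ("http://example.org/b", "http://example.org/linksTo", "http://example.org/c"),
    ("http://example.org/c", "http://example.org/linksTo", "http://example.org/a"),
    ("http://example.org/c", "http://example.org/label", "node c"),
}

# Hypothetical crawler output: two correct triples plus one unrelated triple.
crawled: Set[Triple] = {
    ("http://example.org/a", "http://example.org/linksTo", "http://example.org/b"),
    ("http://example.org/b", "http://example.org/linksTo", "http://example.org/c"),
    ("http://example.org/x", "http://example.org/label", "not in ground truth"),
}

score = recall(ground_truth, crawled)
print(f"recall = {score:.2f}")
```

Triples that the crawler returns but that are not part of the ground truth do not affect this score; a stricter evaluation could track them separately.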
Proceedings of the 15th IEEE International Conference on Semantic Computing (ICSC), 2021