# Knowledge Graph Construction with R2RML and RML
We evaluate knowledge graph construction (KGC) engines, covering several R2RML and RML processors, to identify their strengths and weaknesses. We (i) perform a qualitative analysis of the distinctive features of each engine, (ii) examine their conformance with the specification of the mapping language they support, and (iii) assess their performance and scalability using the GTFS-Madrid-Bench benchmark.

**Citing**:

```bib
@inproceedings{arenas2021knowledge,
  title = {{Knowledge Graph Construction with R2RML and RML: An ETL System-based Overview}},
  author = {Arenas-Guerrero, Julián and Scrocca, Mario and Iglesias-Molina, Ana and Toledo, Jhon and Pozo-Gilo, Luis and Doña, Daniel and Corcho, Oscar and Chaves-Fraga, David},
  booktitle = {Proceedings of the 2nd International Workshop on Knowledge Graph Construction},
  year = {2021},
  series = {CEUR Workshop Proceedings},
  publisher = {CEUR-WS.org},
  volume = {2873},
  url = {http://ceur-ws.org/Vol-2873/paper11.pdf},
}
```

## Engines
We test the performance and scalability of a set of KG construction engines:

R2RML-based:
- Ontop v4.1.0
- Morph-RDB v3.12.5
- R2RML-F v1.2.3
- db2triples v2.2

RML-based:
- RMLMapper v4.9.1
- CARML v0.3.2
- RocketRML v1.8.2
- SDM-RDFizer v3.5
- RMLStreamer v2.0
- Chimera v2.1

## Evaluation resources

### GTFS-Madrid-Bench
We use [GTFS-Madrid-Bench](https://github.com/oeg-upm/gtfs-bench) to generate the following distributions of its input dataset for testing the engines:

- Formats: CSV, XML, JSON, RDB and Random-Custom (sources in different formats)
- Scale-Sizes: 1, 10, 100 and 1000

The data can be downloaded directly by running `bash scripts/download-data.sh`.
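
As a rough illustration of the size of the test matrix, the sketch below enumerates the format and scale combinations listed above. It assumes that every format is generated at every scale size, which this README does not state explicitly; the `distributions` helper and the `format-scale` labels are hypothetical and are not part of the benchmark's tooling.

```python
from itertools import product

# Formats and scale sizes as listed in this README.
FORMATS = ["CSV", "XML", "JSON", "RDB", "Random-Custom"]
SCALES = [1, 10, 100, 1000]


def distributions(formats=FORMATS, scales=SCALES):
    """Return every (format, scale) pair.

    Assumption: the full cross product is evaluated. Check the raw
    results to see which combinations were actually run.
    """
    return list(product(formats, scales))


if __name__ == "__main__":
    # Hypothetical labels, only to make the matrix easy to read.
    for fmt, scale in distributions():
        print(f"{fmt}-{scale}")
```

Under that assumption the matrix has 5 formats times 4 scale sizes, that is, 20 distributions per engine.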

### R2RML and RML test-cases
We use the resources provided by the [W3C Community Group on Knowledge Graph Construction](https://www.w3.org/community/kg-construct/) to run the R2RML and RML test cases against the selected engines.

## Results
We created a comparative framework to gather and compare information about the engines' features, available [here](https://github.com/oeg-upm/kgc-eval/tree/master/results/table). We also ran the engines on the benchmark described above and measured their execution time and memory usage. The raw data from the evaluation is stored [here](https://github.com/oeg-upm/kgc-eval/tree/master/results/raw-data), and the resulting figures can be seen [here](https://github.com/oeg-upm/kgc-eval/tree/master/results/figures).
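
For readers who want to take comparable measurements themselves, the sketch below shows one generic way to record wall-clock time and peak memory for a command on a Unix-like system. It is not the harness used in this evaluation: the `measure` helper is hypothetical, and the command in the usage example is a placeholder rather than an actual engine invocation.

```python
import resource
import subprocess
import sys
import time


def measure(command):
    """Run `command` and return (wall_seconds, peak_rss).

    Assumptions: Unix-only (the `resource` module is unavailable on
    Windows). `ru_maxrss` is reported in kilobytes on Linux and in
    bytes on macOS, and it is the largest peak among all child
    processes waited for so far, so use one Python process per
    measured run to keep readings independent.
    """
    start = time.perf_counter()
    subprocess.run(command, check=True)
    elapsed = time.perf_counter() - start
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak_rss


if __name__ == "__main__":
    # Placeholder command; substitute the engine invocation to measure.
    seconds, peak = measure([sys.executable, "-c", "pass"])
    print(f"wall time: {seconds:.3f} s, peak RSS: {peak}")
```

Containerised engines or engines that spawn long-lived services would need a different approach, since their memory is not attributed to the child process measured here.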

## Authors
- Julián Arenas-Guerrero - julian.arenas.guerrero@upm.es (Ontology Engineering Group - UPM)
- Mario Scrocca (Cefriel - Politecnico di Milano)
- David Chaves-Fraga (Ontology Engineering Group - UPM)
- Jhon Toledo (Ontology Engineering Group - UPM) 
- Daniel Doña (Ontology Engineering Group - UPM)
- Luis Pozo-Gilo (Ontology Engineering Group - UPM)
- Ana Iglesias (Ontology Engineering Group - UPM)