In this post, we will present some preliminary results of the survey we ran for the past few months.
At the time of writing, there were 61 survey participants. These were evenly divided among the different profiles, namely solution providers, technology users, and the scientific community. Most participants had two or more of these profiles.
Participants were also fairly evenly divided among the different Linked Data areas, namely storage/querying, interlinking, classification/enrichment, discovery, extraction and reasoning. The results did show, however, that most of the solution providers were active in the storage/querying area.
Other reported areas include reporting/visualization and inconsistency detection.
The main KPIs that were identified are:
- Accuracy (precision, recall, F-measure, mean reciprocal rank)
- Runtime / Speed
- Memory usage
- CPU usage
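To make the accuracy-style KPIs above concrete, here is a minimal sketch of how they are typically computed. This is purely illustrative and not taken from any participant's benchmark; the function names and example data are our own.

```python
# Illustrative sketch of the accuracy KPIs named above:
# precision, recall, F-measure, and mean reciprocal rank (MRR).

def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall, and F-measure over sets of item IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # true positives
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def mean_reciprocal_rank(rankings, relevant_per_query):
    """MRR: average of 1/rank of the first relevant hit for each query."""
    total = 0.0
    for ranking, relevant in zip(rankings, relevant_per_query):
        for rank, item in enumerate(ranking, start=1):
            if item in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)

# Example: a system retrieves {a, b, c}; the gold standard is {b, c, d}.
p, r, f = precision_recall_f1(["a", "b", "c"], ["b", "c", "d"])
# For MRR: query 1's first relevant hit is at rank 2, query 2's at rank 1.
mrr = mean_reciprocal_rank([["x", "a"], ["b"]], [{"a"}, {"b"}])
```

Runtime, memory, and CPU usage, by contrast, are usually measured by the benchmarking harness itself rather than derived from the system's output.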
52% of the participants already use a benchmark to evaluate their own or others' software. 74% of this benchmarking is done in the area of storage/querying.
22% use purely synthetic data when benchmarking, 46% use real data, and the remaining 32% use a combination of both.
66% of these datasets or generators are public or can be made public.
Regarding dataset sizes, the figure below shows that most of the datasets used are at least on the order of one million triples.
Our survey remains open, so if you notice that your own requirements are not covered by the current results, make sure to let us know!