Performance benchmarks
The following results were obtained by benchmarking Jelly-JVM against serializations built into Apache Jena (including binary formats).
The benchmarks were performed on two kinds of RDF streams (according to the RDF-STaX taxonomy):
- Flat RDF streams – streams of RDF triples or quads. This is the "classic" serialization – equivalent to, for example, N-Triples or N-Quads.
- Grouped RDF streams – streams of RDF graphs or datasets.
Jelly has a major performance advantage especially in grouped RDF streams. This is mostly due to Jelly being the only tested serialization that natively supports grouped RDF streams. Because of this, Jelly can exploit the repeating terms, prefixes, and structures in the stream to achieve much better compression and serialization speed.
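To illustrate why state shared across stream elements helps, here is a deliberately simplified toy model. It is not Jelly's actual encoding or wire format: the class name, the fixed 4-byte term reference, and the representation of an element as a list of term strings are all assumptions made for the sketch. It compares the byte count when every element spells out its terms in full against the byte count when a dictionary survives from one element to the next.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: this is NOT Jelly's wire format.
final class SharedDictionarySketch {
    private static final int ID_BYTES = 4; // assumed fixed-width term reference

    /** Size in bytes when every stream element spells out each term in full. */
    static long sizeWithoutSharedState(List<List<String>> elements) {
        long total = 0;
        for (List<String> element : elements) {
            for (String term : element) {
                total += term.getBytes(StandardCharsets.UTF_8).length;
            }
        }
        return total;
    }

    /** Size in bytes when a term is spelled out once and later referenced by ID. */
    static long sizeWithSharedDictionary(List<List<String>> elements) {
        // The dictionary is created once and kept across all stream elements.
        Map<String, Integer> dictionary = new HashMap<>();
        long total = 0;
        for (List<String> element : elements) {
            for (String term : element) {
                if (dictionary.containsKey(term)) {
                    total += ID_BYTES; // repeated term: reference only
                } else {
                    dictionary.put(term, dictionary.size());
                    total += ID_BYTES + term.getBytes(StandardCharsets.UTF_8).length;
                }
            }
        }
        return total;
    }
}
```

In this model, the more often the same IRIs recur across elements, the larger the gap between the two totals. A format that treats each element as an independent document has to start from an empty dictionary every time.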
If you are only interested in parsing/writing a single graph or dataset, look at the flat streaming results.
Benchmark setup
All benchmarks presented here were performed using the RiverBench benchmark suite, version 2.1.0. Of the 13 datasets used (all datasets available in RiverBench 2.1.0), 1 used RDF-star, and 3 included RDF quads/datasets. Links to the specific RiverBench profiles and tasks are given in the results below.
The benchmarks were executed using this code (Apache 2.0 license) in a JVM with options: -Xms1G -Xmx32G. The large heap size was necessary to fit the benchmark data in memory, making the benchmark independent of disk I/O.
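When reproducing a setup like this, it can be useful to confirm that the heap options actually took effect. The following is a minimal sketch using only standard JDK calls. The class name is hypothetical and nothing here is taken from the benchmark code.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sanity check, not part of the benchmark code.
final class HeapSettingsCheck {
    /** Maximum heap the JVM will attempt to use, in GiB. */
    static double maxHeapGiB() {
        return Runtime.getRuntime().maxMemory() / (1024.0 * 1024.0 * 1024.0);
    }

    /** Heap-related flags (-Xms..., -Xmx...) the JVM was started with. */
    static List<String> heapFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments().stream()
                .filter(arg -> arg.startsWith("-Xms") || arg.startsWith("-Xmx"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.printf("Max heap: %.1f GiB, heap flags: %s%n", maxHeapGiB(), heapFlags());
    }
}
```

Started with -Xms1G -Xmx32G, the reported maximum should be close to 32 GiB. The exact figure can differ slightly depending on the garbage collector in use.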
Hardware: AMD Ryzen 9 7900 (12-core, 24-thread, 5.0 GHz); 64 GB RAM (DDR5 5600 MT/s). The disk was not used during the benchmarks (all data was in memory). The throughput benchmarks are single-threaded, but the JVM was allowed to use all available cores for garbage collection, JIT compilation, and other tasks.
Software: Linux kernel 6.10.11, Oracle GraalVM 23.0.1+11.1, Apache Jena 5.2.0, Eclipse RDF4J 5.0.2, Jelly-JVM 2.2.2.
Tested methods
- W3C RDF/XML (Apache Jena 5.2.0, RDFXML_PLAIN)
- W3C N-Triples / N-Quads (Apache Jena 5.2.0, NTRIPLES and NQUADS)
- W3C JSON-LD (Apache Jena 5.2.0, JSONLD_PLAIN)
- W3C Turtle / TriG (Apache Jena 5.2.0)
    - In grouped streaming, the default (TURTLE_PRETTY and TRIG_PRETTY) Turtle/TriG variant was used.
    - In flat streaming, the TURTLE_BLOCKS and TRIG_BLOCKS variant was used. See Jena's documentation on streaming writers for more details.
- Jena's RDF binary Protobuf format (Apache Jena 5.2.0, RDF_PROTO)
- Jena's RDF binary Thrift format (Apache Jena 5.2.0, RDF_THRIFT)
- RDF4J Binary RDF Format (Eclipse RDF4J 5.0.2, BINARY)
    - Note: to avoid confusion, on this page we only show the performance results for Apache Jena. The results for RDF4J can be found here: RDF4J performance.
- Jelly (Jelly-JVM 2.2.2, "big" preset)
- Jelly without prefix compression (Jelly-JVM 2.2.2, "big" preset with prefix table disabled)
Results
RDF4J performance results
This page only shows the performance results for Apache Jena. The results for RDF4J can be found here: RDF4J performance.
Warning
The results below were averaged over all datasets used in the benchmarks. For RDF/XML and JSON-LD the results are incomplete due to missing support for some datasets. For them, only the datasets that were successfully processed are included in the averages.
RDF/XML failed on 5 out of 13 datasets due to lack of support for RDF datasets (assist-iot-weather-graphs, citypulse-graphs, nanopubs), RDF-star (yago-annotated-facts), and no support for encoding ASCII control characters (politiquices – see RiverBench documentation for more details).
JSON-LD failed on 1 out of 13 datasets due to lack of support for RDF-star (yago-annotated-facts).
Grouped streaming serialized size
- RiverBench task: stream-compression (2.1.0)
- RiverBench profile: stream-mixed-rdfstar (2.1.0)
- The entire (full-length) datasets were used for this benchmark.
- The data was serialized to a byte-counting output stream and then discarded.
Jelly has a major advantage here (~2x smaller than the next best format, RDF4J Binary), but that is because it is the only format that natively supports grouped RDF streams. The other formats cannot exploit the repeating patterns between elements in the stream, leading to much larger sizes.
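The size measurement described above, writing to a stream that counts bytes and then throws them away, can be sketched as follows. This is an illustrative stand-in written against the standard java.io API only. The class name is hypothetical and the benchmark code may implement it differently.

```java
import java.io.OutputStream;
import java.util.Objects;

/** Counts the bytes written to it and discards them (hypothetical helper). */
final class ByteCountingOutputStream extends OutputStream {
    private long count = 0;

    @Override
    public void write(int b) {
        count++; // a single byte was written
    }

    @Override
    public void write(byte[] buffer, int offset, int length) {
        Objects.checkFromIndexSize(offset, length, buffer.length);
        count += length; // bulk write: count without copying anything
    }

    /** Total number of bytes written so far. */
    long bytesWritten() {
        return count;
    }
}
```

Any writer that accepts a java.io.OutputStream can be pointed at such a stream. After the whole dataset has been written, bytesWritten() gives the serialized size without the output ever being held in memory or written to disk.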
Flat streaming serialized size
- RiverBench task: flat-compression (2.1.0)
- RiverBench profile: flat-mixed-rdfstar (2.1.0)
- The entire (full-length) datasets were used for this benchmark.
- The data was serialized to a byte-counting output stream and then discarded.
In flat streaming, the compression ratios for Jelly are almost identical to those in grouped streaming. This is also a case where RDF4J Binary has a comparable result to Jelly – it does well when serializing large batches of RDF triples/quads, because it maintains a large buffer of statements in the serializer. This means RDF4J is optimized for throughput, not latency, whereas Jelly can achieve both at the same time. Jena's binary formats have no built-in compression, and are thus much larger.
Flat streaming serialization throughput
- RiverBench task: flat-serialization-throughput (2.1.0)
- RiverBench profile: flat-mixed-rdfstar (2.1.0)
- The first 5,000,000 statements of each dataset were used for this benchmark.
- Each method/dataset combination was run 15 times, the first 5 runs were discarded to account for JVM warmup, and the remaining 10 runs were averaged.
- The data was preloaded into memory and serialized to a null output stream.
When reviewing these results, it's important to remember that Jelly is faster than Jena's binary formats, while also being much more compact – see serialized size results above.
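The measurement protocol used for the throughput benchmarks (15 runs, the first 5 discarded as warmup, the remaining 10 averaged) can be sketched roughly as below. This is a hypothetical helper, not the benchmark's own harness. The class and method names are assumptions, and the workload in main is a stand-in that writes a dummy buffer to a null output stream rather than serializing RDF.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not taken from the benchmark code.
final class WarmupAveraging {
    /** Runs the task totalRuns times and averages the timings after the warmup runs (in ns). */
    static double averageNanos(Runnable task, int totalRuns, int warmupRuns) {
        if (warmupRuns < 0 || warmupRuns >= totalRuns) {
            throw new IllegalArgumentException("need 0 <= warmupRuns < totalRuns");
        }
        long measuredNanos = 0;
        for (int run = 0; run < totalRuns; run++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            if (run >= warmupRuns) {
                measuredNanos += elapsed; // warmup timings are dropped
            }
        }
        return (double) measuredNanos / (totalRuns - warmupRuns);
    }

    /** Converts an average run time into statements per second. */
    static double statementsPerSecond(long statements, double nanosPerRun) {
        return statements / (nanosPerRun / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        byte[] payload = new byte[1 << 20]; // stand-in workload, not real RDF data
        OutputStream sink = OutputStream.nullOutputStream(); // discards everything
        double avgNanos = averageNanos(() -> {
            try {
                sink.write(payload);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, 15, 5);
        System.out.printf("average over 10 measured runs: %.0f ns%n", avgNanos);
    }
}
```

Discarding the first runs keeps class loading and JIT compilation out of the averages. Writing to a null output stream means the timing reflects the serializer rather than the sink.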
Flat streaming deserialization throughput
- RiverBench task: flat-deserialization-throughput (2.1.0)
- RiverBench profile: flat-mixed-rdfstar (2.1.0)
- The first 5,000,000 statements of each dataset were used for this benchmark.
- Each method/dataset combination was run 15 times, the first 5 runs were discarded to account for JVM warmup, and the remaining 10 runs were averaged.
- Before running the benchmark, the data was serialized to a single byte array; the benchmark then deserialized from that array. The deserializer only emitted a stream of triples/quads, without any further processing.
Grouped streaming serialization throughput
- RiverBench task: stream-serialization-throughput (2.1.0)
- RiverBench profile: stream-mixed-rdfstar (2.1.0)
- The first 100,000 stream elements of each dataset were used for this benchmark.
- Each method/dataset combination was run 15 times, the first 5 runs were discarded to account for JVM warmup, and the remaining 10 runs were averaged.
- The data was preloaded into memory and serialized to a null output stream.
Grouped streaming deserialization throughput
- RiverBench task: stream-deserialization-throughput (2.1.0)
- RiverBench profile: stream-mixed-rdfstar (2.1.0)
- The first 100,000 stream elements of each dataset were used for this benchmark.
- Each method/dataset combination was run 15 times, the first 5 runs were discarded to account for JVM warmup, and the remaining 10 runs were averaged.
- Before running the benchmark, the data was serialized to a list of byte arrays (one array per stream element); the benchmark then deserialized from that list. The deserializer only emitted a stream of triples/quads, without any further processing.
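The "one byte array per stream element" arrangement described above can be sketched as follows. The codec here is a trivial stand-in (newline-separated UTF-8 statement strings, assumed to contain no raw newlines) used only to show the shape of the setup. It is not any of the tested formats, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in codec, not one of the benchmarked serializations.
final class GroupedFramesSketch {
    /** Encodes one stream element (a group of statements) into one byte array. */
    static byte[] encodeElement(List<String> statements) {
        return String.join("\n", statements).getBytes(StandardCharsets.UTF_8);
    }

    /** Done once, before timing starts: one frame per stream element. */
    static List<byte[]> preSerialize(List<List<String>> elements) {
        List<byte[]> frames = new ArrayList<>(elements.size());
        for (List<String> element : elements) {
            frames.add(encodeElement(element));
        }
        return frames;
    }

    /** The part that would be timed: decode every frame and emit its statements. */
    static long deserializeAll(List<byte[]> frames) {
        long statements = 0;
        for (byte[] frame : frames) {
            String text = new String(frame, StandardCharsets.UTF_8);
            if (!text.isEmpty()) {
                // "Emitting" is reduced to counting; no further processing is done.
                statements += text.split("\n").length;
            }
        }
        return statements;
    }
}
```

Keeping the frames in memory as separate arrays preserves the element boundaries of the grouped stream while keeping disk and network I/O out of the measurement.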
See also
- Benchmarks with RDF4J
- Benchmark code
- RiverBench benchmark suite
- Jelly-JVM – the Jelly implementation used in the benchmarks
- User guide