Jelly user guide

Jelly is a high-performance protocol for streaming and non-streaming RDF data. It is designed to be simple, fast, and easy to implement. This guide will help you get started with Jelly.

Jelly uses Protocol Buffers 3 as the basis of its serialization. This means that you can quickly create a new Jelly implementation using code generation. You can also use an existing implementation, such as the JVM (Scala) implementation.

What can it do?

Jelly is designed to be a protocol for streaming RDF data, but it can also be used with "classic", static RDF data. The main design goals of Jelly are speed, simplicity, and wide coverage of use cases.

- Jelly can work with any RDF data, including RDF-star, RDF 1.1, and generalized RDF.
- Jelly can be used to represent streams of triples, quads, graphs, or datasets.
- Jelly can also be used to represent a single graph or dataset.
- Jelly can be used for streaming data over the network (e.g., with MQTT, Kafka, gRPC), but also for working with flat files.
- Jelly can compress RDF data on the fly, without having to know the data in advance.
- Jelly is super-fast and lightweight, scaling both down to embedded devices and up to high-performance servers.

How to use it?

To use Jelly, you first need an implementation of the protocol. There is currently one implementation available: Jelly JVM (Scala), which supports both Apache Jena and Eclipse RDF4J. It also has support for reactive streams and gRPC.

The implementation supports several stream types and patterns. Which stream type you choose depends on your use case (see stream types below).

All stream types use the same concept of stream frames: discrete elements into which the stream is divided. Each frame contains a number of rows, which are the actual RDF data (RDF triples, quads, etc.). Jelly does not enforce the semantics of stream frames, although it does have a mechanism to suggest to consumers and producers how they should understand the stream. Still, you can interpret the stream however you like.
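
As a rough mental model, a stream can be pictured as a sequence of frames, each holding an ordered list of rows. The sketch below illustrates two ways a consumer might read the same frames. The names `Row`, `Frame`, `iter_rows`, and `iter_elements` are hypothetical and are not the generated Protobuf classes; in the real protocol, rows also carry things like stream options.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple

# Hypothetical, simplified model: a row is just an RDF statement here.
# A triple has 3 terms, a quad has 4 (the last one being the graph name).
Row = Tuple[str, ...]


@dataclass
class Frame:
    """One discrete element of the stream, holding any number of rows."""
    rows: List[Row] = field(default_factory=list)


def iter_rows(frames: Iterable[Frame]) -> Iterator[Row]:
    """Ignore frame boundaries and read the stream as a flat sequence of rows."""
    for frame in frames:
        yield from frame.rows


def iter_elements(frames: Iterable[Frame]) -> Iterator[List[Row]]:
    """Alternative reading: treat every frame as one logical element."""
    for frame in frames:
        yield list(frame.rows)
```

Both functions consume identical input; which one is appropriate depends only on how the consumer chooses to interpret the frames.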

Why doesn't Jelly enforce the semantics of stream frames?

There are many, many ways in which streams of RDF data can be used – there are different use cases, network protocols, QoS settings, ordering guarantees, stream semantics, etc. One stream is also often viewed from different perspectives by the different actors producing and consuming it. Picking and enforcing specific semantics for stream frames would hopelessly overcomplicate the protocol and make it less useful in some use cases.

Jelly does have a system of logical stream types based on the RDF Stream Taxonomy (RDF-STaX), which can be used to suggest how the stream should be interpreted. However, these are just suggestions – you can interpret the stream however you like.

Stream types

Jelly has the notions of physical stream types and logical stream types. The physical type tells you how Jelly sends the data on the wire, which is a technical detail. The logical type tells you how you should interpret the stream. Specifying the logical type is optional and is only a suggestion to the consumer. You can always interpret the stream however you like.

There are three physical stream types in Jelly:

- TRIPLES: Data is encoded using triple statements. There is no information about the graph name in this type of stream.
- QUADS: Data is encoded using quad statements. Each quad has a graph name, which can also be empty for the default graph.
- GRAPHS: Data is encoded using named graphs, where the graph name can also be empty for the default graph. Each named graph can contain multiple triples.

As for logical stream types, they are taken directly from RDF-STaX – see the RDF-STaX website for a complete list of them. The following table summarizes which physical stream types may be used for each logical stream type. Please note that the table covers only the cases that are directly supported by the Jelly protocol specification and its official implementations.

| RDF-STaX (logical type) / Physical type | `TRIPLES` | `QUADS` | `GRAPHS` |
|---|---|---|---|
| Graph stream | Framed | ✘ | ✘ |
| Subject graph stream | Framed | ✘ | ✘ |
| Dataset stream | ✘ | Framed | Framed |
| Named graph stream | ✘ | Framed | Framed |
| Timestamped named graph stream | ✘ | Framed | Framed |
| Flat triple stream | Continuous | ✘ | ✘ |
| Flat quad stream | ✘ | Continuous | Continuous |

The values in the table mean the following:

- Framed: Each stream frame corresponds to exactly one logical element of the stream type. For example, in a graph stream, each frame corresponds to a single RDF graph. This usage pattern is common in real-time streaming scenarios like IoT systems.
- Continuous: The stream is a continuous sequence of logical elements. For example, in a flat triple stream, the stream is just a sequence of triples.
- ✘: The physical stream type is not directly supported for the logical stream type. However, you may still find a way to use it, depending on your use case.

The flat logical stream types (flat RDF triple stream and flat RDF quad stream in RDF-STaX) can also be treated as a single RDF graph or RDF dataset, respectively.
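
The table above can be transcribed into a small lookup, which may be handy when validating a configuration. This is only an illustrative sketch: `PhysicalType`, `SUPPORTED`, and `usage_pattern` are hypothetical names, not part of any Jelly implementation.

```python
from enum import Enum
from typing import Dict, Optional


class PhysicalType(Enum):
    TRIPLES = "TRIPLES"
    QUADS = "QUADS"
    GRAPHS = "GRAPHS"


# Transcription of the table: logical type -> {physical type: usage pattern}.
# Combinations that are absent are the ones marked with ✘ in the table.
SUPPORTED: Dict[str, Dict[PhysicalType, str]] = {
    "graph stream": {PhysicalType.TRIPLES: "framed"},
    "subject graph stream": {PhysicalType.TRIPLES: "framed"},
    "dataset stream": {PhysicalType.QUADS: "framed", PhysicalType.GRAPHS: "framed"},
    "named graph stream": {PhysicalType.QUADS: "framed", PhysicalType.GRAPHS: "framed"},
    "timestamped named graph stream": {
        PhysicalType.QUADS: "framed",
        PhysicalType.GRAPHS: "framed",
    },
    "flat triple stream": {PhysicalType.TRIPLES: "continuous"},
    "flat quad stream": {
        PhysicalType.QUADS: "continuous",
        PhysicalType.GRAPHS: "continuous",
    },
}


def usage_pattern(logical: str, physical: PhysicalType) -> Optional[str]:
    """Return "framed" or "continuous", or None if not directly supported."""
    return SUPPORTED.get(logical.lower(), {}).get(physical)
```

A `None` result only means the combination is not directly supported; as noted above, you may still find a way to use it.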

Common patterns cookbook

Below you will find some common patterns for using Jelly. These are just examples – you can use Jelly in many other ways. All of the presented patterns are supported in the Jelly JVM (Scala) implementation with the Reactive Streaming module.

Flat RDF triple stream – "just a bunch of triples"

Let's say you want to stream a lot of triples from A to B – maybe you're doing some kind of data migration, or you're sending data to a data lake. You don't care about the graph they belong to – you just want to send a bunch of triples.

Logically, this is a flat RDF triple stream. It can be physically encoded as a TRIPLES stream, batching the triples into frames of an arbitrary size (let's say, 1000 triples each):

Stream frame 1
- Stream options
- Triple 1
- Triple 2
- ...
- Triple 1000
Stream frame 2
- Triple 1001
- Triple 1002
- ...
- Triple 2000
...

You can then send these frames one-by-one over gRPC or Kafka, or write them to a file. The consumer will be able to read the triples one frame at a time, without having to know how many triples there are in total.
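
The batching step can be sketched as a generator that never needs to know the total number of triples. This is a minimal illustration in plain Python, assuming triples are represented as string tuples; `batch_into_frames` is a hypothetical helper, not an API of any Jelly implementation, and it does not perform the actual Jelly encoding.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def batch_into_frames(
    triples: Iterable[Triple], frame_size: int = 1000
) -> Iterator[List[Triple]]:
    """Group a (possibly unbounded) sequence of triples into fixed-size frames."""
    if frame_size < 1:
        raise ValueError("frame_size must be at least 1")
    it = iter(triples)
    while True:
        # Take up to frame_size triples; the last frame may be shorter.
        frame = list(islice(it, frame_size))
        if not frame:
            return
        yield frame
```

Because the input is consumed lazily, each frame can be sent or written out as soon as it is full.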

RDF graph stream

In this case we have (for example) an IoT sensor that periodically emits an RDF graph that describes what the sensor saw (something like SOSA/SSN). The graphs may be of different sizes (depending on what the sensor saw) and they can be emitted at different rates (depending on how often the sensor is triggered). We want to stream these graphs to a server that will process them in real-time with no additional latency.

Logically, this is an RDF graph stream. You can encode it as a TRIPLES stream, where the stream frames correspond to different unnamed (default) graphs:

Stream frame 1
- Stream options
- Triple 1 (of graph 1)
- Triple 2 (of graph 1)
- ...
- Triple 134 (of graph 1)
Stream frame 2
- Triple 1 (of graph 2)
- Triple 2 (of graph 2)
- ...
- Triple 97 (of graph 2)
...

The consumer will be able to read the graphs one frame at a time, without having to know how many graphs there are in total.

RiverBench uses this pattern for distributing its triple streams (see example). Note that in RiverBench the stream may be equivalently considered "just a bunch of triples": the serialization is the same, and only the consumer's interpretation differs.

Flat RDF quad stream – "just a bunch of quads"

You want to stream a lot of quads – similar to the "just a bunch of triples" case above, but you also want to include the graph node. This is logically a flat RDF quad stream. It can be physically encoded as a QUADS stream, batching the quads into frames of an arbitrary size (let's say, 1000 quads each):

Stream frame 1
- Stream options
- Quad 1
- Quad 2
- ...
- Quad 1000
Stream frame 2
- Quad 1001
- Quad 1002
- ...
- Quad 2000
...

The mechanism is exactly the same as with a flat RDF triple stream.

Flat RDF quad stream (as `GRAPHS`)

This is a slightly different take on the problem of "just a bunch of quads" – you also want to transmit what is essentially a single RDF dataset, but instead of sending individual quads, you want to send it graph-by-graph. This makes most sense if your data changes on a per-graph basis, or you are streaming a static RDF dataset.

This is logically again a flat RDF quad stream, but it can be physically encoded as a GRAPHS stream, batching the triples in the graphs into frames of an arbitrary size (let's say, 1000 triples each):

Stream frame 1
- Stream options
- Start graph (named 1)
- Triple 1 (of graph 1)
- Triple 2 (of graph 1)
- ...
- Triple 134 (of graph 1)
- End graph
- Start graph (named 2)
- Triple 1 (of graph 2)
- Triple 2 (of graph 2)
- ...
- Triple 97 (of graph 2)
Stream frame 2
- Triple 98 (of graph 2)
- ...
- Triple 130 (of graph 2)
- End graph
- Start graph (named 3)
- Triple 1 (of graph 3)
- Triple 2 (of graph 3)
- ...
- Triple 77 (of graph 3)
- End graph
- Start graph (named 4)
- Triple 1 (of graph 4)
- Triple 2 (of graph 4)
- ...
- Triple 21 (of graph 4)
- End graph
...

Notice that one named graph can span multiple stream frames, and one stream frame can contain multiple graphs. The consumer will be able to read the graphs one frame at a time, without having to know how many graphs there are in total.
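
The way graphs and frames overlap can be sketched with explicit start and end markers. The code below is a simplified illustration, not the Jelly wire format: the row shapes and the helpers `dataset_to_rows`, `chunk`, and `read_graphs` are hypothetical, and for brevity frames are cut by row count (markers included) rather than by triple count.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
# Hypothetical row shapes: ("start", graph_name), ("triple", triple), ("end", None)
Row = Tuple[str, object]


def dataset_to_rows(dataset: Dict[str, List[Triple]]) -> Iterator[Row]:
    """Flatten a dataset into rows, wrapping each graph in start/end markers."""
    for name, triples in dataset.items():
        yield ("start", name)
        for triple in triples:
            yield ("triple", triple)
        yield ("end", None)


def chunk(rows: Iterable[Row], rows_per_frame: int) -> Iterator[List[Row]]:
    """Cut the row sequence into frames, regardless of graph boundaries."""
    frame: List[Row] = []
    for row in rows:
        frame.append(row)
        if len(frame) == rows_per_frame:
            yield frame
            frame = []
    if frame:
        yield frame


def read_graphs(frames: Iterable[List[Row]]) -> Iterator[Tuple[str, List[Triple]]]:
    """Rebuild graphs on the consumer side; state is kept across frames."""
    name: Optional[str] = None
    triples: List[Triple] = []
    for frame in frames:
        for kind, payload in frame:
            if kind == "start":
                name, triples = payload, []
            elif kind == "triple":
                triples.append(payload)
            elif kind == "end":
                yield name, triples
```

The consumer keeps the current graph open between frames, which is what lets a single graph span several of them.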

RDF dataset stream (as `QUADS`)

You want to stream RDF datasets. This is similar to the RDF graph stream case above, but your elements are entire datasets. Logically, this is an RDF dataset stream, which can be physically encoded as a QUADS stream, where the stream frames correspond to different datasets:

Stream frame 1
- Stream options
- Quad 1 (of dataset 1)
- Quad 2 (of dataset 1)
- ...
- Quad 454 (of dataset 1)
Stream frame 2
- Quad 1 (of dataset 2)
- Quad 2 (of dataset 2)
- ...
- Quad 323 (of dataset 2)
...

The mechanism is exactly the same as with an RDF graph stream encoded as TRIPLES.

RiverBench uses this pattern for distributing its RDF dataset streams (see example). Note that in RiverBench the stream may be equivalently considered a flat RDF quad stream: the serialization is the same, and only the consumer's interpretation differs.

RDF dataset stream (as `GRAPHS`)

You want to stream RDF datasets or a subclass of them – for example timestamped named graphs, using the RSP Data Model, where each stream element is a named graph and a bunch of statements about this graph in the default graph. This can be physically encoded as a GRAPHS stream, where the stream frames correspond to different datasets:

Stream frame 1
- Stream options
- Start graph (default)
- Triple 1 (of default graph, dataset 1)
- Triple 2 (of default graph, dataset 1)
- ...
- Triple 134 (of default graph, dataset 1)
- End graph
- Start graph (named)
- Triple 1 (of named graph, dataset 1)
- Triple 2 (of named graph, dataset 1)
- ...
- Triple 97 (of named graph, dataset 1)
- End graph
Stream frame 2
- Start graph (default)
- Triple 1 (of default graph, dataset 2)
- Triple 2 (of default graph, dataset 2)
- ...
- Triple 77 (of default graph, dataset 2)
- End graph
- Start graph (named)
- Triple 1 (of named graph, dataset 2)
- Triple 2 (of named graph, dataset 2)
- ...
- Triple 21 (of named graph, dataset 2)
- End graph
...

Of course each stream frame could contain more than one named graph, and the graphs can be of different sizes.

Ordering and delivery guarantees

To be able to compress RDF streams on the fly, Jelly requires that stream frames are kept strictly in order (see also the spec). This is because the compression algorithm updates its lookup tables dynamically over the course of the stream, and a given frame depends on the lookups defined in previous frames. If the frames arrive out of order, the consumer may be unable to decode them.
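
To see why ordering matters, consider a deliberately simplified dictionary encoder. This is not Jelly's actual algorithm (the real lookup scheme is defined in the specification); the `Encoder` and `Decoder` classes below are hypothetical and only illustrate how a frame can depend on lookup entries defined in earlier frames.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
# A frame here is (new lookup entries, triples encoded as tuples of IDs).
EncodedFrame = Tuple[List[Tuple[int, str]], List[Tuple[int, ...]]]


class Encoder:
    def __init__(self) -> None:
        self.ids: Dict[str, int] = {}

    def encode(self, triples: List[Triple]) -> EncodedFrame:
        new_entries: List[Tuple[int, str]] = []
        encoded: List[Tuple[int, ...]] = []
        for triple in triples:
            ids = []
            for term in triple:
                if term not in self.ids:
                    # First occurrence: assign an ID and ship the entry once.
                    self.ids[term] = len(self.ids) + 1
                    new_entries.append((self.ids[term], term))
                ids.append(self.ids[term])
            encoded.append(tuple(ids))
        return new_entries, encoded


class Decoder:
    def __init__(self) -> None:
        self.terms: Dict[int, str] = {}

    def decode(self, frame: EncodedFrame) -> List[Tuple[str, ...]]:
        new_entries, encoded = frame
        self.terms.update(new_entries)
        # Raises KeyError if an ID was defined in a frame we have not seen.
        return [tuple(self.terms[i] for i in ids) for ids in encoded]
```

In this sketch, a second frame that reuses terms from the first carries only the entries that are new, so a decoder that receives it first cannot resolve the reused IDs.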

There are use cases where it's hard to guarantee strict ordering of messages, such as IoT messaging (e.g., MQTT with QoS 0) or high-throughput streams with parallel partitions (e.g., Kafka). In these cases you may want to employ one of these strategies:

- Emit shared lookup tables at the start of the stream: If you know the vocabulary of the stream, you can emit most of the content of the lookup tables at the start of the stream, and then only update the lookup elements that vary frame-to-frame, keeping the updates local to the frame. This strategy is especially useful in IoT scenarios, where the vocabulary is usually known in advance. You don't need to modify the consumer in this case.
  - A variation of this strategy is to communicate the lookup tables over a separate channel before starting the stream. This is useful if you can't guarantee that the lookup tables will be delivered before the stream frames.
- Use a "frame ID" to keep track of the order: Add a "frame ID" to each frame, which allows the consumer to reorder the frames before processing them. This strategy is useful in high-throughput scenarios. You will need to modify the consumer to do the reordering, and handling failures in this scenario may be complicated.
- Use partitions that are guaranteed to be in-order: Use partitions with guaranteed ordering (e.g., Kafka partitions), and give each partition its own set of lookups (essentially treating each partition as a separate stream in Jelly's terms). This strategy is useful in high-throughput scenarios.
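
The "frame ID" strategy can be sketched as a small reordering buffer on the consumer side. This is a minimal illustration under stated assumptions: frame IDs are consecutive integers attached by the producer (Jelly itself does not define them), and `reorder` is a hypothetical helper. Note that the sketch waits indefinitely for a missing frame, which is one reason failure handling gets complicated.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Assumption: each frame arrives as (frame_id, serialized_frame_bytes).
IdentifiedFrame = Tuple[int, bytes]


def reorder(
    frames: Iterable[IdentifiedFrame], first_id: int = 0
) -> Iterator[IdentifiedFrame]:
    """Yield frames in frame ID order, buffering the ones that arrive early."""
    pending: List[IdentifiedFrame] = []  # min-heap ordered by frame ID
    next_id = first_id
    for frame_id, payload in frames:
        if frame_id < next_id:
            continue  # already delivered, drop the duplicate
        heapq.heappush(pending, (frame_id, payload))
        while pending and pending[0][0] <= next_id:
            fid, data = heapq.heappop(pending)
            if fid == next_id:
                yield fid, data
                next_id += 1
            # Otherwise fid < next_id: a duplicate that was buffered, drop it.
```

The reordered output can then be passed to the regular decoder, which sees the frames in the order the producer emitted them.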

Note that Jelly by default also assumes that frames are delivered at least once. At-least-once delivery is good enough (as long as the order is kept), as lookup updates are idempotent, so you may only need to de-duplicate the frames afterwards. At-most-once delivery requires you to make the frames independent of each other, for example by emitting shared lookup tables at the start of the stream, as described above.
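
The idempotence argument can be illustrated with a toy lookup table: applying the same entries twice leaves the table unchanged, so a redelivered frame does no harm to the decoder state. The names below are hypothetical, and the frame ID used for de-duplication is an assumption (something you would add yourself, as in the frame ID strategy above).

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

LookupEntry = Tuple[int, str]
# Assumption: frames are (frame_id, lookup_entries) and arrive in order.
IdentifiedFrame = Tuple[int, List[LookupEntry]]


def apply_entries(table: Dict[int, str], entries: Iterable[LookupEntry]) -> None:
    """Apply lookup updates; repeating the same update has no further effect."""
    for entry_id, value in entries:
        table[entry_id] = value


def drop_redelivered(frames: Iterable[IdentifiedFrame]) -> Iterator[IdentifiedFrame]:
    """Skip frames whose ID was already seen (in-order, at-least-once delivery)."""
    last_seen: Optional[int] = None
    for frame_id, entries in frames:
        if last_seen is not None and frame_id <= last_seen:
            continue
        last_seen = frame_id
        yield frame_id, entries
```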

Implementing Jelly

Note

This section is intended only for those who want to write a new Jelly implementation from scratch. It's much easier to use an existing implementation, such as the JVM (Scala) implementation.

Implementing Jelly from scratch is greatly simplified by the existing Protobuf and RDF libraries. Essentially, the only thing you'll need to do is to glue them together:

- Find a Protobuf library for your language. You can find a list of official Protobuf implementations here and a list of community-maintained implementations here.
- Use the library to generate the code for the Jelly messages (this usually involves using protoc). You can find the Protobuf definitions in the jelly-protobuf repository.
- Find an RDF library for your language. You can find a list of RDF libraries here.
- Implement conversions to and/or from the RDF library's data structures. You can find an example of the conversion code in the Jelly JVM (Scala) implementation (core, jena, and rdf4j modules).
- In the implementation follow the specification to ensure compatibility.

That's it! You may also want to implement streaming facilities, such as Reactive Streams in Java/Scala. Implementing the gRPC publish/subscribe mechanism follows a very similar procedure – many Protobuf libraries have built-in support for gRPC with code generation.
