Jelly user guide

To use Jelly, pick an implementation that matches your tech stack:

Jelly-JVM – written in Java, integrated with Apache Jena, RDF4J, and Titanium.
pyjelly – written in Python, integrated with RDFLib.
jelly.rs (experimental) – written in Rust, integrated with Sophia.
jelly-cli – command-line tool, works on Windows, macOS, and Linux.

You can also create your own implementation. Because Jelly is built on top of Protocol Buffers, you can generate most of the code automatically in any popular programming language.

What can it do?

Jelly is designed to be a protocol for streaming RDF knowledge graphs, but it can also be used with static RDF datasets. Jelly was designed to be fast, well-compressed, and versatile.

Jelly can work with any RDF knowledge graph data, including RDF 1.1, RDF-star, and generalized RDF.
Jelly can be used to represent streams of triples, quads, graphs, or datasets.
Jelly can also be used to represent a single graph or dataset.
Jelly-Patch can be used to record changes to RDF datasets, including add/delete operations and transactions.
Jelly can be used for streaming data over the network (e.g., with MQTT, Kafka, gRPC), but also for working with flat files.
Jelly can compress RDF data on the fly, without having to know the data in advance.
Jelly is super-fast and lightweight, scaling down to IoT and up to high-performance servers.

Quick start

CLI tool

The easiest way to do something with Jelly is with the jelly-cli command line tool.

Go to the jelly-cli download page and download a binary for you platform.
- Alternatively, you can download the jelly-cli.jar file and run it with java -jar jelly-cli.jar.
Run ./jelly-cli rdf to-jelly some-rdf-file.ttl > output.jelly to convert an RDF file to Jelly.
Run ./jelly-cli rdf from-jelly output.jelly to convert the Jelly file back to RDF.
Run ./jelly-cli --help to see all available commands.

You can find more information about the tool in its README on GitHub.

Example Jelly files

Go check out the Use cases page where we list links to example datasets in the Jelly format.

Apache Jena / RDF4J plugins

Check out the dedicated guide for installing plugins in Jena and RDF4J. You can use them to quickly add Jelly support to, for example, Apache Jena Fuseki and load Jelly files just like any other RDF file.

Java & Scala programming

Go see the Jelly-JVM getting started guide for devs. It contains a lot of examples and code snippets for using Jelly in Java and Scala, with Jena, RDF4J, and Titanium.

Python programming

See the pyjelly getting started guide. It contains examples and code snippets for using Jelly with rdflib.

How to use it – in detail

Jelly supports several stream types – which stream type you choose depends on your use case (see stream types below).

All stream types use the same concept of stream frames – discrete elements into which the stream is divided. Each frame contains a number of rows, which are the actual RDF data (RDF triples, quads, etc.). Jelly does not enforce the semantics of stream frames, although it does have a mechanism to suggest to consumers and producers how should they understand the stream. Still, you can interpret the stream however you like.

Why doesn't Jelly enforce the semantics of stream frames?

There are many, many ways in which streams of RDF data can be used – there are different use cases, network protocols, QoS settings, ordering guarantees, stream semantics, etc. One stream is also often viewed from different perspectives by the different actors producing and consuming it. Picking and enforcing specific semantics for stream frames would hopelessly overcomplicate the protocol and make it less useful in some use cases.

Jelly does have a system of logical stream types based on the RDF Stream Taxonomy (RDF-STaX), which can be used to suggest how the stream should be interpreted. However, these are just suggestions – you can interpret the stream however you like.

Stream types

Jelly has the notions of physical stream types and logical stream types. The physical type tells you how Jelly sends the data on the wire, which is a technical detail. The logical type tells you how you should interpret the stream. Specifying the logical type is optional and is only a suggestion to the consumer. You can always interpret the stream however you like.

There are three physical stream types in Jelly:

TRIPLES: Data is encoded using triple statements. There is no information about the graph name in this type of stream.
QUADS: Data is encoded using quad statements. Each quad has a graph name, which can also be the default graph.
GRAPHS: Data is encoded using named graphs, where the graph name can also be the default graph. Each named graph can contain multiple triples.

As for logical stream types, they are taken directly from RDF-STaX – see the RDF-STaX website for a complete list of them. The following table summarizes which physical stream types may be used for each logical stream type. Please note that the table covers only the cases that are directly supported by the Jelly protocol specification and its official implementations.

RDF-STaX (logical type) / Physical type	`TRIPLES`	`QUADS`	`GRAPHS`
Graph stream	Framed	✘	✘
Subject graph stream	Framed	✘	✘
Dataset stream	✘	Framed	Framed
Named graph stream	✘	Framed	Framed
Timestamped named graph stream	✘	Framed	Framed
Flat triple stream	Continuous	✘	✘
Flat quad stream	✘	Continuous	Continuous

The values in the table mean the following:

Framed: Each stream frame corresponds to exactly one logical element of the stream type. For example, in a graph stream, each frame corresponds to a single RDF graph. This usage pattern is common in real-time streaming scenarios like IoT systems.
Continuous: The stream is a continuous sequence of logical elements. For example, in a flat triple stream, the stream is just a sequence of triples.
✘: The physical stream type is not directly supported for the logical stream type. However, you may still find a way to use it, depending on your use case.

The flat logical stream types (flat RDF triple stream and flat RDF quad stream in RDF-STaX) can also be treated as a single RDF graph or RDF dataset, respectively.

Common patterns cookbook

Below you will find some common patterns for using Jelly. These are just examples – you can use Jelly in many other ways. All of the presented patterns are supported in the Jelly-JVM implementation with the Reactive Streaming module.

Flat RDF triple stream – "just a bunch of triples"

Let's say you want to stream a lot of triples from A to B – maybe you're doing some kind of data migration, or you're sending data to a data lake. You don't care about the graph they belong to – you just want to send a bunch of triples.

This means you are using logically a flat RDF triple stream. It can be physically encoded as as TRIPLES stream, batching the triples into frames of an arbitrary size (let's say, 1000 triples each):

Example (click to expand)

Stream frame 1
- Stream options
- Triple 1
- Triple 2
- ...
- Triple 1000
Stream frame 2
- Triple 1001
- Triple 1002
- ...
- Triple 2000
...

You can then send these frames one-by-one over gRPC or Kafka, or write them to a file. The consumer will be able to read the triples one frame at a time, without having to know how many triples there are in total.

RDF graph stream

In this case we have (for example) an IoT sensor that periodically emits an RDF graph that describes what the sensor saw (something like SOSA/SSN). The graphs may be of different sizes (depending on what the sensor saw) and they can be emitted at different rates (depending on how often the sensor is triggered). We want to stream these graphs to a server that will process them in real-time with no additional latency.

This means you are using logically an RDF graph stream. You can encode it as a TRIPLES stream, where the stream frames correspond to different unnamed (default) graphs:

Example (click to expand)

Stream frame 1
- Stream options
- Triple 1 (of graph 1)
- Triple 2 (of graph 1)
- ...
- Triple 134 (of graph 1)
Stream frame 2
- Triple 1 (of graph 2)
- Triple 2 (of graph 2)
- ...
- Triple 97 (of graph 2)
...

The consumer will be able to read the graphs one frame at a time, without having to know how many graphs there are in total.

RiverBench uses this pattern for distributing its triple streams (see example). Note that in RiverBench the stream may be equivalently considered "just a bunch of triples" – the serialization is the same, it only depends on the interpretation on the side of the consumer.

Flat RDF quad stream – "just a bunch of quads"

You want to stream a lot of quads – similar to the "just a bunch of triples" case above, but you also want to include the graph node. This is logically a flat RDF quad stream. It can be physically encoded as a QUADS stream, batching the quads into frames of an arbitrary size (let's say, 1000 quads each):

Example (click to expand)

Stream frame 1
- Stream options
- Quad 1
- Quad 2
- ...
- Quad 1000
Stream frame 2
- Quad 1001
- Quad 1002
- ...
- Quad 2000
...

The mechanism is exactly the same as with a flat RDF triple stream.

Flat RDF quad stream (as `GRAPHS`)

This a slightly different take on the problem of "just a bunch of quads" – you also want to transmit what is essentially a single RDF dataset, but instead of sending individual quads, you want to send it graph-by-graph. This makes most sense if your data changes on a per-graph basis, or you are streaming a static RDF dataset.

This is logically again a flat RDF quad stream, but it can be physically encoded as a GRAPHS stream, batching the triples in the graphs into frames of an arbitrary size (let's say, 1000 triples each):

Example (click to expand)

Stream frame 1
- Stream options
- Start graph (named 1)
- Triple 1 (of graph 1)
- Triple 2 (of graph 1)
- ...
- Triple 134 (of graph 1)
- End graph
- Start graph (named 2)
- Triple 1 (of graph 2)
- Triple 2 (of graph 2)
- ...
- Triple 97 (of graph 2)
Stream frame 2
- Triple 98 (of graph 2)
- ...
- Triple 130 (of graph 2)
- End graph
- Start graph (named 3)
- Triple 1 (of graph 3)
- Triple 2 (of graph 3)
- ...
- Triple 77 (of graph 3)
- End graph
- Start graph (named 4)
- Triple 1 (of graph 4)
- Triple 2 (of graph 4)
- ...
- Triple 21 (of graph 4)
- End graph
...

Notice that one named graph can span multiple stream frames, and one stream frame can contain multiple graphs. The consumer will be able to read the graphs one frame at a time, without having to know how many graphs there are in total.

RDF dataset stream (as `QUADS`)

You want to stream RDF datasets – similar to the "a stream of graphs" case above, but your elements are entire datasets. This is logically an RDF dataset stream, which can be physically encoded as a QUADS stream, where the stream frames correspond to different datasets:

Example (click to expand)

Stream frame 1
- Stream options
- Quad 1 (of dataset 1)
- Quad 2 (of dataset 1)
- ...
- Quad 454 (of dataset 1)
Stream frame 2
- Quad 1 (of dataset 2)
- Quad 2 (of dataset 2)
- ...
- Quad 323 (of dataset 2)
...

The mechanism is exactly the same as with a triple stream of graphs.

RiverBench uses this pattern for distributing its RDF dataset streams (see example). Note that in RiverBench the stream may be equivalently considered a flat RDF quad stream – the serialization is the same, it only depends on the interpretation on the side of the consumer.

RDF dataset stream (as `GRAPHS`)

You want to stream RDF datasets or a subclass of them – for example timestamped named graphs, using the RSP Data Model, where each stream element is a named graph and a bunch of statements about this graph in the default graph. This can be physically encoded as a GRAPHS stream, where the stream frames correspond to different datasets:

Example (click to expand)

Stream frame 1
- Stream options
- Start graph (default)
- Triple 1 (of default graph, dataset 1)
- Triple 2 (of default graph, dataset 1)
- ...
- Triple 134 (of default graph, dataset 1)
- End graph
- Start graph (named)
- Triple 1 (of named graph, dataset 1)
- Triple 2 (of named graph, dataset 1)
- ...
- Triple 97 (of named graph, dataset 1)
- End graph
Stream frame 2
- Start graph (default)
- Triple 1 (of default graph, dataset 2)
- Triple 2 (of default graph, dataset 2)
- ...
- Triple 77 (of default graph, dataset 2)
- End graph
- Start graph (named)
- Triple 1 (of named graph, dataset 2)
- Triple 2 (of named graph, dataset 2)
- ...
- Triple 21 (of named graph, dataset 2)
- End graph
...

Of course each stream frame could contain more than one named graph, and the graphs can be of different sizes.

Ordering and delivery guarantees

To be able to compress RDF streams on-the-fly, Jelly requires that stream frames are kept strictly in order (see also the spec). This is because the compression algorithm updates its lookup tables dynamically over the course of the stream, and a given frame depends on the lookups defined in previous frames. If the frames are out of order, the compression may fail.

There are use cases where it's hard to guarantee strict ordering of messages, such as IoT messaging (e.g., MQTT with QoS 0) or high-throughput streams with parallel partitions (e.g., Kafka). In these cases you may want to employ one of these strategies:

Emit shared lookup tables at the start of the stream: If you know the vocabulary of the stream, you can emit most of the content of the lookup tables at the start of the stream, and then only update the lookup elements that vary frame-to-frame, keeping the updates local to the frame. This strategy is especially useful in IoT scenarios, where the vocabulary is usually known in advance. You don't need to modify the consumer in this case.
- A variation of this strategy is to communicate the lookup tables over a separate channel before starting the stream. This is useful if you can't guarantee that the lookup tables will be delivered before the stream frames.
Use a "frame ID" to keep track of the order: If you can't guarantee the order of the frames, you can add a "frame ID" to each frame, which will allow the consumer to reorder the frames before processing them. This strategy is useful in high-throughput scenarios, where you can't guarantee the order of the frames. You will need to modify the consumer to reorder the frames before processing them. However, handling failures in this scenario may be complicated.
Use partitions that are guaranteed to be in-order: If you can't guarantee the order of the frames, you can use partitions that are guaranteed to be in-order (e.g., Kafka partitions). Then, each partition should have its own set of lookups (essentially treating each partition as a separate stream in Jelly's terms). This strategy is useful in high-throughput scenarios.

Note that Jelly by default also assumes that frames are delivered at least once. At-least-once delivery is good enough (as long as the order is kept), as lookup updates are idempotent – you may only need to de-duplicate the frames afterwards. At-most-once delivery requires you to make the frames independent of each other, such as with the IoT strategy above.

Delimited vs. non-delimited Jelly

Protobuf messages by default are not delimited. This means that when you serialize a Protobuf message (e.g., a Jelly stream frame), the serialization does not include any information about where the message ends. This is fine when there is something else telling the parser where the message ends – for example, when you're sending the message over a gRPC, Kafka, or MQTT stream, the streaming protocol tells the parser how long the message is. However, if you wanted to write multiple stream frames to a file, you would need to add some kind of delimiter between the frames – otherwise the parser would not know where one frame ends and the next one begins.

So, to summarize:

Use case	Jelly variant	Description
Jelly gRPC streaming protocol	Non-delimited	The gRPC protocol tells the parser how long the message is.
Streaming with Kafka, MQTT, or similar	Non-delimited	The streaming protocol tells the parser how long the message is.
Writing to a file	Delimited	You need to add a delimiter between the frames.
Writing to a raw network socket	Delimited	You need to add a delimiter between the frames.

The delimited variant works by adding an integer before the stream frame that specifies the length of the frame, in bytes. That's it.

You can read more about how this works in the serialization format specification.

Examples

Jelly-JVM supports both variants, but uses them in different contexts. When writing to a Java byte stream (typically a file) with Apache Jena RIOT or RDF4J Rio, the delimited variant is used. However, the RIOT/Rio integrations can parse either delimited or non-delimited Jelly data. In the gRPC protocol, the non-delimited variant is used.
RiverBench publishes its RDF metadata and datasets as Jelly files. These files are always written using the delimited variant.

Implementing Jelly

Note

This section is intended only for those who want to write a new Jelly implementation from scratch. It's much easier to use an existing implementation, such as the JVM implementation.

Implementing Jelly from scratch is greatly simplified by the existing Protobuf and RDF libraries. Essentially, the only thing you'll need to do is to glue them together:

Find a Protobuf library for your language. You can find a list of official Protobuf implementations here and a list of community-maintained implementations here.
Use the library to generate the code for the Jelly messages (this usually involves using protoc). You can find the Protobuf definitions in the jelly-protobuf repository.
Find an RDF library for your language. You can find a list of RDF libraries here.
Implement conversions to and/or from the RDF library's data structures. You can find an example of the conversion code in the Jelly-JVM implementation (core, jena, and rdf4j modules).
In the implementation follow the specification to ensure compatibility.

That's it! You may also want to implement streaming facilities, such as Reactive Streams in Java/Scala. Implementing the gRPC publish/subscribe mechanism follows a very similar procedure – many Protobuf libraries have built-in support for gRPC with code generation.

Jelly user guide

What can it do?

Quick start

CLI tool

Apache Jena / RDF4J plugins

Java & Scala programming

Python programming

How to use it – in detail

Stream types

Common patterns cookbook

Flat RDF triple stream – "just a bunch of triples"

RDF graph stream

Flat RDF quad stream – "just a bunch of quads"

Flat RDF quad stream (as GRAPHS)

RDF dataset stream (as QUADS)

RDF dataset stream (as GRAPHS)

Ordering and delivery guarantees

Delimited vs. non-delimited Jelly

Examples

Implementing Jelly

See also

Flat RDF quad stream (as `GRAPHS`)

RDF dataset stream (as `QUADS`)

RDF dataset stream (as `GRAPHS`)