Getting started

This guide shows how to install pyjelly and prepare your environment for use with RDFLib.

Installation (with RDFLib)

Install pyjelly from PyPI:

pip install pyjelly[rdflib]

pyjelly requires Python 3.9 or newer and works on all major platforms (Linux, macOS, Windows).
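As a quick sanity check before moving on, the following minimal sketch verifies the two requirements stated above. It uses only the standard library; the names `MIN_PYTHON` and `environment_ready` are illustrative and not part of pyjelly.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in this guide


def environment_ready() -> bool:
    """Return True if the interpreter is new enough and pyjelly is importable."""
    python_ok = sys.version_info >= MIN_PYTHON
    # find_spec returns None when the top-level package is not installed
    pyjelly_found = importlib.util.find_spec("pyjelly") is not None
    return python_ok and pyjelly_found


if __name__ == "__main__":
    print("Python:", ".".join(map(str, sys.version_info[:3])))
    print("Ready for pyjelly:", environment_ready())
```

If this reports `False` on a supported Python version, the most likely cause is that pyjelly was installed into a different environment than the one running the script.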

Usage with RDFLib

Once installed, pyjelly integrates with RDFLib automatically. You can immediately serialize and parse .jelly files using the standard RDFLib API.

Serializing a graph

To serialize a graph to the Jelly format:

from rdflib import Graph

g = Graph()
g.parse("http://xmlns.com/foaf/spec/index.rdf")
g.serialize(destination="foaf.jelly", format="jelly")

This creates a delimited Jelly stream using default options.

Parsing a graph

To load RDF data from a .jelly file:

from rdflib import Graph

g = Graph()
g.parse("foaf.jelly", format="jelly")

print("Parsed triples:")
for s, p, o in g:
    print(f"{s} {p} {o}")

RDFLib will reconstruct the graph from the Jelly file.

Parsing a stream of graphs

You can process a Jelly stream as a stream of graphs. A Jelly file consists of "frames" (batches of statements) – we can load each frame as a separate RDFLib graph.

In this example, we use a dataset of weather measurements, which is an RDF graph stream. We count the number of triples in each graph:

import gzip
import urllib.request

from pyjelly.integrations.rdflib.parse import parse_jelly_grouped

# Dataset: Katrina weather measurements (10k graphs)
# Documentation: https://w3id.org/riverbench/datasets/lod-katrina/dev
url = "https://w3id.org/riverbench/datasets/lod-katrina/dev/files/jelly_10K.jelly.gz"

# Load, uncompress .gz file, and pass to Jelly parser, all in a streaming manner
with (
    urllib.request.urlopen(url) as response,
    gzip.open(response) as jelly_stream,
):
    graphs = parse_jelly_grouped(jelly_stream)
    for i, graph in enumerate(graphs):
        print(f"Graph {i} in the stream has {len(graph)} triples")

Because parse_jelly_grouped returns a generator, each iteration yields one graph, so only the current frame needs to be held in memory. This makes it practical to process large datasets and live streams.
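The same one-graph-at-a-time pattern works for running statistics. The sketch below is an illustration rather than part of pyjelly: `summarize_sizes` is a hypothetical helper that accepts any iterable of sized objects, and plain Python sets stand in for graphs so the code runs without a Jelly file.

```python
from typing import Iterable, Sized


def summarize_sizes(graphs: Iterable[Sized]) -> dict:
    """Consume a stream of graphs once and return running statistics."""
    count = total = largest = 0
    for graph in graphs:
        size = len(graph)  # for an RDFLib Graph, len() is the number of triples
        count += 1
        total += size
        largest = max(largest, size)
    return {"graphs": count, "triples": total, "largest": largest}


# Stand-in data: each set plays the role of one graph in the stream.
demo_stream = (set(range(n)) for n in (3, 5, 2))
print(summarize_sizes(demo_stream))
```

With a real stream, you would pass the generator returned by parse_jelly_grouped instead of `demo_stream`. Only the counters persist between iterations, so no graph is retained after it has been measured.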

Parsing a stream of triples

You can also process a Jelly stream as a flat stream of triples.

In this more complex example, we look through a fragment of Denmark's OpenStreetMap to find all city names:

import gzip
import urllib.request

from pyjelly.integrations.rdflib.parse import parse_jelly_flat, Triple
from rdflib import URIRef

# Dataset: OpenStreetMap data for Denmark (first 10k objects)
# Documentation: https://w3id.org/riverbench/datasets/osm2rdf-denmark/dev
url = (
    "https://w3id.org/riverbench/datasets/osm2rdf-denmark/dev/files/jelly_10K.jelly.gz"
)

# We are looking for city names in the dataset
predicate_to_look_for = URIRef("https://www.openstreetmap.org/wiki/Key:addr:city")
city_names = set()

with (
    urllib.request.urlopen(url) as response,
    gzip.open(response) as jelly_stream,
):
    for event in parse_jelly_flat(jelly_stream):
        if isinstance(event, Triple):  # we are only interested in triples
            if event.p == predicate_to_look_for:
                city_names.add(event.o)

print(f"Found {len(city_names)} unique city names in the dataset.")
print("Up to 10 city names (in arbitrary order):")
for city in list(city_names)[:10]:
    print(f"- {city}")

parse_jelly_flat returns a generator of stream events (such as parsed statements). This lets you process the file one triple at a time and build custom aggregations without loading the whole dataset into memory.
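Another aggregation that fits this pattern is counting how often each predicate occurs. The following is a minimal sketch, not pyjelly API: `count_predicates` and `DemoTriple` are hypothetical names, and `DemoTriple` is a stand-in with the same `s`, `p`, `o` attributes used in the example above so that the code runs without pyjelly.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for a parsed statement (not a pyjelly class).
DemoTriple = namedtuple("DemoTriple", ["s", "p", "o"])


def count_predicates(events, statement_type=DemoTriple):
    """Count how often each predicate occurs among the statements in a stream."""
    counts = Counter()
    for event in events:
        if isinstance(event, statement_type):  # skip non-statement events
            counts[event.p] += 1
    return counts


demo_events = [
    DemoTriple("ex:a", "ex:name", "Aarhus"),
    "not a statement",  # ignored by the isinstance check
    DemoTriple("ex:b", "ex:name", "Odense"),
    DemoTriple("ex:b", "ex:population", 180000),
]
for predicate, n in count_predicates(demo_events).most_common():
    print(predicate, n)
```

With pyjelly, the call would presumably look like `count_predicates(parse_jelly_flat(jelly_stream), statement_type=Triple)`, assuming the `Triple` class imported in the example above.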

Serializing a stream of graphs

If you have a generator that yields graphs, you can serialize it into the Jelly format, as in the following example:

from pyjelly.integrations.rdflib.serialize import grouped_stream_to_file

from rdflib import Graph, Literal, Namespace
import random


def generate_sample_graphs():
    ex = Namespace("http://example.org/")
    for _ in range(10):
        g = Graph()
        g.add((ex.sensor, ex.temperature, Literal(random.random())))
        g.add((ex.sensor, ex.humidity, Literal(random.random())))
        yield g


output_file_name = "output.jelly"

print(f"Streaming graphs into {output_file_name}…")
sample_graphs = generate_sample_graphs()
with open(output_file_name, "wb") as out_file:
    grouped_stream_to_file(sample_graphs, out_file)
print("All done.")

This method transmits logically grouped data while preserving the original grouping. For more precise control over frame serialization, you can use the lower-level API.
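If your data arrives as a flat sequence of statements but you want to send it in groups, you first need to decide where the group boundaries fall. The sketch below shows one simple choice, fixed-size batches, using only the standard library. The helper `batched` is illustrative and not part of pyjelly, and plain tuples stand in for RDF triples.

```python
from itertools import islice


def batched(statements, size):
    """Yield lists of at most `size` statements from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(statements)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Plain tuples stand in for RDF triples here.
demo_triples = [("ex:sensor", "ex:temperature", value) for value in range(7)]
for batch in batched(demo_triples, 3):
    print(len(batch))
```

Each batch could then be added to a fresh RDFLib Graph and yielded from a generator like generate_sample_graphs above, so that every batch becomes one graph in the grouped stream.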

Serializing a stream of statements

If you have a generator that yields statements, you can serialize it into the Jelly format, as in the following example:

from pyjelly.integrations.rdflib.serialize import flat_stream_to_file
from rdflib import Literal, Namespace
import random


# Example generator that yields triples
def generate_sample_triples():
    ex = Namespace("http://example.org/")
    for _ in range(10):
        yield (ex.sensor, ex.temperature, Literal(random.random()))


output_file_name = "flat_output.jelly"

print(f"Streaming triples into {output_file_name}…")
sample_triples = generate_sample_triples()
with open(output_file_name, "wb") as out_file:
    flat_stream_to_file(sample_triples, out_file)
print("All done.")

The flat method transmits the data as a continuous sequence of individual statements (i.e., triples or quads), keeping the data simple and in its original order. For more precise control over frame serialization, you can use the lower-level API.

File extension support

You can generally omit the format="jelly" parameter if the file name ends in .jelly, because RDFLib will detect the format from the extension:

from rdflib import Graph
import pyjelly.integrations.rdflib

g = Graph()
g.parse("foaf.jelly")

Warning

Because of how format detection is implemented in RDFLib, this only works if you explicitly import pyjelly.integrations.rdflib, or if you have already passed format="jelly" to an earlier serialize() or parse() call.
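If you would rather not depend on auto-detection, one option is to always pass the format explicitly for .jelly files. The helper below is a hypothetical sketch (the name `parse_kwargs_for` is not part of pyjelly or RDFLib) that builds the keyword arguments from the file extension.

```python
from pathlib import Path


def parse_kwargs_for(path):
    """Return keyword arguments for Graph.parse() based on the file extension."""
    if Path(path).suffix.lower() == ".jelly":
        # Passing the format explicitly avoids relying on auto-detection.
        return {"format": "jelly"}
    return {}  # let RDFLib guess the format for other extensions


print(parse_kwargs_for("foaf.jelly"))
print(parse_kwargs_for("foaf.ttl"))
```

It would be used as `g.parse(path, **parse_kwargs_for(path))`. Note that a compressed file such as `data.jelly.gz` has the suffix `.gz`, so this sketch leaves it to the caller to decompress the file first.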