Getting started

This guide walks you through installing and working with pyjelly and RDFLib.

Installation (with RDFLib)

Install pyjelly from PyPI:

pip install pyjelly[rdflib]

Requirements

Python 3.10 or newer
Linux, macOS, or Windows
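
A quick way to confirm that an environment meets these requirements is a short check like the one below. It is only a sketch: check_environment is a hypothetical helper name, not part of pyjelly, and it only tests that the top-level pyjelly and rdflib packages can be found, not that they work correctly.

```python
import sys
from importlib.util import find_spec

MIN_PYTHON = (3, 10)  # minimum version listed in the requirements above


def check_environment(packages=("pyjelly", "rdflib")):
    """Report whether the interpreter is new enough and packages are found.

    Hypothetical helper for illustration; not part of pyjelly.
    """
    report = {"python_ok": sys.version_info[:2] >= MIN_PYTHON}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'missing or too old'}")
```

If either package is reported as missing, rerunning the pip command above in the same environment is the usual fix.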

Usage with RDFLib

Once pyjelly is installed, it integrates automatically with RDFLib through the standard RDFLib API.

Serializing a graph

To serialize a graph to the Jelly format:

from rdflib import Graph

g = Graph()
g.parse("https://www.w3.org/TR/vocab-ssn/integrated/examples/sunspots.ttl")
g.serialize(destination="foaf.jelly", format="jelly")

This creates a delimited Jelly stream using default options.

Including namespace declarations (prefixes)

By default, Jelly serializes only triples/quads. To also include namespace declarations (prefixes) in the output, enable the namespace_declarations option. Prefixes bound in RDFLib's namespace manager will then be written into the Jelly stream and restored on parsing.

from rdflib import Graph, Namespace, URIRef, Literal
from pyjelly.integrations.rdflib.serialize import SerializerOptions, StreamParameters

# Build a tiny graph and bind a prefix
g = Graph()
EX = Namespace("http://example.org/")
g.namespace_manager.bind("ex", EX)
g.add((EX.alice, URIRef("http://xmlns.com/foaf/0.1/name"), Literal("Alice")))

print("IN  namespaces:", dict(g.namespaces()))

# Enable namespace declarations in Jelly output
options = SerializerOptions(params=StreamParameters(namespace_declarations=True))

# Serialize with options
g.serialize("sample_test.jelly", format="jelly", options=options)

# Parse back and check namespaces
g_new = Graph()
g_new.parse("sample_test.jelly", format="jelly")
print("OUT namespaces:", dict(g_new.namespaces()))

Tip

For an existing graph you can (re)bind a prefix just before saving:

g.namespace_manager.bind("ex", EX, replace=True)

Parsing a graph

To load RDF data from a .jelly file:

from rdflib import Graph

g = Graph()
g.parse("foaf.jelly", format="jelly")

print("Parsed triples:")
for s, p, o in g:
    print(f"{s} {p} {o}")

RDFLib will reconstruct the graph from the Jelly file.

Parsing a stream of graphs

You can process a Jelly stream as a stream of graphs. A Jelly file consists of "frames" (batches of statements) – we can load each frame as a separate RDFLib graph.

In this example, we use a dataset of weather measurements. We count the number of triples in each graph:

import gzip
import urllib.request

from pyjelly.integrations.rdflib.parse import parse_jelly_grouped

# Dataset: Katrina weather measurements (10k graphs)
# Documentation: https://w3id.org/riverbench/datasets/lod-katrina/dev
url = "https://w3id.org/riverbench/datasets/lod-katrina/dev/files/jelly_10K.jelly.gz"

# Load, uncompress .gz file, and pass to Jelly parser, all in a streaming manner
with (
    urllib.request.urlopen(url) as response,
    gzip.open(response) as jelly_stream,
):
    graphs = parse_jelly_grouped(jelly_stream)
    for i, graph in enumerate(graphs):
        print(f"Graph {i} in the stream has {len(graph)} triples")
        # Limit to 50 graphs for demonstration -- the rest will not be parsed
        if i >= 50:
            break

Each iteration yields a single graph, so large datasets can be processed without loading the whole stream into memory.
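
Because the graphs are produced lazily, standard iterator tools can bound how much of the stream is read. The sketch below uses itertools.islice in place of the manual break. To stay runnable without network access, it uses a hypothetical stand-in generator, fake_graph_stream, instead of parse_jelly_grouped; each yielded list stands in for one graph.

```python
from itertools import islice

produced = []  # records which items the generator actually built


def fake_graph_stream(total):
    """Hypothetical stand-in for parse_jelly_grouped: lazily yields 'graphs'.

    Each 'graph' here is just a list of (s, p, o) tuples.
    """
    for i in range(total):
        produced.append(i)
        yield [
            (f"sensor{i}", "temperature", 20 + i),
            (f"sensor{i}", "humidity", 50 + i),
        ]


def triple_counts(graphs, limit):
    """Return the number of triples in each of the first `limit` graphs."""
    return [len(graph) for graph in islice(graphs, limit)]


counts = triple_counts(fake_graph_stream(10_000), limit=5)
print(counts)         # one entry per graph consumed
print(len(produced))  # how many graphs the generator actually built
```

Only the requested graphs are ever built, which is the same property the break in the example above relies on.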

Parsing a stream of triples

You can also process a Jelly stream as a flat stream of triples.

We look through a fragment of Denmark's OpenStreetMap to find all city names:

import gzip
import urllib.request

from pyjelly.integrations.rdflib.parse import parse_jelly_flat, Triple
from rdflib import URIRef

# Dataset: OpenStreetMap data for Denmark (first 10k objects)
# Documentation: https://w3id.org/riverbench/datasets/osm2rdf-denmark/dev
url = (
    "https://w3id.org/riverbench/datasets/osm2rdf-denmark/dev/files/jelly_10K.jelly.gz"
)

# We are looking for city names in the dataset
predicate_to_look_for = URIRef("https://www.openstreetmap.org/wiki/Key:addr:city")
city_names = set()

with (
    urllib.request.urlopen(url) as response,
    gzip.open(response) as jelly_stream,
):
    for event in parse_jelly_flat(jelly_stream):
        if isinstance(event, Triple):  # we are only interested in triples
            if event.p == predicate_to_look_for:
                city_names.add(event.o)

print(f"Found {len(city_names)} unique city names in the dataset.")
print("Up to 10 city names (set order is arbitrary):")
for city in list(city_names)[:10]:
    print(f"- {city}")

parse_jelly_flat returns a generator of stream events (i.e., the parsed statements). This approach lets you process the file efficiently, triple by triple, and build custom aggregations from the stream.
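
As one example of such an aggregation, the sketch below counts how often each predicate occurs in a flat stream. It relies only on triple events exposing a p attribute, as the Triple events in the example above do. FakeTriple and count_predicates are hypothetical names used for illustration so that the snippet runs without downloading a dataset; with pyjelly, one would presumably pass the generator returned by parse_jelly_flat and use Triple as the type to filter on.

```python
from collections import Counter
from typing import NamedTuple


class FakeTriple(NamedTuple):
    """Hypothetical stand-in for a triple event (assumed fields s, p, o)."""

    s: str
    p: str
    o: str


def count_predicates(events, triple_type):
    """Count predicate occurrences, skipping events that are not triples."""
    counts = Counter()
    for event in events:
        if isinstance(event, triple_type):
            counts[event.p] += 1
    return counts


events = iter([
    FakeTriple("node1", "addr:city", "Aarhus"),
    FakeTriple("node2", "addr:city", "Odense"),
    ("ex", "http://example.org/"),  # a non-triple event, ignored
    FakeTriple("node2", "addr:street", "Vestergade"),
])

counts = count_predicates(events, FakeTriple)
print(counts.most_common(1))
```

The counter is updated one event at a time, so memory use depends on the number of distinct predicates rather than on the size of the stream.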

Serializing a stream of graphs

If you have a generator that yields graphs, you can serialize it directly into the Jelly format:

from pyjelly.integrations.rdflib.serialize import grouped_stream_to_file

from rdflib import Graph, Literal, Namespace
import random


def generate_sample_graphs():
    ex = Namespace("http://example.org/")
    for _ in range(10):
        g = Graph()
        g.add((ex.sensor, ex.temperature, Literal(random.random())))
        g.add((ex.sensor, ex.humidity, Literal(random.random())))
        yield g


output_file_name = "output.jelly"

print(f"Streaming graphs into {output_file_name}…")
sample_graphs = generate_sample_graphs()
with open(output_file_name, "wb") as out_file:
    grouped_stream_to_file(sample_graphs, out_file)
print("All done.")

This method transmits logically grouped data while preserving its original division. For finer control over frame serialization, you can use the lower-level API.

Serializing a stream of statements

If you have a generator that yields statements, you can serialize it directly into the Jelly format:

from pyjelly.integrations.rdflib.serialize import flat_stream_to_file
from rdflib import Literal, Namespace
import random


# Example generator that yields triple statements
def generate_sample_triples():
    ex = Namespace("http://example.org/")
    for _ in range(10):
        yield (ex.sensor, ex.temperature, Literal(random.random()))


output_file_name = "flat_output.jelly"

print(f"Streaming triples into {output_file_name}…")
sample_triples = generate_sample_triples()
with open(output_file_name, "wb") as out_file:
    flat_stream_to_file(sample_triples, out_file)
print("All done.")

The flat method transmits the data as a continuous sequence of statements, keeping it simple and ordered. For finer control over frame serialization, you can use the lower-level API.
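
When statements arrive as a flat stream but should be written in logical groups, one option is to batch them before serializing. The sketch below is a generic standard-library batching helper, not part of pyjelly; batched is a hypothetical name (Python 3.12 ships a similar itertools.batched), and plain tuples stand in for RDFLib statements.

```python
from itertools import islice


def batched(statements, size):
    """Lazily yield lists of up to `size` statements from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(statements)
    # islice pulls at most `size` items; an empty list ends the loop
    while batch := list(islice(iterator, size)):
        yield batch


# Stand-in statements: plain tuples instead of RDFLib terms
statements = ((f"sensor{i}", "temperature", 20 + i) for i in range(7))
batches = list(batched(statements, size=3))
print([len(b) for b in batches])
```

Each batch could then be added to its own Graph and yielded to grouped_stream_to_file, as in the grouped example earlier. Whether that grouping suits a given use case is a design decision this sketch does not settle.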

File extension support

You can generally omit the format="jelly" parameter if the file ends in .jelly – RDFLib will auto-detect the format:

from rdflib import Graph
import pyjelly.integrations.rdflib

g = Graph()
g.parse("foaf.jelly")

Warning

Unfortunately, the way this is implemented in RDFLib is a bit wonky, so it only works if you explicitly import pyjelly.integrations.rdflib or have already passed format="jelly" to a serialize() or parse() call.