Getting started

This guide walks you through installing and working with pyjelly and RDFLib.

Installation (with RDFLib)

Install pyjelly from PyPI:

pip install pyjelly[rdflib]

Requirements

  • Python 3.9 or newer
  • Linux, macOS, or Windows
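
Before installing, it can help to confirm that the interpreter meets the version requirement and to see whether the packages are already importable. The following is a minimal sketch using only the standard library; the helper names python_is_supported and is_installed are illustrative and are not part of pyjelly.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in the requirements above


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def is_installed(top_level_package):
    """Return True if a top-level package can be found without importing it."""
    return importlib.util.find_spec(top_level_package) is not None


if __name__ == "__main__":
    print(f"Python supported: {python_is_supported()}")
    for name in ("pyjelly", "rdflib"):
        print(f"{name} installed: {is_installed(name)}")
```

The check looks only at top-level package names, because find_spec raises an error for a dotted name whose parent package is missing.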

Usage with RDFLib

Once you install pyjelly, it integrates automatically with RDFLib through the standard RDFLib API.

Serializing a graph

To serialize a graph to the Jelly format:

from rdflib import Graph

g = Graph()
g.parse("http://xmlns.com/foaf/spec/index.rdf")
g.serialize(destination="foaf.jelly", format="jelly")

This creates a delimited Jelly stream using default options.

Parsing a graph

To load RDF data from a .jelly file:

from rdflib import Graph

g = Graph()
g.parse("foaf.jelly", format="jelly")

print("Parsed triples:")
for s, p, o in g:
    print(f"{s} {p} {o}")

RDFLib will reconstruct the graph from the Jelly file.

Parsing a stream of graphs

You can process a Jelly stream as a stream of graphs. A Jelly file consists of "frames" (batches of statements) – we can load each frame as a separate RDFLib graph.

In this example, we use a dataset of weather measurements. We count the number of triples in each graph:

import gzip
import urllib.request

from pyjelly.integrations.rdflib.parse import parse_jelly_grouped

# Dataset: Katrina weather measurements (10k graphs)
# Documentation: https://w3id.org/riverbench/datasets/lod-katrina/dev
url = "https://w3id.org/riverbench/datasets/lod-katrina/dev/files/jelly_10K.jelly.gz"

# Load, uncompress .gz file, and pass to Jelly parser, all in a streaming manner
with (
    urllib.request.urlopen(url) as response,
    gzip.open(response) as jelly_stream,
):
    graphs = parse_jelly_grouped(jelly_stream)
    for i, graph in enumerate(graphs):
        print(f"Graph {i} in the stream has {len(graph)} triples")
        # Limit to 50 graphs for demonstration -- the rest will not be parsed
        if i >= 49:  # i is zero-based, so index 49 is the 50th graph
            break

Each iteration yields a single graph, so you can process large datasets efficiently without exhausting memory.
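
The early exit in the loop works because the graphs are produced lazily. As an alternative to a manual counter and break, itertools.islice from the standard library can cap how many items are pulled from any generator. The sketch below uses a hypothetical stand-in generator, fake_graph_stream, in place of parse_jelly_grouped(jelly_stream) so that it runs without network access; lists of tuples play the role of graphs.

```python
from itertools import islice

produced = []  # records which items the generator actually built


def fake_graph_stream(total):
    """Hypothetical stand-in for parse_jelly_grouped(jelly_stream)."""
    for i in range(total):
        produced.append(i)
        # Each "graph" is just a list of (s, p, o) tuples here.
        yield [(f"s{i}", "p", j) for j in range(i % 3 + 1)]


# Take only the first 5 "graphs"; the remaining ones are never generated.
sizes = [len(graph) for graph in islice(fake_graph_stream(10_000), 5)]
print(sizes)
print(f"Items generated: {len(produced)}")
```

With a real stream, the same pattern would wrap the parser's generator, for example islice(graphs, 50) in the example above.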

Parsing a stream of triples

You can also process a Jelly stream as a flat stream of triples.

We look through a fragment of Denmark's OpenStreetMap to find all city names:

import gzip
import urllib.request

from pyjelly.integrations.rdflib.parse import parse_jelly_flat, Triple
from rdflib import URIRef

# Dataset: OpenStreetMap data for Denmark (first 10k objects)
# Documentation: https://w3id.org/riverbench/datasets/osm2rdf-denmark/dev
url = (
    "https://w3id.org/riverbench/datasets/osm2rdf-denmark/dev/files/jelly_10K.jelly.gz"
)

# We are looking for city names in the dataset
predicate_to_look_for = URIRef("https://www.openstreetmap.org/wiki/Key:addr:city")
city_names = set()

with (
    urllib.request.urlopen(url) as response,
    gzip.open(response) as jelly_stream,
):
    for event in parse_jelly_flat(jelly_stream):
        if isinstance(event, Triple):  # we are only interested in triples
            if event.p == predicate_to_look_for:
                city_names.add(event.o)

print(f"Found {len(city_names)} unique city names in the dataset.")
print("Up to 10 city names (arbitrary order):")
for city in list(city_names)[:10]:
    print(f"- {city}")

parse_jelly_flat returns a generator of stream events (i.e., the parsed statements). This approach lets you process the file efficiently, triple by triple, and build custom aggregations from the stream.
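
As one more example of a custom aggregation, the sketch below counts how often each predicate occurs while keeping only a counter in memory. To stay runnable offline, it uses a hypothetical Statement named tuple with s, p, and o fields as a stand-in for the events yielded by parse_jelly_flat; with real data you would iterate over the parser output and filter with isinstance(event, Triple), as in the city-name example.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Statement(NamedTuple):
    """Hypothetical stand-in for a parsed triple event."""

    s: str
    p: str
    o: str


def sample_events() -> Iterator[Statement]:
    """Yield a few made-up statements one at a time."""
    yield Statement("ex:a", "ex:city", "Aarhus")
    yield Statement("ex:b", "ex:city", "Odense")
    yield Statement("ex:a", "ex:street", "Main St")
    yield Statement("ex:c", "ex:city", "Aarhus")


def count_predicates(events: Iterable[Statement]) -> Counter:
    """Consume the stream once, keeping only per-predicate counts."""
    counts: Counter = Counter()
    for event in events:
        counts[event.p] += 1
    return counts


predicate_counts = count_predicates(sample_events())
for predicate, n in predicate_counts.most_common():
    print(f"{predicate}: {n}")
```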

Serializing a stream of graphs

If you have a generator that yields graphs, you can serialize it to the Jelly format:

from pyjelly.integrations.rdflib.serialize import grouped_stream_to_file

from rdflib import Graph, Literal, Namespace
import random


def generate_sample_graphs():
    ex = Namespace("http://example.org/")
    for _ in range(10):
        g = Graph()
        g.add((ex.sensor, ex.temperature, Literal(random.random())))
        g.add((ex.sensor, ex.humidity, Literal(random.random())))
        yield g


output_file_name = "output.jelly"

print(f"Streaming graphs into {output_file_name}…")
sample_graphs = generate_sample_graphs()
with open(output_file_name, "wb") as out_file:
    grouped_stream_to_file(sample_graphs, out_file)
print("All done.")

This method transmits logically grouped data while preserving its original division into graphs. For more precise control over frame serialization, you can use the lower-level API.

Serializing a stream of statements

If you have a generator that yields statements, you can serialize it to the Jelly format:

from pyjelly.integrations.rdflib.serialize import flat_stream_to_file
from rdflib import Literal, Namespace
import random


# example generator with triples statements
def generate_sample_triples():
    ex = Namespace("http://example.org/")
    for _ in range(10):
        yield (ex.sensor, ex.temperature, Literal(random.random()))


output_file_name = "flat_output.jelly"

print(f"Streaming triples into {output_file_name}…")
sample_triples = generate_sample_triples()
with open(output_file_name, "wb") as out_file:
    flat_stream_to_file(sample_triples, out_file)
print("All done.")

The flat method transmits the data as a continuous sequence of statements, keeping it simple and ordered. For more precise control over frame serialization, you can use the lower-level API.
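
If a flat source should instead be written in grouped form, one option is to batch the statements first. The sketch below is standard-library only and uses plain tuples as stand-ins for triples; the batched helper is illustrative and is not a pyjelly function. Under that assumption, each batch could then be added to its own Graph and yielded to grouped_stream_to_file.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Statement = Tuple[str, str, float]  # plain-tuple stand-in for an RDF triple


def batched(statements: Iterable[Statement], size: int) -> Iterator[List[Statement]]:
    """Yield consecutive lists of at most `size` statements, lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(statements)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


readings = (("ex:sensor", "ex:temperature", float(i)) for i in range(7))
batch_sizes = [len(batch) for batch in batched(readings, 3)]
print(batch_sizes)
```

Only one batch is held in memory at a time, so the source generator is still consumed in a streaming manner.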

File extension support

You can generally omit the format="jelly" parameter if the file ends in .jelly – RDFLib will auto-detect the format:

from rdflib import Graph
import pyjelly.integrations.rdflib

g = Graph()
g.parse("foaf.jelly")

Warning

Unfortunately, the way RDFLib implements this is a bit wonky: auto-detection only works if you explicitly import pyjelly.integrations.rdflib, or if you have already passed format="jelly" to an earlier serialize() or parse() call.
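
If you would rather not depend on auto-detection, one option is to derive the format from the file name yourself and pass it explicitly. The helper jelly_format_for below is hypothetical and standard-library only: it returns "jelly" for paths ending in .jelly and None otherwise.

```python
from pathlib import Path
from typing import Optional, Union


def jelly_format_for(path: Union[str, Path]) -> Optional[str]:
    """Return "jelly" for *.jelly files (case-insensitive), else None."""
    return "jelly" if Path(path).suffix.lower() == ".jelly" else None


for name in ("foaf.jelly", "data/FOAF.JELLY", "foaf.ttl", "foaf.jelly.gz"):
    print(f"{name!r} -> {jelly_format_for(name)!r}")
```

A call such as g.parse(path, format=jelly_format_for(path)) would then request the Jelly parser by name for .jelly files. Note that a compressed file such as foaf.jelly.gz is not matched, since it has to be decompressed before parsing, as in the streaming examples.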