Getting started
This guide shows how to install pyjelly and prepare your environment for use with RDFLib.
Installation (with RDFLib)
Install pyjelly from PyPI:
pip install pyjelly[rdflib]
pyjelly requires Python 3.9 or newer and works on all major platforms (Linux, macOS, Windows).
Usage with RDFLib
Once installed, pyjelly integrates with RDFLib automatically. You can immediately serialize and parse .jelly files using the standard RDFLib API.
Serializing a graph
To serialize a graph to the Jelly format:
from rdflib import Graph
g = Graph()
g.parse("http://xmlns.com/foaf/spec/index.rdf")
g.serialize(destination="foaf.jelly", format="jelly")
This creates a delimited Jelly stream using default options.
Parsing a graph
To load RDF data from a .jelly file:
from rdflib import Graph
g = Graph()
g.parse("foaf.jelly", format="jelly")
print("Parsed triples:")
for s, p, o in g:
    print(f"{s} {p} {o}")
RDFLib will reconstruct the graph from the Jelly file.
Parsing a stream of graphs
You can process a Jelly stream as a stream of graphs. A Jelly file consists of "frames" (batches of statements) – we can load each frame as a separate RDFLib graph.
In this example, we use a dataset of weather measurements, which is an RDF graph stream. We count the number of triples in each graph:
import gzip
import urllib.request
from pyjelly.integrations.rdflib.parse import parse_jelly_grouped
# Dataset: Katrina weather measurements (10k graphs)
# Documentation: https://w3id.org/riverbench/datasets/lod-katrina/dev
url = "https://w3id.org/riverbench/datasets/lod-katrina/dev/files/jelly_10K.jelly.gz"
# Load, uncompress .gz file, and pass to Jelly parser, all in a streaming manner
with (
    urllib.request.urlopen(url) as response,
    gzip.open(response) as jelly_stream,
):
    graphs = parse_jelly_grouped(jelly_stream)
    for i, graph in enumerate(graphs):
        print(f"Graph {i} in the stream has {len(graph)} triples")
Because parse_jelly_grouped returns a generator, each iteration yields one graph, so memory usage stays bounded to the current frame. This makes it practical to process large datasets and live streams.
Parsing a stream of triples
You can also process a Jelly stream as a flat stream of triples.
In this more complex example, we look through a fragment of Denmark's OpenStreetMap to find all city names:
import gzip
import urllib.request
from pyjelly.integrations.rdflib.parse import parse_jelly_flat, Triple
from rdflib import URIRef
# Dataset: OpenStreetMap data for Denmark (first 10k objects)
# Documentation: https://w3id.org/riverbench/datasets/osm2rdf-denmark/dev
url = (
    "https://w3id.org/riverbench/datasets/osm2rdf-denmark/dev/files/jelly_10K.jelly.gz"
)
# We are looking for city names in the dataset
predicate_to_look_for = URIRef("https://www.openstreetmap.org/wiki/Key:addr:city")
city_names = set()
with (
    urllib.request.urlopen(url) as response,
    gzip.open(response) as jelly_stream,
):
    for event in parse_jelly_flat(jelly_stream):
        if isinstance(event, Triple):  # we are only interested in triples
            if event.p == predicate_to_look_for:
                city_names.add(event.o)
print(f"Found {len(city_names)} unique city names in the dataset.")
print("10 random city names:")
for city in list(city_names)[:10]:
    print(f"- {city}")
parse_jelly_flat returns a generator of stream events (i.e., the parsed statements). This lets you process the file triple by triple and build custom aggregations from the stream.
Serializing a stream of graphs
If you have a generator that yields graphs, you can serialize it into the Jelly format, as in the following example:
from pyjelly.integrations.rdflib.serialize import grouped_stream_to_file
from rdflib import Graph, Literal, Namespace
import random
def generate_sample_graphs():
    ex = Namespace("http://example.org/")
    for _ in range(10):
        g = Graph()
        g.add((ex.sensor, ex.temperature, Literal(random.random())))
        g.add((ex.sensor, ex.humidity, Literal(random.random())))
        yield g
output_file_name = "output.jelly"
print(f"Streaming graphs into {output_file_name}…")
sample_graphs = generate_sample_graphs()
with open(output_file_name, "wb") as out_file:
    grouped_stream_to_file(sample_graphs, out_file)
print("All done.")
This method transmits logically grouped data while preserving its original division into graphs. For more precise control over frame serialization, you can use the lower-level API.
Serializing a stream of statements
If you have a generator that yields statements, you can serialize it into the Jelly format, as in the following example:
from pyjelly.integrations.rdflib.serialize import flat_stream_to_file
from rdflib import Literal, Namespace
import random
# Example generator that yields triple statements
def generate_sample_triples():
    ex = Namespace("http://example.org/")
    for _ in range(10):
        yield (ex.sensor, ex.temperature, Literal(random.random()))
output_file_name = "flat_output.jelly"
print(f"Streaming triples into {output_file_name}…")
sample_triples = generate_sample_triples()
with open(output_file_name, "wb") as out_file:
    flat_stream_to_file(sample_triples, out_file)
print("All done.")
The flat method transmits the data as a continuous sequence of individual statements (i.e., triples or quads), keeping the data simple and in its original order. For more precise control over frame serialization, you can use the lower-level API.
File extension support
You can generally omit the format="jelly" parameter if the file name ends in .jelly; RDFLib will auto-detect the format.
Warning
Unfortunately, the way this is implemented in RDFLib is a bit wonky, so it will only work if you explicitly import pyjelly.integrations.rdflib, or if you have already used format="jelly" in an earlier serialize() or parse() call.