Getting started
This guide walks you through installing and working with pyjelly and RDFLib.
Installation (with RDFLib)
Install pyjelly with the RDFLib extra from PyPI:
pip install pyjelly[rdflib]
Requirements
- Python 3.9 or newer
- Linux, macOS, or Windows
Usage with RDFLib
Once you install pyjelly, it integrates automatically with RDFLib through the standard RDFLib API.
Serializing a graph
To serialize a graph to the Jelly format:
from rdflib import Graph
g = Graph()
g.parse("http://xmlns.com/foaf/spec/index.rdf")
g.serialize(destination="foaf.jelly", format="jelly")
This creates a delimited Jelly stream using default options.
Including namespace declarations (prefixes)
By default, Jelly serializes only triples/quads. To also include namespace declarations (prefixes) in the output, enable the namespace_declarations option.
Prefixes bound in RDFLib's namespace manager will then be written into the Jelly stream and restored on parsing.
from rdflib import Graph, Namespace, URIRef, Literal
from pyjelly.integrations.rdflib.serialize import SerializerOptions, StreamParameters
# Build a tiny graph and bind a prefix
g = Graph()
EX = Namespace("http://example.org/")
g.namespace_manager.bind("ex", EX)
g.add((EX.alice, URIRef("http://xmlns.com/foaf/0.1/name"), Literal("Alice")))
print("IN namespaces:", dict(g.namespaces()))
# Enable namespace declarations in Jelly output
options = SerializerOptions(params=StreamParameters(namespace_declarations=True))
# Serialize with options
g.serialize("sample_test.jelly", format="jelly", options=options)
# Parse back and check namespaces
g_new = Graph()
g_new.parse("sample_test.jelly", format="jelly")
print("OUT namespaces:", dict(g_new.namespaces()))
Tip
For an existing graph, you can (re)bind a prefix just before saving.
Parsing a graph
To load RDF data from a .jelly file:
from rdflib import Graph
g = Graph()
g.parse("foaf.jelly", format="jelly")
print("Parsed triples:")
for s, p, o in g:
    print(f"{s} {p} {o}")
RDFLib will reconstruct the graph from the Jelly file.
Parsing a stream of graphs
You can process a Jelly stream as a stream of graphs. A Jelly file consists of "frames" (batches of statements) – we can load each frame as a separate RDFLib graph.
In this example, we use a dataset of weather measurements. We count the number of triples in each graph:
import gzip
import urllib.request
from pyjelly.integrations.rdflib.parse import parse_jelly_grouped
# Dataset: Katrina weather measurements (10k graphs)
# Documentation: https://w3id.org/riverbench/datasets/lod-katrina/dev
url = "https://w3id.org/riverbench/datasets/lod-katrina/dev/files/jelly_10K.jelly.gz"
# Load, uncompress .gz file, and pass to Jelly parser, all in a streaming manner
with (
    urllib.request.urlopen(url) as response,
    gzip.open(response) as jelly_stream,
):
    graphs = parse_jelly_grouped(jelly_stream)
    for i, graph in enumerate(graphs):
        print(f"Graph {i} in the stream has {len(graph)} triples")
        # Limit to 50 graphs for demonstration -- the rest will not be parsed
        if i >= 49:
            break
Each iteration receives only one graph, so you can process large datasets efficiently without exhausting memory.
Parsing a stream of triples
You can also process a Jelly stream as a flat stream of triples.
Here we scan a fragment of the OpenStreetMap data for Denmark to find all city names:
import gzip
import urllib.request
from pyjelly.integrations.rdflib.parse import parse_jelly_flat, Triple
from rdflib import URIRef
# Dataset: OpenStreetMap data for Denmark (first 10k objects)
# Documentation: https://w3id.org/riverbench/datasets/osm2rdf-denmark/dev
url = (
"https://w3id.org/riverbench/datasets/osm2rdf-denmark/dev/files/jelly_10K.jelly.gz"
)
# We are looking for city names in the dataset
predicate_to_look_for = URIRef("https://www.openstreetmap.org/wiki/Key:addr:city")
city_names = set()
with (
    urllib.request.urlopen(url) as response,
    gzip.open(response) as jelly_stream,
):
    for event in parse_jelly_flat(jelly_stream):
        if isinstance(event, Triple):  # we are only interested in triples
            if event.p == predicate_to_look_for:
                city_names.add(event.o)
print(f"Found {len(city_names)} unique city names in the dataset.")
print("10 example city names:")
for city in list(city_names)[:10]:
    print(f"- {city}")
parse_jelly_flat returns a generator of stream events (i.e., parsed statements). This lets you process the file statement by statement and build custom aggregations from the stream.
Serializing a stream of graphs
If you have a generator that yields graphs, you can easily serialize it into the Jelly format:
from pyjelly.integrations.rdflib.serialize import grouped_stream_to_file
from rdflib import Graph, Literal, Namespace
import random
def generate_sample_graphs():
    ex = Namespace("http://example.org/")
    for _ in range(10):
        g = Graph()
        g.add((ex.sensor, ex.temperature, Literal(random.random())))
        g.add((ex.sensor, ex.humidity, Literal(random.random())))
        yield g
output_file_name = "output.jelly"
print(f"Streaming graphs into {output_file_name}…")
sample_graphs = generate_sample_graphs()
with open(output_file_name, "wb") as out_file:
    grouped_stream_to_file(sample_graphs, out_file)
print("All done.")
This method transmits logically grouped data while preserving its original division. For more precise control over frame serialization, you can use the lower-level API.
Serializing a stream of statements
If you have a generator that yields statements, you can easily serialize it into the Jelly format:
from pyjelly.integrations.rdflib.serialize import flat_stream_to_file
from rdflib import Literal, Namespace
import random
# Example generator yielding triples
def generate_sample_triples():
    ex = Namespace("http://example.org/")
    for _ in range(10):
        yield (ex.sensor, ex.temperature, Literal(random.random()))
output_file_name = "flat_output.jelly"
print(f"Streaming triples into {output_file_name}…")
sample_triples = generate_sample_triples()
with open(output_file_name, "wb") as out_file:
    flat_stream_to_file(sample_triples, out_file)
print("All done.")
The flat method transmits the data as a continuous sequence of statements, keeping it simple and ordered. For more precise control over frame serialization, you can use the lower-level API.
File extension support
You can generally omit the format="jelly" parameter if the file ends in .jelly, and RDFLib will auto-detect the format.
Warning
Unfortunately, the way this is implemented in RDFLib is a bit wonky, so it only works if you explicitly import pyjelly.integrations.rdflib, or if you have already used format="jelly" in an earlier serialize() or parse() call.