Getting started
This guide walks you through installing and working with pyjelly and RDFLib.
Installation (with RDFLib)
Install pyjelly from PyPI, including the RDFLib extra:
pip install pyjelly[rdflib]
Requirements
- Python 3.9 or newer
- Linux, macOS, or Windows
Usage with RDFLib
Once pyjelly is installed, it integrates automatically with RDFLib through the standard RDFLib API.
Serializing a graph
To serialize a graph to the Jelly format:
from rdflib import Graph
g = Graph()
g.parse("http://xmlns.com/foaf/spec/index.rdf")
g.serialize(destination="foaf.jelly", format="jelly")
This creates a delimited Jelly stream using default options.
Parsing a graph
To load RDF data from a .jelly file:
from rdflib import Graph
g = Graph()
g.parse("foaf.jelly", format="jelly")
print("Parsed triples:")
for s, p, o in g:
    print(f"{s} {p} {o}")
RDFLib will reconstruct the graph from the Jelly file.
Parsing a stream of graphs
You can process a Jelly stream as a stream of graphs. A Jelly file consists of "frames" (batches of statements) – we can load each frame as a separate RDFLib graph.
In this example, we use a dataset of weather measurements. We count the number of triples in each graph:
import gzip
import urllib.request
from pyjelly.integrations.rdflib.parse import parse_jelly_grouped
# Dataset: Katrina weather measurements (10k graphs)
# Documentation: https://w3id.org/riverbench/datasets/lod-katrina/dev
url = "https://w3id.org/riverbench/datasets/lod-katrina/dev/files/jelly_10K.jelly.gz"
# Load, uncompress .gz file, and pass to Jelly parser, all in a streaming manner
with (
    urllib.request.urlopen(url) as response,
    gzip.open(response) as jelly_stream,
):
    graphs = parse_jelly_grouped(jelly_stream)
    for i, graph in enumerate(graphs):
        print(f"Graph {i} in the stream has {len(graph)} triples")
        # Limit to 50 graphs for demonstration -- the rest will not be parsed
        if i >= 49:
            break
Each iteration receives only one graph, allowing for processing large datasets efficiently, without exhausting memory.
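The early exit is cheap because the graphs are produced lazily. The sketch below shows the same pattern with itertools.islice and a stand-in generator; fake_graph_stream is hypothetical and does not use pyjelly. It records how many items were actually generated.

```python
from itertools import islice

produced = []


def fake_graph_stream(total):
    """Hypothetical stand-in for a lazy stream of graphs (not pyjelly)."""
    for i in range(total):
        produced.append(i)  # record that this item was actually generated
        yield f"graph-{i}"


# Take only the first 5 items; the generator is not advanced past them
first_five = list(islice(fake_graph_stream(10_000), 5))

print(first_five)
print(f"Items generated: {len(produced)}")
```

With a real stream, islice(parse_jelly_grouped(jelly_stream), 50) should behave the same way, assuming the parser returns a regular generator.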
Parsing a stream of triples
You can also process a Jelly stream as a flat stream of triples.
We look through a fragment of Denmark's OpenStreetMap to find all city names:
import gzip
import urllib.request
from pyjelly.integrations.rdflib.parse import parse_jelly_flat, Triple
from rdflib import URIRef
# Dataset: OpenStreetMap data for Denmark (first 10k objects)
# Documentation: https://w3id.org/riverbench/datasets/osm2rdf-denmark/dev
url = (
    "https://w3id.org/riverbench/datasets/osm2rdf-denmark/dev/files/jelly_10K.jelly.gz"
)
# We are looking for city names in the dataset
predicate_to_look_for = URIRef("https://www.openstreetmap.org/wiki/Key:addr:city")
city_names = set()
with (
    urllib.request.urlopen(url) as response,
    gzip.open(response) as jelly_stream,
):
    for event in parse_jelly_flat(jelly_stream):
        if isinstance(event, Triple):  # we are only interested in triples
            if event.p == predicate_to_look_for:
                city_names.add(event.o)
print(f"Found {len(city_names)} unique city names in the dataset.")
print("10 random city names:")
for city in list(city_names)[:10]:
    print(f"- {city}")
parse_jelly_flat returns a generator of stream events (i.e., parsed statements). This lets you process the file triple by triple and build custom aggregations from the stream.
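As an illustration of a custom aggregation, the sketch below counts how often each predicate occurs in a stream of events. To stay runnable without pyjelly or a download, it uses hypothetical stand-ins: the Triple and OtherEvent namedtuples and fake_event_stream are assumptions made for this example, not pyjelly classes.

```python
from collections import Counter, namedtuple

# Hypothetical stand-ins for stream events; pyjelly's own classes are not used here
Triple = namedtuple("Triple", ["s", "p", "o"])
OtherEvent = namedtuple("OtherEvent", ["info"])


def fake_event_stream():
    """Yield a few hand-written events, one at a time."""
    yield OtherEvent("not a statement")
    yield Triple("node1", "addr:city", "Aarhus")
    yield Triple("node1", "addr:street", "Hovedgaden")
    yield Triple("node2", "addr:city", "Odense")


predicate_counts = Counter()
for event in fake_event_stream():
    if isinstance(event, Triple):  # skip anything that is not a triple
        predicate_counts[event.p] += 1

print(predicate_counts.most_common())
```

Because the counter is updated one event at a time, memory use depends on the number of distinct predicates rather than on the size of the stream.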
Serializing a stream of graphs
If you have a generator object containing graphs, you can easily serialize it into the Jelly format:
from pyjelly.integrations.rdflib.serialize import grouped_stream_to_file
from rdflib import Graph, Literal, Namespace
import random
def generate_sample_graphs():
    ex = Namespace("http://example.org/")
    for _ in range(10):
        g = Graph()
        g.add((ex.sensor, ex.temperature, Literal(random.random())))
        g.add((ex.sensor, ex.humidity, Literal(random.random())))
        yield g
output_file_name = "output.jelly"
print(f"Streaming graphs into {output_file_name}…")
sample_graphs = generate_sample_graphs()
with open(output_file_name, "wb") as out_file:
    grouped_stream_to_file(sample_graphs, out_file)
print("All done.")
This method allows for transmitting logically grouped data, preserving their original division. For more precise control over frame serialization, you can use the lower-level API.
Serializing a stream of statements
If you have a generator object containing statements, you can easily serialize it into the Jelly format:
from pyjelly.integrations.rdflib.serialize import flat_stream_to_file
from rdflib import Literal, Namespace
import random
# Example generator that yields triples
def generate_sample_triples():
    ex = Namespace("http://example.org/")
    for _ in range(10):
        yield (ex.sensor, ex.temperature, Literal(random.random()))
output_file_name = "flat_output.jelly"
print(f"Streaming triples into {output_file_name}…")
sample_triples = generate_sample_triples()
with open(output_file_name, "wb") as out_file:
    flat_stream_to_file(sample_triples, out_file)
print("All done.")
The flat method transmits the data as a continuous sequence of statements, keeping it simple and ordered. For more precise control over frame serialization, you can use the lower-level API.
File extension support
You can generally omit the format="jelly" parameter if the file name ends in .jelly – RDFLib will auto-detect the format:
Warning
Unfortunately, because of the way this is implemented in RDFLib, auto-detection only works if you explicitly import pyjelly.integrations.rdflib, or if you have already used format="jelly" in a serialize() or parse() call.