API reference
pyjelly
Modules:
Name | Description |
---|---|
errors | |
integrations | |
jelly | |
options | |
parse | |
serialize | |
errors
Classes:
Name | Description |
---|---|
JellyConformanceError | Raised when Jelly conformance is violated. |
JellyAssertionError | Raised when a recommended assertion from the specification fails. |
JellyNotImplementedError | Raised when a future feature is not yet implemented. |
JellyConformanceError
Bases: Exception
Raised when Jelly conformance is violated.
JellyAssertionError
Bases: AssertionError
Raised when a recommended assertion from the specification fails.
JellyNotImplementedError
Bases: NotImplementedError
Raised when a future feature is not yet implemented.
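Because each error class extends a different built-in exception, callers can catch it either by its Jelly-specific name or by its built-in base. The sketch below uses local stand-in classes that only mirror the documented bases (it does not import pyjelly), and the `check_rows` helper and its message are hypothetical.

```python
# Stand-ins mirroring the documented bases; real code would import
# these names from pyjelly.errors instead of redefining them.
class JellyConformanceError(Exception):
    """Raised when Jelly conformance is violated."""


class JellyAssertionError(AssertionError):
    """Raised when a recommended assertion from the specification fails."""


class JellyNotImplementedError(NotImplementedError):
    """Raised when a future feature is not yet implemented."""


def check_rows(row_count: int) -> int:
    # Hypothetical validation helper, used only for illustration.
    if row_count == 0:
        raise JellyConformanceError("frame contains no rows")
    return row_count


try:
    check_rows(0)
except JellyConformanceError as exc:
    caught = str(exc)
```

Code that already handles `AssertionError` or `NotImplementedError` would also catch the corresponding Jelly error, since those are the documented bases.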
integrations
Modules:
Name | Description |
---|---|
generic | |
rdflib | |
generic
Modules:
Name | Description |
---|---|
generic_sink | |
parse | |
serialize | |
generic_sink
Classes:
Name | Description |
---|---|
BlankNode | Class for blank nodes, storing BN's identifier as a string. |
IRI | Class for IRIs, storing IRI as a string. |
Literal | Class for literals. |
Triple | Class for RDF triples. |
Quad | Class for RDF quads. |
Prefix | Class for generic namespace declaration. |
GenericStatementSink | |
BlankNode(identifier)
IRI(iri)
Literal(lex, langtag=None, datatype=None)
Class for literals.
Notes: Consists of a lexical form and an optional language tag and datatype. All parts of the literal are stored as strings.
Source code in pyjelly/integrations/generic/generic_sink.py
Triple
Quad
Prefix
GenericStatementSink(identifier=DefaultGraph)
Notes: _store preserves the order of statements.
Args: identifier (str, optional): Identifier for a sink. Defaults to DefaultGraph.
Attributes:
Name | Type | Description |
---|---|---|
is_triples_sink | bool | Check if the sink contains triples or quads. |
Source code in pyjelly/integrations/generic/generic_sink.py
is_triples_sink
Check if the sink contains triples or quads.
Returns: bool: True if the length of a statement is 3 (i.e., the sink holds triples).
parse
Classes:
Name | Description |
---|---|
GenericStatementSinkAdapter | Implement Adapter for generic statements. |
GenericTriplesAdapter | Triples adapted implementation for GenericStatementSink. |
GenericQuadsAdapter | Extends GenericQuadsBaseAdapter for QUADS physical type. |
GenericGraphsAdapter | Extends GenericQuadsBaseAdapter for GRAPHS physical type. |
Functions:
Name | Description |
---|---|
parse_triples_stream | Parse flat triple stream. |
parse_quads_stream | Parse flat quads stream. |
parse_jelly_grouped | Take a jelly file and return generators of generic statements sinks. |
parse_jelly_to_graph | Add statements from Generator to GenericStatementSink. |
parse_jelly_flat | Parse jelly file with FLAT logical type into a Generator of stream events. |
GenericStatementSinkAdapter(options, parsing_mode=ParsingMode.FLAT)
Bases: Adapter
Implement Adapter for generic statements.
Notes: Returns custom RDF terms expected by GenericStatementSink, handles namespace declarations, and quoted triples.
Args: Adapter (type): base Adapter class
Source code in pyjelly/parse/decode.py
GenericTriplesAdapter(options)
Bases: GenericStatementSinkAdapter
Triples adapted implementation for GenericStatementSink.
Args: GenericStatementSinkAdapter (type): base GenericStatementSink adapter implementation that handles terms and namespaces.
Source code in pyjelly/integrations/generic/parse.py
GenericQuadsAdapter(options)
Bases: GenericQuadsBaseAdapter
Extends GenericQuadsBaseAdapter for QUADS physical type.
Args: GenericQuadsBaseAdapter (type): quads adapter that handles base quads processing.
Source code in pyjelly/integrations/generic/parse.py
GenericGraphsAdapter(options)
Bases: GenericQuadsBaseAdapter
Extends GenericQuadsBaseAdapter for GRAPHS physical type.
Notes: introduces graph start/end, checks if graph exists.
Args: GenericQuadsBaseAdapter (type): quads adapter that handles base quads processing.
Raises: JellyConformanceError: raised if graph start message was not received.
Source code in pyjelly/integrations/generic/parse.py
parse_triples_stream(frames, options)
Parse flat triple stream.
Args:
frames (Iterable[jelly.RdfStreamFrame]): iterator over stream frames
options (ParserOptions): stream options
Yields: Generator[Iterable[Triple | Prefix]]: Generator of iterables of Triple or Prefix objects, one iterable per frame.
Source code in pyjelly/integrations/generic/parse.py
parse_quads_stream(frames, options)
Parse flat quads stream.
Args:
frames (Iterable[jelly.RdfStreamFrame]): iterator over stream frames
options (ParserOptions): stream options
Yields: Generator[Iterable[Quad | Prefix]]: Generator of iterables of Quad or Prefix objects, one iterable per frame.
Source code in pyjelly/integrations/generic/parse.py
parse_jelly_grouped(inp, sink_factory=lambda: GenericStatementSink(), *, logical_type_strict=False)
Take a jelly file and return generators of generic statements sinks.
Yields one generic statements sink per frame.
Args:
inp (IO[bytes]): input jelly buffered binary stream
sink_factory (Callable): lambda to construct a statement sink. By default, creates an empty in-memory GenericStatementSink.
logical_type_strict (bool): If True, validate the logical type in stream options and require a grouped logical type. Otherwise, only the physical type is used to route parsing.
Raises: NotImplementedError: is raised if a physical type is not implemented
Yields: Generator[GenericStatementSink]: returns generators for GenericStatementSink, regardless of stream type.
Source code in pyjelly/integrations/generic/parse.py
parse_jelly_to_graph(inp, sink_factory=lambda: GenericStatementSink())
Add statements from Generator to GenericStatementSink.
Args:
inp (IO[bytes]): input jelly stream.
sink_factory (Callable[[], GenericStatementSink]): factory to create statement sink. By default creates an empty in-memory GenericStatementSink. Has no division for datasets/graphs, utilizes the same underlying data structures.
Returns: GenericStatementSink: GenericStatementSink with statements.
Source code in pyjelly/integrations/generic/parse.py
parse_jelly_flat(inp, frames=None, options=None, *, logical_type_strict=False)
Parse jelly file with FLAT logical type into a Generator of stream events.
Args:
inp (IO[bytes]): input jelly buffered binary stream.
frames (Iterable[jelly.RdfStreamFrame] | None): jelly frames if read before.
options (ParserOptions | None): stream options if read before.
logical_type_strict (bool): If True, validate the logical type in stream options and require FLAT (TRIPLES/QUADS). Otherwise, only the physical type is used to route parsing.
Raises: NotImplementedError: if physical type is not supported
Yields: Generator[Statement | Prefix]: Generator of stream events
Source code in pyjelly/integrations/generic/parse.py
serialize
Classes:
Name | Description |
---|---|
GenericSinkTermEncoder | |
Functions:
Name | Description |
---|---|
triples_stream_frames | Serialize a GenericStatementSink into frames using physical type triples stream. |
quads_stream_frames | Serialize a GenericStatementSink into jelly frames using physical type quads stream. |
graphs_stream_frames | Serialize a GenericStatementSink into jelly frames as a stream of graphs. |
split_to_graphs | Split a generator of quads to graphs. |
guess_options | Guess the serializer options based on the store type. |
guess_stream | Return an appropriate stream implementation for the given options. |
grouped_stream_to_frames | Transform multiple GenericStatementSinks into Jelly frames. |
grouped_stream_to_file | Write stream of GenericStatementSink to a binary file. |
flat_stream_to_frames | Serialize a stream of raw GenericStatementSink's triples or quads into Jelly frames. |
flat_stream_to_file | Write Triple or Quad events to a binary file. |
GenericSinkTermEncoder(lookup_preset=None)
Bases: TermEncoder
Methods:
Name | Description |
---|---|
encode_spo | Encode term based on its GenericSink object. |
encode_graph | Encode graph term based on its GenericSink object. |
Source code in pyjelly/serialize/encode.py
encode_spo(term, slot, statement)
Encode term based on its GenericSink object.
Args:
term (object): term to encode
slot (Slot): its place in statement.
statement (Statement): Triple/Quad/GraphStart message to fill with terms.
Returns: Rows: encoded extra rows
Source code in pyjelly/integrations/generic/serialize.py
encode_graph(term, statement)
Encode graph term based on its GenericSink object.
Args:
term (object): term to encode
statement (HasGraph): Quad/GraphStart message to fill g_{} in.
Returns: Rows: encoded extra rows
Source code in pyjelly/integrations/generic/serialize.py
triples_stream_frames(stream, data)
Serialize a GenericStatementSink into frames using physical type triples stream.
Args:
stream (TripleStream): stream that specifies triples processing
data (GenericStatementSink | Generator[Triple]): GenericStatementSink/Statements to serialize.
Yields: Generator[jelly.RdfStreamFrame]: jelly frames.
Source code in pyjelly/integrations/generic/serialize.py
quads_stream_frames(stream, data)
Serialize a GenericStatementSink into jelly frames using physical type quads stream.
Args:
stream (QuadStream): stream that specifies quads processing
data (GenericStatementSink | Generator[Quad]): Dataset to serialize.
Yields: Generator[jelly.RdfStreamFrame]: jelly frames
Source code in pyjelly/integrations/generic/serialize.py
graphs_stream_frames(stream, data)
Serialize a GenericStatementSink into jelly frames as a stream of graphs.
Notes: If the flow is of DatasetsFrameFlow type, the whole dataset will be encoded into one frame. Graphs are generated from the GenericStatementSink by iterating over statements and yielding one new GenericStatementSink per sequence of quads with the same g term.
Args:
stream (GraphStream): stream that specifies graphs processing
data (GenericStatementSink | Generator[Quad]): Dataset to serialize.
Yields: Generator[jelly.RdfStreamFrame]: jelly frames
Source code in pyjelly/integrations/generic/serialize.py
split_to_graphs(data)
Split a generator of quads to graphs.
Notes: A new graph is generated by iterating over statements and yielding one new GenericStatementSink per sequence of quads with the same g term.
Args: data (Generator[Quad]): generator of quads
Yields: Generator[GenericStatementSink]: generator of GenericStatementSinks, each having triples in store and identifier set.
Source code in pyjelly/integrations/generic/serialize.py
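The grouping described in the notes can be sketched with `itertools.groupby`. This is only an illustration under assumptions: it reads "sequence" as a run of consecutive quads, uses a hypothetical `QuadSketch` tuple in place of the library's `Quad`, and returns plain lists rather than `GenericStatementSink` objects.

```python
from itertools import groupby
from typing import Iterable, Iterator, NamedTuple


class QuadSketch(NamedTuple):
    # Hypothetical stand-in for a quad: subject, predicate, object, graph.
    s: str
    p: str
    o: str
    g: str


def split_by_graph(
    quads: Iterable[QuadSketch],
) -> Iterator[tuple[str, list[tuple[str, str, str]]]]:
    # groupby starts a new group whenever the key changes, so each run of
    # consecutive quads sharing a g term becomes one (graph, triples) pair.
    for graph, run in groupby(quads, key=lambda quad: quad.g):
        yield graph, [(quad.s, quad.p, quad.o) for quad in run]


quads = [
    QuadSketch("ex:a", "ex:p", "ex:b", "ex:g1"),
    QuadSketch("ex:b", "ex:p", "ex:c", "ex:g1"),
    QuadSketch("ex:c", "ex:p", "ex:d", "ex:g2"),
    QuadSketch("ex:d", "ex:p", "ex:e", "ex:g1"),
]
groups = list(split_by_graph(quads))
```

Under this reading, a graph term that reappears after a different one starts a new group, so sorting quads by graph beforehand would be needed to get exactly one group per graph.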
guess_options(sink)
Guess the serializer options based on the store type.
Source code in pyjelly/integrations/generic/serialize.py
guess_stream(options, sink)
Return an appropriate stream implementation for the given options.
Notes: if base(!) logical type is GRAPHS and sink.is_triples_sink is false, initializes TripleStream
Source code in pyjelly/integrations/generic/serialize.py
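The "base" logical type in the note can be illustrated as follows. This is a sketch under an assumption taken from the Jelly format rather than from this page: logical stream subtypes are assumed to share the last decimal digit of their base type (for example, a subtype numbered 13 has base 3). The constant and function names are hypothetical.

```python
# Assumed numeric values of the base logical stream types.
FLAT_TRIPLES = 1
FLAT_QUADS = 2
GRAPHS = 3
DATASETS = 4


def base_logical_type(logical_type: int) -> int:
    # Assumption: the last decimal digit identifies the base type.
    return logical_type % 10


def is_graphs_based(logical_type: int) -> bool:
    return base_logical_type(logical_type) == GRAPHS


plain = is_graphs_based(3)       # the GRAPHS type itself
subtype = is_graphs_based(13)    # a subtype assumed to derive from GRAPHS
unrelated = is_graphs_based(14)  # a subtype assumed to derive from DATASETS
```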
grouped_stream_to_frames(sink_generator, options=None)
Transform multiple GenericStatementSinks into Jelly frames.
Notes: One frame per GenericStatementSink.
Note: options are guessed if not provided.
Args:
sink_generator (Generator[GenericStatementSink]): Generator of GenericStatementSink to transform.
options (SerializerOptions | None, optional): stream options to use. Options are guessed based on the sink store type. Defaults to None.
Yields: Generator[jelly.RdfStreamFrame]: produced Jelly frames
Source code in pyjelly/integrations/generic/serialize.py
grouped_stream_to_file(stream, output_file, **kwargs)
Write stream of GenericStatementSink to a binary file.
Args:
stream (Generator[GenericStatementSink]): Generator of GenericStatementSink to serialize.
output_file (IO[bytes]): output buffered writer.
**kwargs (Any): options to pass to stream.
Source code in pyjelly/integrations/generic/serialize.py
flat_stream_to_frames(statements, options=None)
Serialize a stream of raw GenericStatementSink's triples or quads into Jelly frames.
Args:
statements (Generator[Triple | Quad]): s/p/o triples or s/p/o/g quads to serialize.
options (SerializerOptions | None, optional): if omitted, guessed based on the first tuple.
Yields: Generator[jelly.RdfStreamFrame]: generated frames.
Source code in pyjelly/integrations/generic/serialize.py
flat_stream_to_file(statements, output_file, options=None)
Write Triple or Quad events to a binary file.
Args:
statements (Generator[Triple | Quad]): statements to serialize.
output_file (IO[bytes]): output buffered writer.
options (SerializerOptions | None, optional): stream options.
Source code in pyjelly/integrations/generic/serialize.py
rdflib
Modules:
Name | Description |
---|---|
parse | |
serialize | |
Functions:
Name | Description |
---|---|
register_extension_to_rdflib | Make rdflib.util.guess_format discover Jelly format. |
register_extension_to_rdflib(extension='.jelly')
Make rdflib.util.guess_format discover Jelly format.
>>> rdflib.util.guess_format("foo.jelly")
>>> register_extension_to_rdflib()
>>> rdflib.util.guess_format("foo.jelly")
'jelly'
Source code in pyjelly/integrations/rdflib/__init__.py
parse
Classes:
Name | Description |
---|---|
Triple | Describe RDFLib triple. |
Quad | Describe RDFLib quad. |
Prefix | Describe an RDF prefix (i.e., namespace declaration). |
RDFLibAdapter | RDFLib adapter class, is extended by triples and quads implementations. |
RDFLibTriplesAdapter | Triples adapter RDFLib implementation. |
RDFLibQuadsAdapter | Extended RDFLib adapter for the QUADS physical type. |
RDFLibGraphsAdapter | Extension of RDFLibQuadsBaseAdapter for the GRAPHS physical type. |
RDFLibJellyParser | |
Functions:
Name | Description |
---|---|
parse_triples_stream | Parse flat triple stream. |
parse_quads_stream | Parse flat quads stream. |
parse_jelly_grouped | Take jelly file and return generators based on the detected physical type. |
parse_jelly_to_graph | Add statements from Generator to provided Graph/Dataset. |
parse_jelly_flat | Parse jelly file with FLAT logical type into a Generator of stream events. |
Triple
Quad
Prefix
RDFLibAdapter(options, parsing_mode=ParsingMode.FLAT)
Bases: Adapter
RDFLib adapter class, is extended by triples and quads implementations.
Args: Adapter (): abstract adapter class
Source code in pyjelly/parse/decode.py
RDFLibTriplesAdapter(options)
RDFLibQuadsAdapter(options)
Bases: RDFLibQuadsBaseAdapter
Extended RDFLib adapter for the QUADS physical type.
Args: RDFLibQuadsBaseAdapter (RDFLibAdapter): base quads adapter (shared with graphs physical type)
Source code in pyjelly/integrations/rdflib/parse.py
RDFLibGraphsAdapter(options)
Bases: RDFLibQuadsBaseAdapter
Extension of RDFLibQuadsBaseAdapter for the GRAPHS physical type.
Notes: introduces graph start/end, checks if graph exists.
Args: RDFLibQuadsBaseAdapter (RDFLibAdapter): base adapter for quads management.
Raises: JellyConformanceError: if no graph_start was encountered
Source code in pyjelly/integrations/rdflib/parse.py
RDFLibJellyParser
Bases: Parser
Methods:
Name | Description |
---|---|
parse | Parse jelly file into provided RDFLib Graph. |
parse(source, sink)
Parse jelly file into provided RDFLib Graph.
Args:
source (InputSource): jelly file as buffered binary stream InputSource obj
sink (Graph): RDFLib Graph
Raises: TypeError: raises error if invalid input
Source code in pyjelly/integrations/rdflib/parse.py
parse_triples_stream(frames, options)
Parse flat triple stream.
Args:
frames (Iterable[jelly.RdfStreamFrame]): iterator over stream frames
options (ParserOptions): stream options
Yields: Generator[Iterable[Triple | Prefix]]: Generator of iterables of Triple or Prefix objects, one iterable per frame.
Source code in pyjelly/integrations/rdflib/parse.py
parse_quads_stream(frames, options)
Parse flat quads stream.
Args:
frames (Iterable[jelly.RdfStreamFrame]): iterator over stream frames
options (ParserOptions): stream options
Yields: Generator[Iterable[Quad | Prefix]]: Generator of iterables of Quad or Prefix objects, one iterable per frame.
Source code in pyjelly/integrations/rdflib/parse.py
parse_jelly_grouped(inp, graph_factory=lambda: Graph(), dataset_factory=lambda: Dataset(), *, logical_type_strict=False)
Take jelly file and return generators based on the detected physical type.
Yields one graph/dataset per frame.
Args:
inp (IO[bytes]): input jelly buffered binary stream
graph_factory (Callable): lambda to construct a Graph. By default creates an empty in-memory Graph, but you can pass something else here.
dataset_factory (Callable): lambda to construct a Dataset. By default creates an empty in-memory Dataset, but you can pass something else here.
logical_type_strict (bool): If True, validate the logical type in stream options and require a grouped logical type. Otherwise, only the physical type is used to route parsing.
Raises: NotImplementedError: is raised if a physical type is not implemented
Yields: Generator[Graph] | Generator[Dataset]: returns generators for graphs/datasets based on the type of input
Source code in pyjelly/integrations/rdflib/parse.py
parse_jelly_to_graph(inp, graph_factory=lambda: Graph(), dataset_factory=lambda: Dataset())
Add statements from Generator to provided Graph/Dataset.
Args:
inp (IO[bytes]): input jelly stream.
graph_factory (Callable[[], Graph]): factory to create Graph. By default creates an empty in-memory Graph, but you can pass something else here.
dataset_factory (Callable[[], Dataset]): factory to create Dataset. By default creates an empty in-memory Dataset, but you can pass something else here.
Returns: Dataset | Graph: Dataset or Graph with statements.
Source code in pyjelly/integrations/rdflib/parse.py
parse_jelly_flat(inp, frames=None, options=None, *, logical_type_strict=False)
Parse jelly file with FLAT logical type into a Generator of stream events.
Args:
inp (IO[bytes]): input jelly buffered binary stream.
frames (Iterable[jelly.RdfStreamFrame] | None): jelly frames if read before.
options (ParserOptions | None): stream options if read before.
logical_type_strict (bool): If True, validate the logical type in stream options and require FLAT_(TRIPLES|QUADS). Otherwise, only the physical type is used to route parsing.
Raises: NotImplementedError: if physical type is not supported
Yields: Generator[Statement | Prefix]: Generator of stream events
Source code in pyjelly/integrations/rdflib/parse.py
serialize
Classes:
Name | Description |
---|---|
RDFLibTermEncoder | |
RDFLibJellySerializer | RDFLib serializer for writing graphs in Jelly RDF stream format. |
Functions:
Name | Description |
---|---|
triples_stream_frames | Serialize a Graph/Dataset into jelly frames. |
quads_stream_frames | Serialize a Dataset into jelly frames. |
graphs_stream_frames | Serialize a Dataset into jelly frames as a stream of graphs. |
guess_options | Guess the serializer options based on the store type. |
guess_stream | Return an appropriate stream implementation for the given options. |
grouped_stream_to_frames | Transform Graphs/Datasets into Jelly frames, one frame per Graph/Dataset. |
grouped_stream_to_file | Write stream of Graphs/Datasets to a binary file. |
flat_stream_to_frames | Serialize a stream of raw triples or quads into Jelly frames. |
flat_stream_to_file | Write Triple or Quad events to a binary file in Jelly flat format. |
RDFLibTermEncoder(lookup_preset=None)
Bases: TermEncoder
Methods:
Name | Description |
---|---|
encode_spo | Encode s/p/o term based on its RDFLib object. |
encode_graph | Encode graph name term based on its RDFLib object. |
Source code in pyjelly/serialize/encode.py
encode_spo(term, slot, statement)
Encode s/p/o term based on its RDFLib object.
Args:
term (object): term to encode
slot (Slot): its place in statement.
statement (Statement): Triple/Quad message to fill with s/p/o terms.
Returns: Rows: encoded extra rows
Source code in pyjelly/integrations/rdflib/serialize.py
encode_graph(term, statement)
Encode graph name term based on its RDFLib object.
Args:
term (object): term to encode
statement (HasGraph): Quad/GraphStart message to fill g_{} in.
Returns: Rows: encoded extra rows
Source code in pyjelly/integrations/rdflib/serialize.py
RDFLibJellySerializer(store)
Bases: Serializer
RDFLib serializer for writing graphs in Jelly RDF stream format.
Handles streaming RDF terms into Jelly frames using internal encoders. Supports only graphs and datasets (not quoted graphs).
Methods:
Name | Description |
---|---|
serialize | Serialize self.store content to Jelly format. |
Source code in pyjelly/integrations/rdflib/serialize.py
serialize(out, /, *, stream=None, options=None, **unused)
Serialize self.store content to Jelly format.
Args:
out (IO[bytes]): output buffered writer
stream (Stream | None, optional): Jelly stream object. Defaults to None.
options (SerializerOptions | None, optional): Serializer options if defined beforehand, e.g., read from a separate file. Defaults to None.
**unused (Any): unused args for RDFLib serialize
Source code in pyjelly/integrations/rdflib/serialize.py
triples_stream_frames(stream, data)
Serialize a Graph/Dataset into jelly frames.
Args:
stream (TripleStream): stream that specifies triples processing
data (Graph | Dataset | Generator[Triple]): Graph/Dataset/Statements to serialize.
Notes: If a Dataset is given, its graphs are unpacked and iterated over. If the flow is GraphsFrameFlow, one frame is emitted per graph.
Yields: Generator[jelly.RdfStreamFrame]: jelly frames.
Source code in pyjelly/integrations/rdflib/serialize.py
quads_stream_frames(stream, data)
Serialize a Dataset into jelly frames.
Notes: Emits one frame per dataset if the flow is of DatasetsFrameFlow type.
Args:
stream (QuadStream): stream that specifies quads processing
data (Dataset | Generator[Quad]): Dataset to serialize.
Yields: Generator[jelly.RdfStreamFrame]: jelly frames
Source code in pyjelly/integrations/rdflib/serialize.py
graphs_stream_frames(stream, data)
Serialize a Dataset into jelly frames as a stream of graphs.
Notes: If the flow is of DatasetsFrameFlow type, the whole dataset will be encoded into one frame.
Args:
stream (GraphStream): stream that specifies graphs processing
data (Dataset | Generator[Quad]): Dataset to serialize.
Yields: Generator[jelly.RdfStreamFrame]: jelly frames
Source code in pyjelly/integrations/rdflib/serialize.py
guess_options(sink)
Guess the serializer options based on the store type.
>>> guess_options(Graph()).logical_type
1
>>> guess_options(Dataset()).logical_type
2
Source code in pyjelly/integrations/rdflib/serialize.py
guess_stream(options, sink)
Return an appropriate stream implementation for the given options.
Notes: if base(!) logical type is GRAPHS and Dataset is given, initializes TripleStream
>>> graph_ser = RDFLibJellySerializer(Graph())
>>> ds_ser = RDFLibJellySerializer(Dataset())
>>> type(guess_stream(guess_options(graph_ser.store), graph_ser.store))
>>> type(guess_stream(guess_options(ds_ser.store), ds_ser.store))
Source code in pyjelly/integrations/rdflib/serialize.py
grouped_stream_to_frames(sink_generator, options=None)
Transform Graphs/Datasets into Jelly frames, one frame per Graph/Dataset.
Note: options are guessed if not provided.
Args:
sink_generator (Generator[Graph] | Generator[Dataset]): Generator of Graphs/Dataset to transform.
options (SerializerOptions | None, optional): stream options to use. Options are guessed based on the sink store type. Defaults to None.
Yields: Generator[jelly.RdfStreamFrame]: produced Jelly frames
Source code in pyjelly/integrations/rdflib/serialize.py
grouped_stream_to_file(stream, output_file, **kwargs)
Write stream of Graphs/Datasets to a binary file.
Args:
stream (Generator[Graph] | Generator[Dataset]): Generator of Graphs/Dataset to transform.
output_file (IO[bytes]): output buffered writer.
**kwargs (Any): options to pass to stream.
Source code in pyjelly/integrations/rdflib/serialize.py
flat_stream_to_frames(statements, options=None)
Serialize a stream of raw triples or quads into Jelly frames.
Args:
statements (Generator[Triple | Quad]): s/p/o triples or s/p/o/g quads to serialize.
options (SerializerOptions | None, optional): if omitted, guessed based on the first tuple.
Yields: Generator[jelly.RdfStreamFrame]: generated frames.
Source code in pyjelly/integrations/rdflib/serialize.py
flat_stream_to_file(statements, output_file, options=None)
Write Triple or Quad events to a binary file in Jelly flat format.
Args:
statements (Generator[Triple | Quad]): statements to serialize.
output_file (IO[bytes]): output buffered writer.
options (SerializerOptions | None, optional): stream options.
Source code in pyjelly/integrations/rdflib/serialize.py
jelly
Modules:
Name | Description |
---|---|
rdf_pb2 | Generated protocol buffer code. |
rdf_pb2
Generated protocol buffer code.
options
Classes:
Name | Description |
---|---|
StreamTypes | |
Functions:
Name | Description |
---|---|
register_mimetypes | Associate files that have Jelly extension with Jelly MIME types. |
Attributes:
Name | Type | Description |
---|---|---|
INTEGRATION_SIDE_EFFECTS | bool | Whether to allow integration module imports to trigger side effects. |
INTEGRATION_SIDE_EFFECTS = True
Whether to allow integration module imports to trigger side effects.
These side effects are cheap and may include populating some registries for guessing the defaults for external integrations that work with Jelly.
StreamTypes(physical_type=jelly.PHYSICAL_STREAM_TYPE_UNSPECIFIED, logical_type=jelly.LOGICAL_STREAM_TYPE_UNSPECIFIED)
Methods:
Name | Description |
---|---|
__repr__ | Return the representation of StreamTypes. |
__repr__()
Return the representation of StreamTypes.
>>> repr(StreamTypes(9999, 8888))
'StreamTypes(9999, 8888)'
Source code in pyjelly/options.py
register_mimetypes(extension='.jelly')
Associate files that have Jelly extension with Jelly MIME types.
>>> register_mimetypes()
>>> mimetypes.guess_type("out.jelly")
('application/x-jelly-rdf', None)
Source code in pyjelly/options.py
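The effect shown in the doctest can be reproduced with the standard library alone. The sketch below is not the library's implementation; it only assumes that the association amounts to a `mimetypes.add_type` call, using the MIME type and extension that appear above.

```python
import mimetypes

# Associate the ".jelly" extension with the MIME type shown in the doctest.
# Assumption: pyjelly's register_mimetypes does something equivalent.
mimetypes.add_type("application/x-jelly-rdf", ".jelly")

# guess_type returns a (type, encoding) pair; encoding is None because
# the file name carries no compression suffix such as ".gz".
guessed = mimetypes.guess_type("out.jelly")
```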
parse
Modules:
Name | Description |
---|---|
decode | |
ioutils | |
lookup | |
decode
Classes:
Name | Description |
---|---|
ParsingMode | Specifies how jelly frames should be treated. |
Decoder | |
Functions:
Name | Description |
---|---|
options_from_frame | Fill stream options based on the options row. |
ParsingMode
Bases: Enum
Specifies how jelly frames should be treated.
Modes:
FLAT: Yield all frames as one Graph or Dataset.
GROUPED: Yield one Graph/Dataset per frame (grouped parsing).
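The difference between the two modes can be sketched without any Jelly machinery. In the example below, frames are modelled as plain lists of strings, and `ParsingModeSketch` and `collect` are hypothetical names used only for illustration.

```python
from enum import Enum, auto
from itertools import chain
from typing import Iterable, Iterator


class ParsingModeSketch(Enum):
    # Hypothetical stand-in for the two documented modes.
    FLAT = auto()
    GROUPED = auto()


def collect(
    frames: Iterable[list[str]], mode: ParsingModeSketch
) -> Iterator[list[str]]:
    if mode is ParsingModeSketch.FLAT:
        # FLAT: statements from every frame end up in one container.
        yield list(chain.from_iterable(frames))
    else:
        # GROUPED: each frame produces its own container.
        for frame in frames:
            yield list(frame)


frames = [["t1", "t2"], ["t3"]]
flat = list(collect(frames, ParsingModeSketch.FLAT))
grouped = list(collect(frames, ParsingModeSketch.GROUPED))
```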
Decoder(adapter)
Initializes the decoder with lookup tables of preset sizes, an integration-dependent adapter, and an empty repeated-terms dictionary.
Args: adapter (Adapter): integration-dependent adapter that specifies terms conversion to specific objects, framing, namespace declarations, and graphs/datasets forming.
Methods:
Name | Description |
---|---|
iter_rows | Iterate through rows in the frame. |
decode_row | Decode a row based on its type. |
ingest_prefix_entry | Update prefix lookup table based on the table entry. |
ingest_name_entry | Update name lookup table based on the table entry. |
ingest_datatype_entry | Update datatype lookup table based on the table entry. |
decode_term | Decode a term based on its type: IRI/literal/BN/default graph. |
decode_iri | Decode RdfIri message to IRI using a custom adapter. |
decode_bnode | Decode string message to blank node (BN) using a custom adapter. |
decode_literal | Decode RdfLiteral to literal based on custom adapter implementation. |
decode_statement | Decode a triple/quad message. |
Source code in pyjelly/parse/decode.py
iter_rows(frame)
Iterate through rows in the frame.
Args: frame (jelly.RdfStreamFrame): jelly frame
Yields: Iterator[Any]: decoded rows
Source code in pyjelly/parse/decode.py
decode_row(row)
Decode a row based on its type.
Notes: uses custom adapters to decode triples/quads, namespace declarations, graph start/end.
Args: row (Any): protobuf row message
Raises: TypeError: raises error if this type of protobuf message does not have a respective handler
Returns: Any | None: decoded row - result from calling decode_row (row type appropriate handler)
Source code in pyjelly/parse/decode.py
ingest_prefix_entry(entry)
Update prefix lookup table based on the table entry.
Args: entry (jelly.RdfPrefixEntry): prefix message, containing id and value
Source code in pyjelly/parse/decode.py
ingest_name_entry(entry)
Update name lookup table based on the table entry.
Args: entry (jelly.RdfNameEntry): name message, containing id and value
Source code in pyjelly/parse/decode.py
ingest_datatype_entry(entry)
Update datatype lookup table based on the table entry.
Args: entry (jelly.RdfDatatypeEntry): datatype message, containing id and value
Source code in pyjelly/parse/decode.py
decode_term(term)
Decode a term based on its type: IRI/literal/BN/default graph.
Notes: requires a custom adapter with implemented methods for terms decoding.
Args: term (Any): IRI/literal/BN(string)/Default graph message
Raises: TypeError: raises error if no handler for the term is found
Returns: Any: decoded term (currently, rdflib objects, e.g., rdflib.term.URIRef)
Source code in pyjelly/parse/decode.py
decode_iri(iri)
Decode RdfIri message to IRI using a custom adapter.
Args: iri (jelly.RdfIri): RdfIri message
Returns: Any: IRI, based on adapter implementation, e.g., rdflib.term.URIRef
Source code in pyjelly/parse/decode.py
decode_bnode(bnode)
Decode string message to blank node (BN) using a custom adapter.
Args: bnode (str): blank node id
Returns: Any: blank node object from the custom adapter
Source code in pyjelly/parse/decode.py
decode_literal(literal)
Decode RdfLiteral to literal based on custom adapter implementation.
Notes: checks for langtag existence; for datatype checks for non-zero table size and datatype field presence
Args: literal (jelly.RdfLiteral): RdfLiteral message
Returns: Any: literal returned by the custom adapter
Source code in pyjelly/parse/decode.py
decode_statement(statement, oneofs)
Decode a triple/quad message.
Notes: also updates repeated terms dictionary
Args:
statement (jelly.RdfTriple | jelly.RdfQuad): triple/quad message
oneofs (Sequence[str]): term slots s/p/o, plus g for quads
Raises: ValueError: if a missing repeated term is encountered
Returns: Any: a list of decoded terms
Source code in pyjelly/parse/decode.py
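The repeated-terms bookkeeping mentioned in the notes can be sketched as follows. This is an illustration under assumptions, not the library's code: statements are modelled as plain dicts in which an absent slot means "same term as in the previous statement", and `decode_statement_sketch` is a hypothetical name.

```python
from typing import Sequence


def decode_statement_sketch(
    statement: dict[str, str],
    slots: Sequence[str],
    repeated: dict[str, str],
) -> list[str]:
    terms = []
    for slot in slots:
        if slot in statement:
            # A term is present: remember it for later statements.
            repeated[slot] = statement[slot]
        elif slot not in repeated:
            # Nothing to repeat yet, so the statement cannot be completed.
            raise ValueError(f"missing repeated term for slot {slot!r}")
        terms.append(repeated[slot])
    return terms


repeated_terms: dict[str, str] = {}
slots = ("s", "p", "o")
first = decode_statement_sketch(
    {"s": "ex:a", "p": "ex:knows", "o": "ex:b"}, slots, repeated_terms
)
# Only the object changes; subject and predicate are reused.
second = decode_statement_sketch({"o": "ex:c"}, slots, repeated_terms)
```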
options_from_frame(frame, *, delimited)
Fill stream options based on the options row.
Notes: generalized_statements, rdf_star, and namespace declarations are set to false by default
Args:
frame (jelly.RdfStreamFrame): first non-empty frame from the stream
delimited (bool): derived delimited flag
Returns: ParserOptions: filled options with types/lookups/stream parameters information
Source code in pyjelly/parse/decode.py
ioutils
Functions:
Name | Description |
---|---|
delimited_jelly_hint | Detect whether a Jelly file is delimited from its first 3 bytes. |
get_options_and_frames | Return stream options and frames from the buffered binary stream. |
delimited_jelly_hint(header)
Detect whether a Jelly file is delimited from its first 3 bytes.
Truth table (notation: 0A = 0x0A, NN = not 0x0A, ?? = don't care):
Byte 1 | Byte 2 | Byte 3 | Result |
---|---|---|---|
NN | ?? | ?? | Delimited |
0A | NN | ?? | Non-delimited |
0A | 0A | NN | Delimited (size = 10) |
0A | 0A | 0A | Non-delimited (stream options size = 10) |
>>> delimited_jelly_hint(bytes([0x00, 0x00, 0x00]))
True
>>> delimited_jelly_hint(bytes([0x00, 0x00, 0x0A]))
True
>>> delimited_jelly_hint(bytes([0x00, 0x0A, 0x00]))
True
>>> delimited_jelly_hint(bytes([0x00, 0x0A, 0x0A]))
True
>>> delimited_jelly_hint(bytes([0x0A, 0x00, 0x00]))
False
>>> delimited_jelly_hint(bytes([0x0A, 0x00, 0x0A]))
False
>>> delimited_jelly_hint(bytes([0x0A, 0x0A, 0x00]))
True
>>> delimited_jelly_hint(bytes([0x0A, 0x0A, 0x0A]))
False
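The truth table can be written out as three comparisons. The following is a minimal sketch derived only from the table above. The name delimited_hint_sketch is hypothetical, and pyjelly's own function may be written differently.

```python
def delimited_hint_sketch(header: bytes) -> bool:
    """Apply the truth table to the first three bytes of a Jelly file."""
    if len(header) < 3:
        raise ValueError("need at least three bytes")
    first, second, third = header[0], header[1], header[2]
    if first != 0x0A:
        return True  # NN ?? ?? -> delimited
    if second != 0x0A:
        return False  # 0A NN ?? -> non-delimited
    # 0A 0A NN -> delimited, 0A 0A 0A -> non-delimited
    return third != 0x0A


print(delimited_hint_sketch(bytes([0x0A, 0x0A, 0x00])))  # True
```

Only the first byte matters in the most common case. The second and third bytes are consulted only when the earlier ones are 0x0A.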
Source code in pyjelly/parse/ioutils.py
get_options_and_frames(inp)
Return stream options and frames from the buffered binary stream.
Args: inp (IO[bytes]): jelly buffered binary stream
Raises: JellyConformanceError: if the stream is delimited and contains no non-empty frames; JellyConformanceError: if the stream is non-delimited and no rows are detected (empty frame)
Returns: tuple[ParserOptions, Iterator[jelly.RdfStreamFrame]]: ParserOptions holds: stream types, lookup presets and other stream options
Source code in pyjelly/parse/ioutils.py
lookup
Classes:
Name | Description |
---|---|
LookupDecoder | Shared base for RDF lookup decoders using Jelly compression. |
LookupDecoder(*, lookup_size)
Shared base for RDF lookup decoders using Jelly compression.
Tracks the last assigned and last reused index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lookup_size | int | Maximum lookup size. | required |
Source code in pyjelly/parse/lookup.py
serialize
Modules:
Name | Description |
---|---|
encode | |
flows | |
lookup | |
streams | |
encode
Classes:
Name | Description |
---|---|
TermEncoder | |
Functions:
Name | Description |
---|---|
split_iri | Split iri into prefix and name. |
encode_spo | Encode the s/p/o of a statement. |
encode_triple | Encode one triple. |
encode_quad | Encode one quad. |
encode_namespace_declaration | Encode namespace declaration. |
encode_options | Encode stream options to ProtoBuf message. |
TermEncoder(lookup_preset=None)
Methods:
Name | Description |
---|---|
encode_iri_indices | Encode lookup indices for IRI. |
encode_iri | Encode iri. |
encode_default_graph | Encode default graph. |
encode_literal | Encode literal. |
encode_quoted_triple | Encode a quoted triple. |
get_iri_field | Get IRI field directly based on slot. |
get_literal_field | Get literal field directly based on slot. |
set_bnode_field | Set bnode field directly based on slot. |
get_triple_field | Get triple term field directly based on slot. |
Source code in pyjelly/serialize/encode.py
encode_iri_indices(iri_string)
Encode lookup indices for IRI.
Args: iri_string (str): full iri in string format.
Returns: tuple[Rows, int, int]: additional rows (if any) and indices in prefix and name tables.
Source code in pyjelly/serialize/encode.py
encode_iri(iri_string, iri)
Encode iri.
Args: iri_string (str): full iri in string format; iri (jelly.RdfIri): iri to fill.
Returns: Rows: extra rows for prefix and name tables, if any.
Source code in pyjelly/serialize/encode.py
encode_default_graph(g_default_graph)
Encode default graph.
Returns: Rows: empty extra rows (for API consistency)
Source code in pyjelly/serialize/encode.py
encode_literal(*, lex, language=None, datatype=None, literal)
Encode literal.
Args: lex (str): lexical form/literal value; language (str | None, optional): langtag, defaults to None; datatype (str | None, optional): data type if it is a typed literal, defaults to None; literal (jelly.RdfLiteral): literal to fill.
Raises: JellyConformanceError: if a datatype is specified while the datatype lookup table is not used.
Returns: Rows: extra rows (i.e., datatype entries).
Source code in pyjelly/serialize/encode.py
encode_quoted_triple(terms, quoted_statement)
Encode a quoted triple.
Notes: Although a triple, it is treated as a part of a statement. Repeated terms are not used when encoding quoted triples.
Args: terms (Iterable[object]): triple terms to encode; quoted_statement (jelly.RdfTriple): quoted triple to fill.
Returns: Rows: additional stream rows with preceding information (prefixes, names, datatypes rows, if any).
Source code in pyjelly/serialize/encode.py
get_iri_field(statement, slot)
Get IRI field directly based on slot.
Source code in pyjelly/serialize/encode.py
get_literal_field(statement, slot)
Get literal field directly based on slot.
Source code in pyjelly/serialize/encode.py
set_bnode_field(statement, slot, identifier)
Set bnode field directly based on slot.
Source code in pyjelly/serialize/encode.py
get_triple_field(statement, slot)
Get triple term field directly based on slot.
Source code in pyjelly/serialize/encode.py
split_iri(iri_string)
Split iri into prefix and name.
Args: iri_string (str): full iri string.
Returns: tuple[str, str]: iri's prefix and name.
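As an illustration of what splitting an IRI into prefix and name can look like, the sketch below cuts at the last "#" and, failing that, at the last "/". That splitting rule is an assumption made for this example, and split_iri_sketch is a hypothetical name. The library's exact rule should be checked in the source file listed below.

```python
def split_iri_sketch(iri_string: str) -> tuple:
    """Split an IRI at its last '#' or, if there is none, at its last '/'."""
    for separator in ("#", "/"):
        prefix, found, name = iri_string.rpartition(separator)
        if found:
            # Keep the separator on the prefix side so prefix + name == IRI.
            return prefix + found, name
    # No separator at all: treat the whole string as the name.
    return "", iri_string


print(split_iri_sketch("http://example.org/ns#Thing"))
# ('http://example.org/ns#', 'Thing')
```

Keeping the separator in the prefix means the two parts concatenate back to the original IRI, which is what makes separate prefix and name lookup tables lossless.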
Source code in pyjelly/serialize/encode.py
encode_spo(terms, term_encoder, repeated_terms, statement)
Encode the s/p/o of a statement.
Args: terms (Iterator[object]): iterator for original terms to encode; term_encoder (TermEncoder): encoder with lookup tables; repeated_terms (list[object | None]): list of repeated terms; statement (Statement): Triple/Quad to fill.
Returns: list[jelly.RdfStreamRow]: extra rows to append.
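The role of repeated_terms on the encoding side can be sketched without any Protobuf types. The example assumes that a statement is a plain dict and that a term equal to the one in the same position of the previous statement is simply left unset. The function name and data shapes are hypothetical, and the real encoder also emits lookup rows, which are omitted here.

```python
def elide_repeated_sketch(terms, slots, repeated_terms: list) -> dict:
    """Return only the slots whose term differs from the previous statement."""
    statement = {}
    for index, (slot, term) in enumerate(zip(slots, terms)):
        if repeated_terms[index] == term:
            # Same term as last time in this position: leave the slot unset.
            continue
        repeated_terms[index] = term
        statement[slot] = term
    return statement


repeated = [None, None, None]
slots = ("s", "p", "o")
first = elide_repeated_sketch(("ex:alice", "ex:knows", "ex:bob"), slots, repeated)
second = elide_repeated_sketch(("ex:alice", "ex:knows", "ex:carol"), slots, repeated)
print(second)  # {'o': 'ex:carol'}
```

The same repeated_terms list is passed to every call, so it carries state from one statement to the next.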
Source code in pyjelly/serialize/encode.py
encode_triple(terms, term_encoder, repeated_terms)
Encode one triple.
Args: terms (Iterable[object]): original terms to encode; term_encoder (TermEncoder): current encoder with lookup tables; repeated_terms (list[object | None]): list of repeated terms.
Returns: list[jelly.RdfStreamRow]: list of rows to add to the current flow.
Source code in pyjelly/serialize/encode.py
encode_quad(terms, term_encoder, repeated_terms)
Encode one quad.
Args: terms (Iterable[object]): original terms to encode; term_encoder (TermEncoder): current encoder with lookup tables; repeated_terms (list[object | None]): list of repeated terms.
Returns: list[jelly.RdfStreamRow]: list of messages to append to current flow.
Source code in pyjelly/serialize/encode.py
encode_namespace_declaration(name, value, term_encoder)
Encode namespace declaration.
Args: name (str): namespace prefix label; value (str): namespace iri; term_encoder (TermEncoder): current encoder
Returns: list[jelly.RdfStreamRow]: list of messages to append to current flow.
Source code in pyjelly/serialize/encode.py
encode_options(lookup_preset, stream_types, params)
Encode stream options to ProtoBuf message.
Args: lookup_preset (options.LookupPreset): lookup tables options; stream_types (options.StreamTypes): physical and logical types; params (options.StreamParameters): other params.
Returns: jelly.RdfStreamRow: encoded stream options row
Source code in pyjelly/serialize/encode.py
flows
Classes:
Name | Description |
---|---|
FrameFlow | Abstract base class for producing Jelly frames from RDF stream rows. |
ManualFrameFlow | Produces frames only when manually requested (never automatically). |
BoundedFrameFlow | Produce frames automatically when a fixed number of rows is reached. |
GraphsFrameFlow | |
DatasetsFrameFlow | |
Functions:
Name | Description |
---|---|
flow_for_type | Return flow based on logical type requested. |
FrameFlow(initlist=None, *, logical_type=None, **__kwargs)
Bases: UserList[RdfStreamRow]
Abstract base class for producing Jelly frames from RDF stream rows.
Collects stream rows and assembles them into RdfStreamFrame objects when ready.
Allows for passing LogicalStreamType, required for logical subtypes and non-delimited streams.
Methods:
Name | Description |
---|---|
frame_from_graph | Treat the current rows as a graph and produce a frame. |
frame_from_dataset | Treat the current rows as a dataset and produce a frame. |
to_stream_frame | Create stream frame from flow content. |
Source code in pyjelly/serialize/flows.py
frame_from_graph()
Treat the current rows as a graph and produce a frame.
Default implementation returns None.
frame_from_dataset()
Treat the current rows as a dataset and produce a frame.
Default implementation returns None.
to_stream_frame()
Create stream frame from flow content.
Notes: Clears flow content after creating the frame.
Returns: jelly.RdfStreamFrame | None: stream frame
Source code in pyjelly/serialize/flows.py
ManualFrameFlow(initlist=None, *, logical_type=None, **__kwargs)
Bases: FrameFlow
Produces frames only when manually requested (never automatically).
Warning
All stream rows are kept in memory until to_stream_frame() is called.
This may lead to high memory usage for large streams.
Used for non-delimited serialization.
Source code in pyjelly/serialize/flows.py
BoundedFrameFlow(initlist=None, logical_type=None, *, frame_size=None)
Bases: FrameFlow
Produce frames automatically when a fixed number of rows is reached.
Used for delimited encoding (default mode).
Methods:
Name | Description |
---|---|
frame_from_bounds | Emit frame from flow if full. |
Source code in pyjelly/serialize/flows.py
frame_from_bounds()
Emit frame from flow if full.
Returns: jelly.RdfStreamFrame | None: stream frame
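The "emit when full" behaviour can be sketched with a UserList subclass. This is an illustration only: the class name, the default frame size of 3, and the use of a plain list in place of jelly.RdfStreamFrame are all assumptions made for the example.

```python
from collections import UserList
from typing import Optional


class BoundedFlowSketch(UserList):
    """Collect rows and hand them out as a 'frame' once frame_size is reached."""

    def __init__(self, initlist=None, *, frame_size: int = 3) -> None:
        super().__init__(initlist)
        self.frame_size = frame_size  # hypothetical default, not pyjelly's

    def frame_from_bounds(self) -> Optional[list]:
        if len(self) < self.frame_size:
            return None  # not full yet, keep collecting
        frame = list(self.data)  # stand-in for building an RdfStreamFrame
        self.clear()  # start the next frame with an empty flow
        return frame


flow = BoundedFlowSketch(frame_size=2)
flow.append("row-1")
print(flow.frame_from_bounds())  # None
flow.append("row-2")
print(flow.frame_from_bounds())  # ['row-1', 'row-2']
```

Because the flow is cleared after each emitted frame, memory use stays bounded by the frame size rather than by the length of the stream.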
Source code in pyjelly/serialize/flows.py
GraphsFrameFlow(initlist=None, *, logical_type=None, **__kwargs)
Bases: FrameFlow
Methods:
Name | Description |
---|---|
frame_from_graph | Emit current flow content (one graph) as jelly frame. |
Source code in pyjelly/serialize/flows.py
frame_from_graph()
Emit current flow content (one graph) as jelly frame.
Returns: jelly.RdfStreamFrame | None: jelly frame or none if flow is empty.
Source code in pyjelly/serialize/flows.py
DatasetsFrameFlow(initlist=None, *, logical_type=None, **__kwargs)
Bases: FrameFlow
Methods:
Name | Description |
---|---|
frame_from_dataset | Emit current flow content (dataset) as jelly frame. |
Source code in pyjelly/serialize/flows.py
frame_from_dataset()
Emit current flow content (dataset) as jelly frame.
Returns: jelly.RdfStreamFrame | None: jelly frame or none if flow is empty.
Source code in pyjelly/serialize/flows.py
flow_for_type(logical_type)
Return flow based on logical type requested.
Note: subtypes use the flow of their base logical type (e.g., SUBJECT_GRAPHS uses the same flow as its base type GRAPHS).
Args: logical_type (jelly.LogicalStreamType): logical type requested.
Raises: NotImplementedError: if (base) logical stream type is not supported.
Returns: type[FrameFlow]: FrameFlow for respective logical type.
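The dispatch described here (resolve a subtype to its base, look up a flow class, raise if nothing matches) can be sketched as follows. Everything in the block is hypothetical: the enum, the subtype table, and the placeholder flow classes stand in for the real jelly.LogicalStreamType values and FrameFlow subclasses, and the UNSUPPORTED member exists only to show the error path.

```python
from enum import Enum, auto


class LogicalTypeSketch(Enum):
    GRAPHS = auto()
    DATASETS = auto()
    SUBJECT_GRAPHS = auto()  # treated as a subtype of GRAPHS
    UNSUPPORTED = auto()  # stand-in for a type with no flow


# Subtype -> base type; types that are already base types are not listed.
BASE_TYPE = {LogicalTypeSketch.SUBJECT_GRAPHS: LogicalTypeSketch.GRAPHS}


class GraphsFlowSketch:
    """Placeholder for a graphs flow class."""


class DatasetsFlowSketch:
    """Placeholder for a datasets flow class."""


FLOW_FOR_BASE = {
    LogicalTypeSketch.GRAPHS: GraphsFlowSketch,
    LogicalTypeSketch.DATASETS: DatasetsFlowSketch,
}


def flow_for_type_sketch(logical_type: LogicalTypeSketch) -> type:
    """Return the flow class registered for the base of logical_type."""
    base = BASE_TYPE.get(logical_type, logical_type)
    try:
        return FLOW_FOR_BASE[base]
    except KeyError:
        raise NotImplementedError(f"no flow for {base.name}") from None


print(flow_for_type_sketch(LogicalTypeSketch.SUBJECT_GRAPHS).__name__)
# GraphsFlowSketch
```

Note that the function returns a class, not an instance, matching the documented return type type[FrameFlow].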
Source code in pyjelly/serialize/flows.py
lookup
Classes:
Name | Description |
---|---|
Lookup | Fixed-size 1-based string-to-index mapping with LRU eviction. |
LookupEncoder | Shared base for RDF lookup encoders using Jelly compression. |
Lookup(max_size)
Fixed-size 1-based string-to-index mapping with LRU eviction.
- Assigns incrementing indices starting from 1.
- After reaching the maximum size, reuses existing indices by evicting the least-recently-used entries.
- Index 0 is reserved for delta encoding in Jelly streams.
To check if a key exists, use .move(key) and catch KeyError. If KeyError is raised, the key can be inserted with .insert(key).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
max_size | int | Maximum number of entries. Zero disables lookup. | required |
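The move/insert protocol and the index reuse on eviction can be sketched with an OrderedDict. This is an illustration under stated assumptions, not the library's code: the class name is hypothetical, the return values of move and insert are chosen for the example, and the max_size=0 (disabled) case is not modelled.

```python
from collections import OrderedDict


class LruLookupSketch:
    """1-based string-to-index mapping that reuses the evicted entry's index."""

    def __init__(self, max_size: int) -> None:
        if max_size < 1:
            raise ValueError("this sketch does not model a disabled lookup")
        self.max_size = max_size
        self._entries = OrderedDict()  # least recently used entry comes first

    def move(self, key: str) -> int:
        """Mark key as most recently used; raises KeyError if it is absent."""
        self._entries.move_to_end(key)
        return self._entries[key]

    def insert(self, key: str) -> int:
        """Add a new key, evicting the least recently used entry when full."""
        if len(self._entries) < self.max_size:
            index = len(self._entries) + 1  # indices start at 1
        else:
            _, index = self._entries.popitem(last=False)  # reuse evicted index
        self._entries[key] = index
        return index


lookup = LruLookupSketch(max_size=2)
lookup.insert("a")  # gets index 1
lookup.insert("b")  # gets index 2
lookup.move("a")  # "b" is now the least recently used entry
print(lookup.insert("c"))  # 2
```

After the last call, "b" has been evicted and its index 2 now belongs to "c", so calling move("b") raises KeyError again.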
Source code in pyjelly/serialize/lookup.py
LookupEncoder(*, lookup_size)
Shared base for RDF lookup encoders using Jelly compression.
Tracks the last assigned and last reused index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lookup_size | int | Maximum lookup size. | required |
Methods:
Name | Description |
---|---|
encode_entry_index | Get or assign the index to use in an entry. |
Source code in pyjelly/serialize/lookup.py
encode_entry_index(key)
Get or assign the index to use in an entry.
Returns:
Type | Description |
---|---|
int or None | If the return value is None, the entry is already in the lookup and does not need to be emitted. Any integer value (including 0) means the entry is new and should be emitted. |
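The three possible outcomes (None, 0, or a positive index) can be illustrated with a self-contained sketch. It assumes that 0 stands for "the previously assigned index plus one", which is how this example interprets the delta encoding mentioned for Lookup. The class name and internals are hypothetical and simplified.

```python
from collections import OrderedDict
from typing import Optional


class LookupEncoderSketch:
    """Get-or-assign entry indices, returning 0 for 'previous index + 1'."""

    def __init__(self, *, lookup_size: int) -> None:
        self.lookup_size = lookup_size
        self.entries = OrderedDict()  # least recently used entry comes first
        self.last_assigned_index = 0

    def encode_entry_index(self, key: str) -> Optional[int]:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return None  # already known, nothing to emit
        if len(self.entries) < self.lookup_size:
            index = len(self.entries) + 1
        else:
            _, index = self.entries.popitem(last=False)  # evict, reuse index
        self.entries[key] = index
        previous = self.last_assigned_index
        self.last_assigned_index = index
        # Assumed delta rule: emit 0 when the index simply follows the last one.
        return 0 if index == previous + 1 else index


encoder = LookupEncoderSketch(lookup_size=2)
results = [encoder.encode_entry_index(key) for key in ["a", "b", "a", "c", "d"]]
print(results)  # [0, 0, None, 2, 1]
```

In the run above, "a" and "b" fill the table sequentially (0 each), the second "a" is a hit (None), and "c" and "d" reuse evicted indices that do not follow the last assigned one, so they are emitted explicitly.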
Source code in pyjelly/serialize/lookup.py
streams
Classes:
Name | Description |
---|---|
Stream | |
TripleStream | |
QuadStream | |
GraphStream | |
Functions:
Name | Description |
---|---|
stream_for_type | Give a Stream based on physical type specified. |
Stream(*, encoder, options=None)
Methods:
Name | Description |
---|---|
infer_flow | Return flow based on the stream options provided. |
enroll | Initialize start of the stream. |
stream_options | Encode and append stream options row to the current flow. |
namespace_declaration | Add namespace declaration to jelly stream. |
for_rdflib | Initialize stream with RDFLib encoder. |
Source code in pyjelly/serialize/streams.py
infer_flow()
Return flow based on the stream options provided.
Returns: FrameFlow: initialised FrameFlow object.
Source code in pyjelly/serialize/streams.py
enroll()
stream_options()
Encode and append stream options row to the current flow.
Source code in pyjelly/serialize/streams.py
namespace_declaration(name, iri)
Add namespace declaration to jelly stream.
Args: name (str): namespace prefix label; iri (str): namespace iri
Source code in pyjelly/serialize/streams.py
for_rdflib(options=None)
Initialize stream with RDFLib encoder.
Args: options (SerializerOptions | None, optional): Stream options. Defaults to None.
Raises: TypeError: if called on the base Stream class rather than on a Stream subclass for a specific physical type.
Returns: Stream: initialized stream with RDFLib encoder.
Source code in pyjelly/serialize/streams.py
TripleStream(*, encoder, options=None)
Bases: Stream
Methods:
Name | Description |
---|---|
triple | Process one triple to Protobuf messages. |
Source code in pyjelly/serialize/streams.py
triple(terms)
Process one triple to Protobuf messages.
Note: Adds new rows to the current flow and returns StreamFrame if frame size conditions are met.
Args: terms (Iterable[object]): RDF terms to encode.
Returns: jelly.RdfStreamFrame | None: stream frame if the flow supports frame slicing and the current flow is full
Source code in pyjelly/serialize/streams.py
QuadStream(*, encoder, options=None)
Bases: Stream
Methods:
Name | Description |
---|---|
quad | Process one quad to Protobuf messages. |
Source code in pyjelly/serialize/streams.py
quad(terms)
Process one quad to Protobuf messages.
Args: terms (Iterable[object]): terms to encode.
Returns: jelly.RdfStreamFrame | None: stream frame if the flow supports frame slicing and the current flow is full
Source code in pyjelly/serialize/streams.py
GraphStream(*, encoder, options=None)
Bases: TripleStream
Methods:
Name | Description |
---|---|
graph | Process one graph into a sequence of jelly frames. |
Source code in pyjelly/serialize/streams.py
graph(graph_id, graph)
Process one graph into a sequence of jelly frames.
Args: graph_id (object): graph id (BN, Literal, iri, default); graph (Iterable[Iterable[object]]): iterable of triples (graph's content)
Yields: Generator[jelly.RdfStreamFrame]: jelly frames.
Source code in pyjelly/serialize/streams.py
stream_for_type(physical_type)
Give a Stream based on physical type specified.
Args: physical_type (jelly.PhysicalStreamType): jelly stream physical type.
Raises: NotImplementedError: if no stream for requested physical type is available.
Returns: type[Stream]: jelly stream