Sophia

A Rust toolkit for RDF and Linked Data.

It comprises the following crates:

Core crates:
- sophia_api defines a generic API for RDF and linked data, as a set of core traits and types; more precisely, it provides traits for describing
  - terms, triples and quads,
  - graphs and datasets,
  - parsers and serializers
- sophia_iri provides functions, types and traits for validating and resolving IRIs.
- sophia_term defines various implementations of the Term trait from sophia_api.
- sophia_inmem defines in-memory implementations of the Graph and Dataset traits from sophia_api.

Parsers and serializers:
- sophia_turtle provides parsers and serializers for the Turtle-family of concrete syntaxes.
- sophia_jsonld provides parsers and serializers for JSON-LD.
- sophia_xml provides parsers and serializers for RDF/XML.
- sophia_rio is a lower-level crate, used by the ones above.

Other:
- sophia_c14n implements RDF canonicalization.
- sophia_isomorphism provides functions to determine if two graphs or datasets are isomorphic.
- sophia_sparql_client provides a client for the SPARQL 1.1 Protocol.
- sophia_resource provides a resource-centric API.

All-inclusive:
- sophia re-exports symbols from all the crates above, with the following provisos:
  - sophia_jsonld is only available with the jsonld feature
  - sophia_sparql_client is only available with the http_client feature
  - sophia_xml is only available with the xml feature

In addition to the API documentation, high-level user documentation is available (although it is not yet complete).

Citation

When using Sophia, please use the following citation:

Champin, P.-A. (2020) ‘Sophia: A Linked Data and Semantic Web toolkit for Rust’, in Wilde, E. and Amundsen, M. (eds). The Web Conference 2020: Developers Track, Taipei, TW. Available at: https://www2020devtrack.github.io/site/schedule.

Bibtex:

@misc{champin_sophia_2020,
        title = {{Sophia: A Linked Data and Semantic Web toolkit for Rust}},
        author = {Champin, Pierre-Antoine},
        howpublished = {{The Web Conference 2020: Developers Track}},
        address = {Taipei, TW},
        editor = {Wilde, Erik and Amundsen, Mike},
        month = apr,
        year = {2020},
        language = {en},
        url = {https://www2020devtrack.github.io/site/schedule}
}

Third-party crates

The following third-party crates use or extend Sophia:

- hdt provides an implementation of Sophia's traits based on the HDT format.
- manas is a modular framework for implementing Solid compatible servers.
- nanopub is a toolkit for managing nanopublications.

History

An outdated comparison of Sophia with other RDF libraries is still available here.

Introduction

The sophia crate aims at providing a comprehensive toolkit for working with RDF and Linked Data in Rust.

RDF is a data model designed to exchange knowledge on the Web in an interoperable way. Each piece of knowledge in RDF (a statement) is represented by a triple, made of three terms. A set of triples forms an RDF graph. Finally, several graphs can be grouped in a collection called a dataset, where each graph is identified by a unique name.
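
To make these notions concrete, here is a minimal sketch of the data model using only the standard library. It is not Sophia's API: the names ToyTriple, ToyGraph, ToyDataset, toy_dataset and total_triples are hypothetical, terms are reduced to plain strings, and the default graph is keyed by None.

```rust
use std::collections::{HashMap, HashSet};

// Toy model only: terms are plain strings here, whereas Sophia uses traits.
type ToyTriple = [&'static str; 3];
// A graph is a set of triples.
type ToyGraph = HashSet<ToyTriple>;
// A dataset maps graph names to graphs; `None` stands for the default graph.
type ToyDataset = HashMap<Option<&'static str>, ToyGraph>;

/// Build a small dataset with a default graph and one named graph.
fn toy_dataset() -> ToyDataset {
    let mut default_graph = ToyGraph::new();
    default_graph.insert(["ex:alice", "foaf:name", "\"Alice\""]);

    let mut named_graph = ToyGraph::new();
    named_graph.insert(["ex:bob", "foaf:name", "\"Bob\""]);
    named_graph.insert(["ex:bob", "foaf:knows", "ex:alice"]);

    let mut dataset = ToyDataset::new();
    dataset.insert(None, default_graph);
    dataset.insert(Some("ex:graph1"), named_graph);
    dataset
}

/// Count the triples of every graph in the dataset.
fn total_triples(dataset: &ToyDataset) -> usize {
    dataset.values().map(|graph| graph.len()).sum()
}
```

Calling total_triples(&toy_dataset()) counts three statements spread over two graphs. Sophia replaces each of these concrete types with a trait so that many storage strategies can coexist.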

In Sophia, each of these core concepts is modeled by a trait, which can be implemented in multiple ways (see for example the Graph trait and some of the types implementing it). Sophia is therefore not meant to provide the "ultimate" implementation of RDF in Rust, but a generic framework that helps various implementations interoperate with each other (in the spirit of Apache Commons RDF for Java or RDFJS for JavaScript/TypeScript).

Generalized vs. Strict RDF model

The data model supported by Sophia is in fact a superset of the RDF data model as defined by the W3C. When the distinction matters, they will be called, respectively, the generalized RDF model, and the strict RDF model. The generalized RDF model extends RDF as follows:

- In addition to standard RDF terms (IRIs, blank nodes and literals), Sophia supports
  - RDF-star quoted triples,
  - variables (a concept borrowed from SPARQL or Notation3).
- Sophia allows any kind of term in any position (subject, predicate, object, graph name).
- Sophia allows IRIs to be relative IRI references (while in strict RDF, IRIs must be absolute).

Getting Started

Below is a short example demonstrating how to build a graph, mutate it and serialize it back.

Add the sophia crate to your dependencies in Cargo.toml

[dependencies]
sophia = "0.8.0"

Add these lines of code and run the program.

use sophia::api::prelude::*;
use sophia::api::ns::Namespace;
use sophia::inmem::graph::LightGraph;
use sophia::turtle::parser::turtle;
use sophia::turtle::serializer::nt::NtSerializer;

// loading a graph
let example = r#"
    @prefix : <http://example.org/>.
    @prefix foaf: <http://xmlns.com/foaf/0.1/>.
    :alice foaf:name "Alice";
           foaf:mbox <mailto:alice@work.example> .
    :bob foaf:name "Bob".
"#;
let mut graph: LightGraph = turtle::parse_str(example).collect_triples()?;

// mutating the graph
let ex = Namespace::new("http://example.org/")?;
let foaf = Namespace::new("http://xmlns.com/foaf/0.1/")?;
graph.insert(
    ex.get("bob")?,
    foaf.get("knows")?,
    ex.get("alice")?,
)?;

// serializing the graph
let mut nt_stringifier = NtSerializer::new_stringifier();
let example2 = nt_stringifier.serialize_graph(&graph)?.as_str();
println!("The resulting graph:\n{}", example2);
Ok::<(), Box<dyn std::error::Error>>(())

You should get the following output:

The resulting graph:
<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice".
<http://example.org/alice> <http://xmlns.com/foaf/0.1/mbox> <mailto:alice@work.example>.
<http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob".
<http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice>.

RDF Terms

The Term trait defines how you interact with RDF terms in Sophia.

Using terms

The first thing you usually need to know about a term is its kind (IRI, Literal...). The kind is described by the TermKind enum, and available from the Term::kind method.

use sophia::api::term::{SimpleTerm, Term, TermKind};
use TermKind::*;
let some_term: SimpleTerm = "foo".into_term();
match some_term.kind() {
    Iri => { /* ... */ }
    Literal => { /* ... */ }
    BlankNode => { /* ... */ }
    _ => { /* ... */ }
}

Alternatively, when only one kind is of interest, you can use Term::is_iri, Term::is_literal, Term::is_blank_node, etc.

If you are interested in the "value" of the term, the trait provides the following methods. All of them return an Option, which will be None if the term does not have the corresponding kind.

- If the term is an IRI, Term::iri returns that IRI¹.
- If the term is a blank node, Term::bnode_id returns its blank node identifier.
- If the term is a literal:
  - Term::lexical_form returns its lexical form (the "textual value" of the literal),
  - Term::datatype returns its datatype IRI¹,
  - Term::language_tag returns its language tag, if any.
- If the term is a quoted triple:
  - Term::triple returns its 3 components in an array of terms,
  - Term::constituents iterates over all its constituents,
  - Term::atoms iterates over all its atomic (i.e. non quoted-triple) constituents,
  - (those three methods also have a to_X version that consumes the original term instead of borrowing it).
- If the term is a variable², Term::variable returns its name.
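
The accessor pattern described above (one method per kind, each returning an Option) can be sketched with the standard library alone. ToyTerm, its methods and describe are hypothetical stand-ins written for this illustration. They only mimic the shape of the Term methods and cover three kinds of terms.

```rust
/// Toy term type (hypothetical; Sophia's Term trait is richer than this).
#[derive(Debug, Clone, PartialEq)]
enum ToyTerm {
    Iri(String),
    BlankNode(String),
    Literal { lexical: String, language: Option<String> },
}

impl ToyTerm {
    /// The IRI, or None if this term is not an IRI.
    fn iri(&self) -> Option<&str> {
        match self {
            ToyTerm::Iri(iri) => Some(iri.as_str()),
            _ => None,
        }
    }

    /// The blank node identifier, or None if this term is not a blank node.
    fn bnode_id(&self) -> Option<&str> {
        match self {
            ToyTerm::BlankNode(id) => Some(id.as_str()),
            _ => None,
        }
    }

    /// The lexical form, or None if this term is not a literal.
    fn lexical_form(&self) -> Option<&str> {
        match self {
            ToyTerm::Literal { lexical, .. } => Some(lexical.as_str()),
            _ => None,
        }
    }

    /// The language tag, or None if there is none or the term is not a literal.
    fn language_tag(&self) -> Option<&str> {
        match self {
            ToyTerm::Literal { language, .. } => language.as_deref(),
            _ => None,
        }
    }
}

/// Describe a term by probing the accessors in turn.
fn describe(term: &ToyTerm) -> String {
    if let Some(iri) = term.iri() {
        format!("IRI <{iri}>")
    } else if let Some(id) = term.bnode_id() {
        format!("blank node _:{id}")
    } else if let Some(lex) = term.lexical_form() {
        match term.language_tag() {
            Some(tag) => format!("literal \"{lex}\"@{tag}"),
            None => format!("literal \"{lex}\""),
        }
    } else {
        String::from("unknown")
    }
}
```

Because every accessor returns None for the wrong kind, callers can chain them without first matching on the kind.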

Finally, the method Term::eq can be used to check whether two values implementing Term represent the same RDF term. Note that the == operator may give a different result than Term::eq on some types implementing the Term trait.

Useful types implementing `Term`

Below is a list of useful types implementing the Term trait:

- Iri<T> and IriRef<T>, where T: Borrow<str>, representing IRIs,
- BnodeId<T>, where T: Borrow<str>, representing blank nodes,
- str, representing literals of type xsd:string,
- i32, isize and usize representing literals of type xsd:integer,
- f64 representing literals of type xsd:double,
- SimpleTerm (see below).

SimpleTerm is a straightforward implementation of Term, that can represent any kind of term, and can either own its own underlying data or borrow it from something else.

Any term can be converted to a SimpleTerm using the Term::as_simple method. This method borrows as much as possible from the initial term to avoid spurious memory allocation. Alternatively, to convert any term to a self-sufficient SimpleTerm, you can use Term::into_term.

Borrowing terms with `Term::borrow_term`

In Sophia, all functions accepting terms as parameters are expecting a type T: Term -- not &T, but the type T itself. So what happens when you want to call such a function with a term t, but still want to retain ownership of t?

The solution is to pass t.borrow_term() to the function. This method returns something implementing Term, representing the same RDF term as t, without waiving ownership. This is a very common pattern in Sophia.

More precisely, the type returned by t.borrow_term() is the associated type Term::BorrowTerm. In most cases, this is a reference or a copy of t.
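
The following std-only sketch imitates this pattern with a toy trait. ToyTerm, Borrowed, value, shout and demo are hypothetical names chosen for the illustration, and the real Term trait has a different and much larger interface. The point is only to show how a by-value parameter and a cheap borrowed view fit together.

```rust
/// Toy counterpart of a term trait whose consumers take `T: ToyTerm` by value.
trait ToyTerm {
    /// Cheap borrowed view of the term (loosely analogous to Term::BorrowTerm).
    type Borrowed<'x>: ToyTerm
    where
        Self: 'x;

    /// Textual value of the term.
    fn value(&self) -> String;

    /// Return a view of the same term without giving up ownership.
    fn borrow_term(&self) -> Self::Borrowed<'_>;
}

impl ToyTerm for String {
    // An owned string is borrowed as a string slice.
    type Borrowed<'x> = &'x str where Self: 'x;

    fn value(&self) -> String {
        self.clone()
    }

    fn borrow_term(&self) -> Self::Borrowed<'_> {
        self.as_str()
    }
}

impl<'a> ToyTerm for &'a str {
    // A string slice is already cheap to copy, so it is its own borrowed view.
    type Borrowed<'x> = &'a str where Self: 'x;

    fn value(&self) -> String {
        (*self).to_owned()
    }

    fn borrow_term(&self) -> Self::Borrowed<'_> {
        *self
    }
}

/// A function that takes its term by value, as described above.
fn shout<T: ToyTerm>(term: T) -> String {
    term.value().to_uppercase()
}

/// Call `shout` twice with the same term, keeping ownership the first time.
fn demo() -> (String, String) {
    let t = String::from("alice");
    let first = shout(t.borrow_term()); // `t` is still usable afterwards
    let second = shout(t); // `t` is moved here
    (first, second)
}
```

In this sketch the first call passes a &str view of t, so the second call can still move t itself.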

Recipes for constructing terms

Constructing IRIs

fn main() -> Result<(), Box<dyn std::error::Error>> {

use sophia::{iri::IriRef, api::ns::Namespace};
let some_text = "http://example.org";
// construct an IRI from a constant
let iri1 = IriRef::new_unchecked("http://example.org");

// construct an IRI from an untrusted string
let iri2 = IriRef::new(some_text)?;

// construct multiple IRIs from a namespace
let ns = Namespace::new_unchecked("http://example.org/ns#");
let iri3 = ns.get_unchecked("foo");
let iri4 = ns.get(some_text)?;

// standard namespaces
use sophia::api::ns::{rdf, xsd};
let iri5 = rdf::Property;
let iri6 = xsd::string;

Ok(()) }

Constructing literals

fn main() -> Result<(), Box<dyn std::error::Error>> {

use sophia::api::{ns::xsd, term::{LanguageTag, SimpleTerm, Term}};
// use native types for xsd::string, xsd::integer, xsd::double
let lit_string = "hello world";
let lit_integer = 42;
let lit_double = 1.23;

// construct a language-tagged string
let fr = LanguageTag::new_unchecked("fr");
let lit_fr = "Bonjour le monde" * fr;

// construct a literal with an arbitrary datatype
let lit_date = "2023-11-15" * xsd::date;

Ok(()) }

Constructing blank nodes

fn main() -> Result<(), Box<dyn std::error::Error>> {

use sophia::api::term::BnodeId;
let b = BnodeId::new_unchecked("x");

Ok(()) }

Converting terms into a different type

use sophia::api::{ns::xsd, term::{SimpleTerm, Term}};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let some_term = "42" * xsd::integer;
let t1: SimpleTerm = "hello".into_term();
let t2: i32 = some_term.try_into_term()?;
Ok(()) }

¹ Note that in Sophia's generalized RDF model, IRIs can be relative IRI references.

² Note that this kind only exists in Sophia's generalized RDF model.

RDF Statements

The Triple and Quad traits define how you interact with RDF statements in Sophia.

Note that in Sophia's generalized RDF model, terms of any kind can occur in any position in a statement. This contrasts with strict RDF, where only IRIs can occur in the predicate position, and where literals can only occur in the object position.
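
The positional rules of strict RDF can be summarized by a small predicate. This is a std-only sketch: ToyKind and is_strict_rdf are hypothetical names, not part of Sophia. The check also assumes the standard rule that a strict RDF subject is an IRI or a blank node, and it ignores the question of relative IRIs.

```rust
/// Toy term kinds, loosely mirroring the kinds named in this documentation.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ToyKind {
    Iri,
    BlankNode,
    Literal,
    QuotedTriple,
    Variable,
}

/// Returns true if a triple with these term kinds is allowed in strict RDF:
/// the subject is an IRI or a blank node, the predicate is an IRI,
/// and the object is an IRI, a blank node or a literal.
fn is_strict_rdf(s: ToyKind, p: ToyKind, o: ToyKind) -> bool {
    use ToyKind::*;
    matches!(s, Iri | BlankNode)
        && matches!(p, Iri)
        && matches!(o, Iri | BlankNode | Literal)
}
```

A triple with a literal subject or a variable object is rejected by this predicate, yet it is a perfectly valid statement in the generalized model.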

Using triples

Triples in RDF are made of a subject, a predicate and an object. They can be obtained respectively via the methods Triple::s, Triple::p and Triple::o, or all at once (as an array of three terms) via the method Triple::spo. These methods also have a to_X version that consumes the original triple instead of borrowing it.

use sophia::api::{ns::rdf, prelude::*};
// Example: yield all the terms being used as types in the given triples
fn all_types<IT, T>(triples: IT) -> impl Iterator<Item=T::Term>
where
  IT: IntoIterator<Item=T>,
  T: Triple,
{
  triples
    .into_iter()
    .filter(|t| rdf::type_ == t.p())
    .map(|t| t.to_o())
}

Using quads

Quads are used to represent triples in the context of an optional named graph. Like triples, they have methods Quad::s, Quad::p and Quad::o, but also Quad::g to access the optional graph name, and Quad::spog to obtain all four components at once. These methods also have a to_X version that consumes the original quad instead of borrowing it.

use sophia::api::{ns::rdf, prelude::*};
// Example: yield all the triples in the default graph, from a list of quads
fn default_graph_triples<IQ, Q>(quads: IQ) -> impl Iterator<Item=[Q::Term; 3]>
where
  IQ: IntoIterator<Item=Q>,
  Q: Quad,
{
  quads
    .into_iter()
    .filter(|q| q.g().is_none())
    .map(|q| { let (spo, _g) = q.to_spog(); spo })
}

Comparing triples or quads

To check whether two values implementing Triple (resp. Quad) represent the same RDF statements, the method Triple::eq (resp. Quad::eq) must be used. It will compare each component of the statements using the Term::eq method. Note that the == operator may give a different result than Triple::eq or Quad::eq on some types implementing the Triple or the Quad trait.

Useful types implementing `Triple`

While the Triple and Quad traits can be implemented by multiple types, in most situations the following types will be used:

- [T; 3] where T: Term implements Triple
- ([T; 3], Option<T>) where T: Term implements Quad

RDF Graphs

The Graph and MutableGraph traits define how you interact with RDF graphs in Sophia.

Using graphs

RDF graphs are sets of triples, so the most common thing you need to do with a graph is to iterate over its triples. This is achieved with the Graph::triples method:

use sophia::api::prelude::*;
use sophia::inmem::graph::LightGraph;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let g = LightGraph::new();
for result in g.triples() {
	let triple = result?;
	// do something with triple
}
Ok(()) }

Notice that Graph::triples yields Results, as some implementations of Graph may fail at any point of the iteration.

When only a subset of the triples in the graph are of interest, you will want to use the Graph::triples_matching method:

use sophia::api::{ns::rdf, prelude::*, term::SimpleTerm};
use sophia::inmem::graph::LightGraph;

let graph = LightGraph::new();
// Utility closure to recognize IRIs in the schema.org namespace
let in_schema_org = |t: SimpleTerm| -> bool {
	t.iri()
	 .map(|iri| iri.as_str().starts_with("http://schema.org/"))
	 .unwrap_or(false)
};
// Iter over all instances of schema.org types
graph
	.triples_matching(Any, [rdf::type_], in_schema_org)
	.map(|res| { let [s, _, o] = res.unwrap().to_spo(); (s, o)})
	.for_each(|(instance, typ)| {
		// do something 
	})

Graph::triples_matching accepts a large variety of parameters, which will be described in more detail in the next chapter.

Graph also provides methods to iterate over all subjects, predicates and objects in the graph, as well as over all unique terms of a certain kind (Graph::iris, Graph::blank_nodes, Graph::literals, etc.).

Finally, it is possible to check whether a graph contains a specific triple with the method Graph::contains.

Mutating graphs

Any implementation of Graph that can be mutated should also implement MutableGraph, which comes with additional methods for modifying the graph. Individual triples can be added to the graph (resp. removed from the graph) with MutableGraph::insert (resp. MutableGraph::remove). Inserting (resp. removing) a triple that is already (resp. not) present in the graph is essentially a no-op.

use sophia::{api::{ns::rdf, prelude::*}, iri::*};
/// Example: increment the rdf:value of a given subject
fn f<G: MutableGraph>(mut g: G) -> Result<(), Box<dyn std::error::Error>> {
let s = Iri::new_unchecked("https://example.org/foo");
let old_value: i32 = g.triples_matching([s], [rdf::value], Any)
	.next()
	.unwrap()?
	.o()
	.try_into_term()?;
g.remove(s, rdf::value, old_value)?;
g.insert(s, rdf::value, old_value + 1)?;
Ok(()) }

Batch modifications can also be performed on mutable graphs:

- MutableGraph::insert_all inserts all the triples from a triple source¹;
- MutableGraph::remove_all removes all the triples from a triple source¹;
- MutableGraph::remove_matching removes all the triples matching the parameters;
- MutableGraph::retain_matching removes all the triples except those matching the parameters.

The parameters of remove_matching and retain_matching are similar to those of Graph::triples_matching and are described in more detail in the next chapter.
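
The difference between the last two operations can be illustrated on a plain HashSet. This is a std-only sketch, not Sophia code: the free functions remove_matching and retain_matching below are hypothetical and take a single closure, whereas the MutableGraph methods take one matcher per position.

```rust
use std::collections::HashSet;

// Toy triples: three plain strings.
type ToyTriple = [&'static str; 3];

/// Toy analogue of remove_matching: drop every triple accepted by `pred`.
fn remove_matching(graph: &mut HashSet<ToyTriple>, pred: impl Fn(&ToyTriple) -> bool) {
    graph.retain(|t| !pred(t));
}

/// Toy analogue of retain_matching: keep only the triples accepted by `pred`.
fn retain_matching(graph: &mut HashSet<ToyTriple>, pred: impl Fn(&ToyTriple) -> bool) {
    graph.retain(|t| pred(t));
}

/// A small graph used to try both operations.
fn sample_graph() -> HashSet<ToyTriple> {
    HashSet::from([
        ["ex:alice", "foaf:name", "\"Alice\""],
        ["ex:alice", "foaf:knows", "ex:bob"],
        ["ex:bob", "foaf:name", "\"Bob\""],
    ])
}
```

On sample_graph(), removing the triples whose predicate is foaf:knows leaves two triples, while retaining only those whose subject is ex:alice drops the single triple about ex:bob.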

Useful types implementing `Graph`

- slices of triples implement Graph;
- standard collections (Vec, HashSet and BTreeSet) of triples implement Graph and MutableGraph;
- sophia::inmem::graph::LightGraph provides a Graph and MutableGraph implementation with a low memory footprint;
- sophia::inmem::graph::FastGraph provides a Graph and MutableGraph implementation designed for fast retrieval of any given triple.

Recipes for constructing graphs

Constructing and populating an empty graph

use sophia::{api::{ns::{Namespace, rdf}, prelude::*}, inmem::graph::FastGraph};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut g = FastGraph::new();
let ex = Namespace::new_unchecked("https://example.org/ns#");
let alice = ex.get_unchecked("alice");
let s = Namespace::new_unchecked("http://schema.org/");
g.insert(
	&alice,
	rdf::type_,
	s.get_unchecked("Person")
)?;
g.insert(
	&alice,
	s.get_unchecked("name"),
	"Alice"
)?;
Ok(()) }

Constructing a graph from a triple source¹

use sophia::{api::prelude::*, inmem::graph::FastGraph, iri::Iri};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let big_graph = FastGraph::new();
// Extract all triples about 'alice' from big_graph in a new graph
let alice = Iri::new_unchecked("https://example.org/ns#alice");
let graph: FastGraph = big_graph
	.triples_matching([alice], Any, Any)
	.collect_triples()?;
Ok(()) }

NB: Only types implementing CollectibleGraph can actually be constructed with the collect_triples method as above. However, most types implementing Graph should implement CollectibleGraph.

Constructing a graph from a file

use sophia::{api::prelude::*, inmem::graph::FastGraph};
use std::{io::BufReader, fs::File};
use sophia::turtle::parser::turtle;

fn main() -> Result<(), Box<dyn std::error::Error>> {
let f = BufReader::new(File::open("../sophia_doap.ttl")?);
let graph: FastGraph = turtle::parse_bufread(f)
	.collect_triples()?;
Ok(()) }

For more about parsing (and serializing), see the corresponding chapter.

¹ A TripleSource is a fallible stream of triples, such as those returned by Graph::triples or Graph::triples_matching, or those returned by parsers. In particular, any iterator of Result<T, E> where T: Triple is a TripleSource.
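
The idea of a fallible stream can be sketched with the standard library alone: an iterator whose items are Result values, collected into a set that stops at the first error. The names ToyTriple, parse_line, toy_source and collect_graph are hypothetical, and the whitespace-separated line format is invented for this example (it is not N-Triples).

```rust
use std::collections::HashSet;

// Toy triples: three owned strings.
type ToyTriple = [String; 3];

/// Parse one whitespace-separated "s p o" line into a toy triple.
fn parse_line(line: &str) -> Result<ToyTriple, String> {
    let parts: Vec<&str> = line.split_whitespace().collect();
    match parts.as_slice() {
        [s, p, o] => Ok([s.to_string(), p.to_string(), o.to_string()]),
        _ => Err(format!("expected 3 terms, got {}: {:?}", parts.len(), line)),
    }
}

/// A fallible stream of triples: an iterator of Result items.
fn toy_source<'a>(text: &'a str) -> impl Iterator<Item = Result<ToyTriple, String>> + 'a {
    text.lines()
        .filter(|line| !line.trim().is_empty())
        .map(parse_line)
}

/// Collect the stream into a graph, stopping at the first error.
fn collect_graph(text: &str) -> Result<HashSet<ToyTriple>, String> {
    toy_source(text).collect()
}
```

Collecting into a Result mirrors what collect_triples does in the recipes above: either the whole graph is built, or the first error is reported.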

Term matchers

TODO

RDF Datasets

TODO explain briefly how the Dataset and MutableDataset traits are similar to Graph and MutableGraph, replacing Triple's with Quad's and TripleSource's with QuadSource's.

TODO add a chapter on how to access and manipulate individual named graphs, union graphs.

TODO add recipes

Parsing and Serializing

TODO describe the different parsers and serializers available in Sophia (mentioning the feature gates)

TODO explain how to use the options (e.g. to produce pretty Turtle)

Changes since version 0.7

Sophia has been heavily refactored between version 0.7 and 0.8. This refactoring was triggered by the use of Generic Associated Types (GATs), which have finally landed in stable Rust. But this was also an opportunity to make a number of other changes.

The benefit of GATs

The main benefit of GATs is to get rid of odd patterns that were introduced in Sophia in order to keep it generic enough to support multiple implementation choices. The drawback of those patterns was that implementing Sophia's traits (especially Graph and Dataset) could be cumbersome.

As an example, the Graph trait used to be

pub trait Graph {
    type Triple: TripleStreamingMode;
    // ...
}

Given a type MyGraph implementing that trait, the actual type of triples yielded by MyGraph::triples could not be immediately determined, and was quite intricate. This could be inconvenient for some users of MyGraph, and was usually cumbersome for the implementer.

Compare to the new definition of the Graph trait:

pub trait Graph {
    type Triple<'x>: Triple where Self: 'x;
    // ...
}

where Graph::triples now yields triples whose type is exactly Graph::Triple<'_>. Much easier.

The same pattern existed for Dataset, TripleSource, and QuadSource, where GATs have now also replaced it.
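
To see the GAT pattern at work outside of Sophia, here is a std-only sketch. ToyTriple, ToyGraph and subjects are hypothetical names, and the boxed iterator is a simplification chosen for the example. What matters is that the associated type Triple<'x> can borrow from the graph it comes from.

```rust
/// Toy triple trait, standing in for Sophia's Triple.
trait ToyTriple {
    /// Subject, predicate and object as string slices.
    fn spo(&self) -> [&str; 3];
}

impl<'a> ToyTriple for &'a [String; 3] {
    fn spo(&self) -> [&str; 3] {
        [self[0].as_str(), self[1].as_str(), self[2].as_str()]
    }
}

/// Toy graph trait using a generic associated type for its triples.
trait ToyGraph {
    /// The type of triples yielded while the graph is borrowed for 'x.
    type Triple<'x>: ToyTriple
    where
        Self: 'x;

    /// Iterate over all triples of the graph.
    fn triples(&self) -> Box<dyn Iterator<Item = Self::Triple<'_>> + '_>;
}

impl ToyGraph for Vec<[String; 3]> {
    // The yielded triples are plain references into the vector.
    type Triple<'x> = &'x [String; 3] where Self: 'x;

    fn triples(&self) -> Box<dyn Iterator<Item = Self::Triple<'_>> + '_> {
        Box::new(self.iter())
    }
}

/// Generic code can use the yielded triples without knowing the graph type.
fn subjects<G: ToyGraph>(graph: &G) -> Vec<String> {
    graph.triples().map(|t| t.spo()[0].to_string()).collect()
}
```

For Vec<[String; 3]>, the type of the yielded triples is spelled out in a single line of the impl, which is the kind of readability gain described above.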

The new `Term` trait

The old TTerm trait has been replaced by a new Term trait, with a significantly different API, that serves several purposes:

- it now supports RDF-star;
- it now allows atomic types (such as &str or i32) to be used directly as terms (they are interpreted as xsd:string and xsd:integer literals, respectively).

Any code that handles terms will need some significant rewriting. See the chapter on RDF terms for more detail.

The end of the "IRI zoo"

Historically, a number of different types have been created in Sophia for representing IRIs, which was causing some confusion. Most of them have now disappeared, in favor of the types defined in sophia_iri.

Reducing the `sophia_term` crate

The sophia_term crate, from which most term implementations came in 0.7, has been significantly reduced. The most general types that it provided (BoxTerm, RefTerm) are now subsumed by SimpleTerm, a straightforward implementation of the Term trait, provided by sophia_api. More specific types (such as RcTerm or ArcTerm) are still provided by sophia_term.

Simplification of the `Graph` and `Dataset` traits

In version 0.7, the Graph trait had a number of specialized methods for retrieving selected triples, such as triples_with_s or triples_with_po (and similarly for Dataset: quads_with_s, etc.).

All these methods have disappeared in favor of triples_matching, so that instead of:

for t in g.triples_with_s(mys) {
    // ...
}

one should now write

extern crate sophia;
use sophia::api::prelude::*;
let g: Vec<[i32; 3]> = vec![]; // dummy graph type
let mys = 42;
for t in g.triples_matching([mys], Any, Any) {
    // ...
}

and the performance will be the same (depending, of course, on how carefully the Graph/Dataset was implemented, but that was already the case with the previous API).
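
The reason one method can replace the whole family of specialized ones can be shown with a toy matcher over a slice of string triples. This sketch uses only the standard library. Pos, accepts, triples_matching and count_with_s are hypothetical names, and Sophia's real matchers are more general (they also accept closures, for instance).

```rust
/// Toy matcher for one position: anything, or one of the listed values.
#[derive(Clone, Copy)]
enum Pos<'a> {
    Any,
    OneOf(&'a [&'a str]),
}

impl Pos<'_> {
    /// Does this matcher accept the given term?
    fn accepts(&self, term: &str) -> bool {
        match self {
            Pos::Any => true,
            Pos::OneOf(values) => values.iter().any(|v| *v == term),
        }
    }
}

/// Yield the triples whose subject, predicate and object are all accepted.
fn triples_matching<'g>(
    graph: &'g [[&'g str; 3]],
    s: Pos<'g>,
    p: Pos<'g>,
    o: Pos<'g>,
) -> impl Iterator<Item = &'g [&'g str; 3]> + 'g {
    graph
        .iter()
        .filter(move |t| s.accepts(t[0]) && p.accepts(t[1]) && o.accepts(t[2]))
}

/// The old "with subject" query is just one particular combination.
fn count_with_s(graph: &[[&str; 3]], subject: &str) -> usize {
    let wanted = [subject];
    triples_matching(graph, Pos::OneOf(&wanted), Pos::Any, Pos::Any).count()
}
```

Every former triples_with_X method corresponds to one combination of Any and specific values, so a single entry point is enough.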

The `sophia` crate

As before, Sophia is still made of several specialized crates (sophia_api, sophia_iri, sophia_turtle...) that are all packaged in a one-stop-shop crate named sophia. Note however that the structure of that crate has changed significantly. In version 0.7, it re-exported symbols from the smaller crates in its own module hierarchy, mostly for historical reasons. In version 0.8, it simply exposes each of the smaller crates as a corresponding module, e.g. sophia::api re-exports the root module of sophia_api, and so on.

Requesting help

As migration from version 0.7 to version 0.8 can be challenging, a dedicated tag has been added on the GitHub repository of Sophia to mark migration issues and request assistance.
