Sophia
A Rust toolkit for RDF and Linked Data.
It comprises the following crates:
- Core crates:
  - sophia_api defines a generic API for RDF and linked data, as a set of core traits and types; more precisely, it provides traits for describing
    - terms, triples and quads,
    - graphs and datasets,
    - parsers and serializers.
  - sophia_iri provides functions, types and traits for validating and resolving IRIs.
  - sophia_term defines various implementations of the Term trait from sophia_api.
  - sophia_inmem defines in-memory implementations of the Graph and Dataset traits from sophia_api.
- Parsers and serializers
  - sophia_turtle provides parsers and serializers for the Turtle family of concrete syntaxes.
  - sophia_jsonld provides parsers and serializers for JSON-LD.
  - sophia_xml provides parsers and serializers for RDF/XML.
  - sophia_rio is a lower-level crate, used by the ones above.
- Other
  - sophia_c14n implements RDF canonicalization.
  - sophia_isomorphism provides functions to determine if two graphs or datasets are isomorphic.
  - sophia_sparql_client provides a client for the SPARQL 1.1 Protocol.
  - sophia_resource provides a resource-centric API.
- All-inclusive
  - sophia re-exports symbols from all the crates above, with the following proviso:
    - sophia_jsonld is only available with the jsonld feature;
    - sophia_sparql_client is only available with the http_client feature;
    - sophia_xml is only available with the xml feature.
In addition to the API documentation, high-level user documentation is available (although it is not quite complete yet).
Licence
CECILL-B (compatible with BSD)
Citation
When using Sophia, please use the following citation:
Champin, P.-A. (2020) ‘Sophia: A Linked Data and Semantic Web toolkit for Rust’, in Wilde, E. and Amundsen, M. (eds). The Web Conference 2020: Developers Track, Taipei, TW. Available at: https://www2020devtrack.github.io/site/schedule.
BibTeX:
@misc{champin_sophia_2020,
title = {{Sophia: A Linked Data and Semantic Web toolkit for Rust}},
author = {Champin, Pierre-Antoine},
howpublished = {{The Web Conference 2020: Developers Track}},
address = {Taipei, TW},
editor = {Wilde, Erik and Amundsen, Mike},
month = apr,
year = {2020},
language = {en},
url = {https://www2020devtrack.github.io/site/schedule}
}
Third-party crates
The following third-party crates are using or extending Sophia:
- hdt provides an implementation of Sophia's traits based on the HDT format.
- manas is a modular framework for implementing Solid-compatible servers.
- nanopub is a toolkit for managing nanopublications.
History
An outdated comparison of Sophia with other RDF libraries is still available here.
Introduction
The sophia crate aims at providing a comprehensive toolkit for working with RDF and Linked Data in Rust.
RDF is a data model designed to exchange knowledge on the Web in an interoperable way. Each piece of knowledge in RDF (a statement) is represented by a triple, made of three terms. A set of triples forms an RDF graph. Finally, several graphs can be grouped in a collection called a dataset, where each graph is identified by a unique name.
In Sophia, each of these core concepts is modeled by a trait, which can be implemented in multiple ways (see for example the Graph trait and some of the types implementing it). Sophia is therefore not meant to provide the "ultimate" implementation of RDF in Rust, but a generic framework to help various implementations interoperate with each other (in the spirit of Apache Commons RDF for Java or RDFJS for JavaScript/TypeScript).
Generalized vs. Strict RDF model
The data model supported by Sophia is in fact a superset of the RDF data model as defined by the W3C. When the distinction matters, they will be called, respectively, the generalized RDF model and the strict RDF model. The generalized RDF model extends RDF as follows:
- In addition to standard RDF terms (IRIs, blank nodes and literals), Sophia supports
  - RDF-star quoted triples,
  - variables (a concept borrowed from SPARQL or Notation3).
- Sophia allows any kind of term in any position (subject, predicate, object, graph name).
- Sophia allows IRIs to be relative IRI references (while in strict RDF, IRIs must be absolute).
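To make the difference between the two models concrete, here is a minimal, self-contained sketch that does not use Sophia at all: the GTerm enum and the has_scheme and is_strict_triple functions are hypothetical names introduced for illustration. GTerm covers the generalized terms listed above, and is_strict_triple checks the strict RDF 1.1 position constraints (subject: IRI or blank node; predicate: IRI; object: IRI, blank node or literal; IRIs absolute). Testing for a URI scheme is only a rough approximation of "absolute IRI", not the full IRI grammar.

```rust
/// Hypothetical term type covering the generalized RDF model
/// (illustration only, not one of Sophia's types).
#[derive(Debug, Clone, PartialEq)]
enum GTerm {
    Iri(String),
    BlankNode(String),
    Literal { lex: String, datatype: String },
    Triple(Box<[GTerm; 3]>),
    Variable(String),
}

/// Rough absoluteness test: the IRI must start with a scheme,
/// i.e. ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ':'.
fn has_scheme(iri: &str) -> bool {
    match iri.find(':') {
        Some(pos) if pos > 0 => {
            let mut chars = iri[..pos].chars();
            chars.next().map_or(false, |c| c.is_ascii_alphabetic())
                && chars.all(|c| c.is_ascii_alphanumeric() || "+-.".contains(c))
        }
        _ => false,
    }
}

fn is_absolute_iri(t: &GTerm) -> bool {
    matches!(t, GTerm::Iri(i) if has_scheme(i))
}

/// Returns true if [s, p, o] would also be a valid triple in strict RDF 1.1.
/// Quoted triples and variables are rejected in every position.
fn is_strict_triple(s: &GTerm, p: &GTerm, o: &GTerm) -> bool {
    let subject_ok = is_absolute_iri(s) || matches!(s, GTerm::BlankNode(_));
    let predicate_ok = is_absolute_iri(p);
    let object_ok = is_absolute_iri(o)
        || matches!(o, GTerm::BlankNode(_))
        || matches!(o, GTerm::Literal { datatype, .. } if has_scheme(datatype));
    subject_ok && predicate_ok && object_ok
}
```

All the triples in the assertions below are acceptable in the generalized model; only the first one passes the strict check.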
Getting Started
Below is a short example demonstrating how to build a graph, mutate it and serialize it back.
Add the sophia crate to your dependencies in Cargo.toml:
[dependencies]
sophia = "0.8.0"
Add these lines of code and run the program.
use sophia::api::prelude::*;
use sophia::api::ns::Namespace;
use sophia::inmem::graph::LightGraph;
use sophia::turtle::parser::turtle;
use sophia::turtle::serializer::nt::NtSerializer;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // loading a graph
    let example = r#"
@prefix : <http://example.org/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
:alice foaf:name "Alice";
foaf:mbox <mailto:alice@work.example> .
:bob foaf:name "Bob".
"#;
    let mut graph: LightGraph = turtle::parse_str(example).collect_triples()?;
    // mutating the graph
    let ex = Namespace::new("http://example.org/")?;
    let foaf = Namespace::new("http://xmlns.com/foaf/0.1/")?;
    graph.insert(
        ex.get("bob")?,
        foaf.get("knows")?,
        ex.get("alice")?,
    )?;
    // serializing the graph
    let mut nt_stringifier = NtSerializer::new_stringifier();
    let example2 = nt_stringifier.serialize_graph(&graph)?.as_str();
    println!("The resulting graph:\n{}", example2);
    Ok(())
}
You should get the following output:
The resulting graph:
<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice".
<http://example.org/alice> <http://xmlns.com/foaf/0.1/mbox> <mailto:alice@work.example>.
<http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob".
<http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice>.
RDF Terms
The Term trait defines how you interact with RDF terms in Sophia.
Using terms
The first thing you usually need to know about a term is its kind (IRI, Literal...).
The kind is described by the TermKind enum, and available from the Term::kind method.
use sophia::api::term::{SimpleTerm, Term, TermKind};
use TermKind::*;
let some_term: SimpleTerm = "foo".into_term();
match some_term.kind() {
Iri => { /* ... */ }
Literal => { /* ... */ }
BlankNode => { /* ... */ }
_ => { /* ... */ }
}
Alternatively, when only one kind is of interest, you can use Term::is_iri, Term::is_literal, Term::is_blank_node, etc.
If you are interested in the "value" of the term, the trait provides the following methods. All of them return an Option, which will be None if the term does not have the corresponding kind.
- If the term is a blank node, Term::bnode_id returns its blank node identifier.
- If the term is a literal:
  - Term::lexical_form returns its lexical form (the "textual value" of the literal),
  - Term::datatype returns its datatype IRI[1],
  - Term::language_tag returns its language tag, if any.
- If the term is a quoted triple:
  - Term::triple returns its 3 components in an array of terms,
  - Term::constituents iterates over all its constituents,
  - Term::atoms iterates over all its atomic (i.e. non quoted-triple) constituents,
  - (those three methods also have a to_X version that destructs the original term instead of borrowing it).
- If the term is a variable[2], Term::variable returns its name.
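The Option-returning accessors above follow a simple pattern, which the sketch below reproduces on a hypothetical MiniTerm enum. It is not Sophia's SimpleTerm, and it leaves out quoted triples to stay short; each accessor yields Some(..) only when the term has the corresponding kind.

```rust
/// Hypothetical, simplified term type (NOT Sophia's SimpleTerm), used only
/// to illustrate the accessor pattern.
#[derive(Debug, Clone, PartialEq)]
enum MiniTerm {
    Iri(String),
    BlankNode(String),
    Literal { lex: String, datatype: String, lang: Option<String> },
    Variable(String),
}

impl MiniTerm {
    fn bnode_id(&self) -> Option<&str> {
        match self {
            MiniTerm::BlankNode(id) => Some(id.as_str()),
            _ => None,
        }
    }

    fn lexical_form(&self) -> Option<&str> {
        match self {
            MiniTerm::Literal { lex, .. } => Some(lex.as_str()),
            _ => None,
        }
    }

    fn datatype(&self) -> Option<&str> {
        match self {
            MiniTerm::Literal { datatype, .. } => Some(datatype.as_str()),
            _ => None,
        }
    }

    /// None both for non-literals and for literals without a language tag.
    fn language_tag(&self) -> Option<&str> {
        match self {
            MiniTerm::Literal { lang, .. } => lang.as_deref(),
            _ => None,
        }
    }

    fn variable(&self) -> Option<&str> {
        match self {
            MiniTerm::Variable(name) => Some(name.as_str()),
            _ => None,
        }
    }
}
```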
Finally, the method Term::eq can be used to check whether two values implementing Term represent the same RDF term. Note that the == operator may give a different result than Term::eq on some types implementing the Term trait.
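One plausible reason why == and Term::eq can disagree is illustrated below with hypothetical types (LiteralLike, LocatedLiteral and term_eq are not Sophia names). A type may carry data that is not part of the RDF term (here, a source line number), or two different types may represent the same term (here, a plain str standing for an xsd:string literal). A term-level comparison only looks at the RDF components. Only literals are modeled, to keep the sketch short.

```rust
const XSD_STRING: &str = "http://www.w3.org/2001/XMLSchema#string";

/// Hypothetical literal-only trait (illustration only, not Sophia's Term).
trait LiteralLike {
    fn lexical_form(&self) -> &str;
    fn datatype(&self) -> &str;

    /// Compares the RDF terms represented, ignoring any other data.
    fn term_eq<T: LiteralLike + ?Sized>(&self, other: &T) -> bool {
        self.lexical_form() == other.lexical_form() && self.datatype() == other.datatype()
    }
}

/// A plain string slice stands for an xsd:string literal.
impl LiteralLike for str {
    fn lexical_form(&self) -> &str {
        self
    }
    fn datatype(&self) -> &str {
        XSD_STRING
    }
}

/// A literal that also remembers where it was parsed (extra, non-RDF data).
#[derive(Debug, PartialEq)]
struct LocatedLiteral {
    lex: String,
    datatype: String,
    line: usize,
}

impl LiteralLike for LocatedLiteral {
    fn lexical_form(&self) -> &str {
        &self.lex
    }
    fn datatype(&self) -> &str {
        &self.datatype
    }
}
```

With these types, two LocatedLiteral values parsed on different lines are different according to the derived ==, yet term_eq considers them the same RDF term, and a LocatedLiteral can be compared with a str even though == between the two types does not exist at all.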
Useful types implementing Term
Below is a list of useful types implementing the Term trait:
- Iri<T> and IriRef<T>, where T: Borrow<str>, representing IRIs;
- BnodeId<T>, where T: Borrow<str>, representing blank nodes;
- str, representing literals of type xsd:string;
- i32, isize and usize, representing literals of type xsd:integer;
- f64, representing literals of type xsd:double;
- SimpleTerm (see below).
SimpleTerm is a straightforward implementation of Term that can represent any kind of term, and can either own its underlying data or borrow it from something else.
Any term can be converted to a SimpleTerm using the Term::as_simple method. This method borrows as much as possible from the initial term to avoid spurious memory allocation.
Alternatively, to convert any term to a self-sufficient SimpleTerm, you can use Term::into_term.
See also the list of recipes below.
Borrowing terms with Term::borrow_term
In Sophia, all functions accepting terms as parameters expect a type T: Term (not &T, but the type T itself). So what happens when you want to call such a function with a term t, but still want to retain ownership of t?
The solution is to pass t.borrow_term() to the function. This method returns something implementing Term, representing the same RDF term as t, without waiving ownership. This is a very common pattern in Sophia.
More precisely, the type returned by t.borrow_term() is the associated type Term::BorrowTerm. In most cases, this is a reference or a copy of t.
Recipes for constructing terms
Constructing IRIs
fn main() -> Result<(), Box<dyn std::error::Error>> {
use sophia::{iri::IriRef, api::ns::Namespace};
let some_text = "http://example.org";
// construct an IRI from a constant
let iri1 = IriRef::new_unchecked("http://example.org");
// construct an IRI from an untrusted string
let iri2 = IriRef::new(some_text)?;
// construct multiple IRIs from a namespace
let ns = Namespace::new_unchecked("http://example.org/ns#");
let iri3 = ns.get_unchecked("foo");
let iri4 = ns.get(some_text)?;
// standard namespaces
use sophia::api::ns::{rdf, xsd};
let iri5 = rdf::Property ;
let iri6 = xsd::string ;
Ok(()) }
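The _unchecked constructors above skip validation for strings known to be valid, while the others return an error on invalid input. Below is a hypothetical sketch of the namespace idea (Ns, get, get_unchecked and has_forbidden_char are illustrative names, not Sophia's implementation): an IRI is built by appending a suffix to the namespace prefix, and the checked variant rejects the result if it contains characters that can never occur in an IRI. That check is a crude subset of RFC 3987; a real validator parses the whole grammar.

```rust
/// Hypothetical namespace type (NOT Sophia's Namespace): an IRI prefix
/// to which suffixes are appended.
struct Ns<'a> {
    prefix: &'a str,
}

/// Characters that can never appear in an IRI (a crude subset of the
/// RFC 3987 rules, enough for this sketch).
fn has_forbidden_char(s: &str) -> bool {
    s.chars().any(|c| c.is_whitespace() || "<>\"{}|\\^`".contains(c))
}

impl<'a> Ns<'a> {
    /// Checked variant, for untrusted suffixes: validates the result.
    fn get(&self, suffix: &str) -> Result<String, String> {
        let iri = self.get_unchecked(suffix);
        if has_forbidden_char(&iri) {
            Err(format!("invalid IRI: {iri}"))
        } else {
            Ok(iri)
        }
    }

    /// Unchecked variant, for trusted constants: simply concatenates.
    fn get_unchecked(&self, suffix: &str) -> String {
        format!("{}{}", self.prefix, suffix)
    }
}
```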
Constructing literals
fn main() -> Result<(), Box<dyn std::error::Error>> {
use sophia::api::{ns::xsd, term::{LanguageTag, SimpleTerm, Term}};
// use native types for xsd::string, xsd::integer, xsd::double
let lit_string = "hello world";
let lit_integer = 42;
let lit_double = 1.23;
// construct a language-tagged string
let fr = LanguageTag::new_unchecked("fr");
let lit_fr = "Bonjour le monde" * fr;
// construct a literal with an arbitrary datatype
let lit_date = "2023-11-15" * xsd::date;
Ok(()) }
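The "lexical form" * datatype syntax above relies on operator overloading, which Rust only allows through the std::ops::Mul trait. A self-contained sketch of the technique, with hypothetical Datatype, Lang and Literal types (Sophia's actual types differ), could look like this. Implementing Mul<Datatype> for &str is permitted by the orphan rules because Datatype is a local type.

```rust
use std::ops::Mul;

/// Hypothetical datatype IRI wrapper (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Datatype(&'static str);

/// Hypothetical language tag wrapper (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Lang(&'static str);

#[derive(Debug, PartialEq)]
enum Literal {
    Typed { lex: String, datatype: &'static str },
    LangString { lex: String, lang: &'static str },
}

/// `"lexical form" * datatype` builds a typed literal.
impl Mul<Datatype> for &str {
    type Output = Literal;
    fn mul(self, rhs: Datatype) -> Literal {
        Literal::Typed { lex: self.to_string(), datatype: rhs.0 }
    }
}

/// `"text" * lang` builds a language-tagged string.
impl Mul<Lang> for &str {
    type Output = Literal;
    fn mul(self, rhs: Lang) -> Literal {
        Literal::LangString { lex: self.to_string(), lang: rhs.0 }
    }
}

const XSD_DATE: Datatype = Datatype("http://www.w3.org/2001/XMLSchema#date");
```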
Constructing blank nodes
fn main() -> Result<(), Box<dyn std::error::Error>> {
use sophia::api::term::BnodeId;
let b = BnodeId::new_unchecked("x");
Ok(()) }
Converting terms into a different type
use sophia::api::{ns::xsd, term::{SimpleTerm, Term}};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let some_term = "42" * xsd::integer;
let t1: SimpleTerm = "hello".into_term();
let t2: i32 = some_term.try_into_term()?;
Ok(()) }
[1] Note that in Sophia's generalized RDF model, IRIs can be relative IRI references.
[2] Note that this kind only exists in Sophia's generalized RDF model.
RDF Statements
The Triple and Quad traits define how you interact with RDF statements in Sophia.
Note that in Sophia's generalized RDF model, terms of any kind can occur in any position in a statement. This contrasts with strict RDF, where only IRIs can occur in predicate position, and literals can only occur in object position.
Using triples
Triples in RDF are made of a subject, a predicate and an object.
They can be obtained respectively via the methods Triple::s, Triple::p and Triple::o, or all at once (as an array of three terms) via the method Triple::spo.
These methods also have a to_X version that destructs the original triple instead of borrowing it.
use sophia::api::{ns::rdf, prelude::*};
// Example: yield all the terms being used as types in the given triples
fn all_types<IT, T>(triples: IT) -> impl Iterator<Item=T::Term>
where
IT: IntoIterator<Item=T>,
T: Triple,
{
triples
.into_iter()
.filter(|t| rdf::type_ == t.p())
.map(|t| t.to_o())
}
Using quads
Quads are used to represent triples in the context of an optional named graph.
Like triples, they have methods Quad::s, Quad::p and Quad::o, but also Quad::g to access the optional graph name, and Quad::spog to obtain all four components at once.
These methods also have a to_X version that destructs the original quad instead of borrowing it.
use sophia::api::{ns::rdf, prelude::*};
// Example: yield all the triples in the default graph, from a list of quads
fn default_graph_triples<IQ, Q>(quads: IQ) -> impl Iterator<Item=[Q::Term; 3]>
where
IQ: IntoIterator<Item=Q>,
Q: Quad,
{
quads
.into_iter()
.filter(|q| q.g().is_none())
.map(|q| { let (spo, _g) = q.to_spog(); spo })
}
Comparing triples or quads
To check whether two values implementing Triple (resp. Quad) represent the same RDF statement, the method Triple::eq (resp. Quad::eq) must be used.
It compares each component of the statements using the Term::eq method.
Note that the == operator may give a different result than Triple::eq or Quad::eq on some types implementing the Triple or the Quad trait.
Useful types implementing Triple
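While the Triple and Quad traits can be implemented by multiple types, in most situations the following types will be used:
- [T; 3], where T: Term, implementing Triple;
- ([T; 3], Option<T>), where T: Term, implementing Quad.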
RDF Graphs
The Graph and MutableGraph traits define how you interact with RDF graphs in Sophia.
Using graphs
RDF graphs are sets of triples, so the most common thing you need to do with a graph is to iterate over its triples.
This is achieved with the Graph::triples method:
use sophia::api::prelude::*;
use sophia::inmem::graph::LightGraph;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let g = LightGraph::new();
for result in g.triples() {
let triple = result?;
// do something with triple
}
Ok(()) }
Notice that Graph::triples yields Results, as some implementations of Graph may fail at any point of the iteration.
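Because each item is a Result, errors can be handled item by item. The sketch below uses a hypothetical Triple type (three strings) and plain iterators of Result rather than a Sophia graph. The first function uses the same result? pattern as the loop above and stops at the first error; the second collects everything into a Result<Vec<_>, E>, which is Ok only if every item was.

```rust
/// Hypothetical triple type for this sketch: three terms as strings.
type Triple = [String; 3];

/// Counts triples, stopping at the first error with the `result?` pattern.
fn count_triples<I, E>(triples: I) -> Result<usize, E>
where
    I: IntoIterator<Item = Result<Triple, E>>,
{
    let mut n = 0;
    for result in triples {
        let _triple = result?;
        n += 1;
    }
    Ok(n)
}

/// Collects a fallible stream in one go: Ok only if every item is Ok.
fn collect_all<I, E>(triples: I) -> Result<Vec<Triple>, E>
where
    I: IntoIterator<Item = Result<Triple, E>>,
{
    triples.into_iter().collect()
}
```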
When only a subset of the triples in the graph is of interest, you will want to use the Graph::triples_matching method:
use sophia::api::{ns::rdf, prelude::*, term::SimpleTerm};
use sophia::inmem::graph::LightGraph;
let graph = LightGraph::new();
// Utility closure to recognize IRIs in the schema.org namespace
let in_schema_org = |t: SimpleTerm| -> bool {
t.iri()
.map(|iri| iri.as_str().starts_with("http://schema.org/"))
.unwrap_or(false)
};
// Iter over all instances of schema.org types
graph
.triples_matching(Any, [rdf::type_], in_schema_org)
.map(|res| { let [s, _, o] = res.unwrap().to_spo(); (s, o)})
.for_each(|(instance, typ)| {
// do something
})
Graph::triples_matching accepts a large variety of parameters, which will be described in more detail in the next chapter.
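As a rough illustration of the idea behind those parameters, here is a std-only sketch assuming three kinds of matchers, all suggested by the examples in this chapter: Any (matches everything), an array of terms (matches any of its elements) and a predicate closure. The Matcher trait, the Pred wrapper and this triples_matching function are hypothetical simplifications working on string triples, not Sophia's actual matcher machinery; Pred simply wraps closures to keep the trait implementations straightforward.

```rust
/// Hypothetical term-matcher trait (a simplified stand-in, NOT Sophia's API).
trait Matcher {
    fn matches(&self, term: &str) -> bool;
}

/// Matches every term.
struct Any;

impl Matcher for Any {
    fn matches(&self, _term: &str) -> bool {
        true
    }
}

/// An array of terms matches any of its elements.
impl<const N: usize> Matcher for [&str; N] {
    fn matches(&self, term: &str) -> bool {
        self.iter().any(|t| *t == term)
    }
}

/// Wrapper turning a predicate closure into a matcher.
struct Pred<F>(F);

impl<F: Fn(&str) -> bool> Matcher for Pred<F> {
    fn matches(&self, term: &str) -> bool {
        (self.0)(term)
    }
}

type Triple<'a> = [&'a str; 3];

/// Yields the triples whose subject, predicate and object each satisfy
/// the corresponding matcher.
fn triples_matching<'a, S, P, O>(
    graph: &'a [Triple<'a>],
    s: S,
    p: P,
    o: O,
) -> impl Iterator<Item = &'a Triple<'a>>
where
    S: Matcher,
    P: Matcher,
    O: Matcher,
{
    graph
        .iter()
        .filter(move |t| s.matches(t[0]) && p.matches(t[1]) && o.matches(t[2]))
}
```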
Graph also provides methods to iterate over all subjects, predicates and objects in the graph, as well as over all unique terms of a certain kind (Graph::iris, Graph::blank_nodes, Graph::literals, etc.).
Finally, it is possible to check whether a graph contains a specific triple with the method Graph::contains.
Mutating graphs
Any implementation of Graph that can be mutated should also implement MutableGraph, which comes with additional methods for modifying the graph.
Individual triples can be added to the graph (resp. removed from the graph) with MutableGraph::insert (resp. MutableGraph::remove).
Inserting (resp. removing) a triple that is already (resp. not) present in the graph is essentially a no-op.
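To see what "essentially a no-op" means, here is a hypothetical set-backed graph (SetGraph is not a Sophia type). Since the triples are stored in a HashSet, inserting a triple twice stores it only once, and removing an absent triple changes nothing. Returning a bool that says whether the graph changed is a choice made for this sketch.

```rust
use std::collections::HashSet;

/// Hypothetical set-backed graph (NOT Sophia's API).
#[derive(Default)]
struct SetGraph {
    triples: HashSet<[String; 3]>,
}

impl SetGraph {
    /// Returns true if the graph was actually modified.
    fn insert(&mut self, s: &str, p: &str, o: &str) -> bool {
        self.triples.insert([s.to_string(), p.to_string(), o.to_string()])
    }

    /// Returns true if the graph was actually modified.
    fn remove(&mut self, s: &str, p: &str, o: &str) -> bool {
        self.triples.remove(&[s.to_string(), p.to_string(), o.to_string()])
    }

    fn len(&self) -> usize {
        self.triples.len()
    }
}
```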
use sophia::{api::{ns::rdf, prelude::*}, iri::*};
/// Example: increment the rdf:value of a given subject
fn f<G: MutableGraph>(g: &mut G) -> Result<(), Box<dyn std::error::Error>> {
let s = Iri::new_unchecked("https://example.org/foo");
let old_value: i32 = g.triples_matching([s], [rdf::value], Any)
.next()
.unwrap()?
.o()
.try_into_term()?;
g.remove(s, rdf::value, old_value)?;
g.insert(s, rdf::value, old_value + 1)?;
Ok(()) }
Batch modifications can also be performed on mutable graphs:
- MutableGraph::insert_all inserts all the triples from a triple source[1];
- MutableGraph::remove_all removes all the triples from a triple source[1];
- MutableGraph::remove_matching removes all the triples matching the parameters;
- MutableGraph::retain_matching removes all the triples except those matching the parameters.
The parameters of remove_matching and retain_matching are similar to those of Graph::triples_matching, and are described in more detail in the next chapter.
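The semantics of these batch operations can be mimicked on a Vec of triples. In this hypothetical sketch, plain predicates stand in for the matcher parameters, and insert_all stops at the first error of a fallible source while keeping the triples inserted before it. That error behaviour is an assumption of the sketch, not a documented property of Sophia's implementations.

```rust
/// Hypothetical triple type for this sketch.
type Triple = [&'static str; 3];

/// Removes all triples satisfying `pred`.
fn remove_matching<F: Fn(&Triple) -> bool>(graph: &mut Vec<Triple>, pred: F) {
    graph.retain(|t| !pred(t));
}

/// Removes all triples except those satisfying `pred`.
fn retain_matching<F: Fn(&Triple) -> bool>(graph: &mut Vec<Triple>, pred: F) {
    graph.retain(|t| pred(t));
}

/// Inserts every triple of a fallible source (skipping duplicates),
/// returning how many were actually added, or the first error.
fn insert_all<I, E>(graph: &mut Vec<Triple>, source: I) -> Result<usize, E>
where
    I: IntoIterator<Item = Result<Triple, E>>,
{
    let mut added = 0;
    for t in source {
        let t = t?;
        if !graph.contains(&t) {
            graph.push(t);
            added += 1;
        }
    }
    Ok(added)
}
```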
Useful types implementing Graph
- slices of triples implement Graph;
- standard collections (Vec, HashSet and BTreeSet) of triples implement Graph and MutableGraph;
- sophia::inmem::graph::LightGraph provides a Graph and MutableGraph implementation with a low memory footprint;
- sophia::inmem::graph::FastGraph provides a Graph and MutableGraph implementation designed for fast retrieval of any given triple.
Recipes for constructing graphs
Constructing and populating an empty graph
use sophia::{api::{ns::{Namespace, rdf}, prelude::*}, inmem::graph::FastGraph};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut g = FastGraph::new();
let ex = Namespace::new_unchecked("https://example.org/ns#");
let alice = ex.get_unchecked("alice");
let s = Namespace::new_unchecked("http://schema.org/");
g.insert(
&alice,
rdf::type_,
s.get_unchecked("Person")
)?;
g.insert(
&alice,
s.get_unchecked("name"),
"Alice"
)?;
Ok(()) }
Constructing a graph from a triple source[1]
use sophia::{api::prelude::*, inmem::graph::FastGraph, iri::Iri};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let big_graph = FastGraph::new();
// Extract all triples about 'alice' from big_graph in a new graph
let alice = Iri::new_unchecked("https://example.org/ns#alice");
let graph: FastGraph = big_graph
.triples_matching([alice], Any, Any)
.collect_triples()?;
Ok(()) }
NB: Only types implementing CollectibleGraph can actually be constructed with the collect_triples method as above.
However, most types implementing Graph should implement CollectibleGraph.
Constructing a graph from a file
use sophia::{api::prelude::*, inmem::graph::FastGraph};
use std::{io::BufReader, fs::File};
use sophia::turtle::parser::turtle;
fn main() -> Result<(), Box<dyn std::error::Error>> {
// the path is relative to the current working directory
let f = BufReader::new(File::open("../sophia_doap.ttl")?);
let graph: FastGraph = turtle::parse_bufread(f)
.collect_triples()?;
Ok(()) }
For more about parsing (and serializing), see the corresponding chapter.
[1] A TripleSource is a fallible stream of triples, such as those returned by Graph::triples or Graph::triples_matching, or by parsers.
In particular, any iterator of Result<T, E> where T: Triple is a TripleSource.
Term matchers
TODO
RDF Datasets
TODO explain briefly how the Dataset and MutableDataset traits are similar to Graph and MutableGraph, replacing Triple's with Quad's and TripleSource's with QuadSource's.
TODO add a chapter on how to access and manipulate individual named graphs, union graphs.
TODO add recipes
Parsing and Serializing
TODO describe the different parsers and serializers available in Sophia (mentioning the feature gates)
TODO explain how to use the options (e.g. to produce pretty Turtle)
Changes since version 0.7
Sophia has been heavily refactored between versions 0.7 and 0.8. This refactoring was triggered by the use of Generic Associated Types (GATs), which have finally landed in stable Rust. But this was also an opportunity to make a number of other changes.
The benefit of GATs
The main benefit of GATs is to get rid of odd patterns that were introduced in Sophia in order to keep it generic enough to support multiple implementation choices. The drawback of those patterns was that implementing Sophia's traits (especially Graph and Dataset) could be cumbersome.
As an example, the Graph trait used to be:
pub trait Graph {
type Triple: TripleStreamingMode;
// ...
}
Given a type MyGraph implementing that trait, the actual type of triples yielded by MyGraph::triples could not be immediately determined, and was quite intricate. This could be inconvenient for some users of MyGraph, and was usually cumbersome for the implementer.
Compare with the new definition of the Graph trait:
pub trait Graph {
type Triple<'x>: Triple where Self: 'x;
// ...
}
where Graph::triples now yields triples whose type is exactly Graph::Triple<'_>. Much easier.
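The following std-only sketch shows the new pattern at work. MiniGraph, MiniTriple and VecGraph are hypothetical, cut-down analogues of Sophia's traits and types, and returning a boxed iterator is a simplification chosen for the sketch. The point is that generic code, as well as callers of a concrete graph, can name the triple type directly: for VecGraph it is simply a reference into the graph.

```rust
/// Hypothetical, minimal triple trait (NOT Sophia's Triple).
trait MiniTriple {
    fn s(&self) -> &str;
}

impl MiniTriple for &[String; 3] {
    fn s(&self) -> &str {
        &self[0]
    }
}

/// Hypothetical, minimal graph trait using a GAT, in the style of the new
/// Graph definition above (NOT Sophia's actual trait).
trait MiniGraph {
    type Triple<'x>: MiniTriple
    where
        Self: 'x;

    fn triples(&self) -> Box<dyn Iterator<Item = Self::Triple<'_>> + '_>;
}

/// A graph stored as a vector of owned triples; it yields borrowed triples.
struct VecGraph(Vec<[String; 3]>);

impl MiniGraph for VecGraph {
    type Triple<'x> = &'x [String; 3];

    fn triples(&self) -> Box<dyn Iterator<Item = Self::Triple<'_>> + '_> {
        Box::new(self.0.iter())
    }
}

/// Generic code relies only on the bound declared on the GAT.
fn subjects<G: MiniGraph>(g: &G) -> Vec<String> {
    g.triples().map(|t| t.s().to_string()).collect()
}
```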
The same pattern existed for Dataset, TripleSource, and QuadSource, where GATs have now also replaced it.
The new Term trait
The old TTerm trait has been replaced by a new Term trait, with a significantly different API, that serves several purposes:
- it now supports RDF-star;
- it now allows atomic types (such as &str or i32) to be used directly as terms (they are interpreted as xsd:string and xsd:integer literals, respectively).
Any code that handles terms will need some significant rewriting. See the chapter on RDF terms for more detail.
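As an illustration of the second point, here is a hypothetical trait (AsLiteral, not Sophia's Term) whose implementations let native values stand for literals, with the Rust type determining the datatype IRI. The sketch does not escape special characters in the lexical form, so it only suits simple values.

```rust
const XSD: &str = "http://www.w3.org/2001/XMLSchema#";

/// Hypothetical trait (NOT Sophia's Term): native values standing for literals.
trait AsLiteral {
    fn lexical_form(&self) -> String;
    fn datatype_suffix(&self) -> &'static str;

    /// Full datatype IRI, derived from the Rust type of the value.
    fn datatype(&self) -> String {
        format!("{}{}", XSD, self.datatype_suffix())
    }
}

/// A string slice is an xsd:string literal.
impl AsLiteral for &str {
    fn lexical_form(&self) -> String {
        (*self).to_string()
    }
    fn datatype_suffix(&self) -> &'static str {
        "string"
    }
}

/// An i32 is an xsd:integer literal.
impl AsLiteral for i32 {
    fn lexical_form(&self) -> String {
        self.to_string()
    }
    fn datatype_suffix(&self) -> &'static str {
        "integer"
    }
}

/// Renders such a value as an N-Triples typed literal (no escaping).
fn to_ntriples<T: AsLiteral>(value: T) -> String {
    format!("\"{}\"^^<{}>", value.lexical_form(), value.datatype())
}
```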
The end of the "IRI zoo"
Historically, a number of different types have been created in Sophia for representing IRIs, which was causing some confusion.
Most of them have now disappeared, in favor of the types defined in sophia_iri.
Reducing the sophia_term crate
The sophia_term crate, from which most term implementations came in 0.7, has been significantly reduced.
The most general types that it provided (BoxTerm, RefTerm) are now subsumed by SimpleTerm, a straightforward implementation of the Term trait, provided by sophia_api.
More specific types (such as RcTerm or ArcTerm) are still provided by sophia_term.
Simplification of the Graph and Dataset traits
In version 0.7, the Graph trait had a number of specialized methods for retrieving selected triples, such as triples_with_s or triples_with_po (and similarly for Dataset: quads_with_s, etc.).
All these methods have disappeared in favor of triples_matching, so that instead of:
for t in g.triples_with_s(mys) {
// ...
}
one should now write:
extern crate sophia;
use sophia::api::prelude::*;
let g: Vec<[i32; 3]> = vec![]; // dummy graph type
let mys = 42;
for t in g.triples_matching([mys], Any, Any) {
// ...
}
and the performance will be the same (depending, of course, on how carefully the Graph/Dataset was implemented, but that was already the case with the previous API).
The sophia crate
As before, Sophia is still made of several specialized crates (sophia_api, sophia_iri, sophia_turtle...) that are all packaged in a one-stop-shop crate named sophia.
Note, however, that the structure of that crate has changed significantly.
In version 0.7, it re-exported symbols from the smaller crates in its own module hierarchy, mostly for historical reasons.
In version 0.8, it simply exposes each smaller crate as a corresponding module, e.g. sophia::api re-exports the root module of sophia_api, and so on.
Requesting help
As migration from version 0.7 to version 0.8 can be challenging, a dedicated tag has been added on the GitHub repository of Sophia to mark migration issues and request assistance.