This vignette describes how to create LDF resources from SPARQL queries or RDF files. You might like to first read the Working with LDF Resources vignette to understand how RDF resources are represented in LDF.
You can create resources by downloading a table of descriptions with a SPARQL SELECT query.
As an example, let's download some music genres from DBpedia. This query will find 100 things, identified by their uri, that are music genres, along with their label and a descriptive comment. We'll look for the English version of the latter two strings.
music_genres_query <- "
PREFIX : <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
  ?uri a :MusicGenre;
    rdfs:label ?label;
    rdfs:comment ?comment .
  FILTER langMatches(lang(?label), 'EN')
  FILTER langMatches(lang(?comment), 'EN')
}
LIMIT 100
"
We can use the query() function to execute the query and parse the results:
music_genre_results <- query(music_genres_query, endpoint="http://dbpedia.org/sparql/")
This is what the first few results look like:
head(music_genre_results)
#> # A tibble: 6 x 3
#>   uri                       label       comment
#>   <chr>                     <chr>       <chr>
#> 1 http://dbpedia.org/resou… Art rock    "Art rock is a subgenre of rock music t…
#> 2 http://dbpedia.org/resou… Bebop       "Bebop or bop is a style of jazz develo…
#> 3 http://dbpedia.org/resou… Britpop     "Britpop was a UK based music and cultu…
#> 4 http://dbpedia.org/resou… Bubblegum … "Bubblegum pop (also known as bubblegum…
#> 5 http://dbpedia.org/resou… Fighting g… "A fighting game is a video game in whi…
#> 6 http://dbpedia.org/resou… Free impro… "Free improvisation or free music is im…
We can then create resources for these:
music_genres <- resource(music_genre_results$uri, description=music_genre_results)
We can then manipulate these resources within R:
# find music genres where the description mentions "dance"
music_genres[grep("dance", property(music_genres, "comment"))]
#> <ldf_resource[12]>
#>  [1] Polka                          Trance music
#>  [3] Vaudeville                     Zarzuela
#>  [5] Afro/Cosmic music              Benga music
#>  [7] Bubblegum dance                Waltz (International Standard)
#>  [9] Logobi                         Sega (genre)
#> [11] K-pop                          Western swing
#> Description: uri, label, comment
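Note that grep() is case-sensitive by default, so a comment that only contains "Dance" with a capital letter would not match the pattern above. The following is a minimal base R sketch of a case-insensitive filter. The comments vector is invented for illustration and stands in for whatever character vector of descriptions you are filtering.

```r
# Toy comments, invented for illustration
comments <- c(
  "Polka is a dance and genre of dance music.",
  "Dance-pop is a popular music subgenre.",
  "Bebop is a style of jazz."
)

# Case-sensitive match: misses the capitalised "Dance-pop"
strict_hits <- grepl("dance", comments)

# Case-insensitive match: picks up both spellings
loose_hits <- grepl("dance", comments, ignore.case = TRUE)
```

A logical vector from grepl() can be used for subsetting in the same way as the integer positions returned by grep().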
We can create resources from serialised RDF files too.
To read RDF into R we can use the rdflib package. This in turn uses the redland package to provide bindings to the C library of the same name, and the jsonld package for JSON-LD serialisations.
Let’s load up an example from that package.
library(rdflib)
article_rdf <- rdf_parse(system.file("extdata", "ex.xml", package="rdflib", mustWork=TRUE))
This creates a list containing pointers to redland world and model objects. We can take a peek at the statements with rdflib::print.rdf() (this serialises the data again and prints back the result):
print(article_rdf, format="turtle")
#> Total of 35 triples, stored in hashes
#> -------------------------------
#> @base <localhost://> .
#> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
#>
#> <http://id.crossref.org/contributor/benjamin-l-phillips-2etprmps2zm1a>
#>     a <http://xmlns.com/foaf/0.1/Person> ;
#>     <http://xmlns.com/foaf/0.1/familyName> "Phillips" ;
#>     <http://xmlns.com/foaf/0.1/givenName> "Benjamin L." ;
#>     <http://xmlns.com/foaf/0.1/name> "Benjamin L. Phillips" .
#>
#> <http://id.crossref.org/contributor/carl-boettiger-2etprmps2zm1a>
#>
#> ... with 25 more triples
The content is too large to display here, but you can see from the RDF file itself that the data describes a journal article: https://doi.org/10.1002/ece3.2314.
The description is a graph, not a table. It’s not a tidy collection of similarly shaped objects. We’ve got the article itself, and nested descriptions of related resources.
We can identify the different resource types with a query:
rdf_query(article_rdf, "SELECT * WHERE { ?s a ?type }")
#> # A tibble: 4 x 2
#>   s                                                  type
#>   <chr>                                              <chr>
#> 1 http://id.crossref.org/contributor/t-alex-perkins… http://xmlns.com/foaf/0.1/…
#> 2 http://id.crossref.org/contributor/benjamin-l-phi… http://xmlns.com/foaf/0.1/…
#> 3 http://id.crossref.org/contributor/carl-boettiger… http://xmlns.com/foaf/0.1/…
#> 4 http://id.crossref.org/issn/2045-7758              http://purl.org/ontology/b…
Here we can see that the description also includes the article's creators and the journal in which it is published.
We could gather all of these entities into a single resource vector, but the descriptions wouldn't overlap. The creators don't have prism:issn identifiers and the journal doesn't have a foaf:familyName.
Instead it makes more sense to split these entities into separate vectors. We’ll focus on the creators, since there are several of them. We can do this with a query:
creators_triples <- rdf_query(article_rdf, "SELECT * WHERE { ?s a <http://xmlns.com/foaf/0.1/Person>; ?p ?o }")
Now we have a table of statements about the creators:
creators_triples
#> # A tibble: 12 x 3
#>    s                                  p                        o
#>    <chr>                              <chr>                    <chr>
#>  1 http://id.crossref.org/contributo… http://xmlns.com/foaf/0… Boettiger
#>  2 http://id.crossref.org/contributo… http://xmlns.com/foaf/0… Carl
#>  3 http://id.crossref.org/contributo… http://www.w3.org/1999/… http://xmlns.com…
#>  4 http://id.crossref.org/contributo… http://xmlns.com/foaf/0… Carl Boettiger
#>  5 http://id.crossref.org/contributo… http://xmlns.com/foaf/0… Perkins
#>  6 http://id.crossref.org/contributo… http://xmlns.com/foaf/0… T. Alex Perkins
#>  7 http://id.crossref.org/contributo… http://www.w3.org/1999/… http://xmlns.com…
#>  8 http://id.crossref.org/contributo… http://xmlns.com/foaf/0… T. Alex
#>  9 http://id.crossref.org/contributo… http://xmlns.com/foaf/0… Benjamin L. Phil…
#> 10 http://id.crossref.org/contributo… http://www.w3.org/1999/… http://xmlns.com…
#> 11 http://id.crossref.org/contributo… http://xmlns.com/foaf/0… Benjamin L.
#> 12 http://id.crossref.org/contributo… http://xmlns.com/foaf/0… Phillips
We can tabulate these statements into a tidy data frame with one row per creator, and one column per property.
library(tidyr)
(creators_description <- creators_triples %>% spread("p","o"))
#> # A tibble: 3 x 5
#>   s        `http://www.w3.or… `http://xmlns.c… `http://xmlns.c… `http://xmlns.c…
#>   <chr>    <chr>              <chr>            <chr>            <chr>
#> 1 http://… http://xmlns.com/… Phillips         Benjamin L.      Benjamin L. Phi…
#> 2 http://… http://xmlns.com/… Boettiger        Carl             Carl Boettiger
#> 3 http://… http://xmlns.com/… Perkins          T. Alex          T. Alex Perkins
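In recent versions of tidyr, spread() is superseded by pivot_wider(), which performs the same reshaping. The sketch below shows it on a small statements table in the same s/p/o shape. The example.org URIs and values are invented for illustration and are not part of the article data.

```r
library(tidyr)
library(tibble)

# Toy statements table: one row per (subject, predicate, object) triple.
# All URIs and values here are made up for the example.
triples <- tibble(
  s = c("http://example.org/a", "http://example.org/a",
        "http://example.org/b", "http://example.org/b"),
  p = c("http://example.org/name", "http://example.org/age",
        "http://example.org/name", "http://example.org/age"),
  o = c("Alice", "30", "Bob", "25")
)

# One row per subject, one column per predicate
wide <- pivot_wider(triples, names_from = p, values_from = o)
```

This assumes each subject has at most one value per predicate. If a subject has several values for the same predicate, pivot_wider() warns and returns list-columns, so such properties are probably better handled in a separate query.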
We could proceed using the full URIs as properties, but it’s nicer to replace these with shorter strings that don’t need escaping with backticks:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
(creators_description <- creators_description %>%
  rename(uri=s,
         type=`http://www.w3.org/1999/02/22-rdf-syntax-ns#type`,
         family_name=`http://xmlns.com/foaf/0.1/familyName`,
         given_name=`http://xmlns.com/foaf/0.1/givenName`,
         name=`http://xmlns.com/foaf/0.1/name`))
#> # A tibble: 3 x 5
#>   uri                          type            family_name given_name name
#>   <chr>                        <chr>           <chr>       <chr>      <chr>
#> 1 http://id.crossref.org/cont… http://xmlns.c… Phillips    Benjamin … Benjamin …
#> 2 http://id.crossref.org/cont… http://xmlns.c… Boettiger   Carl       Carl Boet…
#> 3 http://id.crossref.org/cont… http://xmlns.c… Perkins     T. Alex    T. Alex P…
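When there are many properties, renaming each column by hand becomes tedious. One possible shortcut, sketched here in base R, is to keep only the local part of each URI (the text after the last "/" or "#"). The local_name() helper is hypothetical, written for this example, and is not part of ldf or rdflib. It yields names such as familyName rather than the snake_case names chosen above.

```r
# Hypothetical helper: keep the part of a URI after the last "/" or "#".
# Strings without either character are returned unchanged.
local_name <- function(uris) sub(".*[/#]", "", uris)

predicates <- c(
  "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
  "http://xmlns.com/foaf/0.1/familyName",
  "http://xmlns.com/foaf/0.1/givenName",
  "http://xmlns.com/foaf/0.1/name"
)

# make.unique() guards against two vocabularies sharing a local name
short <- make.unique(local_name(predicates))
```

Applied to a data frame, this could be used as names(df) <- make.unique(local_name(names(df))). Local names are not guaranteed to be unique or meaningful across vocabularies, so explicit renaming is still the safer choice for anything beyond quick exploration.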
We could have done the same transformation within the SELECT query:
describe_creator <- "
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT * WHERE {
  ?uri a <http://xmlns.com/foaf/0.1/Person>;
    a ?type;
    foaf:familyName ?family_name;
    foaf:givenName ?given_name;
    foaf:name ?name;
  .
}
"
(creators_description <- rdf_query(article_rdf, describe_creator))
#> # A tibble: 3 x 5
#>   uri                          type            family_name given_name name
#>   <chr>                        <chr>           <chr>       <chr>      <chr>
#> 1 http://id.crossref.org/cont… http://xmlns.c… Boettiger   Carl       Carl Boet…
#> 2 http://id.crossref.org/cont… http://xmlns.c… Perkins     T. Alex    T. Alex P…
#> 3 http://id.crossref.org/cont… http://xmlns.c… Phillips    Benjamin … Benjamin …
We can then use this to create the resource vector:
(creators <- resource(creators_description$uri, creators_description))
#> <ldf_resource[3]>
#> [1] http://id.crossref.org/contributor/carl-boettiger-2etprmps2zm1a
#> [2] http://id.crossref.org/contributor/t-alex-perkins-2etprmps2zm1a
#> [3] http://id.crossref.org/contributor/benjamin-l-phillips-2etprmps2zm1a
#> Description: uri, type, family_name, given_name, name