Apache Jena to learn RDF and SPARQL

RDF is one of the Semantic Web technologies and the data model behind serializations such as Turtle, N-Triples and JSON-LD. SPARQL is the query language for RDF. This post uses the Apache Jena tools to learn both.

For example, a web page is human-readable because its end user is a human. However, search engines also read the page and select it on behalf of the consumer. A search engine is a machine that wants to read the page's metadata, so the page should carry well-structured data that search engines can understand through semantic parsing.

Some machine-readable metadata formats are:

  • meta tag
  • Microdata
  • Microformats
  • RDFa
  • JSON-LD

It is necessary to know where the Resource Description Framework (RDF) [1] and SPARQL fit in the Semantic Web [2]. The Semantic Web is a Web of data. RDF provides the foundation for publishing and linking that data, and OWL [3], SKOS, RDFS [4] and so on build on it. If the Semantic Web is a global database, SPARQL [5] is its query language.

Introduction to RDF

For example, as explained in this article [6], I can write a very simple RDF document in Turtle (rdf_example1.ttl) as follows:

@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix :     <http://ojitha.github.io/blog/> .
:post  dc:title  "Learn SPARQL for RDF" .

To convert the Turtle to RDF/XML, use the riot [7] tool available in Apache Jena [8]:

<JENA_HOME>/bin/riot --formatted=rdfxml rdf_example1.ttl

You get the output similar to the following:

<rdf:RDF
    xmlns="http://ojitha.github.io/blog/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://ojitha.github.io/blog/post">
    <dc:title>Learn SPARQL for RDF</dc:title>
  </rdf:Description>
</rdf:RDF>
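
The Turtle and RDF/XML listings describe the same single triple; the prefixes are only abbreviations for full IRIs. As a rough illustration in plain Python (this is not how Jena does it, and the `expand` helper is a hypothetical name of my own), a prefixed name can be expanded like this:

```python
# Prefix table copied from rdf_example1.ttl; the empty key stands for the default ":" prefix.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "": "http://ojitha.github.io/blog/",
}


def expand(prefixed_name):
    """Expand a Turtle prefixed name such as 'dc:title' into a full IRI."""
    prefix, _, local = prefixed_name.partition(":")
    return PREFIXES[prefix] + local


# The subject and predicate of the single triple in the example.
print(expand(":post"))
print(expand("dc:title"))
```

The two printed IRIs are the ones that appear in rdf:about and, through the xmlns:dc declaration, in the dc:title element of the RDF/XML.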

Here is a simple SPARQL query (rdf_example1.rq) that finds the title of the blog post in the given graph:

SELECT ?title
WHERE
{
  <http://ojitha.github.io/blog/post> <http://purl.org/dc/elements/1.1/title> ?title .
}    

To execute this query on the RDF data, I used Apache Jena:

<JENA_HOME>/bin/sparql --data=rdf_example1.ttl --query=rdf_example1.rq

The output is:

--------------------------
| title                  |
==========================
| "Learn SPARQL for RDF" |
--------------------------

An RDF triple contains three components:

graph LR
	s((Subject)) -- predicate --> o((Object))

As shown in the above diagram:

  1. Subject (s): an RDF URI reference [9] or a blank node
  2. Predicate (p): an RDF URI reference
  3. Object (o): an RDF URI reference, a literal or a blank node

An RDF graph is a set of such RDF triples.
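
To make the "set of triples" idea concrete, here is a minimal sketch in plain Python. The `Triple` type is my own illustrative name and not part of Jena; terms are kept as plain strings for simplicity.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate and object."""
    subject: str
    predicate: str
    obj: str


title = Triple(
    "http://ojitha.github.io/blog/post",
    "http://purl.org/dc/elements/1.1/title",
    '"Learn SPARQL for RDF"',
)

# A graph is a set, so stating the same triple twice changes nothing.
graph = {title}
graph.add(Triple(*title))
print(len(graph))
```

Because the graph is a set, it has no order and no duplicates, which is why tools are free to print the triples in any order.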

RDF is intended to represent metadata about web resources, such as the author and title of a web page. Jekyll Front Matter [10], for example, is meant to define such metadata for static web pages. A web resource is not only a web page, although I take one as the example here.

The RDF/XML above shows that:

  • it is about http://ojitha.github.io/blog/post (say, the post), which is the resource being described
  • this post has a property called title whose value is a literal

We can say this in one statement:

📝 http://ojitha.github.io/blog/post has a title whose value is "Learn...."

To see the whole RDF triple, replace the query in rdf_example1.rq with SELECT * WHERE { ?s ?p ?o } and execute it using the following command:

sparql --data=rdf_example1.ttl --query=rdf_example1.rq

The output shows the subject, predicate and object clearly:

----------------------------------------------------------------------------------------------------------
| s                                   | p                                       | o                      |
==========================================================================================================
| <http://ojitha.github.io/blog/post> | <http://purl.org/dc/elements/1.1/title> | "Learn SPARQL for RDF" |
----------------------------------------------------------------------------------------------------------
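
The idea behind a triple pattern can be sketched in a few lines of plain Python. This is only an illustration of pattern matching, not how Jena evaluates SPARQL; the `match` function and the use of None as a variable are my own assumptions.

```python
POST = "http://ojitha.github.io/blog/post"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

# The one-triple graph from rdf_example1.ttl.
GRAPH = {(POST, DC_TITLE, '"Learn SPARQL for RDF"')}


def match(graph, s=None, p=None, o=None):
    """Yield every triple that fits the pattern; None plays the role of a variable."""
    pattern = (s, p, o)
    for triple in graph:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple


# Roughly SELECT * WHERE { ?s ?p ?o }: every position is a variable.
all_rows = list(match(GRAPH))

# Roughly the earlier title query: subject and predicate fixed, object variable.
titles = [o for _, _, o in match(GRAPH, s=POST, p=DC_TITLE)]
print(titles)
```

Fixing more positions narrows the result, and leaving all three open returns the whole graph.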

Now you can add another statement to the same data model:

📝 http://ojitha.github.io/blog/post has a creator whose value is "Ojitha"

For example:

<rdf:RDF
    xmlns="http://ojitha.github.io/blog/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://ojitha.github.io/blog/post">
    <dc:creator>Ojitha</dc:creator>
    <dc:title>Learn SPARQL for RDF</dc:title>
  </rdf:Description>
</rdf:RDF>

Now I use FOAF to describe the owner of the ojitha.github.io web site (rdf_example2.ttl):

@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix foaf:       <http://xmlns.com/foaf/0.1/> .
@prefix :     <http://ojitha.github.io/blog/> .

:post  dc:title  "Learn SPARQL for RDF" .
:post dc:creator :owner .
:owner foaf:given "Ojitha".
:owner foaf:family "Kumanayaka" .

When you run the riot --formatted=rdfxml rdf_example2.ttl command, the resulting RDF/XML is:

<rdf:RDF
    xmlns="http://ojitha.github.io/blog/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://ojitha.github.io/blog/post">
    <dc:creator>
      <rdf:Description rdf:about="http://ojitha.github.io/blog/owner">
        <foaf:family>Kumanayaka</foaf:family>
        <foaf:given>Ojitha</foaf:given>
      </rdf:Description>
    </dc:creator>
    <dc:title>Learn SPARQL for RDF</dc:title>
  </rdf:Description>
</rdf:RDF>

You can depict this as the following table.

Triples of the Data Model

| Number | Subject | Predicate | Object |
|--------|---------|-----------|--------|
| 1 | http://ojitha.github.io/blog/post | http://purl.org/dc/elements/1.1/creator | http://ojitha.github.io/blog/owner |
| 2 | http://ojitha.github.io/blog/owner | http://xmlns.com/foaf/0.1/family | "Kumanayaka" |
| 3 | http://ojitha.github.io/blog/owner | http://xmlns.com/foaf/0.1/given | "Ojitha" |
| 4 | http://ojitha.github.io/blog/post | http://purl.org/dc/elements/1.1/title | "Learn SPARQL for RDF" |
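
The object of triple 1 is the subject of triples 2 and 3, and that is what links the statements into a graph. A minimal sketch in plain Python of following that link (the helper names `objects` and `creator_names` are hypothetical, not part of Jena):

```python
BLOG = "http://ojitha.github.io/blog/"
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"

# The four triples of the data model.
TRIPLES = [
    (BLOG + "post", DC + "creator", BLOG + "owner"),
    (BLOG + "owner", FOAF + "family", "Kumanayaka"),
    (BLOG + "owner", FOAF + "given", "Ojitha"),
    (BLOG + "post", DC + "title", "Learn SPARQL for RDF"),
]


def objects(subject, predicate):
    """Return all objects stated for the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def creator_names(post):
    """Follow post -> creator, then read the creator's given and family names."""
    names = []
    for creator in objects(post, DC + "creator"):
        for given in objects(creator, FOAF + "given"):
            for family in objects(creator, FOAF + "family"):
                names.append(f"{given} {family}")
    return names


print(creator_names(BLOG + "post"))
```

In SPARQL the same walk would be written as several triple patterns that share a variable for the creator.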

RDF family

There are several serializations of RDF, as explained in the RDF Primer [11]:

  • Turtle [12] (used above) and TriG
  • RDF/XML (used above)
  • JSON-LD
  • RDFa
  • N-Triples

The most fundamental concept is the Internationalized Resource Identifier (IRI), a global identifier that can be reused. An IRI can appear as the subject, predicate or object of an RDF triple.

RDF consists of IRIs and literals (not only strings; there are other datatypes as well). The blank node is special: it can be used only as the subject or object of an RDF triple.
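
These position rules can be written down as a small check. The following is a sketch in plain Python, assuming three illustrative term classes of my own (`IRI`, `BlankNode`, `Literal`); it is not Jena's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    # Plain strings default to xsd:string; other datatypes are possible.
    datatype: str = "http://www.w3.org/2001/XMLSchema#string"


def is_valid_triple(s, p, o):
    """Check which kind of term is allowed in each position of a triple."""
    return (
        isinstance(s, (IRI, BlankNode))               # subject: IRI or blank node
        and isinstance(p, IRI)                        # predicate: IRI only
        and isinstance(o, (IRI, BlankNode, Literal))  # object: any of the three
    )


post = IRI("http://ojitha.github.io/blog/post")
title = IRI("http://purl.org/dc/elements/1.1/title")
print(is_valid_triple(post, title, Literal("Learn SPARQL for RDF")))
print(is_valid_triple(Literal("oops"), title, post))
```

The second call fails the check because a literal is not allowed as a subject.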

RDF statements together form a multi-graph. To describe the resources in it, you use an RDF vocabulary, which is based on RDF Schema (RDFS). For example, a Class is used to classify resources into categories, and a Property defines a relationship between resources of those classes. In addition, you can have sub-classes and sub-properties. For more information, see RDF Vocabularies [13]. You can find a number of vocabularies, such as FOAF [14], Dublin Core [15], schema.org [16] and SKOS [17].

Embedding Turtle in HTML

You can use a <script> tag to embed a Turtle document in an existing HTML page. For example, the contents of rdf_example3.ttl can be embedded as follows:

<script type="text/turtle">
  @prefix :     <http://ojitha.github.io/~ojitha/contact.rdf#> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

  :oj a foaf:Person ;
    foaf:givenname "Ojitha" ;
    foaf:family_name "Kumanayaka" ;
    foaf:homepage <http://ojitha.github.io/> ;
    foaf:mbox <mailto:ojithak@gmail.com> .
</script>
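
A consumer of such a page has to pull the Turtle back out before parsing it. Here is a minimal sketch with Python's standard html.parser; the class and function names and the shortened sample page are my own assumptions, and the extracted text would still need a real Turtle parser such as riot.

```python
from html.parser import HTMLParser


class TurtleScriptExtractor(HTMLParser):
    """Collect the text of every <script type="text/turtle"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "text/turtle":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._inside:
            self.blocks[-1] += data


def extract_turtle(html):
    parser = TurtleScriptExtractor()
    parser.feed(html)
    parser.close()
    return [block.strip() for block in parser.blocks]


# Hypothetical, shortened page used only for this illustration.
PAGE = """<html><body>
<script type="text/turtle">
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  <http://ojitha.github.io/~ojitha/contact.rdf#oj> a foaf:Person .
</script>
</body></html>"""

print(extract_turtle(PAGE)[0])
```

The IRIs in angle brackets are not mistaken for HTML tags, because the parser treats the content of a script element as raw text.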

Compare the Turtle with the complexity of the RDF/XML you get when you execute riot --output=rdfxml rdf_example3.ttl:

<rdf:RDF
    xmlns="http://ojitha.github.io/~ojitha/contact.rdf#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/" > 
  <rdf:Description rdf:about="http://ojitha.github.io/~ojitha/contact.rdf#oj">
    <foaf:mbox rdf:resource="mailto:ojithak@gmail.com"/>
    <foaf:homepage rdf:resource="http://ojitha.github.io/"/>
    <foaf:family_name>Kumanayaka</foaf:family_name>
    <foaf:givenname>Ojitha</foaf:givenname>
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  </rdf:Description>
</rdf:RDF>

Here is the JSON-LD you get when you execute the command riot --output=json-ld rdf_example3.ttl:

{
  "@id" : "http://ojitha.github.io/~ojitha/contact.rdf#oj",
  "@type" : "foaf:Person",
  "family_name" : "Kumanayaka",
  "givenname" : "Ojitha",
  "homepage" : "http://ojitha.github.io/",
  "mbox" : "mailto:ojithak@gmail.com",
  "@context" : {
    "mbox" : {
      "@id" : "http://xmlns.com/foaf/0.1/mbox",
      "@type" : "@id"
    },
    "homepage" : {
      "@id" : "http://xmlns.com/foaf/0.1/homepage",
      "@type" : "@id"
    },
    "family_name" : {
      "@id" : "http://xmlns.com/foaf/0.1/family_name"
    },
    "givenname" : {
      "@id" : "http://xmlns.com/foaf/0.1/givenname"
    },
    "@vocab" : "http://ojitha.github.io/~ojitha/contact.rdf#",
    "rdf" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf" : "http://xmlns.com/foaf/0.1/"
  }
}

You can use <script type="application/ld+json">...</script> to embed the above JSON-LD in an HTML web page.
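
The @context is what turns the short keys back into full IRIs. The sketch below uses only Python's json module on a shortened copy of the document; it is a deliberate simplification, not the JSON-LD expansion algorithm, and the `to_triples` helper is a hypothetical name.

```python
import json

# Shortened copy of the JSON-LD above: two properties and their context entries.
doc = json.loads("""
{
  "@id": "http://ojitha.github.io/~ojitha/contact.rdf#oj",
  "givenname": "Ojitha",
  "homepage": "http://ojitha.github.io/",
  "@context": {
    "givenname": {"@id": "http://xmlns.com/foaf/0.1/givenname"},
    "homepage": {"@id": "http://xmlns.com/foaf/0.1/homepage", "@type": "@id"}
  }
}
""")


def to_triples(document):
    """Map each plain key to its IRI through @context and build N-Triples-like terms."""
    context = document["@context"]
    subject = document["@id"]
    triples = []
    for key, value in document.items():
        if key.startswith("@"):
            continue  # keywords such as @id and @context are not handled in this sketch
        term = context[key]
        # "@type": "@id" marks the value as an IRI rather than a string literal.
        obj = f"<{value}>" if term.get("@type") == "@id" else json.dumps(value)
        triples.append((f"<{subject}>", f"<{term['@id']}>", obj))
    return triples


for triple in to_triples(doc):
    print(" ".join(triple), ".")
```

Note how "@type": "@id" decides whether a value such as the homepage becomes an IRI or stays a string.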

The N-Triples serialization of the same data follows. You can use the RDF Distiller (http://rdf.greggkellogg.net/distiller?command=serialize) to generate it:

<http://ojitha.github.io/~ojitha/contact.rdf#oj> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://ojitha.github.io/~ojitha/contact.rdf#oj> <http://xmlns.com/foaf/0.1/family_name> "Kumanayaka" .
<http://ojitha.github.io/~ojitha/contact.rdf#oj> <http://xmlns.com/foaf/0.1/givenname> "Ojitha" .
<http://ojitha.github.io/~ojitha/contact.rdf#oj> <http://xmlns.com/foaf/0.1/mbox> <mailto:ojithak@gmail.com> .
<http://ojitha.github.io/~ojitha/contact.rdf#oj> <http://xmlns.com/foaf/0.1/homepage> <http://ojitha.github.io/> .
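
Because every N-Triples line is one complete triple with full IRIs, the format is easy to process line by line. Here is a rough sketch in Python that covers only the shapes used above (IRI subject and predicate; IRI or plain string object) and ignores blank nodes, escapes, datatypes and language tags; the names are my own.

```python
import re

# <subject> <predicate> (<iri-object> | "plain literal") .
NT_LINE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(<[^>]*>|"[^"]*")\s*\.$')


def parse_ntriples(text):
    """Parse the simple subset of N-Triples described above into tuples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        found = NT_LINE.match(line)
        if found is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        triples.append(found.groups())
    return triples


SAMPLE = """
<http://ojitha.github.io/~ojitha/contact.rdf#oj> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://ojitha.github.io/~ojitha/contact.rdf#oj> <http://xmlns.com/foaf/0.1/family_name> "Kumanayaka" .
"""

for s, p, o in parse_ntriples(SAMPLE):
    print(p, o)
```

The object keeps its angle brackets or quotes so that an IRI can still be told apart from a literal.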

You can also create an RDFa HTML file using the RDF Distiller, as follows:

<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE html>
<html prefix='foaf: http://xmlns.com/foaf/0.1/ rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#'
    xmlns='http://www.w3.org/1999/xhtml'>

<body>
    <div resource='http://ojitha.github.io/~ojitha/contact.rdf#oj' typeof='foaf:Person'>
        <span class='type'>foaf:Person</span>
        <div class='property'>
            <span class='label'>
                foaf:family_name
            </span>
            <span property='foaf:family_name'>Kumanayaka</span>
        </div>

        <div class='property'>
            <span class='label'>
                foaf:givenname
            </span>
            <span property='foaf:givenname'>Ojitha</span>
        </div>

        <div class='property'>
            <span class='label'>
                foaf:homepage
            </span>
            <a href='http://ojitha.github.io/' property='foaf:homepage'>http://ojitha.github.io/</a>
        </div>

        <div class='property'>
            <span class='label'>
                foaf:mbox
            </span>
            <a href='mailto:ojithak@gmail.com' property='foaf:mbox'>mailto:ojithak@gmail.com</a>
        </div>

    </div>

</body>

</html>
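
In RDFa the triples live in the resource, typeof and property attributes. To show roughly how they map back to triples, here is a heavily simplified sketch with Python's html.parser. It handles only the attribute shapes in the listing above and is nowhere near the full RDFa processing rules; the `TinyRDFa` class, the `parse_rdfa` function and the shortened sample are my own assumptions.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TinyRDFa(HTMLParser):
    def __init__(self):
        super().__init__()
        self.prefixes = {}
        self.subject = None
        self.pending = None  # predicate IRI still waiting for its text value
        self.triples = []

    def curie(self, name):
        prefix, _, local = name.partition(":")
        return self.prefixes[prefix] + local

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "prefix" in a:
            tokens = a["prefix"].split()
            # Tokens alternate: "foaf:", IRI, "rdf:", IRI, ...
            for name, iri in zip(tokens[0::2], tokens[1::2]):
                self.prefixes[name.rstrip(":")] = iri
        if "resource" in a:
            self.subject = a["resource"]
            if "typeof" in a:
                self.triples.append((self.subject, RDF_TYPE, self.curie(a["typeof"])))
        if "property" in a:
            predicate = self.curie(a["property"])
            if "href" in a:
                self.triples.append((self.subject, predicate, a["href"]))
            else:
                self.pending = predicate

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None


def parse_rdfa(html):
    parser = TinyRDFa()
    parser.feed(html)
    parser.close()
    return parser.triples


# Shortened version of the listing above, used only for this illustration.
SAMPLE = """<html prefix='foaf: http://xmlns.com/foaf/0.1/'>
<body>
<div resource='http://ojitha.github.io/~ojitha/contact.rdf#oj' typeof='foaf:Person'>
  <span property='foaf:givenname'>Ojitha</span>
  <a href='http://ojitha.github.io/' property='foaf:homepage'>http://ojitha.github.io/</a>
</div>
</body></html>"""

for triple in parse_rdfa(SAMPLE):
    print(triple)
```

For real pages, use a proper RDFa processor such as the RDF Distiller mentioned above.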

As the above listings show, Turtle is the easiest of these formats to follow when writing RDF documents.

References:

  1. RDF, https://www.w3.org/RDF/

  2. Semantic Web, https://www.w3.org/standards/semanticweb/
    [distiller]: http://rdf.greggkellogg.net/distiller?command=serialize

  3. OWL, https://www.w3.org/OWL/

  4. RDFS, https://www.w3.org/2001/sw/wiki/RDFS

  5. SPARQL, https://www.w3.org/TR/rdf-sparql-query/

  6. SPARQL 1.1 Query Language

  7. Reading and Writing RDF in Apache Jena

  8. SPARQL Tutorial - A First SPARQL Query

  9. RDF URI reference, https://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-URIref

  10. Jekyll Front Matter, https://jekyllrb.com/docs/step-by-step/03-front-matter/

  11. RDF 1.1 Primer, http://www.w3.org/TR/rdf11-primer/

  12. Turtle, https://www.w3.org/TR/turtle/

  13. RDF Vocabularies, https://www.w3.org/TR/rdf11-primer/#section-vocabulary

  14. FOAF, http://xmlns.com/foaf/spec/

  15. Dublin Core, http://dublincore.org/documents/dcmi-terms/

  16. schema.org

  17. SKOS, https://www.w3.org/2004/02/skos/
