Structured data meaning

Google uses structured data formats such as RDFa, Microdata, or JSON-LD to understand the contents of a page. RDFa is an HTML5 extension based on RDF. JSON-LD can be created from the same RDF Turtle. Learn RDF Turtle to create structured data for SEO.

RDFa [1] (Resource Description Framework in Attributes) is attribute-based: it works through attributes on HTML/XHTML elements, for example, href on a. RDFa does not affect how the HTML is displayed. If you violate the General structured data guidelines [2], your page may be ranked lower; therefore, it is important to know how to use structured data properly. You can use the Google Rich Results Test [3] to test and verify your page. Valid structured data can be eligible to appear in graphical search results. RDFa can use a number of vocabularies (as I show in this blog), but currently Google supports only schema.org [4].

⚠️ Currently, Google permits only the schema.org vocabulary for structured data.

Google recommends using JSON-LD. Turtle is an easy format for writing structured data. You can use a Turtle file to create RDF/XML, which is a standard serialization format for RDF. From the RDF/XML, you can create JSON-LD.

Microdata [5] is in use because, although it is less expressive than RDFa, it is easy to learn. Its main disadvantage is poor support for internationalization. Therefore, RDFa and JSON-LD are preferable for structured data.




RDF

To represent facts on the web, we use RDF. As shown in the following pyramid, RDF Schema sits above the XML Schema layer and adds relationships. On top of RDF, you can find OWL (Web Ontology Language), which offers the greatest expressivity.

As explained in the [Jena to learn RDF and SPARQL][myblog1], an RDF triple consists of a subject, a predicate, and an object. The predicate must be a URI, and the subject is a URI or a blank node (blank nodes are covered later in this post), while the object can be either a URI or a literal.

Typed literals can be expressed via XML schema datatypes as follows:

"bla bla"^^<http://www.w3.org/2001/XMLSchema#string>

and language tags denote the (natural) language of the text as

"bla bla"@en

The tag above shows that the text is in the English language.
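As a rough illustration of the two literal forms above, the following Python sketch splits a Turtle-style literal into its value, datatype, and language tag. The function name parse_literal and the regular expression are assumptions made for this note; they cover only the simple double-quoted forms shown here, not the full Turtle grammar.

```python
import re

# Matches "text", "text"@lang and "text"^^<datatype-iri> (simple cases only).
LITERAL = re.compile(
    r'^"(?P<value>[^"]*)"'
    r'(?:@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*)'
    r'|\^\^<(?P<datatype>[^>]+)>)?$'
)

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def parse_literal(token):
    """Return (value, datatype, lang) for a simple Turtle literal token."""
    match = LITERAL.match(token.strip())
    if match is None:
        raise ValueError(f"not a simple literal: {token!r}")
    lang = match.group("lang")
    datatype = match.group("datatype")
    if lang is None and datatype is None:
        # In RDF 1.1 a literal without a tag or datatype is an xsd:string.
        datatype = XSD_STRING
    return match.group("value"), datatype, lang


print(parse_literal('"bla bla"@en'))
print(parse_literal('"bla bla"^^<http://www.w3.org/2001/XMLSchema#string>'))
```

For a language-tagged literal the sketch leaves the datatype as None to keep the two cases visibly separate.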

☑️ As explained in the [Jena to learn RDF and SPARQL][myblog1], here also we are using Turtle (Terse RDF Triple Language) for the examples.

For example, from the following HTML+RDFa

<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE html>
<html prefix='foaf: http://xmlns.com/foaf/0.1/ rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#'
    xmlns='http://www.w3.org/1999/xhtml'>

   <body resource='http://ojitha.github.io/contact.rdf#oj' typeof='foaf:Person'>
      <span property='foaf:family_name'>Kumanayaka</span>        
      <span property='foaf:givenname'>Ojitha</span>
      <a href='http://ojitha.github.io/' property='foaf:homepage'>http://ojitha.github.io/</a>
      <a href='mailto:ojithak@gmail.com' property='foaf:mbox'>mailto:ojithak@gmail.com</a>
   </body>
  
</html>

this is one of the simplest Turtle documents you can extract:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix : <http://ojitha.github.io/contact.rdf#>.

:oj
   rdf:type foaf:Person;
   foaf:family_name "Kumanayaka";
   foaf:givenname "Ojitha";
   foaf:homepage <http://ojitha.github.io/>;
   foaf:mbox <mailto:ojithak@gmail.com> .

There is only one triples statement in the above Turtle: a single subject, :oj, with five predicate-object pairs. The home page and mbox are URIs, but the given name and the family name are literals. The empty prefix (:) is declared on the third line.

👍🏼 A list of predefined prefixes has been defined for RDFa: the initial context defines a set of default prefixes [6] maintained by the W3C. You don't need to declare the default prefixes. However, in this post, we declare the prefixes explicitly to explain the concepts.

You can understand the data types from the RDF/XML for the above Turtle:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/" 
    xml:base="http://ojitha.github.io/contact.rdf">
 
  <rdf:Description rdf:about="#oj">
    <foaf:mbox rdf:resource="mailto:ojithak@gmail.com"/>
    <foaf:homepage rdf:resource="http://ojitha.github.io/"/>
    <foaf:givenname>Ojitha</foaf:givenname>
    <foaf:family_name>Kumanayaka</foaf:family_name>
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  </rdf:Description>
</rdf:RDF>
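To see the URI versus literal distinction programmatically, here is a minimal Python sketch that reads the RDF/XML above with the standard library xml.etree.ElementTree module. The function name triples_from_rdfxml is an assumption for this note, and the sketch handles only flat rdf:Description blocks like this one, not RDF/XML in general (for real work, the Jena tools from the earlier post are the better choice).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xml:base="http://ojitha.github.io/contact.rdf">
  <rdf:Description rdf:about="#oj">
    <foaf:mbox rdf:resource="mailto:ojithak@gmail.com"/>
    <foaf:homepage rdf:resource="http://ojitha.github.io/"/>
    <foaf:givenname>Ojitha</foaf:givenname>
    <foaf:family_name>Kumanayaka</foaf:family_name>
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  </rdf:Description>
</rdf:RDF>"""


def triples_from_rdfxml(text):
    """Yield (subject, predicate, kind, object) for flat rdf:Description blocks."""
    root = ET.fromstring(text)
    base = root.get(XML_BASE, "")
    for node in root.findall(f"{{{RDF}}}Description"):
        subject = urljoin(base, node.get(f"{{{RDF}}}about", ""))
        for child in node:
            # ElementTree spells qualified names as "{namespace}local".
            namespace, local = child.tag[1:].split("}")
            resource = child.get(f"{{{RDF}}}resource")
            if resource is not None:
                yield subject, namespace + local, "uri", urljoin(base, resource)
            else:
                yield subject, namespace + local, "literal", child.text


for triple in triples_from_rdfxml(RDF_XML):
    print(triple)
```

Properties that carry rdf:resource come out as URIs, and properties with element text come out as literals, which matches the Turtle shown earlier.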

I have already explained the tools to use in Apache [Jena to learn RDF and SPARQL][myblog1].

RDFa Lite

RDFa can be used in the header as well as the body of the HTML page. Five simple attributes belong to RDFa Lite [7]:

  1. vocab: references structured data vocabularies
  2. typeof
  3. property
  4. resource
  5. prefix

For example, the following HTML5+RDFa can be translated to RDF:

<p>
    Blog:
    <span property="http://purl.org/dc/terms/title">Apache Jena to learn RDF and SPARQL</span>
    Created Date:
    <span property="http://purl.org/dc/terms/created">2020-08-14</span>
</p>

This will generate the following RDF/XML:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:ns1="http://purl.org/dc/terms/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="">
    <ns1:title>Apache Jena to learn RDF and SPARQL</ns1:title>
    <ns1:created>2020-08-14</ns1:created>
  </rdf:Description>
</rdf:RDF>

Here are the triples:

| Number | Subject | Predicate | Object |
|---|---|---|---|
| 1 | http://www.w3.org/RDF/Validator/run/1597839703450 | http://purl.org/dc/terms/title | "Apache Jena to learn RDF and SPARQL" |
| 2 | http://www.w3.org/RDF/Validator/run/1597839703450 | http://purl.org/dc/terms/created | "2020-08-14" |

When I ran the command riot --base=http://ojitha.blogspot.com/blog/post -syntax=RDFXML --out=Turtle rdf_example5.rdf, the output was the following Turtle:

@prefix ns1:   <http://purl.org/dc/terms/> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://ojitha.blogspot.com/blog/post>
        ns1:title    "Apache Jena to learn RDF and SPARQL" ;
        ns1:created  "2020-08-14" .

The resource attribute creates a context, that is, it sets the subject for the properties nested inside it (the about attribute is an alternative that does the same thing). For example,

<div resource="http://ojitha.github.io/blog">
    <p>
        Blog:
        <span property="http://purl.org/dc/terms/title">Apache Jena to learn RDF and SPARQL</span>
        Created Date:
        <span property="http://purl.org/dc/terms/created">2020-08-14</span>
    </p>
</div>

This will add <rdf:Description rdf:about="http://ojitha.github.io/blog"> to the RDF/XML:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
   xmlns:ns1="http://purl.org/dc/terms/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <rdf:Description rdf:about="http://ojitha.github.io/blog">
    <ns1:title>Apache Jena to learn RDF and SPARQL</ns1:title>
    <ns1:created>2020-08-14</ns1:created>
  </rdf:Description>
</rdf:RDF>
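To make the effect of resource concrete, here is a minimal Python sketch built on the standard library html.parser module. The class name ResourceScopeParser and the base address http://example.org/page are assumptions for illustration. The sketch handles only full-URI property values on elements with plain text and a matching end tag, so it is not a conforming RDFa processor.

```python
from html.parser import HTMLParser

HTML = """
<div resource="http://ojitha.github.io/blog">
  <p>
    Blog:
    <span property="http://purl.org/dc/terms/title">Apache Jena to learn RDF and SPARQL</span>
    Created Date:
    <span property="http://purl.org/dc/terms/created">2020-08-14</span>
  </p>
</div>
"""


class ResourceScopeParser(HTMLParser):
    """Collects (subject, property, text) triples; resource sets the subject."""

    def __init__(self, base):
        super().__init__()
        self.subjects = [base]   # innermost subject is last
        self.open_tags = []      # (pushed_subject, property) per open element
        self.text = []
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        pushed = "resource" in attrs
        if pushed:
            self.subjects.append(attrs["resource"])
        prop = attrs.get("property")
        if prop:
            self.text = []       # start collecting the property's text
        self.open_tags.append((pushed, prop))

    def handle_data(self, data):
        self.text.append(data)

    def handle_endtag(self, tag):
        if not self.open_tags:
            return
        pushed, prop = self.open_tags.pop()
        if prop:
            value = "".join(self.text).strip()
            self.triples.append((self.subjects[-1], prop, value))
        if pushed:
            self.subjects.pop()  # leave the resource's scope


parser = ResourceScopeParser("http://example.org/page")
parser.feed(HTML)
for triple in parser.triples:
    print(triple)
```

If the enclosing div is removed, the same properties attach to the base address instead, which is the behaviour seen in the earlier example without resource.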

The vocab attribute references structured data vocabularies. You can create multiple items per page as follows:

<body vocab=x resource=r>
  <p property=p1> v1 </p>
  <p property=p2> v2 </p>
  <p vocab=y> <a property=p3 ...> v3 </a></p>
   ... 
</body>

As shown in the following graph, the nested vocabulary y overrides vocabulary x inside the element where it is defined.

graph LR
      r((r)) -->|x:p1|v1[v1]
      r((r)) -->|x:p2|v2[v2]
      r((r)) -->|y:p3|v3[v3]
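The override can be sketched in Python as a stack of vocabularies, where each property term is resolved against the innermost vocab. The class name VocabScopeParser and the two example.org vocabulary addresses are hypothetical stand-ins for x and y; the sketch ignores subjects and values and only shows how terms are resolved.

```python
from html.parser import HTMLParser

# Hypothetical vocabularies standing in for x and y in the example above.
HTML = """
<body vocab="http://example.org/x#" resource="http://example.org/r">
  <p property="p1">v1</p>
  <p property="p2">v2</p>
  <p vocab="http://example.org/y#"><span property="p3">v3</span></p>
</body>
"""


class VocabScopeParser(HTMLParser):
    """Resolves each property term against the nearest enclosing vocab."""

    def __init__(self):
        super().__init__()
        self.vocabs = [""]   # innermost vocab is last
        self.pushed = []     # whether each open element set a vocab
        self.resolved = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        has_vocab = "vocab" in attrs
        if has_vocab:
            self.vocabs.append(attrs["vocab"])
        self.pushed.append(has_vocab)
        if "property" in attrs:
            self.resolved.append(self.vocabs[-1] + attrs["property"])

    def handle_endtag(self, tag):
        # Leaving an element that set a vocab restores the outer one.
        if self.pushed and self.pushed.pop():
            self.vocabs.pop()


parser = VocabScopeParser()
parser.feed(HTML)
print(parser.resolved)
```

The first two terms resolve against x and the third against y, matching the edge labels in the graph.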

The resource attribute specifies the context. You can have multiple contexts within the same page:

<body vocab=x>
  
  <div resource=r1>
    <p property=p1> v1 </p>
    <p property=p2> v2 </p>
  </div>
  
  <div resource=r2>
    <p property=p1> v3 </p>
    <p property=p2> v4 </p>    
  </div>
</body>  

This can be depicted as:

graph LR
      r1((r1)) -->|x:p1|v1[v1]
      r1((r1)) -->|x:p2|v2[v2] 
      r2((r2)) -->|x:p1|v3[v3]
      r2((r2)) -->|x:p2|v4[v4]

Parent nodes are the resources; therefore, there are two root nodes for two resources r1 and r2.

Blank Node

If the resource attribute is not specified (meaning the root is missing), a blank node is created.

For example:

<div vocab="http://xmlns.com/foaf/0.1/" typeof="Person">
  <p>
    <span property="name">Ojitha Kumanayaka</span>,
    Email: <a property="mbox" href="mailto:ojithak@gmail.com">ojithak@gmail.com</a>,
    Phone: <a property="phone" href="tel:+1-617-234-5678">+1-617-234-5678</a>
  </p>
</div>

Using the RDF Distiller [8], you can obtain the RDF/XML as follows:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:ns1="http://xmlns.com/foaf/0.1/" xmlns:ns2="http://www.w3.org/ns/rdfa#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:nodeID="N05502b8a8a5d41d6b6da805fea8d849b">
    <ns1:phone rdf:resource="tel:+1-617-234-5678" />
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
    <ns1:name>Ojitha Kumanayaka</ns1:name>
    <ns1:mbox rdf:resource="mailto:ojithak@gmail.com" />
  </rdf:Description>
  <rdf:Description rdf:about="">
    <ns2:usesVocabulary rdf:resource="http://xmlns.com/foaf/0.1/" />
  </rdf:Description>
</rdf:RDF>

As shown in the above XML, rdf:nodeID="N05502b8a8a5d41d6b6da805fea8d849b" identifies the blank node. The RDF Turtle is similar to the following (rdf_example6.ttl):

@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://ojitha.github.io/blog/>
   rdfa:usesVocabulary foaf: .
_:1 
   rdf:type foaf:Person;
   foaf:name "Ojitha Kumanayaka";
   foaf:mbox <mailto:ojithak@gmail.com>;
   foaf:phone <tel:+1-617-234-5678> .

In the seventh line, _:1 is the blank node.
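The rule "typeof without resource creates a blank node" can be sketched in Python as follows. The class name BlankNodeParser and the _:b1 style labels are assumptions for illustration; a real processor picks its own labels (as the distiller output above shows), and this sketch keeps a single page-wide vocab rather than scoping it properly.

```python
from html.parser import HTMLParser
from itertools import count

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

HTML = """
<div vocab="http://xmlns.com/foaf/0.1/" typeof="Person">
  <p>
    <span property="name">Ojitha Kumanayaka</span>,
    Email: <a property="mbox" href="mailto:ojithak@gmail.com">ojithak@gmail.com</a>,
    Phone: <a property="phone" href="tel:+1-617-234-5678">+1-617-234-5678</a>
  </p>
</div>
"""


class BlankNodeParser(HTMLParser):
    """typeof without resource gets a fresh blank node as its subject."""

    def __init__(self):
        super().__init__()
        self.ids = count(1)
        self.vocab = ""
        self.subjects = []
        self.open = []       # (pushed_subject, pending_property) per element
        self.text = []
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        self.vocab = attrs.get("vocab", self.vocab)
        pushed = False
        if "typeof" in attrs:
            subject = attrs.get("resource") or f"_:b{next(self.ids)}"
            self.subjects.append(subject)
            self.triples.append((subject, RDF_TYPE, self.vocab + attrs["typeof"]))
            pushed = True
        pending = None
        prop = attrs.get("property")
        if prop and self.subjects:
            if "href" in attrs:
                # With href, the object is the link target, not the text.
                self.triples.append((self.subjects[-1], self.vocab + prop, attrs["href"]))
            else:
                pending, self.text = self.vocab + prop, []
        self.open.append((pushed, pending))

    def handle_data(self, data):
        self.text.append(data)

    def handle_endtag(self, tag):
        if not self.open:
            return
        pushed, pending = self.open.pop()
        if pending:
            self.triples.append((self.subjects[-1], pending, "".join(self.text).strip()))
        if pushed:
            self.subjects.pop()


parser = BlankNodeParser()
parser.feed(HTML)
for triple in parser.triples:
    print(triple)
```

All four triples share the same generated subject, which plays the role of _:1 in the Turtle above.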

Instead of a blank node label, you can use square brackets, as the last statement of the following Turtle shows:

@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://ojitha.github.io/blog/>.

<http://ojitha.github.io/blog/>
   rdfa:usesVocabulary foaf: .

[  rdf:type foaf:Person;
   foaf:name "Ojitha Kumanayaka";
   foaf:mbox <mailto:ojithak@gmail.com>;
   foaf:phone <tel:+1-617-234-5678>] .

The blank node is shown as a green circle in the above diagram.

You can use blank nodes to create aggregations. An aggregation is nothing more than a collection of blank nodes, as shown in the following HTML+RDFa code:

<... vocab="http://xmlns.com/foaf/0.1/">
  ... define parent element...
  <ul>
    <li typeof=T> ... </li>
    <li typeof=T> ... </li>
    <li typeof=T> ... </li>
  </ul>
</...>  

Above can be depicted as:

graph LR
b1((_:1)) -->|typeof|T
b2((_:2)) -->|typeof|T
b3((_:3)) -->|typeof|T

The advantage of the above structure is that you can create a relationship from the parent element to these three blank nodes to describe the different states that element can have.

RDFa property copying can be used to reuse the same element.

For example:

<body vocab="http://purl.org/dc/terms/">
  
  <div resource=r1>
  	<p property=.....>...</p>
    <link property="rdfa:copy" href="#r3"/>
  </div>

  <div resource=r2>
  	<p property=.....>...</p>
    <link property="rdfa:copy" href="#r3"/>
  </div>
  
  <div resource="#r3" typeof="rdfa:Pattern">
    <p vocab="...">
      
    </p>
  </div>
</body>

In the above code, the properties of the pattern r3 are copied into r1 and r2 in place of each link element.
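At the triple level, property copying can be approximated as "copy every property of the pattern to each resource that links to it, then drop the links and the pattern itself". The Python sketch below is a simplified reading of that rule, not the normative algorithm; the function name apply_property_copying, the dc:title and dc:license shorthand, and the sample values are hypothetical.

```python
RDFA = "http://www.w3.org/ns/rdfa#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def apply_property_copying(triples):
    """Expand rdfa:copy links, then drop the links and the pattern nodes."""
    graph = set(triples)
    patterns = {s for s, p, o in graph if p == RDF_TYPE and o == RDFA + "Pattern"}
    changed = True
    while changed:  # repeat so that patterns copying other patterns expand too
        changed = False
        for s, p, target in list(graph):
            if p != RDFA + "copy" or target not in patterns:
                continue
            for ps, pp, po in list(graph):
                if ps == target and (s, pp, po) not in graph:
                    graph.add((s, pp, po))
                    changed = True
    return {
        (s, p, o)
        for s, p, o in graph
        if s not in patterns
        and p != RDFA + "copy"
        and not (p == RDF_TYPE and o == RDFA + "Pattern")
    }


# Hypothetical triples standing in for r1, r2 and the pattern #r3.
before = [
    ("r1", "dc:title", "Post 1"),
    ("r1", RDFA + "copy", "#r3"),
    ("r2", "dc:title", "Post 2"),
    ("r2", RDFA + "copy", "#r3"),
    ("#r3", RDF_TYPE, RDFA + "Pattern"),
    ("#r3", "dc:license", "http://example.org/license"),
]
for triple in sorted(apply_property_copying(before)):
    print(triple)
```

After the call, both r1 and r2 carry the license property and no trace of #r3 remains in the result.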

In the following example, the use of #me looks similar, but its purpose is to distribute the data (not to copy it):

<div vocab="http://purl.org/dc/terms/">
  <div resource=blog1>
    <... property="creator" resource="#me"></...>
    ...
  </div>
  ...
  <div resource=blog2>
    <... property="creator" resource="#me"></...>
    ...
  </div>
  ...
</div> 

<div vocab="http://xmlns.com/foaf/0.1/" resource="#me" typeof="Person">
  ...
</div>

In this case, the resource attribute on the third line indicates the target of the relation. This allows you to distribute the various parts of the structured data, such as #me, across the page. It works like href in a web page, except that it is not clickable and it can be used in any HTML element (not only an anchor).
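A small Python sketch of what this linking produces: every blog entry points at the same #me node, resolved against the page address. The function name creator_links and the page address are hypothetical; blog1 and blog2 are the relative identifiers from the example.

```python
from urllib.parse import urljoin

DC_CREATOR = "http://purl.org/dc/terms/creator"


def creator_links(page_url, blog_ids, person="#me"):
    """Return one (blog, dc:creator, person) triple per blog on the page."""
    person_iri = urljoin(page_url, person)
    return [(urljoin(page_url, blog), DC_CREATOR, person_iri) for blog in blog_ids]


# Hypothetical page address; blog1 and blog2 are the relative ids used above.
for triple in creator_links("http://ojitha.github.io/blog/", ["blog1", "blog2"]):
    print(triple)
```

Because both triples share one object, the Person description attached to #me is written once and referenced from each blog.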

Mixed-use of vocabularies

It is very common to use multiple vocabularies in the same HTML element. Writing the full URI of each vocabulary term in the element is cumbersome; therefore, you can declare prefixes for the vocabularies in the HTML element as follows:

<... prefix="dc: http://purl.org/dc/terms/ schema: http://schema.org/" ></..> 

You can mix vocab and prefix in the same elements.

👍🏼 It is better to place all the prefixes in the <body> element.

Still, there is a problem when the same value has to be expressed with properties from multiple vocabularies. You would have to repeat the HTML element for each vocabulary, which is not acceptable in HTML. To avoid this conflict, you can use multiple vocabularies in the same element as follows:

<body prefix="vo1: http://... vo2: http://...">
  <span property="vo1:prop vo2:prop" resource="#me">...</span> 
</body>

📝 Both property and typeof support a list of values.

There is a way to override the property value using the content attribute:
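How a prefix declaration and a list of property values expand to full URIs can be sketched in Python like this. The function names parse_prefixes and expand_properties are assumptions for this note, and the dc:creator and schema:author pair is only an illustrative choice of two properties from different vocabularies.

```python
def parse_prefixes(prefix_attr):
    """Turn 'dc: http://... schema: http://...' into a {prefix: iri} mapping."""
    tokens = prefix_attr.split()
    # Tokens alternate between 'name:' and the IRI it stands for.
    return {name.rstrip(":"): iri for name, iri in zip(tokens[0::2], tokens[1::2])}


def expand_properties(property_attr, prefixes):
    """Expand a whitespace-separated list of prefixed names into full IRIs."""
    expanded = []
    for curie in property_attr.split():
        prefix, _, local = curie.partition(":")
        expanded.append(prefixes[prefix] + local)
    return expanded


prefixes = parse_prefixes("dc: http://purl.org/dc/terms/ schema: http://schema.org/")
print(expand_properties("dc:creator schema:author", prefixes))
```

Each expanded URI yields its own triple with the same subject and the same value, which is why one element is enough.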

<span property="pf:prop" content="v1">v2</span>

In this case, the RDFa processor uses v1 as the value for prop instead of v2. This approach plays a major role when it comes to the meta [9] tag.

Data types

The datatype attribute can be used to explicitly define the data type of a property value, using the XML Schema datatypes (http://www.w3.org/2001/XMLSchema), for example:

<... property="dc:givenname" datatype="xsd:string">Ojitha</...>
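The content and datatype attributes both shape the literal that ends up in the graph. The Python sketch below shows one way to picture this; the function name literal_for and the default prefix map are assumptions, and only the xsd prefix is known to it unless you pass more.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal_for(attrs, text, prefixes=None):
    """Build a Turtle-style literal from an element's attributes and text."""
    prefixes = prefixes or {"xsd": XSD}
    value = attrs.get("content", text.strip())   # content wins over element text
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    datatype = attrs.get("datatype")
    if not datatype:
        return f'"{escaped}"'
    prefix, _, local = datatype.partition(":")
    return f'"{escaped}"^^<{prefixes[prefix]}{local}>'


# content overrides the visible text v2
print(literal_for({"property": "pf:prop", "content": "v1"}, "v2"))
# datatype attaches an explicit XML Schema type
print(literal_for({"datatype": "xsd:string"}, "Ojitha"))
```

The second call produces the same typed-literal form that was introduced in the RDF section.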

Alternatives

The about attribute can be used as an alternative to resource when setting the context. The only difference is that about is used only to create a context, whereas resource can also name the target of a relation.

The rel attribute can be used to avoid repeating the same property within a context. For example:

<ul>
  <li property=prop resource=r1 typeof=T>v1</li>
  <li property=prop resource=r2 typeof=T>v2</li>
  <li property=prop resource=r1 typeof=T>v3</li>
  ...
</ul>

You can typically rewrite the above code as follows:

<ul rel=prop>
  <li  resource=r1 typeof=T>v1</li>
  <li  resource=r2 typeof=T>v2</li>
  <li  resource=r1 typeof=T>v3</li>
  ...
</ul>

As shown in the first line of the above code, rel replaces the repeated property=prop.
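The equivalence of the two forms can be checked with a small Python sketch. The class name RelParser, the helper extract, and the enclosing resource "#me" are assumptions added so that the list items have a subject to hang from. The sketch passes rel only to direct children and ignores typeof and text values, so it is a simplification of what a real RDFa processor does.

```python
from html.parser import HTMLParser

WITH_PROPERTY = """
<div resource="#me">
  <ul>
    <li property="prop" resource="r1">v1</li>
    <li property="prop" resource="r2">v2</li>
  </ul>
</div>
"""

WITH_REL = """
<div resource="#me">
  <ul rel="prop">
    <li resource="r1">v1</li>
    <li resource="r2">v2</li>
  </ul>
</div>
"""


class RelParser(HTMLParser):
    """Emits (subject, predicate, resource) for property and rel markup."""

    def __init__(self):
        super().__init__()
        self.frames = [(None, None)]   # (subject, inherited rel) per open element
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        subject, rel = self.frames[-1]
        resource = attrs.get("resource")
        if "rel" in attrs:
            # rel on a container: child resources become objects of rel.
            self.frames.append((subject, attrs["rel"]))
            return
        if resource and "property" in attrs:
            self.triples.append((subject, attrs["property"], resource))
        elif resource and rel:
            self.triples.append((subject, rel, resource))
        self.frames.append((resource or subject, None))

    def handle_endtag(self, tag):
        if len(self.frames) > 1:
            self.frames.pop()


def extract(html):
    parser = RelParser()
    parser.feed(html)
    return parser.triples


print(extract(WITH_PROPERTY))
print(extract(WITH_REL))
```

Both calls print the same two triples, which is the sense in which rel factors out the repeated property.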

⚠️ The property is always safer to use than the rel.

This blog post is based entirely on the RDFa 1.1 Primer [10] and is intended as a set of notes at an abstract level. I used the real-time RDFa 1.1 editor [11] to extract Turtle from the HTML, and the EasyRDF [12] tool to translate Turtle to RDF/XML format.

  1. RDFa Core, https://www.w3.org/TR/rdfa-primer/#bib-rdfa-core

  2. Understand how structured data works  |  Search for Developers, https://developers.google.com/search/docs/guides/intro-structured-data#structured-data-format

  3. Rich Results Test

  4. schema.org

  5. HTML Microdata, https://www.w3.org/TR/microdata/

  6. RDFa Core Initial Context, https://www.w3.org/2011/rdfa-context/rdfa-1.1

  7. RDFa Lite, https://www.w3.org/TR/rdfa-lite/

  8. RDFa 1.1 Distiller and Parser, https://www.w3.org/2012/pyRdfa/Overview.html

[myblog1]: https://ojitha.blogspot.com/2020/08/apache-jena-to-learn-rdf-and-sparql.html

  9. The Open Graph protocol, https://ogp.me/

  10. RDFa 1.1 Primer - Third Edition, https://www.w3.org/TR/rdfa-primer/

  11. real-time RDFa 1.1 editor, http://rdfa.info/play

  12. EasyRDF, https://www.easyrdf.org/converter
