Structured Data for a Smarter Internet

Ever done a search with your favorite search engine and noticed that you suddenly have a wealth of information on the results page, and you haven’t even clicked on a link yet? For example, if I search for “pizza recipe” in Google, I get this helpful bit of information right at the top:

[Image: google-snippet]

Suddenly I have a pizza recipe right at my fingertips. The red arrow is showing the cook time (how awesome is that?), the green arrow is providing us with related information, and the blue arrow is of course displaying the author’s info and giving us a chance to read more by them. How does the search engine understand this query? This magic is accomplished by the use of what is called “semantic markup.”

Without getting too technical, this is code included inside an HTML document that gives search engines metadata, so they can extract meaningful information from the resources on your page (images, videos, links, etc.). There are a few different markup standards, and some are more compatible with each other than others. They are all attempting to solve the same problem, namely giving context to web resources. But as Schema.org continues to be developed, it will likely gain wide market acceptance, much to the chagrin of some in the web community.

I’ll briefly describe the most popular of these standards and show a sample of each one’s markup:

RDFa (Resource Description Framework in Attributes): This was one of the first structured data standards and reached W3C Recommendation status in 2008. RDFa is maintained by the W3C and is currently the most widely used.

<body about="http://example.org/john-d/#me">
    <h1>John's Home Page</h1>
    <p>My name is <span property="foaf:nick">John D</span> and I like
      <a href="http://www.neubauten.org/" rel="foaf:interest"
        xml:lang="de">Einstürzende Neubauten</a>.
    </p>
    <p>
      My <span rel="foaf:interest" resource="urn:ISBN:0752820907">favorite
      book is the inspiring <span about="urn:ISBN:0752820907"><cite
      property="dc:title">Weaving the Web</cite> by
      <span property="dc:creator">Tim Berners-Lee</span></span></span>.
    </p>
</body>

Schema.org: A shared vocabulary, most often written as HTML microdata, that was created jointly by the tech giants Google, Yahoo!, and Microsoft. As mentioned previously, it may well become the industry standard in the future, if only because the search engines want to use the markup they designed themselves.

<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span>
    (born <time itemprop="birthDate" datetime="1954-08-16">August 16, 1954</time>)
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>
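
To give a feel for how a consumer such as a search engine might pull metadata out of markup like this, here is a minimal sketch using only Python's standard-library html.parser. The names ItempropCollector and extract_itemprops are hypothetical helpers made up for this illustration, and the logic is deliberately simplified: it returns a flat list of property/value pairs and assumes that a text-valued itemprop element contains no child elements. Real microdata parsers build nested items and handle many more cases.

```python
from html.parser import HTMLParser

# The Schema.org movie example from above, used as input.
SNIPPET = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span>
    (born <time itemprop="birthDate" datetime="1954-08-16">August 16, 1954</time>)
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>
"""


class ItempropCollector(HTMLParser):
    """Hypothetical helper: collects (itemprop, value) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # itemprop whose text content is still being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if "itemscope" in attrs:
            # Nested item: record its type instead of its text.
            self.pairs.append((prop, attrs.get("itemtype")))
        elif "datetime" in attrs:
            # <time> elements carry a machine-readable value.
            self.pairs.append((prop, attrs["datetime"]))
        elif "href" in attrs:
            # Links contribute their target URL.
            self.pairs.append((prop, attrs["href"]))
        else:
            # Otherwise the value is the element's text content.
            self._pending = prop
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: assumes the pending element has no child elements,
        # so the next end tag is the one that closes it.
        if self._pending is not None:
            self.pairs.append((self._pending, "".join(self._text).strip()))
            self._pending = None


def extract_itemprops(html):
    """Return a flat list of (itemprop, value) pairs found in the HTML."""
    collector = ItempropCollector()
    collector.feed(html)
    collector.close()
    return collector.pairs


pairs = extract_itemprops(SNIPPET)
for prop, value in pairs:
    print(prop, "=", value)
```

The flat list loses the fact that the second "name" belongs to the director rather than the movie, which is exactly the kind of structure a real parser would keep.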

JSON-LD (JavaScript Object Notation for Linked Data): Also developed by the good folks at the W3C. This markup is perfect for REST-based web services and unstructured databases. Like the other markup, JSON-LD is very easy for humans to read, but even more so in my opinion, because all of the data is entered in one block (typically near the top of the HTML file) instead of being embedded throughout the document.

** Note ** The original intention behind the development of JSON-LD was NOT to help create the “semantic web” per se, but to make data exchanged through APIs easier for web app developers to read and work with.

{
  "@context": "http://json-ld.org/contexts/person.jsonld",
  "@id": "http://dbpedia.org/resource/John_Lennon",
  "name": "John Lennon",
  "born": "1940-10-09",
  "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
}
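
Because JSON-LD lives in one self-contained block, it is easy to generate from ordinary data. The sketch below, in Python with only the standard json module, serializes the example above and wraps it in a script element of type application/ld+json, which is the usual way to embed JSON-LD in an HTML page. The function name to_script_tag is a hypothetical helper invented for this illustration, not part of any library.

```python
import json

# The John Lennon example from above as a plain Python dict.
person = {
    "@context": "http://json-ld.org/contexts/person.jsonld",
    "@id": "http://dbpedia.org/resource/John_Lennon",
    "name": "John Lennon",
    "born": "1940-10-09",
    "spouse": "http://dbpedia.org/resource/Cynthia_Lennon",
}


def to_script_tag(data):
    """Hypothetical helper: serialize a dict as an embeddable JSON-LD block."""
    body = json.dumps(data, indent=2)
    # Escape "</" so a value can never close the script element early.
    # "\/" is a valid JSON escape for "/", so the payload still parses.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


print(to_script_tag(person))
```

The printed block can be pasted into a page as-is; nothing else in the HTML needs to change, which is the main practical difference from RDFa and microdata.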

Learn More

http://rdfa.info/
http://schema.org/
http://json-ld.org/