Semantic Analytics? One step at a time


Back in 2001, Tim Berners-Lee, the inventor of the Web, published his vision of the future Web. In this book he states that the Web was designed to be read by humans, not to be understood by machines.

The “Semantic Web”, by contrast, adds a consistent layer of metadata to all Web resources so that intelligent agents can understand the meaning of data and infer new meaning across distributed data sources. In effect, the Semantic Web turns individual Web pages into one large database that can be queried to fulfill almost any analytics-inspired information need.
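To make the idea of inferring new meaning across distributed sources more concrete, here is a minimal sketch in plain Python. It is not a real RDF toolkit: the two sources, the `ex:` names, and the helper functions `infer` and `query` are hypothetical and chosen only for illustration. The only rule applied is the RDFS sub-property rule (if property P is a sub-property of Q, then every statement using P also holds for Q).

```python
# Hypothetical source A: a site publishing facts as (subject, predicate, object) triples.
source_a = {
    ("ex:Berlin", "ex:capitalOf", "ex:Germany"),
    ("ex:Paris", "ex:capitalOf", "ex:France"),
}

# Hypothetical source B: a shared vocabulary stating how properties relate.
source_b = {
    ("ex:capitalOf", "rdfs:subPropertyOf", "ex:locatedIn"),
}


def infer(triples):
    """Apply the sub-property rule repeatedly until no new triples appear."""
    facts = set(triples)
    while True:
        derived = set()
        for sub_prop, rel, super_prop in facts:
            if rel != "rdfs:subPropertyOf":
                continue
            for s, p, o in facts:
                if p == sub_prop:
                    derived.add((s, super_prop, o))
        if derived <= facts:  # nothing new was derived, so we are done
            return facts
        facts |= derived


def query(facts, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in facts
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Merging both sources lets an agent answer a question neither source states directly.
merged = infer(source_a | source_b)
located = query(merged, p="ex:locatedIn")
```

Neither source says where Berlin is located, yet the merged and inferred data set does. That is the kind of cross-source reasoning the metadata layer is meant to enable, here reduced to a single rule.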

The technology stack is in place: the Web Ontology Language (OWL) and RDF Schema (RDFS) represent data, and standards for querying the Semantic Web, such as SPARQL, are also established. But where is the Semantic Web?

Where are those semantic websites and intelligent analytics agents? While the Web today is still not ready for large-scale semantic analytics, several starting points are emerging:

  1. Brute-forcing semantics out of regular web pages and mashing this information up at a central location. This approach exploits regular patterns in web pages: for example, knowing that a given table on a page holds stock quotes adds meaning to otherwise unstructured numbers without requiring explicit semantic annotation.
  2. Semantically enhanced knowledge bases like DBpedia are coming into existence. DBpedia is “a community effort to extract structured information from Wikipedia and to make this information available on the Web [and which] allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data”. DBpedia on its own can answer queries like “Find me all countries with a GDP of more than x and fewer than y inhabitants” and becomes even more powerful when linked to other semantic data sources (cf. here).
  3. On the tools front, Semantic Wikis might become the enabling technology for the corporate Semantic Web, since they have the power to transform typically unstructured user-generated content into structured content. The rationale behind this idea is that if users are willing to contribute a wiki article, then they will also be willing to add some structured metadata. For more information on Semantic Wikis, see the (regular and “unsemantic”) Wikipedia entry, which also contains a product comparison.
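The first approach in the list above can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not a production scraper: the sample page, the class `TableRows`, and the function `extract_quotes` are hypothetical, and the "regular pattern" is simply the assumption that the first column holds a ticker symbol and the second a price.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_quotes(html):
    """Turn a quote table into labeled records.

    Assumption: row 0 is a header, column 0 is the ticker, column 1 the price.
    """
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return [
        {"ticker": row[0], "price": float(row[1])}
        for row in parser.rows[1:]
        if len(row) >= 2
    ]


# Hypothetical page fragment with no semantic annotation at all.
SAMPLE_PAGE = """
<table>
  <tr><th>Symbol</th><th>Last</th></tr>
  <tr><td>ACME</td><td>12.50</td></tr>
  <tr><td>GLOBEX</td><td>7.25</td></tr>
</table>
"""

quotes = extract_quotes(SAMPLE_PAGE)
```

The meaning ("this number is a price for that ticker") comes entirely from knowledge about the page layout that is hard-coded in `extract_quotes`. That is why this approach is brute force: it breaks as soon as the site changes its table structure.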

The Web of meaning and reasoning has not yet arrived on a large scale, but that does not mean that it cannot work. Critical voices claim that adoption of the Semantic Web on a wide scale will never happen.

To those critics, Tom Scott, a BBC manager involved with BBC Earth’s Semantic Web project, aptly responds: “It’s a bit like saying car ownership will never get there, because not everyone has a car. The reality is that the future is uneven and some people will get there sooner than others”.
