We have just published a report about a new tool that semi-automatically generates taxonomies by using text analytics and linked data from the Semantic Web.
Pingar, a company headquartered in New Zealand, has developed technology that creates one or more custom taxonomies semi-automatically. It reads documents, extracts key terms and phrases, and then organises those terms and phrases with the help of Semantic Web data from sources such as Wikipedia, Freebase and hundreds of other authoritative Web services.
Pingar's taxonomy generation services combine state-of-the-art text analytics with Semantic Web technologies built on authoritative information sources. The goal is to automate and streamline the creation of custom taxonomies, which organisations can then use to improve search, retrieval, and overall information access and re-use.
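Pingar's actual algorithms are not described in this post. The general idea of extracting terms and then placing them under broader concepts can still be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: term extraction here is simple single-word frequency counting (real tools extract multi-word phrases), and the hypothetical `BROADER` table stands in for the broader-concept lookups that a real system would draw from Linked Data sources such as Wikipedia or Freebase.

```python
import re
from collections import Counter

# Hypothetical broader-concept table. A real system would look these
# relationships up in Linked Data sources instead of hard-coding them.
BROADER = {
    "python": "programming language",
    "java": "programming language",
    "postgresql": "database",
    "mysql": "database",
}

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "with", "for", "on", "from", "into", "we", "our"}

def extract_terms(documents, top_n=20):
    """Return the top_n most frequent lower-cased single-word terms, ignoring stopwords."""
    counts = Counter()
    for doc in documents:
        for token in re.findall(r"[a-z][a-z0-9+#]*", doc.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return [term for term, _ in counts.most_common(top_n)]

def build_taxonomy(terms, broader=BROADER):
    """Group terms under their broader concept; unknown terms go under 'unclassified'."""
    taxonomy = {}
    for term in terms:
        parent = broader.get(term, "unclassified")
        taxonomy.setdefault(parent, []).append(term)
    return {parent: sorted(children) for parent, children in taxonomy.items()}

docs = [
    "We migrated from MySQL to PostgreSQL.",
    "Our services are written in Python and Java.",
    "Python scripts load data into PostgreSQL.",
]
terms = extract_terms(docs)
taxonomy = build_taxonomy(terms)
print(taxonomy)
```

The "unclassified" bucket is where a human editor would step in, which is one reason such tools are described as semi-automatic rather than fully automatic.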
Pingar's Taxonomy Generator makes heavy use of Linked Data, an emerging Semantic Web capability. linkeddata.org defines Linked Data as follows:
"The Web enables us to link related documents. Similarly it enables us to link related data. The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. Key technologies that support Linked Data are URIs (a generic means to identify entities or concepts in the world), HTTP (a simple yet universal mechanism for retrieving resources, or descriptions of resources), and RDF (a generic graph-based data model with which to structure and link data that describes things in the world)."
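To make the quoted definition more concrete, the sketch below parses a tiny subset of N-Triples, a line-based RDF serialization in which each line is one subject-predicate-object statement, and then follows an edge in the resulting graph. The `example.org` URIs, the `locatedIn` predicate and the data are hypothetical; `rdfs:label` is a standard RDF Schema term, and the DBpedia URI follows DBpedia's resource naming. The parser ignores blank nodes, language tags, datatypes and escapes, so a real application would use a full RDF library (rdflib is one option in Python). Nothing is fetched over HTTP here; in a live Linked Data setting, the DBpedia URI could be dereferenced to retrieve its description.

```python
import re

# One N-Triples statement: IRI subject, IRI predicate, and an object that is
# either an IRI or a plain string literal. Blank nodes, language tags,
# datatypes and escape sequences are deliberately not supported.
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+(?:<([^>]+)>|"([^"]*)")\s*\.\s*$')

def parse_ntriples(text):
    """Parse a small subset of N-Triples into (subject, predicate, object) tuples.

    Note: this sketch stores IRI and literal objects as plain strings, so the
    distinction between them is lost.
    """
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line: {line}")
        subject, predicate, iri_obj, literal_obj = match.groups()
        triples.append((subject, predicate, iri_obj if iri_obj is not None else literal_obj))
    return triples

def objects(triples, subject, predicate):
    """Follow the graph's edges: every object linked from subject via predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
LOCATED_IN = "http://example.org/vocab/locatedIn"  # hypothetical predicate

DATA = """
# Hypothetical company data that links to a DBpedia resource by URI
<http://example.org/office/wellington> <http://www.w3.org/2000/01/rdf-schema#label> "Wellington office" .
<http://example.org/office/wellington> <http://example.org/vocab/locatedIn> <http://dbpedia.org/resource/Wellington> .
"""

triples = parse_ntriples(DATA)
print(objects(triples, "http://example.org/office/wellington", LOCATED_IN))
# ['http://dbpedia.org/resource/Wellington']
```

The link works because both datasets use the same URI for Wellington; that shared identifier is what lets data published by different organisations be connected.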
Linked Data provides a framework and methodology for organising and sharing information between organisations. It gives organisations a way to identify and authenticate pieces of information using an open, standards-based data model. Linked data sets already exist for hundreds of different types of data, and more appear every day. The diagram below shows the state of Linked Data resources as of 2011.
Applications that use Linked Data can perform authentication and relationship extraction at a fraction of what it would cost to have people do the work, even allowing for the lower labor costs of the off-shoring that organisations have relied on for several years now. The BBC, for example, uses Linked Data to automatically tag sports stories, storing the semantic annotations to make its news easier to access and find.
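The BBC's own tagging pipeline is not described in this post. As a rough illustration of the general idea only, the sketch below tags a story with the URIs of entities whose names appear in its text. The label table is hypothetical and hand-built; a real system would draw labels from a Linked Data source, disambiguate between entities that share a name, and handle name variants.

```python
import re

# Hypothetical label-to-URI table. A real tagger would populate this from a
# Linked Data source such as DBpedia rather than hard-coding it.
ENTITY_LABELS = {
    "roger federer": "http://dbpedia.org/resource/Roger_Federer",
    "rafael nadal": "http://dbpedia.org/resource/Rafael_Nadal",
    "london": "http://dbpedia.org/resource/London",
}

def tag_story(text, labels=ENTITY_LABELS):
    """Return the sorted URIs of known entities whose label appears as whole words."""
    lowered = text.lower()
    tags = set()
    for label, uri in labels.items():
        # Word boundaries stop "london" from matching inside "londoner".
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            tags.add(uri)
    return sorted(tags)

print(tag_story("Roger Federer beat Rafael Nadal in London on Sunday."))
# ['http://dbpedia.org/resource/London', 'http://dbpedia.org/resource/Rafael_Nadal',
#  'http://dbpedia.org/resource/Roger_Federer']
```

Storing URIs rather than plain keywords means every story tagged with the same entity can be found together, and each tag links to further structured data about that entity.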
Increasingly, the Semantic Web and Linked Data are being used in real-world production applications. Facebook, for example, now uses Wikipedia and DBpedia to expand its understanding of the world around us.
Graph Search queries such as "Find pictures of my friends at national parks" require Facebook to understand what national parks are, what they are called, and where in the world they are located before it can even begin sorting through the pictures in your albums.
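A toy version of the lookup such a query needs can be written against a handful of triples. This is not how Facebook implements Graph Search; the entities, predicate names, friends and photos are all hypothetical. The point is that the photo filter only works once the system already knows which places are national parks and where they are.

```python
# Toy knowledge graph of (subject, predicate, object) triples.
# Every identifier and predicate name here is hypothetical.
KNOWLEDGE = [
    ("Yellowstone", "type", "NationalPark"),
    ("Yellowstone", "locatedIn", "United States"),
    ("Tongariro", "type", "NationalPark"),
    ("Tongariro", "locatedIn", "New Zealand"),
    ("Central Park", "type", "UrbanPark"),
    ("Central Park", "locatedIn", "United States"),
]

# Hypothetical photos, each tagged with a friend and the place it was taken.
PHOTOS = [
    {"id": 1, "friend": "Alice", "place": "Yellowstone"},
    {"id": 2, "friend": "Bob", "place": "Central Park"},
    {"id": 3, "friend": "Alice", "place": "Tongariro"},
]

def instances_of(cls, triples=KNOWLEDGE):
    """All entities declared to be of the given type."""
    return {s for s, p, o in triples if p == "type" and o == cls}

def location_of(entity, triples=KNOWLEDGE):
    """The first recorded location of an entity, or None if unknown."""
    return next((o for s, p, o in triples if s == entity and p == "locatedIn"), None)

def photos_at(cls, friends, photos=PHOTOS):
    """Photos of the given friends taken at places of type cls, with their locations."""
    places = instances_of(cls)
    return [
        (photo["id"], photo["place"], location_of(photo["place"]))
        for photo in photos
        if photo["place"] in places and photo["friend"] in friends
    ]

print(photos_at("NationalPark", {"Alice", "Bob"}))
# [(1, 'Yellowstone', 'United States'), (3, 'Tongariro', 'New Zealand')]
```

Bob's photo from Central Park is excluded because the graph types it as an urban park, not a national park; without that background knowledge, the query could not distinguish the two.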
Google's Knowledge Graph uses the Semantic Web and Linked Data as a starting point and builds on them, covering more than 500 million objects and 3.5 billion facts about, and relationships between, those objects.
Vision Systems & Technology Inc., a subsidiary of SAS Institute Inc., developed the Luminary prototype system, which extracts information from news articles on public websites, RSS feeds, and documents in Word or PDF format. Luminary runs this content through several automatic analysis steps and then pushes the results of the text analysis into a semantic wiki.
As we discuss unified information access and the need to link disparate facts and pieces of information into a holistic view of a topic, customer or event, the Semantic Web and Linked Data are increasingly seen as necessary tools for solidifying and authenticating those linkages.
I believe we will see more and more products and services that rely on the Semantic Web and Linked Data to help us understand and manage the relationships between people, places, events and things, so that we can make better decisions in areas ranging from medical diagnosis to product research and development.
What are your thoughts about this?
Posted by Dave Schubmehl, Research Director, IDC