Opening up Semantic Search to Ordinary Users


As the joke goes, this is the year of GNU/Linux on the desktop – just like last year, and the year before that. Similarly, semantic search – the idea that systems can apply “intelligence” to searches to find more relevant information – is an evergreen idea that has been around for what seems like forever.

It was given a big boost when Sir Tim Berners-Lee and others started promoting what they called the semantic Web, in 2001. But, like the GNU/Linux desktop, it has stubbornly failed to materialise.

Could open source help realise this long-held dream? That's the question a project called the “Interactive Knowledge Stack” (IKS), funded by the European Commission to the tune of 6.57 million euros, is trying to answer.

Here's how IKS describes itself and its aims:

A Semantic-based Open Source Platform for Small to Medium CMS Providers

IKS will raise the semantic capability of European software houses to develop intelligent content management solutions for their customers.

What is IKS?

IKS is an integrating project targeted at the hundreds of SMEs in Europe, which are providing technology platforms for content and knowledge management to thousands of end user organisations. Downstream, hundred-thousands of corporate end users and millions of content consumers are affected by the quality of service provided through these platforms. The majority of these platforms lack the capability for making use of semantic web enabled, intelligent content, and therefore, lack the capacity for users to interact with the content at the user’s knowledge level!

The major technological result of the project will be the 'Interactive Knowledge Stack', a layered set of software components and specifications which will make traditional content management platforms capable of dealing with the future 'Semantic Web'.

As this makes clear, this is an *open source* project, and that has a number of important consequences. It means that several of the open source CMS vendors in Europe are able to participate. Indeed, as a page on the project explains:

We are looking for CMS vendors as early adopters who want to shape IKS with us and who meet the following requirements:

CMS based on open source

Firm is a small to medium enterprise

Interested in including some IKS ideas about semantic CMS in product

Because the project is open source, the work is being conducted in public, not in some secretive private consortium. This means it's possible to enjoy the benefits of one of the key features of open source: its modularity. One of the reasons that open source has gone so far, so fast, is that it's highly modular, with clean interfaces between different elements. This allows many different groups to work independently on different aspects, according to their interests and strengths, and for everything to fit together at the end – perfect for this kind of pan-European collaboration.

Interestingly, this feature arose because of Linus' isolation in Finland, and the distributed nature of kernel development right from the start. As he told me in 1996:

“The way the whole system is set up, which also happened kind of by itself, is that if you change one driver it's really so localised that it should never impact anything else. It has to be so, when there's people all over the world doing this. There's no weekly brainstorming session where everybody gets together and discusses things.”

Another good example of this modularisation (at a higher level) is the LAMP-based CMS stack, and the IKS site draws an interesting parallel between that and its own modular approach. The IKS layers move up from the operating system at the bottom layer, through things like knowledge representation and presentation, to user-centred interaction with knowledge objects at the top.

The IKS project began at the beginning of this year, and will run for four years. A workshop was held in Rome last week, and I went along (as a guest of the project) to see how things were shaping up after the first year.

Alongside some general presentations on the basic ideas behind the IKS stack, there was a series of short demonstrations of work from the projects, given by eleven groups drawn from the open source, proprietary and academic members of the collaboration:

Deri

Trialox

Kiwi

Yahoo Research

Salsadev

Scribo/Nuxeo

Zemanta

Trezorix

Sourcesense

Semantic Technology Lab

Semantic MediaWiki

This was probably the most interesting part of the workshop, because it showed real-life attempts to realise the semantic search dream. The value of the demos was enhanced by some commentary from a small panel of judges, drawn from the projects.

I was impressed by the frankness and fairness of the criticism, notably that from Wernher Behrendt, who works at the non-profit research organisation Salzburg Research Forschungsgesellschaft and is the overall leader of the IKS project. Again, I think this was a typical open source virtue, whereby you are judged and respected on the quality of what you do, not on the nominal hierarchies implicit in who you are.

Most of the judges' comments centred on the fact that the work being presented was insufficiently geared towards making the benefits of semantic search immediately available to end users: too often, some kind of programming input would be needed – hardly practical for the vast majority of people, who struggle to formulate even a Google query. No wonder, then, that the winner of an informal competition for the best semantic search demo during the workshop was Zemanta – also the project that I found most interesting.

This is how Zemanta describes its “blogging assistant”:

Zemanta is a tool that looks over your shoulder while you blog and gives you tips and advice, suggests related content and pictures and makes sure your posts get promoted as they deserve to be. We at Zemanta are thinking hard to help make blogging easier for you. We're engineering better creative tools to help you get the most out of your blogging time.

Basically, it analyses text you enter into a blog post, finds relevant information and images in real time as you're typing, and places these alongside the post. You can either drag and drop images, say (which come with licensing information), or click on links for further details. There's an online demo, as well as a free add-on for Firefox, a bookmarklet for Chrome and Safari, and server-side modules for WordPress, Movable Type, Drupal and Joomla.

Zemanta was the clear winner because its software really does start to make semantic search work by placing it *discreetly* at the service of the user. The other demos were interesting in an academic sort of way, but it was hard to see many of them being taken up by huge numbers of people.

One of the key things about the IKS project is that one of its end-products will be an open source reference implementation of its stack – in other words, something concrete that CMS vendors and projects will be able to build on, not just theoretical results. But as Zemanta's work and popularity show, alongside this technical framework, it's vitally important to keep the user in mind in all this. Paradoxically, semantic search will only ever really take off once it has receded so far into the fabric of computing that people aren't even aware it's there.

Follow me @glynmoody on Twitter or identi.ca.
