Handling the Big Data tsunami of unstructured information

Unstructured information is everywhere. It is the largest and most rapidly growing part of Big Data, both within enterprises and on the web. There is a range of tools for handling unstructured information, but probably the most important are those that allow organisations and users to find, organise and extract knowledge from this great tsunami of data.

Search and discovery applications create access to unstructured information. They also provide alternative access to structured data. This group of software applications and technologies analyses, tags, and searches multi-lingual text and rich media such as audio, video, and image files. This group also includes unified information access platforms, search engines, question-answering applications, categorisation/metadata tagging tools, categorisers and clustering engines, visualisation tools, filtering and alerting tools, and text analytics applications.

Advanced tools that aggregate unstructured information and then allow users to filter, mine and explore it are increasingly valuable commodities in organisations awash in such data. In fact, one of the dirty little secrets of search today is that it's the pieces around the basic search engine that are the best differentiators among search products. Gathering information and normalising it across multiple sources in real time is a major challenge. Once information has been gathered and indexed, and key knowledge nuggets have been extracted, tools that let users explore a large collection easily can make the difference between users who can find and work with what they are looking for and those who cannot. Increasingly, organisations are turning towards these types of tools.
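To make the gather, normalise, index and filter steps more concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular search product works: the two sources ("email" and "wiki"), their field names, and the helper functions are all hypothetical, and a real platform would add ranking, language handling and incremental updates.

```python
import re
from collections import defaultdict


def normalise(record, source):
    """Map a source-specific record onto one shared schema.

    The sources and field names here are assumptions for illustration.
    """
    if source == "email":
        text = record["subject"] + " " + record["body"]
        return {"id": record["msg_id"], "text": text, "source": source}
    if source == "wiki":
        return {"id": record["page"], "text": record["content"], "source": source}
    raise ValueError("unknown source: " + source)


def tokenise(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build a simple inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc in docs:
        for term in tokenise(doc["text"]):
            index[term].add(doc["id"])
    return index


def search(index, docs_by_id, query, source=None):
    """Return ids of documents containing every query term.

    The optional `source` argument acts as a basic filter (facet).
    """
    terms = tokenise(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(
        doc_id
        for doc_id in hits
        if source is None or docs_by_id[doc_id]["source"] == source
    )


# Gather records from two hypothetical sources and normalise them.
docs = [
    normalise(
        {"msg_id": "m1", "subject": "Q3 report", "body": "Sales report attached"},
        "email",
    ),
    normalise({"page": "w1", "content": "How to file a sales report"}, "wiki"),
]
docs_by_id = {doc["id"]: doc for doc in docs}
index = build_index(docs)

print(search(index, docs_by_id, "sales report"))
print(search(index, docs_by_id, "sales report", source="wiki"))
```

The first query matches both documents; the second narrows the same result set to a single source, which is the kind of filtering step that sits around the basic search engine.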

At the same time, IDC's research shows several growing trends:

  • Continuing emphasis and migration to open source technologies, especially Lucene- and Solr-based offerings

  • Continuing and increasing popularity of cloud and SaaS licensing arrangements, particularly in outward-facing search applications

  • Rapid development of InfoApps

  • Emergence of unified information access platforms

Unified information access platforms form the foundation for a new category of software applications that IDC calls InfoApps. These applications are typically built on a unified information access and management platform to create comfortable, easy-to-use applications that:

  • Are tailored to fit a specific task or workflow

  • Combine multiple technologies and tools, particularly search, collaboration, authoring tools, content management, and analytics

  • Integrate information from multiple sources

  • Incorporate domain- and organisation-specific term lists, taxonomies, and knowledge bases

  • Hide technical complexity below an easy, compelling UI that may have dashboard-like qualities

InfoApps provide a unified or customised view of a particular domain, subject, or topic for single or multiple audiences. LinkedIn and UrbanSpoon offer good examples of InfoApps. LinkedIn's iPad application combines search, current awareness and alerts in a simple and pleasant interface. The UrbanSpoon application combines restaurant listings, menus, reviews, location search and categorisation to make a simple yet complete interface that helps users find good places to eat.

The growth of Big Data and unstructured information will increase the visibility and value of these types of tools and applications. IDC expects this to be an increasingly important category of software, one that becomes a basic building block in dealing with Big Data today and in the future.

Posted by Dave Schubmehl, Research Manager, IDC