If you've been following the Jeopardy-IBM Watson faceoff this week, then you have witnessed a breakthrough in analytics and in new architectures for mining and analyzing diverse types of information in a single application. Watson and its successors may usher in a new approach to computing, combining, as it does, so many disparate techniques to create a "thinking" machine. IBM has combined deep NLP with machine learning, a voting algorithm, a method of interpreting questions and assessing them by formulating parallel hypotheses, and Hadoop and UIMA for preprocessing, as well as the usual search and fuzzy matching software and, of course, an in-memory caching system to save time in retrieval. To me, the strength of this system is the combination of all of these; it is remarkable in that it doesn't rely on just one. In Watson, the whole is greater than the sum of its parts.
Watson also tackles some of the greatest problems in understanding text:
- It's open domain, not specific to a particular industry or topic area
- It parses questions well, one of the biggest failings in search and text analytics systems
- It gathers evidence for each of its hypotheses, and offers a confidence level, thereby tackling the question of how much to trust a computer-based answer
- It searches and analyzes text in multiple formats from multiple sources, without taxonomies, schemas or controlled vocabularies
- It scales
- It's blindingly fast
- It handles language in all its ambiguity, including puns, metaphors, jokes and sly allusions.
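The hypothesis-and-evidence pattern behind several of these points can be sketched in a few lines: independent scorers "vote" on each candidate answer, the weighted votes are merged into a confidence, and the system answers only when that confidence clears a threshold. This is a minimal illustration of the general idea, not IBM's actual DeepQA code; the scorer names, weights, and threshold are all hypothetical.

```python
# Toy sketch of parallel hypotheses + evidence voting + confidence.
# All scorer names, weights, and data below are illustrative, not
# drawn from IBM's DeepQA implementation.
from math import exp

def softmax(scores):
    """Turn raw merged scores into confidences that sum to 1."""
    exps = [exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def merge(evidence, weights):
    """Weighted combination of per-scorer evidence for one candidate."""
    return sum(weights[name] * score for name, score in evidence.items())

# Hypothetical candidate answers with per-scorer evidence in [0, 1].
candidates = {
    "Toronto": {"passage_match": 0.40, "type_check": 0.10, "popularity": 0.70},
    "Chicago": {"passage_match": 0.85, "type_check": 0.90, "popularity": 0.60},
}
weights = {"passage_match": 2.0, "type_check": 1.5, "popularity": 0.5}

names = list(candidates)
merged = [merge(candidates[n], weights) for n in names]
confidences = dict(zip(names, softmax(merged)))

best = max(confidences, key=confidences.get)
THRESHOLD = 0.6  # "buzz in" only when confident enough
if confidences[best] >= THRESHOLD:
    print(f"Answer: {best} (confidence {confidences[best]:.2f})")
else:
    print("Too uncertain to answer.")
```

The key design choice the sketch tries to capture is that no single scorer decides the answer; confidence emerges from combining many weak, independent signals, which is also what lets the system decline to answer when the evidence is thin.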
IBM's Watson is a great example of harnessing the essence of search architecture and combining it with other non-database technologies to tackle problems well beyond answering Jeopardy questions: detecting patterns and suggesting solutions in healthcare, terrorism detection, fraud detection, and reputation monitoring and risk mitigation. Decision support has been a goal for decades, and this kind of system will make it possible.