For most areas of enterprise open source, there is one clear leader. The business intelligence (BI) space is unusual in that there are two strong players: JasperSoft and Pentaho.
Here Doug Moran, one of Pentaho's founders, offers a fascinating description of how the company was created in part by bringing on board the chief architects of several other open source projects, offers not one but two explanations of the Pentaho name, says more on the recent adoption of the GNU GPLv2 for its BI platform, and explains why – of course – he thinks Pentaho's solutions are the best.
GM: What's the background to the company's formation?
DM: Pentaho was started in October 2004 by Richard Daley, Adrian Marshall, James Dixon, Marc Batchelor and myself. We have been working together in business intelligence (BI) since the 90s and Pentaho is our third startup.
We started AppSource in 1995 and created "Wired for OLAP", a desktop visualization tool for multi-dimensional databases. AppSource was bought by Arbor Software, which merged with Hyperion and was eventually absorbed by Oracle. In 2001 we started Keyola and developed "Smart Notification", a server and framework for proactive reporting, analysis and information delivery. Lawson Software bought Keyola. After successfully building desktop applications and a server-based framework, we set out to create a complete and integrated BI suite using the power of the open source development model.
GM: Where does the name come from?
DM: One of the hardest parts of doing a startup is coming up with a good name. I think it's actually harder than developing the product. Obviously, the name has to be distinctive and available but we also wanted to make sure it didn't tie us into a specific area of BI or technology. We wanted it to be short, easy to remember, have no implied meaning and generate unique results on a Google search. We settled on Pentaho, pronounced like "Lake Tahoe," with the "penta" for five guys and "ho" because it just sounded right.
If you like, there is a better story about the derivation of the name in our FAQ, which demonstrates that our ability to be creative is much greater than our ability to name a company.
GM: What business problem are you trying to solve?
DM: In the simplest sense, business intelligence is about providing better access to information for business users. It’s a need that spans nearly every industry, and companies of all sizes. We have customers who use our software to analyse historical data to set prices for tickets, optimise customer loyalty programs, improve their employee training programs, or to measure the effectiveness of their business partners. As you can see on this page about customers, there’s a lot of diversity of applications and our customers find new ways to use us every day.
GM: Could you say a little about the individual projects that go to make up the complete Pentaho offering – what they are, what exactly they do and their current licences?
DM: Building an entire suite of BI tools along with a platform framework from scratch seemed like a lot of work for what was a small team originally. With all of the very good and mature BI applications in open source, it made sense for Pentaho to be an open source project that leverages existing open source technology. Our BI platform is an integration point for existing applications and provides common capabilities like security, scheduling, auditing and workflow. The platform core is GPL v2 and the plugins, presentation layer and client tools are Mozilla MPL.
As we worked with existing open source BI applications and created good working relationships with their teams, it began to make sense for us to join forces to share vision and resources. The first open source project to join Pentaho was Mondrian and its chief architect, Julian Hyde. Mondrian is an OLAP (online analytical processing) server that enables you to interactively analyse very large datasets stored in SQL databases. It gives you the ability to do dimensional exploration of data, for example analysing sales by product line, by region and by time period, without having to use SQL. Mondrian was started in 2001 and the current licence is the Common Public License (CPL).
The JFreeReport project and architect Thomas Morgner joined Pentaho next. JFreeReport, now called Pentaho Reporting, is an embeddable Java-based engine for generating rich and sophisticated report content from different sources of data. It is used as the report generation engine for both our production and ad-hoc reporting products. JFreeReport was started in 2002 and is released under the GNU LGPL licence.
In 2006, Matt Casters and Kettle, his open source data integration project, joined Pentaho. Kettle is a powerful Extract, Transform, and Load (ETL) tool with Enterprise Information Integration (EII) capabilities. Matt had been working on Kettle for 3 or 4 years before going open source in 2005. Kettle is released under the GNU LGPL licence.
The latest project to come under the Pentaho umbrella is the Weka open source data mining project developed by the University of Waikato in New Zealand. Data mining is the process of running data through sophisticated algorithms to uncover meaningful patterns and correlations that may otherwise be hidden. Mark Hall, one of the core Weka developers, has also joined the Pentaho team. Weka, created in 1993 and available on SourceForge since 2000, is licensed under the GNU GPL.
GM: What other (external) open source projects do you build on/work with?
DM: We work with and build on many open source projects. Too many to mention, but here are some of the larger ones:
* The Apache Software Foundation - commons libraries, FOP, batik, log4j
* JBoss projects including AS, Hibernate, Portal, JBPM
* HSQLDB database engine for demo sample data
* OpenSymphony Quartz scheduler engine
* Eclipse IDE, EMF, GEF and BIRT
* IText - PDF generation
* MySQL - Database engine for demo installer sample data
* JFreeChart - Pentaho reporting chart engine
* Metastuff dom4j XML, XPath, XSLT parser
* Mozilla Rhino Javascript processor
* JPivot - OLAP slice and dice jsp tag library
* Atlassian JIRA and Confluence - Pentaho case tracking and wiki
* Subversion - source code control system for Pentaho
GM: What are the advantages of an open source business intelligence solution over proprietary offerings?
DM: Many open source customers of Pentaho and other leading open source companies cite open source benefits like lower total cost of ownership, more flexibility to extend the products, reduced vendor lock-in, and aggressive support of open standards.
From a BI perspective, almost every BI solution requires some kind of interfacing with existing systems. Issues arise in every area from getting the right data, using the right business rules and creating output in a usable or useful format. With an open source BI solution, there is always the possibility of making a modification to code, creating a new plug-in or even writing an extension to solve a tough problem.
I have seen many engagements in the proprietary world that ended with a workaround: you have to create a batch process or have someone manually run a report from a legacy system, save it to a file system, parse it and load it into the BI tool, because some incompatibility had to be overcome outside the software.
Another tremendous advantage with open source in general is the speed at which we get user feedback. By allowing users to participate in the development process, and releasing "early and often" we get critical feedback very early in the process.
With a proprietary model, there is a deliberate structure to isolate developers from end-users. Product Management talks to the users about requirements, support talks to users about problems, and this information filters down to the developers in the form of bug reports and requirements. Changes get scheduled for a release, code gets written over a period of time until all the changes are in, the release is made, maybe a beta programme happens, but by the time an end-user gets to try it out, it may be 6 months or more down the road. If there are design defects with a feature, they may not get addressed until the next release. With open source, as changes go in, interested users have the option to try them out in the latest nightly build and provide instant feedback in a forum to which the developer has direct access.