Talend has released what it says is the first open source data profiler:
Talend today announced the first open source data profiler, which enables companies to assess the quality of data and decide which actions must be taken to correct “dirty data” that irritate customers and cost companies time and money. Data profiling is the first step to achieving reliable, trustworthy data.
Data quality is a major concern for companies of all sizes. Faulty or conflicting data such as missing zip codes, incomplete addresses or wrong phone numbers can lead to multiple mailings to the same customer, lost product shipments and missed sales opportunities. All waste companies’ time and money and exasperate customers.
The first step in improving the quality of a company’s data is to “profile” or evaluate the data. Talend Open Profiler provides project teams with the ability to understand the characteristics of their data and discover its quality level. Accurate profiling reduces the time and resources needed to find problematic data and allows companies to identify potential problems before beginning data-intensive projects such as data integration or new application development. It also allows business analysts to have more control over the maintenance and management of the data.
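To make "profiling" a little more concrete, here is a minimal sketch of the kind of checks such a tool might run: counting missing, distinct, and malformed values per column, and flagging repeated names. This is not Talend Open Profiler's API or method. The sample records, the field names, the `profile` and `duplicate_values` helpers, and the assumed zip and phone formats are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical sample records; field names and values are invented for illustration.
RECORDS = [
    {"name": "Ann Lee", "zip": "94107", "phone": "415-555-0134"},
    {"name": "Ann Lee", "zip": "", "phone": "415-555-0134"},
    {"name": "Bo Chan", "zip": "9410", "phone": "5550199"},
    {"name": "Cy Diaz", "zip": "10001", "phone": None},
]

# Assumed formats: 5-digit US zip codes and NNN-NNN-NNNN phone numbers.
PATTERNS = {
    "zip": re.compile(r"\d{5}"),
    "phone": re.compile(r"\d{3}-\d{3}-\d{4}"),
}


def profile(records):
    """Return per-column counts of rows, missing, distinct, and malformed values."""
    columns = sorted({key for row in records for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in records]
        # Treat None and the empty string as missing.
        present = [v for v in values if v not in (None, "")]
        pattern = PATTERNS.get(col)
        # Columns without an assumed pattern are never counted as invalid.
        invalid = sum(1 for v in present if pattern and not pattern.fullmatch(v))
        report[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "invalid": invalid,
        }
    return report


def duplicate_values(records, col):
    """Return values of `col` that occur more than once (possible duplicate customers)."""
    counts = Counter(row.get(col) for row in records if row.get(col))
    return sorted(value for value, n in counts.items() if n > 1)


if __name__ == "__main__":
    for col, stats in profile(RECORDS).items():
        print(col, stats)
    print("repeated names:", duplicate_values(RECORDS, "name"))
```

Even a report this crude surfaces the problems the announcement mentions (missing zip codes, wrong phone numbers, the same customer listed twice) before any integration work begins.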
Talend has chosen the GNU GPLv2, which it already uses for its Open Studio product. It's good to see both this choice and the extension of open source to yet another enterprise market sector.