RSS FeedWhite Papers

White Paper Download

Supporting data warehouse design best practices with Kalido

Achieving return on investments in enterprise data warehousing

Category: Information Management

Date: , 11:00

Company: kalido

Adapting to change gracefully, without warehouse reorganization, is a suggested design goal by Kimball, and Inmon introduces the concepts of semantic change and content change to achieve the same ends.

This is Kalido’s number one design goal. By driving the physical structures from a logical business model that Kalido manages directly, it is possible to change the logical model and allow Kalido to re-generate and populate new physical structures appropriately. Because Kalido also holds master data in a highly generic schema as well as in the dimensional tables, the dimensional tables can be regenerated and populated when a structural change occurs with no manual intervention. This allows graceful evolution of both schema and data.

Supporting Data Warehouse Design Best Practices with KalidoAchieving Return on Investments in Enterprise Data WarehousingCliff Longman, Chief Technology Of cer, KalidoWhite PaperUntitled DocumentSupporting Data Warehouse Design Best Practices with Kalido  2Executive summaryKalido automates the implementation of 20 data warehouse best practices as de ned by Ralph Kimball and Bill Inmon, the two best known data warehouse gurus.This results in three major bene ts:  " Time to value Kalido data warehouses are developed 50% faster than conventional data warehouses. In addition, Kalido data warehouses are exible, so adapting to new requirements is also much faster than with conventional data warehouses.    " Reduction in total cost The total number of person days required to deliver and then maintain a Kalido data warehouse is 75% less than for a conventional data warehouse.   "  Risk mitigation Human error is signi cantly reduced, resulting in accurate business reports even through complex business change.This paper addresses the detailed explanation of how Kalido implements these best practices. It is written with the project manager, data architect and data warehouse developer in mind as they consider how to use Kalido in their information management projects.Untitled Document3Table of Contents 1. Introduction 42. Design goals 4   2.1. Incremental development 4   2.2. Ef ciency and ease of understanding 5   2.3. Graceful adaptation to change 53. Architectural structure and features 5   3.1. Warehouse structure 5   3.2. Meta data 6   3.3. Terminology 64. Design techniques and how Kalido supports them 7   4.1. Dimensional models 7   4.2. Surrogate keys 8   4.3. Slowly changing dimensions 8   4.4. Meta data-driven data warehouse 9   4.5. Roles 9   4.6. Aggregates and summaries 9   4.7. Time as a dimension 10   4.8. Sparse facts 11   4.9. Additive, semi-additive,  non-additive facts/measures 11   4.10. Currency and other UOM conversions 11   4.11. Multiple grain transaction data 11   4.12. Many-Many Dimensions 12   4.13. Degenerate dimensions 13   4.14. Junk dimensions 13   4.15. Factless fact tables 13   4.16. Organization and parts hierarchies 13   4.17.   Ragged hierarchies/ unpredictably deep hierarchies 14   4.18. Audit dimension 14   4.19. Staging 14   4.20. Building end user applications 155. Summary table 166. Conclusion 177. About Kalido 17Untitled Document1. IntroductionWe are frequently asked how Kalido supports industry best practices for building and running data warehouses. This paper answers that question. Kalido delivers a packaged software solution for implementing tried and tested industry best practices in a way that eliminates human error and greatly speeds the development and man-agement of data warehouse environments.We have developed a list of industry best practices culled directly from The Data Warehouse Lifecycle Toolkit by Ralph Kimball, Laura Reeves, Margy Ross and Warren Thornthwaite (Wiley, 1998). We have enhanced this list to include techniques recommended by Bill Inmon in his speci cation for DW 2.0 (www.inmoncif.com/registration/news/dw2.php) and also some more recent Kimball design tips from www.ralphkimball.com. For each technique, we have paraphrased a description and given references to The Data Warehouse Lifecycle Toolkit book, DW 2.0 speci cation (page numbers in angle brackets ) or design tip at which you can nd a full description of the best practice advice. We have then described how Kalido enforces or supports the practice through its software products.This paper is divided into three main sections. The rst section highlights the major design goals that Kimball and Inmon recommend as drivers for data warehouse development. The second section describes how Kalido ts into the overall architecture of a data warehouse. The third section itemizes 20 best practices one by one and describes how Kalido enforces or supports these best practices. Finally, the summary gives a quick reference table of all the items discussed with a rating for Kalido s support for each.2. Design goalsThree primary data warehouse design goals stand out from the literature. These design goals have been at the forefront of the Kalido design team s consideration from the earliest stages of the software s initial development. Having been developed to address the needs of the Royal Dutch Shell Corporation as it attempted to reconcile performance data from its world-wide operations during the early 1990s, Kalido software has been developed with the following in mind: 2.1. Incremental developmentKimball recommends satisfying business needs incrementally towards enterprise coverage by building one data mart at a time . Inmon recommends the spiral methodology for DW 2.0 development.Kalido was designed to support incremental development. The most successful Kalido projects have been developed incrementally. Kalido is designed for extreme exibility, and this makes evolution in small steps much easier than would be the case without it.Behind the ability to support incremental development lies Kalido s ability to support a changing requirement without having to modify any of the system you have already developed and without having to unload/reload data stored by previous iterations. New sources (even at different grains) can be added, levels in hierarchies changed, whole new marts added without having to modify previously created loads, updates, extracts, etc. Even archive/purge and restore continue to work through changes to the model to accommodate new requirements.Because of this agility, we frequently nd that Kalido is used to elicit business requirements by having business people develop their own output (almost invariably changing their minds once they see the results). Note that this is almost the opposite of the idea that business requirements can be discovered in advance of building a data warehouse. We believe that requirements are discovered as a data warehouse is developed.Supporting Data Warehouse Design Best Practices with Kalido  4Untitled Document52.2. Ef ciency and ease of understandingKimball is insistent that dimensional modeling is a mandatory data warehouse design pattern to deliver performance and ease of understanding.Kalido automatically delivers a dimensional model to achieve this goal. Each dimension is created as a dimensional table in the underlying relational database by Kalido, with the (sometimes very complex) hierarchical levels structured as columns in the table. This makes it simple for end user query tools to access hierarchies at any level, while still being very ef cient for the DBMS to process queries that aggregate to levels in the hierarchies.2.3. Graceful adaptation to changeAdapting to change gracefully, without warehouse reorganization, is a suggested design goal by Kimball 10>, and Inmon introduces the concepts of semantic change and content change to achieve the same ends.This is Kalido s number one design goal. By driving the physical structures from a logical business model that Kalido manages directly, it is possible to change the logical model and allow Kalido to re-generate and populate new physical structures appropriately. Because Kalido also holds master data in a highly generic schema as well as in the dimensional tables, the dimensional tables can be regenerated and populated when a structural change occurs with no manual intervention. This allows graceful evolution of both schema and data.Kalido shares the three most important design goals laid down by Ralph Kimball and supported by Bill Inmon. The Kalido difference, however, lies in the way that best practice techniques that have emerged from these goals have been packaged into a robust product that either enforces or provides guidance and support. The scope for manual error is signi cantly reduced and risk reduced as a result.3. Architectural structure and features This section describes how Kalido ts overall in the Kimball data warehouse architecture, and into the integrated sector in DW 2.0.3.1. Warehouse structureKimball advocates a scheme in which a number of subject-speci c data marts are developed independently but fed from a common staging area and with common data (dimensions and facts) conformed between the marts. In this scheme, there is no one place that all data is stored. Inmon advocates an integrated store of all data he calls the data warehouse, from which data is copied into various data marts.Kalido offers two alternative architectures in support of the Kimball and Inmon architectures. These can be safely mixed if desired. Kalido works on the basis of Kalido Instances. A Kalido instance is a set of tables in a data-base (running on one of the supported DBMS platforms). Each instance contains a body of master data, a body of transaction data, a star schema and a number of mini-star schemas called result sets. In the case of a single Kimball data mart, this would be implemented as a single Kalido instance. This would also support the Inmon architecture (for a single mart). For multiple marts, there are two options:    " A single Kalido instance, spinning off multiple result sets.    " Multiple Kalido instances connected through Kalido s federation facilities. It is of course possible to have a mixture (for example 10 marts supported by three Kalido instances).Untitled DocumentThe decision regarding which Kalido architecture to choose is driven by the disparity of the marts (similar marts tend to favor a single Kalido instance with the marts supported by results sets, dissimilar marts tend to favor multiple Kalido instances).If the marts have to be supported by different DBMSs, then there must be at least one Kalido instance for each DBMS to be used.3.2. Meta dataBoth Kimball and Inmon stress the importance of meta data. A meta-data-driven data warehouse is more exible than one that assumes a xed data structure (though it is typically more dif cult to design and build).Kalido is heavily meta data driven. Kalido de nes data in a real-world business model (i.e., a description of real-world objects rather than descriptions of database tables). This business model is used by Kalido to auto-matically generate physical database structures, as well as load and query SQL statements. In this way, Kalido data warehouses are relatively easy to change because there is relatively little traditional coding involved. Kalido keeps track of all this meta data as well as meta data to do with the system operation (such as batch load statistics) and makes it available through its API.3.3. TerminologyThe following table shows the Kalido terminology equivalents to the Kimball/Inmon terms.Supporting Data Warehouse Design Best Practices with Kalido  6Untitled Document74. Design techniques and how Kalido supports them4.1. Dimensional modelsKimball and Inmon both advocate presentation of data warehouse data to the end user query tool in a de-normalized form. The accepted best practice for OLAP and hierarchical aggregation queries is the star schema. In a star schema, fact data in one or more fact tables is joined to dimensional data stored in dimensional tables. The most important feature of this design is that the dimensional tables pre-join hierarchies that the user might wish to navigate and report on, unlike a normalized design that would separate individual levels of each hierarchy and have the end user query tool generate the join at output time. The dimensional approach makes the output to the user both easier to understand (the end user does not have to worry about joins), and faster (dimensional data is pre-joined). Pre-joining speeds up aggregations and also speeds up drill-down operations as users navigate hierarchies.Kalido enforces a strict dimensional model for the output of data through what it calls mapping tables. These mapping tables are created and populated by Kalido without the designer s having to design or write code. (See Figure 4.1) Kalido also relieves the data warehouse designer from many of the complexities of dimensional design by:    " Automatically collapsing multiple hierarchies into a single dimensional table   " Providing a dimensional loader that automatically takes data fed via staging tables or les and populates dimensional tables (note Kalido does this via a generic store that also records the historical state of dimensional data)   " Managing change to the dimensional schema over time (if you add or modify hierarchies to the data warehouse, Kalido will automatically re-generate and upgrade the rows in the relevant dimensional tables making it easy to evolve even complex schemas over time)DIMENSIONAL MODELSKalido enforces a strict dimensional model for the output of data through what it calls mapping tables. These mapping tables are created and populated by Kalido without the designer s having to design or write code. Figure 4.1.Untitled DocumentSupporting Data Warehouse Design Best Practices with Kalido  84.2. Surrogate keysSurrogate keys are system-generated unique identi ers that are used in place of real world identi ers (e.g., the surrogate key value 14289564 identi es the person we know as Tom Smith rather than a social security number, or a combination of name or date of birth, etc.) Kimball strongly advocates the use of surrogate keys <pages 191 193 >.Kalido enforces surrogate key identi cation for all master data elements and automatically allocates these surrogate keys at creation time. Kalido provides utilities for loading meta data, dimensional data and fact data, and all three automatically assign and use surrogate key values as necessary. Kalido s fact loader also provides lookup facilities (selecting one of a number of alternative algorithms for best performance) so that as fact data is loaded, the real-world identi ers are converted into the appropriate surrogates. Kalido replaces the surrogates with the relevant real-world identi ers (also called natural keys ) as data is output from the system so that the end user is typically unaware of the surrogate keys.One difference between the Kimball and Kalido surrogate keys is seen in building the dimension tables. As explained in section 5.3, Slowly changing dimensions, Kimball considers that when there is an update on any dimensional entity, like an attribute change or a hierarchical parent change, a new version of that entity is created, hence this new version should be given a new surrogate key. Kalido, however, keeps the original surrogate key and expires the old mapping table record with an end date, creating a new record with same surrogate key and a start date. In Kimball design, creating a new surrogate key for each version keeps the fact table to dimension cardinality as many-to-one, whereas in Kalido, the cardinality of the surrogate key join (alone) between fact and mapping tables is many-to-many. For this reason, Kalido also uses effective date predicates between the fact and mapping tables to ensure many-to-one aggregation of facts.The Kimball method requires an ETL tool to maintain the natural-to-surrogate key lookup tables in the ETL platform. When there is a change in dimensional data, the new surrogate keys are created in the lookup tables. Translating the natural keys in a fact record with the new surrogate keys is also an ETL responsibility. As a result, we can see different surrogate keys in the Kimball fact table for the same dimensional entity (e.g., a product). In Kalido, surrogate-to-natural key mappings are kept within Kalido, which means that the ETL tool can bring the source transactions in their local natural keys, without a need for translation. This is also useful for maintaining a time variant mapping to surrogate keys. 4.3. Slowly changing dimensionsSlowly changing dimensions refers to a scheme for recording changes to master data over time so that history can be correctly queried for output. Kimball invented the term and describes 3 alternative schemes (type 1, type 2, type 3) for implementing slowly changing dimensions, each with different accuracy, performance and ease of implementation characteristics . Inmon describes discrete and continuous time variance with special treatment for transaction snapshots, e.g., end of period balances.Kalido automatically implements and enforces advanced time variance for all categories of master data. In a sense what Kalido does is beyond type 3 slowly changing dimensions, combining the bene ts of types 2 and 3, because every attribute value is separately identi able in Kalido s generic store (giving simple traceability of every attribute and relationship at the atomic level). In other words, Kalido s time variance is at the individual eld level, whereas it is at the record level in Third Normal Form or in De-normalized form schemas (like Kimball dimension tables). Kalido also maintains a replica of the master data in a dimensional form with what it calls mapping tables. These mapping tables are in type 3 form. Kalido automatically manages change to the mapping tables so if a hierarchy is rearranged, the correctly time-stamped rows are created in the mapping table by the system.Untitled Document9Kalido also ensures that other features of the system as a whole are kept intact alongside slowly changing dimensions, for example automated summaries, queries that run across periods of change, currency and other unit of measure conversions, etc. All these other features are executed in conjunction with the slowly changing dimensions so that the business user gets the right information when Kalido produces output.4.4. Meta data-driven data warehouseInmon and Kimball strongly suggest driving as much data warehouse behavior as possible from meta data so that the system s behavior changes when the meta data is changed .Kalido is heavily meta data driven. Kalido offers the user what is called a business model as a con guration tool. Classes, attributes, associations between classes, the relationships between activities (business transactions) and their business context and KPIs can be created and maintained by business users. The necessary database schema, loader functionality and mart structure and content are all generated by Kalido automatically driven by the business model. In addition, the change management is handled automatically, so that a change to the business model results in potentially radically different schemas and SQL can be used the next time the system is used. The business model also de nes most of the analytical business rules.4.5. RolesRoles describe a situation in which the same class is used for two different purposes. An example would be the Company class being used in the role of Customer and also Supplier (i.e., we sell to companies; we also buy from companies). Kimball describes roles focusing especially on examples of dates/times as a class with many roles.Kalido has roles built in. Part of the business model involves specifying the roles that a class can play, and also the role that is being played when the class is referred to by a fact (e.g., the customer and the supplier for a purchase fact). These roles apply to relationships de ned in the business model.Kalido also implements subtypes which assist with validation (sometimes confused with roles). For example, a customer may have a mandatory credit-rating attribute, whereas a supplier would not but may have a maximum lead time attribute. This is achieved through class of business entity subtypes with rich cardinality rules like a super-type entity can be zero, one or many of its sub-types. Kalido would also validate the multiple role elds in each transaction record against whether that company is allowed to play a particular (e.g., supplier) role.4.6. Aggregates and summariesWhen data warehouses get large, particularly in terms of the number of facts and the number of users accessing it, it becomes inef cient to continually calculate the same summaries over and over again. It is common practice (Kimball ) to calculate a summary data set (e.g., the sum of total order value by sales person, by customer by month). These summaries are stored physically, so that if an answer can be derived from a sum-mary without going back to the raw data, it is (saving signi cant re-calculation).Kalido automates summary implementation and management. All DBA s have to do is indicate which summaries they want created (by selecting the pieces of the business model) and when they should be calculated following new data loads, and Kalido does the rest. (See Figure 4.6) When Kalido generates a query that it knows can be satis ed in whole or in part from a summary, then it will use the summary as the source. Kalido is summary aware. In addition, Kalido addresses the management of change so that summaries invalidated after a business model change are not used incorrectly. Kalido also provides a scheme for compressing summaries following incremental loads (aggregating multiple summary rows with the same values for the dimensions to be summarized).Untitled DocumentSupporting Data Warehouse Design Best Practices with Kalido  10Kalido also allows result sets (data generated as output) to be physically stored and labeled as part of the business model (by creating a new transaction dataset). This allows complex algorithmically de ned summaries (such as allocations) to be calculated using custom algorithms and then stored under the scope of the Kalido business model, affording them the ability to be selected and evolved under business model control.Time variance needs to be taken into account when creating summaries. Each summarization process requires a choice of aggregation using current parents or with parents at the time that each transaction took place. Kalido automates this for you.4.7. Time as a dimensionKimball speci cally advocates the use of time as a dimension .Kalido also requires time as a dimension and embellishes it with a number of complex time features such as corresponding time periods (this month last year) or relative time periods (year-to-date) that can be extended by business users in business terms. (See Figure 4.7) Kalido generates the (often scary and highly complex) SQL needed to produce the right output. Kalido also allows multiple calendars that are customizable (so if your accounting month called January actually starts Jan 3rd in 2006, this is catered for).Kalido comes with default calendar data and also has built-in functionality for absolute, relative and typical periods of time, allowing complex calendar and time functions to be customized for each Kalido implementation.AGGREGATES AND SUMMARIESKalido automates summary implementation and management. All DBA s have to do is indicate which summaries they want created (by selecting the pieces of the business model) and when they should be calculated following new data loads, and Kalido does the rest.Figure 4.6.TIME AS A DIMENSIONKalido also requires time as a dimension and embellishes it with a number of complex time features such as corresponding time periods (this month last year) or relative time periods (year-to-date) that can be extended by business users in business terms.Figure 4.7.Untitled Document114.8. Sparse factsSparse facts are facts in which there are a large number of measures, many of which have no value in each fact record .Kalido provides two alternative ways to support the sparse fact. The rst is the way that Kimball suggests in his design tip that is, to model a generic fact type (as a transaction data set in Kalido) and a special dimension whose members represent the type of the fact. Kalido then automatically allows ltering on the relevant fact types when a result set is generated. In this case, Kalido provides a concept called an aggregated measure, which is a de nition that re-establishes the speci c business name for the measure (e.g., lubricant usage ), but refers to the generic underlying sparse fact implementation (e.g., value where value type is lubricant usage ). Aggregated measures are stored predicates such as cost where cost type = labor cost. This way, the technical implementation of sparse fact is a manageable one, and business users get to use their familiar terms. There is another alternative with Kalido, which is to model each measure with its own transaction dataset (note they can all share the same physical table if it is a Kalido generic warehouse section). This will result in a larger number of transaction datasets, which may add complexity to the load processing but may be an acceptable trade-off.4.9. Additive, semi-additive, non-additive facts/measuresSome fact types (e.g., sales volumes) can be summarized against any dimension. Others (such as inventory levels) can be summed against any dimension except time. Yet others (e.g., temperature of a liquid in a manufacturing process) cannot be summed against any dimension. Kimball describes these properties of facts at .Kalido supports all the above and includes an additional option which is summable against only a single dimension (useful for nancial summaries against account). Note that the de nition in Kalido is done at the business model level and the necessary technical implementation is generated (in this case Kalido will only generate valid SQL statements to calculate the sum depending on the measures chosen). Kalido will never generate SQL that sums facts inappropriately, even if the sum is buried in a complex calculated measure a comforting piece of risk reduction for business and technical users alike.4.10. Currency and other UOM conversionsKimball describes currency conversion and multi-currency data warehouses at .Kalido has built-in currency recording and conversion facilities. Kalido identi es special transaction data sets as exchange rate data sets, and these are used to provide (multiple alternative) exchange rates with which to convert currency units. This is useful if there are standard accounting exchange rates used for a nancial year, true current exchange rates, agreed exchange rates for a contract and so on. Exchange rate data is loaded as any other transaction data set would be. Currency conversions can also be de ned as meta data de nitions called converted measures. When transactions are loaded in any currency, a converted measure (e.g., cost in Euros ) can then be included in result set de nitions without re-de ning the currency conversion each time. Kalido extends the general conversion principle to other units of measure (not just currencies) which can be satis ed by a at conversion rate (e.g., volume to weight at standard temperature and pressure).4.11. Multiple grain transaction dataKimball explains that when data is available at a higher grain than the grain required for analysis, a form of allocation is required . An example would be when sales revenues are available for each packed product on the invoice lines and product cost is available at product group and monthly levels, and we need Untitled DocumentSupporting Data Warehouse Design Best Practices with Kalido  12margin by individual sales. Note that the algorithms and schemes for allocating to a lower grain are speci c to each company and sometimes have multiple alternatives within a single company.Kalido supports allocation by de ning a new measure and transaction data set for each allocation method (so, for example, there may be a measure called costs allocated by headcount and a TDS called daily headcount allocated costs ). A script, program or ETL tool is used to implement the allocation algorithm depending on the complexity of the allocation algorithm. With Kalido, implementers should bear in mind that models may change, so the most exible (but most dif cult) allocation algorithms are driven from the Kalido meta data rather than being hard coded to expect certain classes, attributes and associations in a de ned con guration. Implementers should also be careful to re-run an allocation should any of the relevant master data change historically (e.g., if an employee entity is updated on 1/1/2006 to be in a different department historically, such as between 1/1/2005 and 6/1/2005.Sometimes, the same type of transaction data may be available from multiple sources with different granularity. For example, we may have a legacy sales order processing system capturing the revenue and volume by product, customer and day and another ERP system with sales transactions capturing revenue, volume, discount and distribution cost by product, customer-shipped-to (as a child of customer-sold-to), sales representative and day. Having some different measures and referring to different columns in the dimension tables would make the aggregation at common levels more dif cult, resulting in complex SQL unions. Kalido handles such multi-granular transactions automatically to resolve the common granularity from its meta data and adds up measure values correctly while more granular totals are still available as much as the individual source granularities.Kalido also provides a construct known as the coding structure which allows facts that may be recorded against one of a number of levels in a hierarchy (e.g., a budget that may be for a team, a department, a division or a company). Kalido automatically generates the right SQL to aggregate facts recorded against a coding structure.4.12. Many-many dimensionsThis occurs where a fact effectively has multiple entries for a single dimension . If the facts are aggregated using the dimension for which there are multiple entries, they will be double counted (i.e., the grand total reported will be greater than the sum of all the facts). What is more frequently required is a query in which the aggregate is created from a subset of facts limited by one or more values in the multi-valued dimension. Kimball uses the example of a billable patient encounter fact for which there is a diagnosis dimension, and there can be many diagnoses on each encounter. If one aggregated the billable value by diagnosis, the sum of the sub-totals would be greater than the total billable value. What is more likely is a query that aggregates billable value for all encounters that include a speci c diagnosis. Kalido does not support double counted many-many dimensions. What Kalido does have is a many-many association capability, so that a diagnosis group can be de ned (as in Kimball s solution to the problem) and a many-many association created between the group and the diagnosis class. Kalido then allows a lter to be applied (when building a query de nition) by dragging individual diagnoses into the query constraints. When there are just a few possible multiple parents of a dimensional entity, Kalido also provides a solution to dissolve each parent into multiple binary parents called a dimension set. So, for example, if a road accident can be attributed to many reasons including road condition, car failure, weather conditions and driver error, these possibilities would be automatically modeled as separate parents like car failure yes/no? type binary and many-to-one relationships. Kalido allows the data to be loaded with a simple list as though it were a many to many, but reported on as though it were a set of many-to-one relationships.Untitled Document134.13. Degenerate dimensionsKimball describes the degenerate dimension at . This occurs when a eld such as an order number is left over because the only data we are interested in for our fact is the order line data. These are generally used to link a chain of different types of fact to each other.Kalido allows any additional elds to be added to the measure and dimension data in a transaction. In this way, the invoice number for each invoice line transaction can be stored and referenced in group by statements generated by a BI tool. There are also some special elds that Kalido ascribes special behavior to, such as a transaction identi er (such as a point of sale system transaction ID), which Kalido ensures uniqueness for. This helps prevent accidentally loading the same transaction twice and can help track transactions back to the source system if need be.4.14. Junk dimensionsKimball describes junk dimensions as dimensions used to accumulate various status and switch ags (such as yes/no ags) .With Kalido, junk dimensions are simply modeled as dimensions and classes of business entities in the normal way (as they would be in a Kimball implementation ).4.15. Factless fact tablesSome facts simply record business events (such as a person attending a training class) in which there is no measure. Kimball describes these as factless facts . There are also some facts that represent a useful combination of dimensional items, which Kimball calls coverage facts (an example here would be that a certain product is sold through a certain channel in a certain country this represents a way of recording the valid combinations from the members of a number of dimensions).With Kalido, factless facts can be implemented as a Transaction data set with a count measure that always has the value 1 (much as Kimball advises). Kalido also has a concept called mapped business entities, which allow valid combinations of business entities to be collected and treated as a single unit (for example, you might collect a product, a channel and a country as an allowed combination). This approach is only used for relatively small (up to 100,000) combinations such as market segmentations.Kalido s mapped business entity feature is also used for modeling business rules as a function of multiple dimensions. For example, if the product class is services and if the state is New Hampshire, then the tax attribute is 5%. This could be more complex, such as if the product class is services, except the product sub-class education, and if the state is New Hampshire, then tax is 5%. 4.16. Organization and parts hierarchiesKimball describes the organization and other any level parts hierarchies and how SQL provides inadequate means of aggregating such networks. Kimball suggests the implementation of a helper table to atten the hierarchy so that facts can be aggregated to any level with a single join.Kalido attens any level dimensional hierarchies also but does so in the same way that it deals with xed level hierarchies, by generating columns in the mapping tables. The result is a mapping table that has one column for each level in the hierarchy. Kalido of course generates the correct SQL to aggregate to the level requested by the user (the user can also optionally name the levels, which in the case of organization structures is often attempted). This results in ef cient query time processing as there are no joins required at all. What it does mean, however, Untitled DocumentSupporting Data Warehouse Design Best Practices with Kalido  14is that when the master data changes, more (or fewer) columns in the mapping table may appear (based on the number of levels in the data). Kalido automatically handles this upgrade to both table and data, but it does mean that table structure may change in a production environment, unless this type of change is handled through the development/test/QA/release environment (as it typically is in most non-Kalido systems). It is worth noting that with Kalido this is simply modeled as an involution on the business model, and although this never results in a SQL CONNECT BY statement (to Kimball s point) it does simply work by providing the right results for the user (and does this in a way that continues to work despite changes in the hierarchical structure such as re-organizations, including aggregations that span a reorganization).4.17. Ragged hierarchies/unpredictably deep hierarchiesSome hierarchies are not a xed number of levels. They may have some nodes that are 3 levels deep, some that are 8, some that are 2. Kimball gives examples at .Kalido handles these in the same way that it would a xed level hierarchy (i.e., the number of columns in a mapping table equals the largest number of levels in the whole hierarchy). Levels that are not appropriate are populated with a user-de ned phrase such as Not available at this level, so the user will always have the correct data and always return the right number of rows.4.18. Audit dimensionKimball suggests an optional audit table to be populated with useful execution information each time a batch of fact data is loaded .Kalido automatically provides this data as part of the transaction loader logging the data against a system object called a batch. In cases where the batch is broken into separate chunks (e.g., erroneous records being corrected and reloaded), Kalido generates sub batches.4.19. StagingKimball describes staging in great depth . The description deals with approaches to surrogate key management, database versus at le staging, incremental loading, nal data loading, error handling/ agging, dimensional change logic, historic data loads, data quality and cleansing. In the Kimball model, the staging area is a signi cant part of the data warehouse architecture. Kalido provides either automation or support for each of the items mentioned above, and as a result, the staging area in a Kalido implementation tends to be somewhat simpler than would be the case in a Kimball implementation. That is not to trivialize what can be a very complex and dif cult part of a data warehouse project poor quality data in particular still needs to be sorted out it is just that Kalido automates or obviates the need for a signi cant proportion of standard Kimball-style staging activity. Here are the main points:  " Database versus at le staging Kalido supports both. Files can be on the server running the load utility or via an ftp connection.   " Incremental loading Kalido will do delta detection for dimensional data if required (so a complete dump of a dimension can be loaded each period, allowing Kalido to do the incremental update as necessary). If source system or ETL tools are to do the change data detection, Kalido will accept a le of deltas.Untitled Document15  " Final data loading Kalido does this automatically. It is not possible to do the nal load any way other than with the Kalido loaders. For fact data, Kalido will optionally use the DBMS native loader as an option.   " Error  handling/ agging Kalido has an error management subsystem for presenting errors in fact data loads. This allows the user to examine the error and either correct for reload or reject. For master data loads, errors are rejected but may be managed as part of the data cleansing process (see below).   " Dimensional change logic Kalido automates this for Kalido-controlled dimensions. No code is necessary. Kalido also offers an option to manage dimensions manually if there is some specialist processing required. If this option is taken for a dimension, the dimensional change logic must be handled in ETL or other processing external to Kalido.   " Historic data loads Kalido treats these as normal loads, though there are special options on master data loaders allowing correction of history if an anomaly in historical data is discovered.   " Data quality and cleansing Kalido provides no automated data cleansing or standardization but does provide a complete application to support data governance, allowing business users to develop authorized golden copies where manual decisions are required (such as missing classi cations) of master data before it is used in the data warehouse itself.4.20. Building end user applicationsKimball describes a number of techniques to address building end user applications .Kalido does not replace or reduce the need for these techniques but does provide a means of automating the production of the con guration for certain BI tools (and manages the incremental update of these con gurations when the business model changes). Support is currently available for BusinessObjects, Excel and Analysis services (of ine cubes).Untitled DocumentSupporting Data Warehouse Design Best Practices with Kalido  165. Summary tableThe table below summarizes Kalido s support for the industry best practices for developing data warehouses. For each feature we have given an indication of whether Kalido supports the best practice or not, and if it does support the best practice, the level of support Kalido offers.In the Kalido support column:Enforced (E) means Kalido provides explicit support for this feature, and the use of the feature is not optional. Supported (S) means Kalido provides explicit support for the feature, but use of the feature is optional.  Allowed (A) means that Kalido provides no explicit support for the feature, but using other Kalido features can achieve the same results. Prevented (P) means that Kalido will not allow or cannot support the feature.In the level of support column:Exceeds (E) means that Kalido incorporates the best practice but also goes further in offering additional  functionality. Supports (S) means that Kalido automates the approach suggested by industry best practices. No Support (N) means that Kalido does not support the approach.Untitled Document176. ConclusionWhile additional data warehousing best practices are advocated by both Ralph Kimball and Bill Inmon, as well as by many other practitioners, this paper has focused on the 20 forming the bulk of the published literature for data warehouse design as de ned by Kimball and Inmon. From its inception, Kalido has kept the most critical of these best practices rmly in mind while developing and improving its software products: incremental development, ef ciency and ease of understanding and graceful adaptation to change.By using Kalido to build and manage their data warehouse, organizations can rest assured that it conforms to the best practices as advocated by the industry s best known experts. More important, a Kalido data warehouse will deliver the accurate, consistent, accessible information your company needs to manage and run your business over time, as it changes.For more information about Kalido software, please visit our web site at http://www.kalido.com. To learn more about how Kalido supports industry best practices, please contact us at info@kalido.com. 7. About KalidoKalido delivers active information management for business. With Kalido s unique business-model-driven technology, decisions are fueled by accurate, accessible and consistent information, delivered in real time, to dramatically improve corporate performance. Kalido software can be deployed at a fraction of the time and cost of traditional information management methods.Kalido software is installed at over 250 locations in more than 100 countries with market leading companies. Headquartered in Burlington, Massachusetts, Kalido is backed by Atlas Venture, Benchmark Capital and Matrix Partners. More information about Kalido can be found at: http://www.kalido.com.Copyright 2007 Kalido. All rights reserved. Kalido, the Kalido logo and Kalido s product names are trademarks of Kalido. References to other companies and their products use trademarks owned by the respective companies and are for reference purpose only. No portion hereof may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, record-ing or information storage and retrieval systems, for any purpose other than the recipient s personal use, without the express, written permission of Kalido. The information contained herein is subject to change without notice. Kalido shall not be liable for errors contained herein or conse-quential damages in connection with furnishing, perfor-mance or use hereof. Any Kalido software described herein is licensed exclusively subject to the conditions set forth in the Kalido license agreement.Untitled DocumentContact InformationUS Tel:  +1 781 202 3200European Tel: +44 (0)845 224 1236Email:  info@kalido.comor visit our website atwww.kalido.comWP-DWBP0307Copyright 2007 Kalido. All rights reserved. Kalido, the Kalido logo and Kalido s product names are trademarks of Kalido. References to other companies and their products use trademarks owned by the respective companies and are for reference purpose only.

You must have an account to access this white paper. Please register below. If you already have an account, please login.

Already registered?

Login

Forgot password?

New customer?

White paper download

ComputerworldUK Webcast

ComputerworldUK
Share
x
Open
* *