The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data
D**E
Indispensable How-To for all ETL Architects/Developers/Mgrs
Ralph Kimball has rounded out his complete recipe for building fast, cost-effective, robust and durable enterprise dimensional data warehouses with this immensely valuable addition to all IT and data warehouse professionals' bookshelves. Without a doubt, ETL has been the biggest stumbling block to deploying and maintaining well-architected data warehouses that stand the test of time. Ralph draws on his years of experience and engagement with thousands of projects and crystallizes the "best practices" into an effective application architecture for all ETL systems, regardless of what tools projects use for implementation. In this thorough examination of the Extract, Transform and Load (ETL) process, Ralph identifies 38 critical functions that all ETL systems need to implement for success in the long haul. He thoughtfully lays out simple and practical approaches for how each of these functions can be implemented by projects with any size of budget. The paradoxical nature of ETL (seeming trivial yet replete with endlessly complex details that constantly change) has been the proverbial straw that has broken the bank for many DW projects. Continual customer pressure to grow, improve performance, and quickly deal with changing business conditions has left developers and architects grasping for more powerful and flexible approaches to ETL that meet project timelines, yet evolve and improve with age. Armed with this enlightening roadmap, many DW professionals will be far better equipped to design and build systems that meet the challenges of today and tomorrow.
R**I
A field manual for the professional
I've been doing this for 9 years, and this is the best book I've seen on ETL processing and its role in data warehousing. Before you start your data warehouse effort, have your team read this book. Not just the ETL members... both the front-room and back-room technicians will benefit from understanding the clear presentation of what can be in scope (dare I say, "best practice", yikes!) for ETL processing. The ideas for capturing highly valuable data quality and cleansing processes are no less helpful than the emphasis on loading the data unless it is misleading en masse or obviously flawed. The bias toward driving the data to the front room for presentation forces data quality issues to the surface, where they must be dealt with and the loop to operational systems (or perhaps even flawed ETL transforms!) closed. The book illustrates alternatives for dealing with the messy reality of suspended data and late-arriving facts and dimensions. Use it as a guide for your ETL efforts. Buy this book, read it, and then buy a few more for your team if you agree.
M**D
Top Reference Manual for ETLers
There's a lot of info here and it amounts to a college text. It's very in-depth and will require as much time as any college text would. Several chapters stand out, such as Cleaning and Conforming. Metrics development is discussed in depth, in ways that may have been overlooked in other texts like this one. The examples go very deep into generation of surrogate keys, bulk loading of Type 1s, minimum key requirements in dimensions and other ETL concepts concerning the error audit table. The book was very obviously written from practical situations encountered in daily work. The book does not present or attempt to develop the "Kimball Method" of SCD. As it turns out, that's far less important than all the other things required simply to run a day's ETL cycle, in which much may go wrong. Numerous tips are worked into this text, and the small things add up greatly over the course of the book. For example, it's a mistake to allow NULLs ever to be the prime indicator for an active Type 2 record, specifically because a simple SQL call needing a date range controlled by BETWEEN will fail if the active indicator is NULL. It's the fine points like this that rate this book very highly. This is not a trivial read-once book. It seems that this sort of preparation should be more in evidence in many of the shops. Let's face it: IT, and especially ETL, is a high-end engineering discipline, more so with EDW. As far as criticisms are concerned, the SQL examples are good but are PL/SQL and require time to decode for those developing in MSSQL environments. The book was written in 2002 when Oracle was the main player and there were not so many Teradata shops. That may make the book too back-level for some readers.
However, the book covers most of the subject matter without tying it to a specific implementation, enough so that the key concepts would work regardless of the technology chosen. It's not very likely that a course like this one would even be taught at a college level, and for that reason the book is a key investment. Further, this book could be used for two full semesters, based on this reviewer's experience in formal course work. It's not likely that on-the-job training would supplant the need for the knowledge contained in this book or one like it.
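The NULL point raised in the review above can be illustrated with a minimal sketch. It is not taken from the book: the table and column names (customer_dim, row_effective_date, row_end_date) and the sample rows are hypothetical, SQLite stands in for whatever database is in use, and a far-future sentinel date is assumed as the alternative to NULL.

```python
import sqlite3


def rows_valid_on(as_of_date, end_date_for_current_row):
    """Return surrogate keys of customer C100 rows valid on as_of_date.

    end_date_for_current_row is the value stored as the end date of the
    currently active Type 2 row: either None (NULL) or a sentinel date.
    All names and data here are hypothetical, for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE customer_dim (
               customer_key       INTEGER PRIMARY KEY,  -- surrogate key
               customer_id        TEXT,                 -- natural key
               city               TEXT,
               row_effective_date TEXT,                 -- ISO-8601 dates
               row_end_date       TEXT
           )"""
    )
    conn.executemany(
        "INSERT INTO customer_dim VALUES (?, ?, ?, ?, ?)",
        [
            (1, "C100", "Boston", "2001-01-01", "2003-06-30"),
            (2, "C100", "Denver", "2003-07-01", end_date_for_current_row),
        ],
    )
    rows = conn.execute(
        """SELECT customer_key
             FROM customer_dim
            WHERE customer_id = ?
              AND ? BETWEEN row_effective_date AND row_end_date""",
        ("C100", as_of_date),
    ).fetchall()
    conn.close()
    return [key for (key,) in rows]


# With NULL as the end date, the BETWEEN predicate evaluates to NULL
# (unknown) for the active row, so the row is silently filtered out.
null_result = rows_valid_on("2004-02-15", None)

# With a far-future sentinel date, the same query finds the active row.
sentinel_result = rows_valid_on("2004-02-15", "9999-12-31")

print(null_result, sentinel_result)
```

The query does not raise an error in the NULL case; it simply returns no row for the active record, which is why this kind of mistake can go unnoticed.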
E**L
Another Kimball Toolkit
In my estimation, The Data Warehouse ETL Toolkit is a good source of information on the topic that covers the majority of your data warehouse efforts: the ETL process (or ECCD if you prefer, which you probably will after finishing this volume). I took away some good ideas on items that I probably would not have considered, mostly due to my own ignorance, relating to metadata, QA and error corrections, data lineage and scoring, etc. The authors (Kimball and Caserta) do a good job of pointing out other source books for items that the reader will probably want to look at in depth. There is also a pretty good section explaining how to manage your ETL project, the different roles of people who should be involved, and a pretty good project plan / checklist to use as you are getting started. My only complaint is that I did not read this prior to starting my own project and am instead having to correct items as I try to implement these best practices.
L**S
Book arrived in like new condition
This book arrived in very good shape. Exactly what I was hoping for.
N**N
Highly relevant
The toolkit works its way bottom-up but with high-level considerations in between. This is one of the things that makes the book highly relevant to developers, technicians and managers alike.
M**A
Five Stars
Read it many times! This was our hymn book for many years.
K**R
Complete reference for Data warehouse ETL
Excellent source of information. Detailed in every topic presented. It clearly shows that it is the culmination of years of practical experience.
A**R
Ok
Ok
J**O
I recommend this provider
In perfect condition. The only downside was delivery time. Everything else was perfect. For me, this book is essential in my bibliography.