Extract, Transform and Load

Within the Data Warehousing arena, Extract, Transform and Load (ETL) from operational systems to the target data warehouse/data mart is a major system component that consumes 70-80% of multi-million-dollar budgets. Traditionally, hundreds of programs are written either in a 3GL (COBOL) or, more recently, using ($100K+) tools. Most of these programs are simple: move this to there and change a name or a data format, or validate that the data contains a valid code. Although the typical program is trivial, some can be very complex, and all of them are constantly being broken by upstream changes. For this reason the tools generally provide metadata where-used tracking and some level of productivity gain over 3GLs.
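To make "move this to there and change a name, data format, or validate a code" concrete, here is a minimal sketch of one such program in Python. The field names (cust_no, cust_stat, open_dt), the set of valid status codes, and the date formats are all made up for illustration; they do not come from any real system or ETL tool.

```python
from datetime import datetime

# Hypothetical set of valid codes; a real job would read these from a lookup table.
VALID_STATUS_CODES = {"A", "I", "P"}


def transform(record):
    """Return a cleaned copy of one source record, or raise ValueError."""
    status = record["cust_stat"]
    # Validate that the data contains a valid code.
    if status not in VALID_STATUS_CODES:
        raise ValueError(f"invalid status code: {status!r}")
    # Change the data format: MM/DD/YYYY in the source, ISO 8601 in the target.
    opened = datetime.strptime(record["open_dt"], "%m/%d/%Y")
    # Change the names: source column names become warehouse column names.
    return {
        "customer_id": record["cust_no"],
        "status_code": status,
        "opened_date": opened.strftime("%Y-%m-%d"),
    }


def run(records):
    """Transform every record, separating good rows from rejected ones."""
    loaded, rejected = [], []
    for record in records:
        try:
            loaded.append(transform(record))
        except (KeyError, ValueError) as exc:
            # A missing column or a bad value is typically how an upstream change shows up.
            rejected.append((record, str(exc)))
    return loaded, rejected
```

Calling run() on a list of source rows returns the rows ready to load plus the rejects with a reason for each. If an upstream system renames open_dt, every row lands in the reject list, which is the kind of breakage described above.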

However, because the tools are so expensive, customers take a long time to decide which one to buy (there are about 5-15), which drives up sales costs and keeps sales growth fairly flat. Consequently, vendors are financially challenged to deliver the stream of features needed while maintaining decent quality. A side effect is that finding programmers skilled in these tools is very difficult.

Many current ETL tools use a flow-based GUI/paradigm but suffer from a poor underlying architecture, a closed programming model, and incomplete support for current DW design methods. A number of us on the DW mailing list have been talking about an Open Source ETL tool, and I'd like to adopt FBP as the underlying architecture. From my years with SAS (not exactly FBP) and Pipes (more FBP), I've experienced the flexibility and performance, as well as the "naturalness", of the FBP paradigm, and I feel it would make a significant impact on the DW world.

I've recently set up a project on the open source network SourceForge.Net (http://opnsrcetl.sourceForge.net) and hope to gather a cadre of skilled professionals to develop the requirements and, God willing, an implementation that we can use in our practice. --DougLittle

