This chapter has been excerpted from the book "Flow-Based Programming: A New Approach to Application Development" (Van Nostrand Reinhold, 1994), by J. Paul Morrison.
"Reuse in DFDM is natural. DFDM's technology is unsurpassed in its promotion of reuse as compared to other reuse technologies currently being promoted" (from an evaluation of DFDM performed by an IBM I/S site in the US in 1988).
So far, we have spoken as though components are created "out of thin air" for a specific problem. You may well have suspected that I selected my components to illustrate certain concepts, without worrying about whether they would be useful in a real application. Well, the good news is that the kinds of components we have run into are in fact the ones which experience shows are very useful for building real applications. The bad(?) news is that it requires a large amount of experience in programming and a certain creative flair to come up with useful components. This should not be so surprising when you think about useful tools you are accustomed to using in real life. Where and when was the first hammer invented? Imagine a whole series of "proto-hammers", for instance reindeer horns, rocks attached to sticks, etc., gradually evolving into what we are used to today, with a claw on one side, and balanced just so.... Perhaps we should qualify that by saying "in the West". Different cultures will come up with different tools or different forms of the same tool. I seem to remember from my anthropology classes that the Australian aboriginals have a wonderful tool which is a combination of all the things they find most useful in their everyday life, and yet is highly portable. It is a combination spear-thrower, shield, dish and fire-starter. These kinds of tools are not invented overnight - they take time and experience, requiring progressive refinement by many creative people. The best tools will always evolve in this way. Tools may also pass out of use as the need for them diminishes, or they are replaced by something more effective - buggy whips are the classical example, but usually it happens without our even noticing! When did they stop putting running-boards on cars? No, you don't need to phone right away! 
Clearly, culture and the tools we use are closely intertwined - we have strong ideas about what is the right tool for a given job, but another culture may in fact have a different definition of what that job is.... In the West we consider using a knife and fork the "proper" way to eat food - a few centuries ago, we were spearing it with the point of a dagger. Knives and forks in turn mean that an acceptable Western meal might include some very large chunks of meat, or even half a bird. In the Orient, on the other hand, people have been using chop-sticks for a very long time, which requires that the food be served in bite-size pieces. Notice that the choice of tools also helps to determine what part of the serving is performed by the diner and what part by the cook behind the scenes.
One other thing we should consider is the need to be able to use the tool in unforeseen situations. A useful tool should not be too restrictive in the ways it can be used - people will always think of more ways to use a tool than its original designer ever imagined. Wayne Stevens (1991) tells a story about an airline attendant using a hearing set (that little plastic stethoscope you plug into the arm of your chair) to tie back some curtains. Elegant? No. Effective? Yes! We don't want to make a hammer so intelligent that it can only be used on nails.... Another example: why do some UNIX(tm) functions have non-obvious names? There are well-known cases where a tool was originally designed for one job, but people found that it was even more useful for some function the original designer did not foresee. This is in fact a testimony to the robustness of these tools. We will run into examples of this kind of thing in FBP.
Just as in the preparation and consumption of food there are the two roles of cook and diner, in FBP application development there are two distinct roles: the component builder and the component user or application designer. The component builder decides the specification of a component, which must also include constraints on the format of incoming data IPs (including option IPs) and the format of output IPs. The specification should not describe the internal logic of the component, although attributes sometimes "leak" from internal to external (restrictions on use are usually of this type). The application designer builds applications using already existing components, or, where satisfactory ones do not exist, s/he will specify a new component, and then see about getting it built.
Component designers and users may of course be the same people, but there are two very different types of skill involved. This is somewhat like the designer of a recent popular game, who admitted he was not particularly fast at solving it - his skill was in designing games, not in playing them. The separation between makers and users is so widespread in real life that we don't pay any attention to it unless it breaks down. In industry, as Wayne Stevens points out, we take for granted the idea that airplane builders do not build their own chairs - they subcontract them to chair manufacturers, who in turn subcontract the cloth to textile manufacturers and so on. In contrast, the world of conventional programming is as if every builder designed his own nails, lumber and dry-wall from scratch. Talk about "reinventing the wheel" - in conventional application development we reinvent the rubber, the nuts and bolts, and even the shape of the wheel!
I'd like to talk a little bit about how useful components are developed. They are unlikely to emerge from a pure "top-down" approach, or from a pure "bottom-up" approach. In the first case, you do not discover dry-wall by progressively breaking down an architect's drawing of a house. In the second case, people who dream up useful components have to be willing to subject them to rigorous testing in real life situations. Even after this has been done, they still may not sell. Nobody in industry would bet the business on some untried tool which had never been evaluated in the field (well, usually not), and yet we do this frequently in application development. Another of Wayne Stevens' recommendations is not to build a generalized tool until you have found yourself doing the same thing three or four times. Otherwise, you may find yourself investing a lot of time and effort in features that nobody will ever use, and you may find yourself unable to respond to customer requests for features they really do want.
In FBP a lot of the basic components have analogues in an area which is no longer well-known, but has been very productive of generalized components over a number of years - namely, Unit Record machines. In those days we had specialized machines, such as sorters, tabulators, collators, etc., and people learned to wire (parametrize) them and link them together into applications very effectively. And you didn't need a college degree to get applications working. In fact, I once figured out how to solve a problem with a tabulating machine plug-board, straight out of the bath, over the phone, dripping wet, without even notes or a schematic to look at!
Just as Unit Record machines worked with streams of punched cards, the corresponding FBP components work with streams of IPs. Let's call these "stream-based" components. Examples of such components are:
These all have the characteristic that they process data streams and that they require very little information about the format of their incoming data streams. They typically have well-defined application-independent functions.
We might expand the list with some general-purpose components which get down to the data field level, but still do not "understand" business processing. One such component might be a generalized transform component. I believe such a component, properly parametrized, could in fact do a lot of the processing in a given business application. Nan Shu of IBM in Los Angeles has written extensively about a language which she calls FORMAL (Shu 1985) - its function is to take descriptions of files and the transformations between them and use them to do the transforming automatically. She has found that a large amount of business processing consists of moving data around, changing its coding, and doing table look-ups, e.g. one application might use a number for each US state, while another might use a two-character abbreviation. This suggests that another type of function in this same class is a generalized table look-up function, and in fact we have built several for DFDM.
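A generalized table look-up of the kind just described can be sketched in a few lines. This is a minimal Python illustration, not one of the DFDM look-up components: the function name `table_lookup`, the `default` parameter and the small `STATE_TABLE` are assumptions made for the example. The point is that the component knows nothing about states. The field name and the table are both parameters, so the same code can translate any coding scheme into another.

```python
from typing import Iterable, Iterator, Mapping

# Hypothetical translation table: numeric state codes used by one application,
# mapped to the two-character abbreviations used by another (illustrative only).
STATE_TABLE = {1: "AL", 2: "AK", 4: "AZ", 6: "CA"}

def table_lookup(
    records: Iterable[dict],
    field: str,
    table: Mapping,
    default: str = "??",
) -> Iterator[dict]:
    """Generalized look-up: replace record[field] by its entry in `table`."""
    for rec in records:
        out = dict(rec)  # copy, so the caller's record is not modified
        out[field] = table.get(rec[field], default)
        yield out

rows = [{"cust": "A", "state": 6}, {"cust": "B", "state": 3}]
print(list(table_lookup(rows, "state", STATE_TABLE)))
# [{'cust': 'A', 'state': 'CA'}, {'cust': 'B', 'state': '??'}]
```

Codes missing from the table are marked rather than rejected here. A production component might instead route such records to a separate error port.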
There is another general class of components known as "technology-dependent". These components usually require specialized knowledge to build, but, once created, can be used by people who are not as technically skilled. They thus encapsulate specialized knowledge. They will usually be written in a lower level language. We had an interesting example of this some years ago: we had a person who was an expert on paper tape. Paper tape is (was?) a medium with its own quirks. It has special codes for various purposes, and in particular has a convention for correcting data (you punch all holes across the offending character, which is then treated as though there was no character there at all). This individual was able to write some components which generated regular IPs, so that no other component in the application needed to know that the input was paper tape. You could build and debug an application using regular I/O, and then, after you had it working, you could unplug the reader module and replace it with the paper tape reader. This meant in turn that testing could be done without the tester having to monopolize a scarce (and slow) piece of equipment.
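The paper tape example can be sketched as two interchangeable reader components that emit identical IPs. This is a hypothetical Python illustration. It assumes each tape frame arrives as one byte, that an all-holes "rubout" frame reads as 0xFF, and that a newline ends a record. Real tape codes varied with the equipment. Because `downstream` only ever sees plain strings, either reader can be plugged in front of it.

```python
from typing import Iterable, Iterator

# Assumptions for illustration: one byte per frame, an all-holes frame reads
# as 0xFF ("rubout"), and a newline frame ends a record.
RUBOUT = 0xFF
END_OF_RECORD = ord("\n")

def read_lines(lines: Iterable[str]) -> Iterator[str]:
    """Ordinary reader: one IP per input line."""
    for line in lines:
        yield line.rstrip("\n")

def read_paper_tape(frames: bytes) -> Iterator[str]:
    """Paper-tape reader: hides tape conventions and emits the same IPs."""
    buf = bytearray()
    for frame in frames:
        if frame == RUBOUT:
            continue  # an overpunched character is treated as if absent
        if frame == END_OF_RECORD:
            yield buf.decode("ascii")
            buf.clear()
        else:
            buf.append(frame)
    if buf:  # last record without a terminator
        yield buf.decode("ascii")

def downstream(ips: Iterable[str]) -> list:
    """Stands in for the rest of the network; it never sees tape details."""
    return [ip.upper() for ip in ips]

# A mistyped character after "he" was overpunched with all holes.
tape = b"he\xffllo\nworld\n"
assert downstream(read_paper_tape(tape)) == downstream(read_lines(["hello\n", "world\n"]))
```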
The two most widely used technology-dependent components are "Read Sequential" and "Write Sequential". As you might expect from the names, these convert file records into IPs, and IPs into file records, respectively. A matching Read/Write pair of components can be used to encode and decode any file format desired. For instance, you might decide that the medium you are using is so expensive that you want to compress the data as it is stored, and expand it as you retrieve it. Another useful function pair might be encryption/decryption. In fact, any Read/Write component pair can be thought of as embodying a data organization. Generalizing this thought, a sequential Read/Write component pair provides a conversion between a format suitable for processing and a linear format on some medium. For instance, we built a Read/Write pair which was specialized for dumping tree structures onto a linear medium and rebuilding them later.
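Here is a minimal sketch of such a matched Read/Write pair in Python. It assumes each IP is a byte string. Purely for illustration, the linear format on the medium is a 4-byte big-endian length prefix followed by zlib-compressed data. An in-memory `io.BytesIO` stands in for the file. The record layout is hypothetical, not the format used by DFDM's Read Sequential and Write Sequential.

```python
import io
import struct
import zlib
from typing import BinaryIO, Iterable, Iterator

def write_sequential(ips: Iterable[bytes], medium: BinaryIO) -> None:
    """Writer: turn each IP into a length-prefixed, compressed record."""
    for ip in ips:
        packed = zlib.compress(ip)
        medium.write(struct.pack(">I", len(packed)))  # 4-byte big-endian length
        medium.write(packed)

def read_sequential(medium: BinaryIO) -> Iterator[bytes]:
    """Matching reader: undo exactly what write_sequential did."""
    while True:
        header = medium.read(4)
        if not header:
            return  # clean end of medium
        if len(header) < 4:
            raise ValueError("truncated record header")
        (length,) = struct.unpack(">I", header)
        packed = medium.read(length)
        if len(packed) < length:
            raise ValueError("truncated record body")
        yield zlib.decompress(packed)

medium = io.BytesIO()
records = [b"first record", b"", b"x" * 1000]
write_sequential(records, medium)
medium.seek(0)
assert list(read_sequential(medium)) == records
```

Swapping `zlib` for an encryption routine, or for a tree-flattening scheme, would change the data organization without changing anything downstream of the reader.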
You may have noticed by now that components very often come in matched pairs, e.g. split/merge, read/write, compress/expand, encrypt/decrypt, etc. This is characteristic of many FBP components - what one component does, another one undoes. In fact the combination of a component and its inverse should ideally result in a stream identical to the original input stream, just as in mathematics multiplying a number by its reciprocal results in unity, or composing a function with its inverse results in the Identity function.
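The inverse property can be checked directly. This sketch pairs a Split with a Merge in Python. A merge that simply takes IPs in arrival order would not necessarily restore the original sequence, so the sketch assumes that Split tags each IP with its input position and that Merge interleaves by that tag. The tagging scheme and function names are illustrative assumptions.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

Tagged = Tuple[int, str]  # (input sequence number, IP payload)

def split(ips: Iterable[str], pred: Callable[[str], bool]) -> Tuple[List[Tagged], List[Tagged]]:
    """Route each IP to one of two outputs, tagged with its input position."""
    matched: List[Tagged] = []
    rest: List[Tagged] = []
    for seq, ip in enumerate(ips):
        (matched if pred(ip) else rest).append((seq, ip))
    return matched, rest

def merge(a: List[Tagged], b: List[Tagged]) -> List[str]:
    """Inverse of split: interleave by sequence number and drop the tags."""
    # Each input is already in sequence order, so a streaming merge suffices.
    return [ip for _, ip in heapq.merge(a, b)]

stream = ["a1", "b7", "a2", "c3"]
matched, rest = split(stream, lambda ip: ip.startswith("a"))
assert merge(matched, rest) == stream  # merge after split is the identity
```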
Using separate Read and Write processes not only gives the separation between logic and I/O which is recommended by the majority of application development methodologies, but actually reduces elapsed time. The reason for this surprising result is that in FBP an I/O process which has to wait on I/O only suspends itself - other processes can proceed. This means that FBP applications tend to use as much CPU time as they are allowed to by the operating system. We will be talking more about performance later on.
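To make the elapsed-time argument concrete, here is a small simulation in Python. Threads stand in for FBP processes, and `time.sleep` stands in for a device wait. While one thread sleeps, the others keep running. Bounded `queue.Queue` objects play the role of connections with a fixed capacity. The record count, delay and capacity are arbitrary assumptions, and this is not how DFDM actually schedules its processes.

```python
import queue
import threading
import time

N_RECORDS = 20
IO_DELAY = 0.01  # simulated device wait per record, in seconds
END = object()   # end-of-stream marker

def sequential() -> float:
    """Read, process and write each record in a single thread: the waits add up."""
    start = time.perf_counter()
    for i in range(N_RECORDS):
        time.sleep(IO_DELAY)   # simulated read wait
        _ = i * 2              # trivial "processing"
        time.sleep(IO_DELAY)   # simulated write wait
    return time.perf_counter() - start

def pipelined() -> float:
    """Separate reader, processor and writer threads joined by bounded queues."""
    q1: queue.Queue = queue.Queue(maxsize=5)  # connection capacity of 5 IPs
    q2: queue.Queue = queue.Queue(maxsize=5)

    def reader():
        for i in range(N_RECORDS):
            time.sleep(IO_DELAY)  # only this thread is suspended
            q1.put(i)
        q1.put(END)

    def processor():
        while (ip := q1.get()) is not END:
            q2.put(ip * 2)
        q2.put(END)

    def writer():
        while q2.get() is not END:
            time.sleep(IO_DELAY)  # reads and writes now overlap

    start = time.perf_counter()
    threads = [threading.Thread(target=f) for f in (reader, processor, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

print(f"sequential {sequential():.2f}s, pipelined {pipelined():.2f}s")
```

On a typical machine the pipelined version should take roughly half as long, because the read and write waits overlap instead of adding up.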
Another interesting group of components deriving originally from Unit Record are those connected with report generation. These include such functions as pagination and generation of page headings and footings, as well as totalling and subtotalling, cross-footing and other report generation functions. Reports are very often the main vehicle of communication between an application and the humans who use it, and the importance of these facilities to the average business is borne out by the remarkable longevity of IBM's RPG, which, while often regarded as old-fashioned, is still fulfilling a real need in the market-place. Later in this book, I will describe a Report Generation component which was used extensively in our shop.
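The core of such report functions is the "control break". This Python sketch shows the idea: print a subtotal whenever the key changes, a grand total at the end, and a page heading every so many lines. It assumes the rows arrive already sorted by key, for instance from an upstream Sort. The function name, line layout and `page_len` parameter are illustrative assumptions, and page footings and cross-footing are omitted for brevity.

```python
from itertools import groupby
from typing import Iterable, List, Tuple

def report(rows: Iterable[Tuple[str, int]], title: str, page_len: int = 6) -> List[str]:
    """Control-break report over (key, amount) rows pre-sorted by key."""
    body: List[str] = []
    grand = 0
    for key, group in groupby(rows, key=lambda r: r[0]):
        subtotal = 0
        for _, amount in group:
            body.append(f"  {key:<8}{amount:>8}")
            subtotal += amount
        body.append(f"  {'SUBTOTAL':<8}{subtotal:>8}")  # control break on key change
        grand += subtotal
    body.append(f"  {'TOTAL':<8}{grand:>8}")

    # Paginate: a heading line before every `page_len` body lines.
    lines: List[str] = []
    for page, start in enumerate(range(0, len(body), page_len), start=1):
        lines.append(f"{title}  PAGE {page}")
        lines.extend(body[start:start + page_len])
    return lines

rows = [("east", 10), ("east", 5), ("west", 7)]
print("\n".join(report(rows, "SALES", page_len=4)))
```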
A good guideline for the functionality of a component is that its specification should not exceed about a page. Some FBP experts have gone so far as to say that the summary of a component's function should not exceed one paragraph, and should not have the word "and" in it. Another guideline we have found useful is that generalized components should not have more than 4 ports (array ports only count as one port). Of course, these guidelines are not mutually exclusive, and they are only guidelines - some components bring so much function together that their parameters are essentially mini-languages, but their usefulness may outweigh any awkwardness in parametrization.
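The port guideline is simple enough to check mechanically. This sketch assumes, hypothetically, that array port elements are written as `NAME[index]`, for example `OUT[0]` and `OUT[1]`. All elements of an array port are collapsed into one port before counting.

```python
import re
from typing import Iterable

ARRAY_ELEMENT = re.compile(r"^(\w+)\[\d+\]$")  # assumed syntax, e.g. "OUT[3]"

def effective_port_count(port_names: Iterable[str]) -> int:
    """Count ports, treating all elements of an array port as one port."""
    bases = set()
    for name in port_names:
        m = ARRAY_ELEMENT.match(name)
        bases.add(m.group(1) if m else name)
    return len(bases)

def within_guideline(port_names: Iterable[str], limit: int = 4) -> bool:
    """True if a generalized component stays within the suggested port limit."""
    return effective_port_count(port_names) <= limit

print(effective_port_count(["IN", "OUT[0]", "OUT[1]", "OUT[2]"]))  # 2
```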
The last category of component is that of "business components". These embody business rules, and should hopefully be relatively simple, especially if you have used the other categories of component as far as possible. We can imagine business components for different business areas - banking, oil and gas, and so on. Some will be more mathematical, others less so. In all cases they represent the knowledge of some business expert.
After functionality, one of the major considerations in connection with designing business components is the likelihood of change. There are some types of business logic which hardly ever change, and others which are changed every time they are run. An example of the latter might be the logic to generate an employee's taxable income statement at the end of the year. It is changed every year and run once a year. It would be very nice if our governments could send out a single reusable component every year which companies could then just plug into their own payroll programs. This also gets back to the question of roles: who installs the new module? Application development or operations staff? If the former, you have an ongoing need for application developers indefinitely; if the latter, can you be sure that the new component will be adequately tested? On the other hand, given the backlog of work that application development usually faces, something which can just be loaded up and run by operational staff is certainly attractive.
It would help if such a component does as much validation of its input data as possible to make sure it is being used in the right context. Ideally a component should never crash - in practice, of course, it is almost impossible to prevent one component from destroying another's data, but it is certainly possible to add validation logic to protect against (say) data format errors. The reusable module could also require that incoming data be tagged with a particular descriptor. Then if the required data format changes, you just need to change the descriptor name. Descriptor names are typically part of the specification of a reusable component, so this fits quite nicely.
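Here is a minimal sketch of such defensive validation in Python. The `IP` class, the descriptor name `PAYROLL-REC-V2` and the required fields are all hypothetical. IPs with the wrong descriptor or malformed fields are routed to a reject output instead of crashing the component. If the data format changes, only the descriptor name in the specification needs to change.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

@dataclass
class IP:
    descriptor: str  # names the data format carried by this IP
    fields: Dict[str, object] = field(default_factory=dict)

# Part of the component's published specification (hypothetical names).
REQUIRED_DESCRIPTOR = "PAYROLL-REC-V2"
REQUIRED_FIELDS = {"emp_id": str, "gross": int}

def validate(ips: Iterable[IP]) -> Tuple[List[IP], List[Tuple[IP, str]]]:
    """Pass well-formed IPs through; send bad ones to a reject output with a reason."""
    good: List[IP] = []
    rejects: List[Tuple[IP, str]] = []
    for ip in ips:
        if ip.descriptor != REQUIRED_DESCRIPTOR:
            rejects.append((ip, f"expected {REQUIRED_DESCRIPTOR}, got {ip.descriptor}"))
            continue
        problem = next(
            (f"field {name!r} missing or not {typ.__name__}"
             for name, typ in REQUIRED_FIELDS.items()
             if not isinstance(ip.fields.get(name), typ)),
            None,
        )
        if problem:
            rejects.append((ip, problem))
        else:
            good.append(ip)
    return good, rejects
```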
The above discussion is really another form of the old compile-time versus run-time debate. In FBP, compile-time comes in two flavours: component-level and network-level. Actually parameters can be specified inside a composite component and still be outside the elementary component which they control! I predict that eventually a lot of business logic will be embodied in rules held in rules data bases. Such rules, written in a suitable language, can then be modified by people outside the normal application development group. These rules may not even be expressed in what a programmer would recognize as a programming language. A forerunner of this is the IBM Patient Care System, in which a surprising amount of the system logic (including screen layouts) was held in the form of tables, designed to be updated by senior clerical or nursing staff. This was very effective, as these were usually the people who had to use the system, and had the most operational experience with it. Again we see the separate roles of application-developer and application-user. If it bothers you to put so much control in the hands of end users, either implement authorization systems to make sure only the right people can modify key data, or specify the rules as tables hard-coded in the application definition, but outside the components that refer to them. This way, control remains in the application development group, but systems become much easier to modify and debug. However, we really should be moving away from requiring the DP department to do all systems maintenance.
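The last option mentioned, rules held as tables in the application definition but outside the components that use them, might look like the following Python sketch. The discount rules, their thresholds and the `network` function are invented for illustration. The component `apply_discount` is generic, and the table could equally be loaded from a rules file maintained outside the development group.

```python
from typing import Iterable, Iterator, List, Tuple

Rule = Tuple[int, int]  # (minimum order amount, discount percent)

def apply_discount(orders: Iterable[dict], rules: List[Rule]) -> Iterator[dict]:
    """Generic component: the first rule whose threshold is met decides."""
    for order in orders:
        pct = next((p for threshold, p in rules if order["amount"] >= threshold), 0)
        yield {**order, "discount_pct": pct}

# Application definition: the rule table lives here, outside the component,
# so it can be changed without touching the component's code.
DISCOUNT_RULES: List[Rule] = [(1000, 10), (500, 5), (0, 0)]

def network(orders: Iterable[dict]) -> List[dict]:
    return list(apply_discount(orders, DISCOUNT_RULES))

print(network([{"id": 1, "amount": 1200}, {"id": 2, "amount": 700}]))
```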
If I am right that we will eventually see more and more business logic being either imbedded in reusable components or captured as explicit rules on disk, then the role of the current higher level languages in the future should diminish. We found in our experience that, given a powerful set of reusable components, people would go to enormous lengths to avoid writing HLL code. Much of the time, the resulting poorer performance does not matter - the Kendall Report (1977) contrasted the running time of the average program with the person-months required to develop it. Programs that took 6 person-months to develop might run for a few minutes over their entire lifetimes. So, most of the time, minor increases in the amount of CPU time really make no difference. Only in the case of long-running jobs run regularly is it worth-while to do performance tuning, and, as we shall see in a later chapter, it is much better to instrument an FBP application to find out where your real bottle-necks are than to try to guess ahead of time and waste time optimizing code which doesn't affect your system's performance much. There are in fact a number of ways to do performance tuning in the FBP environments, once we figure out where the real leverage is.
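Here is a minimal sketch of instrumenting before optimizing. Each step is wrapped so that its cumulative time and call count are recorded under a process name, and the processes are then ranked by time used. For simplicity this times an in-line chain of Python functions rather than concurrently running processes. The wrapper, the step names and the artificial delay in `slow_upper` are all assumptions for illustration.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Cumulative wall-clock time and call count per process name.
total_time: Dict[str, float] = defaultdict(float)
calls: Dict[str, int] = defaultdict(int)

def instrumented(name: str, fn: Callable) -> Callable:
    """Wrap a per-IP function so that every call is timed under `name`."""
    def wrapper(ip):
        start = time.perf_counter()
        try:
            return fn(ip)
        finally:
            total_time[name] += time.perf_counter() - start
            calls[name] += 1
    return wrapper

def run(ips: Iterable, steps: List[Callable]) -> list:
    """Push each IP through the steps in order (a simple in-line chain)."""
    results = []
    for ip in ips:
        for step in steps:
            ip = step(ip)
        results.append(ip)
    return results

def timing_report() -> List[str]:
    """One line per process, biggest consumer of time first."""
    ranked = sorted(total_time, key=total_time.get, reverse=True)
    return [f"{name:<8}{total_time[name]:9.4f}s {calls[name]:>5} calls" for name in ranked]

def slow_upper(s: str) -> str:
    time.sleep(0.005)  # pretend this step is the real bottleneck
    return s.upper()

steps = [
    instrumented("parse", str.strip),
    instrumented("convert", slow_upper),
    instrumented("format", lambda s: s + "\n"),
]
print(run(["a ", " b"], steps))
print("\n".join(timing_report()))
```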
Far more important than CPU time is human time, and the fundamental question is really what is the best way to spend valuable human time. When deciding to develop a new component, you must take into account the expected return on your investment ("ROI" for short). Every new component must be documented, tested, supported, advertised and incorporated into educational material (well, ideally - sometimes not all of these happen!). Small wonder, then, that our experience with FBP shows that application developers using FBP avoid writing new code - being responsible people, they are aware of the burden they take on the moment they start coding a new component. However, this realization isn't enough - we have to change the economics of the situation. Code is a cost item, as Dijkstra and others have pointed out, and someone who adds to the total amount of code when it is not justified is costing your company money, now and into the future. I have started suggesting, only half in jest, that programmers should be "penalized" for each line of code they write! In fact, some program improvements involve removing code - is this negative productivity?! It has been pointed out many times that people will modify their behaviour according to how you measure them - and companies which still measure productivity in Kloc (thousands of lines of code) get what they deserve! Conversely, someone who produces a useful reusable component improves the productivity of all of its users, and deserves to be rewarded - some companies have already started trying that - the key word, of course, being "useful". N.P. Edwards, whom I mentioned in an earlier chapter, was a key player in getting IBM to move to reusable parts in the hardware area, and he has told me that the key breakthrough there also was in changing the economics of hardware development.
Someone who has talked and written extensively about the importance of reuse is T. Capers Jones (e.g. Jones 1992) - he has also been aware of my work for some years and has been supportive of it. He has been active in promoting the use of code-independent metrics, such as Allan Albrecht's now well-known Function Points, for measuring productivity and has done a lot of work on the potential of reuse for reducing the costs of application development.
How do we know whether a tool is useful? The only way is to measure its use. Will people use it? They will if it fits the hand, and if you provide support and education for it. That in turn means you have to have the infrastructure in place to allow your company to take advantage of this new technology, and measures and incentives to get people moving in the right direction.
There is also the opposite question: what if the tool is "less than perfect"? Just as with real tools, there is no perfect tool - there are only tools which fit your hand more or less conveniently. Like many programmers who tend to be perfectionists, you may be tempted to postpone putting something on the market because you feel it isn't finished yet. The question should be: is it useful as it is? You can always enhance it as time goes on, provided you keep the interfaces stable (or provide "expansion ports" but maintain upward compatibility). In fact, after it has been in use for a while, you may find that the extensions people really want are not at all what you expected. Since your reusable component will hopefully be in widespread use by this time, it is important that you allow extension while maintaining upward compatibility. In FBP, the fact that ports are named helps you to do this; also parameters (described in the next chapter) should be designed to be extendible. Parameters can be in string format, with delimiters, or, if fixed form, it is a good idea to insert a few zero bits - these can always be changed to ones to indicate that there is an extension portion.
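The reserved-bit technique for fixed-form parameters can be sketched with Python's `struct` module. The layout below is hypothetical: a 2-byte record length and a flags byte whose bits start out as zero. Version 2 turns on bit 0x01 and appends a 1-byte extension. The old parser keeps working on new parameters because it reads only the fields it knows about.

```python
import struct
from typing import Optional, Tuple

# Hypothetical fixed-form parameter block, version 1:
#   2 bytes  record length (big-endian unsigned)
#   1 byte   flags, all reserved and set to zero
# Version 2 sets flag bit 0x01 and appends a 1-byte extension (a page width).
HEADER = struct.Struct(">HB")
EXTENSION_FLAG = 0x01

def build_param(record_len: int, page_width: Optional[int] = None) -> bytes:
    if page_width is None:
        return HEADER.pack(record_len, 0)
    return HEADER.pack(record_len, EXTENSION_FLAG) + bytes([page_width])

def parse_v1(param: bytes) -> int:
    """Old component: reads only the fields it knows and ignores any extension."""
    record_len, _flags = HEADER.unpack_from(param)
    return record_len

def parse_v2(param: bytes) -> Tuple[int, Optional[int]]:
    """New component: reads the extension only if the flag bit is on."""
    record_len, flags = HEADER.unpack_from(param)
    page_width = param[HEADER.size] if flags & EXTENSION_FLAG else None
    return record_len, page_width

old, new = build_param(80), build_param(80, 132)
print(parse_v1(old), parse_v1(new), parse_v2(old), parse_v2(new))
# 80 80 (80, None) (80, 132)
```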
Another kind of modification which will happen to your modules occasionally is error correction. It is certainly a pleasurable feeling to know that you have improved a component which many people are or will be using, and you might think that your users will welcome the change with open arms. Unfortunately some of your users may have adjusted to the error, and will need to be convinced that you know what is right for them. The other thing users do is take advantage of undocumented features. I talked about a tool fitting the hand - it may fit the hand, even with an error in it. One team found an error in one of the DFDM components, but instead of telling us about it, they carefully compensated for it. When we fixed it, their programs stopped working! I think they were quite indignant for a while until everybody realized what had happened. We had to spend some time explaining that everyone would be much better off if we fixed the bug rather than leaving it the way it was! There is a very important rule which you should impress on your users: If it isn't documented, don't trust it. IBM learned the value of this one by bitter experience and has accepted its wisdom since the day some bright user discovered an undocumented instruction on one of the 700-series machines. When IBM started making invalid instructions result in exception conditions, I'm told quite a few programs in universities and other places stopped working!
The next question is: how will people find out about these components? There is a common misconception that reusable componentry doesn't work unless you have an elaborate catalogue, which people can interrogate to find the tool they want. On the other hand, Wayne Stevens has pointed out that most examples of reuse in everyday life are done very naturally without any catalogue. We know by heart most of the things we use commonly. Let's say you go into a hardware store because you want to attach a wood base onto a ceramic pot - you will be familiar with half a dozen types of fastener: glue, nails, screws, rivets, etc. Most of the time you will know exactly what kind of glue to use. In this case, let's say you are not quite sure what is best. You still don't have to scan the entire store - most of the time, you can go right to the shelf and start reading labels. What do you do if you are not sure where in the store to go? You ask a store clerk, who may in turn pass you on to someone who is an expert in a particular area. If your requirements are really unusual, the clerk may have to consult a catalogue, but this is likely to be a rare case. The point is that effective reuse doesn't require catalogues, although they can certainly help.
To try to measure the productivity gains we were getting from DFDM in IBM Canada, we kept statistics on the amount of reuse taking place over a number of projects. The figures for three projects are shown in the following table (the numbers relate to components):
|PROJECT||Type||Unique||Occurrences||Reuse Factor||1 / Figure of Merit|
In this chart, "project" means components coded specifically for the project in question, while "general purpose" means components that are off-the-shelf (already available and officially supported). "Unique" means separate components (separate pieces of code), while "occurrences" means the total number of processes (component occurrences or network nodes). Thus project A used 154 distinct components, of which only 21 came off the shelf, yet those 21 accounted for 305 of the 489 processes (about 3/5). GP/T means General Purpose as a fraction of Total, and it is interesting to compare the GP/T for unique components against the GP/T for component occurrences.
The "Figure of Merit", to use Bob Kendall's phrase (Kendall 1988), is calculated as follows: number of project-coded components divided by total number of processes. Since the first figure represents the amount of work a programmer has to do (apart from hooking the processes together), while the second figure represents the amount of work the program is doing, we felt that the figure of merit was quite a good measure of the amount of real reuse going on. DFDM had been in use about 2 to 3 years in that shop, and we had about 40 off-the-shelf components available, so quite a lot of the common tasks could be done without having to code up any components. However, when the programmer did have to code up components, you will notice that quite often this code could also be reused, giving reuse factors greater than 1 (Project C had a factor of 27.0). In the last example in the above chart, the programmer only had to write 2 components, although there were 270 separate processes in his program. (You can figure out that this project involved running 27 different files through essentially the same 10 processes - so it did a lot of work, with very little investment of programmer effort!)
[In Bob Kendall's "Figure of Merit", obviously smaller is better! In the on-line version, I have shown the reciprocal, as it seems more intuitive to have the larger number indicate better reuse.]
Although we thought at first that this last case was just a quirk, we turned up quite a few applications which were not that different from this one (e.g. Rej's letter quoted in the Introduction).
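The arithmetic behind these measures is easy to reproduce. The following Python sketch uses only the figures given in the text for Project A (154 unique components, 21 of them general purpose, accounting for 305 of 489 processes) and for Project C (2 components written, 270 processes). The class and function names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Usage:
    unique: int       # distinct components (separate pieces of code)
    occurrences: int  # processes, i.e. component occurrences in the network

    @property
    def reuse_factor(self) -> float:
        return self.occurrences / self.unique

def inverse_figure_of_merit(project_coded: int, total_processes: int) -> float:
    """Reciprocal of Kendall's figure of merit: larger means more reuse."""
    return total_processes / project_coded

# Project A, from the figures quoted in the text.
gp = Usage(unique=21, occurrences=305)
total = Usage(unique=154, occurrences=489)
project = Usage(unique=total.unique - gp.unique,
                occurrences=total.occurrences - gp.occurrences)

print(f"project-coded: {project.unique} unique, {project.occurrences} occurrences")
print(f"GP/T unique {gp.unique / total.unique:.0%}, occurrences {gp.occurrences / total.occurrences:.0%}")
print(f"reuse factor (GP) {gp.reuse_factor:.1f}, (total) {total.reuse_factor:.1f}")
print(f"1/figure of merit: {inverse_figure_of_merit(project.unique, total.occurrences):.1f}")

# Project C: 2 components written, 270 processes (27 files x 10 processes).
print(f"Project C 1/figure of merit: {inverse_figure_of_merit(2, 270):.0f}")
```

For Project A this gives 133 project-coded components, a GP/T of 14% for unique components against 62% for occurrences, and a reciprocal figure of merit of about 3.7. For Project C the reciprocal is 135.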
Here are some figures from the evaluation of DFDM quoted at the beginning of this chapter:
All of the function in the DFDM pilot application is performed by 30 unique coroutines (this is the number of coroutines that an individual would need to be familiar with in order to understand the function of the application).
A total of 95 occurrences of these 30 coroutines make up the application ..... providing a 3:1 reuse ratio.
These 95 coroutines are leveraged through the use of subnets and CNS [Compiled Network Specification] networks to perform the equivalent work of 225 unleveraged coroutines.
Some companies have tried to encourage people to write generalized code by offering them money or kudos. One counsel I would give them is that you need to monitor not how many components someone has written, but how often it is used. An appropriate analogy is the system of royalties in the publishing industry. Every time a module is used, the author should get some kind of token, be it money or recognition. This will ensure that your company will not accumulate a collection of wonderful, Rube Goldbergish gadgets sitting on the proverbial shelf, gathering dust.
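A royalty-style measure credits authors by how often their components are actually used, not by how many they have written. This Python sketch assumes, hypothetically, that each network definition can be reduced to a list of the components its processes use, and that a registry records who wrote each component. All component and author names are invented.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical inputs: the components used by each network's processes,
# and the author of each component.
networks: Dict[str, List[str]] = {
    "billing": ["ReadSeq", "Sort", "Collate", "Report", "WriteSeq"],
    "payroll": ["ReadSeq", "Sort", "TaxCalc", "Report", "WriteSeq"],
    "audit":   ["ReadSeq", "Select", "Select", "WriteSeq"],
}
authors: Dict[str, str] = {
    "ReadSeq": "ann", "WriteSeq": "ann", "Sort": "raj", "Collate": "raj",
    "Select": "raj", "Report": "lee", "TaxCalc": "lee",
}

def usage_by_component(nets: Dict[str, List[str]]) -> Counter:
    """How many processes use each component, across all networks."""
    return Counter(comp for procs in nets.values() for comp in procs)

def royalties(nets: Dict[str, List[str]], owners: Dict[str, str]) -> Counter:
    """Credit each author once per occurrence of one of their components."""
    credit: Counter = Counter()
    for comp, n in usage_by_component(nets).items():
        credit[owners[comp]] += n
    return credit

print(royalties(networks, authors).most_common())
```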
Let us say that you are all convinced that reusable code is the way to go - how do we get it adopted in your particular shop? You will find (unless all your people are super-altruists) that the biggest enemy of reuse is not technology - it is human psychology. While many people will accept the new ideas enthusiastically, others will resist them, and for several different reasons. People who have become good at delivering applications under time pressure very often feel that they must at all costs maintain control of everything they use, and in fact all their experience has taught them that this approach works. Components developed by others will be on their critical path, and they will be pulled between the desire to reduce their own effort by using pretested components, and the fear that the components they are relying on will not be ready in time, will break or will not be maintained as the environment changes. They have to become convinced that someone will support these components - if necessary, on a 24-hour basis. This may not be necessary technically, but may be very necessary psychologically!
Another source of resistance is simply that some programmers love the bits and bytes and don't want to become mere combiners of precoded components. There is a role for these people, writing the components to specs. As we said above, two different roles seem to be emerging: component builders and component users. In my view the latter need skills very similar to those required by analysts. They need to be able to talk to users, gather requirements, and even build systems or prototypes of systems. For the more complex parts or parts which have to perform better, they can subcontract parts to the component builders. This is the domain where the programmer's programmers ("Merlins", as a friend of mine calls them) can shine. In some senses, a component becomes an encapsulation of their particular skill or knowledge. I have found that it makes sense to get "tighter" about the external specs and "looser" about how the code is built internally. This lets them express their creativity, while still serving the needs of the organization as a whole. Of course, it must not be so poorly written that it doesn't perform well! And it absolutely must deliver the function according to the specs! Once those are assured, then your only concern is maintainability. Generalized code should be maintainable, but you probably don't have to control the format of every internal label!
A programmer once said to me, "I don't like DFDM because I don't get dumps"! At the time I took this to mean that because programs built using FBP tend not to crash, it is hard for programmers to get a feel for how they work. Does not knowing how the engine of your car works make you nervous? It probably does affect some people that way, but most of us don't care. Later, I realized that it also brings up the very fundamental question of trust - if the users of a package don't trust the package or its vendor (same thing, really), they are not going to be happy... And trust is fragile: hard to build up, and easy to damage.
Let us suppose that your company has become convinced that developers should not keep "reinventing the wheel", but that, like most companies, you have only reached the stage where you are maintaining a library of shared subroutines. How do we get formalized sharing of components in place? Suppose I find out that Julia is working on a module which is pretty close to what I want but it needs some tweaking to fit my needs. In most shops, we don't even know what to call it. Companies that have just started to grapple with naming standards often think it's neat for module names to start with the project code. For instance, if I am managing project ABC, then I can name all my modules ABC-something. This way, I don't have to worry about my module names conflicting with those of other projects. Even the library names will often have ABC built into them! So, even to be able to find the code, we usually have to have some kind of enterprise-wide naming convention. Next question: who does the modification of the code and who pays for it? What if Julia's schedule slips and starts to impact my schedules? Even if everything goes really well, who will maintain it, document it, and support it?
Many writers about reuse agree that the only solution is to set up an independent department to write and maintain components. This department must have enough resources to do the job properly, which also involves publicising and selling their product. One tendency which must be resisted is that such departments often get tied up producing complex, generalized tools for a few users, or even for none - they just figure the component would be neat and they'll worry about selling it afterwards. Remember the principle of ROI: the company as a whole will get more bang for the buck out of a lot of simple tools, especially if they communicate well with each other, than out of a few very complex ones. Since good tools will often start as special-purpose modules which some other group has found useful, there must be a path for promoting such ad hoc components to a place where other people can find them and rely on them. Our centralized software support department must have ways to beat the bushes for new and interesting components and must then have ways to evaluate whether potential customers are interested (otherwise why go to all that trouble?). It must also avoid getting sucked into writing or upgrading complex tools which have only a small market. It is a service organization, so it must be oriented towards service quality, not just a group of self-styled experts who know what is best for everyone else. It must become entrepreneurial, but not exclusively bottom-line oriented. In short, it must follow good financial and engineering practices. If this takes a major shake-up in the way your organization is structured, then you should really get started as soon as possible!
I believe that, unless companies start to bring engineering-type disciplines to application development, not only will they fail to take full advantage of the potential of computers, but they will become more and more swamped with the burden of maintaining old systems. You can't do a lot about old systems - I know, I've seen lots of them - but new systems can be built in such a way that they are maintainable. It should also be possible to convert old programs over to the FBP technology gradually and piecemeal, rather than in one "big bang". A director I once worked for called this "converting an iceberg into ice-cubes"!
I believe all true disciplines follow a sort of cycle through time, which may be represented using the following diagram:
Innovation can only be founded on a base of solid knowledge of what went before - otherwise we are doomed to keep rediscovering the same old stale concepts. On the other hand, tradition without innovation cannot result in progress, and innovation is useless unless word of it gets out to the people who can make use of it. As I pointed out above, business application development has not really changed significantly since I started in the business in 1959 - but I really believe that now, at long last, we can start to see the promise of application development becoming a true engineering-style discipline.