Flow-based Programming

Πάντα ῥεῖ (Panta rhei) - Everything flows.

FBP vs. FBP-inspired Systems

Some readers may have arrived at Flow-Based Programming (FBP) by way of NoFlo, which is a JavaScript-based system motivated by my book Flow-based Programming (2nd edition), and which uses a number of the same terms and concepts. This project, started by Henri Bergius in 2012, now called Flowhub, implements a number of the FBP concepts, and has been creating a significant buzz world-wide since then. In the fall of 2016, Flowhub and other flow-based programming assets were purchased from The Grid, and are now up and running as Flowhub UG, a company registered in Berlin, Germany.

While a lot of credit is due to the NoFlo team for bringing FBP to the attention of the computer world, what the developers of NoFlo call "FBP" in fact differs in a number of respects from FBP as it has evolved over the last 40+ years. While NoFlo shares with FBP a number of technical and philosophical ideas, NoFlo is much more similar to what we now call "conventional" programming - procedural, algorithmic, one-thing-at-a-time - and does not truly embody the "FBP paradigm shift", in which application development can be thought of as designing a data processing "factory". The latter is a very different way of looking at application development.

Ali Razeen, at Duke University, has pointed out in an insightful 2015 note that a number of people have now built software which has only the componentry and "configurable modularity" features of FBP, usually in combination with some visual representation, and assume they have built an FBP implementation. He then goes on to say that these should not be viewed as true FBP implementations, as they are missing some key characteristics of true FBP - mainly, asynchronism and information packets with unique ownership and lifetime - and so typically miss out on the critical paradigm shift... and a number of its attendant benefits. NoFlo is an example of this type of system. Because of the proliferation of such packages, we will use the term "FBP-inspired" (as suggested by Joe Witt of Hortonworks) when it is necessary to distinguish between them and FBP proper. You may also see the phrase "classical FBP" showing up from time to time, particularly in discussions with proponents of FBP-inspired systems.

NoFlo is based on Node.js and is written in JavaScript and CoffeeScript. These languages basically support a single-threaded implementation, although they can achieve some asynchronism by the use of "callbacks". Although NoFlo and its relatives can simulate asynchronism to some extent, only one thing is happening at a time, and they are limited to using only a single processor. It is very understandable that people assume that adding configurable modularity, componentry and visual design to conventional programming should result in a powerful combination, without straying too far from the conventional programming that they are used to. However, it is my feeling, backed up by several decades of experience, that this does not really result in an improved developer experience or more maintainable systems.

We will be using the term "von Neumann paradigm" from time to time. For those unfamiliar with the term, it refers to a computer design where a single instruction counter walks through a program accessing a uniform array of non-destructive-readout memory cells. This has in fact been the standard computer architecture for several decades, but people are increasingly finding it inadequate for today's challenges, as shown by frequent cost and schedule overruns, weird bugs, and difficulty maintaining large applications. More and more writers have started to point out that these problems derive in large part from the architecture itself. Unfortunately programmers are exposed to this approach from the very start, and have a great deal of difficulty breaking loose from it! Ken Kan has pointed out this quote from Edsger Dijkstra (thanks, Ken!):

It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.

With all due respect to Dijkstra, it's not just BASIC! There is a basic problem with the von Neumann paradigm itself, but, because we have been taught since the '50s that you can do anything with it, this paradigm is very seldom questioned. I have frequently detected a certain degree of nervousness on the part of many programmers encountering FBP for the first time, at not being able to control the exact timing of every event in a running application! This is in part due to the very sensitive nature of the von Neumann storage model, and the fact that it confuses data with its storage medium.

I have been wrestling with how best to convey the difference between the old "von Neumann" storage mental model and that of Flow-Based Programming, and I am starting to think that the description in Chap. 3 of the book, "Flow-Based Programming", says it best. Since it is a little long for this essay, I would ask the reader to click on Chap. 3 - Concepts online, or look it up in their copy of the book - do a find on Fig. 3.1, and continue from there.

We sometimes refer to FBP as a "new/old" paradigm, because in fact its approach and methodology have parallels with Unit Record systems, which were used for the first data processing applications and were highly asynchronous and component-oriented. When these applications started being replaced by computers, which seemed so much more powerful, a lot of useful concepts were lost... which FBP is now reintroducing.

An application built using FBP may be thought of as a "data processing factory": a network of independent "machines", communicating by means of conveyor belts, across which travel structured chunks of data, which are modified by successive "machines" until they are output to files or discarded. The various "machines" run in parallel, or interleaved, as determined by the number of processors in the machine. It should be pointed out that this same image can be applied to networks of computers or other devices - Wayne Stevens pointed out that FBP provides a "consistent application view" from "maxi" to "mini". Granted, each FBP process is a von Neumann program, but it runs independently of all other processes, and so tends to be quite simple internally. Almost all of the data that an FBP process deals with is held in "information packets" (IPs) or in method-local storage. Unlike in conventional programming, the programmer does not have to worry about controlling the exact sequence of events - all s/he needs to concentrate on is the transformations that apply to the data to convert the original inputs to the desired output.
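As a rough illustration of the "factory" image, here is a minimal sketch in Python. It is not taken from any FBP implementation: each "machine" is an ordinary function running in its own thread, each "conveyor belt" is a bounded queue.Queue, and the names END, generate, transform, collect and run_network are all hypothetical. Under CPython these threads are interleaved rather than truly parallel, which is enough to show the shape of the design.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker, not an FBP API


def generate(out, items):
    """Source 'machine': puts each item on its output connection."""
    for item in items:
        out.put(item)
    out.put(END)


def transform(inp, out):
    """Middle 'machine': upper-cases each packet and passes it on."""
    while True:
        ip = inp.get()
        if ip is END:
            out.put(END)
            return
        out.put(ip.upper())


def collect(inp, results):
    """Sink 'machine': gathers packets until end of stream."""
    while True:
        ip = inp.get()
        if ip is END:
            return
        results.append(ip)


def run_network(items, capacity=2):
    """Wire the three processes together with bounded connections and run them."""
    a = queue.Queue(maxsize=capacity)
    b = queue.Queue(maxsize=capacity)
    results = []
    threads = [
        threading.Thread(target=generate, args=(a, items)),
        threading.Thread(target=transform, args=(a, b)),
        threading.Thread(target=collect, args=(b, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Note that none of the three functions knows anything about the others, or about when they run: the network definition in run_network is the only place where they are connected.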

More importantly, the ways data is viewed in FBP vs. conventional programming (as well as many FBP-inspired systems) are completely different: in FBP, data is managed in packets (IPs), which have a well-defined lifetime, from creation to destruction, and can only be owned by one process at a time, or be in transit between processes - just like real-life objects. In conventional programming, data does not have a well-defined lifetime or clear ownership, as the data is confused with its storage medium. This, in combination with the single-threaded restriction, leads to many of the weird bugs that bedevil today's complex systems, because conventional code is so sensitive to the exact timing of events that a minor timing error can have catastrophic results!
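The ownership rule can be made concrete with a small, single-threaded Python sketch. This is an assumption-laden illustration, not the bookkeeping used by any actual FBP implementation: the classes Packet, Process and OwnershipError and their methods are hypothetical names. An IP belongs to exactly one process, or to no process while it sits on a connection, and a process may only send or drop an IP it currently owns.

```python
from collections import deque


class OwnershipError(Exception):
    """Raised when a process touches an IP it does not currently own."""


class Packet:
    def __init__(self, content, owner):
        self.content = content
        self.owner = owner   # name of the owning process, or None while in transit
        self.alive = True    # False once the IP has been dropped


class Process:
    def __init__(self, name):
        self.name = name
        self.owned = 0       # number of IPs this process currently owns

    def _require_owner(self, ip):
        if not ip.alive or ip.owner != self.name:
            raise OwnershipError(f"{self.name} does not own this packet")

    def create(self, content):
        """Start an IP's lifetime; the creator is its first owner."""
        self.owned += 1
        return Packet(content, self.name)

    def send(self, ip, connection):
        """Give up ownership and place the IP on a connection."""
        self._require_owner(ip)
        ip.owner = None
        self.owned -= 1
        connection.append(ip)

    def receive(self, connection):
        """Take the next IP from a connection and become its owner."""
        ip = connection.popleft()
        ip.owner = self.name
        self.owned += 1
        return ip

    def drop(self, ip):
        """End an IP's lifetime."""
        self._require_owner(ip)
        ip.alive = False
        ip.owner = None
        self.owned -= 1
```

With this accounting, a process that finishes with a non-zero owned count has lost track of an IP, and a process that tries to use an IP after sending it gets an immediate error instead of a subtle timing bug later on.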

FBP supports data processing applications (business or scientific), typically long-running and high volume, and, as we have shown, involves a way of thinking (the new "paradigm") that is fundamentally different from that of conventional programming. This paradigm is actually more similar to engineering than to conventional programming, and, not surprisingly, involves a period of what might be called "apprenticeship", during which the practitioner is getting comfortable using its concepts. Conventional programming, by comparison, is as if you gave an engineer a bunch of blueprints and some girders, and told him or her to go build a bridge! It's not surprising that so many systems built using conventional technologies in recent years have suffered from cost overruns, logic glitches, etc., etc., and the problem is getting worse!

While data-oriented models have been used for application design for a number of years, up until now there was no easy way of converting these designs into running programs. Programmers could indeed design systems using data-oriented thinking, but then had to laboriously convert these designs into procedural code. In comparison, FBP provides a seamless transition from design to implementation, and our experience with it shows that it results in more maintainable and in fact better performing systems. It also facilitates communication between designers, programmers, maintenance staff and users. One large program written using an early ("green thread") implementation of FBP had been running in production for almost 40 years (as of the beginning of 2014), processing millions of transactions a night, while undergoing continuous maintenance during all that time, often by people who weren't even born when it was written!

While an FBP process is a "black box" component with its own internal environment and control thread, a NoFlo process is essentially a cloud of callbacks linked by instance variables. By comparison, the FBP mental model is much simpler - indeed, very similar to that of conventional programming - where basically each process has a single high-level method, which can then call subroutines in the regular way, since each process has its own independent call stack. There is then no confusion between the method's local storage and the process object's instance variables. Henri Bergius was able to simulate many FBP-inspired characteristics on the Node.js infrastructure, but some rather basic, and necessary, FBP techniques have no obvious counterpart in NoFlo. For instance, basic FBP business functions such as "Collate" require a process to be specific about which port it wants to receive from, and to be able to suspend until data arrives at that port - this function, or something similar, is being introduced gradually into NoFlo, but it logically requires a related architectural concept, missing from NoFlo, called "back pressure", where an upstream process will be suspended if the connection it feeds into becomes full. One other strange (from an FBP point of view) restriction is that, in the NoFlo world, a process can only send or receive from the highest level method - this becomes obvious when you think about how callbacks manipulate the stack.
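To show what "being specific about which port to receive from" means for a Collate-style function, here is a minimal Python sketch that merges two sorted input streams. It assumes threads and bounded queue.Queue objects as stand-ins for FBP processes and connections; END, feed, collate and run_collate are hypothetical names, and this is not the Collate component of any FBP implementation. The key point is that collate chooses which input to read next and simply blocks there until data arrives, while the feeders block whenever their connection is full.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker


def feed(out, items):
    """Upstream process: put() suspends whenever the connection is full."""
    for item in items:
        out.put(item)
    out.put(END)


def collate(in0, in1, out):
    """Merge two sorted streams, choosing which port to receive from next."""
    a = in0.get()  # suspend until port 0 has data
    b = in1.get()  # suspend until port 1 has data
    while a is not END or b is not END:
        if b is END or (a is not END and a <= b):
            out.put(a)
            a = in0.get()  # only port 0 is read here
        else:
            out.put(b)
            b = in1.get()  # only port 1 is read here
    out.put(END)


def run_collate(left, right, capacity=1):
    """Run two feeders and one collate process; return the merged output."""
    in0, in1, out = (queue.Queue(maxsize=capacity) for _ in range(3))
    threads = [
        threading.Thread(target=feed, args=(in0, left)),
        threading.Thread(target=feed, args=(in1, right)),
        threading.Thread(target=collate, args=(in0, in1, out)),
    ]
    for t in threads:
        t.start()
    merged = []
    while True:
        ip = out.get()
        if ip is END:
            break
        merged.append(ip)
    for t in threads:
        t.join()
    return merged
```

Because collate has its own call stack, the two get() calls could just as well live inside a helper subroutine; nothing in the logic has to be restructured around callbacks.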

In a major divergence from classical FBP, mentioned above, NoFlo lacks the concept of information packet (IP) "lifetimes", by which an IP is tracked from creation to destruction and can only be "owned" by a single process at a time, or be in transit between processes - mainly because its designers are still stuck in the von Neumann concept of data as a set of anonymous pigeon-holes - confusing the data as "object" with the "location" of the data. This in fact is the reason so many subtle bugs show up in conventional programs. This also explains NoFlo's insistence on allowing one output port to connect to multiple input ports, implying automatic replication of data. If data is seen as an "object", this makes very little sense, just like being able to send one pigeon to multiple destinations at the same time! Or having a single soft-drink bottle pass through two different machines at the same time! Conversely, if your view of data is not as an object, you will see nothing wrong with this image. Here is a description from Henri Bergius on how the basic send/receive linkage works in NoFlo:

The actual sending is a normal JavaScript event that triggers the connected inport's callback function. The inport puts the new IP into its buffer and notifies component, again via a callback

and with regard to "back pressure":

Right now the NoFlo buffers are only limited by system memory.

Adding limits and backpressure is certainly something to consider down the line. Hasn't really been a consideration for things NoFlo is usually used for, though.

A consequence of this is that NoFlo requires that all data be processed by one node before being passed on to the next. This becomes prohibitive if we are dealing with large volumes of data packets. Back pressure is the only way I am aware of that allows "infinite" amounts of data to be processed using finite resources! The NoFlo team tells me that they have been making changes to NoFlo to bring it closer to FBP, so we shall see what the future brings.
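The claim about finite resources can be checked with a small Python sketch, assuming a bounded queue.Queue as a stand-in for an FBP connection (END and run_with_back_pressure are hypothetical names). The producer generates its data lazily and is suspended inside put() whenever the connection is full, so the number of packets in flight never exceeds the connection capacity, however long the stream is.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker


def run_with_back_pressure(n, capacity=5):
    """Stream n packets through one bounded connection and record its peak depth."""
    conn = queue.Queue(maxsize=capacity)
    stats = {"peak": 0, "total": 0}

    def producer():
        for i in range(n):   # range is lazy: packets are created one at a time
            conn.put(i)      # suspends here whenever the connection is full
        conn.put(END)

    def consumer():
        while True:
            stats["peak"] = max(stats["peak"], conn.qsize())
            ip = conn.get()  # suspends here whenever the connection is empty
            if ip is END:
                return
            stats["total"] += ip

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats
```

With an unbounded buffer (maxsize=0 in this sketch), a fast producer feeding a slow consumer could instead queue up the entire stream in memory before any of it is consumed.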

While NoFlo is appealing because of its ability to support both client- and server-side processing, thanks to Node.js, and because of JavaScript's close integration with HTML, it is still tied closely to "von Neumann thinking". This in turn leads NoFlo designers to break up applications into finer-grained components, resulting in more complex networks. Also, all existing implementations of FBP in my GitHub directory can take advantage of multiple cores, with the exception of JSFBP (see below). Because of JavaScript's restriction to a single core, neither NoFlo nor JSFBP supports CPU-intensive applications, which are in fact well supported by JavaFBP, C#FBP and C++FBP using Boost. It should be pointed out that the first FBP implementation used "green threads" with multiple stacks, so you could use the same programming style as we do in today's FBP implementations. The underlying OS also supported asynchronous I/O, so, although we only had a single processor, performance was excellent - in fact run time was often better than with conventional programming, as, if a single process was suspended because of I/O, the whole job step did not have to be suspended, as is the case in conventional programming.

Because the NoFlo people use the term FBP so prominently when talking about NoFlo, we will also sometimes prefix "FBP" with the term "classical" when it is necessary to distinguish it from NoFlo and other FBP-inspired frameworks. A number of the latter are starting to appear, such as IBM's recent Node-RED, but, like NoFlo, these systems are different in important ways from classical FBP, based as they are on von Neumann thinking. There is clearly common ground, as both FBP and NoFlo are component-oriented, with multiple nodes (instances of components) communicating indirectly via a list of connections, giving what is called "configurable modularity", and both support stepwise decomposition using "subnets". However, our experience shows that it is the FBP paradigm change that offers the most leverage for improved productivity and maintainability in application development. At base, the problem with NoFlo is JavaScript - quoting Ken Kan:

It is too easy to just make FBP work for JS, but what we really want to do is make JS work for FBP!

Recently two colleagues and I have been working on an FBP implementation, written in JavaScript, called JSFBP, based on node-fibers, a package developed by Marcel Laverdet, and therefore in turn on Node.js. This is actually a "green thread" implementation, as were the first two FBP implementations running on IBM mainframes. However, "green threads" do not support multiple cores, a limitation shared by NoFlo, and I am told that JSFBP's dependence on "node-fibers" will likely prevent it from gaining wide acceptance. However, this implementation is very much an FBP implementation, so I have made it available via GitHub, as there may be a role for it in the future.

Because FBP is not trying to cram all sorts of logic into a single thread, it actually has a fairly simple set of scheduling rules, and FBP components have a fairly simple internal structure. In general, processes in FBP-inspired systems cannot decide which input port to receive data from (and possibly be suspended). I therefore thought I would compare one commonly used component in classical FBP against the same function written in NoFlo. The result is in "Concat" Component. Here is my conclusion, at the end of that article:

However, when we compare the two, the CoffeeScript code shown above seems to support the conclusion that the attempt to support classical FBP in JavaScript (or CoffeeScript) seems to lead to significantly increased code complexity.

I may be biassed, but I definitely prefer the FBP version...

For those wishing to gain experience with FBP, there is no substitute for reading the book (Flow-based Programming, 2nd edition), and then starting to use one of the FBP implementations such as JavaFBP, C#FBP or JSFBP, or even the C++/Boost implementation currently under development, as described on the FBP web site. JavaFBP has the advantage of being closely integrated with a powerful diagramming tool, called DrawFBP, although DrawFBP can support any data flow language - and indeed can support high-level, language-independent, design as well.

For the time being, users wishing to work with FBP can code up networks by hand using JavaFBP, C#FBP, CppFBP or JSFBP. Alternatively, they can use the DrawFBP drawing tool, written using Java Swing, which is also quite general, and can in fact generate networks for JavaFBP and C#FBP, as well as the .fbp notation used by NoFlo and CppFBP, plus NoFlo JSON networks. If JavaFBP is chosen, DrawFBP can load any chosen components, display their descriptions and ports, and even check whether all required ports are connected.

While DrawFBP does not support run-time network execution, the networks it generates are complete programs. Its diagrams are stored in XML format, and additional generators can be added easily, or users can build their own generators using the XML format as input. DrawFBP also has the capability of carving out a piece of a network and converting it into a subnet.

FBP and OO

For a discussion of the differences and similarities between FBP and OO, see Comparison between FBP and Object-Oriented Programming (Chapter 25 of the 2nd edition).