OOP And FBP

This is a conversation with TomLaudeman, NoahHealy, JoeMorrison and others - although slightly off-topic, this subject keeps coming up (I guess I started it). FBP has in fact been called "the first OOP system" ;-) See also http://www.jpaulmorrison.com/fbp/oops.htm (written in about 1992).

I wonder what is the best way to structure this...?

A colleague sent me the following note:

I found this site http://csis.pace.edu/~bergin/patterns/ppoop.html, and the quote near the end of this page really got me thinking: "Every if and every switch should be viewed as a lost opportunity for dynamic polymorphism."

I was curious to know what you thought of this statement.

Here was my answer (combined from two successive notes):

I used to call these debates "religious wars" - but then I am an old-guard programmer... You know, for sheer maintainability, I would say tables are better than any of these alternatives (tongue slightly in cheek). Just build a table with a row per operating system, and you can throw in as many attributes as you like (text, goodness, etc.). If it's something like DB2, you can add more rows and/or columns without even having to recompile :-) Surely all the verbiage given in the cited web site renders down to something like:

Then I tossed the discussion on the waters, and some of the responses are given below.

--PaulMorrison

Yes, I definitely prefer all control structures to be data. This is basically the same argument that the OOP people advance. However, with their data encapsulation paradigm, that information becomes embedded in the code. Reading code to discover the information it's operating on has probably consumed as many man-years of time from the computing industry as Scott Adams has with Dilbert.

--NoahHealy

Yes, a table model is far better than OO. I think a table oriented version of that OO example requires a single SQL statement acting on a single table.
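A minimal sketch of that single-table idea, using Python's built-in sqlite3 module. The `os_info` table, its columns and its sample rows are all invented for illustration and do not come from the cited page:

```python
import sqlite3

# Hypothetical table: one row per operating system, one column per attribute.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE os_info (name TEXT PRIMARY KEY, label TEXT, goodness TEXT)"
)
conn.executemany(
    "INSERT INTO os_info VALUES (?, ?, ?)",
    [
        ("unix", "This is a UNIX box", "good"),
        ("vms", "This is a VMS box", "fine"),
    ],
)


def describe(os_name):
    """Answer the per-platform question with a single SQL statement."""
    row = conn.execute(
        "SELECT label, goodness FROM os_info WHERE name = ?", (os_name,)
    ).fetchone()
    if row is None:
        return "unknown operating system"
    label, goodness = row
    return f"{label} and therefore {goodness}"


# Supporting another platform is a data change, not a code change.
conn.execute("INSERT INTO os_info VALUES ('mac', 'This is a Mac box', 'good')")
```

There is no `if` or `switch` per platform here; the branching lives in the rows.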

It is rare to see a computer language comparison that removes confounding factors like experience from the evaluation. Also, modern middleware-servlet-engine-SQL-integrated-rapid-application-development systems can't really be compared with EDT and cc on a PDP 11/23. Geez.

I *can* compare Deft (our new table oriented declarative language) and Perl. I've written some of both this week. The Deft has fewer bugs, was faster to develop, the code is shorter, and the code is easier to read.

The example at http://csis.pace.edu/~bergin/patterns/ppoop.html is outdated and probably wasn't that good originally. It requires seven files just to create a trivial example (with no comments, and there's no "main" so that's not even complete code).

I don't think the question of OO versus relational is necessarily subjective. My impression is that most people choose a programming language based on emotion (or what they were taught in college), but that doesn't mean there aren't objective measures.

I'm not a fan of "if" and "switch", but in the hours it takes to get a grasp of dynamic polymorphism, a programmer could write/comment/update a lot of ifs and switches. The OO code needs a page or two of comments. The old-school switch is fairly transparent.

--TomLaudeman

I agree that the http://csis.pace.edu/~bergin/patterns/ppoop.html example isn't a very good one. There are plenty of real-life cases requiring dynamic polymorphism that call for different solutions.

For example if the problem is to send email messages from various platforms, then SMTP becomes the primary point of problem decomposition, not an OO interface. A client is custom-written for each platform and communicates with a shared server. Clients can be written for new platforms and work seamlessly with the existing server. In this case the client/server architecture automatically solves the problem of making the "email system" extensible to handle new client platforms.

If the problem is to draw 3D graphics, then an API like OpenGL is the natural point of problem decomposition, allowing a single piece of gaming software to use the appropriate rendering software for each platform.

I can imagine other situations where OO polymorphism and table models are more appropriate. But I can't imagine a single paradigm that solves all dynamic polymorphism problems in computing.

So, coming back to the original provocative quote, here's my proposal:

"Every *if* and every *switch* should be viewed as an opportunity for lively debate" :-)

--JoeMorrison

PaulLieberman was included at this point.

I'm unclear about some (apparently) fundamental OOP assumptions.

From a philosophical point of view, doesn't polymorphism obscure the low-level intent of the programmer? I think the high-level intent can be ignored since that is generally trivial (and optimizing trivial problems is pointless).

In all the non-trivial OOP examples I've seen, it has literally taken hours (or days) of examining code to track down what will happen for a given method invocation. The code becomes very difficult to grep since there are many methods with the same name but different abilities. I've seen a case using a Perl object-management module where the methods did not exist at compile time. A whole huge class of methods were dynamically instantiated at runtime. Instead of being transparent and easy to modify/maintain, the code was a snarled nightmare of nested directories full of meta-compilands.

On the issue of code reuse, I'll return to a trivial example. My understanding is that OOP says it is good to have polymorphism so one can say

print(object);

and "object" is printed, no matter what it is. Of course, this isn't quite true because someone has to implement a method to print the object in question. If the method doesn't exist, then the code shouldn't compile, and certainly won't run. With a complete set of methods to deal with "legacy_object" and "novel_object" in the print class OOP can successfully do something like this:

print(legacy_object); print(novel_object);

The equivalent procedural code would read:

print_legacy(legacy_reference); print_novel(novel_reference);

Aside from saving a few keystrokes, I don't see any advantage to the OOP alternative. If the requirements change for printing, the OOP people will have to change 2 methods of class print. The procedural people will have to change 2 subroutines. Neither OOP nor procedural will have to change the original code.

Am I missing something by using too trivial an example?

--TomLaudeman

In my book, in the chapter on OOP - http://www.jpaulmorrison.com/fbp/oops.htm - I said the following:

Two last comments about genericity: my (limited) experience is that application developers don't use it very much, and its main triumphs seem to be in the GUI area. When asked to give examples of genericity, writers on OO always seem to pick "display" and "destroy". It may be that, in business, you don't often use the same messages for different classes. For instance, at the numeric value level, subtracting a number of days from a date is quite different from subtracting a date from another date, or a date from days (you can't), so the user has to be very aware of the types of the operands. To me this means that it doesn't buy you much to be able to call them all "-". In fact, in Smalltalk you often see message names like "subtractDaysFromDate", to tell the user what types the message expects (there is no type checking at compile time, so this is particularly important).

... which seems to support what Tom is saying.
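The date arithmetic point can be seen in Python's standard datetime module, which does overload "-" for these operands. This is only an illustration in a different language from the Smalltalk mentioned in the quote, and the dates are arbitrary:

```python
from datetime import date, timedelta

due = date(2007, 11, 20)
start = date(2007, 11, 1)

# date - days gives another date
earlier = due - timedelta(days=19)

# date - date gives a duration, not a date
gap = due - start

# days - date has no meaning, so the same "-" is rejected at runtime
try:
    timedelta(days=19) - due
    days_minus_date_allowed = True
except TypeError:
    days_minus_date_allowed = False
```

One operator name covers three different situations (two with different result types, one an error), so the caller still has to know the operand types.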

--PaulMorrison

Most arguments I've seen for OOP had at their base the ideal of data encapsulation. This is held as an unrestricted good. Many of the early popularizers referenced Parnas's work on information hiding. The argument was advanced that OOP = data encapsulation = information hiding = good design. With a little hand waving, a pretty cool language (Smalltalk) and some big corporate and academic pull (Xerox PARC guys and where they went later) a paradigm was born. The only real problem I see is that in the OOP = data encapsulation = information hiding = good design chain, the first two links are suspect and false respectively.

For OOP to imply data encapsulation you have to implement it that way. If you do that then all the operations on that data are tied to that data type. As Brian Kernighan pointed out in 'Why Pascal is Not My Favorite Programming Language', this means that any higher order function that you may wish to create actually requires quite a bit of jiggery-pokery in the class hierarchy for it to be able to affect multiple types.

This problem fails to manifest under three conditions, all of which seem to apply frequently to object shops. First, if you are content to churn out large quantities of similar code, then this isn't a problem. Of course, that willingness and the code produced are both problems in themselves. I have seen this in corporate environments, with mediocre programmers all creating largely redundant code, and in the Perl MakeMethods package, where Perl will write a bunch of your repeated code for you. The second condition is that there is no higher order function that you are aware of which could be useful to your application. In GUIs this is frequently true. For the kind of people who are happy to cut and paste 100 lines of code a day and change three of them, it is also frequently true that they are unaware of any useful generalization. The third thing that makes this not such a big deal is that what we call OOP isn't what Alan Kay was talking about, as he has famously pointed out on occasion. We break encapsulation in order to write generalized functions all the time and usually don't lose any sleep over it. However, you have to wonder about a technique whose justification is abandoned in order to do 'advanced' programming.

As for data encapsulation = information hiding, which is the level of the discussion we are currently pursuing, this doesn't hold in general. As Joe pointed out, various problems have differing 'natural' breakpoints, which may or may not correspond to the operations available on a piece of data. David Parnas studied the effects of knowing anything about the far side of these breakpoints while working on this side, and discovered that it's pretty much always negative. However, a quick gedanken experiment will reveal that not knowing something you need to know isn't going to work very well. As Tom points out, we frequently do need to examine the implementation, both for bug fixes and for design reasons, and then oop's (pun intended) style throws up unnecessary roadblocks. These will frequently lead to the person learning more than required to solve their problem, which means OOP may be information revealing and therefore automatically bad design.

--NoahHealy

I was just reading: http://en.wikipedia.org/wiki/Polymorphism_%28computer_science%29

That entry has some discussion as well. We could move this to a wiki, although I'm still in the "asking questions" phase.

After more consideration, I think there is (should be?) a distinction between operators that exhibit polymorphism as part of a language definition, and polymorphic behavior allowed in programs.

Language definitions tend to be fixed (a good thing) and standard. The generally agreed upon polymorphic behavior of plus (addition) is different from polymorphic functions written in the language.

It is a bit subjective, but "peculiarities" of the language are generally considered "features". These features are well documented and ostensibly exist because they are desirable and good. However, overloading the + operator to suddenly (and unexpectedly) start concatenating numbers would be an abomination, e.g. 3.14 + .00159 = "3.14.00159". This is an extreme example of what may be bothering me about polymorphism in general. The results aren't necessarily predictable.
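A short sketch of how such an "abomination" could be written in a language that allows user-defined operator overloading. The `Surprising` class is hypothetical and exists only to show that the meaning of + depends on code the reader may never have seen:

```python
class Surprising(float):
    """Hypothetical numeric type whose + concatenates instead of adding."""

    def __add__(self, other):
        # Deliberately perverse: glue the decimal renderings together.
        return f"{float(self)}{float(other)}"


# Built-in float addition, as the language definition promises.
ordinary = 3.14 + 0.00159

# Same operator, same-looking operands, user-defined meaning.
overloaded = Surprising(3.14) + 0.00159
```

Here `ordinary` is a number and `overloaded` is a string, although the two expressions look almost identical at the call site.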

Perhaps I'm missing a more fundamental idea. What was the original point of allowing programmers to write polymorphic classes (e.g. in Java)?

--TomLaudeman


JoeMorrison's answer to Tom's second note:

I think so. If the datatypes of the arguments are known at compile time, as is the case in your example, then object-oriented programming is really just about syntax (i.e. saving typing) which isn't all that interesting. It gets more interesting if the datatypes aren't known until runtime.

Consider the example of an e-commerce system (say you're trying to build the Amazon.com web site), and you sell many types of products. Different products have different shipping rules. You might end up with a "cartItems" data structure, containing all the items the customer ordered. You might want to write something like (this is pseudo-code, not a real language):

  processOrder ()
  {
    foreach item (cartItems)
    {
      item.ship ();
    }
  }

In this case, the datatypes of the items in the shopping cart aren't known until the program is executed, since they depend on user actions. In most OO languages, the code to deal with each type of item (book, gift cert, etc.) would be centralized, and could be changed without ever modifying the "processOrder" function. The goal is to avoid this sort of thing:

  processOrder ()
  {
    foreach item (cartItems)
    {
      if (isBook (item))
      {
        shipBook (item);
      }
      else if (isGiftCert (item))
      {
        shipGiftCert (item);
      }
      etc.
    }
  }

since (assuming there are lots of functions for operating on shopping carts, e.g. processOrder, verifyOrder, computeDeliveryDate, or whatever) you wouldn't want the code for each datatype scattered throughout your application.
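The first pseudo-code version above might look like this in a real language. This is only a sketch: the `Book`, `GiftCert` and `Download` classes and their return strings are made up for illustration:

```python
class Book:
    def __init__(self, title):
        self.title = title

    def ship(self):
        return f"mail book: {self.title}"


class GiftCert:
    def __init__(self, amount):
        self.amount = amount

    def ship(self):
        return f"email gift certificate worth {self.amount}"


def process_order(cart_items):
    # No type tests here: each item's runtime class selects its own ship().
    return [item.ship() for item in cart_items]


# A type added later works without touching process_order.
class Download:
    def ship(self):
        return "send download link"


log = process_order([Book("FBP"), GiftCert(25), Download()])
```

The shipping code for each type sits in one place, and `process_order` never needs to know the list of types.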

OO isn't a panacea, but I think there are often cases like this one where OO can help improve code maintainability. For other situations, programmers can simply choose to ignore the OO features of the language.

It's true that dynamically dispatching to different functions based on runtime datatype can make it hard to figure out what code is actually being called in a given method invocation. But good debugging tools can help cope with that.

One interesting point is that most OO languages allow you to do dynamic type dispatch on a single variable (i.e. the "object"), and the other arguments in the method invocation are treated differently - they can only be overloaded based on compile-time type information. For example in the method invocation:

  foo.bar (a, b, c);

many different chunks of code could be executed, depending on the runtime type of "foo", and the compile-time types of "a", "b", and "c". Over the years, various people have asked the question: Why treat foo specially? Why not write:

  bar (foo, a, b, c);

and dispatch on the runtime type of *all* of the arguments? The result is an elegant extension of OO called "MultiMethods" or "Generic Functions". I think it was pioneered in CLOS (the Common Lisp Object System), but it's available in Python too.
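A hand-rolled sketch of the idea, not CLOS and not any particular Python library. The `register` decorator and the registry are assumptions made for illustration, and this version matches exact types only, so it ignores inheritance:

```python
_implementations = {}


def register(*types):
    """Record an implementation of bar for one combination of argument types."""
    def decorator(func):
        _implementations[types] = func
        return func
    return decorator


def bar(*args):
    """Dispatch on the runtime types of all arguments, not just the first."""
    key = tuple(type(arg) for arg in args)
    try:
        impl = _implementations[key]
    except KeyError:
        names = ", ".join(t.__name__ for t in key)
        raise TypeError(f"bar has no implementation for ({names})") from None
    return impl(*args)


@register(int, int)
def _bar_int_int(x, y):
    return x + y


@register(str, int)
def _bar_str_int(x, y):
    return x * y


@register(str, str)
def _bar_str_str(x, y):
    return f"{x} {y}"
```

Calling `bar("ab", 2)` and `bar("ab", "cd")` reaches different code even though the first argument has the same type in both calls, which single dispatch on "foo" cannot express.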

--JoeMorrison


Joe's answer to Noah's 2nd note:

I'm with you here. I'd say that OOP is more about choosing the right balance between data encapsulation and code reuse (two good things that are fundamentally opposed.) As you say, too much encapsulation can result in programmers having to do a lot of cutting and pasting.

--JoeMorrison


Tom's answer to Joe's next to last note:

That is a better example, but unfair since you put the procedural decision tree in the calling code, but the OOP methods were hidden in class item. The procedural code would be nearly identical to the OOP example:

  processOrder ()
  {
    foreach item (cartItems)
    {
      ship (item);
    }
  }

I'll grant you that in the procedural case the programmer has to be smart enough to create data structures that can be inspected. This is trivial in Perl, and probably trivial in C. Every "item" needs to have at least one common field "type" in order to implement a typeof() function (or the set of functions isBook(), isGiftCert(), etc).

It is likely that with multiple inheritance the picture changes, but multiple inheritance itself adds (what seems to me) a very nasty concept that your standard Java programmer isn't going to grok.

p.s.

On a different angle, the Wikipedia OOP reference points out that OOP has trouble capturing business logic. I'd have to think a little to verify the truth of that, but it seems sensible.

http://en.wikipedia.org/wiki/Object-oriented

In Deft (which is declarative) we integrated a state machine to handle problems like stateful behavior in a web site.

I've come to the conclusion that "information hiding" really should mean "algorithm hiding" not hiding the details of data structures. I don't think algorithm hiding is practical in functional, procedural, or OOP, which leaves declarative as the big winner for paradigm shift.

Again from the Wikipedia OOP entry:

"The bottom line of the conflict seems to be that OOP is mostly a behaviorist view of software design which conflicts with the data-centric, declarative view."

The Starlog people have a powerful idea, but I just can't imagine using a predicate logic language for my daily needs. http://www.cs.waikato.ac.nz/Research/starlog/

--TomLaudeman


Joe's answer to Tom's note:

Hello Tom,

I might be missing something, but the point of my OO shopping cart example is that you can add new item types over time, and the OO machinery will always call the appropriate "ship" method, without any requirement to modify or even recompile the "processOrder" function. The dynamic dispatching always "finds" the right ship method to call.

 > The procedural code would be nearly identical to the OOP example:
 >
 >  processOrder ()
 >  {
 >    foreach item (cartItems)
 >    {
 >      ship (item);
 >    }
 >  }

The key difference is that in this procedural version, you need to write a top-level "ship" function which contains some sort of manually-coded dispatch based on type (to invoke the other, type-specific ship functions), and this dispatching code must be modified whenever a new type is introduced into the system. This problem is present at all locations in the application that need to operate on shopping carts.
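A sketch of that procedural version, with items as plain records carrying a common "type" field. The field names and return strings are hypothetical:

```python
def ship_book(item):
    return f"mail book: {item['title']}"


def ship_gift_cert(item):
    return f"email gift certificate worth {item['amount']}"


def ship(item):
    # Hand-written dispatch on the common "type" field.
    # Every new item type needs another branch here, and in every
    # similar top-level function (verify, compute delivery date, ...).
    if item["type"] == "book":
        return ship_book(item)
    elif item["type"] == "gift_cert":
        return ship_gift_cert(item)
    raise ValueError(f"don't know how to ship {item['type']!r}")


def process_order(cart_items):
    # Looks just like the OO version; the dispatch has moved into ship().
    return [ship(item) for item in cart_items]
```

`process_order` is indeed nearly identical to the OO version, but an item of a new type fails in `ship` until someone edits it.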

> I'll grant you that in the procedural case the programmer has to be smart enough to create data structures that can be inspected. This is trivial in Perl, and probably trivial in C. Every "item" needs to have at least one common field "type" in order to implement a typeof() function (or the set of functions isBook(), isGiftCert(), etc).

Exactly. This is what OO languages do automatically - reduce the need for programmers to explicitly represent type information, write methods to inquire about type, and manually maintain "switch" statements scattered throughout an application. As you say, this reduces the requirement for programmers to be smart, or to put it differently - improves programmer productivity.

I'm not a big OO flag-waver; I agree that OO concepts can be implemented in procedural languages without a huge amount of work. But in cases where that work can be delegated to a compiler, I say - why not? I appreciate OO as a productivity enhancement tool in my software development arsenal, just as I appreciate exception handling facilities, garbage collectors, and any other facilities that help automate software development tasks. As long as I can ignore those tools when they don't fit the problem at hand, I don't see the downside.

--JoeMorrison

Tom's answer:

I'm sorry it took so long for me to see the savings of the dynamic dispatch thing. Still, this is a feature I rarely need, and I have reservations about the reductions in complexity and typing. (In contrast, I constantly use Perl's hashes and regular expressions.) I have worked on a codebase that had a huge problem of exception processing, but that company's problem was so massive that they needed a big data driven workflow capture/processing subsystem.

What happens in the OOP example when shipping different types of items requires a different workflow, not just the same steps with different things happening at each step?

Noah might correct this, but it seems that in a table oriented programming (TOP) world, the equivalent feature to dynamic dispatching is inclusion of one or more columns in each row which map to functions required by that row. A record has columns and data something like the following:

 order_id item_id item_type ship_function  coupon_function
 123      456     book      ship_book      book_coupon
 789      012     gift_cert ship_gift_cert gift_cert_coupon

The process_order function has lines:

 call($ship_function);
 call($coupon_function);

which invoke the functions named by the ship_function and coupon_function columns on each row of the table.
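A rough sketch of this column-driven dispatch in Python rather than Deft or Perl. The rows mirror the table above; the function bodies and the `FUNCTIONS` lookup are assumptions made for illustration:

```python
def ship_book(row):
    return f"ship book {row['item_id']}"


def ship_gift_cert(row):
    return f"ship gift cert {row['item_id']}"


def book_coupon(row):
    return f"book coupon for order {row['order_id']}"


def gift_cert_coupon(row):
    return f"gift cert coupon for order {row['order_id']}"


# Map the names stored in the table to the functions they refer to.
FUNCTIONS = {
    f.__name__: f
    for f in (ship_book, ship_gift_cert, book_coupon, gift_cert_coupon)
}

ORDERS = [
    {"order_id": "123", "item_id": "456", "item_type": "book",
     "ship_function": "ship_book", "coupon_function": "book_coupon"},
    {"order_id": "789", "item_id": "012", "item_type": "gift_cert",
     "ship_function": "ship_gift_cert", "coupon_function": "gift_cert_coupon"},
]


def call(name, row):
    """Invoke the function whose name is stored in a column of the row."""
    return FUNCTIONS[name](row)


def process_order(rows):
    results = []
    for row in rows:
        results.append(call(row["ship_function"], row))
        results.append(call(row["coupon_function"], row))
    return results
```

The dispatch decision is now visible as data in each row instead of being buried in a class hierarchy or an if/else chain.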

I suppose this is good data driven programming, but I'm wondering if there is a declarative/TOP philosophy or some syntactic sugar to make this more natural.

--TomLaudeman

PaulMorrison's 2-bits' worth:

I've been watching this conversation unfold - very interesting - but I felt impelled to comment on your last note. I think I prefer the TOP approach, but I really prefer to have my tables normalized (I believe you and Noah don't require this). If you normalize your table, then ship_function and coupon_function become columns in an item_type table. Now, even with normalized tables, this approach doesn't seem to be much better than the OOP notation, as the code will have to be written, installed, etc., and you need a language which can be driven from tables. So instead, let's provide two coded black box functions: ship and process_coupon, and store parameter information in columns in the item_type table. As long as the processing for new item_types can use the same parameter columns, maintainability is excellent - you don't even have to bring down your system. If you have to add new parameter columns, then you have to change the black boxes, but hopefully they will only grow slowly. Also, the tables are passive, so your only language restriction is whatever you choose to write the black boxes in... Remember, maintainability is more important than development speed; and the key to maintainability is considering what may change - although this cannot always be predicted. Hope this makes sense.
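One way to picture the black-box variant, as a sketch only. The parameter columns `carrier` and `coupon_percent`, and the item types shown, are invented for illustration:

```python
# Hypothetical normalized item_type table: one row per type,
# parameter columns instead of function names.
ITEM_TYPE = {
    "book": {"carrier": "post", "coupon_percent": 10},
    "gift_cert": {"carrier": "email", "coupon_percent": 0},
}


def ship(item):
    """Black box: behaviour is driven entirely by the item_type parameters."""
    params = ITEM_TYPE[item["item_type"]]
    return f"send item {item['item_id']} by {params['carrier']}"


def process_coupon(item, price):
    """Black box: applies whatever discount the item_type row specifies."""
    params = ITEM_TYPE[item["item_type"]]
    return price * (100 - params["coupon_percent"]) / 100


# A new item type that fits the existing parameter columns is a data change.
ITEM_TYPE["poster"] = {"carrier": "courier", "coupon_percent": 5}
```

Only a new kind of parameter (a new column) would force a change to `ship` or `process_coupon` themselves.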

--PaulMorrison

Tom's answer (he rephrased the first paragraph):

re: process_order and ship as black boxes. The goal is for TOP to emulate (or implement) dynamic dispatching. Creating a column with the name of a function that applies to the current record is workable. However, maybe there is a more elegant method for TOP to handle the dynamic dispatch problem. Perhaps not. Creating a new column is easy, it is orthogonal to the data model, and it is conceptually simple and clear.

re: normalization. Deft is funny about the "table". It might be better described as a first normal view of a relational database. Deft's "variables" are columns of first normal records of an implicit table. The data in memory is conceptually first normal form (although implementation may vary from the concept). The data as it resides (presumably) in an SQL database is fully normalized.

First normal data has profound upsides, and using it only requires some aggregation functions that deal with traversing relations. Noah and I realize that people think this sounds crazy, but we've got beautiful working code. We need a good way to explain why first normal is better than variables and data structures. We are not advocating first normal databases.

We've discovered a few amazing aspects of first normal and declarative TOP:

- no data structures, and we don't miss them

- no algorithms (well... the programmer says what transformations to apply to the data)

- records can be processed in any order, and can be processed by multiple instances of the code (e.g. a grid or cluster).
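The last point can be illustrated outside Deft. The sketch below is plain Python, not Deft, and uses a thread pool as a stand-in for a grid or cluster. It assumes a per-record transformation that depends only on the record itself; the record fields are hypothetical:

```python
import random
from concurrent.futures import ThreadPoolExecutor


def add_total(record):
    """Per-record transformation: reads and writes only this record."""
    return {**record, "total": record["price"] * record["quantity"]}


records = [{"item_id": i, "price": 5 * i, "quantity": 2} for i in range(1, 6)]

# One worker, original order.
sequential = [add_total(r) for r in records]

# Several workers, arbitrary order.
shuffled = records[:]
random.shuffle(shuffled)
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel = list(pool.map(add_total, shuffled))


def by_id(record):
    return record["item_id"]


# The set of output records is the same either way.
same_result = sorted(parallel, key=by_id) == sorted(sequential, key=by_id)
```

Because no record looks at any other record, neither the processing order nor the number of workers affects the result.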

--TomLaudeman


Last edited November 20, 2007 9:25 pm by PaulMorrison