Deft


Introduction by TomLaudeman.

Deft is open source under the LGPL. Docs and the code are online:

http://defindit.com/

Deft is declarative (functional) programming; it is also essentially flow-based and table-oriented. Deft is well integrated with SQL, and includes its own templating system.

Here's a fairly trivial example.

 main:
 {
     do_sql_simple("bug_pages","","select * from bug_pages");
     do_search("bug_pages", "baja", "url,title,description,keywords");
     dcc("distinct_t", "title", [""], ["rank,dn", "title,at"]);
     dcc("distinct_dt", "description,title", [""], []);
     render("","Content-type: text/html\n\n","/home/twl/public_html/msearch.html");
 }

That Deft script drives this web page:

http://laudeman.com/msearch.pl

The associated template is here:

http://laudeman.com/msearch.html

The equivalent Perl is 20 or 30 lines, even if one uses HTML::Template. Note the nested loop in the template. In procedural and OOP languages (C, Perl, PHP, Java, C++) this leads to icky nested data structures (a list of hashes for HTML::Template). In Perl, the poor programmer has to write SQL that is translated into the list of hashes, and then any necessary transformations of the data have to happen in that same data structure.
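For comparison, here is a minimal sketch of the nested list-of-hashes that HTML::Template wants for a two-level loop. The variable, loop, and file names are invented for illustration; this is not the actual msearch.pl code.

 use HTML::Template;

 my $template = HTML::Template->new( filename => 'msearch.tmpl' );

 # @rows is assumed to be a flat list of hashrefs from the SQL query;
 # @distinct_titles is assumed to hold the distinct title values.
 my @outer;
 foreach my $title (@distinct_titles) {
     my @inner = map  { { description => $_->{description} } }
                 grep { $_->{title} eq $title } @rows;
     push @outer, { title => $title, inner_loop => \@inner };
 }

 # One outer hash per distinct title, each carrying its own inner loop.
 $template->param( outer_loop => \@outer );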

Deft gets around the data structure problem with a novel (sort of) use of first normal form and a set of aggregation functions. (See http://en.wikipedia.org/wiki/Database_normalization#First_normal_form). In the code above, dcc() is one such function. Implementation details are at the Deft site. Discussion is below.

More about the rationale for 1st normal form:

It is possible to represent any arbitrary data structure in first normal form. Generally accepted wisdom tells us there isn't much point in de-normalizing this way.
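As a hedged illustration (the column names and values here are invented, and this is plain Perl rather than Deft), a nested structure and its first normal equivalent:

 # A nested structure: sessions containing papers.
 my %nested = (
     'Session A' => [ 'Paper 1', 'Paper 2' ],
     'Session B' => [ 'Paper 3' ],
 );

 # The same information in first normal form: one flat row per fact,
 # with the session value simply repeated on each row.
 my @first_normal = (
     { session => 'Session A', paper => 'Paper 1' },
     { session => 'Session A', paper => 'Paper 2' },
     { session => 'Session B', paper => 'Paper 3' },
 );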

However, there are some profound reasons to use first normal.

Speed seems to be essentially the same as normal procedural Perl. Since speed is the same, and memory isn't an issue, first normal is a real joy to use. All the headaches of nested data structures go away. We have a (small) library of functions that parse out the logical depth.

We're trying to think of reasons why this won't continue to work. I'm beginning to think that the strengths of first normal have been overlooked.

There are some caveats: We don't advocate using first normal as a table schema for the database/data repository. We also don't require that every record have identical columns (see the discussion further down the page). The actual implementation of the first normal concept may be fully normalized.


All the following by TomLaudeman (--twl), except those marked --nh (NoahHealy), and --pm (PaulMorrison).

Some of the following info is available in the Deft docs at http://defindit.com.

Each Deft process has only a single table, and each table is in first normal form (although the low level implementation is not limited to first normal). Each line/process of Deft only sees a single record at a time. This trick is only possible in first normal. People tend to (rightly) disparage first normal due to data repetition, but first normal records can be handled independently of each other.
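A hedged sketch of what "one record at a time" looks like, in plain Perl rather than Deft syntax (the column names are invented): each step is just a function from one row to one row, and the framework supplies the iteration.

 # Each processing step sees exactly one first normal row (a hashref).
 sub add_domain {
     my ($row) = @_;
     ($row->{domain}) = $row->{url} =~ m{^https?://([^/]+)};
     return $row;
 }

 # The framework, not the programmer, iterates over the table:
 @table = map { add_domain($_) } @table;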

Deft internally handles iteration through the records of each 'table'. We're now adding features that branch the stream. This will allow 'true' Deft subroutines and massive recursion, even massive parallel recursion.

Noah started using first normal long ago, and has created a class of special aggregation operations that pretty much trivialize managing what seems like an awkward data structure. For instance, dcc (mnemonic for 'declare control column') creates a new column that encodes relational structure for one or more other columns, as well as constraining and sorting. Our usual example is a data set of sessions, papers, and authors for a seminar. Rendering this data set into a seminar schedule booklet or web page requires three dcc calls. Deft code is Perl, but I'll use a more verbose meta-language to illustrate.

 create column "session_control" aggregate on "session" sort by session timestamp
 create column "paper_control" aggregate on "paper" sort by paper timestamp
 create column "author_control" aggregate on "author" sort by author order

That's it. Sending the data set with these three new columns to the rendering engine creates the booklet or web page. The new columns control sorting, so the records in the data set can be unordered.

In dcc the aggregation function is essentially a distinct.
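As a rough illustration only (plain Perl, not the actual dcc() implementation), the core of such an aggregation amounts to a distinct plus a sort over the first normal rows; the column names are invented:

 # Hedged sketch: a 'distinct' over one column of the rows, ordered by a
 # timestamp column; roughly the information a control column encodes.
 my %seen;
 my @distinct_sessions =
     sort { $a->{session_timestamp} <=> $b->{session_timestamp} }
     grep { !$seen{ $_->{session} }++ } @rows;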


Noah and I have a data structure theory we're developing, and although we've got strong empirical evidence, we're looking for additional critical examination.

Postulate 1: First normal form is good, and all other data structures are bad (less good?).

A corollary to postulate 1 is: "Iteration is bad."

This requires that operations exist in the programming language to support mapping any arbitrary structure onto first normal form. The language also needs to support iteration internally. Deft has the supporting functions and internally handles iteration; we're expanding the depth and breadth of available operations.

So far we've found that anything that is sensible (or useful) with arrays, hashes, objects, or ad hoc structures is better with first normal (and the right tools). The code is easier to write and maintain.

The Deft docs allude to this, but we'd like to get some feedback. The idea is holding up, but it kind of runs counter to all popular programming.


The following three bullets are from a chapter in my book - http://www.jpaulmorrison.com/fbp/bdl.htm - quoting from a 1977 paper by B. Leavenworth ("Non-Procedural Data Processing", The Computer Journal Vol. 20, No. 1, 6-9, February 1977). --pm

TomLaudeman illuminates how Deft handles the three important concepts "arbitrary sequencing", "aggregate operations", and "associate referencing".

It is worth noting that much of the power of Perl is its hashes, which are associative arrays. There's no need for handles in Perl, and this saves all kinds of headaches. Granted, complex Perl data structures tend to be things like a hash of lists of hashes, but that's fairly easy to manage compared to a struct with a field that's an array pointer to an array of pointers to structs. Of course, the traditional case with pointers gets more exciting when handles are required to cope with how the memory manager is torturing the programmer.

First normal form completely obviates the need for "referencing". Even the table or row is an entirely philosophical entity to the Deft programmer. There are only columns. Some columns have interesting data not really meant for direct use by humans. The Deft serializer dcc(), which creates control columns for templates, is in this class. Programmers won't ever look at the contents of a column created by dcc().

BDL (Tom is referring to the sketch of a business-oriented VHLL described in http://www.jpaulmorrison.com/fbp/bdl.htm) sounds more like a schema language than something that captures business logic. Granted that the schema implies a certain logic to the data, I tend to think of the relational model of the data as separate from how the software supports the business practices that manipulate the data.

Deft relies on SQL databases and SQL for data storage. It works well enough, and if people really want to do something BDL-like, I'd probably suggest UML or a high-level schema management tool.

Deft does have another feature. We've got a state machine, and this is used to manage the workflow of the user interface (UI). It's basically a flowchart for the UI. The code is simple, fairly easy to read, and reasonably robust. By their nature, well designed state machines are graceful when the user does something totally unexpected. We had a state machine driving the computer-run creatures in the Island of Kesmai game. This was an online multiplayer roleplaying game. There was essentially no human moderation and the game ran 24/7/365. The players were actively trying to cheat. Any weakness in the software, particularly in the monsters, would be exploited. The state-driven monster brains coped very well with this.
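A minimal sketch of a table-driven UI state machine of that sort (plain Perl; the state and event names are invented for illustration, not Deft's actual ones):

 # States and transitions live in data; unexpected events fall through
 # gracefully by leaving the state unchanged.
 my %next_state = (
     search_form  => { submit => 'show_results' },
     show_results => { refine => 'search_form', select => 'show_detail' },
     show_detail  => { back   => 'show_results' },
 );

 my $state = 'search_form';

 sub handle_event {
     my ($event) = @_;
     $state = exists $next_state{$state}{$event}
            ? $next_state{$state}{$event}
            : $state;
 }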


Question from PaulMorrison

I thought "aggregate operations" (or at least my understanding of them) were the central concept in Deft...? What are some of the aggregate operations - I would assume sum, average, but I am curious what else. Another thing - maybe it would be helpful for our readers if you could position Deft vis vis SQL.

Some of what you guys are describing sounds a lot like some of the data flow approaches that are being used in the Data Mining area. See Mike Beckerle's comment around the middle of the FlowBasedProgramming page. --pm

Good question. In a table-oriented world, an operation that needs the entire record set, or "between-record" state information, is aggregate. We anticipate a dozen or so fundamentals.

The current example is dcc (mnemonic: "declare control column"), which creates a new column that encodes distinct records and sort order. It has a "where" clause as well. This prepares columns for use by the template engine. Nested data is encoded by running dcc() on two columns instead of one. In the example at the top of the page, the first dcc() drives the outer loop, and the second dcc() drives the inner loop.

Since one tenet of Deft is that record order is meaningless, even dcc() does not re-order records. However, the column it produces contains all necessary information for ordering of the data rows (and even whether or not to render a row). Several calls to dcc() would allow a single template to render the same set of records in different orders in different sections of the template.

I (TomLaudeman) typically have simple needs for aggregation (data driven web pages). Noah has some fiendish stuff where the operators take functions as arguments, work on a variable number of columns, crush rows or make new rows, and some recurse by sending a sub-table to another function. --twl

I wonder if there is some synergism with what my son is working on - it would be interesting if you guys were on convergent paths... See http://eleven.sourceforge.net/ --pm


Noah writes, with some editing by pm (Noah, could you check it?) :

I'll have to step in here because, as usual when Tom is quoting me, he's completely wrong, or at least seriously misleading. The guiding principle in Deft is data independence. As much data management as I can conceive of how to implement is strictly the province of the language. What the programmer gets to work on is therefore notionally a huge first normal form table. To date, that table, parcelled out a row at a time, is all the programmer has access to, though there is a class of aggregation functions which have the property that they can save state and therefore transform the entire table.

The corollary Tom mentioned, that iteration is bad, can be viewed in its proper light: iteration over data structures is bad when it is done in pursuit of managing data. The reason is obvious: if the language is managing your data for you, then when you choose to manage the data yourself you throw away the advantages gained by using Deft in the first place. Iteration is not completely dead, however - it has a place in expressing logical depth.

In FlowBasedProgramming there is a 'factory floor' model for data processing. I use that metaphor frequently in Deft as well. In this model, hooking a machine into the network and specifying within the machine that we should do our task to each of our inputs is nonsensical. I can imagine a network that would involve a loop running back to the machine from its output - this is an example of a logically deep algorithm which Deft will handle via recursion or loop operators that syntactically sweeten recursion.

I'm still working on the implications of this but it seems to me that the big win of having your programming language handle your data for you isn't going to be all the structure maintenance it saves: Tom currently finds about a 3:1 code length ratio between Perl and Deft doing mostly trivial things. The big win is that any part of your algorithm that can be transferred to data essentially vanishes - this could allow massive code reduction. Combined with the reuse implied by FBP techniques I could see this offering 1-2 orders of magnitude reduction in programming effort. -- nh

Tom answers:

Noah,

Right. :-)

You've left out many of the implications, but that's fine. I'll fill those in.

By "iteration to implement logical depth", you mean the algorithm's logical depth, not the data's logical depth. Right? Any iterative behavior required to 'crawl' through the data set should be handled internally and automatically, either by interating through the records, or by a internal, accumulative aggregation function.

Yes, I left out the stateful aggregation functions (aside from a brief reference to dcc). So we have a second corollary to "first normal is good".

Postulate 1: First normal form is good, and can efficiently represent all other data structures.

Corollary 1: Iteration is implied (handled internally by the language).

Corollary 2: Inter-record state information is implied (handled internally)

Corollary 2 says that functions requiring stateful knowledge of multiple records (e.g. sum, average, min, max, sort) need to be built-in functions of the language, and have essentially black box behavior.
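A hedged sketch of what such a built-in black box might look like from the inside (plain Perl with invented names and sample data, not Deft's actual internals): the function is handed one row at a time but keeps its own state between rows.

 # Sample first normal rows (invented values).
 my @rows = ( { amount => 2 }, { amount => 3 }, { amount => 5 } );

 # Streaming 'sum' aggregation: per-row calls, private inter-record state.
 {
     my $total = 0;                      # state hidden inside the black box
     sub sum_row    { my ($row) = @_; $total += $row->{amount}; }
     sub sum_result { return $total; }
 }

 sum_row($_) for @rows;                  # the language would drive this loop
 print sum_result(), "\n";               # prints 10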

I'll work on some examples translating typical data structures to first normal.

Also, is it really "first normal" or is there a more accurate/precise name?


Noah again:

I've just finished the book excerpts from your web site and it's great stuff. The major differences between Deft and the system there described are: the afore-discussed first normal form thing, the implementation of 'flow' level operators like sort and merge, the text vs. graphical basis of the development environment (though some of that is related to the primitiveness of the implementation), and the role of optimization in execution.

Probably the most essential design decision was to create a programmer-accessible view of the data as n-element tuples of scalars, accessible by programmatically assigned names. This means that the names used within a program have to be given as arguments to any of the general-purpose operators we create, rather than creating interpreter code to insert into the network on either side of a call into such a function. This makes it more convenient to create these as language features.

We use the global term "aggregation operators" to refer to those tasks which require access to the stream in order to function. These aggregation functions get designed as I conceive of them and implemented as Tom needs them, so we don't have very many. My current hypothesis is that we will need a dozen or two all told which should form a small enough core that the language won't be baroque.

The big difference I'd like to see is in the field of design and optimization. From my reading in relational theory it seems to me that at some point the Deft compiler could do quite a lot of the design optimization using the same kind of algorithms used by DBs to optimize queries. I could also see the possibility for a network of Deft executing nodes to use runtime profiling data to create execution structures like the one in fig. 22-2 [in http://www.jpaulmorrison.com/fbp/perform.htm] on the fly. As you point out about FBP, this goal can be striven towards incrementally and will automatically be available to all users. --nh


Aggregate operations are important but they are higher order functions. (Yes, Tom, I know you're not doing it that way yet, but you will be.) These operators slice the streaming table into sub-tables based on the data contained in them, and allow them to be altered by functions that are presumably easy to create. The point I've always driven for is a proper tool for each task: FBP and relational IPs let one use relational algebra to handle data structures. Perl puts regular expressions for strings and math for numbers into the mix. I like to use as concise a notation as I can think up as long as I don't lose precision.

The whole structure is still a little nebulous because we haven't actually created it yet. I'm still going back and forth on the best set of behaviors and Tom hasn't had to use any of the advanced ones at all. The basic ordering operator is powerful enough that it's almost all he needs to run his web sites.

The wiki is very cool - I'll be reading it for a while. -- nh

Thanks --pm


The following from Tom Laudeman (PaulMorrison's responses in italics):

Deft doesn't fit into the parameterized component model, or LittleLanguages, but on the other hand Deft scripts are often less than 10 lines of code so the "little" part shouldn't be necessary.

In reference to ParametrizationOfComponents:

"port" seems too low level. In general, I believe programmers should be shielded from implementation details. It may be good to speak of this kind of thing in philosophical terms, but the syntax of the language and the tools available should assure that the programmers only write high level code.

I agree "port" is low-level, but I feel this is a necessary part of the infrastructure. There is no need to give a HLL programmer full access however. For instance, I believe each Unix stage(right word?) has an input port, an output port, and an error port. So

 x | y

means

 x OUT -> IN y

- the bar implies two port names and a "feeds" relationship. If you want to write components using a lower-level language, IMO it is overly restrictive to be limited to only three different port names. OTOH, any process could talk to a Unix process running in an FBP network by sending to a port connected to the IN port of the Unix process.

In other words, if a "program" includes defined ports and connections, then the definition of the ports and connections should be "source code". In the FBP model, connections are "outside" of code - FBP is a coordination language.

If instead, ports and connections are static (like hardware configuration) then all programs need to rely on the static configuration (which becomes sort of a virtual machine). If the latter (virtual machine) is the case, then it cannot be allowed to change in ways that will break existing programs. I quite agree!

I'm not clear about the illustration at PortsAndConnections:

- Where does the output from B and C go? Is it joined later?

- Are B and C instances of the same code? The text says "It is permitted for processes B and C to be executing the same code..." but I'd have thought that only one of the following can be true:

1) B and C are required to be running the same code thus this is a parallel system and M and N are portions of a parallelizable stream.

2) B and C must NOT run the same code and thus this is a system of asynchronous processes. M and N are separate streams of data.

You may be using the word "code" differently from me! I view "code" as a passive resource, namely the set of instructions to be followed when executing a particular function. The only data they can share is data that is guaranteed to be unchanging - most likely values like number of days in a week, or initial table sizes. Each process will have its own variable data, parameters, connections, etc., completely independent of those of another process (even if that process uses the same "code"). So, for me, 2) is correct starting at the word "thus"!

While Deft shares some qualities with FlowBasedProgramming (FBP) (there is a flow between processes, virtual or otherwise), Deft does not have the issues above. Perhaps Deft is a subset of your FBP implementation.

The Deft programmer only says what they want to happen (aside from the carefully circumscribed inline Perl that we allow). Deft "streams" (tables in database parlance) are philosophical entities, and programmers have no tools to manage tables since Deft does all that. Deft programmers can create new columns and refer to each column by name. New rows and tables can be created, but are always managed declaratively in the same way that rows in SQL are always managed declaratively. As a result, the syntax of Deft only has ways to talk about columns (essentially variables). Aggregate functions (like sum) work on a whole table, but can't refer to a named table because there isn't a named table; there is only 'the table'. The Deft SQL functions (which I suppose are aggregate functions) know about external data sources, but those external sources are always joined to 'the table'.

Why only one table (if I understand you correctly)? Why not, say, a table per process?

I've always thought that what Deft does is pretty easy to grasp. It is a formalization and toolbox to do something like Unix piping:

 read_data.pl | analyze_data.pl | write_data_to_sql.pl

Some of those Perl script utilities can be (and historically were) reused:

 read_data.pl | analyze_data.pl | create_html_output.pl

Of course, the "analyze_data.pl" step could take 2 hours, and it was smarter to:

 read_data_from_sql.pl | create_html_output.pl

Why? Care to expand this section?!


Discussion of "not having to have identical columns" (to be reorganized):

I don't have a problem with your image, and appreciate the distinction between your 1st normal form image for processing vs. leaving databases normalized. Like an outer (or maybe inner) join in SQL. However, I did not understand the sentence that says "We also don't require that every record have identical columns." Do you mean that you support null values? If so, I agree: null is necessary. If not, I would like more information. --PaulMorrison

Null columns seem necessary. But I was referring to rows of the same "table" not having identical columns. Noah's "Normalized Information Compatible with Enos" (NICE) is not quite first normal. For example, consider these two rows of the "table" (shown in name=value format):

 type=book pages=100 illustrations=3
 shelf=1 row=3

This isn't common, but it does seem to be necessary (albeit unappealing). Sending the table above through the illustrations_per_page black box (subroutine), we would have this:

 type=book pages=100 illustrations=3 illustrations_per_page=0.03
 shelf=1 row=3 illustrations_per_page=NaN

Clearly, illustrations_per_page needs to be smart enough to avoid a division by zero error. In our experience, rows almost always have the same columns, thus avoiding the ugly situation above. --TomLaudeman
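A minimal sketch (plain Perl, not the actual Deft subroutine) of a per-row black box that tolerates missing columns:

 # One row in, one row out; guard against missing or zero pages.
 sub illustrations_per_page {
     my ($row) = @_;
     if (defined $row->{pages} && $row->{pages} != 0) {
         $row->{illustrations_per_page} =
             ($row->{illustrations} || 0) / $row->{pages};
     }
     else {
         $row->{illustrations_per_page} = 'NaN';
     }
     return $row;
 }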

I still think that conceptually, in a FBP environment, different black boxes could work on different tables. In fact, streams could turn into tables, and tables into streams. I believe the combination of these concepts would be extremely powerful! Think of a table as a weir in the stream :-) --PaulMorrison

In this sense Deft is FBP. Any Deft table can be sent to any Deft statement. Unlike your FBP, every Deft table has identical form (rows and columns) differing only in the number of columns (and number of rows, but in a declarative world row number is of no consequence).

Naturally, some Deft statements have required column(s) and will exit if expected column(s) don't exist.

FBP seems to use traditional data structures, which I suppose are serialized and sent from one black box to another. Deft does the same thing. Aside from a completely different implementation, the biggest difference between FBP and Deft is that Deft is declarative. --TomLaudeman

 type=book pages=100 illustrations=3
 shelf=1 row=3

My first reaction to the above is Ugh! My second reaction is that there is a concept behind this that is trying to get out. My guess, and it's just a guess, is that, since books sit on shelves (are there multiple rows on a shelf?), the second line is some kind of grouping construct. I think it would be more appropriate to add a "shelf" and "row" column to the "book" table, and make all table rows the same (with a null option for individual field values).

However, more generally, this grouping idea crops up very frequently in business applications: e.g. cities within provinces within countries, especially in reports designed for humans to read... Typically the group starts with a header, and ends with totals, averages, etc. Groups may be nested indefinitely. FBP's streaming concept lends itself very well to this concept. See also figs 10-2 and 10-3 in http://www.jpaulmorrison.com/fbp/simpapp3.htm

With your concept of a single table for a process, IMHO it would be cleaner, and less jarring to database types, to support additional tables accessed by foreign keys, rather than to allow heterogeneous tables. --PaulMorrison

I got frightened looking at figures 10-2 and 10-3 in http://www.jpaulmorrison.com/fbp/simpapp3.htm :-O

It looks like there are at least two iterators (Detail and substream), and it is necessary to do a few things exactly one time (open, master, close, and stream). First normal transforms all that to a single operation: process one record.

As I understand it, the FBP data reference to 'Detail.Dist code' would be something like these quasi-C examples:

 master.prod_no->detail.dist_code

or

 struct *prod_no_ref = master.prod_no;
 dist_code = prod_no_ref->detail.dist_code;

or

 detail[master.prod_no].dist_code;

First normal would drop one of the prod_no fields (a duplicate), and when the relation is flattened all the fields and subfields are merely columns in the record. dist_code is a column (or a variable). The data reference is merely

 dist_code

The first normal view is trivially simple. The relational data structure is an array of structs, or an array of struct pointers, and is complicated and messy.

Deft has a function that creates a column that describes a flattened dimension of the relation. For example, to display the data in 10-3 in a report, master needs to be flattened, and the combination of master/detail needs to be flattened. Each is a single statement in Deft, and we usually use dcc() (mnemonic: declare control column). Deft has an integrated template rendering engine that understands the data produced by dcc(). Each dcc() in the script has a corresponding statement in the template.

For the example in 10-3, Deft needs two dcc() calls (master, and master/detail), and two statements in the template. These replace nested loops in the FBP code and corresponding nested statements in the reporting tool.
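A hedged sketch of that flattening step in plain Perl (the field names follow the figure; the code and sample data are illustrative, not Deft's implementation):

 # A tiny sample master/detail structure (invented values).
 my @masters = (
     { prod_no => 'P100',
       details => [ { dist_code => 'D1' }, { dist_code => 'D2' } ] },
 );

 # Flatten the master/detail relation into first normal rows: every
 # detail field becomes a plain column, and master fields repeat.
 my @rows;
 foreach my $master (@masters) {
     foreach my $detail (@{ $master->{details} }) {
         push @rows, {
             prod_no   => $master->{prod_no},
             dist_code => $detail->{dist_code},
         };
     }
 }
 # After flattening, the data reference is simply $row->{dist_code}.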

Noah has found loads of math and database work (some done many years ago by people like Codd) that demonstrates various strengths of first normal. Noah also found a nice math paper that demonstrates that first normal has identical theoretical performance boundaries with other data representations. I haven't read the paper yet:

http://www.ucalgary.ca/%7Erzach/279/0702-003.pdf

Besides, Deft is declarative, and SQL database people should like that, even if they have some initial misgivings about first normal. --TomLaudeman

Conceptually there is no difference between rows with disjoint or overlapping columns and rows with nulls. As Tom mentions, to date we haven't tackled any problems that actually required this feature. The actual way to think of this would be:

 type=book pages=100 illustrations=3 shelf=NULL row=NULL
 type=NULL pages=NULL illustrations=NULL shelf=1 row=3

Now I can't conceive of any situation where this would actually exist, but it is legal, and my response is also Ugh. We too have convenient representation of arbitrary depth of relationship through one of two means. For relationships like the one you reference - a word is on a page, is in a book, is on a shelf, is in a room, is in a building, is on a street, is in a city, is in a country, is on a planet... - we simply widen the row. A relationship with 12 free variables (levels of indirection in a structure-based system) will have twelve columns, each containing the relationships held by this particular row. This allows the representation of arbitrary, even disconnected, graphs of relations.

But what about arbitrary depth relationships? You say you have built trees of structs that contained other trees of structs with depths of hundreds, thousands, or millions. How can a programmer cope with records with possibly billions of columns whose contents are known only at runtime? Well, the answer is another of my intuitive leaps: all such situations are relations of algorithmic construction. For example, a complete genealogy of the human race for the last 6000 years would be an immensely complicated graph. Instead of making every person their own column and devoting rows to each set of relationships, we can create a person column and then a column or columns containing his relationship to the global structure. Currently we have a function called dcc, short for declare control column, which populates a row with strings which embody this row's ordered and participatory relationship with all other rows on some relationship. Combine this with recursive functions, and a small handful of columns can represent massive, deep, and arbitrary nestedness with no more handles for the programmer to juggle than any other technique. --NoahHealy

I agree with Noah's point about widening the table, and you're right, these are legal rows, so the feature is available if needed. But I'm confused about the following:

a) You say you have built trees of structs that contained other trees of structs with depths of hundreds, thousands, or millions.

Did I really say that?! We have built streams where some or most of the IPs were tree IPs - in this case a tree IP acts like an eddy in the stream, somewhat the way your tables do... we didn't have trees pointing at trees - FBP ownership rules would have precluded that.

b) How can a programmer cope with records with possibly billions of columns whose contents are known only at runtime?

Where does the idea of billions of columns come from? In fact, why would you have a column per person? Why not just one table for relationships and resulting kids - one row per kid - so I assume the columns would be person 1, person 2, kid (null if none from that relationship). Oddly enough, this is pretty much how the GeneWeb program that I used on my web site looks at the world, and it can look at 2 individuals in the same tree and figure out that they are 2nd cousins, once removed! --PaulMorrison

> a) You say you have built trees of structs that contained other trees of structs with depths of hundreds, thousands, or millions.

Sorry, just a rhetorical device; some people have dealt with programs that traversed truly massive and massively interconnected data structures. Obviously this traversal is accomplished algorithmically. If Deft abandoned algorithmic traversal and forced programmers to create a distinctly named column to hold every potential relationship, and spewed them across every row, the conceptual size of a row would rapidly exceed human comprehension. For large problems it would even defy machine memories. So I highlighted our solution to that part of the problem.

I have a conversational writing style, and that can lead to the interjection of non sequiturs.

> Where does the idea of billions of columns come from? In fact, why would you have a column per person?

Just carrying out reductio ad absurdum arguments against myself. If you take the one column per relationship concept from paragraph 1 and apply it to the situation proposed in paragraph 2 you get to a very bad place.

The solution you mention is actually quite workable in Deft: by using recursive joins we can crawl the table from any local point without constructing the set of global relationships. --NoahHealy


Note from PaulMorrison to Tom and Noah:

I think I've got the concept of 1st normal form, and even rows with lots of nulls :-) , but I have a (probably dumb) question, which maybe you or Noah can answer for me. You have said several times that you see a similarity between FBP and Deft. While I agree that there are correspondences, to me the two key differences between FBP and older technologies are

data streams, and

connections specified in a network definition, external to the processes.

I think it's possible that these characteristics are intrinsic to Deft, but most of your documentation talks about the table(s), as this is certainly the novel aspect of what you are working on. This would be quite understandable. However, if Deft does not have these characteristics, then we can't really call it FBP-like. Of course, one or more Deft processes would fit beautifully into the FBP architecture, so IMO a Deft-FBP hybrid would be incredibly productive. Could you guys clarify?

One other point, Tom: yes the streaming concept does involve nested iterators, but IMO this doesn't make the logic more complex. You just do foreach's at each level of the hierarchy - in fact with an appropriate mini-language, you can factor the foreach's out, and just talk about what kind of processing you do for each type of Information Packet. If you combine this with a stack to hold context data, I think you could express a lot of logic. In Deft, if I understand it correctly, the foreach's sort of become SELECTs...?

--PaulMorrison

The character of the network definition is the widest point of divergence between what we are doing and FBP. In Deft we have no tradition as yet of GUI coding, nor have we explicitly laid out the set of queues that connect our operations. Those are viewed as compiler tasks. Each line of Deft acts as a notional black box in a flow system. If one wished to create a system that translated flow graphs to code, then creating 'black boxes' in Deft should be an excellent method to accomplish this. In fact, in a given environment I believe it is probably useful to construct such a graphical mini-language for common use. It was in creating the foundations for such a system that Deft's render() function was first conceived and coded up. render() does a stream-to-text-file translation and was created with database-sourced publishing in mind. The reason our streams are in the form of relational tables is that it makes a compiler capable of creating the appropriate network easier to write. Also, the optimizations you note in your book, rearranging execution orders and running parallel processing boxes, should all be achievable purely by the compiler, leaving code that is very brief. For example, there is no concept of foreach in Deft because it is implicit in every statement that it is foreach row.

Deft's actual implementation has gone through several phases, but I believe every iteration to date has had some sort of queue standing between statements. Even the single process version creates an in-memory queue, and access to whichever queue the runstyle is using is through the unwind() and rewind() function calls. In that sense coding the internals of Deft is flow based. Anyway, hope this helps clear things up a bit.

--NoahHealy

Yes, Deft does have streams connecting processes. However, Deft allocates these streams dynamically at runtime and probably always will (it is possible to pre-compile, but I'm not sure that's a meaningful optimization.) The single tasking version just runs the code sequentially on one machine. FBP seems to have different capabilities at different ports on different machines. In the Deft world there is a codebase, and there are some number of machines. Any machine is capable of running any part (or all) of the code.

The parallel version currently uses a very simple model of just starting a process on the next host in the list. We open a stream to the host and send an id for the code that host needs to run. Every host contacts a central database to retrieve its code and its stream of data. This is robust, but clearly open to optimization. Implementation is sure to become more complex, but the end-programmer won't see any change (except more speed).

End-programmer Deft code doesn't come in "single tasking" and "parallel" versions. Any Deft code will run in one task or in parallel. This isn't possible in the usual programming languages, where the parallel version of the code is dramatically different from the single tasking version.

We are still trying to understand weaknesses in the Deft model. Currently we've discovered that first normal/declarative requires a mental shift, but the code is (so far) not complex or weird.

--TomLaudeman

