Dataflow and Query Processing
Back in the early '80s, there was a lot of hype around the Fifth Generation project that the Japanese unveiled to the world. So much so that Time magazine had some nice graphics explaining non-von Neumann dataflow architectures. Those images have stuck with me ever since.
The peculiar thing about dataflow architectures is that they are ideal if you want to squeeze every last bit of parallelism out of a program. It also turns out that dataflow is an ideal way of executing workflows.
See, some workflow languages tend to express themselves in a control-flow manner; that is, the sequencing is explicit. The drawback is that explicit sequencing can impose scheduling constraints that are unnecessary, which leads to inefficiencies that you didn't plan for.
The better way is to declare what needs to be done and have the dataflow machine schedule each task once the tasks or inputs it depends on have arrived. In short, a declarative specification may lead to better efficiency than an imperative one.
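To make the scheduling idea concrete, here is a minimal sketch in Python. It is only an illustration, not a real dataflow machine: the function `run_dataflow` and the little order-pricing tasks are names I made up for this example. Each task declares what it depends on, and the loop fires a task as soon as all of its inputs exist, regardless of the order in which the tasks were written down.

```python
def run_dataflow(tasks, inputs):
    """Fire each task once all of its declared dependencies have values.

    tasks:  name -> (list of dependency names, function of those values)
    inputs: name -> value for data that is available from the start
    Returns (all computed values, the order in which tasks fired).
    """
    values = dict(inputs)
    pending = dict(tasks)
    order = []
    progressed = True
    while pending and progressed:
        progressed = False
        for name in list(pending):
            deps, fn = pending[name]
            if all(d in values for d in deps):
                # Every input has arrived, so the task may fire.
                values[name] = fn(*(values[d] for d in deps))
                order.append(name)
                del pending[name]
                progressed = True
    if pending:
        raise ValueError(f"tasks can never fire: {sorted(pending)}")
    return values, order


# Hypothetical tasks, deliberately declared "backwards".
tasks = {
    "total": (["subtotal", "tax"], lambda s, t: s + t),
    "tax": (["subtotal"], lambda s: s // 10),
    "subtotal": (["price", "qty"], lambda p, q: p * q),
}
inputs = {"price": 20, "qty": 5}

values, order = run_dataflow(tasks, inputs)
```

Nothing in `tasks` says what runs first. The order falls out of the dependencies, and a real engine would be free to run any tasks that become ready at the same time in parallel.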
This leads me to queries, which traditionally have been one of the more successful stories of declarative programming. If I want to retrieve information and I don't want to be concerned with the sequence in which I find all the necessary parts, then a query is the ideal expression. Just as we don't concern ourselves with how a relational database makes its query plan, we shouldn't have to concern ourselves with how a workflow is put together.
What I'm driving at is that, for programming business process management systems, a query engine that works in a distributed, asynchronous fashion could be an extremely powerful tool. The way it works is that a query is a request for work that spans multiple systems. The sequencing is discovered by the system based on the metadata defining the structure of the participants. It's just like a regular database, only it's federated and asynchronous.
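Here is a rough sketch of what I have in mind, again in Python and again purely hypothetical. The participants (`crm`, `orders`, `credit`, `billing`), their metadata, and the functions `call` and `run_query` are all invented for illustration, and `call` just fakes a remote system. The query only names the goal; the engine works out the sequencing from what each participant says it requires and provides, and calls every participant that is ready at the same moment concurrently.

```python
import asyncio

# Assumed metadata: participant -> (fields it requires, fields it provides).
PARTICIPANTS = {
    "billing": ({"order_total", "credit_score"}, {"invoice"}),
    "crm": ({"email"}, {"customer_id"}),
    "orders": ({"customer_id"}, {"order_total"}),
    "credit": ({"customer_id"}, {"credit_score"}),
}


async def call(name, provides):
    """Stand-in for an asynchronous request to a remote participant."""
    await asyncio.sleep(0)
    return {field: f"{field}-from-{name}" for field in provides}


async def run_query(goal, known, participants):
    """Resolve `goal` by calling participants as their inputs become known."""
    facts = dict(known)
    remaining = dict(participants)
    waves = []
    while goal not in facts:
        ready = [n for n, (req, _) in remaining.items() if req.issubset(facts)]
        if not ready:
            raise LookupError(f"no participant can help produce {goal!r}")
        # Everyone who is ready is called concurrently.
        results = await asyncio.gather(
            *(call(n, remaining[n][1]) for n in ready)
        )
        for n, result in zip(ready, results):
            facts.update(result)
            del remaining[n]
        waves.append(sorted(ready))
    return facts[goal], waves


invoice, waves = asyncio.run(
    run_query("invoice", {"email": "a@example.com"}, PARTICIPANTS)
)
```

This simple version calls every participant that happens to be ready, even one the goal doesn't need. A real planner would prune those by working backwards from the goal. It also waits for a whole wave to finish before starting the next, where a truly asynchronous engine would fire each participant the moment its own inputs arrive.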
I've heard of distributed information retrieval (IR), and I vaguely recall that certain workflow descriptions are goal-oriented. However, I'm wondering: has anyone done research where the participants of a query deliver in an asynchronous manner? Something to think about!

