Category Archives: stuff

Articles from the Plone version of the blog.

How to Achieve Highly Scalable Transaction Processing

I’ve been asked to share my thoughts regarding highly scalable Transaction Processing. Transaction Processing has been an exhaustively researched area in computing for several decades, and a lot of the brightest minds have likely beaten this subject to death by now. Scaling transaction processing is an extremely difficult problem because it must reconcile two diametrically opposed forces: ACID (single entity) versus distributed computing (replicated entities).

Transaction Processing systems have historically been built using either of two communication mechanisms, RPC and Queued. Application servers like EJB containers can trace their lineage back to Transaction Monitors. These subscribed to an RPC style of communication to do transaction processing. One would use a JCA adapter to communicate with XA resources managed by JTA. Underneath the covers, RPC would use the Two-Phase Commit protocol (TPC). In its most general form, the Paxos algorithm would be the mechanism to ensure consistency amongst distributed systems. In the quest to sell more capable hardware at exponentially higher prices, an RPC based approach seems to be an ideal motivator.

The problem, though, with RPC based Transaction Processing is that it is difficult to scale to many cheaper boxes. (Scaling to use more servers is more economical since the cost increases only linearly and the horsepower isn’t bound to the latest technology the industry can provide.) The queued approach, however, has been shown over the years (TBD: show reference) to be the right approach to achieving scalability. The balance one works with is Latency versus Consistency. In the RPC approach, latency is compromised for consistency in that TPC-like communications are very expensive. In the Queued approach, latency is also compromised in favor of consistency in that communication processing is performed asynchronously and not immediately.

A good analogy for explaining the difference between the approaches is the difference between pessimistic and optimistic locking. In the pessimistic approach, one takes a preventive stance, such that all data is always synchronized. This is done by reserving resources prior to execution. This reservation, however, reduces the amount of concurrency in the system. Think for example of a version control system where a developer has a file locked, with other developers having to wait until it is unlocked. By contrast, an optimistic locking approach (a.k.a. versioning) allows multiple developers to continue their work in a non-blocking manner. Queueing is by its nature non-blocking and thus ensures maximum parallelism. However, there will always be that critical section that works on shared resources. The hope, however, is that this critical section is performed on a single machine and in a manner that keeps locks short-lived.
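The optimistic (versioning) side of that analogy can be sketched in a few lines. This is only an illustration under stated assumptions: `VersionedStore` and `VersionConflict` are hypothetical names, the store is in-memory, and the only lock held is the short compare-and-swap at write time.

```python
import threading


class VersionConflict(Exception):
    """Raised when a record changed since the caller read it."""


class VersionedStore:
    """Hypothetical in-memory store where every record carries a version."""

    def __init__(self):
        self._data = {}  # key -> (version, value)
        self._lock = threading.Lock()  # guards only the short compare-and-swap

    def read(self, key):
        # Readers never reserve anything; they just take a snapshot.
        return self._data.get(key, (0, None))

    def write(self, key, expected_version, value):
        # The critical section is kept short: compare the version, then swap.
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                raise VersionConflict(key)
            self._data[key] = (current_version + 1, value)
            return current_version + 1


store = VersionedStore()
store.write("file.txt", 0, "initial")

# Two developers read the same version and work without blocking each other.
v_alice, _ = store.read("file.txt")
v_bob, _ = store.read("file.txt")

store.write("file.txt", v_alice, "alice's edit")  # first writer wins
try:
    store.write("file.txt", v_bob, "bob's edit")  # stale version is rejected
    bob_conflicted = False
except VersionConflict:
    bob_conflicted = True  # Bob must re-read, merge and retry
```

Nobody waits while the others work; the cost is that the late writer has to detect the conflict and retry.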

At an abstract level you can gain insight from Pat Helland’s ‘Life Beyond Distributed Transactions’ paper, where he emphasizes the constraints that a system requires to enable transactions. A truncated list of his requirements is enumerated as follows:

  • Entities are uniquely identified – each entity which represents disjoint data (i.e. no overlap of data between entities) should have a unique key.
  • Multiple disjoint scopes of transactional serializability – in other words, there are these ‘entities’, and you cannot perform atomic transactions across them.
  • At-Least-Once messaging – that is, an application must tolerate message retries and out-of-order arrival of messages.
  • Messages are addressed to entities – that is, one can’t abstract away from the business logic the existence of the unique keys for addressing entities. Addressing however is independent of location.
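The At-Least-Once requirement above is easier to see with a small sketch. The names here (`AccountEntity`, `deliver`, the `entities` registry) are hypothetical and not taken from the paper; the point is only that an entity addressed by its key remembers which messages it has applied, so retries are harmless.

```python
class AccountEntity:
    """One entity, addressed by its unique key, that tolerates redelivery."""

    def __init__(self, key):
        self.key = key
        self.balance = 0
        self._seen = set()  # ids of messages already applied

    def handle(self, message_id, amount):
        # A retried message is recognised and ignored, so handling is idempotent.
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        # Addition is commutative, so out-of-order arrival gives the same result.
        self.balance += amount
        return True


entities = {}  # hypothetical registry: entity key -> entity, wherever it lives


def deliver(entity_key, message_id, amount):
    entity = entities.setdefault(entity_key, AccountEntity(entity_key))
    return entity.handle(message_id, amount)


# Message m2 arrives before m1, and m1 is delivered twice.
deliver("acct-42", "m2", -30)
deliver("acct-42", "m1", 100)
deliver("acct-42", "m1", 100)
```

Operations that do not commute would need more than a seen-set (for example sequence numbers per sender), which this sketch deliberately leaves out.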

He writes about Entities that have extremely limited distributed transaction capabilities. To achieve scalability one has to define bundles of data that are fixed in their ability to participate in transactions. These bundles of data, or partitions, cannot perform transactions with other partitions. When partitioning for scalability, you will need to route traffic to the specific partition that is capable of performing the transaction completely locally. It’s kind of like the dual of NoSQL. In NoSQL, data is duplicated (i.e. denormalized) in many places to achieve maximum scalability. In a transactional system, the data that is processed in a transaction should reside logically in only one place.
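A minimal sketch of that routing step, assuming a fixed number of partitions and a hash of the entity key as the routing rule (both are assumptions for illustration; `Partition` and `Router` are hypothetical names):

```python
import hashlib


class Partition:
    """Owns a disjoint set of entities; all work on them happens locally."""

    def __init__(self, name):
        self.name = name
        self.state = {}  # entity key -> value

    def apply(self, key, delta):
        # A local, single-box update: no coordination with other partitions.
        self.state[key] = self.state.get(key, 0) + delta
        return self.state[key]


class Router:
    def __init__(self, partitions):
        self.partitions = partitions

    def partition_for(self, key):
        # A stable hash (not Python's per-process randomized hash()) so that
        # every router instance sends the same key to the same partition.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.partitions)
        return self.partitions[index]

    def send(self, key, delta):
        return self.partition_for(key).apply(key, delta)


router = Router([Partition("p%d" % i) for i in range(4)])
router.send("entity-17", 5)
router.send("entity-17", -2)
owner = router.partition_for("entity-17")
```

Every message for a given key lands on the same partition, so the data for that entity lives logically in exactly one place.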

Cameron Purdy’s talk “Traditional Programming Models: Stone Knives and Bearskins in the Google Age” mentions a FOREX exchange clearing house where a third of the transactions are EURUSD. This requires the EURUSD trades to be routed to a single box that handles the EURUSD market. Only the order submission and the order fill are performed against the database for durability reasons; everything else is just flow-through, in-memory data. This system was 1000x faster than the original system, and orders were processed in under 3ms.

Billy Newport, in his slides “Using Chained Transactions for Maximum Concurrency Under Load”, also employs the same basic blueprint in showing how he scales an eCommerce system. The partitioning that he performs is that each SKU is an entity that ensures the transactional integrity of its data. The messages routed through the system contain instructions and their corresponding undo instructions. This undo capability allows the system to perform cascading compensating transactions in the event that a transaction down the chain isn’t able to complete successfully.
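The instruction-plus-undo idea can be sketched as follows. This is not Newport’s implementation, only a minimal illustration; `Step`, `run_chain`, `reserve` and `release` are hypothetical names, and the chain runs in one process rather than across partitions.

```python
class Step:
    """A message carrying an instruction and its corresponding undo."""

    def __init__(self, name, do, undo):
        self.name = name
        self.do = do
        self.undo = undo


def run_chain(steps):
    """Run steps in order; on failure, undo the completed ones in reverse."""
    completed = []
    for step in steps:
        try:
            step.do()
        except Exception:
            for done in reversed(completed):
                done.undo()  # cascading compensation back up the chain
            return False
        completed.append(step)
    return True


stock = {"sku-1": 3, "sku-2": 0}


def reserve(sku):
    def do():
        if stock[sku] <= 0:
            raise RuntimeError("out of stock: " + sku)
        stock[sku] -= 1
    return do


def release(sku):
    def undo():
        stock[sku] += 1
    return undo


order = [
    Step("reserve sku-1", reserve("sku-1"), release("sku-1")),
    Step("reserve sku-2", reserve("sku-2"), release("sku-2")),  # will fail
]
ok = run_chain(order)
```

Each step only touches one SKU, so every `do` and `undo` can run as a local transaction on the partition that owns that SKU.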

In summary, the fundamental guideline to enabling highly scalable transactions is simple: “can you find a partition that allows your transactions to be performed in a single box?” This partitioning is without a doubt dependent on a specific set of use cases. So for FOREX, it is the currency pair (e.g. EURUSD) market, and for eCommerce it may be the Product SKU and its inventory. However, what if your use cases don’t have a natural partition? The solution then would be analogous to how airlines handle overbooking of their flights. That is, hope that the likelihood is low, and if the scenario does occur, use out-of-band compensations to correct the violations of the business rules. How this kind of compensation mechanism is implemented is an extremely interesting topic by itself.

Interestingly enough, my high level thoughts about transactions haven’t changed much over the years.

Cloud Computing is Software in a Box

Cloud Computing is all the rage these days. The problem, though, is that there are too many definitions floating around. I would nevertheless like to propose yet another definition. Cloud Computing is Software in a Box, or more precisely, “Cloud Computing is Software in a Virtual Box”.

What is “Software in a Box”? This is an idea that has been around for a long time. I wrote a blog entry about it back in 2003 (see: “An Alternative Way Of Packaging Software: Hardware Included!”). For Cloud Computing this title would now read: “An Alternative Way Of Packaging Software: Data Center Included!”.

Years ago, Bill Gates referred to PC manufacturers like Dell and Compaq (now HP) as Value Added Resellers (VARs). That is, all they did was add value to Microsoft’s software by bundling their hardware with it. Selling software is all about packaging. Most software is bought because its packaging delivers convenience to customers. It is not usually whiz-bang technology that makes the sale; rather, it is about ease of use, that is, usability. What can be easier than receiving a box from a vendor, then plugging it into the wall and into the network to get it running? Well, it turns out that receiving a virtual box from a vendor can be even easier!

The benefits of a “Software in a Box” were described in this old 2003 article:

  • Reduced development complexity – That is, fewer configurations to support.
  • Higher Performance – Performance can be tuned to the hardware delivered with the software.
  • Better Security – The box can be hardened and tested prior to delivery.
  • Easier Provisioning – Just add power and network connectivity.
  • Reliability – Less configuration implies fewer parties to point fingers at.
  • Pricing – People like paying for something they can touch.
  • Distribution – Ride on the coat tails of hardware vendors.

Fast forward now, 8 years later, and we have these same benefits for Cloud Computing:

  • Reduced development complexity – Software can be pre-configured, tested and hardened for the target cloud platform. See: AWS CloudFormation for handling complex networks.
  • Higher Performance – Software can be pre-tuned to the target cloud platform. For example, if one were delivering a Machine Learning based application, one could tune a solution for Amazon’s GPU cluster.
  • Better Security – The solution can be hardened in the cloud.
  • Easier Provisioning – Just sign up on a website.
  • Reliability – The cloud provider takes complete responsibility even for operational issues. One doesn’t need the expertise to configure a high availability setup.
  • Pricing – Pay as you go, use only what you need.
  • Elasticity – Seamlessly scale when demand increases.

In a former life as a Product Architect, I was working on a slide deck that showed how my company’s solution would fit in a prospective client’s network. I had drawn a solution that involved multiple boxes to cover the scalability, availability and heterogeneity of the solution. The feedback that I received was that there were too many boxes! I had also noticed that, come deployment time with a customer, it became painstaking to attend so many network interconnectivity meetings. My eventual solution to this packaging problem was that the software would now be deployed in a blade chassis with all the components pre-configured into blades and the network pre-configured with a virtual router in a blade. We were now back to a single 16U box!

The drawback of Cloud Computing as compared to Software in a Box is the fact that customers can’t hold it and, as a consequence, can’t store it on their premises. At a mammalian brain level, a lot of people can be very uncomfortable with this. There are also, of course, the concerns of hosting in a shared network, the security of data on shared storage and the robustness of network connectivity. To overcome these fears, one could of course deliver a “Cloud in a Box”.

    A Pattern Language for High Scalability

    A couple of years ago (i.e. 2007), I wrote a short blog entry commenting on Pat Helland’s paper “Life beyond Distributed Transactions: an Apostate’s Opinion” (worthy of a second and third read). I found it curious that it was recently re-discovered (see: “7 Design Patterns for Almost Infinite Scalability”). It is, though, a treasure trove of implementation ideas on achieving high scalability. It made me wonder whether anyone else had created a pattern language for high scalability. I have seen a few attempts, and this entry is a quick attempt to extend those and conjure up a new one. Hopefully it serves as a good starting point for further refinement and improvements.

    At the most abstract level there is Daniel Abadi’s PACELC classification for distributed systems. IMHO, PACELC, as compared to Brewer’s CAP theorem, is a more pragmatic description of the trade-offs one will make when designing a distributed system. PACELC asks: if there is a network (P)artition, does the system favor (A)vailability or (C)onsistency; (E)lse, in the normal state, does it favor (L)atency or (C)onsistency?

    Cameron Purdy (founder of Oracle’s Coherence product) has a presentation where he proposes these building blocks for scaling-out:

    • Routing
    • Partitioning
    • Replication (for Availability)
    • Coordination
    • Messaging

    This short list is rumored to comprehensively cover every distributed system that can be encountered in the wild. If I applied the PACELC to this classification, I may be able to select Routing, Replication and Coordination techniques that favor either Consistency or Availability. Also, I may select Routing, Coordination and Messaging that favors Latency or Consistency.

    Jonas Boner, of whom I have been a big fan for a very long time (see: AspectWerkz), has a great slide deck that comprehensively enumerates, in detail, existing techniques to achieve scalability, with availability and stability thrown in for good measure. Shown below is how this list may be mapped into Purdy’s classification (I have taken the liberty of refining the original classification). I’ve marked which trade-off is favored, either Latency or Consistency, where I thought it made sense.

    • State Routing
      • Distributed Caching (Latency)
      • HTTP Caching (Latency)
    • Behavior Routing
      • Fire-forget (Latency)
      • Fire-Receive-Eventually (Latency)
      • ESB
      • Event Stream Processing (Latency)
      • CQRS (Consistency)
      • Dynamic Load Balancing
    • Behavior Partitioning
      • Loop Parallelism
      • Fork/Join
      • Map/Reduce
      • Round Robin Allocation
      • Random Allocation
      • Weighted Allocation
    • State Partitioning (Favors Latency)
      • Distributed Caching
      • HTTP Caching
      • Sharding
    • State Replication (Favors Availability in Partition Failure)
      • Master Slave-Synchronous (Consistency)
      • Master Slave-Asynch (Latency)
      • Master Master-Synchronous (Consistency)
      • Master Master-Asynch (Latency)
      • Buddy Replication-Synchronous (Consistency)
      • Buddy Replication-Asynch (Latency)
    • State Coordination
      • Message Passing Concurrency (Latency)
      • Software Transactional Memory (Consistency)
      • Shared State Concurrency (Consistency)
      • Service of Record (Consistency if Synchronous)
    • Behavior Coordination
      • SIMD
      • Master/Worker
      • Message Passing Concurrency
      • Dataflow Architecture
      • Tuple Space
      • Request Reply
    • Messaging
      • Publish-Subscribe (Latency)
      • Queuing (Consistency)
      • Request Reply (Latency)
      • Store-Forward (Consistency)

    The trade-off between Consistency and Availability arises with the implementation of Replication by selecting a Synchronous versus Asynchronous Messaging (or even Coordination) approach. Employing Partitioning favors Latency and never Consistency (this should be obvious). The remaining patterns of Routing, Coordination and Messaging provide the flexibility where one can choose either Latency or Consistency.
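To make the Synchronous versus Asynchronous choice concrete, here is a minimal single-process sketch. `Master`, `Replica` and `flush` are hypothetical names, and `flush` merely stands in for a background process that ships queued updates; real replication protocols are considerably more involved.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}


class Master:
    def __init__(self, replicas, synchronous):
        self.data = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending = deque()  # updates not yet shipped (async mode only)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # The caller waits for every replica: higher latency, consistent reads.
            for replica in self.replicas:
                replica.data[key] = value
        else:
            # The caller returns immediately: lower latency, replicas lag behind.
            self.pending.append((key, value))

    def flush(self):
        # Stands in for a background shipper draining the replication queue.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value


sync_replica = Replica()
Master([sync_replica], synchronous=True).write("x", 1)

async_replica = Replica()
async_master = Master([async_replica], synchronous=False)
async_master.write("x", 1)
stale_read = async_replica.data.get("x")  # replica has not seen the write yet
async_master.flush()
```

The asynchronous master answers the writer sooner, at the price of a window in which a replica can serve stale data.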

    This for now appears to be a workable starting point, although there’s a lot of room for improvement. For example, in the Replication category, Master-Master, or the more general form of Buddy Replication, clearly favors Consistency at the cost of Latency regardless of the choice of Synchronous or Asynchronous messaging and coordination strategy. I think the article “Concurrency Controls in Data Replication” provides a better classification of replication techniques.

    There are also some inconsistencies that appear to need further refinement. For example, the Fire and Forget Routing strategy appears to favor Latency in the sense that it is non-blocking (see: “Scalability Best Practices: Lessons from eBay”); however, the underlying messaging pattern may involve a queue, which clearly favors Consistency over Latency. So it favors Latency from the caller’s perspective, but Consistency from the receiver’s side (i.e. everything is serialized). In general one may say that decoupling (or loose coupling) favors latency while tight coupling favors consistency. As an example, optimistic concurrency is loosely coupled and therefore favors latency.

    To summarize, there are a lot of techniques that have been developed over the past few decades. Concepts like Dataflow and Tuple Spaces and many other Parallel Computation techniques have been known since the ’70s. The question an architect should ask today (which wasn’t asked back then) is which technique to use given the trade-offs defined by PACELC. The shortcoming of this pattern language is that it does not provide a prescription of how to achieve high scalability. It only provides the patterns one would find in a high scalability system.

    The selection of the architecture should be clearly driven by the use-cases and requirements. That is, consider vertical (see: “Nuggets of Wisdom from eBay’s Architecture”) as well as horizontal partitioning. Finally, unless a service has a limited set of use cases, one can’t expect to build a one-size-fits-all architecture in the domain of high scalability.

    P.S. I recently stumbled upon a very impressive paper by James Hamilton from Microsoft. He writes about the important considerations when designing a high scalability system from the operational perspective. This kind of insight is extremely hard to come by. Not many software developers have the intuition to understand what goes on in the data center. In my next entry, I’ll attempt to incorporate some of Hamilton’s ideas to improve this pattern language.

    Is High Scalability SOA an Oxymoron?

    All too many Service Oriented Architecture (SOA) practitioners seem to believe that because SOA deals with distributed computing, scalability is a given. The reality, however, is that conventional SOA practices tend to work against the development of high scalability applications. This article shows the properties of a system that can achieve high scalability and then contrasts it with conventional SOA practices.

    The patterns found in a system that exhibits high scalability are the following:

    • State Routing
    • Behavior Routing
    • Behavior Partitioning
    • State Partitioning
    • State Replication
    • State Coordination
    • Behavior Coordination
    • Messaging

    This has been discussed in a previous blog entry, “A Pattern Language for High Scalability”. SOA based systems conventionally cover Routing, Coordination and Messaging. However, the patterns of Partitioning and Replication are inadequately addressed by SOA systems. For reference, one can refer to the SOA Patterns book that I’ve covered in this review. The words “Partitioning” and “Replication”, unsurprisingly, can’t be found in the book’s index. Scalability apparently isn’t a concern to be addressed by SOA patterns.

    What then are the patterns that we can introduce to SOA to ensure scalability? Here are a couple of suggested patterns from the previous article:

    • Behavior Partitioning
      • Loop Parallelism
      • Fork/Join
      • Map/Reduce
      • Round Robin Allocation
      • Random Allocation
      • Weighted Allocation
    • State Partitioning (Favors Latency)
      • Distributed Caching
      • HTTP Caching
      • Sharding
    • State Replication (Favors Availability in Partition Failure)
      • Synchronous Replication with Distributed Locks and Local Transactions
      • Synchronous Replication with Local Locks and Distributed Transactions
      • Synchronous Replication with Local Locks and Local Transactions
      • Asynchronous Replication with Update Anywhere
      • Asynchronous Replication with Update at the Master Site only

    How can these patterns be manifested in a SOA system?

    To achieve Behavior Partitioning, the constructs of the Command Pattern (see: Command Pattern) and the Functor Pattern can be used. In the conventional SOA architecture, behavior (as in executable code) needs to be propagated through the network, to be executed by receiving services. In lieu of a commonly agreed standard, one may employ XQuery as a stand-in for this capability. One can therefore define services to accept XQuery in a way analogous to how Semantic Web systems accept SPARQL. A key to achieving scalability is that behavior be allowed to move close to the data that it will act on. Behavior that works on data through remote invocations is guaranteed to kill scalability. See “Hot Trend: Move Behavior To Data For A New Interactive Application Architecture”.
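The difference between pulling data to the behavior and pushing behavior to the data can be sketched as below. A plain Python object stands in for the XQuery payload (an assumption made only to keep the example runnable), and `DataService`, `CountAbove` and `rows_shipped` are hypothetical names.

```python
class CountAbove:
    """A command: behavior packaged as data so it can travel to a service."""

    def __init__(self, threshold):
        self.threshold = threshold

    def execute(self, rows):
        return sum(1 for row in rows if row > self.threshold)


class DataService:
    """Holds one partition of the data and runs commands next to it."""

    def __init__(self, rows):
        self._rows = rows
        self.rows_shipped = 0  # counts items that left the service

    def fetch_all(self):
        # Remote-invocation style: every row crosses the network.
        self.rows_shipped += len(self._rows)
        return list(self._rows)

    def submit(self, command):
        # Behavior-to-data style: only the small result crosses the network.
        self.rows_shipped += 1
        return command.execute(self._rows)


service = DataService(list(range(1000)))

remote_style = CountAbove(989).execute(service.fetch_all())
shipped_remote = service.rows_shipped

service.rows_shipped = 0
local_style = service.submit(CountAbove(989))
shipped_local = service.rows_shipped
```

Both calls compute the same answer; what changes is how much data has to move to get it.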

    To achieve State Partitioning, SOA based systems need to adopt the notion of persistent identifiers of data. The WS-* standards have the notion of WS-Addressing, which is typically used to reference endpoints as opposed to actual entities. What is needed is for this addressing, or these persistent identifiers, to act analogously to Consistent Hashing so that entities may be partitioned and made accessible through multiple endpoints. Identifier based services would need to be stood up to perform the redirection to the endpoints.
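A minimal consistent hashing sketch of such an identifier based service follows. The endpoint URLs and `urn:entity:` identifiers are made up for illustration, `ConsistentHashRing` is a hypothetical name, and the use of 64 virtual nodes per endpoint is an arbitrary choice.

```python
import bisect
import hashlib


def _hash(text):
    # Stable across processes, unlike the built-in hash() for strings.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ConsistentHashRing:
    def __init__(self, endpoints, vnodes=64):
        self._ring = []  # sorted list of (point, endpoint)
        self._vnodes = vnodes
        for endpoint in endpoints:
            self.add(endpoint)

    def add(self, endpoint):
        for i in range(self._vnodes):
            point = _hash("%s#%d" % (endpoint, i))
            bisect.insort(self._ring, (point, endpoint))

    def remove(self, endpoint):
        self._ring = [entry for entry in self._ring if entry[1] != endpoint]

    def lookup(self, identifier):
        # Walk clockwise to the first virtual node at or after the identifier.
        index = bisect.bisect_left(self._ring, (_hash(identifier), ""))
        return self._ring[index % len(self._ring)][1]


endpoints = ["http://a.example/svc", "http://b.example/svc", "http://c.example/svc"]
ring = ConsistentHashRing(endpoints)
ids = ["urn:entity:%d" % i for i in range(1000)]
before = {i: ring.lookup(i) for i in ids}

ring.remove("http://c.example/svc")
after = {i: ring.lookup(i) for i in ids}

# Only identifiers that lived on the removed endpoint should have moved.
moved = [i for i in ids if before[i] != after[i]]
```

The property that matters for partitioning is that removing (or adding) an endpoint relocates only the entities that endpoint owned, rather than reshuffling everything.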

    Finally, there is the issue of Replication to support availability and fail-over. The Identifier based services described earlier may function as routers to handle the fail-over. Alternatively, one may employ proxy servers in the manner described in A New Kind of Tiered Architecture. The replication capability, however, will require the exposure of new kinds of services that support a replication protocol. The most basic of these would be to provide a Publish and Subscribe interface.

    To conclude, high scalability in SOA may indeed be possible. It is a bass-ackwards way of achieving high scalability, but if your only option is to use SOA, then there may just be a possibility to achieve it.

    Software Development Trends and Predictions for 2011

    Oh how time flies! I’ve been quite remiss in providing my yearly predictions in the software development space. It must be the mini economic depression that we’ve gone through, which just doesn’t appear to be ending for many folks. I am surprised myself to find that the last time I published my predictions was in 2007. That’s a year before the epic market crash.

    Well, it does turn out that most of my predictions (or rather observations) came to fruition. That is,

    Virtualization, Cloud Computing, Dynamic Languages, Javascript Browser Integration, RIA frameworks, openID, the death of Webservices and life after EJB are all common features of today’s software development landscape.

    What is surprising is how little these trends have changed in the 4 years since I made them. There appear to be just steady improvements (or declines) on a year to year basis. What is interesting to note is that when a concept or technology is in decline, it does persist. The only question is the velocity of the decline. Take for example WS-* Standards and EJB: over the past 4 years progress in this space has slowed to a crawl (EJB) or halted completely (WS-*). Of course these spaces are not entirely dead economically. Massive amounts of investment have been made in these two specific areas, and there certainly is a bias and a vested agenda to continue to pursue them. These are today’s generation’s COBOL legacy apps.

    My other 2007 predictions, Parallel Programming (at the CPU level) and the Semantic Web, have made less discernible progress. There of course has been massive progress in the CUDA space and in LinkedData, but these technologies are not as ubiquitous as I had predicted.

    There is, however, one prediction that is experiencing a trend reversal. JCR, in the years following 2007, made robust progress in that it was common to see many new Java projects based on it. However, in recent years that progress has stalled, likely because NoSQL has emerged as a compelling alternative for document storage.

    Despite the economy (which I attribute more to global equilibrium), these are very exciting times for software development. There is a sea change that is happening that will rapidly change the face of the industry. Here are my 2011 predictions for software development.

    1. Enterprise AppStores – Okay, folks have been pointing this out for years using different names (Widgets, Gadgets, Midlets, etc.). I mean, Java itself got its boost because of Applets. We now know what has become of applets. The Thin Client revolution was hyped a decade ago. Well, it turns out it has arrived, courtesy of faster and more energy efficient processors and gesture driven devices. Enterprises will demand the kind of apps they find on their smart phones and tablets. Gone will be monolithic apps with loads of features that are rarely used. In will be highly specialized applications with high usability. Corporations, the control freaks that they are, will insist on governance. This ultimately implies an AppStore on a per-enterprise basis.

    2. Javascript Virtual Machines – This trend has been getting bigger every year, and as much as we find the language and environment to be difficult and unscalable, the community will have to deal with it. A majority of new development projects will need to find out how they can scale Javascript development or die trying. There are several competing alternatives: use Javascript frameworks like jQuery and YUI, treat Javascript as an assembly language as GWT and CoffeeScript do, or pretend it doesn’t exist as JSF, Vaadin and RAP do. Your poison would likely be chosen based on the kind of app you are building. What we will see is that a major percentage of the app will be executed in a Javascript container, whether that be in the browser, the mobile app (PhoneGap) or the server (see: NodeJS, Jaxer).

    3. NoSQL – This is a harder call to make simply because not many applications are going to need this kind of scalability. That is, if you look at it from the point of view of large monolithic applications. However, if an enterprise is to serve thousands of little applications, it had better quickly figure out a way to manage that proliferation without the conventional overhead of managing the typical enterprise app. Virtualization gets you halfway there, but cloud based infrastructure, particularly storage managed by specialized NoSQL servers, would be the economical way to go. The enterprise will consist of horizontally scalable ReSTful accessible endpoints that will be spewing HTML and Javascript. End users will be able to quickly compose apps and install them on their own devices.

    4. Semantic Indexing – So now that you have these massively horizontally scalable data stores, you are faced with an even bigger problem of finding anything. Full text engines like Lucene/Solr will be standard fare; however, the Enterprise will demand higher fidelity in its searches, and this will drive demand for indexing beyond inverted indexes. Expect to see servers with hundreds of gigabytes of DRAM to enable this.

    5. Big Analytics – BigData will need analysis. Analysis is only as good as the quality of your data. There will be a rising demand for higher quality data sources, data quality tooling and of course alternative ways of analyzing and visualizing data. There will be intense debates between Map-Reduce NoSQL and traditional OLAP vendors. NoSQL will need to progress quickly enough before customers realize its inadequacies. Look to Hadoop to increase its dominance in this area.

    6. Private Clouds – Despite the economic and technological incentives of outsourcing one’s infrastructure, there will always be that organizational and human need to build and own one. Private clouds of course will overlap and need to integrate with public clouds, simply because the dominant business model will be SaaS. Look for OpenStack to emerge as the clear leader.

    7. OAuth in the Enterprise – OAuth is already widely used in consumer applications. With OAuth 2.0, I expect rapid adoption in the enterprise space simply because there is no other viable alternative. Application coordination will be a necessity and OAuth will be its enabler in secure environments.

    8. Inversion of Desktop Services – Services that we commonly expect on a desktop, like file systems, shortcuts, contacts, calendars and programs, will increasingly reside in the cloud. This is a consequence of the need for greater security on mobile devices and the need to share information among multiple devices and multiple collaborators. The filesystem of the future will be collaborative shared spaces. We are already seeing services like Dropbox, Instapaper, Evernote, Springboard, Delicious and Twitter serving as the means of coordination between mobile applications.

    9. Proactive Agents – For decades people have been forecasting the emergence of digital personal assistants that would actively react to the environment on one’s own behalf. The emergence of always present smart mobile devices and cloud computing shared spaces will be the catalyst for the development of active agent based computing platforms and frameworks. At present, most computing is merely reactive, that is, servicing web requests only on command of a user. Future computing will include a proactive aspect that suggests courses of action to users. Semantic technologies like Zemanta and OpenCalais already provide intelligence to writers by suggesting tags that are relevant to a written document.

    10. Migration to Scala – My hunch is that the programming language with the highest growth in adoption and interest will be Scala. Scala’s appeal is that it is elegant and can express complex constructs succinctly. I see most adopters being server-side developers looking for a new shiny toy to play with. Scala frameworks like Lift and Akka, and IDEs like IntelliJ, will be the wind that propels this migration. Alternative JVM languages like Groovy, Clojure and JRuby will likely plateau in popularity as a consequence.

    What is now becoming apparent with these trends is greater fragmentation in IT. No longer is one kind of database, one kind of programming language, one kind of operating system or one kind of UI framework going to be viable. One can no longer control what a service is built on (specifically for any cloud based service); however, one may still have control of the data, and this will be key.

    The Ten Commandments for SOA Salvation

    I stumbled upon this excellent short white paper from PolarLake. The paper lays out “Seven Principles of SOA Success”. To summarize, the unnamed author lists these principles:

    1. Minimize costs of disruption.
    2. Integrate incrementally.
    3. Reduce coding.
    4. Use industry standards whenever possible.
    5. Accept the things you cannot change.
    6. Understand the strategic value.
    7. Buy – don’t build.

    It’s a pragmatic whitepaper that’s worth a quick read. Too many integration projects fail because of the desire for perfection. A drive towards perfection leads to idealism and creates a vision that can easily become unrealistic. That is the crux of the problem: the person who can envision the ideal world is in many cases the person who cannot see reality.

    This reminds me of an entry I wrote, “Ten Fallacies of Software Analysis and Design”, where I outlined several intellectual sinkholes to which many developers fall victim:

    1. You can trust everyone
    2. Universal agreement is possible
    3. A perfect model of the real world can be created
    4. Change can be avoided
    5. Time doesn’t exist
    6. Everyone can respond immediately
    7. Concurrency can be abstracted away
    8. Side effects and non-linearity should be removed
    9. Systems can be proven to be correct
    10. Implementation details can be hidden

    Now armed with this knowledge, I hereby present the 10 Commandments for SOA[1] Salvation:

    1. Thou shalt not disrupt the legacy system.
    2. Thou shalt avoid massive overhauls. Honor incremental partial solutions instead.
    3. Thou shalt worship configuration over customization.
    4. Thou shalt not re-invent the wheel.
    5. Thou shalt not fix what is not broken.
    6. Thou shalt intercept or adapt rather than re-write.
    7. Thou shalt build federations before attempting any integration.
    8. Thou shalt prefer simple recovery over complex prevention.
    9. Thou shalt avoid gratuitously complex standards.
    10. Thou shalt create an architecture of participation. The social aspects of successful SOA tend to dominate the technical aspects.

    [1] SOA is a nebulous term and too easily overloaded. I use it here in the sense of a legacy re-engineering project whose goal is to migrate to a more adaptive and flexible architecture.


    Open Source Workflow Engines Written in Java


    This is an updated list of active Open Source Workflow Engines that are written in Java or hosted in a JVM:

    • uEngine – The uEngine BPM suite consists of a modeling tool and process engine, a dashboard with SSO, and an OLAP-inspired process analyzer. The uEngine BPM foundation is built using the Liferay Enterprise Portal, Mondrian OLAP Server, JBoss Drools BRE and Axis 2.
    • Triana – An open source problem solving environment developed at Cardiff University that combines an intuitive visual interface with powerful data analysis tools. Already used by scientists for a range of tasks, such as signal, text and image processing, Triana includes a large library of pre-written analysis tools and the ability for users to easily integrate their own tools.
    • Pegasus – The Pegasus project encompasses a set of technologies that help workflow-based applications execute in a number of different environments including desktops, campus clusters, grids, and now clouds. Scientific workflows allow users to easily express multi-step computations, for example: retrieve data from a database, reformat the data, and run an analysis. Once an application is formalized as a workflow, the Pegasus Workflow Management Service can map it onto available compute resources and execute the steps in the appropriate order. Pegasus can handle 1 to 1 million computational tasks.
    • Drools Flow – Drools Flow provides workflow to the Drools platform. Drools Flow allows end users to specify, execute and monitor their business logic. The Drools Flow process framework is easily embeddable into any Java application or can run standalone in a server environment.
    • Activiti – Activiti is a Business Process Management (BPM) and workflow system targeted at business people, developers and system admins. Its core is a super-fast and rock-solid BPMN 2 process engine for Java. It’s open-source and distributed under the Apache license. Activiti runs in any Java application, on a server, on a cluster or in the cloud. It integrates perfectly with Spring.
    • jBpm – JBoss jBPM is a platform for multiple process languages supporting workflow, BPM, and process orchestration. jBPM supports two process languages: jPDL and BPEL. jPDL combines human task management with workflow process constructs that can be built in Java applications. It also includes a visual designer for jPDL and Eclipse-based tooling for BPEL.

    • RiftSaw – Project Riftsaw is a WS-BPEL 2.0 engine that is optimized for the JBoss Application Server container. WS-BPEL 2.0 is an XML-based language for defining business processes that orchestrate web services. Riftsaw is based on Apache ODE.
    • Joget – Joget Workflow is a people-driven, form-based workflow management system. Joget Workflow is XPDL compliant and has a plug-in architecture to extend its usability. The system can be used on its own to manage the flow of processes and data captured from forms. Supports synchronous and asynchronous integration of other business processes. Supports portal integration using AJAX or JSON APIs.
    • Orchestra – Orchestra is a complete solution to handle long-running, service oriented processes. It is based on the OASIS standard BPEL 2.0. Provides a generic engine (Process Virtual Machine), Web 2.0 based process console and a graphical BPEL designer.
    • Enhydra Shark – Shark is completely based on standards from WfMC and OMG using XPDL as its native workflow definition format. Storage of processes and activities is done using Enhydra DODS.

    • Taverna – The Taverna project aims to provide a language and software tools to facilitate easy use of workflow and distributed compute technology within the eScience community. Taverna is both a workflow enactor and also an editing suite.

    • Bonita – Bonita is a flexible cooperative workflow system, compliant with WfMC specifications. It offers a comprehensive set of integrated graphical tools for performing different kinds of actions such as process conception, definition, instantiation, control of processes, and interaction with the users and external applications. It provides a 100% browser-based environment with Web Services integration that uses SOAP and XML data binding technologies to encapsulate existing workflow business methods and publish them as J2EE-based web services. It is a third-generation workflow engine based on the activity anticipation model.

    • Imixs – The project comprises a framework to create workflow systems as well as a reference implementation based on the J2EE standard. The project includes the development of a graphic editor for creation and management of workflow models based on the Eclipse framework.
    • Bigbross Bossa – The engine is very fast and lightweight, uses a very expressive Petri net notation to define workflows, does not require an RDBMS, and is very simple to use and to integrate with Java applications. Actually, it was designed to be embedded.

    • YAWL – YAWL (Yet Another Workflow Language), an open source workflow language/management system, is based on a rigorous analysis of existing workflow management systems and workflow languages. Unlike traditional systems, it provides direct support for most of the workflow patterns. YAWL supports the control-flow perspective and the data perspective, and is able to interact with web services declared in WSDL. It is based on a distributed, web-friendly infrastructure.
    • Zebra – Zebra is a workflow engine, originally developed to fill in the gaps in some commercial and open source workflow engines. The key differences between it and other workflow systems are the ability to model all the workflows described in the workflow patterns, a GUI designer, and a Hibernate persistence layer.
    • ActiveBPEL – ActiveBPEL engine is a robust runtime environment that is capable of executing process definitions created to the Business Process Execution Language for Web Services (BPEL4WS, or just BPEL) 1.1 specifications.
    • Ode – Orchestration Director Engine – The initial source for Ode originates from the Sybase Business Process Engine (BPE) and the PXE BPEL 2.0 engine from Intalio. ODE implements the WS-BPEL specification. The implementation will also support Message/Event to process correlation. ODE can be plugged into various service bus or component architectures like ServiceMix.
    • BeanFlow – A tiny library with just a few classes and only depends on commons-logging and Java 5. Uses plain Java code to do boolean logic, handle state, do looping, call functions, create objects, aggregation and composition. Based on just one single concept, joins.
    • Swamp – SWAMP is a workflow processing platform.

      The workflow is designed in an XML-based meta language. Workflows can be built from different workflow ‘patterns’ like simple actions, decisions, selections, and loops, but also custom code and external events.

      SWAMP builds an HTML GUI from the workflow definition file that guides different users through the whole process, sends notifications if required, assembles overview pages over all running processes, and much more. A SOAP interface can be used to integrate external systems into the workflow.

    • Sarasvati – Sarasvati is a workflow/BPM engine based on graph execution. It has a simple core which allows for substitutions of implementations.

      Features include: a simple graph execution based core; process definition/graph visualizations; process visualizations; a domain specific language (Rubric) for user understandable guards; an XML file format for process definitions; a Hibernate backed engine for DB persistence; and a memory backed engine.

    • TobFlow – TobFlow (Total Object Base and Flow or the Object Flow) is a web application platform to manage forms and workflows. It is made of an engine which manages the user interface (forms) and the scheduling of tasks (workflows) based on object model descriptions.

      The TobFlow is a true document workflow tool.

    • Codehaus Werkflow – Werkflow is a flexible, extensible process- and state-based workflow engine. It aims to satisfy a myriad of possible workflow scenarios, from enterprise-scale business processes to small-scale user-interaction processes. Using a pluggable and layered architecture, workflows with varying semantics can easily be accommodated.

    • OpenSymphony OSWorkflow – What makes OSWorkflow different is that it is extremely flexible.

    • RunaWFE – RunaWFE consists of a JBoss jBPM workflow core and a set of additional components. It includes a user web interface, a graphical process designer, a flexible system for determining role executors, web services, portlets, Alfresco integration, and security with LDAP/MS Active Directory integration.

    • wfmOpen – WfMOpen is a J2EE based implementation of a workflow facility (workflow engine) as proposed by the Workflow Management Coalition (WfMC) and the Object Management Group (OMG). Workflows are specified using WfMC’s XML Process Definition Language (XPDL) with some extensions.

    • OFBiz Workflow Engine – The Open for Business Workflow Engine is based on the WfMC and OMG spec. OFBiz Workflow Engine uses XPDL as its process definition language.

    • JFolder – JFolder (formerly PowerFolder) contains features critical to many applications, including web pages, workflow, security, persistence, email, file management, and data access.

    • Open Business Engine – Open Business Engine is an open source Java workflow engine which supports the Workflow Management Coalition’s (WfMC) workflow specifications, including interface 1, also known as XPDL, interface 2/3 known as WAPI and interface 5 for auditing. OBE provides an environment for executing activities in a controlled, centralized environment. OBE supports both synchronous and asynchronous execution of workflows. The primary OBE implementation is based on J2EE.

    • Freefluo – Freefluo is a workflow orchestration tool for web services. It can handle WSDL based web service invocation. It supports two XML workflow languages, one based on IBM’s WSFL and another named XScufl. Freefluo is very flexible; at its core is a reusable orchestration framework that is not tied to any workflow language or execution architecture. Freefluo includes extension libraries that enable execution of workflows written in a subset of WSFL.

    • Micro-Workflow – The micro-workflow framework targets developers who want to separate the control and logic aspects in their programs, thus making them flow independent. A well-factored flow independent application facilitates change because the most frequent business changes translate into process changes, thus leaving the code intact. Flow independence also fosters reuse, because domain objects make fewer assumptions about the control context in which they operate.

    • con:cern – con:cern is a workflow engine based on an extended case handling approach. A process is described as a set of activities with pre- and postconditions. An activity is executed when its preconditions are met. It manipulates the process item, thereby creating postconditions. The process flow is determined at run-time. This approach is superior to the conventional process flow approach.
    • XFlow2 – Inspired by the simple workflow definition language in XFlow, and developed to improve on its implementation. Externalizes SQL in iBatis mapping files. Works as an embedded workflow engine.
    • Apache Agila – Agila is centered around Business Process Management, Workflow and Web Service Orchestration. It’s composed of two specialized modules: Agila BPM and Agila BPEL. Agila BPM basically handles tasks and the users who have to complete these tasks. It’s a very flexible and lightweight workflow component. Agila BPEL is a BPEL-compliant Web Services Orchestration solution.
    • Syrup – Syrup is an adaptive Workflow system. Syrup provides five basic concepts: Tasks, Links, Workflows, Workers and the WorkSpace. Syrup can overcome the von Neumann bottleneck that stops traditional software systems from scaling. It does this by strictly separating the specification, identification and execution phase of Workflows in a distributed setup. Syrup doesn’t follow the more complex standards such as Wf-XML, BPML and XPDL.
    • Dalma – The heart of the engine is an ability to capture the execution state of a thread and resume it later. Many applications of today need to have a part of the program that waits for other entities. Often there are multiple conversations running concurrently. Those are what we call “workflow” applications. Today, those applications can be written, but one can’t write them very productively. Dalma makes it very easy to write those workflow applications by letting you write them as ordinary procedural programs without any boilerplate.
    • Pi Calculus for SOA – The first stage of this project is to provide an implementation of the W3C Web Services Choreography Description Language (WS-CDL). It provides the necessary tools to describe and police blueprints for complex distributed IT architectures as well as for describing cross domain business protocols (e.g. FIX, fpML, SWIFT, etc).
    • Intalio BPMS – Intalio BPMS is designed around the open source Eclipse BPMN Modeler, Apache ODE BPEL engine, and Tempo WS-Human Task service.
    • GridAnt – GridAnt is not claimed as a substitution for more sophisticated and powerful workflow engines such as BPEL4WS, XLANG and WSFL. Nevertheless, applications with simple process flows tightly integrated to work with GT3 can vastly benefit from GridAnt without having to endure any complex workflow architectures. The philosophy adopted by the GridAnt project is to use the workflow engine available with Apache Ant and develop a Grid workflow vocabulary on top of it. In other words, we provide a set of Grid tasks to be used within the Ant framework.
    • Kepler Project – The Kepler project’s overall goal is to produce an open-source scientific workflow system that allows scientists to design scientific workflows and execute them efficiently using emerging Grid-based approaches to distributed computation. Kepler is based on the Ptolemy II system for heterogeneous, concurrent modeling and design.
    • JOpera – JOpera for Eclipse is a rapid service composition tool offering a visual language and autonomic execution platform for building distributed applications out of reusable services, which include but are not strictly limited to Web services. Due to its generality, JOpera for Eclipse has a wide range of applications: from classical Workflow Management and Business Process Automation, Enterprise application integration, to Virtual laboratories (e.g., scientific workflows, bioinformatics), Cluster and Grid computing and even Data Stream processing.
    • BpmScript – BpmScript aims to make writing Business Processes simple by handling Workflow, Web Services Orchestration and Scheduling. BpmScript has an embedded ServiceMix ESB. This allows it to take advantage of the prebuilt ServiceMix components (e.g. SOAP, FTP, Email, File, RSS, Jabber, JMS etc.).
    • JaCOB – PXE’s BPEL implementation relies on the JACOB framework to implement the BPEL constructs. The framework provides the mechanism necessary to deal with two key issues in implementing BPEL constructs: Persistence of execution state and Concurrency. By rolling up these concerns in the framework, the implementation of the BPEL constructs can be simpler by limiting itself to implementing the BPEL logic and not the infrastructure necessary to support it.
    • Tempo – Intalio Tempo is a set of runtime components that support BPEL4People to bring workflow functionality to a BPEL engine. Tempo provides one possible user interface for users to manage their tasks, which goes beyond the scope of BPEL4People. Tempo only provides runtime components. It does not provide tools to generate workflow processes, nor does it provide forms.
    • Oryx – Oryx is a web-based, extensible modeling platform. You can create diagrams in many different modeling languages and share them.
    • GWES – The Generic Workflow Execution Service (GWES) is a workflow enactment engine. GWES coordinates the composition and execution process of workflows in arbitrary distributed systems, such as SOA, Cluster, Grid, or Cloud environments. Its Generic Workflow Description Language (GWorkflowDL) is based on Petri nets. It provides interfaces to Web Portal frameworks and to command-line clients. The workflow service supports pure Web Services and Globus Toolkit 4.
    • Java Workflow Tooling – The Java Workflow Tooling project (JWT) aims to build design time, development time and runtime workflow tools and to foster an ecosystem of interoperable Business Process Management (BPM) platforms.
    • ZBuilder – ZBuilder3 is a second-generation workflow development and management system which intends to be an open source product. It defines a set of standard JMX management interfaces for different workflow engines and their workflows. Abandoned

    • Twister – Twister’s aim is to provide a new generation, easily integrable, B2B oriented workflow solution in Java, based on the latest specification efforts in this field. The process engine is based on the BPEL business process specifications and Web Services standards. Abandoned
    • MidOffice BPEL Engine – MidOffice BPEL Engine (MOBE) is an open-source platform for process orchestration which executes, monitors, adjusts and terminates pre-defined processes. The platform is implemented using J2EE technologies and standards like BPEL, XML and SOAP. Abandoned
    • jawFlow – JawFlow is a Workflow Engine partially conformant to WfMC directives. It is based on XML Process Definition Language (XPDL) and activities can be written in Java or any BSF based scripting language. JawFlow is composed of modules that are JMX MBeans. Abandoned
    • Bexee – Bexee is a BPEL engine capable of executing deployed business processes described in BPEL by orchestrating existing Web Services. Abandoned
    • OpenWFE – OpenWFE is an open source Java workflow engine. It features three easily scalable components: an engine, a worklist and a web interface. Its workflow definition language is inspired by Scheme, a Lisp dialect, though it is expressed in XML. Abandoned

    • Antflow – AntFlow (Onionnetworks) is a tool for the automation and scheduling of data system tasks, including those with complex dependencies and workflow logic. Antflow represents a new approach to simplifying system automation that leverages pipelines of hot folders chained together to perform a given task. Using XML, Antflow associates an automated task, such as data transfer, compression, or encryption, with a directory on the local system. Whenever a file is copied or written into the hot folder, the associated task is executed and the file is moved to the next hot folder in the pipeline for further processing. Abandoned
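
    To make one of the ideas in the list above a little more concrete: con:cern describes a process as a set of activities with pre- and postconditions, where an activity runs once its preconditions are met and the flow is only determined at run time. The following is a minimal Python sketch of that idea only. It is not con:cern’s API; the names Activity and run_process and the order-handling facts are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """A hypothetical activity: it may run once its preconditions hold."""
    name: str
    preconditions: frozenset
    postconditions: frozenset


def run_process(activities, initial_facts):
    """Execute activities whenever their preconditions are met.

    No explicit flow is declared; the order emerges at run time from
    the facts (conditions) attached to the process item.
    Returns the executed activity names and the final set of facts.
    """
    facts = set(initial_facts)
    pending = list(activities)
    executed = []
    progress = True
    while pending and progress:
        progress = False
        for activity in list(pending):
            if activity.preconditions <= facts:
                # Running the activity creates its postconditions.
                facts |= activity.postconditions
                executed.append(activity.name)
                pending.remove(activity)
                progress = True
    return executed, facts


# The activities are deliberately listed out of order.
activities = [
    Activity("ship", frozenset({"paid", "packed"}), frozenset({"shipped"})),
    Activity("pack", frozenset({"ordered"}), frozenset({"packed"})),
    Activity("bill", frozenset({"ordered"}), frozenset({"paid"})),
]
order, final_facts = run_process(activities, {"ordered"})
```

    Here “ship” is declared first but runs last, because its preconditions only hold after “pack” and “bill” have produced their postconditions. A real engine would add persistence, human tasks and error handling on top of this loop.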

    Please let me know if I missed something that should be in the list.


    Best Practices for Service API Definition


    In recent days I have come across a couple of interesting articles on the web on how to define service APIs.

    The first one is titled “Web API Documentation Best Practices”, from ProgrammableWeb. The author writes about the importance of good documentation: it encourages developers and keeps them interested in the service, and it helps reduce support costs. The article describes some basic areas that documentation should cover, such as an overview, an introduction section, sample code, and references. The article further recommends the following best practices:

    • Auto-generate Documentation
    • Include Sample Code
    • Show Example Requests and Responses
    • Explain Authentication and Error Handling

    Mark Blotny wrote that “Each Application Should Be Shipped With a Set of Diagnostics Tools”. He writes that developers typically have limited access to the production servers. However, in the event that something goes wrong, developers need to be able to investigate in reasonable time to uncover the causes of the problem. He writes that a service API should have the following:

    • Each integration point should include a diagnostic tool.
    • There should be accessible logs for each call to an external system.
    • Service performance data should be accessible by developers.
    • All unexpected errors should be logged and easily accessible.
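
    As a rough illustration of the second, third and fourth points, here is a minimal Python sketch of a wrapper that logs every call to an external system together with its duration and any unexpected error. The decorator name logged_call, the logger name "integration", and the fetch_invoice function are all hypothetical; they are not taken from Blotny’s article.

```python
import functools
import logging
import time

logger = logging.getLogger("integration")  # hypothetical logger name


def logged_call(system):
    """Wrap a call to an external system so each call is logged with timing."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
            except Exception:
                elapsed_ms = (time.perf_counter() - start) * 1000
                # logger.exception also records the stack trace of the error.
                logger.exception("%s.%s failed after %.1f ms",
                                 system, func.__name__, elapsed_ms)
                raise
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s.%s ok in %.1f ms", system, func.__name__, elapsed_ms)
            return result
        return wrapper
    return decorator


@logged_call("billing")  # "billing" stands in for some external system
def fetch_invoice(invoice_id):
    # Placeholder for a real remote call.
    return {"id": invoice_id, "status": "paid"}
```

    The wrapper does not change what the call returns or raises; it only makes each integration point leave a trace that developers can read without direct access to the production server.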

    Finally, there is an article by Juergen Brendel, who wrote “The Value of APIs that Can be Crawled”. He writes that a service API should be designed such that it can be discovered via a crawler. Although this requirement is common sense for anyone concerned with SEO, it unfortunately isn’t quite common among developers of service APIs.

    The notion of a decentralized index whose data is populated by crawlers should in fact be a key technology component of any Service Oriented Architecture (SOA). Surprisingly, however, despite the success of search engine companies like Google, this component is absent from most of the SOA stacks I have seen. SOA stacks do have the notion of a service directory, but most implementations assume a centralized directory and put the onus on each service to register and to keep its information in the directory appropriate and current. The two appear to be logically the same thing; what scales in practice, however, is the decentralized index/crawler and not the centralized directory.
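
    The difference can be sketched in a few lines of Python. In the sketch below no service registers anywhere; each service description simply links to other services, and a crawler builds the index by following those links from a seed. The svc:// URLs, the DESCRIPTIONS table (an in-memory stand-in for fetching descriptions over the network) and the crawl function are assumptions made for illustration.

```python
from collections import deque

# Hypothetical in-memory stand-in for the network: URL -> service description.
# In practice each description would be fetched from the service itself.
DESCRIPTIONS = {
    "svc://orders": {"title": "Orders", "links": ["svc://billing", "svc://shipping"]},
    "svc://billing": {"title": "Billing", "links": ["svc://orders"]},
    "svc://shipping": {"title": "Shipping", "links": []},
    "svc://orphan": {"title": "Unlinked service", "links": []},
}


def crawl(seed, fetch=DESCRIPTIONS.get):
    """Breadth-first crawl from a seed, returning an index of title by URL."""
    index = {}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url in index:
            continue  # already indexed; this also guards against link cycles
        description = fetch(url)
        if description is None:
            continue  # unreachable service: skip it, do not fail the crawl
        index[url] = description["title"]
        queue.extend(description["links"])
    return index


index = crawl("svc://orders")
```

    The index ends up containing the three services reachable from the seed. The unlinked service is never found, which is the trade-off of this approach: a service is discoverable only if something links to it, but no service has to keep a central directory up to date.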

    These three articles show that there is in fact value in providing service functionality that goes beyond the documented functional requirements. There is in fact research that I came across that documents this in a more comprehensive manner. Here are the property groups that a service may provide:

    • Temporal
    • Locative
    • Availability
    • Obligations
    • Price
    • Payment
    • Discounts
    • Penalties
    • Rights
    • Language
    • Trust
    • Quality
    • Security

    I love exhaustive lists like this because they remind me of what I may be missing. Speaking of which, it reminds me that the Web Service Modeling Ontology (WSMO) has something formal along similar lines. In fact, if you really want to go into the deep end with service contracts, you can read this.

    It is interesting how these non-functional attributes (i.e. the “ilities”) align well with the idea of Aspect-Oriented Programming and can be implemented in a proxy-like infrastructure. That is in fact what existing “API Management” firms (e.g. Mashery, Sonoa, WebServius, 3Scale) appear to provide. Here are some examples of the features that these firms offer:

    • Reporting, Analytics and Visualization dashboard.
    • Traffic management and rate-limiting
    • Security, Access Control, Authorization.
    • Mediation – Protocol Bridging.
    • Monetization
    • User management and provisioning. Self service provisioning.
    • Community management. Portal, Access key management, FAQs, Wiki
    • Scalability. Clustering, Caching.
    • Threat Protection – Denial of service attacks.
    • Versioning
    • Operations Management. Root cause analysis. Logging.
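
    To show how such a feature can live in a proxy rather than in the service itself, here is a minimal Python sketch of rate limiting and call counting per access key. It does not reflect how any of the firms above implement it; ApiProxy, RateLimitExceeded and lookup_price are hypothetical names, and the sliding-window limit is just one simple policy among several.

```python
import time
from collections import defaultdict


class RateLimitExceeded(Exception):
    """Raised when an API key has used up its calls for the current window."""


class ApiProxy:
    """Hypothetical proxy adding rate limiting and call counting around a service."""

    def __init__(self, service, limit, window_seconds, clock=time.monotonic):
        self._service = service
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._calls = defaultdict(list)      # API key -> timestamps of recent calls
        self.total_calls = defaultdict(int)  # API key -> accepted calls (reporting)

    def call(self, api_key, *args, **kwargs):
        now = self._clock()
        # Keep only the timestamps that are still inside the sliding window.
        recent = [t for t in self._calls[api_key] if now - t < self._window]
        if len(recent) >= self._limit:
            self._calls[api_key] = recent
            raise RateLimitExceeded(api_key)
        recent.append(now)
        self._calls[api_key] = recent
        self.total_calls[api_key] += 1
        return self._service(*args, **kwargs)


def lookup_price(sku):
    # Placeholder for the real service; it knows nothing about keys or limits.
    return {"sku": sku, "price": 10}


proxy = ApiProxy(lookup_price, limit=2, window_seconds=60)
```

    With this configuration, a third proxy.call("some-key", ...) within 60 seconds raises RateLimitExceeded, while lookup_price itself stays unchanged. That separation is the aspect-like quality: the cross-cutting concern is applied around the service, not inside it.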


    Some More SOA Design Patterns


    Did some more googling around and have uncovered a couple more noteworthy SOA patterns. These are from the following sources:

    • Agent Itinerary – Objectifies agent itineraries and routing among destinations.
    • Forward – Provides a way for a host to forward newly arrived agents automatically to another host
    • Ticket – Objectifies a destination address, and encapsulates the quality of service and permissions that are needed to dispatch an agent to a host address and execute it there
    • Delegation – The debtor of a commitment delegates it to a delegatee who may accept the delegation, thus creating a new commitment with the delegatee as the new debtor.
    • Escalation – Commitments may be canceled or otherwise violated. Under such circumstances, the creditor or the debtor of the commitment may send escalations to the context Org.
    • Preemption – To cancel a commitment based on conflicting demands.
    • Barrier – Guards an action and specifies (pre)conditions on its execution
    • Co-location – Two or more resources are to be co-located at a certain time and place for a specified duration.
    • Correspondence – Relating two pieces of information each owned by a different participant
    • Deadline – Some information is required for an action before a certain time after which an alternate action is taken
    • Expiration – Some information will become invalid at a certain point in time (not shown in figure)
    • Notification – On-state-change “pushing” of information to enforce Correspondence.
    • Query – On-demand periodic polling of information to enforce Correspondence
    • Retry – Retrying an action a number of times before resorting to an alternate action
    • Selection – Choosing from among similar service offerings from multiple participants according to some criteria
    • Solicitation – Gathering information about service offerings from participants
    • Token – Issuing a permission for executing an action to other participants
    • Saga – How can we get transaction-like behavior in complex interactions between services without transactions?
    • Obligation Management – Allow obligations relating to data processing to be transferred and managed when the data is shared
    • Sticky Policies – Bind policies to the data they refer to
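
    As an example of what one of these looks like in code, here is a minimal Python sketch of the Saga pattern, under the common reading that each step is paired with a compensating action that undoes it. The function run_saga and the seat-booking steps are hypothetical and are not taken from the sources of the list.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, the compensations of the steps that already
    completed are run in reverse order, and the error is re-raised.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            raise
        completed.append(compensation)


log = []


def reserve_seat():
    log.append("seat reserved")


def release_seat():
    log.append("seat released")


def charge_card():
    raise RuntimeError("card declined")


def refund_card():
    log.append("card refunded")


try:
    run_saga([(reserve_seat, release_seat), (charge_card, refund_card)])
except RuntimeError:
    log.append("saga aborted")
```

    The seat is reserved, the charge fails, and the reservation is released again; the refund never runs because the charge never completed. Nothing here is atomic or isolated, so other parties can observe the intermediate state. That is the price of getting transaction-like behavior without a distributed transaction.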

    A couple of them are redundant with other patterns in other texts. You can find these patterns here:


    Lean Development Applied To SOA


    I’ve been doing a little bit of musing (“Is SOA an Agile Enterprise Framework?”) about a development framework to support SOA. However, maybe Lean Production/Thinking would be a better fit for SOA. In an earlier entry I mused about “How Web 2.0 supports Lean Production”. Let’s turn this question around and ask: “How can Lean development be used to support SOA development?”

    Lean focuses on the elimination of waste in processes. Agile, in comparison, is tuned toward practices that adapt efficiently to change. There are, though, some commonalities between Lean and Agile development. Specifically:

    • People centric approach
    • Empowered teams
    • Adaptive planning
    • Continuous improvement

    The last two bullet points align directly with the SOA Manifesto. For reference:

    • Business value over technical strategy
    • Strategic goals over project-specific benefits
    • Intrinsic interoperability over custom integration
    • Shared services over specific-purpose implementations
    • Flexibility over optimization
    • Evolutionary refinement over pursuit of initial perfection

    Lean differs from Agile in that Lean is purported to be designed to scale (see: “Set-based concurrent engineering” and “Scaling Lean and Agile Development”):

    One of the ideas in lean product development is the notion of set-based concurrent engineering: considering a solution as the intersection of a number of feasible parts, rather than iterating on a bunch of individual “point-based” solutions. This lets several groups work at the same time, as they converge on a solution.
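
    The quoted idea maps directly onto set intersection. As a toy Python sketch, with hypothetical teams and integration options that are not from the quoted source:

```python
from functools import reduce

# Hypothetical feasible sets: each group lists every option it could live
# with, instead of committing early to a single "point" solution.
feasible = {
    "service team": {"REST+JSON", "SOAP", "JMS"},
    "client team": {"REST+JSON", "JMS", "XML-RPC"},
    "operations": {"REST+JSON", "SOAP", "XML-RPC"},
}


def converge(feasible_sets):
    """Return the options acceptable to every group (the set intersection)."""
    return reduce(set.intersection, feasible_sets.values())


solution = converge(feasible)
```

    Each group can keep exploring its own feasible set independently, and the overall solution is whatever survives in all of them, which is what lets several groups work in parallel as the effort scales.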

    In contrast, agile methods were meant for smaller, more nimble development teams and projects. One would therefore think that in the context of enterprise-wide SOA activities, Lean principles may offer greater value than Agile practices. Well, let’s see if we can convince ourselves of this by exploring it in more detail.

    Lean Software Development is defined by a set of “seven lean principles” for software development:

    1. Eliminate Waste – Spend time only on what adds real customer value.
    2. Create Knowledge – When you have tough problems, increase feedback.
    3. Defer Commitment – Keep your options open as long as practical, but no longer.
    4. Deliver Fast – Deliver value to customers as soon as they ask for it.
    5. Respect People – Let the people who add value use their full potential.
    6. Build Quality In – Don’t try to tack on quality after the fact – build it in.
    7. Optimize the Whole – Beware of the temptation to optimize parts at the expense of the whole.

    Can we leverage these principles as a guide for a better approach to SOA development?

    Where can we find waste in the context of software development? Poppendieck has the following list:

    • Overproduction = Extra Features.
    • In Process Inventory = Partially Done Work.
    • Extra Processing = Relearning.
    • Motion = Finding Information.
    • Defects = Defects Not Caught by Tests.
    • Waiting = Delays.
    • Transport = Handoffs.

    What steps in SOA development can we take to eliminate waste? Here’s a proposed table:

    Waste in Software Development | Lean SOA
    Extra Features | If there isn’t a clear and present economic need for a Service then it should not be developed.
    Partially Done Work | Move to an integrated, tested, documented and deployable service rapidly.
    Relearning | Reuse Services. Employ a Pattern Language. Employ Social Networking techniques to enhance Organizational Learning.
    Finding Information | Have all SOA contracts documented and human testable on a shared CMS. Manage Service evolution.
    Defects Not Caught by Tests | Design Testable Service interfaces. Test driven integration.
    Delays | Development is usually not the bottleneck. Map the value stream to identify real organizational bottlenecks.
    Handoffs | Service developers work directly with Service consumers (i.e. developers, sysadmins, help desk).

    The customers for a Service are similar to the customers for an API. I wrote years ago about how the design and management of APIs leads to the development of damn good software. The same principles can be applied with the lean development of services. Taking some wisdom from that, here are some recommended practices for Lean SOA:

    • Designing Services is a human factors problem.
    • Design Services to support an Architecture of Participation. Focus on “Organizational Learning”.
    • Focus on what a user of your Service will experience. Simplicity is the #1 objective. Only when this has been accomplished (at least on paper) do we talk about implementation details.
    • Services aren’t included in a release until they’re very simple to use.
    • In tradeoff situations, ease of use and quality win over feature count.
    • Useful subsets of standards are OK in the short term, but should be fully implemented in the longer term.
    • Continuous Integration – Fully automated build process and tests.
    • Always Beta – Each build is a release candidate; we expect it to work.
    • Community Involvement – Community needs to know what is going on to participate. Requires transparency.
    • Continuous Testing.
    • Collective Ownership.
    • Preserve Architectural Integrity – Deliver on time, every time, but preserve architectural integrity. Deliver quality with continuity.
    • Services First – When defining a new Service, there must be at least one client involved, preferably more.


    Finally, there is one last principle in Lean development, “Decide as Late as Possible”, that is of high importance. The ability to decide as late as possible is enabled by modularity. The absence of modularity makes composing new solutions, and therefore new integrations, extremely cumbersome. The key, however, is not to become a Cargo Cult that practices Lean SOA without understanding how one achieves modularity (or, to use another phrase, “intrinsic interoperability”). That, of course, is the subject of the Design Language that I am in the process of formulating.

    In conclusion, Lean SOA (a mashup of Lean and SOA) follows these principles:

    • Eliminate Waste – Spend time only on what adds business value.
    • Create Knowledge – Disseminate and share Service knowledge with the organization and its partners.
    • Defer Commitment – Be flexible before you optimize.
    • Deliver Fast – Deliver value quickly and incrementally. Don’t try to boil the ocean.
    • Respect People – Let the people who add value use their full potential.
    • Build Quality In – Don’t try to tack on quality after the fact – build it in.
    • Optimize the Whole – Strategic goals over project-specific benefits.
    • Build Interoperability In – Services should be designed to be modular.

    This is as simple as it gets. The devil, of course, is in the details.

    Notes: You can find an earlier and different take on this here: SOA Agility.
