Skip to content.

Manageability

Sections
Personal tools
You are here: Home » news

How to Achieve Highly Scalable Transaction Processing

  • Posted by admin
  • Published: 2011-03-30

I've been asked to share my thoughts regarding highly scalable Transaction Processing. Transaction Processing has been an exhaustively researched area in computing for several decades now and a lot of the brightest minds have likely beaten this subject to death by now. Scaling transaction processing is an extremely difficult problem because it tries to find a solution to two diametrically opposite forces, that is ACID (single entity) versus Distributed computing (replicated entities).

Transaction Processing systems have been invented in the past using either of two communication mechanisms, RPC and Queued. Application servers like EJB containers can trace their lineage back to Transaction Monitors. These prescribed to a RPC style of communication to do transaction processing. One would use a JCA adapter to communicate to XA resources managed by JTA. Underneath the covers RPC would use the Two-Phase Commit protocol (TPC). In its most general form, the Paxos algorithm would be the mechanism to ensure consistency amongst distributed systems. In the quest to sell more capable hardware at exponentially higher prices, a RPC based approach seems to be an ideal motivator.

The problem though with RPC based Transaction Processing is that it is difficult to scale to many cheaper boxes. ( Scaling to use more servers is more economical since the cost increases only linearly and the horsepower isn't bound to the latest technology the industry can provide.) The queued approach however has been shown over the years (TBD: show reference) to be the right approach to achieving scalability. The balance one works with is Latency versus Consistency. In the RPC approach, latency is compromised for consistency in that TPC-like communications are very expensive. In the Queued approach, latency is also compromised in favor of consistency in that communication processing is performed asynchronously and not immediately.

A good analogy to explaining the difference in the approaches can be made by examining the difference in a pessimistic versus optimistic locking approach. In the pessimistic approach, one takes a preventive approach, such that all data is always synchronized. This is done, by reserving resources prior to execution. This reservation however reduces the amount of concurrency in the system. Think for example of a version control system where a developer has a file locked, with developers having to wait until it is unlocked. By contrast, a optimistic locking approach (aka. versioning), allows multiple developers to continue on their work in a non-blocking manner. Queueing is by its nature non-blocking and thus ensures maximum parallelism. However, there will always be that critical section that works on shared resources. The hope however, is that this critical section is performed in a single machine and in a manner that keeps locks short-lived.

At an abstract level you can gain insight from Patt Heland's 'Life Beyond Distributed Transactions' paper, where he emphasized the constraints that a system requires to enable transactions. A truncated list of his requirements are enumerated as follows:

  • Entities are uniquely identified - each entity which represents disjoint data (i.e. no overlap of data between entities) should have a unique key.
  • Multiple disjoint scopes of transactional serializability - in other words there are these 'entities' and that you cannot perform atomic transactions across these entities.
  • At-Least-Once messaging - that is an application must tolerate message retries and out-of-order arrival of messages.
  • Messages are adressed to entities - that is one can't abstract away from the business logic the existence of the unique keys for addressing entities. Addressing however is independent of location.

He writes about Entities that have extremely limited distributed transaction capabilities. To achieve scalability one has to define bundles of data are fixed in their ability to participate in transactions. These bundles of data or partitions cannot perform transactions with other partitions. When partitioning for scalability, you will need to route traffic to the specific partition that is capable of performing the transaction completely locally. It's kind of a link the dual of NoSQL. In NoSQL data is duplicated (i.e. denormalized) in many places to achieve maximum scalability. In a transactional system, the data that is processed in a transaction should reside logically in only one place.

Cameron Purdy's talk "Traditional Programming Models: Stone Knives and Bearskins in the Google Age" that mentions a FOREX exchange clearing house where a third of the transactions are EURUSD. This requires the EURUSD trades to be routed to a single box that handled the EURUSD market. Only the order submission and the order fill are performed against the database for durability reasons, everything else just is a flow through in memory data. This system was 1000x faster than the original system and had orders were processed in under 3ms.

Billy Newport in his slides "Using Chained Transactions for Maximum Concurrency Under Load" also employs the same basic blueprint in showing how he scales a eCommerce system. The partition that he performs is that each SKU is an entity that ensures the transaction integrity of data. Messages are routed through the system that contain instructions and its corresponding undo instructions. This undo capability allows the system to perform cascading compensating transactions in the event that a transaction down the chain isn't able to complete successfully.

In summary, the fundamental guideline to enabling high scalability transactions is simple, "can you find a partition that allows your transactions to be performed in a single box?" This partitioning is without a doubt be dependent on a specific set of use cases. So for FOREX, it is the currency pairs (i.e. EURUSD) market and for eCommerce it may be the Product SKU and its inventory. However, what if your use cases don't have a natural partition? The solution the would be one that would be analogous to how airlines handle over booking of their flights. That is, hope that the likelihood is low and that if the scenario does occur, then use out of band compensations to correct the violations of the business rules. How this kind of compensation mechanism is implemented is an extremely interesting topic by itself. Interestingly enough, my high level thoughts about transactions hasn't changed much over the years.

Cloud Computing is Software in a Box

  • Posted by admin
  • Published: 2011-03-14

Cloud Computing is all the rage these days. Problem here though is there are too many definitions floating around. I would like to however propose yet another definition. Cloud Computing is Software in a Box, or more concisely "Cloud Computing is Software in a Virtual Box".

What is "Software in a Box"? This is an idea that has been around for a long time. I wrote a blog about it back in 2003 (see: "An Alternative Way Of Packaging Software: Hardware Included!)". For Cloud Computing this title would now read: "An Alternative Way Of Packaging Software: Data Center Included!".

Years ago, Bill Gates once referred to pc manufactures like Dell and Compaq (Now HP) as Value Added Reseller (VARs). That is all they did was add value to Microsoft's software by bundling their hardware with it. Selling software is all about packaging. Most software is bought because its packaging delivers convenience to customers. It is not usually whiz bang technology that makes the sale, rather is is about ease of use, that is usability. What can be easier than receiving a box from a vendor, then plugging it into the wall and into the network to get it running? Well it turns out, receiving a virtual box from a vendor and not be even easier!

The benefits of a "Software in a Box" were described in this old 2003 article:

  • Reduced development complexity - That is less configurations to support.
  • Higher Performance - Performance can be tuned to the hardware delivered with the software.
  • Better Security - The box can be hardened and tested prior to delivery.
  • Easier Provisioning - Just add power and network connectivity.
  • Reliability - Less configuration implies less parties to point finger at.
  • Pricing - People like paying for something they can touch.
  • Distribution - Ride on the coat tails of hardware vendors.

Fast forward now, 8 years later, and we have these same benefits for Cloud Computing:

  • Reduced development complexity - Software can be pre-configured, tested and hardened for the target cloud platform. See: AWS Cloud Formation for handling complex networks.
  • Higher Performance - Software can be pre-tuned to the target cloud platform. For example, if one were delivering a Machine Learning based application, one could tune a solution for Amazon's GPU cluster.
  • Better Security - The solution can be hardened in the cloud.
  • Easier Provisioning - Just sign up on a website.
  • Reliability - The cloud provider takes complete responsibility even for operational issues. One doesn't need the expertise to configure a high availability setup.
  • Pricing - Pay as you go, use only what you need.
  • Elasticity - Seamlessly scale when demand increases.
  • In a former life as a Product Architect, I was working on a slide deck that showed how my company's solution would fit in a prospective client's network. I had drawn a solution that involved multiple boxes to cover the scalability, availability and heterogeneity of the solution. The feedback that I received was that there were too many boxes! I had also noticed come deployment time with a customer, it became painstaking to have to attend so many network interconnectivity meetings. My eventual solution to this packaging problem was that the software would now be deployed in a blade chassis with all the components pre-configured into blades and the network pre-configured with a virtual router in a blade. We were now back to a single 16u box!

    The drawback of Cloud Computing as compared to Software in a Box is the fact that the customer can't hold it and as a consequence store it in one's premises. At a mammalian brain level, a lot of people can be very uncomfortable with this. There's also of course the concerns of hosting in a shared network,the security of data on shared storage and the robustness of network connectivity. To overcome these fears, one of course could deliver a "Cloud in a Box".

    Is High Scalability SOA an Oxymoron?

    • Posted by admin
    • Published: 2011-03-12

    All too many Service Oriented Architecture (SOA) practitioners seem to have a belief, that because SOA deals with distributed computing, that scalability is a given. The reality however is that conventional SOA practices tend to work against the development of high scalability applications. This article shows the properties of a system that can achieve high scalability and then contrasts it with conventional SOA practices.

    The patterns found in a system that exhibits high scalability are the following:

    • State Routing
    • Behavior Routing
    • Behavior Partitioning
    • State Partitioning
    • State Replication
    • State Coordination
    • Behavior Coordination
    • Messaging

    This has been discussed in a previous blog entry "A Design Pattern for High Scalability". SOA based systems conventionally cover Routing, Coordination and Messaging. However, the patterns of Partitioning and Replication are inadequately addressed by SOA systems. For reference, one can refer to the SOA Patterns book that I've covered in this review. The words "Partitioning" and "Replication" unsurprisingly can't be found in the book's index. Scalability apparently isn't a concern to be addressed by SOA patterns.

    What then are the patterns that we can introduce to SOA to ensure scalability? Here are a couple of suggested patterns from the previous article:

    • Behavior Partitioning
      • Loop Parallelism
      • Fork/Join
      • Map/Reduce
      • Round Robin Allocation
      • Random Allocation
      • Weighted Allocation
    • State Partitioning (Favors Latency)
      • Distributed Caching
      • HTTP Caching
      • Sharding
    • State Replication (Favors Availability in Partition Failure)
      • Synchronous Replication with Distributed Locks and Local Transactions
      • Synchronous Replication with Local Locks and Distributed Transactions
      • Synchronous Replication with Local Locks and Local Transactions
      • Asynchronous Replication with Update Anywhere
      • Asynchronous Replication with Update at the Master Site only

    How can these patterns be manifested in a SOA system?

    To achieve Behavioral Partitioning, the construct of a Command Pattern (see: Command Pattern) and the Functor Pattern can be used. In the conventional SOA architecture, behavior (as in executable code) needs to be propagated through the network, to be executed by receiving services. In lieu of a commonly agreed standard, one may either employ XQuery as a stand in for this capability. One should therefore can define services to accept XQuery in a way analogous to how SemanticWeb systems accept SPARQL. A key to achieving scalability is that behavior be allowed to be move close to the data that it will act on. Behavior that works on data through remote invocations is a guarantee to kill scalability. See "Hot Trend: Move Behavior To Data For A New Interactive Application Architecture".

    To achieve State Partitioning, SOA based system need to adopt the notion of persistent identifiers of data. WS-I has the notion of WS-Addressing which typically are used to reference endpoints as opposed to actual entities. What is needed is that this addressing or persistent identifiers act analogous to Consistent Hashing so that entities may be partitioned and accessible using multiple endpoints. Identifier based services would need to be stood up to perform the redirection to the endpoints.

    Finally, there is the issue of Replication to support availability and fail-over. The Identifier based services described early may function as routers to handle the fail-over. Alternative, one may employ proxy servers in the manner described in A New Kind of Tiered Architecture. The replication capability however will require the exposure of new kinds of services that support a replication protocol. The most basic of which would be to provide a Publish and Subscribe interface.

    To conclude, high scalability in SOA may indeed be possible. It is a bass-ackwards way of achieving high scalability, but if your only option is to use SOA, then there may just be a possibility to achieve it.

    Navigation
    visitors
    reading
     
     

    Powered by Plone

    This site conforms to the following standards: