How to Achieve Highly Scalable Transaction Processing


I’ve been asked to share my thoughts regarding highly scalable Transaction Processing. Transaction Processing has been exhaustively researched for several decades now, and many of the brightest minds in computing have likely beaten the subject to death by now. Scaling transaction processing is an extremely difficult problem because it tries to reconcile two diametrically opposed forces: ACID (a single entity) versus distributed computing (replicated entities).

Transaction Processing systems have historically been built on one of two communication mechanisms: RPC and queued messaging. Application servers like EJB containers can trace their lineage back to Transaction Monitors. These subscribed to an RPC style of communication for transaction processing: one would use a JCA adapter to communicate with XA resources managed by JTA. Under the covers, RPC would use the Two-Phase Commit protocol (2PC). In its most general form, the Paxos algorithm would be the mechanism to ensure consistency among distributed systems. In the quest to sell ever more capable hardware at exponentially higher prices, an RPC-based approach seems an ideal motivator.

The problem with RPC-based Transaction Processing is that it is difficult to scale across many cheaper boxes. (Scaling out to more servers is more economical, since cost increases only linearly and the horsepower isn’t bound to the latest technology the industry can provide.) The queued approach, however, has been shown over the years (TBD: show reference) to be the right approach to achieving scalability. The balance one works with is latency versus consistency. In the RPC approach, latency is compromised for consistency, in that 2PC-like communication is very expensive. In the queued approach, latency is also compromised in favor of consistency, in that processing is performed asynchronously rather than immediately.
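
To make the cost of the RPC approach concrete, here is a minimal sketch of a two-phase commit coordinator (Python, all names illustrative; real implementations also force a log write on every participant in the prepare phase, which is where much of the latency goes). The coordinator must collect a ‘yes’ vote from every participant before anyone may commit, so every transaction pays for a full round of blocking coordination:

```python
class Participant:
    """A resource manager that votes in a two-phase commit."""
    def __init__(self, name):
        self.name = name
        self.state = "idle"

    def prepare(self):
        # Phase 1: persist enough to guarantee a later commit, then vote yes.
        self.state = "prepared"
        return True

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    """Coordinator: every participant must vote yes before anyone commits."""
    if all(p.prepare() for p in participants):  # Phase 1: prepare/vote
        for p in participants:                  # Phase 2: commit everywhere
            p.commit()
        return "committed"
    for p in participants:                      # any 'no' vote aborts all
        p.rollback()
    return "aborted"
```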

A good analogy for the difference between the approaches is pessimistic versus optimistic locking. The pessimistic approach is preventive: all data is kept synchronized at all times by reserving resources prior to execution. This reservation, however, reduces the amount of concurrency in the system. Think of a version control system where one developer has a file locked and the other developers have to wait until it is unlocked. By contrast, an optimistic locking approach (a.k.a. versioning) allows multiple developers to continue their work in a non-blocking manner. Queueing is by its nature non-blocking and thus ensures maximum parallelism. There will always, however, be a critical section that works on shared resources. The hope is that this critical section executes on a single machine and in a manner that keeps locks short-lived.
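
The optimistic approach can be sketched in a few lines (a toy, single-process illustration with names of my own invention, not any particular product’s API): readers never block, and a writer fails only at commit time if someone else got there first:

```python
class StaleVersionError(Exception):
    """Raised when the record changed since the writer read it."""


class VersionedRecord:
    """Optimistic locking: no lock is held while working; a write succeeds
    only if the version it originally read is still current."""
    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        # Readers never block; they just note the version they saw.
        return self.value, self.version

    def write(self, new_value, expected_version):
        if expected_version != self.version:
            raise StaleVersionError("record changed since it was read")
        self.value = new_value
        self.version += 1
```

A caller that catches `StaleVersionError` simply re-reads and retries, which is the non-blocking analogue of waiting for a lock.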

At an abstract level you can gain insight from Pat Helland’s ‘Life Beyond Distributed Transactions‘ paper, where he enumerates the constraints a system must satisfy to enable transactions at scale. A truncated list of his requirements follows:

  • Entities are uniquely identified – each entity, which represents disjoint data (i.e. no overlap of data between entities), should have a unique key.
  • Multiple disjoint scopes of transactional serializability – in other words, there are these ‘entities’, and you cannot perform atomic transactions across them.
  • At-Least-Once messaging – an application must tolerate message retries and out-of-order arrival of messages.
  • Messages are addressed to entities – one can’t abstract away from the business logic the existence of the unique keys for addressing entities. Addressing, however, is independent of location.
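
A minimal sketch of what these constraints look like in code (illustrative names, not from Helland’s paper): an entity addressed by a unique key that tolerates at-least-once delivery by remembering which message ids it has already applied:

```python
class Account:
    """An entity addressed by a unique key. It survives at-least-once,
    out-of-order delivery by recording which message ids it has applied."""
    def __init__(self, key):
        self.key = key          # the entity's unique identifier
        self.balance = 0
        self.seen = set()       # processed message ids (conversational state)

    def handle(self, msg_id, amount):
        if msg_id in self.seen:     # duplicate retry: apply exactly once
            return self.balance
        self.seen.add(msg_id)
        self.balance += amount
        return self.balance
```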

He writes about Entities that have extremely limited distributed transaction capabilities. To achieve scalability, one has to define bundles of data that are fixed in their ability to participate in transactions. These bundles of data, or partitions, cannot perform transactions with other partitions. When partitioning for scalability, you will need to route traffic to the specific partition that can perform the transaction entirely locally. It’s kind of like the dual of NoSQL: in NoSQL, data is duplicated (i.e. denormalized) in many places to achieve maximum scalability; in a transactional system, the data processed in a transaction should reside logically in only one place.
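
Routing to the owning partition can be as simple as a stable hash of the entity’s unique key (an illustrative sketch, not any particular product’s router). The point is that every transaction on a given key deterministically lands on the same box:

```python
import hashlib

def route(entity_key, num_partitions):
    """Deterministically map an entity key (e.g. 'EURUSD') to the single
    partition that owns it. Unlike Python's built-in hash(), md5 is stable
    across processes and restarts, so routing never flaps."""
    digest = hashlib.md5(entity_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```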

Cameron Purdy’s talk “Traditional Programming Models: Stone Knives and Bearskins in the Google Age” mentions a FOREX clearing house where a third of the transactions are in EURUSD. This required the EURUSD trades to be routed to the single box that handled the EURUSD market. Only the order submission and the order fill were performed against the database, for durability reasons; everything else was a flow-through of in-memory data. This system was 1000x faster than the original, and orders were processed in under 3ms.

Billy Newport, in his slides “Using Chained Transactions for Maximum Concurrency Under Load”, employs the same basic blueprint in showing how he scales an eCommerce system. The partition he chooses is the SKU: each SKU is an entity that ensures the transactional integrity of its data. Messages routed through the system contain instructions and their corresponding undo instructions. This undo capability allows the system to perform cascading compensating transactions in the event that a transaction further down the chain isn’t able to complete successfully.
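
The chained-transaction idea can be sketched as a list of do/undo pairs: if a step down the chain fails, the steps already applied are compensated in reverse order. This is my own illustrative reconstruction of the blueprint, not Billy Newport’s actual code:

```python
def run_chain(steps, state):
    """Each step is a (do, undo) pair of instructions. If any step fails,
    the already-applied steps are compensated in reverse (cascading undo)."""
    applied = []
    for do, undo in steps:
        try:
            do(state)
            applied.append(undo)
        except Exception:
            for compensate in reversed(applied):
                compensate(state)       # compensating transaction
            return False
    return True


def reserve(sku, qty):
    """Build a do/undo instruction pair for reserving inventory on one SKU."""
    def do(inventory):
        if inventory[sku] < qty:
            raise ValueError("insufficient stock")
        inventory[sku] -= qty
    def undo(inventory):
        inventory[sku] += qty
    return do, undo
```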

In summary, the fundamental guideline for enabling highly scalable transactions is simple: “can you find a partition that allows your transactions to be performed on a single box?” This partitioning is without a doubt dependent on your specific set of use cases. For FOREX, it is the currency-pair market (i.e. EURUSD); for eCommerce, it may be the product SKU and its inventory. But what if your use cases don’t have a natural partition? The solution then would be analogous to how airlines handle overbooking of their flights: hope that the likelihood is low, and if the scenario does occur, use out-of-band compensations to correct the violations of the business rules. How this kind of compensation mechanism is implemented is an extremely interesting topic by itself.

Interestingly enough, my high-level thoughts about transactions haven’t changed much over the years.


Is High Scalability SOA an Oxymoron?


All too many Service Oriented Architecture (SOA) practitioners seem to believe that because SOA deals with distributed computing, scalability is a given. The reality, however, is that conventional SOA practices tend to work against the development of highly scalable applications. This article describes the properties of a system that can achieve high scalability and then contrasts them with conventional SOA practices.

The patterns found in a system that exhibits high scalability are the following:

  • State Routing
  • Behavior Routing
  • Behavior Partitioning
  • State Partitioning
  • State Replication
  • State Coordination
  • Behavior Coordination
  • Messaging

This has been discussed in a previous blog entry, “A Design Pattern for High Scalability“. SOA-based systems conventionally cover Routing, Coordination and Messaging. However, the patterns of Partitioning and Replication are inadequately addressed by SOA systems. For reference, consider the SOA Patterns book that I’ve covered in this review: the words “Partitioning” and “Replication”, unsurprisingly, can’t be found in the book’s index. Scalability apparently isn’t a concern to be addressed by SOA patterns.

What then are the patterns that we can introduce to SOA to ensure scalability? Here are some suggested patterns from the previous article:

  • Behavior Partitioning
    • Loop Parallelism
    • Fork/Join
    • Map/Reduce
    • Round Robin Allocation
    • Random Allocation
    • Weighted Allocation
  • State Partitioning (Favors Latency)
    • Distributed Caching
    • HTTP Caching
    • Sharding
  • State Replication (Favors Availability in Partition Failure)
    • Synchronous Replication with Distributed Locks and Local Transactions
    • Synchronous Replication with Local Locks and Distributed Transactions
    • Synchronous Replication with Local Locks and Local Transactions
    • Asynchronous Replication with Update Anywhere
    • Asynchronous Replication with Update at the Master Site only
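
As a small illustration of the Behavior Partitioning bucket, here is a fork/join sketch (illustrative Python, not tied to any of the frameworks above): independent chunks of work are forked out in parallel, and the partial results are joined once all have completed:

```python
from concurrent.futures import ThreadPoolExecutor

def fork_join(chunks, work, join):
    """Fork: run `work` on each independent chunk in parallel.
    Join: combine the partial results once every fork has completed."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(work, chunks))  # blocks until all finish
    return join(partials)
```

The same shape, with `work` as the map step and `join` as the reduce step, underlies the Map/Reduce entry in the list as well.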

How can these patterns be manifested in a SOA system?

To achieve Behavioral Partitioning, the constructs of the Command Pattern and the Functor Pattern can be used. In conventional SOA architecture, behavior (as in executable code) needs to be propagated through the network to be executed by receiving services. In lieu of a commonly agreed standard, one may employ XQuery as a stand-in for this capability: one could define services that accept XQuery, analogous to how Semantic Web systems accept SPARQL. A key to achieving scalability is that behavior be allowed to move close to the data it will act on. Behavior that works on data through remote invocations is guaranteed to kill scalability. See “Hot Trend: Move Behavior To Data For A New Interactive Application Architecture“.
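
A sketch of the idea (illustrative Python in place of the XQuery example): the ‘command’ is just a functor shipped to the node that holds the data, so the data itself never crosses the network:

```python
class DataNode:
    """Holds one partition of data locally. Commands travel to the node;
    the data stays put and is operated on in place."""
    def __init__(self, records):
        self.records = records

    def execute(self, command):
        # The command (a functor) runs next to the data it needs;
        # only the (small) result travels back to the caller.
        return command(self.records)


# A 'command' is simply a callable serialized and sent to the node.
def total_inventory(records):
    return sum(records.values())
```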

To achieve State Partitioning, SOA-based systems need to adopt the notion of persistent identifiers for data. WS-I has the notion of WS-Addressing, which is typically used to reference endpoints as opposed to actual entities. What is needed is for this addressing, or these persistent identifiers, to act analogously to Consistent Hashing, so that entities may be partitioned and accessible through multiple endpoints. Identifier-based services would need to be stood up to perform the redirection to the endpoints.
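
For reference, a minimal consistent-hashing ring might look like the following sketch (illustrative, with virtual nodes to smooth the distribution): persistent identifiers map to points on a ring, each identifier is owned by the next endpoint clockwise, and adding or removing an endpoint only remaps the identifiers adjacent to it:

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing: identifiers map to points on a ring, and each
    identifier is owned by the next endpoint clockwise. Each endpoint is
    placed at many virtual points to even out the load."""
    def __init__(self, endpoints, replicas=100):
        self.ring = sorted(
            (self._hash(f"{endpoint}:{i}"), endpoint)
            for endpoint in endpoints
            for i in range(replicas))
        self.points = [point for point, _ in self.ring]

    @staticmethod
    def _hash(key):
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def endpoint_for(self, identifier):
        # First ring point clockwise from the identifier's hash (wrapping).
        i = bisect.bisect(self.points, self._hash(identifier)) % len(self.ring)
        return self.ring[i][1]
```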

Finally, there is the issue of Replication to support availability and fail-over. The identifier-based services described earlier may function as routers to handle the fail-over. Alternatively, one may employ proxy servers in the manner described in “A New Kind of Tiered Architecture”. The replication capability, however, will require the exposure of new kinds of services that support a replication protocol, the most basic of which would provide a Publish and Subscribe interface.
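
That most basic Publish and Subscribe replication interface might look like this sketch (illustrative only; a real implementation would publish over a message queue rather than call replicas in-process): the master site applies each update and publishes it to every subscribed replica, i.e. asynchronous update-at-the-master-site replication:

```python
class ReplicatedStore:
    """Master site for a set of keys. Every update is applied locally and
    then published to subscribed replicas, which apply it in turn."""
    def __init__(self):
        self.data = {}
        self.subscribers = []

    def subscribe(self, replica):
        """Register a dict-like replica store to receive every update."""
        self.subscribers.append(replica)

    def put(self, key, value):
        self.data[key] = value
        for replica in self.subscribers:   # publish the change
            replica[key] = value
```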

To conclude, high scalability in SOA may indeed be possible. It is a bass-ackwards way of achieving high scalability, but if your only option is SOA, then there may just be a possibility of achieving it.


Design Patterns for Almost-Infinite Scalability


Pat Helland has written a very illuminating paper entitled “Life beyond Distributed Transactions: an Apostate’s Opinion“. Pat Helland, in a former life, worked on TP monitors (the precursor to EJB) at Tandem and advocated SOA (the precursor to WS-*) at Microsoft. A few people have remarked that Pat’s paper shows that he re-discovered ReST the ‘hard way‘. Pat never mentions ReST throughout his paper, but the similarities are clearly apparent. However, there are other interesting observations that are worth noting down. These observations, I believe, take an additional step forward beyond the original ReST principles.

As a side note, I’ve been an advocate of ReST for years. However, I don’t believe it’s the end-all in building web-scale applications. There are a lot of useful patterns being discovered every day that aren’t incompatible with ReST, but rather can be used as complements to it.

First of all, I completely subscribe to his title: distributed transactions are, as he calls them, a ‘Maginot Line’. I’ve made similar arguments about transactions in a past blog. Pat Helland’s paper reveals some of these patterns for building what he calls ‘almost-infinite scalability’.

  • Entities are uniquely identified – each entity, which represents disjoint data (i.e. no overlap of data between entities), should have a unique key.
  • Multiple disjoint scopes of transactional serializability – in other words, there are these ‘entities’, and you cannot perform atomic transactions across them.
  • At-Least-Once messaging – an application must tolerate message retries and out-of-order arrival of messages.
  • Messages are addressed to entities – one can’t abstract away from the business logic the existence of the unique keys for addressing entities. Addressing, however, is independent of location.
  • Entities manage conversational state per party – to ensure idempotency, an entity needs to remember that a message has been previously processed. Furthermore, in a world without atomic transactions, outcomes need to be ‘negotiated’ using some kind of workflow capability.
  • Alternate indexes cannot reside within a single scope of serializability – one can’t assume that indices or references to entities can be updated atomically. There is the potential for these indices to become out of sync.
  • Messaging between Entities is tentative – entities need to accept some level of uncertainty; messages that are sent are requests for commitment and may possibly be cancelled.
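
The last bullet, tentativeness, can be sketched as a reservation that an entity holds provisionally until the sender either confirms or cancels it (illustrative names, not from the paper):

```python
class Reservation:
    """An entity that accepts tentative requests for a resource. The
    request is held provisionally; the sender later negotiates the
    outcome by confirming (commitment) or cancelling (undo)."""
    def __init__(self, stock):
        self.stock = stock
        self.held = {}      # request id -> quantity held tentatively

    def request(self, req_id, qty):
        """Tentative: hold the quantity, but commit to nothing yet."""
        if qty > self.stock:
            return False
        self.stock -= qty
        self.held[req_id] = qty
        return True

    def confirm(self, req_id):
        """The request for commitment is accepted; the hold becomes final."""
        self.held.pop(req_id, None)

    def cancel(self, req_id):
        """The uncertainty is resolved the other way; release the hold."""
        self.stock += self.held.pop(req_id, 0)
```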

These are interesting observations that jibe quite well with my table of loosely coupled APIs, as well as my previous arguments about distributed transactions and the need for provisional information and contracts.
I believe that these patterns can be implemented on top of ReST in the manner I prescribe in my Speech Acts blog entry.

Pat Helland now works at Amazon. I’m unaware of his influence on the development of S3, but it’s interesting to contrast some of that product’s design principles:

  • Decentralization: Use fully decentralized techniques to remove scaling bottlenecks and single points of failure.

  • Asynchrony: The system makes progress under all circumstances.

  • Autonomy: The system is designed such that individual components can make decisions based on local information.

  • Local responsibility: Each individual component is responsible for achieving its consistency; this is never the burden of its peers.

  • Controlled concurrency: Operations are designed such that no or limited concurrency control is required.

  • Failure tolerant: The system considers the failure of components to be a normal mode of operation, and continues operation with no or minimal interruption.

  • Controlled parallelism: Abstractions used in the system are of such granularity that parallelism can be used to improve performance and robustness of recovery or the introduction of new nodes.

  • Decompose into small well-understood building blocks: Do not try to provide a single service that does everything for every one, but instead build small components that can be used as building blocks for other services.

  • Symmetry: Nodes in the system are identical in terms of functionality, and require no or minimal node-specific configuration to function.

  • Simplicity: The system should be made as simple as possible (- but no simpler).

There clearly is an emerging consensus on the architecture of ‘almost-infinitely scalable’ applications. Furthermore, revealing insights from the experiences of Google, Amazon, Yahoo and eBay are shedding a lot of light on this subject.
