I’ve been asked to share my thoughts regarding highly scalable Transaction Processing. Transaction Processing has been an exhaustively researched area in computing for several decades now and a lot of the brightest minds have likely beaten this subject to death by now. Scaling transaction processing is an extremely difficult problem because it tries to find a solution to two diametrically opposite forces, that is ACID (single entity) versus Distributed computing (replicated entities).
Transaction Processing systems have been invented in the past using either of two communication mechanisms, RPC and Queued. Application servers like EJB containers can trace their lineage back to Transaction Monitors. These prescribed to a RPC style of communication to do transaction processing. One would use a JCA adapter to communicate to XA resources managed by JTA. Underneath the covers RPC would use the Two-Phase Commit protocol (TPC). In its most general form, the Paxos algorithm would be the mechanism to ensure consistency amongst distributed systems. In the quest to sell more capable hardware at exponentially higher prices, a RPC based approach seems to be an ideal motivator.
The problem though with RPC based Transaction Processing is that it is difficult to scale to many cheaper boxes. ( Scaling to use more servers is more economical since the cost increases only linearly and the horsepower isn’t bound to the latest technology the industry can provide.) The queued approach however has been shown over the years (TBD: show reference) to be the right approach to achieving scalability. The balance one works with is Latency versus Consistency. In the RPC approach, latency is compromised for consistency in that TPC-like communications are very expensive. In the Queued approach, latency is also compromised in favor of consistency in that communication processing is performed asynchronously and not immediately.
A good analogy to explaining the difference in the approaches can be made by examining the difference in a pessimistic versus optimistic locking approach. In the pessimistic approach, one takes a preventive approach, such that all data is always synchronized. This is done, by reserving resources prior to execution. This reservation however reduces the amount of concurrency in the system. Think for example of a version control system where a developer has a file locked, with developers having to wait until it is unlocked. By contrast, a optimistic locking approach (aka. versioning), allows multiple developers to continue on their work in a non-blocking manner. Queueing is by its nature non-blocking and thus ensures maximum parallelism. However, there will always be that critical section that works on shared resources. The hope however, is that this critical section is performed in a single machine and in a manner that keeps locks short-lived.
At an abstract level you can gain insight from Patt Heland’s ‘Life Beyond Distributed Transactions‘ paper, where he emphasized the constraints that a system requires to enable transactions. A truncated list of his requirements are enumerated as follows:
- Entities are uniquely identified – each entity which represents disjoint data (i.e. no overlap of data between entities) should have a unique key.
- Multiple disjoint scopes of transactional serializability – in other words there are these ‘entities’ and that you cannot perform atomic transactions across these entities.
- At-Least-Once messaging – that is an application must tolerate message retries and out-of-order arrival of messages.
- Messages are adressed to entities – that is one can’t abstract away from the business logic the existence of the unique keys for addressing entities. Addressing however is independent of location.
He writes about Entities that have extremely limited distributed transaction capabilities. To achieve scalability one has to define bundles of data are fixed in their ability to participate in transactions. These bundles of data or partitions cannot perform transactions with other partitions. When partitioning for scalability, you will need to route traffic to the specific partition that is capable of performing the transaction completely locally. It’s kind of a link the dual of NoSQL. In NoSQL data is duplicated (i.e. denormalized) in many places to achieve maximum scalability. In a transactional system, the data that is processed in a transaction should reside logically in only one place.
Cameron Purdy’s talk “Traditional Programming Models: Stone Knives and Bearskins in the Google Age” that mentions a FOREX exchange clearing house where a third of the transactions are EURUSD. This requires the EURUSD trades to be routed to a single box that handled the EURUSD market. Only the order submission and the order fill are performed against the database for durability reasons, everything else just is a flow through in memory data. This system was 1000x faster than the original system and had orders were processed in under 3ms.
Billy Newport in his slides “Using Chained Transactions for Maximum Concurrency Under Load” also employs the same basic blueprint in showing how he scales a eCommerce system. The partition that he performs is that each SKU is an entity that ensures the transaction integrity of data. Messages are routed through the system that contain instructions and its corresponding undo instructions. This undo capability allows the system to perform cascading compensating transactions in the event that a transaction down the chain isn’t able to complete successfully.
In summary, the fundamental guideline to enabling high scalability transactions is simple, “can you find a partition that allows your transactions to be performed in a single box?” This partitioning is without a doubt be dependent on a specific set of use cases. So for FOREX, it is the currency pairs (i.e. EURUSD) market and for eCommerce it may be the Product SKU and its inventory. However, what if your use cases don’t have a natural partition? The solution the would be one that would be analogous to how airlines handle over booking of their flights. That is, hope that the likelihood is low and that if the scenario does occur, then use out of band compensations to correct the violations of the business rules. How this kind of compensation mechanism is implemented is an extremely interesting topic by itself.
Interestingly enough, my high level thoughts about transactions hasn’t changed much over the years.