The Betfair Exchange is a pioneering, market-leading online betting exchange. It is the biggest and most mature betting exchange around.
What is a betting exchange?
A betting exchange allows customers to bet against each other rather than against a bookmaker. This differentiates it from traditional betting shops and bookmakers: on a betting exchange the user can act as the bookie (by setting the odds for an event) or as the customer (who bets at the odds set by another user).
Whereas traditional bookmakers accept the risk of going head-to-head in various bets with customers, the business of a betting exchange does not involve any risk. The exchange simply provides the technology that pairs customers together in order for bets to take place and takes a commission from the net winnings that result.
What happens when a bet is placed?
In betting exchange systems, orders submitted to the system by the customer are validated against the customer’s “Total available balance”, the amount available to wager at any given time. If there are enough funds to cover these orders, the orders are passed on to the next stage where they are processed by the bet matcher. The Betfair Exchange manages a customer’s balance and risk (exposure) separately. This means your ‘available to bet’ or ‘total available’ balance is your account balance minus the current exposure associated with any bets you have placed that have not yet been settled.
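The balance check described above can be sketched in a few lines. This is only an illustration of the arithmetic, not Betfair's implementation: the function names and the `order_liability` parameter (the extra worst-case loss an order would add) are assumptions made for the example.

```python
from decimal import Decimal


def available_to_bet(account_balance: Decimal, exposure: Decimal) -> Decimal:
    """Total available balance = account balance minus unsettled exposure."""
    return account_balance - exposure


def validate_order(account_balance: Decimal, exposure: Decimal,
                   order_liability: Decimal) -> bool:
    """Return True if the order may be passed on to the bet matcher.

    order_liability is a hypothetical input: the additional worst-case
    loss the customer would take on if the order were accepted.
    """
    return order_liability <= available_to_bet(account_balance, exposure)
```

For example, a customer with a balance of 100 and an exposure of 30 has 70 available, so an order with a liability of 70 passes validation and one with a liability of 70.01 does not.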
What is exposure and how is it calculated?
Exposure is the customer’s liability to cover their bets if the worst possible outcome occurs.
The simplest approach is to sum the maximum loss of every bet in a market and call that the exposure.
This replicates the interaction you have in a betting shop, where you place a bet and pay its maximum liability up front, but it is a primitive and naive model for calculating exposure. Betfair instead permits its users to spend potential winnings on further bets. This is good for customers because the amount they can actually lose is calculated accurately, rather than overstated.
In its simplest form, the exposure calculation involves evaluating the profit and loss of a customer’s bets for every possible outcome of the market. Considering the number of bets Betfair processes, this is, as one might expect, quite computationally intensive.
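To make the difference between the two models concrete, here is a minimal sketch that compares the naive sum of maximum losses with a per-outcome profit and loss evaluation. It assumes a simplified market with exactly one winner, decimal odds, fully matched back and lay bets, and no commission. The `Bet` type and the function names are invented for the example and are not BnE's actual model.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Bet:
    selection: str  # runner the bet is on
    side: str       # "BACK" or "LAY"
    odds: Decimal   # decimal odds, e.g. Decimal("3.0")
    stake: Decimal  # backer's stake


def bet_pnl(bet: Bet, winner: str) -> Decimal:
    """Profit (positive) or loss (negative) of one bet if `winner` wins."""
    if bet.side not in ("BACK", "LAY"):
        raise ValueError(f"unknown side: {bet.side}")
    winnings = bet.stake * (bet.odds - 1)
    if bet.side == "BACK":
        return winnings if bet.selection == winner else -bet.stake
    return -winnings if bet.selection == winner else bet.stake


def naive_exposure(bets) -> Decimal:
    """Betting-shop model: add up the maximum loss of every bet."""
    return sum(
        (b.stake if b.side == "BACK" else b.stake * (b.odds - 1) for b in bets),
        Decimal(0),
    )


def market_exposure(bets, selections) -> Decimal:
    """Worst-case loss across all possible winners, never below zero."""
    worst = min(
        (sum((bet_pnl(b, w) for b in bets), Decimal(0)) for w in selections),
        default=Decimal(0),
    )
    return max(Decimal(0), -worst)
```

Take a customer who backs runner A at 3.0 for 10 and then lays A at 2.0 for 10 in a two-runner market. The naive model reserves 20, but if A wins the customer nets +10 and if B wins they net 0, so the per-outcome exposure is 0. The loop over every selection and every bet also hints at why the calculation gets expensive at scale.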
BnE (Balance and Exposure) – Before
Exposure was calculated by PL/SQL code running on our Oracle DB server, which has tens of cores, and it was the largest consumer of that server’s processing resources.
On a typical Saturday afternoon, our database was hitting 95% CPU. We could no longer scale out. In the following graph, this is represented by the blue, green and orange areas. For reference, these are the top 10 SQL calls; the pink is everything else.
BnE (Balance and Exposure) – Now
Over the years we have moved big chunks of stored-procedure business logic out of the database and into microservices written in Java. The next candidate was Balance and Exposure (BnE), as it accounted for most of the high CPU usage.
We wanted BnE to be a replicated state machine that takes its inputs from a replicated log. When we were creating BnE, we found that industry-adopted replicated log solutions such as Kafka were not fit for our purpose, because these solutions tend to favour availability over consistency. The dev team looked for other solutions and decided to create its own implementation of the Raft consensus algorithm. Again, at the time of designing BnE, there was no readily available mature Raft implementation or anything similar on the C side of CAP.
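The idea of a replicated state machine can be illustrated with a toy wallet. This sketch shows only the deterministic "apply the log in order" part and does not implement Raft or any consensus protocol. The command format `(kind, account, amount)` and the class name are assumptions made for the example, not BnE's design.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class WalletStateMachine:
    """Toy deterministic state machine driven by an ordered command log."""
    balances: dict = field(default_factory=dict)
    last_applied: int = 0  # index of the last log entry applied

    def apply(self, index: int, command: tuple) -> None:
        # Entries must be applied exactly once and strictly in log order.
        if index != self.last_applied + 1:
            raise ValueError(f"expected entry {self.last_applied + 1}, got {index}")
        kind, account, amount = command
        current = self.balances.get(account, Decimal(0))
        if kind == "DEPOSIT":
            self.balances[account] = current + amount
        elif kind == "WITHDRAW":
            # Insufficient funds: every replica rejects it identically.
            if amount <= current:
                self.balances[account] = current - amount
        else:
            raise ValueError(f"unknown command: {kind}")
        self.last_applied = index


def replay(log) -> WalletStateMachine:
    """Build a replica by applying every entry of the log from the start."""
    machine = WalletStateMachine()
    for index, command in enumerate(log, start=1):
        machine.apply(index, command)
    return machine
```

Because `apply` depends only on the current state and the next entry, any two replicas that consume the same log end up in the same state. Getting every replica to agree on that log is the job of the consensus algorithm.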
Once we completed development, BnE replaced Oracle as our account wallet and exposure store, and took over the exposure calculation logic.
Below is what the ecosystem looked like when it was finished, but again this has changed during the last couple of months.
To say the least, the whole journey was very interesting. There was so much that we didn’t know existed in our legacy estate and we had to cater for this during the development, as we were replacing a section of the exchange that has existed since day one.
After months of development, one of the trickiest parts was working out how to roll this new component into production without any of our millions of customers noticing, while ensuring everything worked as expected. We decided to first run the component in parallel with what we had in production, and we wrote an application to report the differences between the two. The new instance did not transact any orders; it just received the same streams as the production build, ran its logic and wrote the output to a location that we could review. We then checked each difference and fixed any issues we came across. Over the months, the number of differences was cut down to zero.
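The comparison step of such a parallel run could look roughly like the sketch below. It assumes, purely for illustration, that both systems' outputs can be reduced to a mapping from account ID to calculated exposure. The function name and the `MISSING` sentinel are made up for the example and do not describe the comparison application we actually wrote.

```python
from decimal import Decimal

# Sentinel for "this side produced no value for the key".
MISSING = object()


def diff_outputs(production: dict, shadow: dict) -> list:
    """Return (key, production_value, shadow_value) for every mismatch."""
    differences = []
    for key in sorted(production.keys() | shadow.keys()):
        prod_value = production.get(key, MISSING)
        shadow_value = shadow.get(key, MISSING)
        if prod_value != shadow_value:
            differences.append((key, prod_value, shadow_value))
    return differences


production = {"acc-1": Decimal("10"), "acc-2": Decimal("5")}
shadow = {"acc-1": Decimal("10"), "acc-2": Decimal("7"), "acc-3": Decimal("1")}
report = diff_outputs(production, shadow)
```

Here `report` contains one entry for `acc-2`, where the values disagree, and one for `acc-3`, which only the shadow instance produced. An empty report over a long enough period is the signal that the new component matches production.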
There were a lot of positive results after BnE was fully productionised. Below are a few of them.
So, if you remember what the load looked like before BnE (see graph above), below you can see what it looked like after rolling out BnE.
On a Saturday afternoon, the load is now just under 20%, compared to 95% before. There’s still work ongoing to cut this even further.
Another very positive result is that the exchange now handles more transactions per second than it used to, and exposure calculation is now near-instantaneous.
Also, those of you who use our Exchange API to place bets may have seen the ERROR_IN_MATCHER error code. One of the reasons behind this error code was that the PL/SQL exposure calculation couldn’t handle concurrent calculations, so internally it would produce the error “Exposure calculation already in progress” and we’d return ERROR_IN_MATCHER to the caller. This cause no longer applies: BnE can process concurrent exposure calculations.