The SRE Low Frequency Reporting Platform

By Grant Mitchell and Ashley Bullock


Hello, this is the first in a series of blog entries we intend to write about all things Site Reliability Engineering at Paddy Power Betfair. As this is our first ever blog post (so please be kind!), we thought we’d write about a straightforward project we ran recently: our low frequency reporting platform. We’ll discuss what we built, why we built it and some of the outcomes we saw. The project is fairly simple and probably not fully transferable to your estate; however, we hope our thinking behind it will be useful, even if the detailed “how we did it” is not :). The framework was delivered over a few sprints, and the services running on it are still being developed.

What is a Low Frequency Reporting Platform?

In our day-to-day work, we have found that we need to collect and process lots of small datasets at a relatively low frequency, such as daily or hourly. These include reports on hardware health, app performance and information from ServiceNow. The typical way to tackle this in our environment would be to create a service or app, known internally as a TLA (because we generally name them with a three-letter acronym). Every TLA is deployed with an A or B appended to its hostname: when a TLA currently running as “A” is updated and redeployed, the new deployment becomes “B”, the next deployment becomes “A” again, then “B”, and so on. Rather than develop an individual TLA for each small project, we decided to create a framework to host “small” services.
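To make the A/B hostname alternation concrete, here is a minimal sketch of how the next hostname could be derived from the current one. It assumes the A/B marker is the last character of the hostname; the function names and the example hostname `abc01a` are hypothetical and are not our actual deployment tooling.

```python
# Hypothetical illustration of the A/B hostname alternation for TLA deployments.
# Assumes the A/B marker is the final character of the hostname; the names used
# here are illustrative only, not real deployment tooling.

FLIP = {"a": "b", "b": "a", "A": "B", "B": "A"}


def next_hostname(current: str) -> str:
    """Return the hostname the next deployment will use, e.g. 'abc01a' -> 'abc01b'."""
    if not current or current[-1] not in FLIP:
        raise ValueError(f"hostname {current!r} does not end in an A/B suffix")
    # Keep the base name and swap only the trailing A/B marker.
    return current[:-1] + FLIP[current[-1]]


def deployment_sequence(first: str, count: int) -> list[str]:
    """List the hostnames used by `count` consecutive deployments, starting at `first`."""
    if count < 1:
        return []
    names = [first]
    for _ in range(count - 1):
        names.append(next_hostname(names[-1]))
    return names


print(deployment_sequence("abc01a", 4))  # ['abc01a', 'abc01b', 'abc01a', 'abc01b']
```

Alternating between two names means the new deployment can be brought up alongside the old one and traffic switched over before the old host is retired, which is one common reason for this kind of convention.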
