The Power of Open-Source in Cyber Security

The beauty of open-source software is that it lets you create, experiment with and transform code, and even give it a higher purpose. After discovering and deep-diving into a new and exciting security scanning tool, we began, with the help of our engineering team, making this tool into something more. What could initially have been used for red-teaming, bug bounty hunting or hacking in general was transformed into a tool that helps blue teams defend against attackers more effectively.

A bit over a year ago, Paddy Power Betfair’s Application Security Engineering team started an endeavour to adapt a trending secrets scanner called shhgit. On a daily basis, our team works on creating and implementing the tools necessary to ensure applications are developed and delivered to the highest quality standards. As soon as we learned more about this tool, we saw its potential to help us raise awareness and proactively reduce the chance of leaking sensitive tokens and secrets into source code.

What is Shhgit?

Shhgit is an open-source tool that finds committed secrets and sensitive files across GitHub, including its Gists, in real time. Simply put, it makes heavy use of GitHub’s API to find public code repositories containing leaked secrets or files. It was developed to raise awareness of how prevalent this issue is.

Underneath this high-level definition, shhgit uses a simple regular-expression engine to match patterns, which the user can define, against every line of code in a repository. Once a user pushes code to a public repository, the application is triggered to perform a full scan of that repository and emit alerts on its findings. The default configuration contains over 130 signatures, covering things like AWS keys, Google Cloud credentials and SSH keys, and many, many more. Some of these secrets follow specific formats (like the ones mentioned), and that can be used to our advantage to create tools that detect them.
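
To make this concrete, here is a minimal sketch of such an engine in Go (the language shhgit itself is written in). The two signatures are simplified illustrations, not shhgit’s actual configuration, and the scan function is our own naming.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// signature pairs a human-readable name with the pattern that detects it.
type signature struct {
	name    string
	pattern *regexp.Regexp
}

// Two simplified examples; real configurations ship with 130+ of these.
var signatures = []signature{
	{"AWS Access Key ID", regexp.MustCompile(`AKIA[0-9A-Z]{16}`)},
	{"Private key header", regexp.MustCompile(`-----BEGIN (RSA|EC|OPENSSH) PRIVATE KEY-----`)},
}

// scan matches every signature against every line of the input,
// which is the whole engine: regex matching and string comparisons.
func scan(src string) {
	s := bufio.NewScanner(strings.NewReader(src))
	for lineNo := 1; s.Scan(); lineNo++ {
		for _, sig := range signatures {
			if sig.pattern.MatchString(s.Text()) {
				fmt.Printf("line %d: possible %s\n", lineNo, sig.name)
			}
		}
	}
}

func main() {
	scan(`aws_key = "AKIAIOSFODNN7EXAMPLE"`)
}
```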

Why use a secrets scanner?

Data leakage is one of the most common threats companies, and software development projects in general, have to face. Leaking a secret token to the public might not necessarily mean a breach, but it surely makes it easier for malicious attackers to leverage that knowledge when choosing vectors of attack. The matter is so relevant that lots of tools focus on detecting and alerting when sensitive data is exposed. Amazon’s AWS GuardDuty is probably one of the most well-known examples out there.

Highs and lows of scanning tools – how we came to develop the blue team version

While this tool provided great visibility into sensitive data made public, it only sent alerts once the secret had already been pushed into the public. Furthermore, it sent alerts via a web application and would not easily integrate with our existing notification system. It also could not be configured per repository, a key feature we felt was required to bring down the number of false positives we would otherwise get. If each development team could specify the patterns relevant to them, or the files that should be ignored, the tool would only notify us when real trouble came about. This level of customization was not possible in shhgit, and it is typically hard to find in many other existing solutions.

While these tools provide information about potentially sensitive data made public, they are reactive. Most often they cannot be customized, which makes them difficult to use: they fail to adapt to the specifics of a project or a company.

While the concept was good, these limitations gave us the push to understand the tool’s core engine, which turned out to be really simple: regex matching and string comparisons, driven by a single configuration file that holds every option relevant to the scanner. Moreover, we wanted repository-level configuration, so that the tool could adjust itself to the specific needs of each project.

Therefore, we optimized the tool and essentially created a blue-team-focused tool from scratch, written in another language, reusing most of shhgit’s concepts and integrating closely with our systems and development practices.

How it works

As it currently works, the tool allows developers to set up a secrets scanner directly in their GitLab repositories, which will scan the new code added in each merge request. It can be seen as a step in our continuous integration (CI) pipeline and is an attempt to proactively stop developers from accidentally leaking database credentials, email addresses, AWS keys or other sensitive data into source code. The base configuration holds formats for pretty much every token you can think of, but here comes the main plus of the tool: you can customize it to your needs at repository level, including files and paths to ignore, like test directories and test files. This ensures the tool adapts to your project, and not the other way around.
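
As a rough illustration of that CI step, the sketch below (building on the signature list from the earlier sketch) scans only the lines a unified diff adds, which is what a merge-request scan cares about. The function name and the simplified diff handling are our own assumptions, not the tool’s actual code.

```go
// scanDiff inspects only the added lines of a unified diff, i.e. the
// code that is new in this merge request.
func scanDiff(diff string) []string {
	var findings []string
	for _, line := range strings.Split(diff, "\n") {
		// Added lines start with a single "+"; "+++" marks a file header.
		if !strings.HasPrefix(line, "+") || strings.HasPrefix(line, "+++") {
			continue
		}
		added := strings.TrimPrefix(line, "+")
		for _, sig := range signatures {
			if sig.pattern.MatchString(added) {
				findings = append(findings,
					fmt.Sprintf("possible %s in added line: %s", sig.name, added))
			}
		}
	}
	return findings
}
```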

When the tool starts a new scan (i.e. when a new merge request is opened), it searches the repository for a configuration file and, if one exists, uses it instead of the default. Do you want to ensure no company email is leaked into your source code? Add a regular expression with your company’s domain and there you have it. Do you use a tool that has a predictable token format which is not in the default configuration file? Add it to yours! Don’t want to run the scans on your test files? No problem, add them to the ignore lists.
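
A hedged sketch of that lookup, with an assumed config file name and field layout (the real tool's format may well differ):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
)

// Config mirrors the kinds of repository-level options described above.
// The field names are illustrative assumptions.
type Config struct {
	ExtraPatterns []*regexp.Regexp // e.g. your company's email domain
	IgnorePaths   []string         // globs to skip, e.g. test directories
}

// configFor prefers a configuration file committed in the repository,
// falling back to the scanner's defaults when none exists.
func configFor(repoPath string) Config {
	custom := filepath.Join(repoPath, ".secrets-scanner.yaml") // hypothetical name
	if _, err := os.Stat(custom); err == nil {
		// The real tool would parse the file; we hard-code an example
		// of what a team might declare in it.
		return Config{
			ExtraPatterns: []*regexp.Regexp{regexp.MustCompile(`[\w.+-]+@example\.com`)},
			IgnorePaths:   []string{"test/", "*_test.go"},
		}
	}
	return Config{} // stand-in for the default signature set
}

func main() {
	cfg := configFor(".")
	fmt.Printf("%d extra patterns, ignoring %v\n", len(cfg.ExtraPatterns), cfg.IgnorePaths)
}
```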

For the time being, the project has a small roadmap of features we would like to implement in the near future. These are extensions of the base idea, in the hope that more people can benefit from this project and that the tool remains efficient.

The first step is, naturally, to integrate the scanner back into GitHub. The platform hosts numerous projects, both private and public, and the tech community has a huge presence there, so we think integration there is a must. On the efficiency and portability front, we want to implement key entropy detection (to try to discover secrets from word entropy, particularly relevant for really sensitive or critical projects), customizable notification levels (as in, blocking merge requests or simply alerting developers) and, finally, to officially publish a Docker image of this project to ease its adoption by development teams.
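
Entropy detection is still on the roadmap, but the idea behind it is simple: a generated token draws characters almost uniformly from a large set, so it has a noticeably higher Shannon entropy than ordinary identifiers. A minimal sketch of that heuristic follows; the threshold and the sample token are assumptions for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the entropy of s in bits per character.
func shannonEntropy(s string) float64 {
	if s == "" {
		return 0
	}
	freq := map[rune]float64{}
	n := 0.0
	for _, r := range s {
		freq[r]++
		n++
	}
	var h float64
	for _, count := range freq {
		p := count / n
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	const threshold = 4.0 // assumed cut-off; a real tool would tune this per charset
	for _, tok := range []string{"hello_world", "x7Kq9LmT2vRb8PzW4NcY1JdF6HsA3QeU"} {
		h := shannonEntropy(tok)
		fmt.Printf("%.2f bits/char, flagged=%v: %s\n", h, h > threshold, tok)
	}
}
```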

At Paddy Power Betfair, we always work on improving our tooling to ensure a safer software development life cycle. With all our knowledge, research and hands-on experience, we believe this tool really fills an existing gap. Luckily, Paddy Power Betfair encourages and allows teams to be openly curious, build new things and grow, both the people and the company. For this I am wholeheartedly thankful: not everywhere could we find such support to work on this tool and publish it as open-source software.

Since the tool’s origins are open source, we wanted to keep it that way. We saw this as an opportunity to give something back to a community that gives so much to developers and teams.

We would love to see input and contributions from the tech community, so feel free to contribute to this project. If you are interested in knowing more, or just in setting yourself up, please check the repository on GitHub for further information.

Identifiers for UI testing: a reflection based approach

Typically, a graphical user interface (GUI) application has a companion GUI testing (or simply UI testing) process attached to it, which is responsible for ensuring that the UI of the product meets its specifications.

One of the most common needs in UI testing is assigning identifiers to UI elements, usually called views. These identifiers are usually assigned manually, which quickly scales into a painful process.

This article aims to describe an approach to get rid of that burden using reflection, which allows us to do runtime inspections.
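
As a taster of the general idea (shown here with Go’s reflect package rather than the UI framework the full article targets), runtime inspection can derive each view’s identifier from the name of the field that holds it, so nobody assigns them by hand. The View and LoginScreen types are invented for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// View stands in for a UI element that carries a testing identifier.
type View struct{ AccessibilityID string }

// LoginScreen groups the views of one screen; its field names become IDs.
type LoginScreen struct {
	Username *View
	Password *View
	Submit   *View
}

// assignIdentifiers inspects a screen struct at runtime and names each
// view after its owning type and field, e.g. "LoginScreen.Submit".
func assignIdentifiers(screen interface{}) {
	v := reflect.ValueOf(screen).Elem()
	t := v.Type()
	for i := 0; i < v.NumField(); i++ {
		if view, ok := v.Field(i).Interface().(*View); ok && view != nil {
			view.AccessibilityID = fmt.Sprintf("%s.%s", t.Name(), t.Field(i).Name)
		}
	}
}

func main() {
	s := &LoginScreen{&View{}, &View{}, &View{}}
	assignIdentifiers(s)
	fmt.Println(s.Submit.AccessibilityID) // LoginScreen.Submit
}
```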

Continue reading “Identifiers for UI testing: a reflection based approach”

My objects are aging too fast

A couple of weeks ago we were running performance tests on a legacy Java component, in preparation for a week of expected high load. We wanted to find problems that might cause the services to fail during this important period, so we could be prepared and preemptively take some mitigation actions. We were not familiar with the inner workings of the service, few metrics were available and it had never been closely monitored under this kind of load, so we had some work ahead of us.

At a high level, the service’s job was not too complex: it received messages from multiple RabbitMQ queues, did some processing on the data and finally published the result to an outbound RabbitMQ exchange and to different Kafka topics.

Continue reading “My objects are aging too fast”

Adopting Distributed Tracing – Part 1

In this article, we’ll cover the introduction of distributed tracing at Betfair. First, we outline our problems with monitoring and logs and explain why we think tracing is important. Then we detail the steps we took to enable tracing of bet placement across our platform. If you are looking for an open-source implementation of a distributed tracing system, we suggest checking out https://www.jaegertracing.io/. Jaeger is also our choice here at Betfair, and Yuri Shkuro has written a great book on the subject.
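
For readers who want to try Jaeger before diving into the series, here is a minimal, hedged Go example using the jaeger-client-go configuration API; the service and span names are placeholders, and this is an illustration rather than Betfair’s actual setup.

```go
package main

import (
	"io"
	"log"

	"github.com/opentracing/opentracing-go"
	"github.com/uber/jaeger-client-go/config"
)

// initTracer builds a Jaeger tracer that samples every request and
// registers it as the global OpenTracing tracer.
func initTracer(service string) io.Closer {
	cfg := config.Configuration{
		ServiceName: service,
		Sampler:     &config.SamplerConfig{Type: "const", Param: 1},
		Reporter:    &config.ReporterConfig{LogSpans: true},
	}
	tracer, closer, err := cfg.NewTracer()
	if err != nil {
		log.Fatalf("cannot initialise tracer: %v", err)
	}
	opentracing.SetGlobalTracer(tracer)
	return closer
}

func main() {
	closer := initTracer("bet-placement") // placeholder service name
	defer closer.Close()

	span := opentracing.StartSpan("place-bet")
	defer span.Finish()
}
```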

Continue reading “Adopting Distributed Tracing – Part 1”

Showcasing the Betfair Exchange at Kafka Summit, London

A few months ago, I had a unique opportunity to represent Betfair at the prestigious Kafka Summit in London, where I presented the story of how the Betfair Exchange evolved technologically from a monolithic system with serious reliability problems, tightly coupled to an RDBMS, into the modern, event-driven, scalable and reliable platform that is the backbone of our global business today.

Continue reading “Showcasing the Betfair Exchange at Kafka Summit, London”

The Unintended Consequences of Technology: Second Order Effects

By Mark Smyth

“Every action has a consequence, and each consequence has another consequence… Be careful when making changes.” – Josh Kaufman

Frank Chen of Andreessen Horowitz spoke at the AI Summit in London in June 2018 on an area he calls “The Autonomy Ecosystem”1. The centrepiece of the presentation is the addition of a new artefact to tomorrow’s world: the self-driving car. The narrative to date around autonomous vehicles has predominantly focused on two significant consequences: human drivers being made redundant by technology, and lives being saved through a reduction in errors. In many instances, those two points are contrasted almost as cost and benefit. The debate over whether or how we proceed with the revolution of transport appears to rest on a simple net equation. Technology does not live in a vacuum, however, and the car touches a lot of ecosystems. As we come to understand the implications of new technologies, we should look towards the farther-reaching consequences and understand the role we have in shaping those effects.

Continue reading “The Unintended Consequences of Technology: Second Order Effects”

Rebuilding the Gaming lobbies – part 2

By Alex Cioflica, Cristian Bote and Tiberiu Krisboi

In the first part, we took you through benchmarking and choosing the right frameworks using PoCs; now we’ll take a deep dive into our app.

Multi-product, multi-brand app

The big challenge was how to deliver several products for multiple brands from the same codebase, with no overhead or performance penalties when developing new features. Just as challenging, everything needs to be easy to maintain.

Continue reading “Rebuilding the Gaming lobbies – part 2”

Rebuilding the Gaming lobbies – part 1

By Alex Cioflica, Cristian Bote and Tiberiu Krisboi

Just a few days before the World Cup started, we sequentially released a major rework of all our Gaming sites, covering both a redesign of all the products and a complete under-the-hood revamp. Rebuilding our technology stack was a massive effort across multiple teams and departments, ranging from Product, Design and Marketing to Tech, Content and SEO, and the list goes on.

In this lengthy article we’ll talk about our journey through choosing the right technology, learning from our mistakes and ultimately achieving what we set out to do.

Continue reading “Rebuilding the Gaming lobbies – part 1”

Ensuring Stability and Responding to Failures on the Betfair Exchange

The Betfair Exchange is a truly 24/7 product within a global business, both in terms of the customers that bet on it and of where the events they are betting on take place. Uptime and stability of the Exchange technology stack are of paramount importance to the business and its customers. Even a few minutes’ downtime could mean a huge loss in revenue and real customer impact, as an important horse race runs without any in-play bets being placed, or a customer misses a crucial trade-out opportunity in their carefully tuned betting strategy. Therefore, keeping the Exchange platform running and having an early indication of any problems are always at the forefront of our minds on the Exchange development team.

Continue reading “Ensuring Stability and Responding to Failures on the Betfair Exchange”

SAML 2.0 SSO certificate rollover

At Paddy Power Betfair (PPB) we build software not only for ourselves and our direct customers, but also for several partners. We are responsible for empowering them to run their betting businesses without having to build and run an entire infrastructure from scratch.

Some of the web applications we build for our external partners run within our infrastructure but are meant to be used by both us and them.

Continue reading “SAML 2.0 SSO certificate rollover”
