
The Most Dangerous Systems Are the Ones That Work: From 10 Gbps to 100 Integrations (Seven Years of Security at Scale)

by Craig Greenhouse

The Day We Drowned in Traffic

Seven years ago, I walked into a situation most engineers would recognise: a multi-million-pound programme that had been stuck for three years, a 25-person engineering team that had lost direction, and executives who had stopped believing delivery was possible.

The product was a network monitoring and analysis platform. The promise was simple: ingest 10 Gbps of live network traffic, extract metadata, identify patterns, and surface what mattered. In practice, we were drowning in packets.

But the technical challenge wasn't the real problem. Through interviews with team leads and engineers, I identified two deeper issues. First, the team had stopped believing the product was actually possible to build. Second, the different disciplines - backend, data, frontend, ML - were siloed and weren't communicating effectively.

I led a series of cross-team workshops to rebuild trust and alignment between groups that had stopped talking to each other. More importantly, I worked to help engineers believe they could solve problems they'd come to see as insurmountable. Sometimes the hardest part of delivery isn't the code - it's restoring confidence.

The technical challenge remained: networks generate an extraordinary amount of telemetry, most of it routine. Somewhere inside that flood of data were the outliers: rare ports, forgotten services, unusual protocol behaviour, devices that shouldn't be talking to each other. The kind of activity that precedes a breach.

We built a system around packet capture, metadata extraction, and Elasticsearch, with machine learning to establish baselines of "normal" behaviour. A GraphQL layer let analysts navigate the chaos without writing raw queries. The challenge wasn't just ingesting 10 Gbps - it was ensuring that query performance remained consistent as the dataset grew. Visibility at scale is worthless if analysts have to wait 30 seconds for results.
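
To make that concrete, here's a minimal sketch of the kind of "surface the outliers" query this architecture enables. The index name, field names, and thresholds are illustrative assumptions, not the production schema - but the idea of asking the metadata store for rare values rather than paging through everything is the point.

```python
# Illustrative sketch: ask Elasticsearch for rarely-seen destination ports
# in the last 24 hours of extracted flow metadata.
# Index and field names (flow-metadata-*, destination.port) are assumptions.
import requests

ES_URL = "http://localhost:9200"   # assumption: a reachable Elasticsearch node
INDEX = "flow-metadata-*"          # hypothetical index of extracted packet metadata

query = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-24h"}}},
    "aggs": {
        "rare_ports": {
            # rare_terms surfaces low-frequency values directly,
            # instead of forcing analysts to scan full result sets
            "rare_terms": {"field": "destination.port", "max_doc_count": 5}
        }
    },
}

resp = requests.post(f"{ES_URL}/{INDEX}/_search", json=query, timeout=30)
resp.raise_for_status()
for bucket in resp.json()["aggregations"]["rare_ports"]["buckets"]:
    print(f"port {bucket['key']}: seen {bucket['doc_count']} times in 24h")
```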

Within six months, we had a market-ready platform. But the lesson stayed with me: security only works when you can see what matters. You can't defend what you can't observe, and observation at scale requires more than logging - it requires context, pattern recognition, and the ability to distinguish routine activity from risk.

What Attackers Actually Look For

Network monitoring taught me something counterintuitive: attackers don't hunt firewalls. They hunt forgotten things.

They look for services running on non-standard ports. VPNs that were provisioned for a contractor three years ago and never decommissioned. Test environments that accidentally got exposed. Protocols being used in ways they shouldn't be. The outliers.

The systems that get breached aren't always the ones with obvious vulnerabilities. They're the ones that exist but aren't being watched. A database backup running on a misconfigured instance. An API endpoint left over from a migration. Access credentials that never expired.

Every breach I studied started the same way: something was there, it was reachable, and nobody was paying attention to it.

The hard part wasn't building detection capabilities - it was deciding what to surface. Too much signal and analysts ignore it. Too little and you miss the warning signs. We tuned the system to highlight deviations: traffic that didn't fit learned patterns, connections that hadn't been seen before, services behaving differently than their baseline.
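
The principle of "highlight deviations from a learned baseline" is simpler than the production pipeline that implemented it. A toy illustration, not the actual model: compare an observation against the spread of its own history and only surface it when the deviation is large.

```python
# Toy baseline check, for illustration only - not the production ML pipeline.
# Flags a service whose observed activity deviates sharply from its learned history.
from statistics import mean, stdev

def deviation_score(history: list[float], observed: float) -> float:
    """How many standard deviations the observation sits from the baseline mean."""
    mu = mean(history)
    sigma = stdev(history) or 1.0   # avoid divide-by-zero on perfectly flat baselines
    return abs(observed - mu) / sigma

# e.g. hourly connection counts for one internal service over the past week
baseline = [110, 98, 105, 120, 101, 99, 115]
observed = 640                       # hypothetical spike

if deviation_score(baseline, observed) > 3.0:
    print("surface to analyst: traffic does not fit the learned pattern")
```

Tuning lived in the threshold: too low and analysts drown in alerts, too high and the early warning signs never surface.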

The principle was simple: if you can't explain why something exists, it's probably a problem.

That insight would become far more relevant than I realised at the time.

Fast-Forward: A Payments Platform Made of APIs

Fast-forward to 2023. I joined a global payments platform as Digital Transformation Project Lead, reporting to the Group CISO. The brief was IAM and API security across a regulated fintech environment.

I expected payments rails, card processing infrastructure, and compliance frameworks. What I found was something more complex: a bank made of APIs.

The platform supported around 100 fintech clients, each with their own integration patterns, authentication models, and access requirements. There were thousands of API keys, OAuth clients, certificates, webhook endpoints, and sandbox environments. Payment service providers, card schemes, regulatory bodies, and enterprise customers - all connected through a web of identity, trust, and access control.

The architecture had evolved organically over years of growth. Different clients had been onboarded using different patterns. Many integrations relied on point-to-point VPNs - expensive, time-consuming to provision, and painful to maintain. Others used API keys; some used OAuth2. There were legacy authentication flows sitting alongside modern standards. Test credentials that had become production credentials. Integrations that worked but nobody could fully explain.

It wasn't chaos - the platform functioned, processed transactions, and met regulatory requirements. But from a security perspective, it was fragmented. There was no unified model for identity, no consistent approach to onboarding, and no clear view of what access existed across the entire ecosystem.

The challenge wasn't building new capabilities. It was bringing coherence to a system that had grown faster than its architectural boundaries could scale.

SaaS Is a Bigger Attack Surface Than Any Network

Here's what I learned: networks have edges. SaaS has combinatorics.

A network perimeter is conceptually simple. You have devices, connections, and boundaries. You can draw a diagram. The attack surface grows with infrastructure, but it grows linearly.

SaaS platforms grow differently. Every new customer adds accounts, roles, permissions, API keys, OAuth clients, webhook endpoints, certificates, and integrations. Each of those has a lifecycle: provisioning, rotation, expiry, revocation. Each can be misconfigured. Each can be forgotten.

Multiply that by 100 clients, each with their own access patterns, and the combinatorial complexity becomes the risk. It's not just "who has access?" - it's "who has access to what, under which conditions, through which integration, using which credential type, with what level of audit visibility?"

At the network monitoring company, we hunted rare packets - unusual protocol behaviour, forgotten services, connections that didn't fit the baseline. At the payments platform, I was hunting rare credentials: API keys that had never rotated, OAuth clients with excessive scope, service accounts with unclear ownership, test integrations that had quietly become production dependencies.
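
The credential hunt lends itself to the same outlier logic. Below is a hedged sketch of the kind of check involved, run against a hypothetical credential inventory export; the field names, scopes, and 90-day threshold are assumptions for illustration, not the platform's actual policy.

```python
# Illustrative credential-audit sketch. The inventory shape is hypothetical.
from datetime import datetime, timedelta, timezone

MAX_KEY_AGE = timedelta(days=90)                     # assumed rotation policy
BROAD_SCOPES = {"admin", "*", "payments:write:*"}    # example "excessive" scopes

def flag_credential(cred: dict) -> list[str]:
    findings = []
    now = datetime.now(timezone.utc)
    if now - cred["last_rotated"] > MAX_KEY_AGE:
        findings.append("stale: not rotated within policy")
    if BROAD_SCOPES & set(cred.get("scopes", [])):
        findings.append("excessive scope")
    if not cred.get("owner"):
        findings.append("unclear ownership")
    if cred.get("environment") == "test" and cred.get("seen_in_production"):
        findings.append("test credential used in production")
    return findings

inventory = [
    {"id": "key-7f3a", "last_rotated": datetime(2022, 4, 1, tzinfo=timezone.utc),
     "scopes": ["payments:write:*"], "owner": None,
     "environment": "test", "seen_in_production": True},
]

for cred in inventory:
    for finding in flag_credential(cred):
        print(f"{cred['id']}: {finding}")
```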

The fundamental problem was the same: exposure lives where visibility stops. But the scale and complexity were different. A network might have thousands of devices. A SaaS platform can have millions of access relationships, most of them implicit, many of them undocumented.

And unlike a network breach - which often triggers obvious alerts - a compromised API key or OAuth token can look exactly like legitimate traffic. The attacker doesn't break in. They log in.

Why Identity Is Now the Perimeter

In traditional network security, the perimeter was physical: firewalls, VPNs, network segmentation. In modern SaaS platforms, the perimeter is logical: authentication, authorisation, and access control.

This shift changes everything. Your security posture is no longer about keeping people out - it's about knowing exactly who's in, what they can do, and ensuring that access matches intent.

At the payments platform, this meant establishing coherent patterns across the entire identity and access lifecycle: onboarding, account provisioning, role assignment, credential rotation, scope management, audit logging, and eventual offboarding. Each stage had to work consistently whether the client was a small fintech in their first integration or an enterprise with complex multi-tenant requirements.

The challenge wasn't just technical - it was making these patterns understandable and adoptable. I ran workshops with dozens of fintech engineering teams, walking them through OAuth2, OIDC, certificate-based authentication, token lifecycle management, and secure integration patterns. Many of these teams were building their first regulated API integration. The concepts weren't intuitive, and the consequences of getting them wrong were significant.

What I learned: clarity scales better than complexity. If your security model requires a 40-page integration guide and a dedicated workshop to understand, adoption will be inconsistent and mistakes will be frequent. The goal was to design patterns that were secure by default, with guardrails that made the wrong thing hard to do accidentally.

We moved the platform toward standards-based architecture - OAuth2, FAPI, mTLS, and Zero Trust principles - not because they were fashionable, but because they provided a common language. Once a fintech team understood the model, they could onboard with minimal friction and low ongoing operational overhead.
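
From the client's side, the standards-based pattern is compact. Here's a sketch of an OAuth2 client-credentials grant over mutual TLS; the token endpoint, client ID, certificate paths, and scope are placeholders rather than the platform's real values, but the shape - short-lived tokens, least-privilege scopes, certificate-backed client identity - is the common language those standards provide.

```python
# Sketch of an OAuth2 client-credentials grant over mTLS.
# All endpoints, identifiers, and file paths below are placeholders.
import requests

TOKEN_URL = "https://auth.example.com/oauth2/token"   # hypothetical token endpoint

resp = requests.post(
    TOKEN_URL,
    data={
        "grant_type": "client_credentials",
        "client_id": "fintech-client-123",
        "scope": "payments:read",                     # least privilege by default
    },
    cert=("client.crt", "client.key"),                # mTLS: the client cert proves identity
    timeout=10,
)
resp.raise_for_status()
access_token = resp.json()["access_token"]

# Every subsequent API call carries the short-lived token, not a long-lived shared key.
api = requests.get(
    "https://api.example.com/v1/payments",
    headers={"Authorization": f"Bearer {access_token}"},
    cert=("client.crt", "client.key"),
    timeout=10,
)
```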

The result was a system where identity and access control became predictable, auditable, and defensible. In a SaaS environment, your auth layer is your firewall.

The Most Dangerous Systems Are the Ones That Work

Here's the uncomfortable truth: breaches rarely happen in broken systems. They happen in working ones.

The integrations that pose the most risk aren't the ones that fail loudly - they're the ones that function quietly. The API key that was issued for a proof-of-concept three years ago and never revoked. The service account with admin scope that was created for a migration and then forgotten. The "temporary" VPN access that became permanent. The OAuth client with excessive permissions that nobody ever audited.

These aren't misconfigurations in the traditional sense. They're working integrations with trusted partners, using valid credentials, passing all authentication checks. From a logging perspective, they look identical to legitimate activity.

At the network monitoring company, we were looking for protocol anomalies - traffic that didn't fit learned patterns. At the payments platform, the equivalent was identity anomalies: credentials that existed but were rarely used, access grants that had never been reviewed, integrations whose original purpose had been forgotten.

The challenge is that security teams are trained to look for obvious failures: expired certificates, failed login attempts, malformed requests. But modern breaches often involve none of those things. Attackers use valid credentials obtained through phishing, supply chain compromise, or credential stuffing. They log in. They enumerate. They move laterally using legitimate access paths.

This is why traditional security tooling struggles. A pen test might find a vulnerability, but it won't find the OAuth client with excessive scope that was provisioned two years ago for a partner integration that's no longer actively managed. Compliance audits check that policies exist, not that every access grant is still justified.

The risk isn't in the systems that are broken. It's in the systems that work but are no longer understood.

What Both Worlds Taught Me

Looking back, the two experiences were more similar than they first appeared.

At the network monitoring company, the challenge was turning chaos into legible telemetry. We had raw packets, metadata, and machine learning baselines - but the real work was building context. Engineers needed to understand not just what was happening, but why it mattered. A cross-functional team had to align around a shared model of what "normal" looked like and what deviations warranted investigation.

At the payments platform, the challenge was turning fragmentation into coherent architecture. We had OAuth clients, API keys, certificates, and VPNs - but the real work was establishing patterns. Fintech teams needed to understand not just how to integrate, but why the patterns existed and what risks they were designed to mitigate. Distributed teams had to align around a shared model of identity, trust, and access control.

In both cases, the technical problem was solvable. The harder problem was organisational: getting teams to see the same system the same way, and ensuring that architectural intent survived contact with delivery pressure.

The lesson I took from both: risk lives where visibility stops.

At the network monitoring company, visibility stopped where telemetry wasn't being collected or analysed. At the payments platform, visibility stopped where access grants weren't being reviewed or understood. In both environments, the attack surface wasn't defined by what existed - it was defined by what existed but wasn't being watched.

This mindset matters even more when you're building security products rather than just operating them. As an application leader, the challenge isn't just building the detection engine - it's building the interface that translates combinatorial risk into actionable intent for users. Security teams don't want more data. They want clarity: what matters, why it matters, and what to do about it. The product has to encode that understanding.

The best security posture isn't the one with the most controls. It's the one where everyone knows what matters, and what to do when something doesn't fit.

Why Preemptive Exposure Beats Post-Mortems

Traditional security operates on a cycle: deploy controls, wait for an incident, conduct a post-mortem, add more controls. Pen tests provide snapshots. Compliance audits verify that policies exist. Vulnerability scans identify known CVEs.

None of this is useless. But none of it answers the question that matters most: what exposure exists right now, and what would an attacker do with it?

A pen test might run quarterly. An audit might happen annually. But your attack surface changes daily. New integrations go live. API keys get provisioned. OAuth clients are created for proof-of-concepts and forgotten. Service accounts accumulate permissions. Test environments get exposed. Partners change their infrastructure. Engineers rotate off projects and take their context with them.

If you only discover exposure after it's been exploited, you've already lost. The breach happened weeks or months ago. You're doing forensics, not defence.

The alternative is continuous exposure management: understanding what access exists, what it's being used for, and whether it still matches intent. Not as a one-time exercise, but as an ongoing operational practice. Not waiting for alerts after something goes wrong, but continuously identifying exposure before it becomes a breach.

This is harder than it sounds. It requires telemetry that captures not just events, but relationships. It requires baselines that distinguish routine activity from outliers. It requires tooling that can operate at scale without generating noise. And it requires teams that understand what they're looking at.
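
A minimal sketch of the "relationships, not events" idea, under assumed data shapes: fold raw authentication events into (principal, credential, resource) edges, then compare the access that has been granted against the access that is actually used. The event format, the grant set, and the 60-day threshold are all illustrative.

```python
# Illustrative sketch: derive access relationships from auth events,
# then flag grants that exist but are not being exercised.
# Event shape, grant set, and the 60-day window are assumptions.
from collections import defaultdict
from datetime import datetime, timedelta, timezone

events = [  # hypothetical auth log entries
    {"principal": "svc-reporting", "credential": "key-7f3a",
     "resource": "payments-api", "ts": datetime(2024, 1, 10, tzinfo=timezone.utc)},
]
granted = {  # hypothetical access grants pulled from the IAM system
    ("svc-reporting", "key-7f3a", "payments-api"),
    ("svc-migration", "key-91bc", "ledger-db"),      # granted but never seen in use
}

last_seen = defaultdict(lambda: None)
for e in events:
    edge = (e["principal"], e["credential"], e["resource"])
    if last_seen[edge] is None or e["ts"] > last_seen[edge]:
        last_seen[edge] = e["ts"]

cutoff = datetime.now(timezone.utc) - timedelta(days=60)
for edge in granted:
    seen = last_seen[edge]
    if seen is None or seen < cutoff:
        print(f"review: {edge} is granted but unused; does it still match intent?")
```

Unused-but-granted access is exactly the "works quietly" exposure described above: nothing is failing, so nothing is alerting.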

But the organisations that get this right operate differently. They don't wait for breaches - they identify and close exposure continuously. They don't rely on compliance theatre - they have real-time visibility into their security posture. They don't react - they stay ahead.

The attack surface is always growing. More SaaS. More APIs. More integrations. More partners. More credentials. More automation. More ways for things to exist without being watched.

The only winning move is to see yourself the way attackers already do.
