Newsletter

06.14.2021 | 24 min read

Q&A: Facebook product security chief Collin Greene

by Ryan Naraine

[ Transcript presented by Eclypsium ]

This is the transcript of a Security Conversations podcast interview with Facebook product security leader Collin Greene. We discuss his career from Street Fighter summers in Seattle to being part of the ‘security 1%’ at Facebook, the intricacies of building bug bounty programs, and why taking Ls is part and parcel of being in security. It has been edited for brevity and clarity.

[ Click here to listen to the full episode ]

Ryan Naraine: What does the head of product security at Facebook do?

Collin Greene, head of product security, Facebook

Collin Greene: Broadly, we find, fix and build structures to prevent risks to the business. That could mean a security vulnerability or an architecture design flaw early at the idea stage. There’s a response function for when we screw something up and a bug makes it out the door and we need to fix it. But generally, I don’t do anything cool these days – my problem domain is people. I help build and tend to the team that does the actual cool stuff.

Being Sanguine With Taking Ls

Let’s discuss the reality of ‘taking an L’ or taking losses in cybersecurity. The inevitability of mistakes is something you’ve written about extensively. Can you take a step back and describe what the reality of a modern security program looks like, and how suffering through a data breach fits into that?

Yeah, I like the way you put that. We’re not perfect. We’re human beings, right? Whether we’re building a bridge or a widget or a desk or something insanely complex like software comprising hundreds of millions of lines of code and other people’s components – perfection might be the target, but it’s not something that anyone is going to be able to achieve. Even the Challenger blew up.

Even when you try so hard, you’re not going to hit perfection. You, your organization, or the software you’re protecting are always going to take an L. Maybe you won’t get hacked, but there are still things you will do wrong – all of us will. The sooner we can all accept that, the smarter we can be about allocating our time and energy to give a balanced, risk-appropriate response.

To get a little more concrete, this means we can’t sit in ivory towers and build perfect software that’s formally verified. We want to stack up different layers of defense – like SDLC and sandboxing – to make sure the bugs that are out there don’t see the light of day. 

My mindset here is that I know there are a bunch of security vulnerabilities in my code right now, and I don’t know about them. I want to stack up as many things as possible that reduce the probability of those bugs turning into breaches, for lack of a better phrase. And even if they do turn into breaches, I have some agency over how I respond.  
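
[ Editor’s note: a back-of-the-envelope sketch of why stacking imperfect layers still pays off. The layer names and catch rates below are illustrative assumptions, not Facebook figures. ]

    # If each independent defensive layer catches some fraction of the bugs
    # that reach it, the chance a given bug slips all the way through shrinks
    # multiplicatively. All catch rates here are made-up illustrative numbers.
    layers = {
        "secure-by-default frameworks": 0.50,
        "static analysis on every diff": 0.40,
        "manual security review": 0.30,
        "bug bounty / external report": 0.25,
    }

    residual = 1.0
    for name, catch_rate in layers.items():
        residual *= 1.0 - catch_rate
        print(f"after {name}: residual probability = {residual:.3f}")

    # With these hypothetical rates, only ~16% of bugs survive every layer,
    # which is the sense in which imperfect defenses still compound.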

And the bugs that do get through are lessons in a way, right?

That’s exactly right. Every little bug bounty that comes through is by definition something that was missed by all the layers and check marks we set up. So it teaches us a lesson and allows us to build more feedback mechanisms, tools, and protection.

You’re a Seattle guy, which put you in an interesting place where you got to interact with some security stalwarts early on…

I grew up in this small town in the middle of nowhere, 2.5 hours out of Seattle, where no one was interested in computers or security, and that was fine. I was one of those kids that loved building stuff like forts and Legos – building and breaking them. I was always enchanted with this idea of being a hacker in middle school and wanted to figure out what that was all about. I hit a turning point when I was around 15 years old, when I would ask my parents to drive me up to Seattle and drop me off at 2600 meetings.

I completely lucked out with a wonderful group of mentors that I still hugely respect today. I had Didi – Michael Eddington – who worked on Peach Fuzzer; Riley Eller, who I guess you could say was a leader of the Ghetto Hackers; and Jeremy Bruestle, who started CoCo Communications, where I got my first official computer job. I think Jeremy wrote AirSnort as a primer for WEP cracking. My friend and mentor Frank Heidt was also there, and he would drive me back from the meetings sometimes because he also lived in the middle of nowhere.

They would really encourage me with projects like building ciphers to figure out certain encryptions, fixing exploits, or building tic-tac-toe in C. They were always ready to be pestered with questions. I remember building a crummy operating system with a bootloader and going to them with questions as soon as it got hard. Those times were wonderful because there was no gatekeeping. 

You mentioned luck there. Being in Seattle puts you in that Microsoft world. Can you speak a bit more to that, and how Leviathan came into being?

I worked at Boeing for six months and quickly got vacuumed into the Microsoft security world. I rightly got fired from Boeing – or I didn’t have my contract renewed, so more of a soft firing – which was the correct decision. I remember calling Frank Heidt after walking out of the Boeing building and he interviewed me on the phone and I joined the next day to start on a consulting basis. We were once again lucky here; I think we convinced Matt Miller to join from Kansas City. That group eventually became Leviathan around 12 months later.

[ READ: Remembering Dan Kaminsky ]

When I first got wind of Leviathan, it was around the time of the infamous Vista pen test. There was a new pen testing industry starting at that point – NGSS, you guys, IOActive… We lost Dan Kaminsky recently, and many of the obituaries mentioned the Vista pen testing period as well. Can you speak to what the pre-Vista, XP-era world looked like with respect to security? How did it change?

I took so much away from that period at a professional, personal, and strategic level. Software in those days went in a box and went out the door. Agile existed in a limited capacity, but it was more waterfall-focused. There were different gates and you took actions to progress to the next gate, if I were to oversimplify it.

I was lucky enough to be a kid in a room with a group of geniuses like Kaminsky and Iliya who were looking at the interconnection of surfaces within Vista before it shipped. I remember writing a fuzzer for the mailslot protocol, from which we found a bug, which was very cool. The most notorious thing I actually remember from that summer – besides writing fuzzers in a windowless room for 4 months – was getting completely annihilated in Street Fighter 2 by Kaminsky over and over again as he laughed without mercy.

It was a magical summer where I passively learned so much, both about the big pieces and down in the details. Stuff like identifying bugs, looking for patterns, and how to extrapolate the existence of bugs by looking at abstractions and how they might or might not fit together.

Skinning the Security Org Cat

You have written about product security, and one of the lines you wrote was about how “bugs are an observed example of insecurity, and everything in product security revolves around bugs”. Can you add some color to that statement and whether it still holds true today?

I’m wrong and I’ve changed my mind! That article was from 2017, and I change my mind all the time. My background and universe then was fully focused around bugs. What I meant at the time was that bugs were interesting because they were pointers to larger things that were maybe wrong with the way the product design was thought out. Bugs are point-in-time instances of insecurity. And if you find a bug, it’s probably one among 30 others lurking in your code base. 

It’s kind of like fishing. If you catch two fish from a school of fish, you haven’t won the game. You know there are plenty more there underwater. You win when you’ve prevented the entire family of bugs from being possible in the first place. That was my narrow product-centric view from a few years ago.

While that statement is still partly true, I now know there’s a much richer world of ways we can influence the outcome of secure software. For example, I could have a 30-minute meeting with the Instagram team and tell them not to build their CDN a certain way because it would lead to DDoS attacks. That’s just a conversation in words that becomes product design, code, and a large number of bugs that are never written. So those bugs are never found. That’s just another example of the shift-left mindset I talk about.

I now recognize that you have many tools in your toolbox to influence things. What we really care about is risk, and bugs are just an important data point to help us understand if something is risky or secure.

Speaking of shift-left, how are we educating developers here? How are we giving them the right runway and set of tools? Where does that fit into your allocation of resources?

I’ll preface this by saying that security is hard not only because we don’t know where all the bugs are or whether we will find them all, but also because you often need to use intuition in resource allocation scenarios. At my current job and in previous jobs, developer education has always been a big deal. When we built the HTML rendering framework that became React, we got to work on the security of that framework at Facebook and Instagram.

You want to guide people towards doing the right and secure thing by default, which is super obvious. This isn’t just at the code level, but also at the education level. Success for us is developers doing the right thing automatically. That is often because you make the secure solution the most alluring one – it’s less code, it’s friendlier and more ergonomic, it fits their use case, and it was already taught to them in their onboarding class. You want to build pits of success where you can accidentally fall into doing the right thing.
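
[ Editor’s note: a minimal Python sketch of the “pit of success” idea – the default call site auto-escapes, and the unsafe path is an explicit, easy-to-review opt-out. The helper names are hypothetical, not Facebook’s actual framework. ]

    from html import escape

    class SafeHtml(str):
        """Marker type for strings that are already safe to render."""

    def html(template: str, **values) -> SafeHtml:
        # Escape every value unless the caller explicitly marked it safe.
        escaped = {
            key: val if isinstance(val, SafeHtml) else escape(str(val))
            for key, val in values.items()
        }
        return SafeHtml(template.format(**escaped))

    def raw(trusted_markup: str) -> SafeHtml:
        # The explicit, reviewable escape hatch; misuse stands out in review.
        return SafeHtml(trusted_markup)

    # The easy call site is XSS-safe by default:
    print(html("<p>Hello, {name}</p>", name="<script>alert(1)</script>"))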

To give you a strict allocation of resources, we have 120 people on the product team and it’s probably the full-time job of one or two people to handle security education through forms, videos, classes, and other means. It also involves A/B testing the wording and guidance we use on static analysis rules and having something loosely resembling a Top 10 list. All of these things are done to tick up the probability that the right thing happens.

Security teams around the world are divided into the “haves” and “have nots”. Your team at Facebook is definitely in the “haves” bucket. Can you talk a bit about this security tax that’s a reality in our world and whether the industry can ever get ahead of it?

You’re exactly right. At Facebook, compared to the other places I’ve worked at, we are the “security 1%ers” for sure. We are very lucky. I’d even say we’re spoiled! We’d have to make different trade-offs if we were in another company. It’s also important to note that we have a different class of problems at Facebook.   

I’ve worked at Facebook for 11 years, and I left for 2 years in between to build the Uber security team, which I think presents a nice contrast. The Uber security team was 8 people when I joined and we grew it to around 100 over the course of a year or two. What I found was that you get a lot of things “for free” at companies like Facebook, Google, and Square that help you immensely with security even if they aren’t security-related. Examples of this are inventory of third-party code, inventory of laptops, vendor relationship management, and ownership of code.

Ownership of code is a huge thing. When I walked into Uber, found bugs, and assigned them to people to fix, I didn’t get any response for days, even after following up. I think it was a pretty serious externally available SQL injection. When I escalated it to their manager, I found that they wrote the code but didn’t own it. So I just wrote the fix and landed the code, but then I figured out that was a pattern.

So now you’re trying to recreate things from scratch and trying to navigate how ownership of code moved around…

Yes, exactly. Companies are like people, right? They have preferences and goals and different things that they consider important. Facebook and Google care about quality and security; those things matter. Uber cared about quality and security as well, but 20 other things mattered too. After we found enough of these bugs, we ended up having to speak with the Head of Engineering and the CEO and convey that security was a structural problem.

That was a huge awakening moment for me where I realized I was just a fish in this giant pond. I had to make my case with empathy, understand what other people cared about, and frame it that way to get the company to move in a different direction. We had all these bugs that no one was fixing, and the bigger problem was that no one owned 90% of our code base, so we were adding junk on top of junk. So we had to create different engineering processes to change that.

Down the Bug Bounty Rabbit Hole 

A big part of security improvements at Uber had to do with your use of bug bounty programs and how you became focused on shift-left. In your writing on this, you put security vulnerabilities into some interesting buckets. You say the best outcome is the bug that never makes it into code.

Absolutely. You can argue either way about whether tools are part of that bucket. My canonical examples are abstractions over SQL so you can’t write SQL injection, or HTML templating with auto-escape on by default so you can’t write XSS. The best outcome is if you can prevent the thing in the first place.
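
[ Editor’s note: a small Python sketch of the “abstraction over SQL” pattern. The SafeDb wrapper is hypothetical; the point is that user input can only be bound as a parameter, never spliced into the statement. ]

    import sqlite3

    class SafeDb:
        def __init__(self, path: str):
            self._conn = sqlite3.connect(path)

        def query(self, sql: str, params: tuple = ()):
            if "'" in sql or '"' in sql:
                # Inline literals suggest values are being spliced in by hand;
                # force callers onto ? placeholders instead.
                raise ValueError("inline literals not allowed; use ? placeholders")
            return self._conn.execute(sql, params).fetchall()

    db = SafeDb(":memory:")
    db.query("CREATE TABLE users (id INTEGER, name TEXT)")
    db.query("INSERT INTO users VALUES (?, ?)", (1, "alice"))
    # Hostile input is bound as data, never interpreted as SQL:
    print(db.query("SELECT * FROM users WHERE name = ?", ("alice' OR '1'='1",)))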

Your other vulnerability buckets are “found automatically” and “found manually” that are left of Boom. Then you include bug bounty programs under “found externally”, which are in the middle or even right of Boom.

I think about this in terms of risk. It’s better for us to find bugs internally or with external consultants than it is for bug bounty researchers to find them, because the carrot we dangle for them is one of reputation and money. We tell them “Hey, don’t exploit this thing, tell us instead because we’re the best recipient for it and we’ll pay you the most”. But I would always rather have increased control over that bug. We’d like to know about the WhatsApp update bug ourselves before a researcher tries to sell it to us.

The last two buckets you mention are arguably the worst. The first one is “never found”. You say that most bugs end up in this bucket. Even with an ideal security program, do you think most bugs will end up not being found?

Call me pessimistic, but yes! There’s honestly a lot of guesswork here. That’s what is so cool about our field, right? Rainforest Puppy found SQL injection 15 years ago when we weren’t even looking for it or thinking about it. In our lifetime, there will be many more classes of security vulnerabilities found that are lurking today and we aren’t even thinking about.

The last bucket is “exploited”, which is when you find out a bug has been exploited in the wild. What good comes out of being on the receiving end of an experience like that and learning from it? What do those days feel like?

I’ll hinge this answer on a specific example. In 2010-11 at Facebook, someone found and exploited a security bug in something called the Puzzle Server, which was actually a recruiting game. Facebook wanted to draw out the best software engineers and told them to write a solution to the 8 Queens problem in C++, upload it to the Facebook team, and they’d execute it. It got popped and we had a giant response internally to clean it up.

In the moment of responding to an exploit, I found that I was just searching for the next clue. It started with just a routine compliance scan from Qualys, our vulnerability scanner. We found something weird in the wrong DMZ, which led to us looking for logs that were missing, which was even weirder, which then led to us finding a host that moved and never-before-seen IP addresses, and the journey of discovery just kept continuing.

The entire company pivoted that summer for a few months. There were some internal-facing bugs, but we prioritized fixing them immediately in case this threat actor had seen our source code. It was an incredible and very meaningful experience for me. I had been a theoretical security person for most of my career, but being on the receiving end, understanding the impact and consequences was very different.

How did that experience light a fire under you?

Oh, like never before. For the first few weeks, there was extreme clarity. Every software engineer at the company was figuring out what happened. What did the threat actor touch? Where did he go? Where is he coming from? Did he leave any tracks? Call law enforcement. Oh, he wrote his own custom shell that deletes his history, can we extract anything from the hard drive based on that info?

Everything just explodes into these different work streams that are progressing at varied speeds and, looking back on it, that was exhilarating and I learned a lot from it. It wasn’t theoretical anymore, and it actually felt bad. I was outsmarted by some kid who noticed something I missed or screwed up. I felt like I was letting people down. There was definitely an emotional component to it – of not letting this guy win.

Looking back, there’s definitely something about being exploited that can be turned into fuel and be used to improve things so that the next exploit doesn’t happen, or at least its probability is reduced. 

You say “we aim to shift as many bugs as possible left”. Can you share some percentage breakdowns of bugs found in the different buckets you listed and how they have changed over the years? Basically, is it possible to measure the effectiveness of shift-left security?

Thankfully, yes it is possible to measure it. I’ll make one small correction to what I said earlier – just replace ‘bugs’ with ‘risk’ whenever I mentioned it in past quotes. Things like misconfigured firewalls matter a ton too; focusing on bugs was just my lens at the time.

At Facebook, we have a security review program. Historically, about a third of the bugs we find every year have come from that, found either internally through manual review or by third-party consultants. About 95% of these are found internally by the team and 5% by the consultants. Another third of the bugs came from bug bounty programs, and the remaining third came from tools like static analysis and fuzzing.

That ratio has shifted over time. Now, about half of the bugs we find come from homegrown tools, which reduces the burden on security reviews and bug bounty programs. That is a measurable shift left.

We’ve also tried to measure bugs prevented through milestones, which is obviously a very subjective and tricky thing. We identify a class of bugs and measure that we were getting 40 of them a year for four years, for example. And now we measure that, once everyone is using the secure-by-default library, we have zero of those bugs and that’s held true for two years, which is great and tangible progress. 

People vs Tools

When you say homegrown tools, are these tools your team has conceptualized, written, built, and used internally? Can you speak a bit more on that?

Yes, that’s exactly right. There are three big buckets of tools and the longest running one for us is static analysis. Tying this back to the Facebook hack I talked about earlier – we actually found the hacker and they were in jail for a while, but that entire episode got us worried about bugs in our source code. Our source code was written in PHP, which is a janky language. We wanted to find all these bugs in a scalable manner. A couple of amazing engineers hacked together a rough static analysis tool for PHP that could find web app bugs. That has grown with time and is now a big part of our arsenal.

That tool is now called Zoncolan; it finds a large number of our bugs, runs on thousands of pull requests, and is the backbone of our security operations.
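
[ Editor’s note: a toy illustration of the kind of rule such an analyzer encodes – a few lines of Python ast that flag dynamically built SQL passed to execute(). This is nothing like Zoncolan’s real interprocedural analysis; it only shows the shape of a rule that can run on every diff. ]

    import ast

    SNIPPET = 'cursor.execute(f"SELECT * FROM users WHERE name = {name}")'

    def find_risky_execute_calls(source: str):
        findings = []
        for node in ast.walk(ast.parse(source)):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and node.func.attr == "execute"
                    and node.args
                    and isinstance(node.args[0], (ast.JoinedStr, ast.BinOp))):
                findings.append(
                    f"line {node.lineno}: dynamically built SQL passed to execute()"
                )
        return findings

    for finding in find_risky_execute_calls(SNIPPET):
        print(finding)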

We will always conduct security reviews, but the outcome of those reviews aren’t just bugs. There are also patterns that turn into documentation and static analysis rules, there are roadmap items that get added to the secure-by-default library, and there are other things that all feed back into one another.

The topic of homegrown tools often brings up a security retention issue. When your senior engineer – who created the tool – gets bored and moves on to the next thing, how do you continue maintaining the tools with a new team?

Yes, it’s a problem that we face for sure. Whenever I get a group of people together, my job as a manager is to help them be successful, help them do useful work, and help them do stuff that they care about. Motivation is a huge part of that. We’re tackling the security retention problem by breaking the security team into two big chunks – security engineers and software engineers. 

Security engineers are hacker-y types who want to use and break the tools, improve them, find bugs, run bug bounty, and do security reviews. Software engineers still care about security, but they usually just like building infrastructure. Those folks really like to stick with their projects for longer because it’s kind of their baby. When you build these things, you get a wonderfully fast feedback loop. Your customer here is not some angry person in the middle of nowhere, it’s the person sitting next to you who will use it on every build every day, complain to you if something doesn’t work and give you props if you’re doing awesome.

These organizational structures, along with incentives that make sure people know the work they’re doing is important, are how we’ve tackled the retention problem.

How do you decide when an open source strategy makes sense? For any of these homegrown tools, how do you decide if you can release them as open source? Are there risks involved in making these tools open source and helping others?

I think Facebook has screwed this up a few times in the past, in cases I haven’t been too closely involved with. On the security team, we generally try to open source everything by default when we can responsibly open source it. Just pressing a button to open source something might be useful, but it’s much more useful to dedicate an internal team that invests time, listens to the bug reports, and maintains the tool because it’s valuable to us and to the community. Responsibly open sourcing stuff is the bar we hold ourselves to. We don’t want to fire and forget it.

To answer the last part of your question, the more eyes we have on our tools the better. If people find bugs in our code, that’s awesome. If people use it to find bugs in their code and get any value out of it, that’s a win. 

Coming back to bug bounty programs – can you talk about ramping up a bug bounty program from scratch and some of the hidden costs that go into it?

I started the bug bounty program at Facebook a decade ago and we learned a lot there. Going to Uber and starting the bug bounty program there resulted in an entirely new set of mistakes. Drawing from those experiences, my first piece of advice would be to have your house in order. Bug bounty is the icing on the security cake – it’s not the first thing you do. When your company cares about security, when you have ownership of code, when you have a reasonable process to fix a vulnerability: those things all need to happen before you can ramp up a bug bounty program.

A tactical lesson I would share is to start with the smallest possible scope and define it really well. Bug bounty is really hard. You’re asking security engineers who want to find bugs to perform a customer service job, in some respects. Their new co-workers are a bunch of people on the internet that can sometimes be hard to communicate with. They have the company’s best interests at heart, but it can also inherently be an adversarial thing. So properly scoping it out and telling people to be patient with you as you figure it out is a good place to start.

The other thing I’ll say about bug bounty programs is they can be a one-way door. Once you turn them on, there’s no real way to turn them off. People will send you bugs all hours of the day. That again comes back to why starting small and scoping crisply is important.  

Segueing back to the people problem, the most taxing part of the bug bounty ecosystem is triaging. Taking all the incoming noise, filtering it, and trying to figure out what is what.

I agree that it’s a very hard job. At Facebook you’re getting a hundred emails a day from people with varying levels of ability to communicate a technical issue to you. They have all been incentivized to hype that it’s the biggest deal in the world. It often takes a series of back-and-forth emails to decipher what they really mean. Of course you sometimes have bug bounty reports that read like professional pen test reports, like the ones from Shubs. But most of them are like conversations, so you have to draw out the details and assume good intentions. It’s a really difficult job that we respect at Facebook.

It’s also a very essential job. We talked about shift-left earlier, and bug bounties are the ultimate feedback into the earlier parts of the security system. It gives us things that were by definition missed by all the other layers. 

I find that the bug bounty guys who went off to look for riches are now coming back and helping us define what web security looks like. Do you agree that we are in good hands?

We absolutely are. It’s my strong belief that bug bounty has been an awesome on-ramp for the next generation of people, and that’s a big deal. It’s a really healthy outlet for people to hack stuff today without the threat of getting sued or being put in jail – there’s some freedom there. I feel this camaraderie in the community where they are all on Slack, encouraging each other, finding new stuff, and growing. I have high hopes for the next generation. 

To leave on a more realistic note, I’d also add that we’re getting better but the pie is expanding at the same time. Things are getting better and worse at the same time. New security initiatives are coming in, mature ones are getting better, but it’s lumpy. It’s not evenly distributed among companies or software. 

Attack surfaces are expanding faster than we can react to them.

Yes, that’s a better way of putting it.
