Issue #77: Incident Response
Thanks to @cseifert and @jalex206 for helping shape my perspective on this topic.
Smart contract exploits continue to happen on a near-weekly basis, and 2022 losses already exceed $2B.
Fortunately our ability to detect exploits before they happen, and as they happen, is getting better. We can analyze past attacker behavior, identify common patterns, and use those patterns to monitor for and detect future attacks.
The evolution of “threat detection” is exposing another chink in Web 3’s armor though - protocols’ ability to react to threats when they are identified, something known as incident response.
The issue with incident response processes in Web 3 today is they are entirely manual. A group of humans has to collectively agree and intervene, which is painfully slow. When we’re talking about programmatic attacks against smart contracts executed in seconds or minutes, human-based response is the equivalent of bringing a spatula to a gunfight.
The current dynamic between threat detection and incident response was articulated beautifully on a Zoom call this week by a protocol core dev - “great, so now we know we’re fucked sooner.”
Touché, sir.
Across the board, teams are ill-equipped to respond to threats when they see them. So what’s the solution - Automated response? Introducing friction into the transaction flow to slow down hackers? Incorporating reputation? Everything’s on the table.
Let’s zoom in…
Peanut Butter and Jelly
I know most folks reading this aren’t security experts. Neither am I, so I’ll keep it high level and use food analogies :)
Imagine you are operating a DeFi protocol today. What are all the things you need to do from a security standpoint to protect your system, and user funds? This simple framework highlights the big things…
On the right, you see real-time monitoring and incident response, and they are like the peanut butter and jelly of security. You just can’t have one without the other. You need to monitor your system so you know what’s happening in real-time. But even the best alerts aren’t helpful if you don’t have a good process for responding to them, quickly.
When they both come together (in the right ratios)…magic. Personally, I’m 60% peanut butter, 40% jelly, but to each their own.
Anyway, a year ago protocols weren’t doing any threat monitoring. Today, the biggest protocols are doing some, but incident response processes are lagging behind.
Incident Response
Most protocols’ incident response processes rely on two components - a multisig, and a “pause button”.
A multisig is a wallet that requires two or more signatures to authorize a transaction. In the context of incident response, multisigs are used to distribute authority and ensure one person can’t do something unilaterally. The downside of distributing control is you sacrifice speed.
In the event of a hack, the manual process of contacting multisig signers, informing them of the situation, then coordinating and signing a transaction takes too long.
Current best practice when an attack happens is to invoke what we call the pause button. It’s DeFi’s equivalent of a circuit breaker: when triggered, it pauses the protocol, or a part of the protocol, for a period of time.
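To make the multisig-plus-pause-button setup concrete, here is a minimal Python sketch. It is a toy model, not any protocol's real contract: the class name `PausableVault`, its methods, and the 2-of-3 setup in the usage note are all hypothetical, and real implementations live on-chain rather than in Python.

```python
class PausableVault:
    """Toy model of a protocol whose pause button is guarded by an M-of-N multisig."""

    def __init__(self, signers, threshold):
        self.signers = set(signers)
        if not 1 <= threshold <= len(self.signers):
            raise ValueError("threshold must be between 1 and the number of signers")
        self.threshold = threshold
        self.approvals = set()   # signers who have approved the pause so far
        self.paused = False
        self.balance = 0

    def approve_pause(self, signer):
        """Record one signature; the pause takes effect once the threshold is met."""
        if signer not in self.signers:
            raise PermissionError(f"{signer} is not a signer")
        self.approvals.add(signer)
        if len(self.approvals) >= self.threshold:
            self.paused = True
        return self.paused

    def deposit(self, amount):
        self._require_not_paused()
        self.balance += amount

    def withdraw(self, amount):
        self._require_not_paused()
        if amount > self.balance:
            raise ValueError("insufficient balance")
        self.balance -= amount

    def _require_not_paused(self):
        # Every state-changing call checks the flag first.
        if self.paused:
            raise RuntimeError("protocol is paused")
```

With `PausableVault(["alice", "bob", "carol"], threshold=2)`, a single call to `approve_pause("alice")` changes nothing and withdrawals keep flowing. Only after a second signer approves does the vault stop. That window between the first and last signature is exactly where the manual process loses time.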
A research paper published last month analyzed 181 smart contract exploits over the last four years. According to its data, 87 of the 183 victim protocols (47.5%) support an emergency pause mechanism. However, only 51 of those 87 protocols (58.6%) paused within 48 hours, and only one paused within the first hour of the incident.
The paper concluded what I’m sure you’re all thinking - the fact that it takes hours (!) to pause the exploited contract limits the effectiveness of an emergency pause mechanism, almost to the point of why bother.
Where do we go from here?
So we’re getting better at detecting threats, and that is forcing us to rethink our incident response processes, which as you’ve learned are entirely manual today and too slow to actually prevent or mitigate an attack. So where do we go from here?
The natural evolution is from manual → automated. Here are a few ideas:
Circuit Breakers
Circuit breakers are temporary measures that halt trading to curb panic-selling on stock exchanges. U.S. regulations define three circuit breaker levels, which halt trading when the S&P 500 Index drops 7%, 13%, and 20% from the prior day’s close.
Also, the single-stock circuit breaker is designed to prevent trades from occurring outside specific price bands of a security, which are 5% above and 5% below the average reference price of a security in a 5-minute interval.
These TradFi models could be implemented in some form at the protocol level in DeFi and automatically triggered in the event of attacks and extreme market turbulence.
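As a rough illustration of what a tiered, automatically triggered breaker might look like, here is a short Python sketch. The 7%, 13%, and 20% thresholds are borrowed from the stock market levels above. Everything else is an assumption: the function name `breaker_level` is hypothetical, and what a protocol would actually measure (total value locked, a pool price, something else) is an open design question.

```python
# Drop thresholds borrowed from the market-wide levels (7%, 13%, 20%),
# checked from most severe to least severe.
LEVELS = [(0.20, 3), (0.13, 2), (0.07, 1)]


def breaker_level(reference_value: float, current_value: float) -> int:
    """Return 0 if no breaker trips, else the highest level whose threshold is met.

    `reference_value` is whatever baseline the protocol picks (for example,
    the value at the start of a monitoring window); that choice is an assumption.
    """
    if reference_value <= 0:
        raise ValueError("reference_value must be positive")
    drop = (reference_value - current_value) / reference_value
    for threshold, level in LEVELS:
        if drop >= threshold:
            return level
    return 0
```

For example, a fall from 100 to 92 is an 8% drop and trips level 1, while a fall to 75 trips level 3. What each level does (pause withdrawals, pause one market, pause everything) would be up to the protocol.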
Implementing circuit breakers isn’t without complications though. First, DeFi markets are global and 24/7. Just because one protocol is paused doesn’t mean the market stops. This can create complicated scenarios where collateral and liquidations are involved. Second, composability can create a ripple effect. Automatically pausing a protocol, particularly a large lending platform or AMM, could have serious downstream consequences.
Intentional Friction
Introducing friction into the user flow feels counterintuitive, but we do it all the time in the name of security. We impose speed limits even though cars can go faster. We make pilots go through a pre-flight checklist before takeoff. Your browser displays a warning when you visit a potentially malicious website, asking you to confirm you want to proceed. All of these are examples of intentional friction, and they work!
What if protocols required transactions over a certain dollar amount to be simulated before being included in a block? If the impact is benign, it proceeds. If the impact is malicious, it’s blocked. This may only impact 1% of users, and might add 30 seconds to the transaction flow. From my perspective…worth it to prevent large exploits.
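Here is a minimal Python sketch of that gate, just to show the shape of the idea. All names are hypothetical (`Tx`, `screen`, the `simulate` callback), the $1M default threshold is an arbitrary placeholder, and the hard part, deciding whether a simulated outcome is benign, is deliberately left to whatever function gets passed in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tx:
    sender: str
    usd_value: float


def screen(tx: Tx, simulate: Callable[[Tx], bool],
           threshold_usd: float = 1_000_000) -> str:
    """Return "include" or "block" for a pending transaction.

    `simulate` is a placeholder for a real simulation step; it should return
    True when the simulated impact looks benign.
    """
    if tx.usd_value <= threshold_usd:
        # Small transactions skip the check, so most users feel no friction.
        return "include"
    return "include" if simulate(tx) else "block"
```

Only transactions above the threshold pay the cost of simulation, which is how the friction stays limited to a small share of users.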
Incorporating Reputation
One of the more promising, and least explored, alternatives is incorporating address/contract reputation in the transaction flow. We don’t have a widely used on-chain reputation system for end-user addresses or smart contracts, but when we do, that reputation can be considered when determining collateral ratios, transaction thresholds and processing times.
If a hacker deploys a brand new smart contract with the intention of exploiting a protocol, it has zero transaction history and zero reputation. Reputation-less contracts that aren’t specifically whitelisted may receive lower thresholds that prevent them from causing too much damage.
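A rough Python sketch of a reputation-scaled threshold follows. Since no widely used reputation system exists yet, the scoring here is entirely made up for illustration: the function name `max_tx_usd`, the use of transaction count and address age as inputs, and every number in it are assumptions.

```python
def max_tx_usd(tx_count: int, age_days: int, whitelisted: bool = False,
               base_limit: float = 10_000, cap: float = 1_000_000) -> float:
    """Return the largest transaction (in USD) an address may submit.

    Hypothetical scoring: reputation grows with transaction history and age,
    each saturating at 1.0 (after 1,000 transactions and 365 days).
    """
    if whitelisted:
        return cap
    score = min(tx_count / 1000, 1.0) * min(age_days / 365, 1.0)
    # Interpolate between the floor for unknown addresses and the cap.
    return base_limit + score * (cap - base_limit)
```

A freshly deployed contract with no history gets `max_tx_usd(0, 0)`, the base limit, while a whitelisted or long-established address gets the full cap. An attacker could still try to farm reputation ahead of time, so a real scheme would need inputs that are expensive to fake.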
All of these ideas, while possible, require a lot more careful thought and testing before being implemented. I’m hopeful that over the next year, we’ll see a few protocols experiment with some of these ideas.
Thanks for reading,
Andy
—
Not a subscriber? Sign up below to receive a new issue of 30,000 Feet on Sundays.