The CertiK team previously discovered a series of denial-of-service vulnerabilities in the Sui blockchain. One of them stands out as both novel and highly impactful: it can leave Sui network nodes unable to process new transactions, effectively shutting down the entire network.
Last Monday, CertiK received a $500,000 bounty from Sui for discovering this major security vulnerability. The incident was reported by CoinDesk, a leading media outlet in the United States, and subsequently covered by major media outlets.
The vulnerability has been named "hamster wheel". Its attack method differs from known attacks: an attacker only needs to submit a payload of approximately 100 bytes to trigger an infinite loop in Sui validator nodes, preventing them from responding to new transactions.
In addition, the damage persists even after a node restarts, and the attack can propagate automatically through the Sui network, leaving all nodes unable to process new transactions, like hamsters running endlessly on a wheel. This is why we call it the "hamster wheel" attack.
After discovering this vulnerability, CertiK reported it to Sui through Sui's bounty program. Sui promptly acknowledged the severity of the vulnerability and took appropriate measures to address the issue before the mainnet launch. Apart from fixing this specific vulnerability, Sui also implemented preventive mitigation measures to minimize potential damage caused by this vulnerability.
In appreciation of CertiK's responsible disclosure, Sui awarded a $500,000 bounty to the CertiK team.
In the following text, the details of this critical vulnerability will be disclosed from a technical perspective, elucidating the root cause and potential impact of this vulnerability.
The key role of the verifier in Sui
Blockchain platforms such as Sui and Aptos, which are based on the Move language, rely on static verification to defend against malicious payloads. Sui statically checks the validity of user-submitted payloads before a contract is published or upgraded. The bytecode verifier (often called the "validator" in this article, and distinct from the validator nodes that run it) provides a set of checkers that ensure structural and semantic correctness. A contract is executed in the Move virtual machine only after it passes these checks.
Malicious payload threats on the Move chain
Sui introduces a new storage model and interface on top of the original Move virtual machine, making it a customized version. To support the new storage primitives, Sui incorporates additional and customized verification methods for untrusted payloads, such as object safety and global storage access. These customized checks align with the unique features of Sui, so we refer to them as Sui validators.
Sui's inspection order for payloads
As shown in the image above, most of the checks in the validator are aimed at conducting structural-level security verification on CompiledModules (representing user-provided contract payloads). For example, the "Duplicate Checker" ensures that there are no duplicate entries in the runtime payload, and the "Restriction Checker" ensures that the length of each field in the runtime payload is within the allowed item limit.
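To give a feel for what a structural check looks like, here is a minimal sketch in Rust. It is not Sui's checker code: the function names, the use of string slices as table entries, and the `max_entries` limit are all hypothetical and chosen only for illustration.

```rust
use std::collections::HashSet;

/// Rejects a table that contains the same entry twice.
/// (Hypothetical stand-in for a duplicate check.)
fn check_no_duplicates(entries: &[&str]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for entry in entries {
        // `insert` returns false when the value was already present.
        if !seen.insert(*entry) {
            return Err(format!("duplicate entry: {entry}"));
        }
    }
    Ok(())
}

/// Rejects a table with more entries than `max_entries`.
/// (Hypothetical stand-in for a size-limit check.)
fn check_within_limit(entries: &[&str], max_entries: usize) -> Result<(), String> {
    if entries.len() > max_entries {
        return Err(format!(
            "{} entries exceed the limit of {max_entries}",
            entries.len()
        ));
    }
    Ok(())
}
```

Checks of this kind need only a single pass over the payload, which is why they are cheap compared with the semantic analysis discussed next.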
Beyond these structural checks, the verifier's static checks also need more complex analysis methods to ensure that untrusted payloads are sound at the semantic level.
Understanding Move's abstract interpreter: linear and iterative analysis
The abstract interpreter provided by Move is a framework designed specifically for performing complex security analysis on bytecode through abstract interpretation. This mechanism allows for a more fine-grained and accurate validation process, where each validator is allowed to define their unique abstract state for analysis.
At run time, the abstract interpreter builds a control-flow graph (CFG) from the compiled module. Each basic block in the CFG maintains two states, a "pre-state" and a "post-state". The pre-state is a snapshot of the program state before the basic block executes, while the post-state describes the program state after the basic block has executed.
When the abstract interpreter encounters no jumps (or loops) in the control-flow graph, it follows a simple linear execution principle: each basic block is analyzed sequentially, and the pre-state and post-state for each instruction in the block are computed based on their semantics. The result is an accurate snapshot of the program's state at the basic block level during execution, aiding in the validation of program security properties.
Workflow of the Move abstract interpreter
However, when the control flow contains a loop, the process becomes more complicated. A loop means that the control-flow graph has a back edge: the source of the back edge carries the post-state of the current basic block, while its target (the loop header) is a basic block whose pre-state has already been computed. The abstract interpreter therefore has to carefully merge the states of the two basic blocks connected by the back edge.
If the merged state differs from the existing pre-state of the loop header, the abstract interpreter updates the loop header's pre-state and restarts the analysis from that block. This iterative analysis continues until the pre-state of the loop header no longer changes between iterations. Reaching this fixed point indicates that the loop analysis is complete.
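The iterative analysis can be sketched as a small worklist algorithm. This is an illustrative sketch, not Move's abstract interpreter: the `Block` type, the `analyze` function, and the choice of a bitmask as the abstract state (merged with bitwise OR) are assumptions made to keep the example short.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical abstract state: a bitmask of facts, merged by bitwise OR.
type State = u32;

/// A basic block: the facts it adds (its transfer function) and its successors.
struct Block {
    adds: State,
    successors: Vec<usize>,
}

/// Runs a worklist analysis until every pre-state stops changing.
/// Returns the final pre-states and the number of block visits.
fn analyze(blocks: &[Block], entry: usize) -> (BTreeMap<usize, State>, usize) {
    let mut pre: BTreeMap<usize, State> = BTreeMap::new();
    pre.insert(entry, 0);
    let mut worklist = VecDeque::from([entry]);
    let mut visits = 0;

    while let Some(id) = worklist.pop_front() {
        visits += 1;
        // Post-state = transfer function applied to the pre-state.
        let post = pre[&id] | blocks[id].adds;
        for &succ in &blocks[id].successors {
            match pre.get(&succ).copied() {
                // First visit: the successor inherits the post-state.
                None => {
                    pre.insert(succ, post);
                    worklist.push_back(succ);
                }
                // Later visits (including back edges): merge, and
                // re-analyze only if the merged state really differs.
                Some(old) => {
                    let merged = old | post;
                    if merged != old {
                        pre.insert(succ, merged);
                        worklist.push_back(succ);
                    }
                }
            }
        }
    }
    (pre, visits)
}

/// Example CFG: 0 -> 1 -> 2, with a back edge 2 -> 1.
fn example_cfg() -> Vec<Block> {
    vec![
        Block { adds: 0b001, successors: vec![1] },
        Block { adds: 0b010, successors: vec![2] },
        Block { adds: 0b100, successors: vec![1] },
    ]
}
```

In this sketch, termination depends on the `merged != old` comparison: a block is queued again only when its pre-state actually grew, so the loop header is revisited a bounded number of times. On the example CFG the analysis stops after five block visits.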
Sui IDLeak Validator: Custom Abstract Interpretation
In contrast to the original Move design, Sui's blockchain platform introduces a unique, object-centric global storage model. One notable feature of this model is that any data structure with the key attribute (used for on-chain storage indexing) must have an ID type as its first field. The ID field is immutable and cannot be transferred to another object, because each object must have a globally unique ID. To enforce these properties, Sui builds a set of custom analysis logic on top of the abstract interpreter.
The IDLeak Validator, also known as id_leak_verifier, works in tandem with the abstract interpreter for analysis. It has its own unique AbstractDomain, known as AbstractState. Each AbstractState is composed of multiple AbstractValues corresponding to local variables. The state of each local variable is monitored through AbstractValue to track whether an ID variable is brand new.
In the process of struct packing, the IDLeak validator only allows packing a new ID into a struct. Through abstract interpretation analysis, the IDLeak validator can track the local data flow state to ensure that no existing ID is transferred to other struct objects.
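The rule "only a new ID may be packed" can be illustrated with a deliberately simplified sketch. The three-instruction set (`NewId`, `ReadExistingId`, `PackObject`) and the function `check_id_leak` are hypothetical and handle only straight-line code; the real verifier works on Move bytecode through the abstract interpreter.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbstractValue {
    Fresh, // an ID created in this function and not yet used
    Other, // anything else, including IDs taken from existing objects
}

/// Hypothetical, heavily simplified instruction set for illustration.
enum Instr {
    NewId { dst: usize },           // create a brand-new ID
    ReadExistingId { dst: usize },  // read the ID out of an existing object
    PackObject { id_local: usize }, // pack a struct whose first field is the ID
}

/// Walks a straight-line instruction sequence and rejects any pack that
/// uses an ID not known to be fresh.
fn check_id_leak(code: &[Instr], num_locals: usize) -> Result<(), String> {
    let mut locals = vec![AbstractValue::Other; num_locals];
    for (pc, instr) in code.iter().enumerate() {
        match *instr {
            Instr::NewId { dst } => locals[dst] = AbstractValue::Fresh,
            Instr::ReadExistingId { dst } => locals[dst] = AbstractValue::Other,
            Instr::PackObject { id_local } => {
                if locals[id_local] != AbstractValue::Fresh {
                    return Err(format!("ID leak at instruction {pc}"));
                }
                // The ID now belongs to the new object; it is no longer fresh.
                locals[id_local] = AbstractValue::Other;
            }
        }
    }
    Ok(())
}
```

Under these assumptions, packing a freshly created ID passes, while packing an ID read from an existing object, or packing the same ID twice, is rejected.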
Inconsistent State Maintenance Problem in Sui IDLeak Validator
The IDLeak validator is implemented by plugging its AbstractState::join function into the Move abstract interpreter. This function plays an essential role in state management, particularly in merging and updating state values.
Let us examine AbstractState::join and AbstractValue::join in detail to understand how they work:
AbstractState::join takes another AbstractState as input and attempts to merge its local state into the current object's local state. For each local variable in the input state, it compares that variable's value with the current value in the local state (defaulting to AbstractValue::Other if not found). If the two values are not equal, it sets a "changed" flag, which is later used to report whether the merged state has changed, and then updates the local variable in the local state by calling AbstractValue::join.
AbstractValue::join compares its value with another AbstractValue. If they are equal, it returns the value passed in; otherwise, it returns AbstractValue::Other.
However, this state maintenance logic contains a hidden inconsistency issue. Although AbstractState::join returns a result indicating that the merged state has changed (JoinResult::Changed) based on the differences between the new and old values, the merged and updated state values can still remain unchanged.
This inconsistency is caused by the order of operations: AbstractState::join decides whether the state has changed before it performs the state update (AbstractValue::join), so the decision does not reflect the true result of the update.
In addition, AbstractValue::Other dominates the merged result in AbstractValue::join. For example, if the old value is AbstractValue::Other and the new value is AbstractValue::Fresh, the updated value is still AbstractValue::Other. The new and old values differ, yet the state itself is unchanged after the update.
Example: Inconsistency in state merging
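The following sketch reconstructs the pre-patch logic from the description in this article. It is not copied from Sui's source: the field layout, the method name `join_buggy`, and the helper `state_with` are assumptions, and only the order of operations is meant to be faithful.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbstractValue {
    Fresh,
    Other,
}

impl AbstractValue {
    /// Equal values are kept; any disagreement collapses to `Other`.
    fn join(self, other: AbstractValue) -> AbstractValue {
        if self == other { self } else { AbstractValue::Other }
    }
}

#[derive(Debug, PartialEq, Eq)]
enum JoinResult {
    Changed,
    Unchanged,
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct AbstractState {
    locals: BTreeMap<usize, AbstractValue>,
}

impl AbstractState {
    /// Pre-patch logic as described in the article (reconstructed sketch).
    fn join_buggy(&mut self, incoming: &AbstractState) -> JoinResult {
        let mut changed = false;
        for (&local, &new_value) in &incoming.locals {
            let old_value = self
                .locals
                .get(&local)
                .copied()
                .unwrap_or(AbstractValue::Other);
            // Flaw: the flag compares the two inputs, not the merge result.
            changed |= new_value != old_value;
            self.locals.insert(local, new_value.join(old_value));
        }
        if changed { JoinResult::Changed } else { JoinResult::Unchanged }
    }
}

/// Hypothetical helper: a state that tracks a single local (index 0).
fn state_with(value: AbstractValue) -> AbstractState {
    AbstractState { locals: BTreeMap::from([(0, value)]) }
}
```

Joining a state whose local is `Fresh` into a state whose local is `Other` returns `JoinResult::Changed`, although the receiving state compares equal to its value before the call.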
This introduces an inconsistency: the result of merging block states is determined to be "changed," but the merged state value itself has not changed. In the process of abstract interpretation analysis, this inconsistency problem can have serious consequences. Let's review the behavior of the abstract interpreter when encountering a loop in the control flow graph (CFG):
When encountering a loop, the abstract interpreter adopts an iterative analysis approach to merge the states of the target back jump block and the current block. If the merged state changes, the abstract interpreter will start reanalysis from the jump target.
However, if the merge operation of the abstract interpretation analysis wrongly marks the merged state result as "changed," while the value of the internal variables of the state has not changed, it will result in endless reanalysis and create an infinite loop.
Further leveraging inconsistencies to trigger an infinite loop in Sui IDLeak validator
Exploiting this inconsistency, attackers can construct a malicious control flow graph to induce the IDLeak validator into an infinite loop. This carefully crafted control flow graph consists of three blocks: BB1, BB2, and BB3. It is worth noting that we intentionally introduce a back jump edge from BB3 to BB2 to construct a loop.
Malicious CFG and states that lead to an infinite loop inside the IDLeak verifier
The process starts at BB2, where the AbstractValue of a particular local variable is set to ::Other. After BB2 executes, control flows to BB3, where the same variable is set to ::Fresh. At the end of BB3, a back edge jumps to BB2.
During abstract interpretation of this example, the inconsistency described above plays a critical role. When the back edge is processed, the abstract interpreter tries to join the post-state of BB3 (variable is "::Fresh") with the pre-state of BB2 (variable is "::Other"). AbstractState::join detects the difference between the new and old values and sets the "changed" flag, signalling that BB2 needs to be re-analyzed.
However, because "::Other" dominates in AbstractValue::join, the actual value of the variable in BB2's state remains "::Other" after the merge, so the merged state is in fact unchanged.
Consequently, once this cycle starts, it never ends: the verifier keeps re-analyzing BB2 and all of its successor blocks (BB3 in this example). The infinite loop consumes all available CPU cycles, leaving the node unable to process new transactions, and the situation persists even after the verifier restarts.
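The BB2/BB3 cycle can be simulated with a single tracked local and a cap on the number of passes, so that the sketch itself terminates. All names here (`analyze_loop`, `join_buggy`, `join_consistent`, `max_passes`) are hypothetical, and `join_consistent` is included only as a contrast in which the flag reflects the merge result.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbstractValue {
    Fresh,
    Other,
}

/// Join for a single tracked local: returns (merged value, "changed" flag).
type Join = fn(AbstractValue, AbstractValue) -> (AbstractValue, bool);

fn merge(old: AbstractValue, new: AbstractValue) -> AbstractValue {
    if old == new { old } else { AbstractValue::Other }
}

/// Flag derived from the inputs, as in the flawed logic described above.
fn join_buggy(old: AbstractValue, new: AbstractValue) -> (AbstractValue, bool) {
    (merge(old, new), old != new)
}

/// Flag derived from the merge result, shown for contrast.
fn join_consistent(old: AbstractValue, new: AbstractValue) -> (AbstractValue, bool) {
    let merged = merge(old, new);
    (merged, merged != old)
}

/// BB2 sets the local to `Other`, whatever it was before.
fn transfer_bb2(_pre: AbstractValue) -> AbstractValue {
    AbstractValue::Other
}

/// BB3 sets the local to `Fresh`, whatever it was before.
fn transfer_bb3(_pre: AbstractValue) -> AbstractValue {
    AbstractValue::Fresh
}

/// Analyzes the loop BB2 -> BB3 -> BB2. Returns the number of passes needed
/// to reach a fixed point, or `None` if `max_passes` is exhausted first.
fn analyze_loop(join: Join, max_passes: usize) -> Option<usize> {
    let mut bb2_pre = AbstractValue::Other;
    for pass in 1..=max_passes {
        let bb2_post = transfer_bb2(bb2_pre);
        let bb3_post = transfer_bb3(bb2_post);
        // Back edge BB3 -> BB2: merge BB3's post-state into BB2's pre-state.
        let (merged, changed) = join(bb2_pre, bb3_post);
        bb2_pre = merged;
        if !changed {
            return Some(pass);
        }
    }
    None
}
```

With `join_buggy`, every pass reports a change and the cap is the only thing that stops the simulation; a verifier without such a cap would spin forever. With `join_consistent`, the same loop reaches its fixed point on the first pass.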
By exploiting this vulnerability, an attacker can make validator nodes loop indefinitely, like hamsters on a wheel, unable to process new transactions. Hence, we refer to this unique type of attack as the "Hamster Wheel" attack.
The "Hamster Wheel" attack effectively stalls the Sui verifier, which in turn paralyzes the entire Sui network.
Having understood the cause of the vulnerability and how it is triggered, we built a concrete proof of concept in Move bytecode and successfully triggered the vulnerability in an environment simulating a real deployment:
This example demonstrates how to trigger the vulnerability in a real environment through carefully crafted bytecode. Specifically, an attacker can trigger an infinite loop in the IDLeak validator, using a payload of only about 100 bytes to consume all CPU cycles of the Sui node, effectively preventing new transaction processing and causing denial of service in the Sui network.
The persistent harm of "hamster wheel" attack in the Sui network
The vulnerability bounty program of Sui has strict criteria for vulnerability severity assessment, mainly based on the level of harm to the entire network. A vulnerability that meets the "critical" rating must shut down the entire network and effectively hinder new transaction confirmations, requiring a hard fork to fix the problem; if a vulnerability can only cause partial network nodes to deny service, it will be rated as "medium" or "high" severity.
The "hamster wheel" vulnerability discovered by the CertiK Skyfall team can shut down the entire Sui network, and fixing it requires an official release of a new version. Given this impact, Sui ultimately rated the vulnerability as "critical". To better understand the severity and impact of the "hamster wheel" attack, it helps to understand the architecture of the Sui backend system, especially the entire process of submitting a transaction or upgrade on chain.
Interaction overview of submitting transactions in Sui
First, a user transaction is submitted through the frontend RPC and, after basic validation, passed to the backend services. The Sui backend service is responsible for further validating the incoming transaction payload. Once the user's signature has been successfully verified, the transaction is converted into a transaction certificate (containing the transaction information and Sui's signature).
These transaction certificates are a fundamental component of the Sui network and can be propagated among various validation nodes in the network. For contract creation/upgrade transactions, before they can be recorded on the chain, validation nodes will invoke the Sui validator to check and validate the contract structure/semantics of these certificates. It is during this critical validation stage that the "infinite loop" vulnerability can be triggered and exploited.
When this vulnerability is triggered, it causes the validation process to be indefinitely interrupted, effectively obstructing the system's ability to process new transactions and resulting in a complete network shutdown. To make matters worse, this situation persists even after node restart, which means that traditional mitigation measures are far from sufficient. Once this vulnerability is triggered, it results in a situation of "persistent disruption" and leaves a lasting impact on the entire Sui network.
Sui's Solution
Upon receiving feedback from CertiK, Sui promptly acknowledged this vulnerability and released a patch to address this critical flaw. This fix ensures the consistency between state changes and the resulting flags, eliminating the critical impact caused by the "hamster wheel" attack.
To eliminate the inconsistency described above, Sui's fix makes a small but crucial adjustment to AbstractState::join. The patch removes the logic that decided the merge result before AbstractValue::join ran. Instead, AbstractValue::join is executed first to perform the merge, and the flag indicating whether the state changed is then set by comparing the final merged value with the original state value (old_value).
This way, the result of state merging will remain consistent with the actual updated result, preventing any infinite loops during the analysis process.
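The adjusted order of operations can be sketched as follows. This is a reconstruction from the description above rather than Sui's actual patch; the type layout and the helper `state_with` are assumptions.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbstractValue {
    Fresh,
    Other,
}

impl AbstractValue {
    fn join(self, other: AbstractValue) -> AbstractValue {
        if self == other { self } else { AbstractValue::Other }
    }
}

#[derive(Debug, PartialEq, Eq)]
enum JoinResult {
    Changed,
    Unchanged,
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct AbstractState {
    locals: BTreeMap<usize, AbstractValue>,
}

impl AbstractState {
    /// Post-patch order of operations (reconstructed sketch).
    fn join(&mut self, incoming: &AbstractState) -> JoinResult {
        let mut changed = false;
        for (&local, &incoming_value) in &incoming.locals {
            let old_value = self
                .locals
                .get(&local)
                .copied()
                .unwrap_or(AbstractValue::Other);
            // Merge first...
            let merged = incoming_value.join(old_value);
            // ...then derive the flag from what actually changed.
            changed |= merged != old_value;
            self.locals.insert(local, merged);
        }
        if changed { JoinResult::Changed } else { JoinResult::Unchanged }
    }
}

/// Hypothetical helper: a state that tracks a single local (index 0).
fn state_with(value: AbstractValue) -> AbstractState {
    AbstractState { locals: BTreeMap::from([(0, value)]) }
}
```

In this sketch, joining `Fresh` into a state that already holds `Other` reports `Unchanged`, so the loop header is not re-analyzed. Joining `Other` into `Fresh` still reports `Changed` once, because the stored value really does move to `Other`; a second identical join then reports `Unchanged`.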
In addition to fixing this specific vulnerability, Sui has also deployed mitigation measures to reduce the impact of future validator vulnerabilities. According to Sui's response in the bug report, these measures involve a feature called Denylist.
However, validators have a node configuration file that allows them to temporarily reject certain types of transactions. This configuration can be used to temporarily disable the processing of releases and software package upgrades. Since this bug occurred when running the Sui validator before signing release or software package upgrade transactions, and the rejection list will stop the validator from running and discard malicious transactions, the temporary rejection of these transaction types is a 100% effective mitigation measure (although it will temporarily interrupt the service of those attempting to publish or upgrade code).
By the way, we have had this TX rejection list configuration file for a while, but we have also added a similar mechanism for certificates as a follow-up mitigation measure for the "validator infinite loop" vulnerability you previously reported. With this mechanism, we will have greater flexibility in dealing with such attacks: we will use the certificate rejection list configuration to make the validator forget bad certificates (breaking the infinite loop), and use the TX rejection list configuration to prevent releases/upgrades, thereby preventing the creation of new malicious attack transactions. Thank you for making us think about this issue!
Validators have a limited number of "ticks" (different from gas) for bytecode validation before signing transactions. If all bytecode included in a transaction cannot be validated within this number of ticks, the validator will reject signing the transaction to prevent its execution on the network. Previously, this metering only applied to a set of selected complex validators. To address this issue, we have extended metering to each validator to ensure that the work performed by the validator during each tick of validation is constrained. We also fixed a potential infinite loop error in ID leakage validators.
—From Sui developers' explanation of bug fixes
In conclusion, the Denylist lets validator nodes temporarily mitigate vulnerabilities in the bytecode verifier by disabling the release (publish) and upgrade process, which effectively prevents the damage that such malicious transactions could cause. While the Denylist mitigation is in effect, nodes give up their ability to publish or upgrade contracts so that they can keep operating.
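The tick-based metering mentioned in the developers' note can be pictured as a budget that every unit of verification work must draw from. The sketch below illustrates the idea only: `Meter`, `OutOfTicks`, and `run_metered` are hypothetical names, and charging one tick per step is an assumption, not Sui's actual cost model.

```rust
#[derive(Debug, PartialEq, Eq)]
struct OutOfTicks;

/// A budget of verification "ticks" (hypothetical type).
struct Meter {
    remaining: u64,
}

impl Meter {
    fn new(budget: u64) -> Self {
        Meter { remaining: budget }
    }

    /// Deducts `ticks` from the budget, failing once it is exhausted.
    fn charge(&mut self, ticks: u64) -> Result<(), OutOfTicks> {
        self.remaining = self.remaining.checked_sub(ticks).ok_or(OutOfTicks)?;
        Ok(())
    }
}

/// Stand-in for a verifier pass: `step` returns true while more work remains.
/// Every step is charged, so even a pass that never converges terminates.
fn run_metered(
    meter: &mut Meter,
    mut step: impl FnMut() -> bool,
) -> Result<u64, OutOfTicks> {
    let mut steps = 0;
    loop {
        meter.charge(1)?;
        steps += 1;
        if !step() {
            return Ok(steps);
        }
    }
}
```

With such a budget in place, a pass that would otherwise loop forever ends with an error after the budget is spent, and the node can refuse to sign the transaction instead of hanging.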
Summary
In this article, we share the technical details of the "Hamster Wheel" attack discovered by the CertiK Skyfall team, explaining how this new attack leverages critical vulnerabilities to completely shut down the Sui network. Additionally, we have closely examined Sui's timely response to fix this critical issue and shared methods for vulnerability patching and subsequent mitigation of similar vulnerabilities.