IPFS Event: Resolving Filecoin's Dec. 19 Chain Halt
The following content is adapted from Filecoin's official post, "Resolving the Dec 19 Chain Halt: Cause, Impact, & Take Aways".
In December 2020, most of the attention on the Filecoin market was focused on the community-hosted one-day Storage Market Summit. At the same time, however, Shensuan Mining Pool noted that on December 19, 2020, the Filecoin network experienced a chain halt: for a period of time, new blocks could be produced, but nodes could not reach consensus on the result (each node computed a different value). Thanks to the quick response of community members, miners, and developers, a fix was released within 4 hours and the network was fully restored within 7 hours.

01 Cause
The underlying problem was potentially non-deterministic iteration over map objects in the storage miner actor implementation. The actors are implemented in Go, and iteration order over Go maps is non-deterministic, so the actors always sort the results of map iteration before using them (a rule enforced by static analysis). Unfortunately, there was a bug in the comparison function used to sort two such maps, making the sort invalid (see #1335). As a result, different nodes processed map entries in different orders, producing different results and different gas consumption.
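To illustrate the pattern involved (a minimal sketch, not Filecoin's actual actor code; the `sortedKeys` helper and the sample map are hypothetical), consensus-critical Go code cannot range over a map directly, because Go deliberately randomizes map iteration order. Instead it must collect the keys and sort them with a valid comparator, so that every node processes entries in the same canonical order:

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns a map's keys in one canonical order.
// Iterating the map directly would yield a different order on
// each run (and on each node), which is fatal for consensus.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	// The comparator must define a valid strict ordering. The Dec 19
	// incident stemmed from an invalid comparison function, which made
	// the sorted result itself differ across nodes.
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	return keys
}

func main() {
	// Hypothetical miner-address -> value map, for illustration only.
	deadlines := map[string]int{"t01002": 3, "t01000": 7, "t01001": 5}
	for _, k := range sortedKeys(deadlines) {
		fmt.Println(k, deadlines[k]) // identical output on every node
	}
}
```

The sort step is exactly where the bug hid: the collection and sorting discipline was followed, but a broken comparator meant the "canonical" order was not actually canonical.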
02 Impact of the Outage
Thankfully, no data was lost during the outage. While the inability to create new blocks temporarily prevented transactions on the network, all data held by storage providers remained safe and readily available once the network was back up and running. It is also worth noting that the Filecoin protocol specification provides for data retrieval even during a chain outage, so while on-chain transactions were impossible for the duration of the event, the core functionality of the Filecoin network remained intact. Additionally, the fixes ensured that mining operations themselves were not penalized for the downtime; instead, consensus-fault slashing was temporarily suspended to prioritize and encourage network recovery.
03 Quick Response
The speed at which the issue was discovered, identified, fixed, and deployed is also notable:
Within 15 minutes of the incident, automatic monitoring triggered an alert.
Within 30 minutes, miners and implementation developers came together to respond.
Within 4 hours, developers identified the issue and released a fix.
Within 7 hours, enough nodes had adopted the fix to pass the majority consensus-power threshold, putting the network on the path to recovery.
This recovery was possible only through the combined efforts of multiple groups around the world. Parties across the Filecoin ecosystem worked together toward this goal: miners found and reported the problem and brought it to the attention of developers; the engineering team coordinated development, released a peer-reviewed patch, and communicated the status of the fix through community channels; and network participants around the globe worked to apply the patch and get the network back up and running as quickly as possible. While an event of this urgency need not be repeated, it was an impressive "opportunity" to demonstrate the engagement and attentiveness of the Filecoin ecosystem.
04 What's Next


