IPFS Event: Resolving Filecoin's Dec. 19 Chain Halt
The following content is adapted from Filecoin's official post, "Resolving the Dec 19 Chain Halt: Cause, Impact, & Take Aways".
In December 2020, most of the attention on the Filecoin market was focused on the community-hosted one-day Storage Market Summit. At the same time, however, Shensuan Mining Pool noted that on December 19, 2020, the Filecoin network experienced a chain halt: for a period of time, new blocks could be produced, but nodes could not reach consensus on the result (each node computed a different value). Thanks to the quick response of community members, miners, and developers, a fix was released within 4 hours and the network was fully restored within 7 hours.

01 Cause
The underlying problem was potentially non-deterministic iteration over map objects in the storage miner actor implementation. The actors are implemented in Go, and iteration order over Go maps is non-deterministic, so the actors always sort the results of map iteration before using them (a rule enforced by static analysis). Unfortunately, there was a bug in the comparison function used to sort two such maps, making the sort invalid (see #1335). As a result, different nodes processed map entries in different orders, producing different results and different gas consumption.
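To illustrate the pattern involved (a minimal sketch, not Filecoin's actual actor code; the `sortedKeys` helper and the sample map are hypothetical), consensus-critical Go code cannot range over a map directly, because Go deliberately randomizes map iteration order. Instead it must collect the keys and sort them with a valid comparator, so that every node processes entries in the same canonical order:

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns a map's keys in one canonical order.
// Iterating the map directly would yield a different order on
// each run (and on each node), which is fatal for consensus.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	// The comparator must define a valid strict ordering. The Dec 19
	// incident stemmed from an invalid comparison function, which made
	// the sorted result itself differ across nodes.
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	return keys
}

func main() {
	// Hypothetical miner-address -> value map, for illustration only.
	deadlines := map[string]int{"t01002": 3, "t01000": 7, "t01001": 5}
	for _, k := range sortedKeys(deadlines) {
		fmt.Println(k, deadlines[k]) // identical output on every node
	}
}
```

The sort step is exactly where the bug hid: the collection and sorting discipline was followed, but a broken comparator meant the "canonical" order was not actually canonical.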
02 Impact of the Outage
Thankfully, no data was lost during the outage. While the inability to create new blocks temporarily prevented transactions on the network, all data held by storage providers remained safe and readily available once the network was back up and running. It is also worth noting that the Filecoin protocol specification provides for data retrieval even during a chain outage, so while on-chain transactions were impossible for the duration of the event, the core functionality of the Filecoin network remained intact. Additionally, the fixes ensured that mining operations themselves were not penalized for the downtime; instead, consensus-fault slashing was temporarily suspended to prioritize and encourage network recovery.
03 Quick Response
The speed at which the issue was discovered, identified, fixed, and deployed is also notable:
Within 15 minutes of the incident, automatic monitoring triggered an alert.
Within 30 minutes, miners and implementation developers came together to respond.
Within 4 hours, developers identified the issue and released a fix.
Within 7 hours, enough nodes had adopted the fix to pass the majority consensus-power threshold, putting the network on the path to recovery.
This recovery was possible only through the combined efforts of multiple groups around the world. Parties across the Filecoin ecosystem worked together toward this goal: miners found and reported the problem and brought it to the attention of developers; the engineering team coordinated development, released a peer-reviewed patch, and communicated the status of the fix through community channels; and network participants around the globe worked to apply the patch and get the network back up and running as quickly as possible. While an event of this urgency need not be repeated, it was an impressive "opportunity" to demonstrate the engagement and attentiveness of the Filecoin ecosystem.
04 What's Next


