What does the "biological clock" of the blockchain look like?
"Time" is an eternal topic in the changing times. Discussions around time have been going on in blockchains and other distributed systems. Time connects processes and nodes, and we also use the "granularity" of time to measure the decentralized network that connects blocks into chains.
The difficulty about time in a distributed system is that it is difficult for the "physical clocks" of different participants to agree completely. Distributed system master Lamport provides a decentralized method, transforming the problem into the relationship between time and order, and put forward the concept of logical clock, just like introducing a "biological clock" for distributed systems including blockchain.
stakefish compiles an article by Vac analyst and ENS developer Dean Eigenmann, introducing Lamport's discussion on time, clocks and order, and reminding everyone to understand blockchain and distributed system time from another perspective.
Which topic is more appropriate to start a series of articles about distributed systems? Ethereum's privacy transaction protocol AZTEC? The Paxos algorithm known to be difficult to master? These topics are left to be written later. Today, I chose a basic topic as the beginning: the topic of time in distributed systems.
This article interprets the well-known paper "Time, clocks, and the ordering of events in a distributed system" by Turing Award winner and computer guru Leslie Lamport. It's fun to reread this post a long time later and distill the key concepts.
Friends who are not familiar with Leslie Lamport can get a general idea. He is famous for creating LaTeX, TLA+, Paxos, and also discusses the Byzantine general problem. Of course there is also the Lamport clock (the first logical clock), and in this article we will also introduce its basic concepts.
Let's first look at the definition of a distributed system. The definition given by Lamport is this:
"A system is called a distributed system if the delay in the delivery of information within the system is not negligible compared with the time between events in a single process."
I like this definition because it focuses on the delay between sending and receiving a message.
With the definition clearly defined, we begin the formal introduction.
secondary title
Sorting events locally couldn't be easier. You just assign each event a timestamp when it happened. We can obtain the overall order of all events, which means that all events can be arranged in a specific order.
But this problem is much more difficult in the context of distributed systems. why?
All because of a very simple nature of a distributed system: after a message is sent between nodes, it may arrive 0, 1 or more times at any point in the future. In this case, the various nodes in the distributed system cannot agree on the time. for example:
A node can send a message to another node to mark the current time as 12:00:00, but the receiver does not know how long the message took to deliver, so there is no way to confirm whether it is still 12:00:00 when it arrives. If so, there is no way to determine whether the information is synchronized even if the messages are sent back and forth for a whole day between the nodes. If we cannot agree on timing, we cannot agree on the sequence of events.
So how to solve this problem?
In a distributed system, multiple nodes communicate by sending messages to each other. When the node receives the information, it first confirms the information, and then executes its next event. Such an order originally shows a "causal relationship": information must be sent before it can be received.
Annotation: This causal relationship is a sequential relationship, not a logical relationship between cause and effect.
Then the order can be drawn based on the causal relationship: the message must be sent before it is received by him. Just looking at the two events A and B, we can describe the order by giving the relationship of "happens before".
Now, this relationship can be identified without a systematic notion of physical time: event A must have happened before event B if A had a causal effect on B. Causality allows us to determine the order of related events in a system, a partial order.
Partial ordering also has a limitation: without being able to determine dependencies, we may not know the exact order of every event in the system. Because there may be many events concurrently throughout the system, not all nodes are aware of the occurrence of these events.
secondary title
But now that we have a partial order, we can add a clock to the system to get the total order of all events in the system.
Just now we have known that it is not feasible to use physical clocks in distributed systems, then we need to use logical clocks. A logical clock is essentially a function capable of assigning a number to an event. This number represents when the event happened (from now on we will refer to this number as time), and has no relation to physical time.
text
∀a,b a → b ⟹ C(a) < C(b)
What does the above expression mean?
The arrow "→" means "happened before (happened before)", and C represents the clock function, which can be simply understood as time. So to express the meaning is: for each event a, b, if a occurs before b, then the time of a is less than the time of b.
But the reverse deduction is not true, just because the time of one event is smaller than the time of another event, it cannot be said that this event happened before, they can be concurrent.
In the picture above, we can see that on node α, one event occurred at time 1 and time 2 respectively; node β has one event occurred at its own time 1. Events at time 1 and time 2 at node α are concurrent with events at time 1 at node β, and have no causal connection.
If a and b are two events on a single node, and a happens before b, then a's time should be less than b's time.
If a is a node sending a message and b is another node receiving a message, then a's time should still be less than b's time.
Nodes need to let the clock tick between events. If not, the clock must be advanced to a later time than contained in any messages received from other nodes. b can happen after the clock is adjusted fast.
Example
Example
Finally we set up a state machine to see the usage of the logical clock. For example, we have a distributed system in which multiple nodes want to access shared resources, and only one node can be accessed at a time. The state machine needs to meet the following conditions:
Condition 1: A node that can access a resource must first release the resource before other nodes can access it.
Condition 2: Requests for resources must be granted access in the order in which the requests are made.
Condition 3: If every node granted access eventually releases the resource, then every request will eventually be granted.
Why not introduce an intermediate coordinator? Because in this case, if an earlier request occurs but arrives later, condition 2 cannot be satisfied; another reason is that we want to adopt a decentralized solution.
So we still have to build conditions to meet this logical clock. How to meet the conditions?
Lamport provides us with a decentralized solution. First, we want all nodes to store a request queue. Second, some simple assumptions must be satisfied:
Assumption 1: All messages are received in the order they were sent.
Assumption 2: All messages are eventually received.
Assumption 3: Every node can send messages directly to all other nodes in the system.
If there are more complex algorithms and protocols, the above assumptions can be ignored.
Now we can define an algorithm that satisfies these 3 conditions and show the functionality of the clock in practice:
1. If a node wants to request a resource, it creates a request with the current time, adds it to its queue, and sends it to every other node.
2. All other nodes put this request in their queue and send back a response message.
3. The node releasing the resource sends a release message with the current time and deletes the original request from its queue.
4. When the node receives the release message, it will clear the related request from its own queue.
5. When a node has its own request in its queue ahead of any other request (in total chronological order), he can freely access that resource, and he receives messages from all other nodes after that time.
The above algorithm is a decentralized algorithm that is completely independently executed by each node. It uses the clock to sort the requests in a general order, so as to achieve access to resources and coordination between nodes.
Well, we have roughly learned how to use these logical clocks to sort events in a distributed system through the article, and analyzed the practical application of determining the order when a distributed system accesses resources. Feedback is welcome, and I will continue to update more articles about distributed systems.
Original title: Time, clocks, and order
By Dean Eigenmann
text


