This article comes from a16z; the original author is Joseph Bonneau. Compiled by Odaily translator Katie Koo.
We need a more nuanced and thorough approach to measuring and comparing performance: breaking performance down into components and comparing tradeoffs across multiple axes. This article defines what blockchain performance means, outlines the challenges of measuring it, and offers guidelines and key principles to keep in mind when evaluating blockchain performance.
Scalability vs. Performance
First, scalability and performance have standard computer science meanings that are often misapplied in blockchains. Performance measures what the system is currently capable of achieving. As we will discuss below, performance metrics may include transactions per second or median transaction confirmation time. Scalability, on the other hand, measures a system's ability to increase performance by adding resources.
The distinction is important: if defined correctly, many ways to improve performance do not improve scalability at all. A simple example is using a more efficient digital signature scheme, such as BLS signatures, which are roughly half the size of Schnorr or ECDSA signatures. If Bitcoin switched from ECDSA to BLS, the number of transactions per block could increase by 20-30%, improving performance overnight. But we can only do this once: there is no even more space-efficient signature scheme to switch to afterwards (BLS signatures can also be aggregated to save more space, but that is another one-time trick).
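To illustrate the arithmetic behind this kind of one-time gain, here is a minimal back-of-the-envelope sketch. The block size, per-transaction overhead, signature counts, and signature sizes are assumed figures chosen only for illustration, not protocol constants.

```python
# Back-of-the-envelope estimate of the one-time throughput gain from switching
# to a smaller signature scheme. All sizes below are illustrative assumptions,
# not protocol constants.

BLOCK_BYTES = 1_000_000     # assumed usable block space
NON_SIG_TX_BYTES = 106      # assumed per-transaction bytes excluding signatures
SIGS_PER_TX = 2             # assumed average number of signatures per transaction
ECDSA_SIG_BYTES = 72        # typical DER-encoded ECDSA signature size
BLS_SIG_BYTES = 48          # typical BLS12-381 signature size

def txs_per_block(sig_bytes: int) -> int:
    """How many average-sized transactions fit in one block."""
    return BLOCK_BYTES // (NON_SIG_TX_BYTES + SIGS_PER_TX * sig_bytes)

before = txs_per_block(ECDSA_SIG_BYTES)
after = txs_per_block(BLS_SIG_BYTES)
print(f"ECDSA: {before} txs/block, BLS: {after} txs/block, "
      f"one-time gain: {100 * (after - before) / before:.1f}%")
```

With these assumed numbers the gain lands in the 20-30% range, but however the parameters are tuned, it remains a one-time improvement.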
Other one-time tricks (such as SegWit) are available in blockchains as well, but you need a scalable architecture to achieve continuous performance improvement, where adding more resources improves performance over time. This is conventional wisdom for many other computer systems too, such as building a web server: with a few common tricks you can build one server that runs very fast, but ultimately a multi-server architecture is needed to keep up with growing demand by continually adding additional servers.
Scalability inherently requires exploiting parallelism. In the blockchain space, L1 scaling appears to require sharding or something that looks like sharding. The basic concept of sharding, splitting the state into pieces that different validators can process independently, closely matches the definition of scalability. There are more options at L2 for adding parallel processing, including off-chain channels, rollup servers, and sidechains.
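As a rough sketch of the "split the state into pieces" idea, the fragment below assigns accounts to a fixed number of shards by hashing their identifiers. The shard count and hash-based assignment are assumptions for illustration; real sharding designs must also handle cross-shard transactions, which this ignores.

```python
import hashlib

NUM_SHARDS = 4  # assumed number of shards, each processed by its own validator set

def shard_of(account: str) -> int:
    """Deterministically map an account to a shard by hashing its identifier."""
    digest = hashlib.sha256(account.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Transactions touching different shards can be validated in parallel;
# adding shards (and validators) adds capacity, which is what scalability means.
for acct in ["alice", "bob", "carol", "dave"]:
    print(acct, "-> shard", shard_of(acct))
```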
Latency vs. Throughput
Latency measures how quickly an individual transaction is confirmed, while throughput measures the aggregate rate at which the system processes transactions. Measuring and comparing latency and throughput is complicated. Moreover, individual users don't really care about throughput (a system-wide metric); what they care about is latency and transaction fees. More specifically, they want their transactions confirmed as quickly and as cheaply as possible. While many other computer systems are also evaluated on a cost/performance basis, transaction fees are a performance axis somewhat new to blockchain systems that doesn't really exist in traditional computer systems.
Challenges of Measuring Latency
Latency seems simple at first: how long does it take for a transaction to be confirmed? But there are a few different ways to answer this question.
First, we can measure latency between different pairs of time points and get different results. For example, do we start the clock when the user hits the "submit" button locally, or when the transaction hits the mempool? Do we stop it when the transaction appears in a proposed block, or once that block has been confirmed by one or six subsequent blocks?
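As a sketch of how much the answer depends on the endpoints chosen, the fragment below (with invented field names and timestamps) computes "latency" for a single transaction under two different definitions:

```python
from dataclasses import dataclass

@dataclass
class TxTimeline:
    """Hypothetical timestamps (in seconds) observed for one transaction."""
    submitted: float           # user hits "submit" locally
    in_mempool: float          # transaction first seen by a node's mempool
    in_proposed_block: float   # included in a proposed block
    one_confirmation: float    # block followed by one subsequent block
    six_confirmations: float   # block followed by six subsequent blocks

def latency(tx: TxTimeline, start: str, end: str) -> float:
    """Latency depends entirely on which start and end points you choose."""
    return getattr(tx, end) - getattr(tx, start)

tx = TxTimeline(0.0, 1.5, 9.0, 14.0, 64.0)
print(latency(tx, "submitted", "one_confirmation"))    # 14.0 seconds
print(latency(tx, "in_mempool", "six_confirmations"))  # 62.5 seconds
```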
The most common approach is to measure the time from when a user first broadcasts a transaction to when it is reasonably "confirmed" from the perspective of a validator. Of course, different merchants may adopt different acceptance criteria, and even a single merchant may adopt different criteria depending on the transaction amount.
The validator-centric approach also ignores some things that matter in practice. First, it ignores latency on the peer-to-peer network (how long from when a client broadcasts a transaction until a majority of nodes have heard it?) and client-side latency (how long does it take to prepare a transaction on the client's local machine?). Client-side latency is typically trivial and predictable for simple transactions like signing an Ethereum payment, but it can be significant for more complex cases, such as proving that a shielded Zcash transaction is correct.
Even if we normalize the time window over which we are trying to measure latency, the answer is almost always "it depends." No cryptocurrency system built to date offers fixed transaction latency. A basic rule of thumb to remember is:
Latency is a distribution, not a number.
The network research community has long understood this. Special emphasis is placed on the "long tail" of the distribution, since high latency in even 0.1% of transactions (or web server queries) can severely impact end users.
In blockchains, confirmation latency varies for a number of reasons:

Batch processing: Most systems batch transactions in some fashion; for example, most L1 systems batch them into blocks. This causes variable latency, as some transactions have to wait until the batch fills up while others are lucky enough to be added last and experience no additional delay (the simulation sketch after this list illustrates the effect).

Variable congestion: Most systems experience congestion, meaning more transactions are submitted than the system can process immediately. Congestion levels can change when transactions are broadcast at unpredictable times, when the rate of new transactions shifts over the course of a day or week, or in response to external events like a popular NFT launch.

Consensus-layer variance: Confirming transactions at L1 typically requires a distributed set of nodes to reach consensus on a block, which can add variable latency independent of congestion. Proof-of-work systems find blocks at unpredictable times, and PoS systems can add delays of their own (for example, if not enough nodes are online to form a committee in a round, or if a view change is required because a leader crashed).
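The following short simulation is a sketch with assumed parameters rather than a model of any particular chain; it shows how batching plus unpredictable (Poisson-style) block production alone produce a wide spread of confirmation times, even with no congestion at all.

```python
import random

random.seed(1)

TX_RATE = 5.0          # assumed mean transactions submitted per second (Poisson arrivals)
BLOCK_INTERVAL = 12.0  # assumed mean seconds between blocks (exponential, PoW-style)
SIM_SECONDS = 3600

# Generate transaction arrival times and block production times.
t, tx_times = 0.0, []
while t < SIM_SECONDS:
    t += random.expovariate(TX_RATE)
    tx_times.append(t)

t, block_times = 0.0, []
while t < SIM_SECONDS + 10 * BLOCK_INTERVAL:
    t += random.expovariate(1.0 / BLOCK_INTERVAL)
    block_times.append(t)

# Each transaction is confirmed by the first block produced after it arrives
# (no block size limit, i.e. no congestion in this sketch).
latencies, b = [], 0
for tx in tx_times:
    while block_times[b] < tx:
        b += 1
    latencies.append(block_times[b] - tx)

latencies.sort()
pick = lambda q: latencies[int(q * (len(latencies) - 1))]
print(f"median {pick(0.5):.1f}s  p95 {pick(0.95):.1f}s  p99 {pick(0.99):.1f}s")
```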
For these reasons, a good guiding principle should be:
Claims about latency should show a distribution of confirmation times, not a single number like a mean or median.
While summary statistics like the mean, median, or percentiles provide part of the picture, accurately evaluating a system requires considering the entire distribution. In some applications, if the latency distribution is relatively simple, average latency can provide good insight. But in cryptocurrencies it almost never is: typically, the distribution of confirmation times has a long tail.
Payment channel networks such as the Lightning Network are a good example. A classic L2 scaling solution, these networks offer very fast payment confirmation most of the time, but occasionally they require a channel reset, which can increase latency by orders of magnitude.
Even if we have good statistics on the exact latency distribution, they will likely vary over time as the system and demand on the system change. It is also not always clear how to compare latency distributions between competing systems. For example, suppose one system confirms transactions with latency uniformly distributed between 1 and 2 minutes (mean and median of 90 seconds). If a competing system confirms 95% of transactions in exactly 1 minute and the remaining 5% in 11 minutes (mean of 90 seconds, median of 60 seconds), which system is better? The answer is probably that some applications would prefer the former and some the latter.
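To see how a single summary number can hide the difference between these two hypothetical systems, the sketch below samples both distributions and reports the mean, median, and tail percentiles (the sample size is arbitrary):

```python
import random, statistics

random.seed(0)
N = 100_000

# System A: confirmation latency uniform between 60 and 120 seconds.
a = [random.uniform(60, 120) for _ in range(N)]

# System B: 95% of transactions confirm in 60 s, the remaining 5% in 660 s.
b = [60.0 if random.random() < 0.95 else 660.0 for _ in range(N)]

def report(name, xs):
    xs = sorted(xs)
    pct = lambda q: xs[int(q * (len(xs) - 1))]
    print(f"{name}: mean {statistics.mean(xs):6.1f}s  median {pct(0.5):6.1f}s  "
          f"p95 {pct(0.95):6.1f}s  p99 {pct(0.99):6.1f}s")

report("System A", a)
report("System B", b)
# Nearly identical means, very different tails: which is "better" depends on the application.
```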
Latency is complicated, and the more data reported, the better. Ideally, the complete latency distribution should be measured under varying congestion conditions. Breaking latency down into its components (local, network, batching, consensus) is also helpful.
The Challenge of Measuring Throughput
Throughput also seems simple on the surface: how many transactions per second can a system handle? But there are two main difficulties: what exactly is a "transaction," and are we measuring what a system does today or what it might be capable of doing?
While "transactions per second" (tps) is the de facto measure of blockchain performance, transactions as a unit of measurement are problematic. For systems that offer general programmability (smart contracts) or even limited functionality, like Bitcoin's multi-way transaction or multi-sig verification options, the fundamental questions are:
Not all transactions are created equal.
This is obviously true in Ethereum, where transactions can include arbitrary code and arbitrarily modify state. The concept of gas in Ethereum is used to quantify (and charge for) the total amount of work a transaction does, but this is highly specific to the EVM execution environment. There is no simple way to compare the total amount of work done by a set of EVM transactions to a set of Solana transactions using the BPF environment, and comparing either to a set of Bitcoin transactions is equally fraught.
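One partial workaround is to report throughput in the execution environment's own work metric alongside raw transaction counts. The sketch below computes both tx/s and gas/s from a few made-up blocks; the figures are invented for illustration, not real chain data.

```python
# Hypothetical recent blocks: (timestamp in seconds, number of txs, total gas used).
# The figures are made up for illustration.
blocks = [
    (1_700_000_000, 150, 12_000_000),
    (1_700_000_012, 310, 29_500_000),
    (1_700_000_024,  90,  8_100_000),
    (1_700_000_036, 200, 15_000_000),
]

elapsed = blocks[-1][0] - blocks[0][0]
total_txs = sum(n for _, n, _ in blocks)
total_gas = sum(g for _, _, g in blocks)

print(f"{total_txs / elapsed:.1f} tx/s")
print(f"{total_gas / elapsed:,.0f} gas/s")
# Two workloads with the same tx/s can differ wildly in gas/s, and gas itself
# is only meaningful relative to one execution environment (here, the EVM).
```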
Blockchains that separate transaction processing into a consensus layer and an execution layer make this more explicit. At the (pure) consensus layer, throughput can be measured in bytes added to the chain per unit of time. The execution layer is always more complex.
A simpler execution layer, such as a rollup server that supports only payment transactions, avoids the difficulty of quantifying computation. Even in this case, though, payments can vary in their number of inputs and outputs. Payment channel transactions can vary in the number of "hops" required, which affects throughput. And rollup server throughput can depend on the degree to which a batch of transactions can be "netted" down to a smaller set of summary changes.
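To make "netting" concrete, here is a minimal sketch (with hypothetical accounts and amounts) showing how a batch of payments can collapse into a much smaller set of net balance changes, which is one way a rollup could compress what it ultimately posts:

```python
from collections import defaultdict

# A batch of payments submitted to a hypothetical payment-only rollup.
payments = [("alice", "bob", 5), ("bob", "carol", 5),
            ("carol", "alice", 2), ("alice", "carol", 1)]

# Net the batch into a single balance delta per account.
deltas = defaultdict(int)
for sender, recipient, amount in payments:
    deltas[sender] -= amount
    deltas[recipient] += amount

summary = {acct: d for acct, d in deltas.items() if d != 0}
print(f"{len(payments)} payments netted to {len(summary)} balance changes: {summary}")
```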
Another challenge with throughput is going beyond empirically measuring today's performance to assessing theoretical capacity. This introduces all sorts of modeling questions. First, we must settle on a realistic transaction workload for the execution layer. Second, real systems almost never reach theoretical capacity, and blockchain systems in particular rarely do. For robustness, we want node implementations to be heterogeneous and diverse in practice (rather than every client running a single software implementation), which makes accurate simulation of blockchain throughput even harder.
Throughput claims require careful explanation of the assumed transaction workload and the population of validators (their number, implementation, and network connectivity). In the absence of a clear standard, historical workloads from a popular network like Ethereum will suffice.
Latency and Throughput Tradeoffs
Latency and throughput can generally be traded off against each other: for example, batching transactions into larger blocks improves throughput, but each transaction then waits longer for its batch to fill and be confirmed.
Transaction Fees
Understandably, end users care more about the tradeoff between latency and fees than between latency and throughput. Users have no direct reason to care about throughput at all; they only care about having their transactions confirmed quickly and with the lowest possible fees (some weigh fees more heavily, others latency). Fees are influenced by a number of factors:
How much market demand is there to trade?
What is the overall throughput achieved by the system?
What is the total revenue that the system provides to validators or miners?
How much of this revenue comes from transaction fees versus inflationary rewards?
The first two factors are essentially the supply and demand curves that lead to a market-clearing price (although it has been claimed that miners can act like a cartel to push fees above this point). All else being equal, higher throughput should lead to lower fees, but there is more going on.
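Here is a toy sketch of that supply-and-demand point: block space is the fixed supply, pending bids are the demand, and the marginal included bid sets a rough clearing price. The numbers are made up, and this is not a model of any specific fee mechanism (such as EIP-1559).

```python
# Toy fee market: fixed block space is the supply curve, pending bids the demand curve.
BLOCK_GAS_LIMIT = 30_000_000   # assumed block capacity

# Pending transactions as (fee offered per gas unit, gas used); values are made up.
pending = [(50, 2_000_000), (40, 10_000_000), (30, 15_000_000),
           (20, 8_000_000), (10, 5_000_000)]

# Fill the block with the highest-paying transactions first.
remaining = BLOCK_GAS_LIMIT
included, clearing_price = [], 0
for fee, gas in sorted(pending, reverse=True):
    if gas <= remaining:
        included.append((fee, gas))
        remaining -= gas
        clearing_price = fee   # fee of the marginal (lowest-paying) included tx

print("included:", included)
print("approximate clearing price:", clearing_price, "per gas unit")
# Raising BLOCK_GAS_LIMIT (throughput) lets lower bids in, pushing the clearing price down.
```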
In particular, points 3 and 4 above are fundamental questions of blockchain system design, yet we lack good principles for either. We have some understanding of the advantages and drawbacks of giving miners revenue from inflationary rewards versus transaction fees. However, despite many economic analyses of blockchain consensus protocols, we still don't have a widely accepted model of how much revenue needs to go to validators. Today most systems are built on an educated guess about how much revenue is enough to keep validators behaving honestly without stifling practical use of the system. In simplified models, it can be shown that the cost of mounting a 51% attack scales with the rewards paid to validators.
Conclusion
Assessing performance fairly and accurately is hard. The same is true of measuring a car's performance: just as with blockchains, different people care about different things. With cars, some users care about top speed or acceleration, others about fuel economy, and still others about towing capacity. None of these is trivial to evaluate. In the United States, for example, the Environmental Protection Agency maintains detailed guidelines both for how fuel economy is evaluated and for how it must be presented to users at dealerships.
