In addition to Solidity, what other EVM languages are worth paying attention to?

Guest columnist

2023-03-18 09:30


A survey of the state of the art in EVM DSL design: what does it take to design an excellent language?

Compiled by: 0x11, Foresight News

The Ethereum Virtual Machine (EVM) is a 256-bit, stack-based, globally accessible Turing machine. Because its architecture differs significantly from other virtual machines and from physical machines, the EVM requires domain-specific languages (DSLs), that is, computer languages focused on a particular application domain.

In this article, we examine the state of the art in EVM DSL design by introducing six languages: Solidity, Vyper, Fe, Huff, Yul, and ETK.

Language Versions

Solidity: 0.8.19
Vyper: 0.3.7
Fe: 0.21.0
Huff: 0.3.1
ETK: 0.2.1
Yul: 0.8.19

This article assumes a basic understanding of the EVM, stacks, and programming.

Overview of the Ethereum Virtual Machine

The EVM is a 256-bit stack-based Turing machine. Before diving into its compilers, a few of its functional characteristics should be introduced.

Since the EVM is "Turing complete", it is subject to the "halting problem". In short, there is no general way to determine before a program executes whether it will ever terminate. The EVM works around this with "gas", a unit of computation that is generally proportional to the physical resources required to execute an instruction. The amount of gas available to each transaction is limited, and the sender of the transaction must pay ETH in proportion to the gas the transaction consumes. One implication of this strategy is that, of two functionally identical smart contracts, the one that consumes less gas will see more adoption. This results in protocols competing on gas efficiency, with engineers striving to minimize gas consumption for specific tasks.
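As a rough illustration of how metering sidesteps the halting problem, the sketch below runs a toy stack machine under a gas budget. The opcode names, the per-instruction costs, and the `run` helper are assumptions made for this example and do not reproduce the real EVM gas schedule.

```python
class OutOfGas(Exception):
    """Raised when the gas budget is exhausted before the program halts."""


# Illustrative per-instruction costs (assumed, not the real EVM schedule).
GAS_COST = {"PUSH": 3, "ADD": 3, "JUMP": 8, "STOP": 0}


def run(program, gas_limit):
    """Execute a list of (opcode, operand) pairs; return (stack, gas_used)."""
    stack, pc, gas_left = [], 0, gas_limit
    while pc < len(program):
        op, arg = program[pc]
        cost = GAS_COST[op]
        if cost > gas_left:
            raise OutOfGas(f"needed {cost} gas at pc={pc}, only {gas_left} left")
        gas_left -= cost
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            # Arithmetic wraps at 256 bits, as on the EVM.
            stack.append((stack.pop() + stack.pop()) % 2**256)
        elif op == "JUMP":
            pc = arg  # unconditional jump to an instruction index
            continue
        elif op == "STOP":
            break
        pc += 1
    return stack, gas_limit - gas_left


# A program that terminates on its own: 2 + 3.
terminating = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("STOP", None)]
# A program that would loop forever without metering.
looping = [("JUMP", 0)]
```

Running `terminating` finishes normally, while `looping` is cut off with `OutOfGas` once the budget runs out, so every execution is guaranteed to end.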

Additionally, when a contract is invoked, an execution context is created for it. In this context, the contract has a stack for operations and processing, a linear memory instance for reading and writing, persistent storage local to the contract for reading and writing, and the "calldata" attached to the call, which can be read but not written.

An important note about memory is that while it has no fixed upper limit, it is still finite in practice. The gas cost of expanding memory is dynamic: it grows roughly linearly while memory is small, but past a threshold it grows quadratically with the total size of the memory in use, so large allocations quickly become prohibitively expensive.
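The growth in memory cost can be sketched with the formula from the Ethereum Yellow Paper, in which the total cost of `a` 32-byte words is `3 * a + a^2 / 512` (integer division) and an expansion is charged the difference between the new and old totals. The function names below are hypothetical, and the constants reflect the specification as understood at the time of writing.

```python
G_MEMORY = 3        # linear coefficient: gas per 32-byte word
QUAD_DIVISOR = 512  # divisor of the quadratic term


def memory_cost(size_bytes):
    """Total gas attributed to a memory of the given size in bytes."""
    words = (size_bytes + 31) // 32  # round up to whole 32-byte words
    return G_MEMORY * words + words * words // QUAD_DIVISOR


def expansion_cost(old_size_bytes, new_size_bytes):
    """Gas charged for growing memory; shrinking or reuse costs nothing."""
    if new_size_bytes <= old_size_bytes:
        return 0
    return memory_cost(new_size_bytes) - memory_cost(old_size_bytes)


if __name__ == "__main__":
    for size in (32, 1024, 32 * 1024, 1024 * 1024):
        print(f"{size:>8} bytes -> {memory_cost(size)} gas")
```

Under these constants the quadratic term is zero for the first few hundred bytes and dominates once memory reaches tens of kilobytes, which is why the cost is often described as linear at first and quadratic afterwards.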

Contracts can also use a few different instructions to call other contracts. The "call" instruction sends data and optionally ETH to the target contract, which runs in its own new execution context until it stops. The "staticcall" instruction is the same as "call", except that it cannot send ETH and it adds a check that no part of the global state is modified before the static call completes. Finally, the "delegatecall" instruction behaves like "call", except that it executes the target's code while retaining environment information from the calling context, such as the caller, the attached value, and the storage being operated on. This is typically used for external libraries and proxy contracts.
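The difference between the three call instructions comes down to which parts of the execution context change. The following is a simplified model; the `Context` fields and helper names are assumptions for illustration, and real contexts also carry gas, calldata, and return data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    code_address: str     # whose code is running
    storage_address: str  # whose storage is read and written
    caller: str           # what the running code observes as its caller
    value: int            # ETH attached, as observed by the running code
    static: bool = False  # True when state changes are forbidden


def call(parent, target, value=0):
    """New context for the target: its own code, its own storage."""
    if parent.static and value != 0:
        raise PermissionError("cannot transfer value inside a static context")
    return Context(target, target, parent.storage_address, value, parent.static)


def staticcall(parent, target):
    """Like call, but no value and no state changes until it returns."""
    return Context(target, target, parent.storage_address, 0, True)


def delegatecall(parent, target):
    """Run the target's code against the parent's storage, caller and value."""
    return Context(target, parent.storage_address, parent.caller,
                   parent.value, parent.static)


# A user calls a proxy with 5 wei; the proxy delegates to an implementation.
proxy_ctx = Context("proxy", "proxy", "user", 5)
impl_ctx = delegatecall(proxy_ctx, "impl")
```

Here `impl_ctx` runs the code at "impl" but still sees "user" as the caller and writes to the proxy's storage, which is the property proxy contracts and external libraries rely on.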

Why Language Design Matters

Domain-specific languages (DSLs) are necessary when interacting with atypical architectures. While compiler toolchains such as LLVM exist, relying on them to handle smart contracts is less than ideal where program correctness and computational efficiency are critical.

Program correctness is important because smart contracts are immutable by default and, given the properties of blockchain virtual machines (VMs), smart contracts are a popular choice for financial applications. While upgradeability solutions exist for the EVM, they are a patch at best and an arbitrary code execution vulnerability at worst.

Computational efficiency is also critical, as minimizing computation has economic advantages, but not at the expense of security.

In short, an EVM DSL must balance program correctness against gas efficiency. Each language favors one or the other through different trade-offs, ideally without sacrificing too much flexibility.

Language Overview

For each language, we describe its salient features and design choices, and include a simple counter smart contract. Language popularity is determined from Total Value Locked (TVL) data on DefiLlama.

Solidity

Solidity is a high-level language with a syntax similar to C, Java, and JavaScript. It is the most popular language by TVL, with ten times the TVL of the next most popular language. For code reuse, it follows an object-oriented pattern in which smart contracts are treated as classes and support multiple inheritance. The compiler is written in C++, with plans to migrate to Rust in the future.

Mutable contract fields are stored in persistent storage unless their values are known at compile time (constant) or at deployment time (immutable). Methods declared in a contract can be marked pure, view, or payable; a method with none of these markers is non-payable by default and may still modify state. Pure methods do not read from the execution environment and can neither read nor write persistent storage; given the same inputs they always return the same output, and they produce no side effects. View methods can read from persistent storage and the execution environment, but they cannot write to persistent storage or create side effects such as emitting a log. Payable methods can read and write persistent storage, read from the execution environment, produce side effects, and receive ETH attached to the call. Non-payable methods are the same as payable methods, except that a runtime check asserts that no ETH is attached to the current execution context.

Note: attaching ETH to a transaction is separate from paying the gas fee. The attached ETH is received by the contract, which can accept it or reject it by reverting the context.

When declared within the scope of a contract, methods can specify one of four visibility modifiers: private, internal, public, or external. Private methods are accessed internally via the "jump" instruction within the current contract; derived contracts cannot access them directly. Internal methods are also accessed via "jump", but derived contracts can use them directly. Public methods can be accessed by external contracts via the "call" instruction, which creates a new execution context, and internally via a jump when the method is called directly. A public method can also be invoked from within the same contract in a new execution context by prefixing the call with "this.". External methods can only be accessed through the "call" instruction, whether from a different contract or from the same contract; in the latter case the call must be prefixed with "this.".

Note: The "jump" instruction manipulates the program counter, and the "call" instruction creates a new execution context for the duration of the target contract's execution. Where possible, using "jump" instead of "call" is more gas efficient.

Solidity also provides three ways to define libraries. The first is an external library, a stateless contract that is deployed on chain separately, linked dynamically when the contract is called, and accessed through the "delegatecall" instruction. This is the least common approach: tooling support for external libraries is poor, "delegatecall" is expensive because it has to load extra code from chain state, and deployment requires multiple transactions. Internal libraries are defined in the same way as external libraries, except that every method must be declared internal. At compile time the internal library is embedded into the final contract, and unused library methods are removed during dead code analysis. The third way is similar to an internal library, but instead of defining data structures and functions inside a library, they are defined at the file level and can be imported and used directly in the final contract. This third approach offers better developer ergonomics: custom data structures can be used, functions can be attached to types in the global scope, and, to a limited extent, operators can be aliased to functions.

The compiler provides two optimization pipelines. The first is an instruction-level optimizer that operates on the final bytecode. The second, added more recently, uses the Yul language (described in detail later) as an intermediate representation (IR) during compilation and optimizes the generated Yul code.

To allow interaction with a contract's public and external methods, Solidity specifies an Application Binary Interface (ABI) standard. The Solidity ABI is currently considered the de facto standard for EVM DSLs. Ethereum ERC standards that specify external interfaces are written in accordance with Solidity's ABI specification and style guide, and other languages follow Solidity's ABI specification with few deviations.

Solidity also provides inline Yul blocks, which allow low-level access to the EVM instruction set. Yul blocks contain a subset of Yul functionality; see the Yul section for details. They are often used for gas optimization, to take advantage of features not supported by the high-level syntax, and to implement custom storage, memory, and calldata layouts.

Thanks to Solidity's popularity, its developer tooling is mature and well designed, with Foundry as a prominent example.
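To make the ABI more concrete, the sketch below builds the calldata for a call to the ERC-20 function `transfer(address,uint256)`: a 4-byte function selector followed by each argument padded to a 32-byte word. The selector `0xa9059cbb` is the well-known value for that signature (the first four bytes of its Keccak-256 hash) and is hardcoded here because Python's standard library has no Keccak-256; the helper names and the recipient address are hypothetical.

```python
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")  # transfer(address,uint256)


def encode_uint256(value):
    """Encode an unsigned integer as a big-endian 32-byte word."""
    if not 0 <= value < 2**256:
        raise ValueError("value does not fit in uint256")
    return value.to_bytes(32, "big")


def encode_address(address_hex):
    """Left-pad a 20-byte address to a 32-byte word."""
    digits = address_hex[2:] if address_hex.startswith("0x") else address_hex
    raw = bytes.fromhex(digits)
    if len(raw) != 20:
        raise ValueError("an address must be exactly 20 bytes")
    return raw.rjust(32, b"\x00")


def encode_transfer(to, amount):
    """Calldata for transfer(to, amount): selector followed by two words."""
    return TRANSFER_SELECTOR + encode_address(to) + encode_uint256(amount)


# Hypothetical recipient address, used only for illustration.
calldata = encode_transfer("0x" + "11" * 20, 1000)
```

The result is 68 bytes long: 4 bytes of selector plus two 32-byte words. Only static types are covered here; dynamic types such as `bytes` and arrays use an additional offset-based layout that this sketch omits.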

Here is a simple contract written in Solidity:

Vyper

Vyper is a high-level language with a syntax similar to Python. It is pretty much a subset of Python with some minor differences, and it is the second most popular EVM DSL. Vyper is optimized for security, readability, auditability, and gas efficiency. It does not employ object-oriented patterns or inline assembly, and it does not support code reuse. Its compiler is written in Python.

Variables stored in persistent storage are declared at the file level. If their value is known at compile time, they can be declared "constant"; if their value is known at deployment time, they can be declared "immutable"; if they are marked public, the final contract exposes a read-only function for that variable. Constants and immutables are accessed internally by name, whereas mutable variables in persistent storage are accessed by prefixing their names with "self.". This helps prevent namespace conflicts between storage variables, function parameters, and local variables.

Similar to Solidity, Vyper uses function decorators to express the visibility and mutability of functions. Functions marked "@external" can be accessed from external contracts via the "call" instruction. Functions marked "@internal" can only be accessed within the same contract and must be called with the "self." prefix. A function marked "@pure" cannot read from the execution environment or persistent storage, nor can it write to persistent storage or create any side effects. Functions marked "@view" can read from the execution environment or persistent storage, but cannot write to persistent storage or create side effects. Functions marked "@payable" can read and write persistent storage, create side effects, and receive ETH. Functions that declare no mutability decorator default to non-payable; that is, they behave like payable functions but cannot receive ETH.

The Vyper compiler also chooses to store local variables in memory rather than on the stack. This makes contracts simpler and more efficient, and it solves the "stack too deep" problem common in other high-level languages. However, it also comes with some trade-offs.

First, since the memory layout must be known at compile time, the maximum capacity of every dynamic type must also be known at compile time, which is a limitation. Second, allocating large amounts of memory can lead to non-linear gas consumption, as mentioned in the EVM overview section. For many use cases, however, this gas cost is negligible.

Although Vyper does not support inline assembly, it provides more built-in functions, so that almost every feature available in Solidity and Yul can also be implemented in Vyper. Low-level bit operations, external calls, and proxy contract operations are accessible through built-in functions, and custom storage layouts can be implemented by supplying an override file at compile time.

Vyper does not have as rich a suite of development tools, but its tools are more tightly integrated and it can also plug into Solidity development tools. Notable Vyper tools include the Titanoboa interpreter, which has many built-in tools related to the EVM and Vyper for experimentation and development, and Dasy, a Vyper-based Lisp with compile-time code execution.

Here is a simple contract written in Vyper:

Fe

Fe is a high-level Rust-like language that is currently under active development, with most features not yet available. Its compiler is primarily written in Rust, but it uses Yul as its intermediate representation (IR) and relies on the Yul optimizer, which is written in C++. This is expected to change with the addition of Sonatina, a Rust-native backend. Instead of object-oriented patterns, Fe reuses code through a module system: variables, types, and functions are declared within modules, which can be imported in a Rust-like manner.

Persistent storage variables are declared at the contract level and are not publicly accessible without a manually defined getter function. Constants can be declared at the file or module level and can be accessed inside the contract. Immutable deploy-time variables are not currently supported.

Methods can be declared at the module level or within a contract, and they are pure and private by default. To make a contract method public, its definition must be preceded by the "pub" keyword, which makes it externally accessible. To read from a persistent storage variable, the first parameter of the method must be "self"; prefixing the variable name with "self." then gives the method read-only access to the storage variable. To both read and write persistent storage, the first parameter must be "mut self". The "mut" keyword indicates that the contract's storage is mutable during method execution. Environment variables are accessed by passing a "Context" parameter to the method, usually named "ctx".

Functions and custom types can be declared at the module level. By default, module items are private and cannot be accessed from outside unless the "pub" keyword is added. This should not be confused with the "pub" keyword at the contract level: a module's public members can only be accessed inside the final contract or other modules, not externally.

Fe does not currently support inline assembly. Instead, instructions are wrapped by compiler intrinsics, special functions that resolve to instructions at compile time.

Fe follows Rust's syntax and type system, supporting type aliases, enums with subtypes, traits, and generics. Support for these is currently limited, but work is in progress. Traits can be defined and implemented for different types, but this does not yet extend to generics or trait constraints. Enums support subtypes and can have methods implemented on them, but they cannot be encoded in external functions. Although Fe's type system is still a work in progress, it shows a lot of potential for helping developers write safer, compile-time checked code.

Here is a simple contract written in Fe:

Huff

Huff is an assembly language with manual stack control and minimal abstraction of the EVM instruction set. Through the "#include" directive, any included Huff files can be parsed at compile time to achieve code reuse. Originally written by the Aztec team for extremely optimized elliptic curve algorithms, the compiler was later rewritten in TypeScript and then in Rust.

Constants must be defined at compile time, immutables are not currently supported, and persistent storage variables are not explicitly defined in the language. Since named storage variables are a high-level abstraction, persistent storage in Huff is accessed directly through the "sstore" opcode for writes and the "sload" opcode for reads. Custom storage layouts can be user-defined, or by convention start from zero and increment for each variable using the compiler intrinsic "FREE_STORAGE_POINTER". Making a stored variable externally accessible requires manually defining a code path that reads the variable and returns it to the caller.

External functions are also an abstraction introduced by high-level languages, so Huff has no concept of external functions. However, most projects follow, to varying degrees, the ABI specification of other high-level languages, most commonly Solidity. A common pattern is to define a "dispatcher" that loads the raw calldata and checks it against each known function selector. If a selector matches, the code that follows it is executed. Since dispatchers are user-defined, they may follow different dispatch patterns. Solidity sorts the selectors in its dispatcher alphabetically by name, Vyper sorts numerically and performs a binary search at runtime, and most Huff dispatchers sort by expected frequency of function usage and rarely use jump tables. Jump tables are not currently supported natively in the EVM, so an introspection instruction such as "codecopy" is required to implement them.

Internal functions are defined using the "#define fn" directive, which can accept template parameters for flexibility and specifies the expected stack depth at the beginning and end of the function. Since these functions are internal, they cannot be accessed from the outside, and calling them internally requires the "jump" instruction.
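The ordering choices described above can be compared with a small model that counts selector comparisons. This is only a sketch: the function names are hypothetical, the selectors are the commonly cited ERC-20 values, and a real dispatcher's gas cost depends on more than the number of comparisons.

```python
def linear_dispatch(selector, table):
    """Scan (selector, handler) pairs in the order the author chose."""
    for comparisons, (candidate, handler) in enumerate(table, start=1):
        if candidate == selector:
            return handler, comparisons
    return None, len(table)  # no match: a real contract would revert


def binary_dispatch(selector, table):
    """Binary search over (selector, handler) pairs sorted by selector."""
    lo, hi, comparisons = 0, len(table) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        candidate, handler = table[mid]
        comparisons += 1
        if candidate == selector:
            return handler, comparisons
        if candidate < selector:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, comparisons


# Ordered by assumed call frequency, most frequent first.
by_frequency = [
    (0xA9059CBB, "transfer"),
    (0x70A08231, "balanceOf"),
    (0x095EA7B3, "approve"),
    (0x23B872DD, "transferFrom"),
]
# Ordered numerically, as a binary search requires.
by_value = sorted(by_frequency)
```

With these tables, the hot path `transfer` is found after one comparison in the frequency-ordered scan but takes three steps in the binary search, while the rarely used `transferFrom` takes four comparisons in the scan. That is the trade-off a hand-written dispatcher can tune.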

Other control flow such as conditional statements and loop statements can be defined using jump targets. A jump target is defined by an identifier followed by a colon. Jumping to these targets can be done by pushing an identifier onto the stack and executing a jump instruction. This resolves to a bytecode offset at compile time.

Macros are defined with "#define macro" and are otherwise the same as internal functions. The key difference is that a macro does not generate a "jump" instruction at compile time; instead, the body of the macro is copied directly into each call site in the file.

This design avoids arbitrary jumps and lowers runtime gas cost, at the price of increased code size when a macro is invoked many times. The "MAIN" macro is considered the entry point of the contract, and the first instruction in its body becomes the first instruction in the runtime bytecode.

Other features built into the compiler include event hash generation for logging, function selectors for dispatch, error selectors for error handling, code size checkers for internal functions and macros, and more.
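The size-versus-gas trade-off between macros and jump-based functions can be estimated with simple arithmetic. The model below assumes a hypothetical calling convention in which each call site pushes a return label and a function label with 2-byte pushes, jumps, and lands on a jump destination, and the function body is wrapped in an entry jump destination and a jump back. It ignores any stack shuffling, and the byte and gas figures are assumptions of this sketch rather than output from the Huff compiler.

```python
# Assumed encoding of the jump-based calling convention.
CALL_SITE_BYTES = 8     # PUSH2 return_label, PUSH2 fn_label, JUMP, JUMPDEST
FN_WRAPPER_BYTES = 2    # JUMPDEST at entry, JUMP back at exit
CALL_GAS_OVERHEAD = 24  # two PUSH (3 each), two JUMP (8 each), two JUMPDEST (1 each)


def macro_code_size(body_bytes, call_sites):
    """A macro's body is copied into every call site."""
    return body_bytes * call_sites


def function_code_size(body_bytes, call_sites):
    """A function's body is emitted once; each call site adds jump plumbing."""
    return body_bytes + FN_WRAPPER_BYTES + CALL_SITE_BYTES * call_sites


def compare(body_bytes, call_sites):
    """Summarize both strategies for one routine."""
    return {
        "macro_bytes": macro_code_size(body_bytes, call_sites),
        "function_bytes": function_code_size(body_bytes, call_sites),
        "function_extra_gas_per_call": CALL_GAS_OVERHEAD,
    }


if __name__ == "__main__":
    print(compare(body_bytes=20, call_sites=10))
    print(compare(body_bytes=4, call_sites=2))
```

Under these assumptions, a 20-byte routine used at ten call sites occupies 200 bytes as a macro versus 102 bytes as a function, while each function call pays about 24 extra gas. For a 4-byte routine used twice, the macro is both smaller and cheaper.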

Note: stack comments like "// [count]" are not required, they are just used to indicate the state of the stack at the end of the execution of the line.

Here is a simple contract written in Huff:

ETK

The EVM Toolkit (ETK) is an assembly language with manual stack management and minimal abstractions. Code can be reused through "%include" and "%import" directives, and the compiler is written in Rust.

One notable difference between Huff and ETK is that Huff adds a slight abstraction over initcode, also known as constructor code, which can be overridden by defining a special "CONSTRUCTOR" macro. ETK does not abstract this away: initcode and runtime code must be defined together.

Similar to Huff, ETK reads and writes persistent storage through the "sload" and "sstore" instructions. There are no constant or immutable keywords, but constants can be emulated with expression macros, one of the two kinds of macros in ETK. Expression macros do not resolve to instructions; instead they produce numeric values that can be used inside other instructions. For example, an expression macro does not generate a complete "push" instruction, but it can generate the number to include in a "push" instruction.

As mentioned earlier, external functions are a high-level language concept, so exposing code paths externally requires writing a function selector dispatcher.

Internal functions are not explicitly defined as in other languages. Instead, user-defined aliases can be assigned to jump targets, which are then jumped to by name. This also allows for other control flow such as loops and conditional statements.

ETK supports two kinds of macros. The first are expression macros, which accept any number of arguments and return a numeric value that can be used in other instructions. Expression macros do not generate instructions, only immediate values or constants. Instruction macros, by contrast, accept any number of arguments and generate any number of instructions at compile time. Instruction macros in ETK are similar to Huff macros.

Here is a simple contract written in ETK:

Yul

Yul is an assembly language with high-level control flow and a lot of abstraction. It is part of the Solidity toolchain and can optionally be used in the Solidity build pipeline. Yul does not support code reuse as it is intended to be a compilation target rather than a standalone language. Its compiler is written in C++, and there are plans to migrate it to Rust along with the rest of the Solidity pipeline.

In Yul, code is divided into objects, which can contain code, data, and nested objects. There are no constants or external functions in Yul, so a function selector dispatcher must be defined in order to expose code paths to the outside world.

Except for stack and control flow instructions, most instructions are exposed as functions in Yul. Calls can be nested to reduce code size, and their results can also be assigned to temporary variables and passed on to other instructions. Conditional branches can use "if" blocks, which execute when the value is non-zero, but there are no "else" blocks, so handling multiple code paths requires "switch", which handles any number of cases plus a "default" fallback. Loops are written with "for"; although its syntax differs from other high-level languages, it provides the same basic functionality. Internal functions are defined with the "function" keyword and resemble function definitions in high-level languages.

Most Yul functionality is exposed in Solidity through inline assembly blocks. This allows developers to break abstractions, write custom functionality, or use features that are not available in the high-level syntax. However, using this feature requires a deep understanding of how Solidity handles calldata, memory, and storage.

Yul also has some unique functions. The "datasize", "dataoffset", and "datacopy" functions operate on Yul objects via their string aliases. The "setimmutable" and "loadimmutable" functions allow immutable parameters to be set and loaded in constructors, although their use is restricted. The "memoryguard" function signals that only a given range of memory will be allocated, which allows the compiler to use memory beyond the guarded range for additional optimizations. Finally, "verbatim" allows the use of instructions that the Yul compiler does not know about.

Here is a simple contract written in Yul:

Characteristics of a Good EVM DSL

A good EVM DSL should learn from the strengths and weaknesses of each language listed here, and should also cover the basics found in almost all modern languages, such as conditional statements, pattern matching, loops, and functions. Code should be unambiguous, with minimal implicit abstraction added for the sake of aesthetics or readability. In high-stakes, correctness-critical environments, every line of code should be explicitly explainable. A well-defined module system should also be at the heart of any great language. It should state clearly which items are defined in which scope and which of them are accessible. Every item in a module should be private by default, with only explicitly public items accessible from outside.

In a resource-constrained environment like the EVM, efficiency matters. Efficiency is often achieved by providing low-cost abstractions such as compile-time code execution via macros, a rich type system for building well-designed reusable libraries, and wrappers for common on-chain interactions. Macros generate code at compile time, which is great for reducing boilerplate in common operations, and in cases like Huff they can be used to trade code size against runtime efficiency. A rich type system allows for more expressive code and more compile-time checks that catch errors before runtime, and when combined with type-checked compiler intrinsics it may eliminate the need for much of today's inline assembly. Generics also allow nullable values (such as external code) to be wrapped in "option" types, and error-prone operations (such as external calls) to be wrapped in "result" types. These two types are examples of how library authors can force developers to handle every outcome, either by defining a code path for the failure case or by reverting the transaction. Keep in mind, however, that these are compile-time abstractions that resolve to simple conditional jumps at runtime. Forcing developers to handle every result at compile time increases initial development time, but has the benefit of far fewer surprises at runtime.

Flexibility is also important to developers. While the default for complex operations should be the safe and possibly less efficient route, there are times when more efficient code paths or unsupported features are needed. For this, inline assembly should be open to developers, without guardrails. Solidity's inline assembly puts some guardrails in place for simplicity and to give the optimizer more room, but developers should be granted full control over the execution environment when they need it.
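The "result" idea can be sketched in Python, with the caveat that Python checks nothing at compile time, so this only shows the shape of the API a stricter language would enforce. The `Ok`, `Err`, `external_call`, and `decode_balance` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E


Result = Union[Ok[T], Err[E]]


def external_call(success: bool, returndata: bytes) -> Result[bytes, str]:
    """Wrap the raw (success, returndata) pair of a low-level call."""
    return Ok(returndata) if success else Err("external call reverted")


def decode_balance(result: Result[bytes, str]) -> int:
    """The caller must pick a branch before it can touch the return data."""
    if isinstance(result, Ok):
        return int.from_bytes(result.value, "big")
    # The failure path is explicit: here the choice is to revert.
    raise RuntimeError(f"reverting: {result.error}")
```

In a language with exhaustive pattern matching, omitting the failure branch would be a compile error rather than a convention, and the whole construct would still lower to a single conditional jump on the success flag.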

