Q: What is your position on Ethereum governance?
A: We have no position at the moment. We feel that many questions, such as whether or when to adopt ProgPoW, should be left to the community to answer. Our role is to propose new algorithms, and we are happy to answer technical questions about them.
Q: Where did ProgPoW come from?
A: IfDefElse is a small team that analyzes and optimizes PoW algorithms. We have seen the ETH community repeatedly ask for a new PoW algorithm on which specialized ASIC miners hold little advantage over commodity hardware. It has been disheartening to watch so many algorithms fall to ASICs; every time a new ASIC miner ships, the ETH community sinks into frustration.
So one day in the spring of 2018 we had the idea of modifying the Ethash algorithm so that commodity GPUs would remain effective miners. After an initial draft of the algorithm, we published it on GitHub for open development and fine-tuning.
Q: Who reviewed ProgPoW?
A: While collecting feedback on the algorithm, we were fortunate to receive emails from Ethereum Foundation engineers, Ethereum core developers, NVIDIA engineers, and AMD engineers. Both the NVIDIA and AMD engineers were generally positive about the algorithm.
It is also worth mentioning that two of the algorithm's updates and optimizations came from reviews by community members mbevand and Schemeykh.
Q: How did AMD react?
A: AMD's response addressed two main questions:
Would replacing Ethash with ProgPoW stop ASIC manufacturers from quickly studying the open-source code and building specialized ASIC miners?
Would ProgPoW make it harder for GPU miners to mine Ethereum?
The AMD engineers confirmed that, in theory, a ProgPoW ASIC could be built, but that doing so would require specialized GPU expertise, especially in memory controller technology.
They also raised concerns about the size of the cache (which maps to the Local Data Share, or LDS, on AMD chips).
They noted in the email that an 8KB or 16KB cache performs about the same on both AMD and NVIDIA, but that 32KB or 64KB could have a major impact on both vendors' architectures and would cause problems on Polaris and Vega.
Based on their feedback, we set the size of PROGPOW_CACHE_BYTES to 16KB.
Q: How did NVIDIA respond?
A: The NVIDIA engineers generally agreed with our approach. In their words, the algorithm fills the gaps between memory accesses with computation instead of leaving the GPU sitting idle as a glorified memory controller.
Their main concern was that adding too much random math would make the algorithm compute-bound rather than memory-bound, in which case an ASIC built for a compute-bound algorithm could achieve significant efficiency gains.
Based on their feedback, we fine-tuned PROGPOW_CNT_CACHE and PROGPOW_CNT_MATH to ensure that the algorithm remains memory-bound on most modern GPUs.
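For reference, these tuning knobs appear in the reference implementation as simple compile-time constants. In the sketch below, the 16KB cache, 50-block period, 16 lanes, and 32 registers come from this FAQ itself; the two loop counts are illustrative placeholders, since their exact values have changed between spec revisions.

/* ProgPoW tuning parameters as plain C defines (a sketch, not the canonical
   spec values).  The first four come from figures stated in this FAQ; the
   two loop counts are illustrative only. */
#define PROGPOW_PERIOD       50           /* blocks per generated random program   */
#define PROGPOW_LANES        16           /* parallel lanes per hash               */
#define PROGPOW_REGS         32           /* mix registers per lane                */
#define PROGPOW_CACHE_BYTES  (16 * 1024)  /* low-latency cache size, per the FAQ   */
#define PROGPOW_CNT_CACHE    12           /* cache accesses per loop (illustrative) */
#define PROGPOW_CNT_MATH     20           /* random math ops per loop (illustrative) */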
Q: ProgPoW uses modulo operations and kiss99() in its main loop to select random instructions. Wouldn't an ASIC designed around this be more efficient?
A: This is a common misconception on a first read of the algorithm. The modulo and kiss99() calls in the main loop are evaluated on the CPU to generate the random program, which the CPU then compiles. The GPU only executes optimized code in which the instructions to run and the mix-state registers to use have already been resolved.
As Alexey mentioned, ProgPoW generates source code every 50 blocks. For an example of a generated program, see: kernel.cu.
We will also explain this further in the specification.
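To make the CPU/GPU split concrete, here is a minimal sketch of the CPU-side program generation. The kiss99() routine is the standard Marsaglia generator that ProgPoW uses; everything else (the seeding, the example block number, the list of operations, the merge() call) is simplified for illustration and does not match the spec exactly.

#include <stdint.h>
#include <stdio.h>

#define PROGPOW_PERIOD 50  /* blocks per generated program, per the FAQ */

typedef struct { uint32_t z, w, jsr, jcong; } kiss99_t;

/* Standard KISS99 generator. */
static uint32_t kiss99(kiss99_t *st)
{
    st->z = 36969 * (st->z & 65535) + (st->z >> 16);
    st->w = 18000 * (st->w & 65535) + (st->w >> 16);
    uint32_t mwc = (st->z << 16) + st->w;
    st->jsr ^= st->jsr << 17;
    st->jsr ^= st->jsr >> 13;
    st->jsr ^= st->jsr << 5;
    st->jcong = 69069 * st->jcong + 1234567;
    return (mwc ^ st->jcong) + st->jsr;
}

int main(void)
{
    uint64_t block_number = 7000000;                       /* arbitrary example */
    uint32_t prog_seed = (uint32_t)(block_number / PROGPOW_PERIOD);

    /* Illustrative seeding only; the spec derives z/w/jsr/jcong from
       FNV1a hashes of the period seed. */
    kiss99_t rng = { prog_seed ^ 0x811c9dc5u, prog_seed * 0x01000193u,
                     prog_seed + 1u, prog_seed + 2u };

    /* A subset of ProgPoW's random math operations, as source-code text. */
    static const char *ops[] = { "a + b", "a * b", "mul_hi(a, b)", "min(a, b)",
                                 "rotl32(a, b)", "a & b", "a | b", "a ^ b" };

    /* The CPU picks registers and operations and writes them out as kernel
       source (e.g. kernel.cu); the GPU only ever runs the compiled result. */
    for (int i = 0; i < 8; i++) {
        uint32_t dst = kiss99(&rng) % 32;   /* one of the 32 mix registers */
        uint32_t op  = kiss99(&rng) % 8;
        printf("mix[%u] = merge(mix[%u], %s);\n", dst, dst, ops[op]);
    }
    return 0;
}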
Q: Do miners need to install AMD or NVIDIA software development kits in order to compile the generated source code?
A: No. The AMD and NVIDIA drivers include OpenCL, DirectX, and Vulkan compilers. For CUDA, the compiled binary kernels are distributed together with a small SDK.
Q: Does the ProgPoW algorithm have a preference for GPU architecture?
A: No. ProgPoW was designed from the start to be as fair as possible. Execution is no different between OpenCL and CUDA, and the 16KB cache size runs well on both vendors' architectures.
We avoid operations that favor only one architecture, such as 16-bit or 24-bit math, AMD's indexed register file, or NVIDIA's LOP3; every operation used is well supported across multiple generations of both vendors' architectures.
A GPU's performance on the ProgPoW mining workload should therefore roughly track its average gaming performance.
Q: Why is a GPU with a heavily modified VBIOS more than 2x slower on ProgPoW than on Ethash?
A: ProgPoW reads twice as much memory per hash as Ethash, so the expected hashrate is half of Ethash's. All of the tuning and the sample hashrates we previously reported (see "Results: Hashrate" for details) were done on GPUs running at stock clocks. Heavily modifying the VBIOS to lower the core clock can make the algorithm compute-bound rather than memory-bound.
When switching to a new algorithm, the VBIOS modification and tuning will need to be redone.
Q: Can you explain how Ethash ASIC miners can be twice as efficient as GPU miners?
A: The Ethash algorithm needs only three components:
High bandwidth memory (for DAG access)
Keccak f1600 engine (for initial/final hashing)
Tiny compute core (for the inner loop's FNV and modulo operations; see the sketch below)
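As a rough illustration of how small that compute core is, the Ethash inner loop boils down to an FNV1-style mix plus a modulo to pick the next DAG page. The sketch below is simplified, and num_pages is a hypothetical parameter standing in for the DAG size in pages.

#include <stdint.h>

#define FNV_PRIME 0x01000193u

/* FNV1-style mix used by Ethash's inner loop. */
static inline uint32_t fnv(uint32_t x, uint32_t y)
{
    return (x * FNV_PRIME) ^ y;
}

/* Simplified sketch of one inner-loop step: mix one word, then use a modulo
   to choose the next DAG page to fetch. */
static inline uint32_t next_dag_page(uint32_t mix_word, uint32_t loop_idx,
                                     uint32_t num_pages)
{
    return fnv(loop_idx ^ mix_word, mix_word) % num_pages;
}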
Data from FPGAs shows that the Keccak calculation consumes a nearly negligible amount of power. We estimate that only about half of a GPU's power goes to memory access when running Ethash. On an Ethash ASIC the Keccak engine and compute core are likewise negligible, so its power is consumed almost entirely by memory access, which leaves roughly a 2x headroom in mining efficiency over GPUs.
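Put in the same back-of-the-envelope form used for the ProgPoW estimates later in this FAQ (our restatement of the paragraph above, not an additional figure from the authors), the Ethash case works out as:
1/2 (memory) * 1 + 1/2 (compute) * ~0 ≈ 1/2
2x advantage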
A quick summary of current Ethash mining hardware:
Except for Titan V, all data is from whattomine.com and asicminervalue.com.
The first-generation Ethash ASIC, Bitmain's Antminer E3, has no efficiency advantage over GPU miners because its DDR3 memory consumes more power than the GDDR memory on GPUs.
As far as we know, the yet-to-be-released Innosilicon A10 ETHMaster is claimed to be more efficient. Innosilicon uses licensed GDDR6 IP in this line of miners, which should make it roughly twice as efficient as the most efficient mining GPU currently available, the RTX 2070.
Q: How practical is HBM?
A: Our initial evaluation of the algorithm compares hardware using the same memory type. HBM has low power consumption but is expensive, which limits its practicality. For example, the NVIDIA Titan V with HBM is only slightly less efficient than the A10 ETHMaster, but at a cost of $3,000 it is clearly not practical.
AMD's Vega cards with HBM are reasonably priced, but for some reason their efficiency only reaches about 175 KH/s/W. We are not sure what limits Vega's efficiency. Increasing the access size improved the situation significantly (bandwidth utilization rose from 61% to 75%; see "Results: Hashrate" for details), but the Vega's power consumption was still too high. We expect a significant efficiency improvement from the just-announced AMD Radeon VII, which doubles the memory bandwidth.
We estimate that HBM consumes about half the power of GDDR6. An expensive Ethash ASIC built with HBM could exceed 1 MH/s/W, roughly 4x the efficiency of the GPUs currently on the market.
Q: How efficient can a ProgPoW ASIC be?
A: ProgPoW aims to drastically reduce the efficiency gains available to a dedicated ASIC. Executing the algorithm requires the following components:
High bandwidth memory (for DAG access)
Keccak f800 engine (for initial/final hashing)
Large register file (for the mix state)
High-throughput SIMD integer math (for random operations)
High-throughput SIMD cache (for random cache access)
ProgPoW's Keccak f800 has a smaller state, so its power consumption on a GPU is already negligible; an ASIC therefore has essentially no power left to save there.
To execute the random sequences, a ProgPoW ASIC needs something very similar to a GPU's compute core. All of the SIMD register accesses, math operations, and cache accesses behave much as they do on a GPU.
It is true that a ProgPoW ASIC's ISA could be tailored precisely to the algorithm, for example by removing floating point support and adding explicit merge() and similar operations. However, this specialization yields only marginal gains, not orders-of-magnitude gains.
Optimistically, assume that a well-designed ProgPoW ASIC ISA can remove 1/4 of the compute core's power consumption. Since the GPU cores are much more active when running ProgPoW, we estimate that the memory interface consumes about 1/3 of the GPU's power. The relative power consumption of a ProgPoW ASIC using GDDR is then:
1/3 (memory) * 1 + 2/3 (compute) * 3/4 = 5/6
1.2x advantage
If HBM is used, the relative power consumption of a ProgPoW ASIC is:
1/3 (memory) * 1/2 + 2/3 (compute) * 3/4 = 2/3
1.5x advantage
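Both estimates follow the same simple formula. The small sketch below just reproduces the FAQ's own arithmetic, with the fractions as the authors' stated assumptions:

#include <stdio.h>

/* Relative power of a hypothetical ASIC versus a GPU, using the FAQ's model:
   mem_frac of GPU power goes to memory and the rest to compute; the ASIC
   scales memory power by mem_scale (1.0 for GDDR, 0.5 for HBM) and compute
   power by compute_scale (3/4 for a tailored ISA). */
static double relative_power(double mem_frac, double mem_scale, double compute_scale)
{
    return mem_frac * mem_scale + (1.0 - mem_frac) * compute_scale;
}

int main(void)
{
    double gddr = relative_power(1.0 / 3.0, 1.0, 0.75);  /* = 5/6 */
    double hbm  = relative_power(1.0 / 3.0, 0.5, 0.75);  /* = 2/3 */
    printf("GDDR ASIC: %.3f of GPU power -> %.2fx advantage\n", gddr, 1.0 / gddr);
    printf("HBM  ASIC: %.3f of GPU power -> %.2fx advantage\n", hbm, 1.0 / hbm);
    return 0;
}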
Q: Can ProgPoW be run on FPGA?
A: First, there are practical problems with running ProgPoW on FPGAs. Because the random program changes every 12.5 minutes, a new bitstream would need to be compiled and loaded frequently, and the tools and infrastructure to do this largely do not exist.
Even ignoring that issue, ProgPoW does not map well to FPGAs. FPGAs work well for compute-intensive algorithms such as Keccak or Lyra, where packing multiple operations into a single clock cycle and running them in parallel greatly improves performance and reduces power.
ProgPoW's main loop interleaves many sequential cache reads, which sharply limits how many operations can be packed into a single cycle or run in parallel. On an FPGA this both reduces the achievable performance and lengthens the pipeline. A longer pipeline is itself a problem because of the large mix state (16 lanes * 32 regs * 4 bytes = 2 kilobytes).
Replicating that mix state at every pipeline stage would waste a great deal of power. The state could instead be kept in a register file, which would make the FPGA's compute core look much like that of an ASIC or GPU, but then the FPGA's efficiency would fall well below that of an ASIC.
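For reference, the 2KB mix state discussed above is, in C terms, nothing more than the following structure (lane and register counts taken from the figures in this FAQ):

#include <stdint.h>

#define PROGPOW_LANES 16
#define PROGPOW_REGS  32

/* 16 lanes x 32 registers x 4 bytes = 2048 bytes of mix state per hash.
   An FPGA pipeline would have to carry (or replicate) all of this between
   stages, which is where the wasted power comes from. */
typedef struct {
    uint32_t mix[PROGPOW_LANES][PROGPOW_REGS];
} progpow_mix_state;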
Q: The answers above are quite lengthy. Can you give a brief summary?
A: Of course.
Relative Efficiency of Mining Hardware
Original link:
https://medium.com/@ifdefelse/progpow-faq-6d2dce8b5c8b
Original author: IfDefElse. Translation and proofreading: There is a fish.
This article was translated and edited by Mining Vision. If you need to reprint, please indicate the source.
