
In December 2018, a report released by the Blockchain Transparency Institute (BTI) pointed out that the reported total trading volume of the top 25 exchanges on CoinMarketCap was 2.5 billion US dollars, while the actual trading volume was only 324 million US dollars, 87% less than the figure published on CoinMarketCap. Coinbase, the leading exchange in the United States, did not even make CoinMarketCap's top 25. These "hidden rules" of inflated exchange volume have also raised public doubts about exchange data.
How can this information asymmetry be resolved, so that the real asset status of an exchange can be understood?
Exchange Address Classification
To introduce the principle of exchange address mining, we must first understand how exchange addresses are classified. Generally speaking, the addresses of an exchange fall into three categories: ordinary deposit addresses, hot wallets and cold wallets. The way funds circulate among them is roughly as shown in the figure.

(1) Ordinary deposit address
This type of address accounts for more than 99.9% of an exchange's addresses; each one is the deposit address of an individual user. Users deposit Bitcoin held outside the exchange into the exchange through these addresses. The funds in these addresses then go in one of two directions: out of the exchange (someone withdraws coins), or into a hot wallet.
(2) Hot wallet
The hot wallet is a network-connected wallet belonging to the exchange. Its main function is to handle the flow of funds between ordinary deposit addresses and cold wallets, as well as user withdrawals. In other words, ordinary deposit addresses and cold wallets do not exchange funds directly, and user withdrawals are paid out from hot wallets.
Compared with ordinary deposit addresses, hot wallets are very few, roughly between 10 and 30, but their number of transactions is extremely large (here and below, the number of transactions of an address means the number of all transactions in which the address appears as an input address or an output address). It differs by an order of magnitude from the number of transactions of ordinary deposit addresses and cold wallets. This feature distinguishes hot wallets clearly from other addresses.
(3) Cold wallet
Transaction Structure Introduction
Bitcoin transactions use the UTXO (Unspent Transaction Output) model. Each UTXO belongs to an address, an address can hold multiple UTXOs, and each UTXO is indivisible. In a transaction, the initiator uses their own UTXOs as the transaction inputs and constructs new UTXOs as the transaction outputs. The initiator unlocks and spends their own UTXOs with the private key, and locks each newly constructed UTXO to the receiving address using that address's public key. Each UTXO is removed from the UTXO set once it has been used as a transaction input. Except for special coinbase transactions, an ordinary transaction contains one or more inputs and one or more outputs.
In order to facilitate understanding, we use an actual transaction as an example to explain:

This is an ordinary transaction. On the left there is one input address, 1B3AHCVxKkRern499D5DXQdZ6R3qH6asY6 (hereinafter referred to as 1B), and on the right there are two output addresses, 19TAUBkne9x3CrPVYDUtwCNuEDsZrY1ddu (hereinafter referred to as 19) and 35hK24tcLEWcgNA4JxpvbkNkoAcDGqQPsP (hereinafter referred to as 35).
Assume that the private key of address 1B belongs to a user U1 and the private key of address 35 belongs to another user U2. When U1 wants to transfer 0.005 BTC to U2, U1 uses a UTXO worth 1 BTC at address 1B as the transaction input and locks a UTXO worth 0.005 BTC to address 35 using that address's public key. U2 now has an additional UTXO worth 0.005 BTC, so U2's balance has increased by 0.005 BTC. The UTXO used as the input has been spent and no longer belongs to the UTXO set, so U1's balance has decreased by 1 BTC.
Such a transaction is not yet complete: the input amount is 1 BTC and the output amount is only 0.005 BTC, so where does the remaining 0.995 BTC go? If no transaction fee were required, 0.995 BTC would be returned as change to an address belonging to U1. In practice, U1 must also pay a transaction fee, so, as shown in the figure, a UTXO worth 0.9949853 BTC is locked to address 19. Address 19 is a change address, so we know that it also belongs to user U1. The difference, 1 - 0.005 - 0.9949853 = 0.0000147 BTC, is the transaction fee.
This is a complete transfer-and-change transaction. The address on the left side of the figure is the input address, and the amount after it is the amount of the UTXO belonging to that address that is spent in this transaction. The addresses on the right side of the figure are the output addresses, and the amount after each one is the amount of the newly generated UTXO locked to it.
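The arithmetic of this example can be checked with a short sketch. The function names `to_satoshi` and `transaction_fee` are illustrative, not part of any Bitcoin library; amounts are converted to satoshi (1 BTC = 100,000,000 satoshi) so the subtraction is exact.

```python
from decimal import Decimal

SATOSHI_PER_BTC = 100_000_000


def to_satoshi(btc: str) -> int:
    """Convert a BTC amount given as a decimal string into integer satoshi."""
    return int(Decimal(btc) * SATOSHI_PER_BTC)


def transaction_fee(input_amounts, output_amounts) -> int:
    """Fee in satoshi: whatever the inputs carry that the outputs do not claim."""
    total_in = sum(to_satoshi(a) for a in input_amounts)
    total_out = sum(to_satoshi(a) for a in output_amounts)
    return total_in - total_out


# The example above: one 1 BTC input, a 0.005 BTC payment and 0.9949853 BTC of change.
fee = transaction_fee(["1"], ["0.005", "0.9949853"])
print(fee)  # 1470 satoshi, i.e. 0.0000147 BTC
```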
Principles of Address Mining Technology
To understand the principle of exchange address mining, it is also necessary to understand the technical principles of Bitcoin address mining, which includes vertical mining, forward mining and backward mining. Vertical mining starts from a mined address that appears as an input address of a transaction and mines the other addresses that are also inputs of that transaction. Forward mining mines characteristic addresses on the output side when the mined address is an input address of the transaction. Backward mining mines characteristic addresses on the input side when the mined address is an output address of the transaction.
(1) Vertical mining
Definition: centering on the mined address, mine the other addresses that are inputs of the same transaction. According to the characteristics of Bitcoin transactions in Section 2, multiple addresses appearing on the input side of one transaction usually belong to the same subject. Therefore, if the mined address appears on the input side of a transaction, the other addresses that appear on the input side together with it can be considered to belong to the same subject. The detailed mathematical principles of vertical mining can be found in reference [1].
For example: txid=25836a89ee24ce0b3ca7c62a525139fa59aebce0ffd222474b484bb73802c76f

The address in the red box is the mined address. The other addresses in the yellow box are regarded as belonging to the same owner, because they are inputs of the same transaction as the mined address.
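A single step of vertical mining can be sketched as follows. This is only an illustration of the rule described above: the function name `vertical_mine`, the representation of a transaction as a dictionary with an `"inputs"` list, and the placeholder addresses `A`, `B`, `C`, `X`, `Y` are all assumptions made for the example.

```python
def vertical_mine(known, transactions):
    """Return addresses that share the input side of a transaction with a known address.

    known:        set of addresses already attributed to the subject
    transactions: iterable of dicts, each with an "inputs" list of addresses
    """
    found = set()
    for tx in transactions:
        input_addrs = set(tx["inputs"])
        if input_addrs & known:              # a known address is among the inputs
            found |= input_addrs - known     # so its co-inputs share the same owner
    return found


# Placeholder data: A is the mined address, B and C spend together with it.
txs = [
    {"inputs": ["A", "B", "C"]},
    {"inputs": ["X", "Y"]},                  # unrelated transaction, ignored
]
print(vertical_mine({"A"}, txs))
```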
(2) Forward mining
Definition: the mined address must appear on the input side, the number of addresses on the input side must not be two, and the number of addresses on the output side must be two. If the bitcoin value received by one of the output addresses has more than 4 decimal places, then that output address belongs to the same subject as the input party.
For example: txid=20c0430466a876e84d75a8319cfe9dcf9a36b2f8773c7bbfb14489919bbb29c0

The address in the red box is the mined address. It appears on the input side, and the number of addresses on the input side is not 2. The number of addresses on the output side is 2, and the bitcoin value of one output address has more than 4 decimal places, so the conditions of forward mining are met: the address in the yellow box and the address in the red box belong to the same subject.
Forward mining is easy to understand. It is in fact an ordinary transfer-and-change transaction, and the output address whose amount has more than 4 decimal places is the change address. Since there is only one input, the transaction fee must be borne by the input party; the fee is usually small, so the change left after deducting it usually has more than 4 decimal places. The change address and the input address belong to the same subject, so the characteristic address can be mined.
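The forward mining conditions can be sketched as below, applied to the transfer-and-change transaction from the transaction structure section. The names `decimal_places` and `forward_mine` and the `(address, amount)` tuple layout are assumptions for illustration. The sketch follows the definition literally, and additionally assumes that a result is reported only when exactly one output looks like change; the text does not say what to do when both outputs qualify.

```python
from decimal import Decimal


def decimal_places(amount: str) -> int:
    """Number of significant decimal places in a BTC amount string."""
    exponent = Decimal(amount).normalize().as_tuple().exponent
    return max(0, -exponent)


def forward_mine(known, tx):
    """Return the likely change address of tx, as a set (empty if the rule does not apply)."""
    input_addrs = {addr for addr, _ in tx["inputs"]}
    output_addrs = {addr for addr, _ in tx["outputs"]}
    if not (input_addrs & known):
        return set()                          # mined address must be an input
    if len(input_addrs) == 2 or len(output_addrs) != 2:
        return set()                          # address-count conditions from the definition
    change = {addr for addr, amount in tx["outputs"] if decimal_places(amount) > 4}
    # Assumption: only an unambiguous result (exactly one candidate) is reported.
    return change if len(change) == 1 else set()


tx = {
    "inputs": [("1B3AHCVxKkRern499D5DXQdZ6R3qH6asY6", "1")],
    "outputs": [
        ("35hK24tcLEWcgNA4JxpvbkNkoAcDGqQPsP", "0.005"),
        ("19TAUBkne9x3CrPVYDUtwCNuEDsZrY1ddu", "0.9949853"),
    ],
}
print(forward_mine({"1B3AHCVxKkRern499D5DXQdZ6R3qH6asY6"}, tx))
```

Amounts are passed as strings so that trailing digits are not lost to floating-point rounding before the decimal places are counted.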
(3) Backward mining
Definition: the mined address must appear on the output side, the number of addresses on the input side must be one, and the number of addresses on the output side must be two. In addition, the mined address on the output side must be a hot wallet address (the method for mining hot wallets is given later), and the amount sent to the hot wallet address in this transaction must be greater than 100 BTC. The three addresses of a transaction that meets these conditions belong to the same subject, and the address on the input side may be a cold wallet address.
For example: txid=ade2be579a0c58d38a6a812ce85ed96980313c3aca59d762a1779233bd64ede4

The address in the red box is the mined address. It is a hot wallet address and appears on the output side. The number of addresses on the input side is 1 (the multiple inputs are different UTXOs of the same address), the number of addresses on the output side is 2, and the amount transferred to the address in the red box is greater than 100 BTC. Through backward mining, we know that the two addresses in the yellow box and the address in the red box belong to the same subject, and that the address on the input side may be a cold wallet address.
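A minimal sketch of the backward mining check follows. The function name `backward_mine`, the returned dictionary, and the placeholder addresses `COLD`, `HOT`, `OTHER` with their amounts are hypothetical; only the conditions (one input address, two output addresses, more than 100 BTC sent to a hot wallet) come from the definition above.

```python
from decimal import Decimal


def backward_mine(hot_wallets, tx, min_btc="100"):
    """Apply the backward mining rule to one transaction.

    Returns a dict with the addresses attributed to the same subject and the
    cold wallet candidate, or None if the transaction does not qualify.
    """
    input_addrs = {addr for addr, _ in tx["inputs"]}
    output_addrs = {addr for addr, _ in tx["outputs"]}
    if len(input_addrs) != 1 or len(output_addrs) != 2:
        return None
    large_to_hot = any(
        addr in hot_wallets and Decimal(amount) > Decimal(min_btc)
        for addr, amount in tx["outputs"]
    )
    if not large_to_hot:
        return None
    (cold_candidate,) = input_addrs          # the single input address
    return {
        "same_owner": input_addrs | output_addrs,
        "cold_candidate": cold_candidate,
    }


# Placeholder transaction: two UTXOs of the same address fund a large transfer.
tx = {
    "inputs": [("COLD", "150"), ("COLD", "60")],
    "outputs": [("HOT", "200"), ("OTHER", "9.9999")],
}
result = backward_mine({"HOT"}, tx)
print(result["cold_candidate"])  # COLD
```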
Mining Process
With this foundation in place, we can now introduce the address mining process for a Bitcoin exchange. It can be roughly divided into three steps:
Mine all ordinary deposit addresses and hot wallets of the exchange
Filter out the hot wallets
Mine the cold wallets
(1) Mine all ordinary deposit addresses and hot wallets of the exchange
A deposit address of the exchange is easy to obtain, and we use it as the starting point for address mining. Vertical mining from this address yields a sample library; vertical mining is then run again from every address in the sample library, which expands the library further. This process is repeated until the sample library no longer grows. At that point, the sample library can basically be considered to contain all ordinary deposit addresses and hot wallets of the exchange. (It is not certain whether the cold wallets are in this library, but that does not matter, because cold wallets are mined with a separate method.)
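The repeat-until-stable expansion can be sketched as a simple fixed-point loop. The name `expand_sample_library`, the representation of each transaction as a list of its input addresses, and the placeholder addresses are assumptions for illustration; a production implementation over the full blockchain would more likely use an address index or a union-find structure.

```python
def expand_sample_library(seeds, transactions):
    """Repeat vertical mining until the sample library stops growing.

    seeds:        initial known addresses (e.g. one deposit address)
    transactions: list of input-address lists, one per transaction
    """
    library = set(seeds)
    changed = True
    while changed:
        changed = False
        for inputs in transactions:
            inputs = set(inputs)
            # A transaction that mixes known and unknown inputs extends the library.
            if inputs & library and not inputs <= library:
                library |= inputs
                changed = True
    return library


# Placeholder data: HOT is only reachable through D2, which is reachable through D1.
txs = [["D2", "HOT"], ["D1", "D2"], ["X", "Y"]]
print(sorted(expand_sample_library({"D1"}, txs)))  # ['D1', 'D2', 'HOT']
```

In this example the first pass only picks up `D2`; `HOT` is found on the second pass, which is why a single round of vertical mining is not enough.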
(2) Filter out the hot wallets
The sample library contains all ordinary deposit addresses and hot wallets of the exchange (and possibly cold wallets). Because hot wallets have an extremely large number of transactions, they can be filtered out of the sample library by finding the region of maximum transaction counts.
We process the addresses mined from the Huobi exchange and their transaction counts as follows: take the index of each address as the horizontal axis and the number of transactions of the address as the vertical axis, and plot the distribution of transaction counts over addresses, as shown in the figure:

It can be clearly seen from the figure that the addresses marked in orange are hot wallets, because their number of transactions is much larger than that of other addresses (the transaction counts of the other addresses are very small, and some of them cannot even be seen in the figure).
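The visual separation in the plot can be approximated programmatically. The sketch below is one possible heuristic, not the method used to produce the figure: it flags addresses whose transaction count is at least `ratio` times the median count of the sample library. The function name `find_hot_wallets`, the median baseline, the default `ratio=10`, and the sample counts are all assumptions.

```python
from statistics import median


def find_hot_wallets(tx_counts, ratio=10):
    """Flag addresses whose transaction count dwarfs the typical count.

    tx_counts: dict mapping address -> number of transactions
    ratio:     assumed multiple of the median that marks an outlier
    """
    if not tx_counts:
        return set()
    baseline = max(median(tx_counts.values()), 1)
    return {addr for addr, count in tx_counts.items() if count >= ratio * baseline}


# Placeholder counts: deposit addresses see a handful of transactions each.
counts = {"d1": 2, "d2": 3, "d3": 1, "d4": 2, "hot": 50000}
print(find_hot_wallets(counts))
```

Because deposit addresses make up the overwhelming majority of the library, the median reflects a typical deposit address, and a hot wallet stands out regardless of the exact ratio chosen.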
(3) Mine the cold wallets
The starting point for cold wallet mining is the hot wallet. First, use backward mining: if an address, as the only input of a transaction, has sent more than 100 BTC to the hot wallet, that input address may be a cold wallet address. Next, filter by another feature of cold wallets: the address has few transactions (generally fewer than 1000) and a large total amount received (generally more than 10000 BTC). This second filter yields cold wallets that are in use, as well as cold wallets that were used in the past (balance < 10 BTC). Finally, applying the forward mining principle together with the above constraints, more cold wallets can be mined.
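The second filtering step can be sketched as follows, using the thresholds quoted above (fewer than 1000 transactions, more than 10000 BTC received, balance below 10 BTC for a retired wallet). The `AddressStats` record, the function name `classify_cold_candidate`, and the sample figures are hypothetical; in practice the statistics would come from a blockchain index.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class AddressStats:
    address: str
    tx_count: int             # number of transactions involving the address
    total_received: Decimal   # total BTC ever received
    balance: Decimal          # current BTC balance


def classify_cold_candidate(stats: AddressStats) -> Optional[str]:
    """Apply the cold wallet filter to a candidate found by backward mining."""
    if stats.tx_count >= 1000 or stats.total_received <= 10000:
        return None                           # does not look like a cold wallet
    if stats.balance < 10:
        return "retired cold wallet"          # large history, nearly empty now
    return "cold wallet"


# Placeholder candidates with made-up statistics.
active = AddressStats("C1", 120, Decimal("250000"), Decimal("48000"))
retired = AddressStats("C2", 300, Decimal("90000"), Decimal("0.5"))
busy = AddressStats("C3", 5000, Decimal("90000"), Decimal("20000"))
print(classify_cold_candidate(active), classify_cold_candidate(retired))
```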
Example of Address Mining
Having covered the mining process, we can deepen our understanding of exchange address mining through a simple example. Starting from an ordinary deposit address of Huobi, this example finds one hot wallet and one cold wallet of Huobi. The specific process is as follows:
(1) Find a hot wallet through the ordinary deposit address
First, we obtained an ordinary deposit address of Huobi, 12V9PLbaaewZmwFogen1bighovFZvMW138. We then found the transaction 087e0449d86858ba15d4549235240e900c198bd030e2eb26a6418525135dbe4b.

According to the principle of vertical mining, the addresses on the left side of this transaction can all be considered addresses of the Huobi exchange. Among these addresses, we noticed that the address in the yellow box has a far higher number of transactions than the others, as shown in the figure.

From this we can judge that the address 1LAnF8h3qMGx3TSwNUHVneBZUEpwE4gu3D is a hot wallet of Huobi Exchange.
(2) Find a cold wallet through the hot wallet
Continue to dig on the basis of the hot wallet 1LAnF8h3qMGx3TSwNUHVneBZUEpwE4gu3D and find the transaction ade2be579a0c58d38a6a812ce85ed96980313c3aca59d762a1779233bd64ede4

References
[1] Ermilov D, Panov M, Yanovich Y. Automatic Bitcoin address clustering[C]//2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2017: 461-466.
