First UTXO Commitment on Testnet
If you are scrutinizing every block on the Bitcoin Cash Testnet, you may have just noted an oddity. In block 1237565, the coinbase transaction contains an additional output:
This metadata is a UTXO Commitment as it is currently being developed across Bitcoin Cash implementations. Let us see how it works.
Fast Syncing Nodes
Currently new full nodes download the entire historical blockchain. Not only does this cause an annoying delay in the initial setup, but this also results in a disproportionate burden on existing nodes that spend a large portion of their bandwidth providing these old blocks.
Why do full nodes need this? They don’t actually validate these blocks. Most implementations contain a parameter called "assumevalid" which contains a hardcoded default:
This isn’t as trusting as it sounds. When you would validate these blocks, you would validate them according to the rules as written in the software. All that the assumevalid hardcoded value claims, is that the these blocks follow the rules as programmed in that very same software. It’s merely a “precompiled validation” that does not increase trust or decrease security.
The only reason that these full nodes need the entire history is to initiate their bookkeeping: To validate incoming transactions and blocks they must know which outputs of which transactions are currently unspent. They must create an Unspent Transaction Output set, or UTXO set. Generating this set requires them to browse through all blocks gathering all outputs, and crossing off outputs that are spent by inputs.
It would be a great improvement if full nodes would be able to just download the UTXO set (~2gb) instead of the entire history (~140gb).
A UTXO Commitment: First attempt
A commitment of a dataset is a value that is deterministically calculated for that dataset.
A good example is how transactions are committed to blocks. When we colloquially say that a transaction is in a block, we actually mean that the transaction is committed to the transaction commitment ("merkle root") of the block header. This means the commitment can only be constructed using that transaction. This allows nodes to download the transactions and verify whether with them, the commitment can be constructed and thus whether the transactions are “in” the block.
We can do the same with the UTXO set. If we create a commitment of the entire UTXO set somewhere in the block (in a coinbase output), new full nodes can download the UTXO set and verify it against this commitment.
A simple way to construct the commitment is to order all UTXOs by some key and then hash them all together:
Such a commitment (24d0…) would work for syncing full nodes. It is uniquely determined by the set, so new full nodes can download the UTXO set and verify it against this commitment.
It has a major flaw though: Nodes will need to completely reconstruct it from the entire UTXO set every block. This would take way to much time for it to be feasible.
Making it iterative
We need to construct a commitment that is iterative: implementations should be able to update a commitment from the transactions in a block, without completely reconstructing it.
A simple change to make this possible, would be to hash each UTXO separately, treat these hashes as numbers and simply add them all together:
Now new syncing nodes can verify this the same way as before, but the same commitment can also be updated per block. The implementation can simply subtract the hashes of outputs that are spent, and add the new outputs and the resulting commitment will be the same as when it is constructed from the current set.
This also has a flaw however: It isn’t unique. It is possible for an attacker to construct a UTXO set that will result in the same commitment. Although the commitment is too large to try every combination, the attacker can abuse the fact that we’re using simple addition to apply clever algorithms to find such set easily.
Making it secure
Luckily to make it iterative doesn’t require us to use addition. We can also use something that behaves like addition. As long as we have an operation that is commutative and has some way to apply it in reverse it will work. We can use any group, and a good candidate for a group with better security properties than addition would be the secp256k1 elliptic curve and its group operation.
Instead of adding the hashes together, we interpret each hash as x and then find y such that y² = x³ + 7. Then (x,y) is a point on the curve, and we can combine these points using the Elliptic Curve group operation “⊕”. (Exceptions aside, A ⊕ B = C means finding C such that there exists a linear equation that matches A,B and C).
Such a construction is called an Elliptic Curve Multiset Hash, or ECMH.
There is one caveat to this approach: For many values of x, the curve equation has no solution. Only for approximately half of them.
To solve this, we will simply add a number (a "nonce") to our hash, hash it again, and increment the number as long as the algorithm fails:
Now we have a secure commitment that can be updated iteratively per block and that can be used for new full nodes to check the UTXO set they receive. This is the commitment as it is found in the block on Testnet.
There is still some work to do before we have fast syncing nodes.
We can distinguish 4 steps:
- Maintain and include an informational UTXO commitment in the coinbase.
- Implement utxo/getutxo P2P messages to allow transfer of UTXO sets
- Verify the UTXO commitment as part of the block validation rules.
- Implement a fast-syncing bootstrap method.
What we are witnessing in block 1237565, is an initial test of the code for step 1 which is currently being reviewed and discussed by various implementations.
This UTXO commitment would greatly help with full nodes initial syncing. It does not however enable UTXO inclusion or exclusion proofs. Although you can verify the entire set against the commitment, you cannot verify individual UTXOs this way.
UTXO inclusion and exclusion proofs may be interesting to allow a new type of wallet that doesn’t rely on tracking back blocks to find transactions when it has been offline, but instead can just sync up current balances.
Future version may make this possible. For instance, it may be possible to divide the UTXO set in buckets where each bucket maintains its own ECMH and the buckets are hashed in a tree.
The first version of the commitment is focused on the primary goal: to make the initial sync of full nodes blazingly fast and to reduce the bandwidth burden of providing old blocks to new nodes.