Abstract: This article describes the design of a new opcode for the Bitcoin Cash scripting language called “OP_CHECKDATASIG” [1]. Originally designed to allow Script to import and validate arbitrary messages from outside the blockchain, the evolved design also opens up exciting possibilities for cross-chain atomic contracts.

Background

When someone sends a Bitcoin Cash transaction, they sign it to prove to the network that the owner of the private key authorizes the transaction. The signature is checked by an opcode called “OP_CHECKSIG”, which calculates a hash based on a portion of the data in the transaction and checks the signature against that hash. The signature is equivalent to a contract that says the owner of the private key authorizes the transfer. OP_CHECKSIG has a few different ways it can hash the transaction (known as the “Sighash”), to allow different conditions on the transfer. In general, however, it is equivalent to a contract that only defines the transfer of money. But what if you could also sign other pieces of information in the transaction? This would allow other information to be included and signed as part of the transfer contract. This is the idea behind OP_CHECKDATASIG.
One way to think of OP_CHECKDATASIG is as an un-bundling of OP_CHECKSIG. The design of OP_CHECKSIG bundles two distinct concepts together: calculating the Sighash, and checking the signature of that hash. If we imagine re-implementing OP_CHECKSIG as two instructions, OP_CHECKDATASIG would be the second instruction. It just checks a signature against a supplied message and public key. Because of this unbundling, OP_CHECKDATASIG can be used more flexibly to verify signatures for any message from outside the blockchain.
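To make the unbundling concrete, here is a minimal Python sketch. It is an illustration only, not the consensus code: the function names checksig_like and checkdatasig_like, the toy private key, the fixed nonce, and the example messages are all assumptions made for the demo, and the tiny ECDSA routines are a textbook implementation over secp256k1 with no DER parsing, low-s rule, or other consensus checks. The single-SHA256 hashing of the message follows the final design described later in this article.

```python
import hashlib

# secp256k1 domain parameters (the curve used for Bitcoin Cash signatures).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        m = 3 * a[0] * a[0] * pow(2 * a[1] % P, -1, P) % P
    else:
        m = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P
    x = (m * m - a[0] - b[0]) % P
    return (x, (m * (a[0] - x) - a[1]) % P)

def mul(k, pt):
    """Double-and-add scalar multiplication."""
    out = None
    while k:
        if k & 1:
            out = add(out, pt)
        pt = add(pt, pt)
        k >>= 1
    return out

def ecdsa_sign(z, priv, k):
    """Textbook ECDSA signing of digest z. The nonce k is fixed for the demo only."""
    r = mul(k, G)[0] % N
    s = pow(k, -1, N) * (z + r * priv) % N
    return (r, s)

def ecdsa_verify(z, sig, pub):
    """Textbook ECDSA verification of digest z against signature (r, s)."""
    r, s = sig
    if not (0 < r < N and 0 < s < N):
        return False
    w = pow(s, -1, N)
    pt = add(mul(z * w % N, G), mul(r * w % N, pub))
    return pt is not None and pt[0] % N == r

def sha256(data):
    return hashlib.sha256(data).digest()

def to_int(digest):
    return int.from_bytes(digest, "big")

def checksig_like(sig, pub, tx_preimage):
    """Bundled: step 1 derives the sighash from transaction data, step 2 verifies."""
    sighash = sha256(sha256(tx_preimage))
    return ecdsa_verify(to_int(sighash), sig, pub)

def checkdatasig_like(sig, msg, pub):
    """Unbundled: only the verification step, on a caller-supplied message."""
    return ecdsa_verify(to_int(sha256(msg)), sig, pub)

priv = 0x1234567890ABCDEF                  # toy private key (assumption)
pub = mul(priv, G)
msg = b"hypothetical oracle message"       # data from outside the blockchain
sig = ecdsa_sign(to_int(sha256(msg)), priv, k=0xC0FFEE)

tx_preimage = b"hypothetical serialized transaction data"
tx_sig = ecdsa_sign(to_int(sha256(sha256(tx_preimage))), priv, k=0xBADF00D)

print(checkdatasig_like(sig, msg, pub))
print(checkdatasig_like(sig, b"tampered message", pub))
print(checksig_like(tx_sig, pub, tx_preimage))
```

The point of the sketch is the shape of the last two functions: both end in the same verification call, and the only difference is where the digest comes from.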

The Story

The motivation behind OP_CHECKDATASIG is to continue the process of improving the Bitcoin Cash Script language. As such, the instruction is intended to be a generic and flexible building block that can find many uses.
OP_CHECKDATASIG is based on Andrew Stone’s OP_DATASIGVERIFY proposal [2]. The original motivation behind the opcode is to be able to validate signatures from an “oracle” on messages from outside the blockchain. This use case has been described in an article by Andrew Stone [3]. This use opens up a vast array of possibilities, since it allows scripts to operate on messages from the outside world.
After the original proposal was made, it went through a series of changes and design refinements based on review and discussion by stakeholders and subject-matter experts [4].
Overall, the theme of the changes was to make the design mirror existing opcodes more closely. This makes it a more conservative and minimal change than the original proposal. Though the design choices of the original proposal may have had certain advantages, in general the reviewers preferred an approach of avoiding implementation complications and sticking close to the design of what is already in the protocol. This helps lower risk by creating minimal change in the implementation. All the little quirks of OP_CHECKSIG are well understood, having been battle-tested on the blockchain for many years. Sticking to the same underlying primitives keeps the design in well understood and safe territory. As a result, the implementation details of OP_CHECKDATASIG are very close to OP_CHECKSIG [5].
An interesting thing happened on this journey, however. Though the driving motivation of the design changes was conservatism and safety, by making the implementation mirror OP_CHECKSIG, some novel potential use-cases were discovered.
The original OP_DATASIGVERIFY proposal took a message of any size as input, then hashed it before checking the signature. Through the review process, following the idea of structuring OP_CHECKDATASIG as the second step in OP_CHECKSIG, it seemed like a nice design to make the opcode take a hash value as input. This also fit with the philosophy of keeping the opcode as a simple minimal building block.
Upon further review, it was realized that passing in a value to be signed without hashing it is potentially insecure. This security hole was identified by Andrew Stone on June 16th. It would not be a problem if the opcode were used properly, but it made it possible for Script authors to use it insecurely if they were careless. So the design was changed back to hashing the input with double-SHA256, which is the standard hash method in Bitcoin.
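The article does not spell out the attack, but one well-known property of ECDSA shows why an unhashed input is risky: if the verifier accepts a raw 32-byte value as the digest, anyone who knows only the public key can construct a value and a signature that verify together. The Python sketch below demonstrates this under stated assumptions. The forge function, the toy key, and the constants a and b are hypothetical, and the ECDSA routines are a textbook implementation rather than consensus code.

```python
import hashlib

# secp256k1 domain parameters (the curve used for Bitcoin Cash signatures).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        m = 3 * a[0] * a[0] * pow(2 * a[1] % P, -1, P) % P
    else:
        m = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P
    x = (m * m - a[0] - b[0]) % P
    return (x, (m * (a[0] - x) - a[1]) % P)

def mul(k, pt):
    """Double-and-add scalar multiplication."""
    out = None
    while k:
        if k & 1:
            out = add(out, pt)
        pt = add(pt, pt)
        k >>= 1
    return out

def ecdsa_verify(z, sig, pub):
    """Textbook ECDSA verification of digest z against signature (r, s)."""
    r, s = sig
    if not (0 < r < N and 0 < s < N):
        return False
    w = pow(s, -1, N)
    pt = add(mul(z * w % N, G), mul(r * w % N, pub))
    return pt is not None and pt[0] % N == r

def forge(pub, a, b):
    """Build a (digest, signature) pair that verifies, using only the public key."""
    R = add(mul(a, G), mul(b, pub))
    r = R[0] % N
    s = r * pow(b, -1, N) % N
    z = a * s % N          # the attacker does not choose z, it falls out of a and b
    return z, (r, s)

# The private key appears here only to create a public key for the demo;
# forge() never sees it.
pub = mul(0xDEADBEEF, G)
z, sig = forge(pub, a=12345, b=67890)

# Verifier that takes the stack value as the digest directly: forgery accepted.
print(ecdsa_verify(z, sig, pub))

# Verifier that hashes the stack value first: the same forgery is rejected,
# because the attacker would need a pre-image of z.
hashed = int.from_bytes(hashlib.sha256(z.to_bytes(32, "big")).digest(), "big")
print(ecdsa_verify(hashed, sig, pub))
```

The forged value is effectively random, so this only matters for a careless script that does not otherwise constrain the message, which matches the concern that the hole is harmless when the opcode is used properly.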
However, in the period where the design did not hash the input, people (awemany in particular) realized something interesting: the fact that the opcode did not hash the input meant that you could pass in a Sighash value from another transaction as the message, and the signatures would match. This means that it becomes possible for OP_CHECKDATASIG to test whether it has been supplied with the valid signature for a completely different transaction.
Changing the opcode to do a double-SHA256 hash on the input would still allow this sighash-as-message use; however, you would now have to pass in the pre-image of the hash, which is the serialized transaction data. This data would typically be hundreds of bytes, all of which would need to be included in the transaction, making this use unwieldy. Luckily, the reviewers found a solution that threaded the needle between both options, yielding both safety and convenience: do a single-SHA256 hash on the input.
Doing a single-SHA256 means the input is hashed, so it is secure, but it also means that a partially hashed sighash, one which has only gone through one round of SHA256, can be passed in (only 32 bytes on the stack). When the opcode hashes it with another round, the result is the full sighash.
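The hash arithmetic behind this trick can be checked in a few lines of Python. This is a sketch only: tx_preimage is a hypothetical placeholder for the serialized data of some other transaction, not a real sighash pre-image.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# Hypothetical stand-in for the serialized data of another transaction
# (512 bytes of placeholder content).
tx_preimage = bytes(range(256)) * 2

# What an OP_CHECKSIG-style signature on that transaction commits to.
sighash = sha256(sha256(tx_preimage))

# The "partially hashed" value: one round of SHA256, pushed as the message.
partial = sha256(tx_preimage)

# What a single-SHA256 OP_CHECKDATASIG computes before checking the signature.
digest_checked = sha256(partial)

print(len(tx_preimage), len(partial))
print(digest_checked == sighash)
```

Because the digest the opcode checks equals the other transaction's sighash, a signature made for that transaction verifies against the 32-byte partial hash, and the script never needs the full pre-image.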
It is also notable that PGP signatures use a single-SHA256. So moving to single-SHA256 makes the signature also potentially compatible with PGP, which also opens up potential uses.
These capabilities open up exciting possibilities. It means that OP_CHECKDATASIG can make spendability dependent on a completely separate, unrelated transaction being signed. This can work even for transactions on different blockchains, as long as the transaction signing algorithm is compatible with Bitcoin Cash. This group includes Bitcoin, Litecoin, Dash, and ZCash. I will leave it to people more creative than I to find novel ways to use this capability. There have already been several ideas floated, such as atomic digital goods purchase and double-spend prevention [6, 7]. I look forward to many new uses I haven’t thought of.

Conclusion

The addition of OP_CHECKDATASIG will add a useful new capability to the Bitcoin Cash scripting language: the ability to import and validate messages from outside the blockchain. By reaching out and engaging subject matter experts and different stakeholders, the initial proposal was modified to address concerns. Through this process, not only did it retain the core utility, but the capabilities were expanded into exciting new areas. The modifications made the new version a more conservative change that sticks closer to the existing system, and also a more flexible tool with novel capabilities.

Acknowledgements

My (Antony Zegers aka Mengerian) role in this process was to reach out to reviewers and coordinate technical discussion and feedback. The information in this article, and the design of the opcode, are a synthesis of the ideas and input of all the reviewers.
I would like to thank Andrew Stone for making the original proposal, Amaury Sechet for coding the first complete implementation in Bitcoin ABC, and all the reviewers, including Clemens Ley, Chris Pacia, Amaury Sechet, Andrew Stone, Mark B. Lunderberg, awemany, and others for their contributions.

References

[7] Use of OP_CHECKDATASIG for double-spend fraud assurance https://bitco.in/forum/threads/gold-collapsing-bitcoin-up.16/page-1213#post-75916

Appendix A: Design Details

OP_CHECKDATASIG makes some different design choices than Andrew Stone’s OP_DATASIGVERIFY proposal, resulting from the peer review and feedback. The changes largely consisted of making the opcode closer to the existing system, and minimizing potential risks. The design is intended to best balance requirements of functionality, minimizing the attack surface, and risk avoidance. The goal of this process of review and refinement was to converge on an implementation that has the best set of properties for the long term future of Bitcoin Cash.
This is a list of differences between OP_DATASIGVERIFY and OP_CHECKDATASIG:
  • Signature format same as OP_CHECKSIG, rather than pubkey-recoverable signature.
  • Takes pubkey as input, rather than the pubkeyhash.
  • No “type” field with the signature. All signatures are treated as strict DER encoded ECDSA.
  • Hashes the input with single-SHA256 rather than double-SHA256.
  • Includes non-verify version, which does not immediately mark transaction as invalid if it fails, simply returns “False”. Returns “True” if successful. For verify behavior, use OP_CHECKDATASIGVERIFY.
  • Does not leave input message on the stack, all three input values are removed.
  • Order of inputs is different.
The following sections will expand on some of the reasoning for the changes.

Signature format

The original design of OP_DATASIGVERIFY used a pubkey-recoverable signature format similar to what is used in the signmessage/verifymessage RPC. All of the reviewers suggested making the signature format mirror the existing OP_CHECKSIG implementation.
The reason for sticking close to the OP_CHECKSIG format is largely to lower risk, and keep the implementation manageable. Since OP_CHECKSIG has been part of consensus for a long time, its characteristics are well understood. This means that the specifics of the encoding can be treated exactly the same as what is already there. It is possible that Andrew Stone’s suggested signature format had advantages, but the reviewers felt it also introduced potential unknowns. Issues such as potential malleability, and sighash accounting, would have taken significant amounts of work and study to resolve, and even then would have some risk just because it is different from what is already there. Mirroring OP_CHECKSIG closely allowed all the quirks such as low-s, nullfail, and sighash counting to be done in exactly the same way, thus not introducing any unknowns.
This choice also means that the opcode has to take the public key as an input, rather than the public key hash.

No “type byte” field with the signature.

All signatures are treated as strict DER encoded ECDSA. At first glance, it may seem that to keep the signature format similar to OP_CHECKSIG, we may want to add a “type byte” at the end, in place of the sighash byte. The sighash byte, however, has nothing to do with the signature; it specifies how to process the transaction data to generate the hash that is to be checked against the signature. Since the message checked by OP_CHECKDATASIG comes from externally supplied data, it is unnecessary to have a flag specifying how the data is to be generated.
Any potential future migration to a new signature type such as Schnorr would have to accommodate the OP_CHECKSIG family of opcodes. Since they have no explicit provision for signature versioning, some method would have to be used that does not rely on a signature version byte. This implies that there is little benefit to including a type byte for OP_CHECKDATASIG.

Message hashed with single-SHA256

The original OP_DATASIGVERIFY proposal took a message of any size as input, then hashed it with double-SHA256 before checking the signature. Changing this to single-SHA256 is just as secure, and makes the opcode far more flexible by being compatible with Sighashes from other transactions, and other signature systems such as PGP.

Stack Handling

The reasoning for changing the order of inputs on the stack is that the message could be supplied either by the scriptSig or by the scriptPubKey, depending on the use-case. For example, in the future maybe it could be generated by an opcode in the scriptPubKey. Changing the order to [<signature>, <messageHash>, <pubKey>] makes it easy to accommodate both cases.
For similar reasons, it was decided to remove all inputs from the stack after execution, like all other opcodes do. It is easy to construct a script that leaves the message on the stack, as OP_DATASIGVERIFY did, using OP_OVER.
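The stack behavior can be sketched with a toy interpreter in Python. Everything here is an assumption made for illustration: the run function models only data pushes, OP_OVER, and OP_CHECKDATASIG, and toy_verify is a hash-based stand-in that is NOT real ECDSA (its "signature" can be computed by anyone). The second script is my reading of how OP_OVER can keep the message on the stack: the scriptSig pushes the message and then the signature, and the scriptPubKey runs OP_OVER before pushing the public key.

```python
import hashlib

def toy_verify(sig: bytes, digest: bytes, pubkey: bytes) -> bool:
    """Stand-in for signature checking. NOT real cryptography, demo only."""
    return sig == hashlib.sha256(pubkey + digest).digest()

def run(script, verify=toy_verify):
    """Execute a list of items: bytes are data pushes, strings are opcodes."""
    stack = []
    for item in script:
        if isinstance(item, bytes):
            stack.append(item)
        elif item == "OP_OVER":
            # Copy the second-to-top item onto the top of the stack.
            stack.append(stack[-2])
        elif item == "OP_CHECKDATASIG":
            # Input order is [<signature>, <message>, <pubKey>], pubKey on top.
            pubkey = stack.pop()
            message = stack.pop()
            sig = stack.pop()
            digest = hashlib.sha256(message).digest()  # single SHA256
            stack.append(verify(sig, digest, pubkey))  # all three inputs consumed
        else:
            raise ValueError(f"opcode not modeled in this sketch: {item}")
    return stack

pubkey = b"\x02" + b"\x11" * 32                 # placeholder bytes, not a real key
message = b"hypothetical oracle message"
sig = hashlib.sha256(pubkey + hashlib.sha256(message).digest()).digest()

# Plain use: the three inputs are removed and only the result remains.
print(run([sig, message, pubkey, "OP_CHECKDATASIG"]))

# Keeping the message: <message> <sig> OP_OVER <pubkey> OP_CHECKDATASIG
print(run([message, sig, "OP_OVER", pubkey, "OP_CHECKDATASIG"]))
```

In the second script the copy made by OP_OVER is the one the opcode consumes, so the original message stays below the result for later opcodes to use.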

Appendix B: Bitcoin ABC Implementation

Implementation of this feature in Bitcoin ABC consisted of 32 sets of changes, catalogued as follows:
Prepare for activation: D1563, D1564
Refactors and code improvements: D1565, D1569, D1573, D1574, D1576, D1575, D1578, D1589, D1595
Test additions and fixes: D1566, D1567, D1568, D1570, D1571, D1580, D1596, D1599, D1619, D1620
Separate signature and sighash-type-byte handling: D1572, D1577, D1579
Sigops counting: D1597, D1601, D1605
Implementation: D1621, D1646, D1653, D1666
Activation: D1625

Appendix C: Implementation Notes

Disabled vs. Reserved Op Code numbers: Other op codes that have been re-enabled on Bitcoin Cash were formerly disabled. When these op codes were disabled, their use was disallowed from all transaction Scripts.
OP_CHECKDATASIG and OP_CHECKDATASIGVERIFY, on the other hand, use op code numbers that were never previously defined and were considered “reserved”. These reserved op codes were treated differently than disabled op codes, and could appear in transaction scripts if they were in unexecuted IF branches. This has a few consequences:
  • The opcode numbers for OP_CHECKDATASIG and OP_CHECKDATASIGVERIFY appear many times in the blockchain in unexecuted IF branches.
  • Because of this, activation has to be handled differently than for the “re-enabled” opcodes (see https://reviews.bitcoinabc.org/D1563).

Handling Signature without SigHash byte: Some code refactoring was required to handle signatures without the Sighash byte. A nice side effect was to greatly increase the speed of the sigencoding tests (https://reviews.bitcoinabc.org/D1580).
 

Comments
Awesome, the cross-chain idea hadn't occurred to me yet. Great that new use cases are coming out!
One more thing -- from my very brief checks, it looks like if someone prepares a message using the 'signmessage' functionality, it is possible to import that result into OP_CHECKDATASIG. Some massaging of the signature would be required (convert to DER format) however it looks like all signmessage does to the message is a double-SHA256 hash, nothing weird. Therefore you can take advantage of the single-hash shortcut to check 'signmessage' results on giant messages, referring to those messages only by their 32-byte single-SHA256 hash.
Great article. Thanks for putting this together.
If 2 Opcodes weren’t already wasted on CLTV and CSV to achieve stuff that could be fine with nLockTime this would be a worthy experimental opcode. But as it stands, all your good hard work notwithstanding, the base use case of oracles is better served using pay to hash puzzles for economic and legal reasons (oracles should be paid and need not be telling the world what service they are fulfilling by publicly signing data). The side effect of cross chain swaps is interesting, but atomic swaps can also be done via hash puzzles. And any real application of such beyond developer toy examples would never want to announce to the world what pay-trigger your transaction was using. Doing this kind of business logic visibly on the Blockchain is opening yourself up to a lot of exploits and possible legal entanglements.
(You can mentally construct a contract that could be annulled by a legal court because it was determined that basing payment on say the death of a person was deemed an assassination contract, for instance)
That is why hash puzzles are the preferred way of doing atomic settlements and information triggers. They are anonymous, and look the same as any txn. Only the counter parties know that there even was a contract to begin with because the payments happen when a hash puzzle is matched. Nothing is to be said about how you obtained the solution. Which means nothing can be proved legally about it either. Except that you now unlock the payment.
I recognize a lot of good work was put into this proposal, but between this post and the original one by theZerg Andrew Stone, you both focus more on the technical implementation of the opcode and not enough on the actual business reason why this feature is needed. I believe that alone is enough to put scrutiny on its demand. I do understand that many developers want to help BCH and as a protocol developer, there is little one can contribute to the project if it wasn't for protocol enhancements. That enthusiasm and initiative should well be commended. But speaking from the business side of things, given that our supply of spare opcodes has already been wasted, it seems a bit too risky to add a new one if by doing so we jeopardize the validity of our digital money system that needs to last 100 years at least.
Hey @WallStreetTechnologist or anybody that knows, what is the reason why we cannot just add more spare opcodes?
There's something like 70 unallocated opcode numbers remaining within the 1-byte range, so I'm not too worried about running out of numbers anytime soon.
What happens if we creep into the 2-byte range?
I would appreciate credit for first identifying the security hole you state here:
"Upon further review, it was realized that passing in a value to be signed without hashing it is potentially insecure. It would not be a problem if used properly, but made it possible for Script authors to use it insecurely if they were careless."
If you have forgotten, please check my email review to you dated June 16th, second piece of feedback:
"2. The ECDSA signature algorithm requires that the message be hashed. So why hash it first explicitly in the script, and then hash it again during OP_CHECKDATASIG? And if you are considering not hashing in OP_CHECKDATASIG, you are inviting someone to write a script that does not hash message data that is <= 256 bits. This is a large security problem. You are creating a nasty hole in Bitcoin Script that people unfamiliar with ECDSA could stumble into."
Note: the original spec was not extremely clear about whether OP_CHECKDATASIG hashed the message, which is why my feedback is formulated as 2 possibilities.

Additionally, when you say "However, in the period where the design did not hash the input, people realized something interesting: the fact that the opcode did not hash the input meant that you could pass in a Sighash value from another transaction as the message, and the signatures would match. This means that it becomes possible for OP_CHECKDATASIG to test whether it has been supplied with the valid signature for a completely different transaction. "
"people" did not realize this. The Bitcoin Unlimited developer "awemany" did and he deserves credit for both coming up with this idea and for working hard on last minute changes to ensure that this use case remained possible.
Thanks,
Andrew
EDIT: @Mengerian, thanks! I felt it was particularly important in these instances so that people understood that BU was significantly participating in the transformation of DATASIGVERIFY to CHECKDATASIG
Andrew Stone: OK, I added in explicit credits to you and awemany.
You should realize though, the whole article is a synthesis of other people's ideas and work. It would just be unwieldy to write the whole thing referring to who thought of every different thing first. I don't even always know who thought of things; sometimes the first person I heard it from was not the first to actually say it. I don't try to claim credit, and I state this clearly in the acknowledgements section.
So.... for the rest of us who don't write codes. Is OP_checkdatasig safe or not? Does it have any security holes?
Go easy on the non-coder pls.
@8888
This opcode was implemented to be extremely similar to OP_CHECKSIG.
This means that in practice, the implementations will share the OP_CHECKSIG code for the vast majority of its functions. All the cryptography re-uses what is already there. The details of the signature format are the same, etc.
The way it reads in data from the stack and hashes it is exactly the same as the OP_SHA256 opcode.
The only "new" thing that it does is allow the output from the SHA256 to be checked against a supplied signature, like OP_CHECKSIG does with SigHash.
So all this to say: Yes, it should be very safe. Everything it does is the same as already existing opcodes, and it re-uses the existing code. This means that the "attack surface" for unforeseen issues or mistakes is extremely small.