When I started learning about bitcoin I was able to grasp the concepts behind the cryptography, mining, and how bitcoin operates as a currency after watching a few videos. It took me a few days to figure out how transactions worked. This is my attempt at simplifying it by mapping the mechanics back to the json in the blockbrowser www.webbtc.com coupled with illustrations to support what I’m babbling about.
If you are new to the concept of blockchain and/or bitcoin transactions this diagram from Bitcoin Bazaar is a good primer.
You may need to magnify your browser to see the detail in the images that I created in this document since I haven’t figured out how to make wordpress content take up the full width of the screen.
First I’ll introduce the schema by walking through example current (spending) and previous (source) transactions, then I’ll talk about how the two are linked. Finally I’ll talk about how bitcoin miners use bitcoin script to validate the transaction before it’s added to the blockchain.
- transaction schema in json
- relationships between current and previous
- verifying the transaction
Let’s start by looking at a random simple transaction and go through some important elements top to bottom. I chose f200c37aa171e9687452a2c78f2537f134c307087001745edacb58304053db20 because it happened to be near the top of the most recent block when I was running through this.
Webbtc.com adds some extra derived elements to the json that aren’t in the actual data structure that’s stored in the bitcoin blockchain, but they help us make sense of what is going on so I kept them in my diagrams. If you want to look at the actual data structure take a look at it in hex.
At the top we have a transaction hash, which is considered a unique identifier for the transaction in the blockchain. A blockchain is essentially a linked list with hash pointers instead of ordinary pointers. This gives it the properties that make it useful as a distributed ledger. Detail about the overall data structure is beyond the scope of this article, but you can read a reasonable explanation with a picture I like here.
Following that we have a bunch of meta-data including the version, number of inputs, number of outputs, as well as lock time.
Then we see an array of inputs and an array of outputs. Bitcoin is transaction based and not account based as an ordinary checking account might be. This makes it easier to decentralize and supports pseudonymity. You can think of the transfer of bitcoin in the same way as the transfer (signing over) or subdivision of real property. In order to transfer value you have to reference the output of a prior transaction that you hold the private key for and send it to a target public key, which bitcoin represents as a hash that is encoded as an address.
The input (in this case we only have one for simplicity) contains a reference (previous transaction hash) to the prior transaction as well as the index (n) of the output being spent in the output array of that prior transaction.
The input must also carry the sender’s signature as well as their public key. You will see later that this is the first time the public key is revealed, since the output contains only a public key hash.
We will visit the output in the discussion about the previous transaction since that is the source for this transaction. The outputs you see above are what the recipient of this transaction can use to fund a future transaction.
Following the outputs we see some meta-data provided by the blockbrowser. Block is the hash of the block that this transaction is recorded in. Block number is the sequence number (height) of that block in the blockchain. Time is the block timestamp.
The outputs of the previous transaction are used to fund our current transaction. The previous transaction contains two outputs. I only displayed the first output here since it is the one that is used by our current transaction.
First we see the value of the transferred bitcoin. It is useful to note that in order for this transaction to be valid the sum of the value of inputs needs to be greater than or equal to the sum of the value of outputs. Any difference is a spender-defined transaction fee paid to the miner for adding this transaction to the blockchain.
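The fee rule above can be sketched in a few lines. This is only an illustration: the function name and the amounts are made up by me, and I’m assuming values are whole numbers of satoshis.

```python
def transaction_fee(input_values, output_values):
    """Return the implied fee, or raise if the outputs overspend the inputs.

    Values are assumed to be integers in satoshis (hypothetical amounts below).
    """
    total_in = sum(input_values)
    total_out = sum(output_values)
    if total_out > total_in:
        raise ValueError("outputs exceed inputs: transaction is invalid")
    # Whatever is not assigned to an output is left for the miner.
    return total_in - total_out


# One input of 100,000 satoshis funding two outputs.
example_fee = transaction_fee([100_000], [60_000, 39_000])
```

Note that the fee never appears as a field in the transaction. It is simply the part of the input value that no output claims.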
We then see a scriptPubKey. This is the locking script: it needs to evaluate to true when it is combined with a valid signature script (scriptSig) from the spender. We will talk about this later. In this example, this script contains the public key hash of the recipient. Something that makes bitcoin confusing is that it uses different combinations of hashes for different parts of the protocol. The public key is hashed by ripemd-160( sha-256( $publickey ) ). A number of other script types can be built with the bitcoin script language, including the common P2SH, bare multisig, and OP_RETURN. Some miners may limit which scripts or opcodes they will accept.
The recipient bitcoin address is derived by the blockbrowser for us by using the bitcoin base58check encoding. Base58check is used to make reading addresses less error prone.
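To make the encoding step concrete, here is a minimal sketch of Base58Check for a P2PKH address: prepend a version byte (0x00 on mainnet), append the first four bytes of a double SHA-256 as a checksum, then write the result in base 58. The function names are mine, and the public key hash in the example is just a sample value, not the one from the transaction above.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def base58check_encode(version: bytes, payload: bytes) -> str:
    data = version + payload
    data += double_sha256(data)[:4]          # 4-byte checksum
    n = int.from_bytes(data, "big")
    encoded = ""
    while n > 0:
        n, rem = divmod(n, 58)
        encoded = ALPHABET[rem] + encoded
    # Each leading zero byte is written as the character '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + encoded


def base58check_decode(address: str) -> tuple:
    n = 0
    for ch in address:
        n = n * 58 + ALPHABET.index(ch)
    pad = len(address) - len(address.lstrip("1"))
    data = b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")
    body, checksum = data[:-4], data[-4:]
    if double_sha256(body)[:4] != checksum:
        raise ValueError("bad checksum: address was mistyped or corrupted")
    return body[:1], body[1:]                # (version byte, public key hash)


# Sample 20-byte public key hash (illustrative only).
sample_hash = bytes.fromhex("010966776006953d5567439e5e39f86a0d273bee")
sample_address = base58check_encode(b"\x00", sample_hash)
```

The checksum is what makes addresses less error prone: a single mistyped character almost always fails the check in `base58check_decode` instead of silently sending coins to a different hash.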
How are these two transactions related? How does signing the current transaction represent a spend of the prior transaction? How are all these keys and addresses related? This is the part that took me some time to figure out. Hopefully my illustrations make it easier for you to grasp.
The transaction hash of the previous transaction will match the previous transaction hash in the input array of the current transaction.
A private key (which we do not know) is used to generate the public key, which is then hashed to give the public key hash in the previous transaction’s output (the same hash that a bitcoin address encodes). That same private key is used to sign the current transaction together with the scriptPubKey of the source output, and the matching sender public key is revealed alongside the signature. This action effectively spends the source bitcoin.
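Here is a small sketch of that lookup. I’m assuming field names modeled on the blockbrowser json (`hash`, `in`, `prev_out`, `n`, `out`); treat them as assumptions and check them against the json you are actually reading. The hashes and values are placeholders, not the real transactions.

```python
def find_spent_output(current_tx: dict, input_index: int, previous_tx: dict) -> dict:
    """Follow an input's reference back to the output it spends."""
    ref = current_tx["in"][input_index]["prev_out"]
    if ref["hash"] != previous_tx["hash"]:
        raise ValueError("input does not reference this previous transaction")
    # n is the position of the spent output in the previous transaction's output array.
    return previous_tx["out"][ref["n"]]


# Placeholder data (hypothetical hashes and amounts).
previous_tx = {
    "hash": "ab" * 32,
    "out": [
        {"value": "0.01000000", "scriptPubKey": "OP_DUP OP_HASH160 ... OP_EQUALVERIFY OP_CHECKSIG"},
        {"value": "0.02000000", "scriptPubKey": "OP_DUP OP_HASH160 ... OP_EQUALVERIFY OP_CHECKSIG"},
    ],
}
current_tx = {
    "hash": "cd" * 32,
    "in": [{"prev_out": {"hash": "ab" * 32, "n": 0}, "scriptSig": "<signature> <public key>"}],
}

spent = find_spent_output(current_tx, 0, previous_tx)
```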
The whole current transaction is not signed because you cannot sign the signature that you don’t have at the time of signing. In practice it’s a bit more complicated. You can explore more here.
I really needed to see all the json to make it all click, so I drew the same diagram above with both the previous and current transaction json so that you could see for yourself.
Now that we understand what the transactions are made up of and how a transaction is related to its inputs we can talk about how transactions are verified.
As part of the mining process each transaction is verified. Verification is implemented as a script in bitcoin so that you can specify interesting conditions for how your bitcoin can be spent. The most common script is a Pay to Public Key Hash (P2PKH). It represents the vast majority of bitcoin transactions.
In theory you can do a lot of fancy stuff in your scripts, but in practice it is limited by what miners will accept. Bitcoin has its own scripting language called bitcoin script. It is a simple stack based language with
- commands that operate on the top of the stack,
- data that can be added to the stack and
- notably no loops, which makes sure that each script terminates and means it is not a Turing-complete language.
To assemble our P2PKH script we take the scriptSig from the current transaction and concatenate it with the scriptPubKey from the previous transaction. This is a nice marriage of the current transaction and its input.
I have seen people call the scriptPubKey the locking part of the script and the scriptSig the unlocking part. The scriptPubKey specifies the conditions that must be met in order to use the bitcoin (or in order to unlock it). In this case the recipient must be able to sign with the private key whose public key matches the destination public key hash. The scriptSig should unlock this transaction output by supplying a signature over the transaction that transfers the value to the new address, and the spend takes effect once that transaction is included in the accepted blockchain. I say “should” because in theory there are ways to subvert this process with what’s called a double spend, but it is generally accepted today as impossible, or at least incredibly impractical.
In order to verify our transaction we have to make sure
- the signature matches this transaction,
- and is signed by a public key that has rights to the transaction input.
I have put the script on the left side with a pointer indicating where we are. The stack is illustrated on the right with an * for each item pushed (appended) by the current script token and a strike-through for each token that is popped off (removed) by the current token.
The first thing our interpreter encounters is a piece of data, the sender signature. In bitcoin script data is simply pushed onto the stack. We have pushed the sender signature onto the stack.
Next, the interpreter pushes our sender’s public key onto the stack in the same way.
OP_DUP unsurprisingly duplicates the top item on the stack.
OP_HASH160 takes the duplicate public key from the current (spending) transaction and hashes it to turn it into a bitcoin public key hash using the ripemd-160( sha-256( public key ) ) function we mentioned earlier.
Now we have a hashed public key that should match the destination public key hash specified in the previous transaction. If the spender of this transaction is honest this will be the same public key that should have rights to spend this previous transaction output.
The interpreter pushes the destination public key hash from the previous transaction.
The interpreter then encounters OP_EQUALVERIFY. It compares the top two items on the stack: the destination public key hash from the source (previous) transaction and the hash of the public key supplied in the scriptSig of the spending (current) transaction.
If they are unequal it would throw an error and quit, which would indicate to anyone running this script that this spend is not valid or unauthorized. This is a nice short circuit which spares the interpreter from verifying the signature if the public key hashes don’t match, but we are not quite there yet.
If these top two stack items are equal the interpreter proceeds.
The last token in our script is an OP_CHECKSIG. It is nice that there is one command in bitcoin script which takes a public key and a signature and then verifies that the signature is valid for the current transaction under that public key.
If the signature is valid it pushes a true token back onto the stack.
The interpreter sees there is nothing left to execute. It checks to see if the stack contains only one token and that the value of that token is true. If it is, our transaction is valid!
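The walkthrough above can be replayed with a tiny stack interpreter. This is a sketch under loud assumptions: the two crypto steps are passed in as functions, and the ones I use in the example are toy stand-ins (a truncated double SHA-256 instead of the real ripemd-160( sha-256( ) ), and a lookup instead of real ECDSA verification). Only the stack mechanics are meant to match the steps described above.

```python
import hashlib


def run_p2pkh(script, hash160, check_sig) -> bool:
    """Evaluate a scriptSig + scriptPubKey token list on a stack.

    bytes tokens are data and get pushed; str tokens are opcodes.
    hash160 and check_sig are injected so the crypto can be swapped out.
    """
    stack = []
    for token in script:
        if isinstance(token, bytes):
            stack.append(token)                      # data is simply pushed
        elif token == "OP_DUP":
            stack.append(stack[-1])                  # duplicate the top item
        elif token == "OP_HASH160":
            stack.append(hash160(stack.pop()))       # replace top item with its hash
        elif token == "OP_EQUALVERIFY":
            if stack.pop() != stack.pop():
                return False                         # short circuit: hashes differ
        elif token == "OP_CHECKSIG":
            pubkey = stack.pop()
            signature = stack.pop()
            stack.append(b"\x01" if check_sig(pubkey, signature) else b"")
        else:
            raise ValueError(f"unsupported token: {token!r}")
    # Valid only if exactly one true value is left on the stack.
    return len(stack) == 1 and stack[0] == b"\x01"


# --- toy stand-ins, NOT real bitcoin crypto ---
def toy_hash160(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:20]


PUBKEY = b"hypothetical-public-key"
SIGNATURE = b"hypothetical-signature"


def toy_check_sig(pubkey: bytes, signature: bytes) -> bool:
    return (pubkey, signature) == (PUBKEY, SIGNATURE)


script_sig = [SIGNATURE, PUBKEY]                      # from the current transaction
script_pubkey = ["OP_DUP", "OP_HASH160", toy_hash160(PUBKEY),
                 "OP_EQUALVERIFY", "OP_CHECKSIG"]     # from the previous transaction

is_valid = run_p2pkh(script_sig + script_pubkey, toy_hash160, toy_check_sig)
```

Swapping in a different public key fails at OP_EQUALVERIFY before the signature is ever checked, which is the short circuit described earlier.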
Thank you to the resources below, which helped me better understand blockchain and bitcoin and made this article possible. I highly recommend them to anyone trying to understand blockchain and bitcoin!
- ed felten, et al: bitcoin and cryptocurrency technologies online course https://www.youtube.com/channel/UCNcSSleedtfyDuhBvOQzFzQ
- ken shirriff: bitcoins the hard way: using the raw bitcoin protocol http://www.righto.com/2014/02/bitcoins-hard-way-using-raw-bitcoin.html
- bitcoin wiki: op_checksig bitcoin script op code
- bitcoin wiki: transaction specification
- webbtc: bitcoin blockchain browser
Thank you to icons8 who made their beautiful icons available for free which hopefully made following this document easier.
- icons8: icons in this document
These are some of the questions that I encountered when going through this as well as questions others have asked:
What is signed by the signature in the spending transaction? How do we know the sender authorized this transfer if the whole spend is documented over two transactions?
This is a good question. It is neither the current nor the previous transaction, but rather a hybrid of both. It includes
- the scriptPubKey from the previous transaction, which gave them ownership in the first place and
- most of the current transaction, except the signature since that hasn’t been created yet. This specifies to whom the value should be transferred, or who will now have rights to the bitcoin.
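Here is a rough sketch of how that hybrid is built for the simplest case: a legacy (pre-SegWit) transaction signed with the default SIGHASH_ALL type. The scriptSig of the input being signed is replaced by the previous output’s scriptPubKey, any other scriptSigs are emptied, the result is serialized with the hash type appended, and that is hashed twice with SHA-256. The dict layout and field names are my own, and edge cases such as OP_CODESEPARATOR and the other hash types are left out.

```python
import hashlib
import struct

SIGHASH_ALL = 1


def varint(n: int) -> bytes:
    """Bitcoin's variable-length integer encoding."""
    if n < 0xfd:
        return bytes([n])
    if n <= 0xffff:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xffffffff:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def serialize(tx: dict) -> bytes:
    out = struct.pack("<I", tx["version"]) + varint(len(tx["inputs"]))
    for txin in tx["inputs"]:
        out += bytes.fromhex(txin["prev_hash"])[::-1]   # displayed hashes are byte-reversed
        out += struct.pack("<I", txin["prev_index"])
        out += varint(len(txin["script"])) + txin["script"]
        out += struct.pack("<I", txin["sequence"])
    out += varint(len(tx["outputs"]))
    for txout in tx["outputs"]:
        out += struct.pack("<q", txout["value"])        # value in satoshis
        out += varint(len(txout["script"])) + txout["script"]
    return out + struct.pack("<I", tx["lock_time"])


def sighash_all(tx: dict, input_index: int, prev_script_pubkey: bytes) -> bytes:
    """Digest that the signature for one input commits to (legacy SIGHASH_ALL)."""
    inputs = [
        {**txin, "script": prev_script_pubkey if i == input_index else b""}
        for i, txin in enumerate(tx["inputs"])
    ]
    preimage = serialize({**tx, "inputs": inputs}) + struct.pack("<I", SIGHASH_ALL)
    return hashlib.sha256(hashlib.sha256(preimage).digest()).digest()


# Placeholder transaction (hypothetical hashes, amounts, and key hashes).
p2pkh = lambda h: bytes.fromhex("76a914" + h + "88ac")
tx = {
    "version": 1,
    "inputs": [{"prev_hash": "aa" * 32, "prev_index": 0,
                "script": b"", "sequence": 0xffffffff}],
    "outputs": [{"value": 50_000, "script": p2pkh("22" * 20)}],
    "lock_time": 0,
}
prev_script_pubkey = p2pkh("11" * 20)
digest = sighash_all(tx, 0, prev_script_pubkey)
```

Because the scriptSig is swapped out before hashing, filling in the signature afterwards does not change the digest, while changing an output (who gets paid, or how much) does.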
Why does the sender need to sign the whole transaction? Shouldn’t the recipient be allowed to change how the bitcoin is spent?
The sender(s) must decide which address they wish to send the bitcoin to. Once the transaction is published the recipient can create a new transaction and spend the bitcoin value as they wish.
Where is the bitcoin address in the transactions?
The bitcoin address is not exactly included as you see it in your wallet when you generate a new address. That address is a special encoding of the public key hash in each output element of the output array of a transaction. Take a look at the previous transaction above to see where that is.
What does it mean for a public key to sign something in bitcoin? I thought that you only need a transaction output to spend. Is that not an address?
Think of a public key as an identity. The identity has permission to spend transaction outputs and can sign those over to other identities, subdivide, merge, and/or destroy those outputs. Strictly speaking, the signature is made with the matching private key and anyone can check it against the public key. The public key hash in each output element of the transaction output array is a one-way hash of the public key, not an encrypted version of it. That public key hash is encoded in a special way (Base58Check) to make it readable as an address in bitcoin wallets. Take a look at the previous transaction above to see where that is.