Blockchains in 10 minutes

Develop an intuition for what blockchains are and how they work, in less time than it takes to mine the next Bitcoin block. Basic exposure to git and hashing algorithms assumed.

Edmund J Sullivan Illustrations to The Rubaiyat of Omar Khayyam First Version Quatrain-051

The Moving Finger writes; and, having writ,
Moves on: nor all your Piety nor Wit
Shall lure it back to cancel half a Line,
Nor all your Tears wash out a Word of it.

—Omar Khayyam

A blockchain is a tamper-proof, append-only transaction log shared over a network. Anonymous, untrusted peers can add valid blocks of transactions to the end of the log, without involving a central co-ordinator or trusted authority. While BitTorrent lets us share fixed, read-only data like ISOs via a swarm of untrusted peers, a blockchain let us share content and add to it, making it a sort of appendable torrent. The Bitcoin currency is built on a blockchain shared on the internet. Other large-scale distributed problems like identity, smart contracts and asset registries can be addressed with blockchain-based solutions.

In computer science terms, blockchains are a practical solution to the distributed consensus problem. This, like the Infinite Improbability Drive, was considered almost impossible. Then one day, young Satoshi brewed a cup of really hot tea…

Let us sidle up to blockchains by considering how a typical open-source project works. It has a git repository, a mailing list, a Benevolent Dictator-For-Life, and a community. Standard operating procedure:

With git, every participant has their own copy of the repo with the entire commit history. Each commit is identified by hash(commit contents + parent commit-id). This makes commit history immutable and tamper-proof. The very name of the latest commit depends on its entire ancestry. Everybody knows the rules for what is considered valid code (it must compile, pass the regression tests, etc.), so each community member can independently verify the validity of patches and the commit history.

Tamper-proof chain: change the contents of any block — names of that block and all subsequent blocks change.

Thus, there are limits to the BDFL's power: they cannot change history or commit spurious patches without everyone noticing. They can, however, simply ignore someone's patches out of spite.

Now let's look at the blockchain.

Which of the maintainers in this anarchist paradise gets to issue the next commit, and how does everyone agree? All maintainers will have perfectly valid candidate blocks. This is the great distributed consensus problem. Made much worse by the fact that participants don't have an identity, join and leave when they like, there is no global time, messages can get reordered, etc. etc.

Well, suppose God rolls dice and anoints a lucky maintainer as BDFL-du-jour, and the chosen one commits the next block to universal acclaim. Everybody updates their blockchain with the latest commit. After a few minutes, God rolls again. (In the future, everyone will be dictator for 15 minutes.)

But we don't like the idea of God playing dice. Instead, we run a decentralized lottery by asking each miner to flip, say, 30 coins. The one who gets all heads wins, and is anointed BDFL-du-jour. To avoid the tedium of passing around videos and allegations of cheating, we use the equivalent process of selecting the miner whose proposed block-id is a “lucky number” with 30 leading zeros. Miners keep tweaking the contents of their candidate block and compute hash(block contents, parent block-id) until they get a lucky number. There is a special nonce field in the block which can be tweaked for this purpose without affecting the transaction payload. Whoever finds a winner promptly publishes their block. Everyone can (easily) verify the fact that this is a winning block by computing its hash, and append it to their copy of the blockchain.

Since the output of the hash algorithm is random, miners have no alternative to brute-force tweaking of block contents to produce a block whose hash (and therefore, id) happens to be a lucky number. There are no heuristics to shorten the process. If you miss by a single digit, you are no closer to the prize than if you missed by 10. In the above example of 30 leading zeros, the probability of getting a lucky block in a single attempt is one in a billion, the same as tossing 30 coins and having them all land heads. It is this asymmetry between finding a lucky number (very hard) and verifying a lucky number (very easy), and the ability of each participant to independently carry out both activities without external co-ordination, which makes the decentralized lottery feasible.

Toss 5 coins, win if they all turn up heads

sha1()

Tweak input to SHA-11, win if the leading 5 bits are 0

Since there are many many possible blocks which may be found as successors to the current latest block, the blockchain can potentially diverge into many branches at each step. So there is a rule that you must consider the longest valid chain as the blockchain, discarding all others. As long as the majority of participants follow this rule, they will all eventually agree on a single chain.

As a maintainer, you gain nothing by sulking and refusing to acknowledge a lottery winner. If a block named LEIA is acclaimed as heir to the latest block ANAKIN, then you're better off accepting her and trying to find a BEN child of LEIA child of ANAKIN. You could dispute LEIA's succession to the Skywalker blockchain by putting up a LUKE child of ANAKIN, but unless you can quickly come up with an ANAKIN → LUKE → REY before an ANAKIN → LEIA → BEN chain is found, your efforts will be wasted. Remember, you alone will be trying to find LUKE and REY against the combined efforts of the rest of your peers who only need to find a BEN.

However, if by incredible luck, you do find ANAKIN → LUKE → REY when the current consensus blockchain is ANAKIN → LEIA and no one has yet found BEN, everybody will acknowledge your chain as the legit heirs to the Skywalker dynasty. All transactions unfortunate enough to be only in block LEIA will be undone and will need to be retried. Once six generations have gone by, however, it's kinda safe to assume that transactions will not be undone. Almost impossible for a pretender to show up suddenly with six generations of impeccable lineage because of the amount of work and luck needed.

If there are only a few participants, the lottery can take a distressingly long time to produce a winner, and transaction commit rate would be very low. Conversely, if we required a short zero prefix — relative to the combined CPU power of all participants — then we would get many winners in the same interval, and there would be a war of succession as the consensus on the longest chain flickers between different lines of descent.

So the Bitcoin blockchain requires all miners to measure the rate of lucky number generation (i.e. the rate at which blocks are added) and dynamically adjust the zero prefix length requirement, such that there is on average one win, or block added, every 10 minutes. This interval lets news of an heir propagate through the whole network before a rival is found.

Alice queen2
“Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!”

Of the six impossible things Satoshi did to create Bitcoin, this is the greatest and most hilarious: to achieve distributed consensus, slow down the commit rate with a self-scaling Red Queen's race, so blocks are found and appended one at a time. The race is so exhausting that it is well-nigh impossible for someone fallen six blocks behind to overtake the consensus by creating an alternate longer branch. Combined with the tamper-proof nature of the blockchain, we get an inexorable Moving Finger, an arrow-of-time, with the same mechanisms defending against both subversion and race conditions.

What's in it for the miners? They perform several expensive activities: keeping a local copy of the blockchain, validating transactions, maintaining the P2P network, and above all, brute-force hashing to find new blocks. So Bitcoin rules allow miners to insert a special first transaction in the block which mints a fixed reward of (currently) 25 bitcoin as the lottery prize, and assign it to their own address. BitTorrent has similar incentives, like points and sharing ratios, to encourage participants to seed and contribute network bandwidth to keep torrent ‘standing waves’ alive.

We aren't going to discuss what gets stored in blocks and the transaction rules, because they are orthogonal to our central theme of explaining the consensus mechanism for extending the blockchain. So long as the miners agree on data format and transaction rules and validate proposed blocks against existing blockchain state, we are fine.

This is what the Bitcoin blockchain looks like around late 2016:

000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f (1st block)00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048
…(~417,000 blocks later)… 000000000000000000fb5c0704b121a50e67d19e4d75af0ba88aeaa34511eb6f000000000000000001eb2f52e3df222211a8f45480d958ebd6702f5fa2bc5398000000000000000001eb2f52e3df222211a8f45480d958ebd6702f5fa2bc5398

Note that the zero prefix requirement has gone up from ~32 bits in 2009 to ~72 in 2016, each additional bit representing a doubling of difficulty and a corresponding increase in mining power and popularity2.

Distributed systems tend to be complex beasts with lots of edge cases and error recovery mechanisms. Satoshi has thrown out all the epicycles, synchronization mechanisms, co-ordinators, multi-phase commits, blah blah, and achieves consensus by doing exactly opposite to what the sacred values of throughput and efficiency demand, i.e. drastically reduce transactions/sec by burning up CPU. Clearly, the blockchain tortoise is not a drop-in replacement for the database hare, but it makes slow and steady progress in extremely challenging environments like the internet.

Rather wisely, Satoshi has decided to remain anonymous and forego any Galactic Prize for Extreme Cleverness, or he'd have been lynched by a rampaging mob of distributed systems researchers.


Footnotes

  1. Bitcoin uses SHA-256, git uses SHA-1, but they have the same properties as far as our discussion is concerned.
  2. Technically, Bitcoin uses a slightly different, more flexible ‘difficulty’ function than a straight n zero leading bits, but it is close enough in spirit for the explanation to stand.