The Building Blocks of Blockchain: Cryptographic Hashes
This is the 2nd post in the series “The Building Blocks of Blockchain”. If you haven’t yet, I encourage you to read the introductory post of the series first.
Hashes are certainly a core part of blockchain jargon. People talk about transaction hashes, block hashes, hash rate, hash power, even hash paprikaš! And even though the name, hash, describes pretty accurately how they’re made, it may not be that clear what their purpose is, or how they fulfil that purpose. I’ve decided to cover hashes as the first building block of blockchain because a) they’re what makes a blockchain a block-chain, and not a block-list, and b) they’re fundamental to all of the other building blocks I will cover.
What Do I Need Hashes For?
The simplest use of hashes is checking whether somebody has changed a piece of data that shouldn’t have been changed.
You’ve probably heard people say that blockchains are immutable. This is just a fancy way of saying they can’t be changed. Hashes are what blockchains use to ensure their data can’t be changed without anyone noticing. I will explain how exactly that works in this post.
On top of that, hashes are also the basis of blockchain mining. Mining is a pretty huge topic, which is why it will get its very own post in this series. I just wanted to mention it to get you amped up for understanding hashes 😬. For now, though, let’s focus on getting a comfortable understanding of what hashes are and how they work.
Hashing Potatoes
The verb “to hash” in itself is not a computer science term at all. It actually means “to chop up” in plain English. That’s also where hash-browns got their name from — they’re made of chopped-up potatoes, pressed back together into a patty!
In much the same way, what we call a hash in the context of blockchain is simply chopped-up data!
I knew you would be looking at me like that. Anyway, I just wanted to tell you what the word means, and why that’s relevant. Now we can break it down step by step, and figure out what the heck I just said up there.
What Is Data?
I’ve mentioned data a few times now, but it may not be clear what that means. Data is really any piece of information: it may be some text, an image, or transaction information. Computers really don’t care which. Whatever data you want to work with, under the hood it’s always stored as a sequence of numbers.
A simple example of this is text. For text, your computer can represent each character as a single number. For example, using the widely used ASCII encoding, the text “The Frog Prince” can be represented as the following sequence of numbers:
 84 104 101  32  70 114 111 103  32  80 114 105 110  99 101
  T   h   e       F   r   o   g       P   r   i   n   c   e
Similarly, a picture can be represented with a number for each of its pixels, and a transaction with 2 account numbers and an amount. In the end, it always boils down to a sequence of numbers like the one above.
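If you’d like to try this yourself, here’s a minimal Python sketch that turns our example text into its sequence of numbers and back. It assumes the text is plain ASCII, so each character becomes exactly one number; characters outside ASCII, such as emoji, need a different encoding (for example UTF-8), which can use several numbers per character.

```python
# Convert text into the sequence of numbers a computer stores for it.
# Assumption: the text is plain ASCII, so each character becomes one number.
text = "The Frog Prince"
numbers = list(text.encode("ascii"))
print(numbers)
# [84, 104, 101, 32, 70, 114, 111, 103, 32, 80, 114, 105, 110, 99, 101]

# And back again: the same numbers decode to the original text.
print(bytes(numbers).decode("ascii"))  # The Frog Prince
```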
How Do You “Chop Up” Data?
Now that we know that — in a computer — data is just a sequence of numbers, it will be easier to understand what we mean by chopping it up. The process of chopping up data, i.e. hashing, is done by a hash function. But what is a hash function?
Think of it as a cooking recipe for making hash-browns. The ingredients are the input data, the dish you’re making is the hash, and the recipe is how you chop up the ingredients and stick them back together to make that hash.
In addition, this recipe has to fulfil the following properties:
- It can take data of any length as input
- Its output (the hash) is always the same length, regardless of the length of the input
- The same input always produces the same output
- It’s one-way, i.e. you can’t recreate the input from the hash by reversing the recipe
Alright, let’s see an example of such a hash recipe right now:
- Divide the input data into groups of 4 numbers
- Add the numbers together group-wise
- If any of the resulting numbers is 256 or larger, subtract 256 from it until it’s less than 256
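The recipe is small enough to write out as code. Here’s a minimal Python sketch; the function name `goldilocks_hash` is just a hypothetical name for illustration. It makes one assumption the recipe doesn’t spell out: when the last group has fewer than 4 numbers (or the whole input does), the missing positions simply add nothing, so the output is always exactly 4 numbers.

```python
def goldilocks_hash(numbers):
    """Toy 'Goldilocks' hash: always returns 4 numbers, each in 0..255."""
    sums = [0, 0, 0, 0]
    # Steps 1 & 2: the number at position i sits in column i % 4 of its
    # group of 4, so adding column-wise adds the groups together.
    for i, n in enumerate(numbers):
        sums[i % 4] += n
    # Step 3: subtracting 256 until a value is below 256 is the same as
    # taking the remainder after dividing by 256.
    return [s % 256 for s in sums]


data = list("The Frog Prince".encode("ascii"))
print(goldilocks_hash(data))         # [40, 141, 171, 240]
print(goldilocks_hash(list(b"Hi")))  # [72, 105, 0, 0]: still 4 numbers
```

Using `i % 4` lets a single loop do steps 1 and 2 together, and `% 256` is a shortcut for the repeated subtraction in step 3.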
We’ll call this the Goldilocks hash recipe. Let’s see how it works on our example text, “The Frog Prince”, from the previous section.
Steps 1 & 2: divide the data into groups of 4 numbers and add them together group-wise:
    84 104 101  32
+   70 114 111 103
+   32  80 114 105
+  110  99 101
------------------
=  296 397 427 240
Step 3: if any of the resulting numbers is 256 or larger, subtract 256 from it until it’s less than 256:
   296 397 427 240
-  256 256 256
------------------
=   40 141 171 240
Bam! There we have it, our first hash 🎉: 40 141 171 240. This really is the basic principle of how hash functions commonly used today work. They may get a bit weirder in the operations they perform on the numbers, but in essence, they divide the data into groups, do arithmetic, shuffle the numbers’ order, shuffle their digits, and out comes a new sequence of numbers.
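To peek at a real-world hash function from the outside, here’s a small sketch using SHA-256, a widely used cryptographic hash function available in Python’s standard `hashlib` module. It only shows the outward behaviour: whatever the input length, out comes a fixed-length sequence of 32 numbers, each between 0 and 255, much like the 4 numbers of our Goldilocks hash. SHA-256’s internal steps are far more involved and aren’t shown here.

```python
import hashlib

# Hash inputs of very different lengths with SHA-256.
for text in ["Hi", "The Frog Prince", "The Frog Prince " * 1000]:
    digest = hashlib.sha256(text.encode("ascii")).digest()  # raw bytes
    numbers = list(digest)  # the hash as a sequence of numbers 0..255
    print(len(numbers), numbers[:4])  # always 32 numbers; first 4 shown
```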
Has Someone Been Eating Your Porridge?
Alright, we now know how to compute the hash of some data, but we haven’t yet shown how hashes tell us whether some data changed. This is best shown with an example.
Imagine someone sends you a message, and in addition, they tell you that the Goldilocks hash of the message is 40 141 171 240 (the same hash we computed above). When you receive the message, it reads “The Frog Quince”:
 84 104 101  32  70 114 111 103  32  81 117 105 110  99 101
  T   h   e       F   r   o   g       Q   u   i   n   c   e
That message sounds suspicious to you. In order to figure out whether this was really the message that was sent to you, you decide to compute its Goldilocks hash and compare the computed hash to the hash you received.
    84 104 101  32
+   70 114 111 103
+   32  81 117 105
+  110  99 101
------------------
=  296 398 430 240
-  256 256 256
------------------
=   40 142 174 240
The hash you computed is 40 142 174 240, which does not match the received 40 141 171 240, so you conclude that someone has fiddled around with your message!
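Here’s the same check as a minimal Python sketch. The toy `goldilocks_hash` function is repeated so the snippet runs on its own, and `looks_untampered` is a hypothetical helper name used only for illustration: it recomputes the hash of the message you received and compares it with the hash the sender gave you.

```python
def goldilocks_hash(numbers):
    """Toy Goldilocks hash from this post (4 numbers, each 0..255)."""
    sums = [0, 0, 0, 0]
    for i, n in enumerate(numbers):
        sums[i % 4] += n
    return [s % 256 for s in sums]


def looks_untampered(message, received_hash):
    """Recompute the hash of the received message and compare it."""
    return goldilocks_hash(list(message.encode("ascii"))) == received_hash


received_hash = [40, 141, 171, 240]  # the hash the sender told you about
print(looks_untampered("The Frog Quince", received_hash))  # False: it changed
print(looks_untampered("The Frog Prince", received_hash))  # True
```

Keep in mind that with a toy hash like Goldilocks, a match is only weak evidence: as explained further down, it isn’t a cryptographic hash function, so different messages can end up with the same hash.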
How Do You Know Someone Didn’t Change The Hash Too?
Aha! Excellent question. What if somebody changes both the message and the hash you use to verify it? To prevent that, the sender will usually digitally sign the hash. If somebody changes the hash, the signature will no longer check out, and they can’t simply produce a new valid signature, because only the original sender can generate one.
We’ll cover digital signatures in the next post. The only thing you need to know now is that they are a way of mathematically proving that a message came from a particular sender.
Why Do Goldilocks Hashes Look Different Than The Hashes I’ve Seen Before?
That’s a good point. If you’ve seen actual hashes on blockchains before, you may remember that they often start with 0x and look something like this:
0xec620a4b0422641eabc024686c608903aa43299f621619d94242f039a556b08f
So why does our Goldilocks hash look like 40 141 171 240? Well, they’re really not that different. It’s just a convention to write each number in a hash’s sequence not as a “normal” decimal number, but as a hexadecimal number. This is really just another way of writing numbers. I won’t get into how it works now; instead, I’ll resort to witchcraft to convert the numbers in our example hash into hexadecimal notation:
decimal:      40 141 171 240
hexadecimal:  28  8d  ab  f0
This means 40 is 28 in hexadecimal notation, 141 is 8d in hexadecimal notation, and so on. We can now just paste all the numbers together, add a 0x at the beginning, and there you have it: our Goldilocks hash 40 141 171 240 is really 0x288dabf0 in hexadecimal notation. By the way, the 0x at the beginning is a sign indicating that this is hexadecimal notation.
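If you’d rather skip the witchcraft, Python can do the conversion for you. This minimal sketch uses only built-ins (`format`, `bytes.hex` and `int`) to turn our Goldilocks hash into the 0x form and back.

```python
goldilocks = [40, 141, 171, 240]

# Write each number as two hexadecimal digits ("02x" pads e.g. 5 to "05").
hex_digits = [format(n, "02x") for n in goldilocks]
print(hex_digits)  # ['28', '8d', 'ab', 'f0']

# Paste them together and add the 0x prefix.
print("0x" + "".join(hex_digits))  # 0x288dabf0

# bytes.hex() does the same in one step, and int(..., 16) converts back.
print("0x" + bytes(goldilocks).hex())  # 0x288dabf0
print(int("8d", 16))  # 141
```

The padding to two digits matters: every number from 0 to 255 fits in exactly two hexadecimal digits, so after pasting them together you can still tell where one number ends and the next begins.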
Before I wrap up, I should note one more thing: our made-up Goldilocks hash recipe is a legit hash function, but it is not a cryptographic hash function. Cryptographic hash functions additionally have to meet the following conditions, which make them secure for usage in cryptography and blockchain:
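- Irreversible (preimage resistance): Given a hash, it should be infeasible to find a message that results in that hash
- Unique (collision resistance): It should be infeasible to find 2 different messages that result in the same hash. Such pairs must exist, since there are far more possible messages than possible hashes, but nobody should be able to find one
- Avalanche Effect: A small change in the input message should result in a completely different hash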
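To make the last two conditions concrete, this sketch compares our toy Goldilocks hash with SHA-256 from Python’s standard `hashlib` module. The second message is a made-up example: it swaps two whole groups of 4 characters, which can’t change any of the group-wise sums, so Goldilocks gives both messages the same hash.

```python
import hashlib


def goldilocks_hash(data):
    """Toy Goldilocks hash from this post (4 numbers, each 0..255)."""
    sums = [0, 0, 0, 0]
    for i, n in enumerate(data):
        sums[i % 4] += n
    return [s % 256 for s in sums]


# Not unique: swapping the groups "Frog" and " Pri" keeps every column sum.
msg_a = b"The Frog Prince"
msg_b = b"The  PriFrognce"
print(goldilocks_hash(msg_a) == goldilocks_hash(msg_b))  # True: a collision

# No avalanche effect: changing one word only moves two of the four numbers.
print(goldilocks_hash(b"The Frog Prince"))  # [40, 141, 171, 240]
print(goldilocks_hash(b"The Frog Quince"))  # [40, 142, 174, 240]

# SHA-256: the same small change produces a completely different hash.
print(hashlib.sha256(b"The Frog Prince").hexdigest())
print(hashlib.sha256(b"The Frog Quince").hexdigest())
```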
Cryptographic hash functions achieve all of this with a way of computing hashes that’s a bit more involved than our Goldilocks recipe. Nevertheless, all the other principles of how hashes work that I’ve described here also apply to cryptographic hash functions.
Note also that hashes are not encrypted messages. This is a common misconception. An encrypted message can be decrypted using a secret key, i.e. encryption is two-way. Cryptographic hashes, on the other hand, are one-way, i.e. there’s no practical way to figure out the input if you only have its hash.
🎉 Thanks for Reading This Far 🎉
I hope this story gives you a good idea of how hashes work. In contrast, I expect that the bit about what hashes are for will fall into place better in the upcoming posts as we learn how they’re applied in blockchain. In the meantime, give yourself time to let this settle in, and please let me know if I left something unclear.
Next building block: Digital Signatures
I find the challenges of modern energy generation and power grids fascinating, and I feel galvanized to help solve them as quickly as possible. This is why I will try to clarify these challenges in a series of blog posts, along with our ideas and attempts at tackling them.
Am I doing a good job? Was that interesting and understandable? Did I share some horrible misinformation? Do you disagree with me? Do you have any questions? Please let me know in the comments or on LinkedIn, or on Twitter :)
If you haven’t yet, you can read the introductory post on what it is I even do here.