VDF FAQ or Making an Open Source ChipSun Mar 17 2019
The purpose of this project is to create a robust source of knowledge on Verifiable Delay Functions in order to bootstrap a freely accessible research community for the potential development of open source hardware for VDFs.
Our stance is that VDF research provides the crypto community with an excellent opportunity to advocate for openness, fairness, and education in hardware design, given that this is the first time a major blockchain platform has spearheaded an effort to create and provide ASIC hardware for the general public and not for private for-profit mining.
Here we will start by discussing the bare basics (the "Explain it like I'm five") of how VDFs work and what their purpose is, and hopefully in later installments move into more and more intricate technical territory that gets us closer to implementation.
The goal is for this document to be understandable and fairly easy to read by anyone who has basic knowledge of what a blockchain is and how it works. We want this knowledge to be accessible to anyone in the community who might find it of interest.
If you spot an error in this document or want to contribute to its further creation, please tell us in Qi Hardware's Telegram Group
What is a VDF?
VDF stands for Verifiable Delay Function.
A VDF is any function that takes a long time to calculate on available hardware (say, an hour - this is the delay we are speaking of), but has an output that can be verified near-instantly.
So, just to paint a charming little picture with humans instead of computers, imagine we give a man named Alex sitting at a desk a number, say,
55, and tell him to run the VDF function and let us know what the output is. He furiously crunches numbers for an hour, and then, wiping sweat off of his forehead, he stands up and tells us proudly "ok, the answer is
Bob is standing next to Alex and glances over his shoulder at the number. He makes a few scribbles in a notepad and says "Yep, that looks like that checks out!". He does not need to spend an hour making the same calculations that Alex did to know that Alex did the work correctly. Once the answer is in front of Bob, it is easy for him to verify that it is correct.
We will get into how this is possible at a later date, but for now just know that this is what it does.
What is the purpose of the VDF?
VDFs can be used for a few different purposes, but we are interested in using them to generate random numbers on the blockchain.
Why do we want to generate random numbers?
The major reason is that we (and by we, we mean the Ethereum community), want to replace the proof-of-work system Ethereum runs on with a proof-of-stake system. Proof-of-work ("mining") is when block producers compete by expending massive amounts of energy to solve the puzzle that will let them publish the next block. In proof-of-stake, a user simply needs to have 32 Ether in their account to be allowed to publish blocks; they do not need to waste electricity. However since there is no competition involved, we need to have a way to collectively choose who actually gets to publish the block, and to keep it fair we need to do this at least somewhat randomly.
Why do we need this special way to generate random numbers on the blockchain? Why can't it be done the way normal software does it?
So, first of all, computers cannot truly generate random numbers from scratch because a computer program is simply a set of directions followed precisely with no room to do something new or improvise. We need to provide the computer with a "seed" for the randomness function, or a source of randomness from the real world which it can perform math on to give us a random number in the range we want. A typical source of randomness is the number of milliseconds on the system clock at the specific moment, which is going to be different every time the program is run and pretty much unpredictable. The most sophisticated random number generators like random.org will use fluctuations of air molecules in the atmosphere for this purpose since it is even less possible to predict.
On the blockchain, all operations are run by every participating computer in the same order (though not necessarily at the same time) and expected to produce the same output. So we need a source of randomness that all participants can agree on. The only possible sources for this are values published on the blockchain itself.
So for example, instead of each computer looking at the number of milliseconds on their system clock and using it as a source of randomness, maybe it could use the number of milliseconds on the timestamp of the previous block, since all participants can agree on this.
However there is a huge problem with this idea! Namely - the block producer is totally in control of the amount of milliseconds on the time they choose to submit the block! Let's say a participant figures out that if the RNG (random number generator) takes
444 as an input, then they will earn the right to submit the next block. Therefore, as they are assembling the current block, they will simply record a time ending with 444 milliseconds, and thus easily earn the right to publish one block after another, totally monopolizing block production.
Obviously there are things we can use as a RNG seed that aren't quite as easily gamed as the timestamp, but the general idea will always hold - the block producers will always have influence over what data they include in the current block and can game it to increase their chances of earning the right to the submit the next block.
Now, imagine that instead of using the timestamp of the last block as the seed of randomness, we use the timestamp of the block published an hour ago, but ran through a VDF.
Now, it is much harder for the block publisher to rig the input. They can't figure out when publishing a block if using a certain timestamp will help their odds of publishing another block an hour later, because it takes a whole hour to calculate what will happen! In reality, using the timestamp is still not a good idea, but we are getting much closer to a viable solution.
Essentially, the situation is like block producers are all grabbing lottery cards and waiting for a speaker to call out the number on the winning ticket. The VDF ensures no one can "cheat and look ahead".
How is VDF different than a proof-of-work?
In a proof-of-work system (mining), miners also expend a lot of effort to find a value (called a "nonce") given a certain input (the block they are proposing) that can be verified instantly as correct.
The main difference is that mining is parallelizable, meaning that the more computers one has working at the same time that one dedicates to this problem, the faster one can find a solution. A VDF is an algorithm that can only be run sequentially. Adding more computers to the problem will not help you. No matter how many computers each participant has, they must be able to reach the same solution at roughly the same time. Otherwise the "delay" can easily be beaten by someone with many computers.
VDF is not meant to be a competition or a race either. The goal, again, is for everyone to reach the same solution at approximately the same time.
Also unlike proof-of-work, we are not expecting block producers to run the VDF themselves (explained in more detail later). It is not some sort of "tax" that must be paid in order to submit a valid block. All that matters is that someone (any one person in the world) is running the VDF and facilitating the whole process that allows our random number generation to be fair.
What are other applications for VDFs outside of generating random numbers in blockchains?
In the original paper that outlined VDFs, two additional use cases are proposed outside of what we described here.
Proof of replication
One is "proof of replication". Imagine that there is some data you want to be absolutely sure is backed up even if some catastrophic situation occurs. You therefore go out and pay somebody to have your data backed up across 100 devices.
Now you want to make sure that this person is not cheating you and that they actually have your data backed up 100 times. You ask one of their servers, say server #78 out of 100, to show you a chunk of your data to prove that they have it. The server gives it to you as requested, and you are happy… until you realize that you have no way of knowing for sure that server #78 didn't just go and ask server #1 for it, since given a good internet connection this would be basically instantaneous to retrieve! Maybe they just have your data backed up on one server, and not one hundred like they are supposed to, since cheating you in this way would save them money and resources.
You can fix this problem by demanding that instead of simply storing your data, they store your data on each server ran through a different VDF. It will take, say, a few hours to get the VDF output from the original data, but you will be able to get the original data from the output of the VDF near-instantly. If they can give you your data ran through a VDF, you know that they must be storing your data as they should, since there is no way they could possibly be calculating it on the fly.
Preventing fake history attacks
Another application of VDF also applies to securing proof-of-stake blockchain systems. One issue with proof-of-stake is that the possibility opens up for malicious users to create a fake history of the blockchain system and present it to new users joining the network for the first time. After all, these users need to get the history from someone, and immediately after logging on they have no way to tell who is honest and who is lying. Naturally these malicious users would probably present a fake record of events in which they come out having more money than they do in the history that truly happened.
In a proof-of-work system, faking the history is impossible, because block publishers must spend enormous amounts of energy to find the secret key (the nonce) which will allow them to publish a valid block, and if this key is not valid it can be detected by anyone else instantly. Therefore in order to make a plausible fake history, someone would need to spend roughly as much electricity as it has taken to mine the entire Bitcoin history up to this point, which is not plausible. No constraint like this exists in proof-of-stake.
But using VDFs, we can come up with a system in which we can prove that data is truly old. If, say, at a certain point we feed data from the blockchain's history into a VDF that takes a year to run, a year later we will be able to prove that the history up to that point is at least a year old, which no forger will be able to equivalently do.
What is the alternative to using VDFs for random numbers? Is this the only possible way to generate random numbers or will something else work?
VDF is a proposed upgrade (from a researcher named Justin Drake) to a system called RANDAO, which when Ethereum 2.0 launches will be the system used for random number generation. RANDAO is a sort of virtual committee in which each participant in block creation are requested to present a random number to the network. RANDAO will then combine all the various inputs it received in a certain way to use as a seed for our randomness functions.
The numbers are submitted in a way that prevents cheating, by requesting submissions to be send first in an encrypted fashion, then sent revealed. This is basically the equivalent of a game where everyone plays a card face down, then only after all players have played their card, the players one by one turn the cards face up. The way Ethereum 2.0 is planning on doing this is like such: first they must submit a public key to the rest of the chain. Then once everyone has done this, all users will each sign the same number with the public key they just submitted. The cryptographic signatures that are produced are used as the pool of random numbers.
The proposal behind VDF is not to replace RANDAO, but to improve RANDAO by using VDF afterwards, i.e. feeding the output of RANDAO into the input of VDF.
The reason that we are using both RANDAO and VDF instead of just running VDF on some existing on-chain value similar to the timestamp idea we proposed earlier is basically for double security - in case VDF breaks (like if quantum computers are built allowing users to run VDF much faster than the desired time) we will still have RANDAO, which is not perfect (explained in the next paragraph) but not completely broken.
Why is VDF an improvement on existing solutions?
There is one way to cheat at RANDAO, but you can only do it if everyone else has already revealed their cards, and you are either the last person who is expected to reveal your "card" (the number you submitted to the randomness function), or are conspiring together with everyone else who has yet to reveal their cards. It is very simple - imagine you look at all the cards already face up and notice that with the cards the way they are, you "win" the randomness game (and therefore the right to produce the next block). You then simply choose not to reveal any more cards, and your number will not be taken into account. This may seem like a small flaw, but any amount of influence users can have over our random number generator is a bad thing - functions used to secure billions of dollars cannot be "leaky" in this way!
An often-considered solution is requiring a security deposit of sorts to play the RANDAO. If a player cheats, we can take away their security deposit as punishment, and thus make cheating not worth it. In Ethereum 2.0, RANDAO is played by block producers, who already must have at least 32 Ether locked into the system in order to produce blocks. In order to make this system work, we must punish people who try to defraud the network or go offline for too long by taking some or all of their 32 Ether. So one might think that we can just punish cheating at RANDAO the same way we already are planning to punish several other forms of bad behavior.
The problem is that we can't punish cheating aggressively enough to prevent people still potentially being able to gain from it despite the loss of their deposit. This is because RANDAO is not meant to be used solely for choosing block producers, but also for any application running on Ethereum that requires a random seed for any purpose whatsoever, and there's no telling how much money might be at stake in any one of these apps. Imagine that someone has an Ethereum application that facilitates a lottery game in which the winner will receive millions of dollars worth of Ether. Even if we took away all 32 Ether, a block publisher might still be able to make cheating very profitable by rigging the lottery outcome in a situation like this!
In reality, we can only punish by a very small amount because if someone doesn't show up for the final round of RANDAO, it could be cheating, but could also just be poor network conditions. Forcing someone to risk losing tons of money in the event of a slight hiccup in their internet service makes becoming a block publisher seem like a very unattractive option. This would not be a good thing for the security of the Ethereum network.
How long would the delay in the VDF likely actually be in practice?
The current proposal is to set the delay time to 102.4 minutes, or a little over an hour and a half.
This is because blocks are produced in 6-second increments, and it takes an "epoch" of 64 blocks to play the RANDAO. This is a 6.4-minute period in total, which means that as long as it takes someone longer than 6.4 minutes to solve the VDF, there is no possible way they can "cheat" and bias the RANDAO before it ends. (In reality, you still are going to need to solve it quite a bit faster than that to get a decent shot.)
So why set the delay in the VDF for 102.4 minutes, and not just 6.4? The reason is that we anticipate that bad actors will attempt to create specialized hardware (this is called an ASIC, a single-purpose computer similar to those which have been invented for Bitcoin mining) that will solve the VDF faster than the computers available to the average person, and thus allow them to cheat. We protect ourselves against this bad outcome by setting the delay so much longer than the epoch time so that that even if hardware manufacturers create computers that allow them to solve the VDF much faster than the average person, they will still not be able to cheat.
The 102.4 minute number is therefore the length of time we expect the average person to solve the VDF in, with certain people likely attempting (and succeeding) in solving it faster, but not so fast that they can actually calculate it in time to bias the RANDAO.
Does this mean that if it takes one and a half hours to calculate the VDF, we can only get a new random number every one and a half hours?
No. This is because we can stagger several VDFs running parallel to one another but beginning and ending at different points. We start one per epoch (6.4 minute period), meaning that 102.4 minutes after the very first one is run we start getting regular random numbers every 6.4 minutes. Each random number is then available for any function anyone on the blockchain wants to use it for.
In Ethereum 2.0, who will be running the VDFs?
In the design Justin Drake has outlined, block producers are not required or expected to run the VDF themselves. Rather after block producers play RANDAO, another group called the "evaluators" will run the VDFs on the output of the RANDAO.
Obviously there is nothing stopping the same person from using different computers to be both a block producer and a VDF evaluator, nor would this be inappropriate to do.
Once the evaluators are done calculating, they will broadcast to the block producers the output of the VDF.
Only a single accurate evaluator is necessary for the system to be working at any time. The VDF only has one possible output, so it's just one person that actually must go through the labor of running it after which they can announce the output to everyone else. And at that point everyone else will be able to see that the evaluator is honest, given the function's verifiability.
How will users be incentivized to run/evaluate VDFs? Why spend the electricity?
First of all, it's estimated that running VDFs for Ethereum will only require 160 watts of power, which is a little more than keeping a ceiling fan going in your living room. This costs about 40 cents per day. So this is not exactly a major sacrifice.
There is a reward however. The block producers are responsible for verifying the result of the VDF given to them by evaluators. In order to prevent block producers from being "lazy" and simply waiting for other people to listen to the evaluators rather than doing it themselves, Drake suggests that we might want to give a small incentive to the first validator who listens for and correctly reports the VDF output to the rest of the network.
This doubles as an incentive to run the VDF quickly, since the same person can be a block producer and a VDF evaluator. The quicker their VDF runs, the more of a chance they have of reporting their own VDF output before any other evaluators are done, which means that they will get a reward.
This is a good thing because it's better for VDF evaluators to run the function as fast as possible and not get lazy. This keeps everything going smoothly and securely on the Ethereum network.
Why is the Ethereum Foundation spending $15M to research VDFs?
First of all, it's not strictly accurate that Ethereum themselves are spending $15M (as widely reported), but rather they are trying to pool this money together and split the cost with other blockchain protocols that are planning to use VDFs as well.
Anyway, this money is meant to go to R&D to produce ASIC hardware for VDFs that will be open and available for any Ethereum user. Remember that we need to ensure that no one is running VDFs many times faster than the typical user in order for our algorithm to be secure. If we assume that regular users are running these algorithms on a standard CPU, then it will be very easy to make a chip that can run, say, ten times as faster. However, if the Ethereum Foundation does their own R&D to produce a chip and distributes it to anyone to use, then any bad actor needs to outperform the Ethereum Foundation's project times ten. Even if they can raise more money, this will be quite difficult assuming that the good guys have done a decent job.
The purpose of this document is to help facilitate this process of developing ASICs and advocate for open knowledge and community participation at every stage of creation.
If you enjoyed this article, follow me on twitter: @maxwellsfoley
Original paper on VDFs - https://eprint.iacr.org/2018/601.pdf
Justin Drake's proposal - https://ethresear.ch/t/minimal-vdf-randomness-beacon/3566
RANDAC describing incentives of RANDAO - http://randac.org/RANDAC.html
Everything VDF, read as much as your heart desires - https://vdfresearch.org/