During the 2018 stress test, some people expressed the opinion that running a live test without better preparation and methodology was akin to child's play blocking the good work of grown-ups.
I subscribe to this opinion and think the test would have gained a lot from having more people involved in data collection and analysis, and from participants cooperating much more closely on real-time statistics.
Interestingly, however, the way the stress test ended up being run uncovered bottlenecks that a better-prepared test might not have, and by examining how its methodology differed from more carefully designed tests we can get a sense of how much value to place on its findings.

Comparing the stress test with the Gigablock Testnet Initiative's ramp-up tests

In the presentation by Peter Rizun and Andrew Stone at the 2017 Scaling Bitcoin conference at Stanford, we learned that the Gigablock Testnet Initiative had 18 nodes, of which more than two thirds were generating transactions to stress the network. The nodes all ran a modified version of the Bitcoin Unlimited full-node software on virtual machines hosted in various locations around the world.
The 2018 stress test was a community-organized effort to create stress on the real network, and all users were encouraged to find ways to help out, ranging from small actions like tipping and liking on memo.cash to building their own transaction-generating scripts.
One important difference to note is that the Gigablock Testnet Initiative's ramp-up tests were done in a scientific fashion by people with significant knowledge in the field, while the 2018 stress test spread the costs across the community and drew on a much more mixed skill set.

Examining the findings of the 2018 stress test

Before we look at the main findings from the 2018 stress test: if you are a node operator and have not yet submitted your debug.log file for analysis, we would appreciate your help in getting more accurate data on block propagation.
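As an illustration of the kind of data such a log can provide, here is a minimal sketch that pulls block acceptance timestamps out of a debug.log. The exact line format varies between node implementations and versions, so the UpdateTip pattern below is an assumption and may need adjusting for your software.

```python
import re
from datetime import datetime

# Assumed format of block acceptance lines in debug.log, e.g.:
# 2018-09-01 14:03:12 UpdateTip: new best=0000000000000000012ab... height=546001 ...
# This pattern is an assumption; adjust it to match your node's actual log output.
TIP_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UpdateTip: new best=(\S+) height=(\d+)"
)

def block_acceptance_times(path):
    """Yield (timestamp, block_hash, height) for every accepted block in the log."""
    with open(path, "r", errors="replace") as log:
        for line in log:
            match = TIP_LINE.match(line)
            if match:
                stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
                yield stamp, match.group(2), int(match.group(3))

if __name__ == "__main__":
    # Print the time between consecutive block acceptances as seen by this node.
    previous = None
    for stamp, block_hash, height in block_acceptance_times("debug.log"):
        if previous is not None:
            delta = (stamp - previous).total_seconds()
            print(f"height {height}: accepted {delta:.0f}s after the previous block")
        previous = stamp
```

A single log like this only shows when one node accepted each block; the propagation picture emerges when acceptance timestamps for the same block hash are compared across the logs of many nodes, which is why every additional submission improves the analysis.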
During the stress test, it quickly became clear that generating a load comparable to the Gigablock Testnet Initiative's was harder than expected. Later on, some data indicated that block propagation could have been more efficient, other data indicated that there were issues with transaction validation, and well after the test someone pointed out that transaction relay may have been rate-limited all along.
It is easy to take any piece of data, look at it out of context and come to conclusions that feel urgent but that may, in fact, not be related to the issues uncovered by the stress test at all.
If there is one thing we really should learn from this stress test, it is that data collection and analysis were insufficient and the methodology was undefined, which means we can only make educated guesses and assumptions about the results.
One such educated guess is that since most people were never going to build their own stress-test utilities, the vast majority of the load will have come from a small subset of tools, and will therefore have entered the network through a small subset of the available entry points, which is a poor simulation of real-world usage.
With this in mind, the finding that rate-limiting was a significant factor in the inability to scale up the generated load may not be as bad as it seems, and those limits should probably not be altered without further research.
After all, they were there to protect the network against the very threat that we have now subjected it to, and it turns out they worked reasonably well.
My hope in writing this article is to help depoliticize the debate about these findings and to put them back into the context where they belong, so that they don't get mischaracterized as representative of real-world usage.