Decentralized Web Server: Possible Approach with Cost and Performance Estimates

mostakimvip04
Posts: 83
Joined: Sun Dec 22, 2024 4:32 am

Post by mostakimvip04 »

At the first Decentralized Web Summit, Tim Berners-Lee asked whether a content-addressable peer-to-peer server system scales to the demands of the World Wide Web. This post is meant as a partial answer to one piece of that puzzle. For background, this might help.

Decentralized web pages will be served by users, peer-to-peer, but there can also be high-performance super-nodes which would serve as caches and archives. These super-nodes could be run by archives, like the Internet Archive, and ISPs who want to deliver pages quickly to their users. I will call such a super-node a “Decentralized Web Server” or “D-Web Server” and work through a thought experiment on how much it would cost to have one that would store many webpages and serve them up fast.

Web objects, such as text and images, in the Decentralized Web are generally retrieved based on a computed hash of the content. This is called “content addressing.” A request for a webpage is therefore addressed to the network by hash rather than to a specific server. The object can be served from any D-Web server without worrying that it will be faked, because the recipient can rehash the received content and check that the result matches the hash that was requested.
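
The rehash-and-compare check described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample page are made up for the example, and SHA-256 is assumed as the hash because it is the one mentioned later in the post.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return the hex SHA-256 digest, used here as the object's address."""
    return hashlib.sha256(data).hexdigest()


def verify(requested_hash: str, data: bytes) -> bool:
    """Rehash the received bytes and compare with the hash that was requested."""
    return content_address(data) == requested_hash


# Hypothetical web object, for illustration only.
page = b"<html><body>Hello, decentralized web</body></html>"
address = content_address(page)

# A faithful copy from any server verifies; a tampered copy does not.
genuine_ok = verify(address, page)
tampered_ok = verify(address, page + b"<!-- injected -->")

print(address)
print("genuine copy accepted:", genuine_ok)
print("tampered copy accepted:", tampered_ok)
```

Because the address is derived from the bytes themselves, the requester does not need to trust whichever server answered, only the hash it asked for.
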

For the purposes of this post, we will use the basic machines that the Internet Archive currently uses as a data point. These are 24-core, 4U machines with 250TBytes of disk storage (on 36 drives), 192GB of RAM, and a 2Gbit/sec network connection, and they cost about $14k. Therefore:

$14k for 1 D-Web server

Let’s estimate that the average compressed decentralized web object is 50KBytes (an object is a page, JavaScript file, image, or movie: the things that make up a webpage). This is larger than the Internet Archive web crawl average, but it’s in the ballpark.

If we use all the storage for web objects, that would be 5 billion web objects (250TB/50KB). This would be maybe 1 million basic websites, at 5 thousand web objects per website, which I would guess is much more than the average WordPress website has, though there are of course notable websites with many more. This is enough for a large growth in the decentralized web, and it could keep all versions. Therefore:

Store 5 billion web objects, or 1 million websites
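
The storage arithmetic above can be checked with a short calculation. The constant names are made up for this sketch, and decimal units (1TB = 10^12 bytes, 1KB = 10^3 bytes) are an assumption, chosen because they reproduce the round figures in the post.

```python
# Figures taken from the post; decimal units are an assumption.
DISK_BYTES = 250 * 10**12    # 250 TBytes of disk per D-Web server
OBJECT_BYTES = 50 * 10**3    # estimated 50 KBytes per compressed web object
OBJECTS_PER_SITE = 5_000     # generous guess for objects per basic website

objects_stored = DISK_BYTES // OBJECT_BYTES
websites_stored = objects_stored // OBJECTS_PER_SITE

print(f"web objects per server: {objects_stored:,}")
print(f"websites per server:    {websites_stored:,}")
```

Changing OBJECT_BYTES or OBJECTS_PER_SITE shows how sensitive the estimate is to those two guesses.
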
How many requests could it answer? Answering a decentralized web request means asking “do I have the requested object?” and, if yes, serving it. If this D-Web server is one of many, it may not hold every webpage, even though it seems one server could probably store all pages for a long part of the growth of the Decentralized Web.

Let’s break it into two types: “Do we have it?” and “Here is the web object.” “Do we have it?” can be answered efficiently with a Bloom filter. This works by taking the request, hashing it eight times, and looking up the corresponding bits in RAM to see whether they are all set. I will not explain it further than to say that an entry can take about 3 bytes of RAM and that lookups are very, very fast. The lookup array for 5 billion objects would therefore take 15GB, which is a small percentage of our RAM.
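
To make the “Do we have it?” step concrete, here is a minimal Bloom filter sketch in Python. It is not the implementation any D-Web server actually uses: the class name, the way the eight bit positions are derived (SHA-256 with a one-byte salt), and the small test set are all assumptions made for illustration. The false-positive figure uses the standard approximation (1 - e^(-k*n/m))^k for k hashes and m/n bits per entry.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter; the hashing scheme is an assumption."""

    def __init__(self, num_bits: int, num_hashes: int = 8):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive one bit position per hash by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


BITS_PER_ENTRY = 24  # about 3 bytes per entry, as in the post
NUM_HASHES = 8

# Small-scale demo: 1,000 hypothetical object hashes.
stored = [hashlib.sha256(f"object-{i}".encode()).digest() for i in range(1000)]
bloom = BloomFilter(num_bits=BITS_PER_ENTRY * len(stored), num_hashes=NUM_HASHES)
for object_hash in stored:
    bloom.add(object_hash)
all_found = all(bloom.might_contain(h) for h in stored)

# Full-scale sizing from the post: 5 billion objects at 3 bytes each.
ram_bytes = 5 * 10**9 * (BITS_PER_ENTRY // 8)
false_positive_rate = (1 - math.exp(-NUM_HASHES / BITS_PER_ENTRY)) ** NUM_HASHES

print("every stored object found:", all_found)
print(f"RAM for 5 billion entries: {ram_bytes / 10**9:.0f} GB")
print(f"approximate false-positive rate: {false_positive_rate:.1e}")
```

A Bloom filter never reports a stored object as missing, so a “no” can be answered from RAM alone. A “yes” is occasionally wrong, so the server would still confirm against disk before serving.
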

I don’t know how fast this can run, but it is probably in excess of 100k requests per second. (This paper seemed to put the number over 1 million per second.) A request is a SHA-256 hash, which, if recorded in binary, is 32 bytes. At 100k requests per second, the incoming bandwidth would be 3.2MBytes/sec, which is not a problem. Therefore: