Isolates+storage over http+orchestrators is the future that has arrived

By Niraj Zade  |  2023 Jan 03  |  6m read


Update 11 Sept 2023 - this idea has matured and transformed a lot by now. However, I am too lazy to collect all the writings into a publishable format.

Context

I am trying to put a feeling into words.

I have spent the past 4-5 years with a very itchy feeling that computing at scale doesn’t quite fit. Like we are running with a merely good-enough solution.

For example: “black and white” digital isn’t a good fit for computing “gray and probabilistic” AI. Digital, by its nature, desires perfection in its compute. AI, by its nature, aims for good enough for now. Like computing on a time budget.

But my itch was on a broader level: Servers don’t feel like a good fit.

And now I’ve realized why I’ve been so obsessed with p2p, distributed storage, networking, etc. Somehow, I was trying to break computing down to its atomics, and make it truly plastic.

State of the industry

We are at a state that exists because several major ideas landed in the industry at the same time.

Virtualization and Containerization

Virtualization, and then containerization, has been slowly moving deeper and deeper into the layers of compute abstraction, and becoming more and more granular.

VMs -> containers -> Firecracker -> isolates -> ???

HTTP-based storage

With Hadoop creating the environment for mature experiments to happen, and the learnings maturing into something comfortable to use, we have ended up with blob storage over http.

Also, networking speeds in datacenters have been going up at a ridiculous rate.

Idempotency in tasks

Historically, everyone has loved idempotency. But fields like data engineering finally made everyone learn it and become conscious of it.

Idempotency, combined with network-based task orchestrators, leads to massive scale of compute across fleets of servers.
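As a rough illustration (a minimal sketch in Python; the paths and the load_raw_events helper are made up for the example), an idempotent task overwrites its output for a given run instead of appending, so an orchestrator can retry it freely:

```python
import json
from pathlib import Path


def load_raw_events(run_date: str) -> list[dict]:
    """Hypothetical extractor; in reality this would pull from a queue, API, or data lake."""
    return [{"date": run_date, "value": 42}]


def process_partition(run_date: str, out_dir: str = "/tmp/daily_totals") -> Path:
    """Idempotent task: reruns for the same run_date always overwrite the same
    partition, so orchestrator retries never duplicate or double-count data."""
    events = load_raw_events(run_date)
    total = sum(e["value"] for e in events)

    out_path = Path(out_dir) / f"date={run_date}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps({"date": run_date, "total": total}))  # overwrite, never append
    return out_path


# Running it twice leaves the system in exactly the same state as running it once.
process_partition("2023-01-03")
process_partition("2023-01-03")
```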

Now mix these three, and we have the current cocktail we are enjoying, and will be enjoying for a long time.

The actual article

With distributed storage over http, and micro-isolates for compute, it feels like data processing has finally arrived.

Functional programming is a very natural fit for data processing. Idempotent functions with stable, predictable state are perfect for data engineering.

It is what we end up with in Airflow tasks.
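As a sketch of what that looks like (assuming Airflow 2.4+ and its TaskFlow API; the task bodies are placeholders):

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2023, 1, 1), catchup=False)
def daily_totals():
    @task
    def extract() -> list[int]:
        # Placeholder: same logical run in, same rows out.
        return [1, 2, 3]

    @task
    def aggregate(rows: list[int]) -> int:
        # No hidden state; the output depends only on the input.
        return sum(rows)

    aggregate(extract())


daily_totals()
```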

The nature of these systems makes them a fit for orchestration. One master, or a set of masters, orchestrating tasks and managing the flow of data.

If we take the Airflow paradigm deeper, down to the level of lambdas or V8 isolates, we end up with a whole new beast.

At this level of orchestration, servers and server sizes don’t matter. Everything is a task outsourced by the master node cluster. It doesn’t matter if the workers are t2.small, m4.xlarge, or whatever. If a worker has the compute quota available, and the RAM available, it can be delegated a task.
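A minimal sketch of that delegation rule (the Worker and Task shapes are made up for illustration): the scheduler doesn’t care what a worker is, only whether its free CPU and RAM cover the task’s budget.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str          # could be a t2.small, an m4.xlarge, or a raspberry pi
    free_cpu: float    # cores currently unused
    free_ram_mb: int


@dataclass
class Task:
    name: str
    cpu_budget: float
    ram_budget_mb: int


def pick_worker(task: Task, workers: list[Worker]) -> Worker | None:
    """Delegate to any worker whose spare quota covers the task's budget.
    Instance type is irrelevant; only headroom matters."""
    for w in workers:
        if w.free_cpu >= task.cpu_budget and w.free_ram_mb >= task.ram_budget_mb:
            w.free_cpu -= task.cpu_budget
            w.free_ram_mb -= task.ram_budget_mb
            return w
    return None  # nothing has headroom right now; the task waits in the queue


fleet = [Worker("pi-4", 0.5, 512), Worker("m4.xlarge", 3.0, 12_000)]
chosen = pick_worker(Task("parse-logs", cpu_budget=1.0, ram_budget_mb=2_048), fleet)
print(chosen.name if chosen else "queued")  # -> m4.xlarge
```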

Think of this as plastic nano-compute. Individual computers lose all meaning, and computing becomes generalized. The same microtask can execute on anything from a Raspberry Pi to a supercomputer.

The cost of this plasticity is network overhead and latencies. Everything is done over the network. So every function needs telemetry. Every function needs inflow and outflow of data. Data that was generated by a task on a node will have to be sent back to the dedicated data storage before the nano-compute’s environment teardown begins.

The environment is created, data is shipped in, data is processed, data is shipped out, environment is torn down. Rinse and repeat.
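A hedged sketch of that loop, treating blob storage as plain http (the URLs and the transformation are placeholders, and it assumes the third-party requests library is available):

```python
import requests  # third-party HTTP client, assumed to be in the isolate's image


def run_microtask(input_url: str, output_url: str) -> None:
    # 1. Environment created: we are already inside the fresh isolate/container.
    # 2. Ship data in: pull the input blob over http.
    blob = requests.get(input_url, timeout=30)
    blob.raise_for_status()

    # 3. Process: the only part that is actually "our" work.
    result = blob.content.upper()  # placeholder transformation

    # 4. Ship data out: push the result back to storage before teardown.
    requests.put(output_url, data=result, timeout=30).raise_for_status()
    # 5. Teardown: nothing local survives once this function returns.


run_microtask(
    "https://storage.example.com/inputs/part-0001",   # hypothetical blob URLs
    "https://storage.example.com/outputs/part-0001",
)
```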

Just like what we saw with GPUs, the bottleneck is the inflow and outflow of data, not the processing power.

But this system has infinite leverage. Just sit down and imagine it. The orchestrators are simply directors; you can bring in layers of middle management as the tasks break down more and more until they reach their atomic sizes, at which point they are executed by the final employees. In the digital world, we have control over the inefficiencies and layer bloat.

But this isn’t something that will be a problem. It is simply a new paradigm (not new to computing, where it is quite old; it is just relatively new in industry).

As a not-so-accurate example: in the current world of networked, distributed microservices (sadly), HFT firms are still doing single-threaded processing. Those who need the performance will keep sticking with the server paradigm.

It feels like cloud computing, and computing at scale in general, has finally arrived. Servers will eventually dissolve away.

Financial efficiency

Now, with a level of control this granular, you can finally achieve the goal your wallet loves:

You can use servers at near-peak capacity.

Since you are controlling compute and setting compute budgets at such a fine level, you can keep the server at 99.99% load, if not 100% load.

Log into your console and look at the fleet of servers running at 40% load. Your wallet is funding them at 40% efficiency. Watch your wallet scale up along with the fleet when the load hits 80%.

100% capital efficiency isn’t possible in real life. But in compute? We can get freakishly close.
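To put rough numbers on it (an illustrative back-of-the-envelope calculation; the hourly rate is made up, not real pricing):

```python
HOURLY_RATE = 0.20   # $/hour for one worker, hypothetical
USEFUL_HOURS = 1000  # hours of actual task compute you need done


def fleet_cost(utilization: float) -> float:
    """Total spend to get USEFUL_HOURS of work at a given average utilization."""
    billed_hours = USEFUL_HOURS / utilization
    return billed_hours * HOURLY_RATE


print(f"at 40% load: ${fleet_cost(0.40):.2f}")  # ~ $500
print(f"at 99% load: ${fleet_cost(0.99):.2f}")  # ~ $202
```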

And the best part? Since you are working with micro-tasks of short durations, you can reliably run the entire thing on your cloud provider’s spot instances. Take it a step further, and p2p compute clusters are finally here. Holy shit, this is WILD.

Ending notes

Here is an example. It is not a perfect analogy, but it has the same feeling.

If you have ever worked with raw Spark and then worked with Snowflake, you immediately know how free Snowflake feels.

Your mind stops thinking about servers and memory, whether they will run out, and which tasks may choke up and break your compute. You just set the compute credits to use and delegate tasks to it. It manages everything by itself, and your org gladly pays the bills.

Not having to think about servers is THE way forward. And the way is paved with rocks of independent worker-isolates glued together with the tar of network calls.

