Surya's Blog

Consistency Models

Surya Sathi — Sun, 08 Feb 2026 18:30:31 GMT

Background

One of the most confusing experiences when working with real systems is this:
you do something, the system tells you it worked, and then immediately behaves as if it didn’t.

You update a profile photo and still see the old one.
You change a setting and refresh the page — nothing.

You refresh the screen two or three times. Nothing changes. You check again after half an hour, and suddenly it’s there.

The system is doing exactly what it was designed to do. We may call it a bug but the problem is that most of us don’t have a clear mental model of what the system is actually promising.

That’s where consistency comes in.

One might think of consistency as a binary property — either a system is consistent or it’s not. But once you start looking into distributed systems, that mental model starts breaking down quickly. Consistency isn’t a yes-or-no feature — it’s a set of tradeoffs about agreement, time, and visibility.

This article is my attempt to understand those tradeoffs without hiding behind buzzwords.

What does “Consistency” even mean?

At a very basic level, consistency is about agreement.

If multiple parts of a system are observing the same data, consistency describes whether those observations agree, and how quickly they come to agree. That’s it. It’s not about whether the data is “correct” in some moral sense. It’s about whether different observers see the same thing at the same time.

This distinction matters more than it sounds.

A system can accept a write successfully — meaning the system has decided that the change is valid — and still not show that change everywhere immediately. That gap between acceptance and visibility is where most confusion comes from.

Whenever something is written to state, there are two questions one can ask:

Did the system accept the write?
How quickly can other components see that write?

Consistency is often more about the second question.

Why consistency feels simple on one machine

If you’re used to single-process programs, your intuition is solid. You write to a variable, you read it back, and you get the new value. There’s no distance, no delay, no disagreement.

Distributed systems break this intuition because time becomes a problem.

Messages take time. Machines fail independently. A write that succeeds on one machine has to travel before other machines know about it. While that information is in transit, different parts of the system are living in different versions of reality.

This isn’t a rare edge case — it’s the default state of distributed systems.

Replication: visibility, not truth

The databases we use are themselves systems made up of many nodes. So naturally, for backups or for better availability, we would like replicas.

But replication of data from one node to another takes time.

While replicas are catching up, they can return older data. The system is inconsistent until the data has been copied everywhere, and that temporary state is what eventual consistency describes.

A subtle but important thing to know is this: replication “mostly” affects what you can see, not what is true.

A write is accepted by the system somewhere. Replication determines how quickly that decision becomes visible elsewhere. Confusing those two leads to a lot of incorrect reasoning about correctness.
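The gap between acceptance and visibility can be sketched in a few lines. This is a toy model, not how any particular database replicates: `Leader`, `Replica`, and `replicate_once` are hypothetical names, and replication lag is simulated by a queue that only drains when asked.

```python
from collections import deque


class Replica:
    """Holds a possibly stale copy of the data."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Leader:
    """Accepts writes immediately and replicates them later."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = deque()  # writes accepted but not yet replicated

    def write(self, key, value):
        self.data[key] = value             # the write is accepted here
        self.pending.append((key, value))  # visibility elsewhere comes later
        return "OK"

    def replicate_once(self):
        """Ship the oldest pending write to every replica."""
        if self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value


replica = Replica()
leader = Leader([replica])

leader.write("photo", "old.png")
leader.replicate_once()

ack = leader.write("photo", "new.png")   # accepted by the leader
stale = replica.read("photo")            # replica still serves the old value
leader.replicate_once()
fresh = replica.read("photo")            # the replica has converged
```

The write was never in doubt: the leader held `new.png` from the moment it answered `OK`. Only the replica's view lagged behind.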

Consistency Models

Strong consistency: Comforting but expensive promise

sequenceDiagram
    participant Client
    participant Leader
    participant Replica1
    participant Replica2

    Client->>Leader: Write X
    Leader->>Replica1: Replicate X
    Leader->>Replica2: Replicate X
    Replica1-->>Leader: Ack
    Replica2-->>Leader: Ack

    Leader->>Leader: Decide X is accepted
    Leader-->>Client: Success (write confirmed)

    Client->>Replica1: Read X
    Replica1-->>Client: X

    Client->>Replica2: Read X
    Replica2-->>Client: X

Strong consistency is the model most people assume by default, even if they don’t use the term.

Behaviorally, strong consistency (often meaning linearizability) means this:

Once a write succeeds, every subsequent read sees that write. In practice this means the system waits for agreement (across all replicas, or across enough of them that no stale copy can answer a read) before confirming success.

The cost of this promise is coordination overhead and latency.

To ensure that everyone agrees immediately, systems have to wait and synchronize. Latency goes up. Availability can go down. You accept those costs when correctness is what matters for your use case.

This is the experience we expect from data where staleness causes panic, like bank balances, inventory counts, or account permissions. If money moved or access changed, seeing stale data is unacceptable.
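A minimal sketch of that tradeoff, assuming the simplest possible rule: the leader confirms a write only if every replica can apply it. `SyncLeader`, `Replica`, and the `up` flag are hypothetical, and real systems typically use quorums or a consensus protocol rather than an all-replicas check.

```python
class Replica:
    def __init__(self):
        self.data = {}
        self.up = True  # toggled to simulate an unreachable node


class SyncLeader:
    """Confirms a write only after every replica has applied it."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        if not all(replica.up for replica in self.replicas):
            # Refusing the write is the availability cost of the promise.
            return "REJECTED"
        for replica in self.replicas:
            replica.data[key] = value   # wait for each replica (latency cost)
        self.data[key] = value
        return "OK"


replicas = [Replica(), Replica()]
leader = SyncLeader(replicas)

first = leader.write("balance", 100)
reads_after_first = [r.data.get("balance") for r in replicas]

replicas[1].up = False                  # one replica becomes unreachable
second = leader.write("balance", 50)
reads_after_second = [r.data.get("balance") for r in replicas]
```

No reader ever sees a stale balance here, but a single unreachable replica is enough to make the system refuse writes.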

Eventual consistency: Acceptable discomfort

sequenceDiagram
    participant C as Client
    participant R1 as Replica 1
    participant R2 as Replica 2
    participant L as Leader

    Note over R1,L: Initial state: x = v1

    C->>L: Write x = v2
    L-->>C: OK

    L-->>R2: Replicate x = v2
    Note over R1: Replication delayed

    C->>R1: Read x
    R1-->>C: x = v1

    Note over L,R1: Eventually replication completes
    L-->>R1: Replicate x = v2

Under eventual consistency, the system guarantees that all observers will eventually converge to the same state — but not necessarily immediately.

The write can be accepted immediately — but its visibility can be delayed.

This temporary disagreement is allowed in this model.

Eventual consistency sounds scary until you realize how often you already rely on it.

Social feeds, likes, view counts, recommendations, and profile updates often use this model.

Seeing an outdated like count for a few seconds doesn’t break trust. Waiting a long time for the system to respond would be a bad user experience.

What’s important here is intent. Eventual consistency isn’t a failure to be strongly consistent; it’s a decision to optimize for responsiveness and availability over immediate agreement.

Other Models

There are many other models, such as causal consistency, read-your-writes, monotonic reads, and monotonic writes. These don’t replace strong or eventual consistency; they refine what a client experiences on top of them. As an example, I go through just one of them, monotonic reads, below.

Monotonic Reads Consistency

sequenceDiagram
    participant C as Client
    participant R1 as Replica 1
    participant R2 as Replica 2
    participant L as Leader

    Note over R1,L: Initial state: x = v1

    C->>R1: Read x
    R1-->>C: x = v1
    Note over C: lastSeenVersion = v1

    C->>L: Write x = v2
    L-->>C: OK
    L-->>R2: Replicate x = v2
    Note over R1: Replica 1 still at x = v1

    C->>R2: Read x
    R2-->>C: x = v2
    Note over C: lastSeenVersion = v2

    C->>R1: Read x (minVersion = v2)
    Note over R1: Replica 1 still at x = v1
    R1->>L: Fetch x >= v2
    L-->>R1: x = v2
    R1-->>C: x = v2

There’s another kind of inconsistency that feels especially jarring, even in systems that are otherwise acceptable.

You read some data.
Later, you read the same data again — and it looks older.

This can happen when one read request went to one replica but the next one went to another replica which was not yet updated.

Nothing about this violates eventual consistency in theory. The system may still converge correctly in the long run. But experientially, this feels wrong. It feels like time moved backwards.

This is where monotonic reads consistency comes in.

Monotonic reads guarantee a simple property: once you’ve observed a particular version of data, you will never see an older version of that same data again. You may not always see the latest update immediately, but you won’t regress.

What’s interesting is that monotonic reads are not about freshness — they’re about directionality. Humans expect systems to move forward, not backward.

This matters more than it initially sounds.

Imagine reading a comment thread, refreshing the page, and suddenly seeing fewer comments than before. Or checking order status and seeing it move from “shipped” back to “processing.” Even if the system eventually fixes itself, user trust takes a hit immediately.
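One way to get this property, following the diagram above, is for the client to remember the newest version it has seen and pass it along as a minimum. The sketch below assumes integer versions and a replica that can catch up from the leader on demand; the class and method names are illustrative, not any real client library.

```python
class Leader:
    def __init__(self):
        self.store = {}  # key -> (version, value)

    def write(self, key, value):
        version = self.store.get(key, (0, None))[0] + 1
        self.store[key] = (version, value)
        return version

    def read(self, key):
        return self.store[key]


class Replica:
    def __init__(self, leader):
        self.leader = leader
        self.store = {}

    def sync(self, key):
        self.store[key] = self.leader.read(key)

    def read(self, key, min_version=0):
        version, value = self.store.get(key, (0, None))
        if version < min_version:
            # Too old for this client: catch up before answering.
            self.sync(key)
            version, value = self.store[key]
        return version, value


class Client:
    def __init__(self):
        self.last_seen = {}  # key -> newest version this client has observed

    def read(self, replica, key):
        floor = self.last_seen.get(key, 0)
        version, value = replica.read(key, min_version=floor)
        self.last_seen[key] = max(version, floor)
        return value


leader = Leader()
r1, r2 = Replica(leader), Replica(leader)
client = Client()

leader.write("x", "v1")
r1.sync("x")
r2.sync("x")
first = client.read(r1, "x")    # v1

leader.write("x", "v2")
r2.sync("x")                    # only Replica 2 has caught up
second = client.read(r2, "x")   # v2

naive = r1.read("x")            # without a minimum version, Replica 1 answers v1
third = client.read(r1, "x")    # with it, the client never goes back to v1
```

Another option is to pin each client to a single replica. That avoids the extra fetch but gives up some flexibility in routing.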

Consistency is a product decision, not just a technical one

One of the most useful things to remember is this: inconsistency is not a bug unless it violates expectations.

Showing the wrong bank balance erodes trust immediately. So it is a bug.

Showing an old profile picture doesn’t (unless your product requirements say otherwise). So it isn’t.

Systems are designed around these expectations, whether explicitly or not.

Once you see consistency as part of user experience and business logic — not just system internals — design decisions start making more sense.

Where this leaves us

We now understand what systems promise, but not how they enforce those promises.

One more thing you can notice — databases “feel” more consistent than caches by default. The answer to why we feel that way is pretty clear once you try to rationalize the concepts we explored so far.

But this feeling can be misleading. Not all databases are strongly consistent in all scenarios, and some deliberately relax consistency under certain conditions.

I will explore this concept further in upcoming articles.

Exactly-Once Delivery is Impossible

Surya Sathi — Sun, 01 Feb 2026 18:30:40 GMT

Background

As we discussed in previous articles, when engineers first encounter distributed systems, they subconsciously carry an assumption from single-machine programming: if something happens, the system knows it happened. If it didn’t happen, the system knows that too. This assumption is rarely stated out loud, but it shows up everywhere: in how we design APIs, how we reason about failures, and how confident we are when we say things like “the request failed.”

We have already dismantled this assumption. We saw that timeouts do not tell us what happened, only that we waited long enough to become uncomfortable. We saw that retries are not sloppy engineering, but a rational response to uncertainty. Once a system operates across processes, machines, or networks, it can no longer observe reality directly. It must act based on incomplete information.

Once this is accepted, a deeper question naturally follows: if systems cannot know with certainty whether an operation happened, how can they ever guarantee that it happened exactly once?

This is where the idea of exactly-once delivery enters the conversation. It is one of the most attractive promises in distributed systems, and can easily be one of the most misunderstood.

Exactly-once delivery sounds like the ideal world. A message is sent once, delivered once, processed once, and never duplicated. Nothing is lost, nothing is repeated, and downstream systems can stay simple. If such a guarantee were possible in the general case, an enormous amount of complexity would disappear. There would be no need for deduplication or idempotency. We could trust the infrastructure to “just handle it.”

It is therefore not surprising that many of us believe this should be achievable. After all, databases often feel exactly-once. Function calls feel exactly-once. Even some distributed systems documentation uses the phrase casually. But this intuition quietly assumes something that no distributed system can have: perfect knowledge of what is occurring.

To talk about this clearly, it helps to introduce the standard delivery semantics used in distributed systems: at-most-once, at-least-once, and exactly-once. These terms represent fundamentally different tradeoffs.

At-most-once delivery

As the name suggests, a message will be delivered at most once, or not at all. If something goes wrong, the system does not intentionally retry. The message may simply be lost. This model aligns closely with the simplest possible behavior: send a request, wait for a response, and if nothing arrives, move on. No retries, no tracking, no recovery.

Note: Technically, clients or load balancers may still retry by default, but the system itself won’t retry intentionally.

The appeal of at-most-once delivery is that duplication is impossible. Because the system never retries, downstream logic never has to worry about processing the same message twice. There is no risk of double-charging a user or duplicating a record due to retries. The system is simple, fast, and predictable.

The cost of this simplicity is correctness. If a request reaches a server, is partially processed, and the response never makes it back to the caller, the system has no way to recover. At-most-once delivery assumes that occasional loss is acceptable. This is why it is best suited to telemetry, metrics, logging, and other best-effort data. If a log line or metric point is dropped, no invariant is violated. The system still functions.

At-least-once delivery

At-least-once delivery sits on the other side of the spectrum. Under this model, the system guarantees that a message will eventually be delivered, even if it has to be delivered multiple times. Loss is unacceptable, so retries are mandatory.

This model emerges naturally once uncertainty is acknowledged. As discussed in previous articles, if a system cannot know whether a message was processed successfully, and correctness matters, the only rational response is to retry. Over time, this guarantees delivery, but it introduces duplication. The same message may be processed twice, concurrently, or long after the original attempt.

Most real-world distributed systems choose at-least-once delivery precisely because it aligns with reality. Load balancers retry. Clients retry. Message queues retry. Even infrastructure you don’t control retries on your behalf.

At-least-once delivery preserves correctness in the face of failure, but only if the system is designed to tolerate repetition. Without additional safeguards, duplication leads to corrupted state: double charges, duplicate orders, repeated notifications, and inconsistent counters. This is why at-least-once delivery must always be paired with idempotent processing. Delivery is allowed to repeat; effects are not.

Exactly-once delivery

Exactly-once delivery appears, at first glance, to offer the best of both worlds: no loss and no duplication. The reason it is so tempting is that it seems like a small step beyond at-least-once. Add acknowledgements. Track offsets. Retry carefully. Surely, with enough bookkeeping, the problem can be solved.

The flaw in this reasoning is not in the bookkeeping, but in the assumption that acknowledgements represent truth.

An acknowledgement is also a message.

It can be delayed, duplicated, reordered, or lost in exactly the same ways as the original message. If a sender does not receive an acknowledgement, it cannot tell whether the message failed or succeeded. This is just another variant of the two generals’ problem we discussed previously. What matters is the consequence: retries are unavoidable, and retries imply possible duplication.
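A small simulation makes the consequence concrete. It assumes a sender that resends until an acknowledgement arrives, and a scripted network in which the message always gets through but chosen acknowledgements are lost. `Receiver`, `send_until_acked`, and `ack_lost_on_attempts` are hypothetical names for illustration.

```python
class Receiver:
    def __init__(self):
        self.deliveries = []

    def deliver(self, message):
        self.deliveries.append(message)
        return "ack"


def send_until_acked(receiver, message, ack_lost_on_attempts, max_attempts=5):
    """Resend until an acknowledgement actually makes it back."""
    for attempt in range(1, max_attempts + 1):
        ack = receiver.deliver(message)       # the message itself got through
        if attempt in ack_lost_on_attempts:
            ack = None                        # but the ack was lost in transit
        if ack is not None:
            return attempt
    raise TimeoutError("gave up without an acknowledgement")


receiver = Receiver()
attempts = send_until_acked(receiver, "charge $10", ack_lost_on_attempts={1, 2})
delivered = list(receiver.deliveries)
```

The sender behaved correctly at every step, and the receiver still saw the message three times.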

This leads to a fundamental conclusion that we need to eventually internalize: exactly-once delivery in the general case is impossible. Not difficult. Impossible. It would require perfect knowledge of events in an imperfect world.

This is why serious systems stop trying to guarantee delivery and instead focus on something more achievable: guaranteeing effects.

Exactly-once delivery may be impossible; its effects are not

Delivery is about movement. It is about messages traveling through space and time. Effects are about state. They are about whether money was charged, whether a record was created, whether a balance was updated. Delivery can be duplicated. Effects must not be allowed to accumulate incorrectly.

Once this distinction is made, the design goal shifts. The system no longer tries to prevent messages from arriving more than once. Instead, it ensures that processing the same message multiple times produces the same final state. Repetition is no longer an error; it is a condition to be handled.

This is what we discussed in the article on idempotency. Idempotency means exactly-once effects. A message might be delivered ten times. The system ensures that the business outcome occurs once.
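As a rough sketch, exactly-once effects can be as simple as remembering which message ids have already been applied. The `Account` class and its `applied` set are assumptions for illustration; this single-threaded version also assumes the check and the update happen atomically, which a real system would have to enforce.

```python
class Account:
    def __init__(self):
        self.balance = 0
        self.applied = set()  # ids of messages whose effect already landed

    def apply_charge(self, message_id, amount):
        if message_id in self.applied:
            return False          # duplicate delivery, no new effect
        self.balance += amount
        self.applied.add(message_id)
        return True


account = Account()
# The same message is delivered ten times.
results = [account.apply_charge("msg-42", 10) for _ in range(10)]
```

Ten deliveries, one effect: only the first call changes the balance.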

This reframing explains why many products have, at various points, claimed to offer exactly-once guarantees without actually contradicting reality.

How products promise exactly-once delivery

Kafka’s “exactly-once semantics” is a good example. Kafka does not guarantee exactly-once delivery across arbitrary consumers, failures, and side effects. What it guarantees is exactly-once processing within a carefully defined boundary.

In Kafka’s case, that boundary is a single Kafka cluster with transactional producers and consumers, where state changes are coordinated with offset commits. A consumer processes records, updates its state, and commits offsets as part of a single transaction. If the consumer crashes before the transaction commits, the transaction is aborted and the records are replayed. If the transaction commits, both the state update and the offset commit become visible together.

Within this boundary, effects appear exactly once.

The hidden condition is crucial: all relevant state must participate in the same transactional system. As soon as side effects escape that boundary—sending an email, calling an external API, charging a credit card—the guarantee no longer applies. So, Kafka did not solve exactly-once delivery in the open world. It solved exactly-once effects inside a controlled domain with a trusted coordinator.
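The same idea can be imitated outside Kafka to see why the boundary matters. The sketch below does not use Kafka's API at all: it uses Python's built-in `sqlite3` as a stand-in, with hypothetical `totals` and `offsets` tables, to show a state update and an offset advance committing (or rolling back) together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE totals (name TEXT PRIMARY KEY, value INTEGER)")
conn.execute("CREATE TABLE offsets (log TEXT PRIMARY KEY, next_offset INTEGER)")
conn.execute("INSERT INTO totals VALUES ('sum', 0)")
conn.execute("INSERT INTO offsets VALUES ('events', 0)")
conn.commit()

records = [5, 7, 11]  # stand-in for a replayable log


def process_next(crash_before_commit=False):
    """Apply one record and advance the offset in the same transaction."""
    (offset,) = conn.execute(
        "SELECT next_offset FROM offsets WHERE log = 'events'"
    ).fetchone()
    if offset >= len(records):
        return "empty"
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "UPDATE totals SET value = value + ? WHERE name = 'sum'",
                (records[offset],),
            )
            conn.execute(
                "UPDATE offsets SET next_offset = ? WHERE log = 'events'",
                (offset + 1,),
            )
            if crash_before_commit:
                raise RuntimeError("simulated crash before commit")
    except RuntimeError:
        return "aborted"
    return "committed"


outcomes = [
    process_next(),
    process_next(crash_before_commit=True),  # record 7 is rolled back...
    process_next(),                          # ...and replayed here
    process_next(),
    process_next(),
]
total = conn.execute("SELECT value FROM totals WHERE name = 'sum'").fetchone()[0]
final_offset = conn.execute(
    "SELECT next_offset FROM offsets WHERE log = 'events'"
).fetchone()[0]
```

The record that was in flight during the crash is processed twice, but its effect appears once, because the sum and the offset live in the same transactional system. An email sent inside `process_next` would not be rolled back.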

Databases make similar promises. When a database executes a transaction guarded by a uniqueness constraint, it guarantees that the transaction’s effects are applied once, even if the client retries after a timeout. This works not because the request arrived once, but because the database enforces atomicity and uniqueness. The database is the authority that decides whether a state change is allowed to occur.

Conclusion

In every real system that claims exactly-once behavior, this same pattern appears. The guarantee is never about delivery itself. It is about effects within a boundary, enforced by a system that has the authority to reject duplicates and serialize conflicts. Once you leave that boundary, the guarantee dissolves.

This is why one needs to be skeptical of broad exactly-once claims. Without precise definitions, the promise is usually illusory.

This conclusion is not pessimistic. It is clarifying. Exactly-once delivery is the wrong goal. Exactly-once effects are achievable, practical, and sufficient. Systems that embrace this reality are easier to recover, easier to reason about, and more honest about their limitations.

Now, if delivery cannot be trusted, then correctness must come from somewhere else. Understanding how systems decide on correctness requires talking about consistency and authority. That is where we go next.

Idempotency

Surya Sathi — Mon, 26 Jan 2026 06:30:14 GMT

When you read about idempotency the first time, it sounds reasonably precise and simple. The definition is short: performing the same operation multiple times should have the same effect as performing it once. On paper, it feels like a solid guarantee. If something can be retried safely, then retries stop being scary.

But the more you try to reason through real scenarios, the harder it becomes to hold that definition true. Idempotency only describes a property a system should have — not the conditions, constraints, or consequences required to actually maintain it.

To understand why, it helps to start with a…

Very ordinary situation

Assume a user added some pizzas to their cart in a food delivery app and, on the checkout screen, clicked pay. Now the client, the user’s mobile app, waits for a response for 30 seconds, 1 minute, 2 minutes, but nothing comes back. Maybe the server was slow. Maybe the response was lost. From the client’s point of view, there’s no reliable way to know what happened. The request might have failed, or it might have succeeded and the response just never arrived.

At this point, retrying is not a design choice but a necessity. Waiting forever isn’t acceptable, and assuming failure is risky. So the client retries the request.

Now imagine you’re responsible for handling that request on the server. The obvious concern is duplication. You don’t want the same user to be charged twice. A natural first idea is to check whether the action has already been performed. If it has, do nothing. If it hasn’t, proceed.

That logic feels careful. It also aligns nicely with the intuitive definition of idempotency. If the operation already happened, don’t repeat it.

You could check whether two requests have the same content, like the same pizzas from the same customer, and if so, mark the second as a duplicate and ignore it. But what if the customer deliberately ordered the same pizza twice? They may have ordered too little the first time and ordered again. Or maybe a friend who shares the account ordered from their own device.

So checking the request content isn’t reliable. What if you check the device IP or some other property to detect a duplicate? You could hash the user id and device IP, send the hash in the headers, and check it on your server. But sending the IP and storing it on your servers may not be good practice.

So you decide to just generate a UUID and send it in the headers. On your server you store it in a cache, and whenever a request arrives, you first check the cache to see whether the request was already received. And you call it…

An Idempotency-Key

If requests were processed strictly one at a time, this approach would be enough. The first request would cache the key and create the payment. Any retry arriving afterward would see that the key exists and exit early.

But you are clever enough to know that it’s not practical. You scale horizontally, and storing state locally on a server is a bad idea, so you use a Redis cluster. Now even if the request first lands on an unfortunately slow server and the retry lands on a faster one, the second server will first check Redis, learn that the request was already received, and return a generic response, say, “your request is being processed”.

But then you recall that your Redis cluster prioritizes availability and performance and replicates asynchronously. And you immediately recognize that this is a problem here.

What if the second server checked for the key in a Redis node that hasn’t been updated yet due to a lag in replication?

Forget Redis. Even with a strongly consistent database, if a request and its retry are processed concurrently, and you first check for the idempotency key and then insert it without making the two steps one atomic operation, you risk duplication.

Both your servers think the system received the request for the first time, both assume it’s safe to proceed and both start processing the request.

Check-then-act no longer works

Our “check then act” logic no longer works when both requests are processed in parallel.

Even if the cache replicated almost instantly, the retry might arrive an hour later, after the cache entry has already expired. The retry then passes the idempotency key check and can be processed again.

Nothing exotic had to happen for this to go wrong. There was no bug in the code. There was no unexpected input. The failure came purely from assuming that the state observed during the check would stay valid long enough for the action to be safe.

The logic was correct in isolation, but unsafe in the presence of concurrency and network delays.
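The unlucky interleaving can be written out step by step. This sketch replays it deterministically in a single thread rather than with real concurrency; `seen_keys` stands in for the shared key store and `charges` for the payment side effect, both hypothetical.

```python
seen_keys = set()   # stand-in for the shared idempotency-key store
charges = []        # stand-in for the payment side effect


def check(key):
    return key in seen_keys


def act(key, amount):
    seen_keys.add(key)
    charges.append(amount)


key = "req-123"

# Interleaving: both handlers run their check before either one acts.
first_saw_key = check(key)    # handler A: key not present
second_saw_key = check(key)   # handler B (the retry): key still not present

if not first_saw_key:
    act(key, 25)              # handler A charges the user
if not second_saw_key:
    act(key, 25)              # handler B charges the user again
```

Each handler ran correct code against the state it observed. The duplicate charge comes entirely from the gap between the check and the act.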

In this setup, ‘check before I act’ stopped being enough. Idempotency has to mean something stronger. So…

What exactly is idempotency?

A more useful way to think about idempotency is in terms of final state. An operation is idempotent if executing it multiple times — whether sequentially or concurrently — leads to the same final state as executing it once. The focus shifts away from how many times the code runs and towards what the system looks like afterward.

Once you think in terms of state, another complication becomes obvious. The system’s state is not limited to database rows. Sending an email changes the world. Sending an SMS changes the world. Calling an external payment provider changes the world. These effects often live outside transactional boundaries and cannot be undone.

A system might successfully prevent duplicate database records and still send two emails. From the database’s perspective, everything is consistent. From a user’s perspective, something is clearly wrong. This makes it clear that idempotency can’t stop at data storage. It has to account for side effects as well.

Now this doesn’t negate the use of idempotency keys. Idempotency keys can recognize duplicates, but they aren’t the complete solution either.

An idempotency key can only protect what is gated behind it. If some side effects happen outside that protection, retries can still cause duplication in places that matter.

At this point, a pattern becomes hard to ignore. A fragile solution relies on checking state and hoping it doesn’t change at the wrong moment. A robust solution seems to require something stronger — a way to ensure that certain states cannot exist at the same time, no matter how requests interleave. So, is there…

A potential solution?

One pattern I could find that seemed to handle this better is gating every sensitive action to prevent duplicates. We can do our check-then-act at the first boundary, but then even in the subsequent parts of the system, we ensure that duplication is not possible.

For example, in this case, you could generate an order id before the request even hits the server, or derive the order id deterministically so that a request and its retries always share it. When both try to create a record in a strongly consistent database, one of them will inevitably fail due to a collision on the order id.
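A minimal sketch of that idea, assuming the order id is derived from the idempotency key with `uuid.uuid5` and that an in-memory `sqlite3` table stands in for the strongly consistent database. `ORDER_NAMESPACE`, `order_id_for`, and `create_order` are hypothetical names.

```python
import sqlite3
import uuid

# Hypothetical namespace for deriving order ids; any fixed UUID works.
ORDER_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def order_id_for(idempotency_key):
    """The same key always maps to the same order id."""
    return str(uuid.uuid5(ORDER_NAMESPACE, idempotency_key))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, item TEXT)")


def create_order(idempotency_key, item):
    """Let the database, not a prior check, decide who wins."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders VALUES (?, ?)",
                (order_id_for(idempotency_key), item),
            )
        return "created"
    except sqlite3.IntegrityError:
        return "duplicate"


original = create_order("key-1", "pizza")
retry = create_order("key-1", "pizza")        # same key, same order id
second_order = create_order("key-2", "pizza") # a deliberate repeat order
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

There is no separate check here to race against: the insert itself is the check. A customer who really does want the same pizza twice gets a new key and therefore a new order.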

Depending on the system, this may or may not be worth the complexity. You may need your own tailor-made solution.

But wait…

What if the instance processing the request crashes at a later stage? The original request has already failed, and retries won’t trigger the work again because the key marks the request as already seen. So what we can do instead is store the progress and position along with the key.

A retry should recheck; not re-execute

Instead of treating a request as a single, all-or-nothing action, we should start treating it as a process with memory.

The idempotency key can no longer represent just “this request has been seen before.” It should represent where the system currently is with respect to that request.

When the request arrives for the first time, the system creates an entry associated with the idempotency key. But instead of only recording that the request exists, it also records progress. A status. A position. A checkpoint.

Something like:

received
payment initiated
payment confirmed
order created
notification sent

The exact steps don’t matter. What matters is that the system now has a way to answer a much more useful question than “have I seen this request?”

It can answer: “How far did I get last time?”

The server successfully initiates the payment, but crashes before sending a response. The client retries. The retry carries the same idempotency key. This time, instead of blindly starting from the beginning or blindly rejecting the request, the system looks up the key and sees that the process is already partway through.

If payment was already initiated, the system doesn’t initiate it again.
If the order record was already created, it doesn’t create another one.
If a notification was already sent, it doesn’t send it again.

Retries should not be treated as duplicates that must be blocked at the gate but as re-entries into a workflow. The same key should lead to the same execution path, starting from wherever the system last stopped.
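Here is a rough sketch of a request treated as a workflow with memory. It assumes an in-process `progress` dictionary as the checkpoint store and a list of illustrative steps; `handle` and `crash_after` are hypothetical. It also assumes that performing a step and recording its checkpoint cannot be separated by a crash, which a real system would need to guarantee, or else make each step idempotent on its own.

```python
STEPS = ["payment initiated", "payment confirmed", "order created", "notification sent"]

progress = {}   # idempotency key -> number of completed steps
effects = []    # every side effect that actually happened


def handle(key, crash_after=None):
    """Run the remaining steps for this key, resuming from the last checkpoint."""
    done = progress.setdefault(key, 0)
    for index in range(done, len(STEPS)):
        effects.append((key, STEPS[index]))   # perform the step
        progress[key] = index + 1             # record the checkpoint
        if crash_after == STEPS[index]:
            raise RuntimeError("simulated crash")
    return "completed"


try:
    handle("req-1", crash_after="payment initiated")
except RuntimeError:
    pass                         # the client sees no response and retries

retry_result = handle("req-1")   # resumes after "payment initiated"
repeat_result = handle("req-1")  # nothing left to do
```

The handler ran three times for the same key, yet each step appears in `effects` once.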

Yes, this is difficult. This means different parts of your system, potentially handled by different teams, must be in sync about how they are handling idempotency. But this may be needed for the problem you are solving.

This is where a subtle but important distinction appears.

Idempotency is not about preventing work from happening more than once. It’s about preventing observable effects from happening more than once.

The system may execute code multiple times. It may retry steps. It may re-run logic after crashes. None of that violates idempotency as long as the outside world cannot observe duplicated outcomes.

“Performing the same operation multiple times has the same effect as performing it once” is technically correct — but only if we’re clear about what effect means. It doesn’t mean “the code ran once.” It means “the world ended up in the same state.”

An important observation

At this point, something uncomfortable becomes hard to ignore.

If idempotency requires memory, progress tracking, guarded side effects, and careful coordination across boundaries, then it’s no longer just a retry optimization. It’s a systemic property — one that depends on how work is claimed, how progress is recorded, and how effects are exposed to the outside world.

And that raises a deeper question.

If we have to tolerate retries, crashes, partial execution, and re-entries at every step — how can you be sure that something happened exactly once?

That’s a discussion for another article.

Why Distributed Systems Can't Know What's Happening

Surya Sathi — Mon, 19 Jan 2026 06:30:54 GMT

Background

When we write programs, we make a quiet assumption that goes unnoticed until it is explicitly questioned: when you send a request, it either succeeds or fails; when you query a database, it either succeeds or fails. This always holds when you develop programs locally, because a local function either returns a value or throws an error, your local database either connects and answers the query or throws an error, and your local API call either reaches your program or fails.

Nobody notices this on local runs because everything runs on the same machine: your program, your database, your cache, and so on. There is no ambiguity about whether an operation ran or not. The program is the authority on its own actions. You get immediate feedback.

The moment a network enters the picture, this no longer holds: your program runs in one place, the database in another, the cache in a third. They communicate through network calls.

Two Generals Problem

There is a well-known thought experiment in distributed systems called the Two Generals Problem. It describes two armies positioned on opposite hills. They want to coordinate an attack, but the only way they can communicate is by sending messengers through a valley held by their common enemy. If the first general sends a message proposing an attack time, the messenger can be captured in the valley, so the second general never receives it. Or the second general receives the message and sends an acknowledgement, which also has to pass through the valley and can also be captured, so the first general does not know whether their message was delivered at all. So the first general sends another message asking for confirmation. Now the problem repeats: that confirmation might also be lost. And so on. Either way, neither general can know for sure whether their messages were delivered.

The core insight of the problem is not tactical or military. It is about certainty. No matter how many messages are exchanged, neither general can ever be completely certain that the other side knows what it knows. At some point, a decision must be made without certainty.

Now, the generals are two systems and the valley is the network. This is not a hypothetical edge case or a missing protocol. It is a fundamental limitation of systems that communicate through unreliable channels. Once messages can be delayed or dropped, certainty becomes impossible.

Every networked system lives inside this constraint.

The Two Generals Problem does not tell us how to fix this situation. It tells us that it cannot be fixed.

Systems that communicate over networks must operate under permanent uncertainty.

Latency is uncertainty

Different components of a distributed system communicate through messages, but when a program sends a message to another program asking it to do something, certainty disappears. The sender no longer has direct visibility into what happened on the other side. It can only wait for a response, and waiting introduces latency.

Latency, in distributed systems, is not just delay. It is uncertainty.

The issue is not that responses are slow, but that the client cannot tell whether a response is “slow” or “never.” When a request is sent across a network, silence does not mean failure, but it does not mean success either.

Silence simply means the system does not know.

Timeout is a guess

Timeouts are how systems cope with this uncertainty: you mark a request as failed if it didn’t respond in time. But timeouts are not facts, they are guesses. A timeout does not tell you that an operation failed; it only tells you that you waited as long as you were willing to.

From the client’s perspective, why it didn’t receive a response doesn’t matter. It just didn’t receive the response in time, so it assumes the request might have failed; that’s the only thing that matters to it.
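
One way to make this explicit in code is to give the client a third outcome besides success and failure. The sketch below is only an illustration: `FlakyServer`, `Outcome` and `send` are hypothetical names, and a lost reply is simulated in-process instead of over a real network.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    UNKNOWN = "unknown"  # timed out: the write may or may not have happened


class FlakyServer:
    """Hypothetical server that applies a write but may lose the response."""

    def __init__(self, drop_response: bool):
        self.drop_response = drop_response
        self.committed = []

    def handle(self, value: str):
        self.committed.append(value)  # the side effect happens first
        if self.drop_response:
            return None               # stands in for a reply that never arrives
        return "ok"


def send(server: FlakyServer, value: str) -> Outcome:
    reply = server.handle(value)
    if reply is None:
        # No reply in time. This is not evidence that the write failed.
        return Outcome.UNKNOWN
    return Outcome.SUCCESS if reply == "ok" else Outcome.FAILURE


server = FlakyServer(drop_response=True)
result = send(server, "update-photo")
```

Here the client ends up with `Outcome.UNKNOWN` even though the server did commit the write, which is exactly the gap a timeout hides.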

Logs are leading and misleading

If a system can sometimes not know what happened, partial failures stop being edge cases and start looking like default conditions.

A system can be half broken in ways that are invisible from any single point of view. A request can reach the server, be processed fully, and commit changes to the database, while the response packet is dropped on the way back to the client. Or the processing may finish in time while the response arrives a millisecond after the client gave up. To either component, that difference does not matter: the server saw a success and the client saw a timeout.

From the server’s logs, the request succeeded. From the client’s logs, it failed. Both logs are accurate. Neither tells the whole story.

This is why logs feel leading and misleading at the same time. Logs from a single process describe what it believes happened. They do not describe what actually happened globally, because they don’t have global visibility. In a distributed system, every component tells a local story, and those stories can contradict each other without any of them being wrong.

Important Note:

This is where observability and tracing come into the picture: they let you observe the lifecycle of a request and get the whole picture. But that’s a story for another time.

Also, while global logs might give you visibility into what happened to a request, they do so after the fact. You notice an issue only when the problem has already started or a user has experienced it, not at decision time. Logs cannot help prevent problems, only diagnose what happened.

Forced to decide in uncertainty

Can’t we just wait longer? No. How do you decide how long a program should wait?

Systems cannot wait forever. Users are on the other side of these systems, and users have expectations. A user clicking a button expects something to happen within seconds, not minutes.

These expectations force systems to make decisions while living in uncertainty.

So distributed systems are not allowed to wait until they know the truth. They must act while still uncertain. They must decide based on incomplete information.

It is a fundamental constraint of systems that communicate over networks. Once messages can be delayed, reordered, or dropped, certainty disappears. The system must move forward anyway.

Retries are not optional

Users expect progress. Products promise availability. When uncertainty appears, the system must choose an action, and the most common action it chooses is to try again.

Retries exist not because engineers love complexity, but because users hate waiting. A slow system feels broken even when it is technically correct. From a user’s perspective, a retry that succeeds is indistinguishable from a system that worked the first time. From a technical perspective, retries smooth over temporary failures.

What makes retries especially dangerous is that they rarely belong to a single place in the system. Even if you decide not to retry in your application, something else almost certainly will. Browsers retry requests when connections are dropped. SDKs retry when you switch networks. Load balancers retry when an upstream appears slow. Message queues redeliver messages when acknowledgements are delayed.

So you cannot fully control retries. You should design for them whether you like it or not.

A retry is not a repeat; it is a correctness problem

One would often think of a retry as a repeat of the same action, as if the system tried again under identical conditions. That is not what a retry is. A retry is not a replay. It is a new attempt after some time, under different load, possibly on a different machine, with different activity surrounding it.

Between the original request and the retry, the state may have changed. Caches may have warmed or expired. Locks may have been acquired or released. Other requests may have modified shared state.

This turns retries into a correctness problem.

Once retries exist, the same logical request can arrive more than once. It can arrive twice in sequence because the client timed out. It can arrive concurrently because a load balancer retried while the original request was still executing. It can arrive after partial success because the server crashed after committing data but before responding. It can even arrive out of order, with a later retry processed before the earlier attempt finishes.

From the system’s point of view, these are not edge cases. They are normal conditions.

Side effects

Up to this point, it is easy to think of a request as something that “runs code” and produces a result. But systems don’t just compute values. They change the reality around them. These changes are what we call side effects.

The moment an action changes something outside the process that executed it, you’ve created a side effect. Writing to a database is a side effect because it changes shared state. Publishing a message to another service is a side effect because it causes the other program to do some work. Sending an email is a side effect. Charging a credit card is a side effect.

Effect of retries on state and side effects

The moment a system processes the same request more than once, state becomes vulnerable. Data that was meant to be written once may be written twice. Counters may increment incorrectly. Emails may be sent twice. Money may be charged twice. Workflows may be triggered twice.

The code often reads as if a request has a clear beginning, a clear end, and a single effect on the world. Retries break that narrative. They turn one logical action into multiple physical executions, each capable of producing side effects.
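
To see how one logical action turns into two physical executions, here is a small sketch. `PaymentService` and `charge_with_retry` are made-up names for illustration; the service applies the charge and then "loses" its first reply, which the client experiences as a timeout.

```python
class PaymentService:
    """Hypothetical service: charges first, then the reply may be lost."""

    def __init__(self, lose_first_reply: bool = True):
        self.charges = []
        self._lose_next_reply = lose_first_reply

    def charge(self, user: str, amount: int) -> bool:
        self.charges.append((user, amount))  # the side effect is applied
        if self._lose_next_reply:
            self._lose_next_reply = False
            raise TimeoutError("no reply in time")
        return True


def charge_with_retry(service, user, amount, attempts=3):
    for _ in range(attempts):
        try:
            return service.charge(user, amount)
        except TimeoutError:
            continue  # the client cannot tell "slow" from "never", so it retries
    return False


service = PaymentService()
ok = charge_with_retry(service, "alice", 100)
```

From the client's side this looks like a clean success (`ok` is `True`), while `service.charges` now holds two entries for the same logical payment.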

What makes side effects different from state is that they are not reversible by default. State can sometimes be corrected. Side effects often cannot.

State lives inside the system and can sometimes be corrected.

Side effects leak outside the system and require compensation, not correction.

If a database row is written incorrectly, it can be updated later. But many side effects cannot be undone cleanly. An email cannot be unread. An API cannot be uncalled. A payment cannot simply be “taken back” without a compensating action that introduces even more complexity.

The cost of duplication is not evenly distributed across actions.

A retry does not just risk doing the same work twice. It risks affecting the world twice.

So, it becomes impossible to dismiss duplicates as rare edge cases. If retries are inevitable, and retries create multiple execution attempts, then duplicate processing is not an accident. It becomes a property of the system.

Correctness must be a property of the system

Correctness is no longer about success in a single execution. It’s about not producing redundant side effects under repeated execution. That is a much harder guarantee to provide, and it cannot be achieved by hoping retries won’t happen.

Once duplicates are guaranteed, correctness must be designed. It cannot be assumed and cannot be bolted on with logging or monitoring. It must be a property of the system, just like latency or throughput.

Now the real question is no longer whether an operation ran but whether running it more than once is safe.

This is where idempotency becomes unavoidable.

Caching: Performance vs Consistency

Surya Sathi — Mon, 12 Jan 2026 06:30:49 GMT

Every computing system, no matter how low level or high level, is constrained by the same fundamental problem: fetching data takes time. Computation itself is fast, but fetching data is slow.

A modern CPU is extraordinarily fast. But what slows a system down is not only processing speed but also access to data. How fast can a system get the data it requires to perform a computation?

Any computation that involves state must fetch that state from somewhere: from registers or the L1/L2/L3 caches at the CPU level, from memory (RAM) at the code level, from disk (SSD/HDD) at the OS level, or from the network at higher levels.

The further you go from the place the data is needed, the longer you wait, the higher the latency is and the lower the performance is.

A way around this is to store copies of the most frequently accessed data, or data predicted to be accessed soon, as near as possible to where it is needed. This is what’s called a cache.

If you need to know why cache is so important, you need to have a look at the physical reality of how long it takes to access data:

Storage Layer	Approximate Latency
CPU Register	~0.3 ns
L1 Cache	~1 ns
L2 Cache	~3–5 ns
L3 Cache	~10–20 ns
RAM	~80–120 ns
SSD	~100 µs
HDD	~5–10 ms
Network	1–100+ ms

Accessing RAM is roughly a hundred times slower than accessing L1 cache. Accessing an SSD is roughly a thousand times slower than accessing RAM. A network call is millions of times slower than a CPU instruction.
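
These ratios follow directly from the table. The snippet below just does the arithmetic, picking one representative value from each range (100 ns for RAM, 10 ms for the network); those picks are my assumption, and real numbers vary by hardware.

```python
NS = 1
US = 1_000 * NS
MS = 1_000 * US

# Representative values taken from the ranges in the table above.
latency_ns = {
    "register": 0.3 * NS,
    "l1": 1 * NS,
    "ram": 100 * NS,      # inside ~80-120 ns
    "ssd": 100 * US,
    "network": 10 * MS,   # inside 1-100+ ms
}

ram_vs_l1 = latency_ns["ram"] / latency_ns["l1"]
ssd_vs_ram = latency_ns["ssd"] / latency_ns["ram"]
network_vs_register = latency_ns["network"] / latency_ns["register"]
```

With these values, RAM comes out about 100x slower than L1, an SSD about 1,000x slower than RAM, and a network call tens of millions of times slower than a register access.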

For most real-world workloads, memory and I/O latency dominates computation time. A modern CPU can execute millions of instructions in the time it takes to fetch data from across the network.

That is why caches exist.

Why do CPUs cache at all?

If a CPU had to fetch every piece of data directly from RAM, most of its time would be spent idle. To avoid this, CPUs keep caches. These caches work because real programs exhibit patterns.

Programs tend to reuse the same data repeatedly, which is known as temporal locality. For example, the same variable in a program is accessed again and again.

They also tend to access data near other recently accessed data, known as spatial locality. When a CPU reads memory, it does not fetch a single byte. It fetches a whole block (a cache line), predicting that nearby data will be needed soon.

As programs grew more complex, a single cache was not enough. Multiple layers emerged: L1 closest to the core, L2 slightly farther, and L3 shared across cores. Each level is larger and slower than the previous one. This hierarchy exists for one reason only: to keep frequently used data as close to computation as possible.

This idea does not stop at the CPU.

The same problem appears in software systems

Once you move beyond a single machine, the same pattern appears at a larger scale. Instead of a CPU trying to read from RAM, you now have applications trying to read from databases. Instead of memory access, you have disk reads and network calls.

A request that crosses a network, or a database query that scans a disk, is orders of magnitude slower than a computation done in memory.

So the same idea reappears: keep frequently used data closer to where it is needed.

Databases cache index pages in memory so they don’t have to reread them from disk. Operating systems cache file blocks. Applications cache computed results. Reverse proxies cache HTTP responses. CDNs cache content close to users.

Each of these is the same idea expressed at a different level.

Caching is inherently risky for consistency

Caching literally means creating a copy of data. This means you are inherently taking the risk that the copy may be stale: the original source might get updated while the cache is not, unless that is done explicitly.

Cache, ideally, should not be the source of truth.

This distinction is crucial. To understand this better, consider what happens when data changes.

A request comes in and reads data from a database. The result is cached somewhere. Later, another request modifies the data in the database, but the cache is not updated at the same time. The cached copy is now wrong. Nothing breaks immediately, but the system is now inconsistent.

Caches are often eventually consistent unless designed to provide stronger guarantees. That is, given enough time and no further updates, the stale data will eventually be replaced or discarded.

A quick note: Based on the strategy used, caching can be eventually or strongly consistent, as you will see soon.

Now a cache doesn’t magically fix itself with time. Somehow, the old data has to be discarded so that new requests will hit the source of truth directly, or it has to be replaced with the newer results. How that happens depends on the caching strategy you choose.

How cache eventually becomes consistent

TTL

The simplest mechanism is time. Cached data is stored with a time limit, called a Time-To-Live (TTL). Once that time expires, the entry is discarded. The next request fetches fresh data from the source and repopulates the cache.

This is simple but effective. The system risks a known window of inconsistency in exchange for speed. The TTL varies with the use case: 60s, 15m, 1h and so on.

Here caches are eventually consistent, because stale data eventually expires as long as the TTL is finite.
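
A minimal sketch of a TTL cache, to make the "known window of inconsistency" concrete. `TTLCache` is a hypothetical class, a plain dict stands in for the database, and the clock is injected so the example does not have to sleep.

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the example is easy to test
        self._entries = {}      # key -> (value, expires_at)

    def get(self, key, load):
        entry = self._entries.get(key)
        now = self.clock()
        if entry is not None and now < entry[1]:
            return entry[0]     # not expired yet: may be stale vs the source
        value = load(key)       # expired or missing: go to the source
        self._entries[key] = (value, now + self.ttl)
        return value


# Usage with a fake clock and a dict standing in for the database.
now = [0.0]
db = {"name": "old"}
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])

first = cache.get("name", db.get)   # miss, loads "old"
db["name"] = "new"                  # the source changes
stale = cache.get("name", db.get)   # within TTL: still "old"
now[0] = 61.0
fresh = cache.get("name", db.get)   # TTL expired: reloads "new"
```

The stale read in the middle is the price of the TTL approach; shortening the TTL narrows that window at the cost of more loads from the source.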

Explicit Invalidation

More sophisticated systems try to be smarter. When data is updated, the cache entry may be explicitly removed or replaced. This requires the system to know exactly which cache entries are affected by each write.

For example, when a user updates their profile, the service writes to the database and also deletes or updates the cache entry.

This reduces staleness, but introduces complexity.

Because now the system must know which cache keys are affected, update or invalidate them reliably, and handle any failures during invalidation. If invalidation fails, stale data persists.
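
As a rough sketch of this flow, assume two plain dicts standing in for the database and the cache. The function names and the `invalidate` flag (used here only to simulate a failed invalidation) are hypothetical.

```python
db = {}
cache = {}


def read_profile(user_id):
    if user_id in cache:
        return cache[user_id]
    value = db.get(user_id)
    cache[user_id] = value
    return value


def update_profile(user_id, profile, invalidate=True):
    db[user_id] = profile         # 1. write to the source of truth
    if invalidate:
        cache.pop(user_id, None)  # 2. drop the now-outdated copy


db["u1"] = {"photo": "old.png"}
read_profile("u1")                           # populates the cache
update_profile("u1", {"photo": "new.png"})
after_invalidation = read_profile("u1")      # reloads from db

# Simulate an invalidation that never happened.
update_profile("u1", {"photo": "newer.png"}, invalidate=False)
after_failed_invalidation = read_profile("u1")  # stale copy persists
```

The last read still returns `new.png` although the database already holds `newer.png`: the write succeeded, the invalidation did not, and nothing in the read path can tell.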

Eviction Policies

When the cache is full, you will have to decide what to keep and what to discard. This is where eviction policies come into the picture. While TTL and invalidation decide when data becomes invalid, eviction policies decide what to remove when the cache is full.

LRU

Least Recently Used is a popular strategy that discards the least recently accessed items first.
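
A small sketch of LRU using Python's `collections.OrderedDict`, which keeps keys in insertion order and lets us move a key to the end on every access. `LRUCache` is a hypothetical class for illustration, not a production cache.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest access first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used

    def keys(self):
        return list(self._items)


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used
cache.put("c", 3)  # over capacity: "b" is evicted
```

After these calls the cache holds `a` and `c`; `b` was evicted because it was touched least recently.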

LFU

In Least Frequently Used, instead of the least recently accessed items, you evict the least frequently accessed ones.
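
A sketch of the same idea with access counts instead of recency. `LFUCache` is a hypothetical, deliberately naive class: it scans all keys to find the victim, which is fine for an example but not for a large cache.

```python
from collections import Counter


class LFUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = {}
        self._hits = Counter()  # key -> number of accesses

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._hits[key] += 1
        return self._items[key]

    def put(self, key, value):
        if key not in self._items and len(self._items) >= self.capacity:
            # Evict the key with the fewest recorded accesses.
            victim = min(self._items, key=lambda k: self._hits[k])
            del self._items[victim]
            del self._hits[victim]
        self._items[key] = value
        self._hits[key] += 1


cache = LFUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")
cache.get("a")
cache.get("b")     # "b" is the most recent access, but used less often than "a"
cache.put("c", 3)  # evicts "b", the least frequently used
```

Note that `b` is evicted even though it was accessed most recently; a recency-based policy would have evicted `a` here.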

FIFO

This is First-In-First-Out. That is, whichever entry was cached first gets evicted first, regardless of how often it is used.

Caching Strategies

There are different ways one can choose to cache.

Cache-Aside

This is the most commonly used strategy.

flowchart LR
    R((Request))
    S[Service]
    C[Cache]
    DB[Database]
    R --> S --> |1 - Check in cache| C
    S --> |2 - Not found in cache, query DB| DB --> |3 - Response from DB| S
    S --> |4 - Cache the response| C
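
The four steps in the diagram can be sketched like this. `CacheAsideService` is a hypothetical name, and plain dicts stand in for the cache and the database.

```python
class CacheAsideService:
    def __init__(self, db: dict):
        self.db = db
        self.cache = {}
        self.db_reads = 0  # only here to show how often the db is hit

    def get(self, key):
        if key in self.cache:      # 1. check the cache
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)   # 2-3. miss: query the database
        self.cache[key] = value    # 4. cache the response
        return value


service = CacheAsideService(db={"user:1": "Surya"})
first = service.get("user:1")   # miss, goes to the db
second = service.get("user:1")  # hit, served from the cache
```

The service owns the caching logic here: the cache itself knows nothing about the database.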

Read-Through Cache

Here, the service talks only to the cache, never directly to the DB. That’s why it’s called a read-through cache. The problem is that the cache becomes a critical part of the path.

A real-world example is DynamoDB Accelerator (DAX) on AWS: it serves results from its cache and, if an item is not found, fetches it from DynamoDB, returns the result, and stores it in the cache for future use.

flowchart LR
    R((Request))
    S[Service]
    C[Cache]
    DB[Database]
    R --> S --> C --> S
    C --> |Not found in cache, query DB| DB

Write-Through Cache

Here, writes go to the cache first, and the cache becomes responsible for forwarding the write to the DB. Since the cache is now part of the write path, its availability directly affects system correctness.

flowchart LR
    R((Write))
    S[Service]
    C[Cache]
    DB[Database]
    R --> S --> C --> DB
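
A minimal sketch of the write path, assuming a dict as the database. `WriteThroughCache` is a hypothetical class: it receives the write and forwards it to the database synchronously before the call returns.

```python
class WriteThroughCache:
    def __init__(self, db: dict):
        self.db = db
        self.cache = {}

    def write(self, key, value):
        self.db[key] = value     # forwarded synchronously; a db error would propagate
        self.cache[key] = value  # cached only after the db accepted the write

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        return self.db.get(key)


db = {}
cache = WriteThroughCache(db)
cache.write("user:1", "Surya")
in_db = db.get("user:1")
in_cache = cache.read("user:1")
```

Every write pays the database latency, but the cache and the database agree as soon as `write` returns.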

Write-Behind Cache

Here, writes initially go to the cache only. The DB is updated later, i.e., asynchronously. This gives extremely fast writes but comes with a serious tradeoff: if the process crashes before the database update completes, data can be lost. This trades durability for throughput.

flowchart TD
    subgraph Immediate
        direction LR
        R((Write))
        S[Service]
        C[Cache]
        R --> S --> C
    end

    subgraph Asynchronous
        W[Worker]
        DB[Database]
        W --> C
        W --> DB
    end
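
The same flow as a sketch, with the background worker replaced by a `flush` method that we call by hand. `WriteBehindCache` is a hypothetical class, and a dict stands in for the database.

```python
from collections import deque


class WriteBehindCache:
    def __init__(self, db: dict):
        self.db = db
        self.cache = {}
        self.pending = deque()  # writes not yet persisted

    def write(self, key, value):
        self.cache[key] = value  # fast path: cache only
        self.pending.append((key, value))

    def flush(self):
        """What the background worker would do, called by hand here."""
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value


db = {}
cache = WriteBehindCache(db)
cache.write("cart:1", ["book"])
visible_before_flush = "cart:1" in db  # a crash at this point would lose the write
cache.flush()
visible_after_flush = "cart:1" in db
```

The window between `write` and `flush` is where the durability risk lives.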

Refresh-Ahead

The cache keeps serving the existing (possibly stale) entry while refreshing it in the background before its TTL expires. This is generally used by CDNs. It helps avoid cache stampedes, where many requests hit the DB simultaneously once a cache item’s TTL expires, making it run the same query repeatedly for all those requests before the response is cached again.
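
A rough sketch of refresh-ahead, under a few assumptions of mine: a `refresh_after` threshold below the TTL decides when an entry is "close to expiry", the background worker is simulated by calling `run_refresh` by hand, and the clock is injected. `RefreshAheadCache` is a hypothetical class.

```python
class RefreshAheadCache:
    def __init__(self, ttl, refresh_after, load, clock):
        self.ttl = ttl
        self.refresh_after = refresh_after  # must be smaller than ttl
        self.load = load
        self.clock = clock
        self._entries = {}       # key -> (value, stored_at)
        self.to_refresh = []     # keys a background worker should reload
        self.blocking_loads = 0  # loads that a request had to wait for

    def get(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is None or now - entry[1] >= self.ttl:
            self.blocking_loads += 1
            return self._store(key)      # cold or expired: the request waits
        if now - entry[1] >= self.refresh_after and key not in self.to_refresh:
            self.to_refresh.append(key)  # close to expiry: schedule a reload
        return entry[0]                  # serve the cached value meanwhile

    def run_refresh(self):
        while self.to_refresh:
            self._store(self.to_refresh.pop())

    def _store(self, key):
        value = self.load(key)
        self._entries[key] = (value, self.clock())
        return value


now = [0.0]
db = {"k": "v1"}
cache = RefreshAheadCache(ttl=60, refresh_after=45, load=db.get, clock=lambda: now[0])
cache.get("k")           # cold: one blocking load
db["k"] = "v2"
now[0] = 50.0
served = cache.get("k")  # still "v1", but a refresh is scheduled
cache.run_refresh()      # the background worker reloads "v2"
now[0] = 70.0
later = cache.get("k")   # past the original expiry, yet no request had to wait
```

Only the very first request waits on the source; afterwards the entry is reloaded before it can expire.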

Tradeoffs

Choice	Benefit	Cost
Long TTL	Very fast reads	Stale data
Short TTL	Fresher data	More DB load
Cache-Aside	Simple and mostly fault-tolerant	Can cause cache stampedes and stale reads if invalidation fails
Read-Through	Low latency	Cache becomes a critical component and consistency depends on write strategy
Write-Through	Strong consistency	Higher latency and cache becomes a critical component
Write-Behind	High throughput and low latency	Risk of data loss
Refresh-Ahead	Helps avoid cache stampedes	Increased complexity with adding background workers and risk of refreshing unnecessary data

Why systems slow down after restarts

When a cache is empty, it is called cold. Every request must go all the way to the database or backend service. Latency is high, and load spikes sharply.

As requests flow through the system, popular data accumulates in the cache. Over time, the cache becomes warm. Requests are served quickly, load drops, and the system stabilizes.

This is why systems often feel slow immediately after deployment or restart. The cache has no memory yet. Large systems often pre-fill caches with known hot data to avoid this cold-start problem. Others accept the temporary slowdown.

How to use cache effectively?

If a cache refreshes too aggressively, performance collapses. If it holds data too long or invalidation fails, you see outdated results. If many requests miss the cache at once, a cache stampede occurs. Because the cache responds so fast, you might not even notice poor database queries, which then become a problem during a cache stampede.

This is why you must design the system to work as efficiently as possible without a cache in mind, and add the cache only as an additional layer for better performance.

Anything that requires strong correctness — financial balances, authorization, inventory counts — must not be cached. You can sacrifice performance for consistency here. Anything that can tolerate inconsistency — like home page feed, user profiles — can be cached safely.

A piece of advice I found: do not start by thinking about what to cache; start by pointing out what not to cache.

If you add cache, your system should be faster but still be correct where it matters — don’t cache what must not be cached. And if you remove cache, your system may get slow but should still be correct without any breaks — don’t rely on cache too much.

CDN: Caching at the Edge

Until now, we’ve talked about caching within a system. But the same latency problem exists at a global scale. When users are geographically far from servers, network latency dominates everything else.

For example, a simple website might be hosted in the US, but it might have many users in India. If every request for the website goes across continents to the system in the US, it will incur a lot of latency. So Content Delivery Networks (CDNs) cache static assets like HTML, CSS, JS and media files near the edge, i.e., near the users, to deliver them faster.

A CDN reduces latency by reducing the physical distance. But you will need to invalidate the cache actively, if you want any changes to appear immediately.

Tradeoffs

Benefit	Cost
Low latency	Eventual consistency
Lower load on origin server	Hard invalidation
High availability	Less control (as you typically don’t control the CDN)
Global scale	Debugging complexity (as you typically don’t control the CDN)

A simple example of the risk is Cloudflare’s CDN outages that occurred recently (2022, 2025): you can’t be confident whether the issue is with your site or the CDN until it is identified as a CDN incident.

State in a System

Surya Sathi — Mon, 05 Jan 2026 06:30:52 GMT

Assume you are making a very simple login system. A user sends a request with a username and password. Let us list what needs to hold in the system.

If the credentials are valid, a session should be created.
All the authorized actions by that user after login should be allowed until session expiry.
If the credentials are not valid, failed attempts counter should be incremented.
A user must not appear logged in unless they actually are.
A user must be logged in through only one device at a time.
Logout must reliably revoke access.
No user should gain access due to race conditions.
Login should be fast.
Auth checks should be cheap (they happen on every request).
System must support many concurrent users.
Requests may be retried.
State must not be corrupted by retries or races.

What is a State?

When a user authenticates with their credentials, they expect all further requests to be allowed. To do that, the system must remember that the user has already logged in, in the form of some data stored somewhere. That remembered data is called state.

What that remembered data is will change based on the problem statement you are dealing with.

The most obvious piece of state in this system, is the session itself — some representation that the user is authenticated. This can be a session ID or a token, stored in a cache or a database record. But once you start looking closely, you will find more pieces of state. This system needs to remember a lot more data - failed login attempts, expiry times, user permissions/roles for RBAC etc.

A useful way to recognize state is to ask a simple question: does the correctness of this request depend on something that happened earlier? If it does, you are dealing with state.

Once you see it this way, state appears everywhere.

Local State

graph TD
    U[User] --> |Login request| A[Server A]
    U --> |Other request| B[Server B]

    A -->|Check| M1[Session present in Memory A]
    B -->|Check| M2[Session not present in Memory B]

    A -->|Accepted| U
    B -->|Rejected| U

Initially, imagine all of this running on a single server. When a user logs in, the server stores the session in memory. When the user makes another request, that same server checks its memory and validates the session. Everything works because there is only one place where truth lives.

Problems begin when we introduce horizontal scaling.

If there are multiple servers, requests can land on any of them. If session state remains stored in the memory of the server that handled the login request, other servers will not see it. This shows the major limitation of local state.

Local, in-memory state only works when there is exactly one server handling all requests. Once traffic can land on multiple servers, state must move somewhere shared.
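
The diagram above can be reproduced in a few lines. `Server` is a hypothetical class whose session set lives only in that object's memory, just like a process-local session store.

```python
class Server:
    def __init__(self, name: str):
        self.name = name
        self.sessions = set()  # lives only in this process's memory

    def login(self, user: str):
        self.sessions.add(user)

    def is_logged_in(self, user: str) -> bool:
        return user in self.sessions


server_a, server_b = Server("A"), Server("B")
server_a.login("alice")                     # login lands on server A
seen_by_a = server_a.is_logged_in("alice")  # A remembers the session
seen_by_b = server_b.is_logged_in("alice")  # B has never heard of it
```

The user is logged in or not depending on which server the load balancer happens to pick.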

Shared State

graph TD
    U[User] --> |Login request| A[Server A]
    U --> |Other request| B[Server B]

    subgraph Shared State
        DB[(Session Store)]
    end

    A -.-> |Stored| DB
    B -.-> |Retrieved| DB

    A -->|Accepted| U
    B -->|Accepted| U

So we move session state into a database or a distributed cache. Now every server can read and write the same session data. This fixes the immediate correctness issue, but it introduces a new set of tradeoffs.

Accessing shared state requires network calls instead of memory access. These calls are slower, can fail, and can return outdated information depending on how the storage system works. The system is now relying on an external component to provide a consistent view of state.

Statelessness

At this point, application servers are often described as stateless. What this really means is that servers do not own authoritative state in their own memory or disk. Any server can handle any request because all necessary state lives elsewhere.

Statelessness doesn’t mean that state doesn’t exist but that it exists somewhere else — databases, caches, or other external systems.

Correctness: Consistency and Order

An important consideration that emerges is reads vs writes. Most requests simply check (that is, read) the state: Is this session valid? Does this user have permissions?

Writes happen less frequently: session creation happens once when you log in, and session deletion happens once when you log out.

This distinction matters because reads happen more often and are also easier to scale. But when dealing with writes, the question of correctness appears.

Now assume that the state is also scaled across multiple servers to maintain availability — just like your login application.

Strong vs Eventual Consistency

Consider a user logging in. A server validates credentials and writes a new session to the database. The database confirms the write. Immediately after, the user sends another request that lands on a different server, which reads the session from the database.

The system must answer a precise question: after a successful login, should other servers be able to observe that session almost immediately? Or is it okay, if the updates are slightly delayed?

If the database guarantees that once it acknowledges a write, all subsequent reads will observe it, then the system provides strong consistency. In this case, the database ensures that the write is fully committed before responding, and that any server reading afterward sees the same result.

But what often happens is that the database acknowledges a write while some replicas still return stale data for a short period. This is called eventual consistency. The difference between the two lies in how much coordination the database system enforces before acknowledging a write.

This guarantee of strong consistency is not free. Internally, the database must decide when a write is considered complete. If data is replicated, the database must decide whether to wait for all replicas, some replicas, or just one before acknowledging success. If one replica is slow or unreachable, the database must decide whether to wait, reject the write, or proceed anyway. These decisions affect both correctness and latency.

And for eventual consistency, whether it is acceptable depends on the system’s requirements. For login and authentication, immediate visibility is often expected. For other types of data, short delays may be acceptable.

Order

sequenceDiagram
    participant C1 as Client 1
    participant C2 as Client 2
    participant DB

    C1->>DB: Login
    C2->>DB: Logout
    DB-->>C1: OK
    DB-->>C2: OK

Now imagine a user clicks login on one device and clicks logout of all devices almost immediately on another device. These two operations may be handled by different servers and reach the database close together in time. The final state depends entirely on the order in which the database applies these updates.

If the system applies the login write first and the logout write second, the user ends up logged out. If it applies them in the opposite order, the user ends up logged in. Both requests are valid, but the outcome depends on ordering.

In this case, the application servers may not be in charge of this order; the database may be. But whichever component is in charge, it must impose a single, consistent sequence on concurrent writes to the same piece of state so that all readers observe the same result.
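
A tiny sketch of why the order matters, with the session reduced to a single string. `apply_op` and `final_state` are hypothetical helpers; the point is only that the same two operations give different results in different orders.

```python
def apply_op(state: str, op: str) -> str:
    if op == "login":
        return "logged_in"
    if op == "logout":
        return "logged_out"
    raise ValueError(f"unknown operation: {op}")


def final_state(ops, initial="logged_out"):
    state = initial
    for op in ops:
        state = apply_op(state, op)
    return state


login_then_logout = final_state(["login", "logout"])
logout_then_login = final_state(["logout", "login"])
```

Both sequences contain exactly the same requests; only the component that picks the order decides whether the user ends up logged in.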

Locking

This ordering problem becomes even more visible when multiple requests attempt to modify the same data concurrently. Assume multiple failed login attempts incrementing a counter at the same time.

If these updates are applied without control, one update may overwrite another, leading to incorrect counters. To prevent this, databases use locking, alongside alternatives such as optimistic concurrency and version checks. Regardless of the mechanism, the goal is the same: ensure conflicting updates do not silently overwrite each other. With locking, while an update is being applied to a record, any other update to that record must wait or fail. This ensures that updates are applied one at a time and in a well-defined order.

Locking is not something the application servers coordinate explicitly. It is enforced by the database as part of managing shared state correctly.
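
To make the lost update concrete, here is a sketch that interleaves two increments by hand, first with blind writes and then with a version check (the optimistic alternative mentioned above). `Record`, `blind_write` and `compare_and_set` are hypothetical names; a real database does this internally.

```python
class Record:
    def __init__(self):
        self.value = 0
        self.version = 0


def read(record):
    return record.value, record.version


def blind_write(record, new_value):
    record.value = new_value
    record.version += 1


def compare_and_set(record, expected_version, new_value):
    """Optimistic check: apply only if nobody wrote since we read."""
    if record.version != expected_version:
        return False
    record.value = new_value
    record.version += 1
    return True


# Two failed logins read the counter at the same time, then both write.
lost = Record()
a_value, _ = read(lost)
b_value, _ = read(lost)
blind_write(lost, a_value + 1)
blind_write(lost, b_value + 1)  # overwrites the first increment

safe = Record()
a_value, a_version = read(safe)
b_value, b_version = read(safe)
first_ok = compare_and_set(safe, a_version, a_value + 1)
second_ok = compare_and_set(safe, b_version, b_value + 1)  # rejected: stale version
if not second_ok:  # re-read and try again
    b_value, b_version = read(safe)
    compare_and_set(safe, b_version, b_value + 1)
```

With blind writes the counter ends at 1 after two failed attempts; with the version check the second writer is forced to re-read, and the counter ends at 2.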

Transactions

Often, a login operation involves multiple related updates. A session is created, failed login counters are reset, and timestamps are updated. If one of these changes succeeds and another fails, the system ends up in an inconsistent state. To prevent this, databases provide transactions, which allow a group of changes to be applied together, or rolled back entirely if any one of them fails.
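
A small sketch using Python's built-in `sqlite3` module and an in-memory database. The table names and the `login` helper are made up for this example; a `None` token is used to force the second statement to fail so we can watch the first one roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (user TEXT PRIMARY KEY, token TEXT NOT NULL)")
conn.execute("CREATE TABLE failed_logins (user TEXT PRIMARY KEY, count INTEGER NOT NULL)")
conn.execute("INSERT INTO failed_logins VALUES ('alice', 3)")
conn.commit()


def login(conn, user, token):
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE failed_logins SET count = 0 WHERE user = ?", (user,))
            conn.execute("INSERT INTO sessions VALUES (?, ?)", (user, token))
        return True
    except sqlite3.IntegrityError:
        return False


def failed_count(conn, user):
    row = conn.execute("SELECT count FROM failed_logins WHERE user = ?", (user,)).fetchone()
    return row[0]


failed = login(conn, "alice", None)  # NOT NULL violation on the second statement
count_after_failure = failed_count(conn, "alice")

succeeded = login(conn, "alice", "token-1")
count_after_success = failed_count(conn, "alice")
```

After the failed attempt the counter is still 3, because the reset was rolled back together with the failed insert; only the successful login resets it to 0.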

Failure is in POV

Suppose a login request reaches the server, is validated, and a session is created, but the response is lost due to some network issue. The system has no way of knowing what happens to the response once it leaves its boundary, so it regards the request as succeeded.

Now, if you look at it from system’s POV:

Session created → Succeeded
Session not created → Failed

If you look at it from client’s POV:

Received success response → Succeeded
Received failed response or didn’t receive one at all → Failed

But here, the request succeeded from the system’s view and failed from the client’s view at the same time.

Retries

Now the user sends another login request:

If the system blindly creates another session in the DB, a valid request will produce duplicate sessions.
If the retried request fails, the failed-attempts counter will be incremented even though there is already a valid session.

To avoid this, operations must be designed so that retrying them does not cause incorrect state. This property is called idempotency.

An idempotent operation can be executed multiple times safely. The result is the same as if it were executed once.

In this use case, this means that when a user retries, the system must recognize the repeated request and either return the existing session or invalidate the existing one and create a new session.
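
A sketch of the "return the existing session" option. It assumes the client attaches a `request_id` to each login attempt and reuses it on retry; that identifier, and the `SessionStore` class, are my assumptions for illustration and not something a login system gets for free.

```python
import uuid


class SessionStore:
    def __init__(self):
        self.sessions = {}    # user -> session id
        self.by_request = {}  # request id -> session id already returned

    def login(self, user: str, request_id: str) -> str:
        # A retry carries the same request_id, so it gets the same answer.
        if request_id in self.by_request:
            return self.by_request[request_id]
        # Reuse a live session for this user instead of creating a duplicate.
        session_id = self.sessions.get(user) or uuid.uuid4().hex
        self.sessions[user] = session_id
        self.by_request[request_id] = session_id
        return session_id


store = SessionStore()
first = store.login("alice", request_id="req-1")
retry = store.login("alice", request_id="req-1")  # response was lost, client retries
```

Both calls return the same session id, and the store still holds exactly one session for the user.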

Considering Transactions

So far, transactions have helped us keep related changes consistent. But transactions have limits.

They work well when:

All data lives in one database
The database can enforce locks
Failures are rare and short-lived

They struggle when:

Data spans multiple systems
Systems fail independently

Imagine your login flow now touches:

A database like Postgres
A cache like Redis
A rate-limiting service
An audit log

If any one of these fails midway, a transaction that spans all of them becomes slow, fragile, or impossible.

So you will be forced to either accept that the system can be inconsistent for a while before eventually reaching consistency, or design the system to handle this chaos and provide strong consistency.

Performance vs Correctness

Every decision we have discussed so far has a cost:

If you want correctness, with:

immediate visibility of writes
strict ordering
transactional guarantees

Then your system must:

Wait for confirmations
Reject or delay requests

This means a login request may block longer, fail more often during outages within the system, or require retries more often — all of which affect user experience.

If you want better performance with:

Faster reads
Faster writes

You have to accept:

Data may be outdated sometimes
Some clients may receive new data while others receive older data.

So you are trading off consistency.

Find something hidden so far

If you look at all the decisions so far (deciding the order of writes, checking the availability of replicas, replicating data across them, locking, transactions), the database has been making decisions on your behalf to preserve correctness. This is what we call coordination.

But as systems grow and you stop depending only on a database and start including other components like caches or even other types of databases, your system’s state gets distributed — different parts of state reside in different places.

That means, you can’t depend on any external component for ensuring correctness of operations.

State is unavoidable, and when state is distributed, you must coordinate between the places that hold it. This is the moment when coordination stops being implicit and becomes explicit.

The question now is not “how do we scale?” anymore but “how do we handle coordination among different components in the system?”

Scaling & Load Balancing

Surya Sathi — Mon, 29 Dec 2025 06:30:34 GMT

If you have a single road, the 100 cars might take 5 minutes on average to cross the road. The throughput here is 20 cars/minute and latency is 5 minutes.

But if you have 5 roads, they might take only 2 minutes on average. So the throughput increased to 50 cars/minute and latency reduced to 2 minutes.

Adding roads both increases throughput and reduces latency because it increases capacity.

We ended the Latency vs Throughput article with this example. It is a bit simplified, as you will notice soon. For now, you can interpret it in one of two ways:

Adding more physical hardware like more CPU cores, memory etc. to a single server, so you get more and faster queues to process the requests. Or,
Adding more identical servers, so if a server is already under load, another server will take up new requests.

The bottom line is that scaling increases system capacity, so under load you get lower latency.

This is the core idea of scaling. Whichever method you choose, you are trying to widen the bottleneck to reduce contention. In vertical scaling, you make a single server bigger so that it can handle more requests; in horizontal scaling, you add more servers and distribute the requests between them.

What Vertical Scaling Is

You generally improve your server in one of the following ways, the core idea being to reduce latency and increase throughput:

If your application is CPU heavy, you add CPU cores so that more incoming requests can execute in parallel.
If handling more requests needs more memory than is available, you add RAM.
If it is disk I/O heavy, you upgrade to faster SSDs to reduce request processing time.

Benefits

Since only one server will be communicating with your databases, you will likely have fewer data inconsistency issues.
Since this is the only server serving requests, you can often simplify state management by keeping state local.
By using an in-memory cache and local disk for storage, you avoid the comparatively slower network calls to cache and database servers.

Tradeoffs and Limits

Obviously, you have a single point of failure. If this system goes down, all the incoming requests will fail.
As you add more processes and threads, you have to be careful with in-memory locking and database atomicity to prevent race conditions.
You can scale hardware only so far before hitting either a physical limit or a budget limit. As you add more RAM, more disk, or a better CPU, your costs start increasing non-linearly.

Essentially, a bigger machine does not eliminate contention. You are still carrying the unsolved risk of a single point of failure, the need to test the code rigorously for race conditions, and the ever-growing cost of raising physical limits to keep up with more and more requests. Beyond a point, vertical scaling also hits diminishing returns due to shared resources and serial execution paths.

This makes horizontal scaling not a choice, but a necessity.

What Horizontal Scaling Is

Instead of continuing to bulk up one server, you add more regular servers to handle more requests. So instead of one big queue, you have many smaller queues. This lowers the latency per node and increases overall throughput.

Now, in order to distribute the requests between these servers, you will use a load balancer.

Load Balancer

Since you have multiple servers and each of them will have their own IP address, how will you choose which IP to map your DNS domain to?

Simple. Add another server in front of them and map its IP. When requests hit this mediator server, it routes them to the servers that actually serve the requests, gets the response, and sends it back. This mediator is called the load balancer because it balances the load among the servers by routing the requests.

graph TD
    R1((Request 1))
    R2((Request 2))
    R3((Request 3))
    R4((Request 4))
    R5((Request 5))
    LB{Load Balancer}
    N1[Node 1]
    N2[Node 2]
    N3[Node 3]
    N4[Node 4]
    N5[Node 5]
    R1 --> LB
    R2 --> LB
    R3 --> LB
    R4 --> LB
    R5 --> LB
    LB --> N1
    LB --> N2
    LB --> N3
    LB --> N4
    LB --> N5

But how will you route the requests?

Will you choose one of the servers at random? Or maybe you send the first request to server 1, the next one to server 2, the next one to server 3, and so on. This strategy is called Round Robin. Or maybe you keep track of how many requests each server is currently serving and route new requests to the one with the fewest. This strategy is called Least Connections. Each strategy balances the load in a different way.
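The two named strategies can be sketched in a few lines. This is an illustration only, not how any particular load balancer implements them; the class names and the server labels `s1` to `s3` are made up:

```python
import itertools


class RoundRobin:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        # Hand out servers in a fixed rotation, ignoring how busy they are.
        return next(self._cycle)


class LeastConnections:
    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # min() returns the first server with the fewest active connections.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Called when a request on that server finishes.
        self.active[server] -= 1


rr = RoundRobin(["s1", "s2", "s3"])
rr_order = [rr.pick() for _ in range(4)]      # s1, s2, s3, then s1 again

lc = LeastConnections(["s1", "s2", "s3"])
a, b, c = lc.pick(), lc.pick(), lc.pick()     # one connection on each server
lc.release("s2")                              # s2 finishes first
next_pick = lc.pick()                         # s2 now has the fewest
```

Round Robin needs no feedback from the servers, while Least Connections only works if the balancer is told when a request finishes.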

Apart from this, the load balancer also keeps track of the health status of each server, typically by calling a health API endpoint that you expose, so that it routes requests only to the healthy ones.
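As a rough sketch of that filtering step, assume the balancer has some `probe` function that returns True for a healthy server. In a real deployment the probe would be a periodic HTTP call to the health endpoint; here `fake_probe` and the server names are invented so the example runs without a network:

```python
def healthy_servers(servers, probe):
    """Return only the servers whose health probe succeeds."""
    alive = []
    for server in servers:
        try:
            if probe(server):
                alive.append(server)
        except Exception:
            # A probe that raises (timeout, refused connection) counts as unhealthy.
            pass
    return alive


def fake_probe(server):
    # Hypothetical behaviour: s2 never answers, s3 answers "not healthy".
    if server == "s2":
        raise TimeoutError("no response")
    return server != "s3"


pool = healthy_servers(["s1", "s2", "s3", "s4"], fake_probe)
```

Only `s1` and `s4` remain in `pool`, and a routing strategy would then choose among those.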

Benefits

If one of your servers fails, your load balancer will just route requests to another server, which keeps the service available despite the failure.
Theoretically, you don’t have the same physical limits as vertical scaling. You can add as many nodes as needed to handle more and more requests.
Beyond a certain scale, cost wise, horizontal scaling becomes more viable than vertical scaling.

New Problems with Horizontal Scaling

Horizontal scaling means multiple machines working together. The moment you do that, you enter the world of distributed systems, where new classes of problems appear.

I/O → Network

On a single machine, different services communicate through function calls and share memory and storage. But once you scale out, you often introduce a central cache and data store in order to maintain consistency.

These are no longer internal calls to RAM or disk but network calls to cache and database servers, which raises latency from microseconds to milliseconds. On top of that come retries and exponential backoff when network calls to those services fail.
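A small sketch of retries with exponential backoff, to show where the extra latency comes from. The default delays, the jitter choice and the `flaky` operation are all assumptions made for illustration; `sleep` is passed in so the example can record the waits instead of actually sleeping:

```python
import random


def backoff_delays(base=0.1, factor=2.0, max_delay=2.0, attempts=5):
    """Delays (in seconds) to wait before each retry, capped at max_delay."""
    return [min(max_delay, base * factor ** i) for i in range(attempts)]


def call_with_retries(operation, delays, sleep, jitter=random.random):
    """Run operation; on ConnectionError wait and retry until delays run out."""
    for delay in delays:
        try:
            return operation()
        except ConnectionError:
            # Sleep a random fraction of the delay so that many clients
            # do not all retry at the same instant.
            sleep(delay * jitter())
    return operation()  # final attempt; any error propagates to the caller


calls = {"n": 0}


def flaky():
    # Hypothetical network call that fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("cache unreachable")
    return "ok"


waited = []
result = call_with_retries(flaky, backoff_delays(), sleep=waited.append,
                           jitter=lambda: 1.0)
```

With jitter fixed at 1.0 the request waits 0.1 s and then 0.2 s before succeeding, so a call that normally takes microseconds in memory has already cost hundreds of milliseconds.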

Now the latency won’t be just because of computation but because of communication as well.

Partial Failures

When you have multiple nodes, one of them can be slow, one might be down, one might be good, one might already be overloaded. You can’t assume that all requests will succeed, and you need to account for retries and timeouts.

And when multiple services depend on each other, you must also account for cascading failures and be careful not to have a single point of failure.

Inconsistency in data

Databases scale horizontally too, not just your application servers. Keep that in mind as you look at these scenarios.

Since data lives outside any single application server, updates can take time to propagate and replicate across the database’s machines, resulting in one person seeing updated data while another sees outdated data.

If some of your nodes are slower, updates to data might go out of order.

If one person updates data while another requests it, the update might be applied after the read returns, resulting in an outdated view.
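The first scenario can be imitated with a toy model. This is only a sketch, assuming one primary and one replica where replication is a separate step; `ReplicatedStore` and its methods are invented for the example and do not mirror any real database API:

```python
class ReplicatedStore:
    """One primary plus one replica that applies writes only when told to."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = []          # writes accepted but not yet replicated

    def write(self, key, value):
        self.primary[key] = value   # acknowledged to the client right away
        self._pending.append((key, value))

    def replicate(self):
        # In a real database this happens asynchronously, after some delay.
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()

    def read(self, key, from_replica=True):
        source = self.replica if from_replica else self.primary
        return source.get(key)


db = ReplicatedStore()
db.write("photo", "old.png")
db.replicate()
db.write("photo", "new.png")     # the write succeeds on the primary
stale = db.read("photo")         # the replica has not caught up yet
db.replicate()
fresh_read = db.read("photo")    # the replica has caught up
```

Between the second write and the second `replicate()` call, a reader routed to the replica still sees `old.png` even though the write was acknowledged.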

So, in practice, many systems relax strong consistency to improve availability and scalability. But some systems might prefer strong consistency rather than availability — for example, banks. The choice depends on your use case and industry.

Load Balancing Tradeoffs

Splitting traffic evenly is not the same as splitting load evenly.

The Round Robin strategy assumes by default that all servers are equally fast and spreads requests evenly only by count, so each server gets an equal number of requests.

But counting requests is not the same as measuring load. Slow nodes become bottlenecks. If the event loop on a slow node is busy, the remaining requests wait in a queue until it frees up. This is called head-of-line blocking, and it makes latency explode for later requests on that node.

The Least Connections strategy is slightly better, but even it doesn’t know how many resources each request will consume.

For example, the paths /health and /orders/123 give the load balancer no context about cost. A /health call is light and returns immediately, while /orders/123 can trigger a DB query along with auth checks on the DB side, which is slower and more resource intensive. At the same time, /orders/456 might be answered from a DB cache and return quickly. So if node A gets 2 cheap requests and node B gets 1 expensive request, node B looks underloaded to the load balancer because it has only one active connection. It will route the next connection to node B, and that one might turn out to be expensive too.
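The node A versus node B situation can be written down directly. The millisecond costs below are invented numbers, chosen only to show the gap between what the balancer counts and what the nodes are actually doing:

```python
# What the load balancer can see: active connection counts.
active = {"A": 2, "B": 1}

# What it cannot see: the cost (in ms) of the work in flight on each node.
cost_ms = {"A": [5, 5], "B": [800]}

# Least Connections picks the node with the fewest connections.
chosen = min(active, key=active.get)

# The real pending work tells a different story.
pending_work = {node: sum(costs) for node, costs in cost_ms.items()}
```

The balancer chooses B, although B has 800 ms of work queued against 10 ms on A.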

A load balancer doesn’t know about downstream services either. In the same example, if the DB is not reachable, the node will keep retrying until it hits its retry limit. Combined with exponential backoff, this keeps each connection alive longer on that node. But the load balancer will keep sending it more requests, which just wait in a queue until the previous requests are freed from the event loop, adding latency.

Conclusion

Now what would one want to use in a real system?

Since nobody likes a single point of failure, you would generally go with horizontal scaling, but you don’t take the smallest possible nodes and multiply them recklessly. Instead you find a sweet spot, a hybrid between horizontal and vertical scaling, and decide the hardware specs of each node in a node group based on your budget and latency requirements.

State - A Subtle Shift Horizontal Scaling Forces

Horizontal scaling looks simple as long as each server can handle requests independently — an assumption that rarely holds in real systems.

Once traffic can land on any node, state — things like sessions, counters, or cached data — that once lived comfortably in a single machine — in memory or on disk — can no longer be relied on. A request handled by one server may need information that was created or updated by another.

At this point new questions arise. Do we maintain the same state in all the machines somehow? Or do we completely avoid state? Or should we store the state somewhere externally and retrieve it when needed? But then how confident can we be about consistency?
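To make the problem concrete, here is a toy sketch in which each node keeps sessions in its own memory, followed by the same nodes sharing one store. The `Node` class is hypothetical, and the shared dictionary merely stands in for an external store:

```python
class Node:
    def __init__(self, name, shared=None):
        self.name = name
        # Sessions are local to the node unless a shared store is passed in.
        self.sessions = shared if shared is not None else {}

    def login(self, user):
        self.sessions[user] = f"session-for-{user}"

    def is_logged_in(self, user):
        return user in self.sessions


# Local state: the login lands on node a, the next request on node b.
a, b = Node("a"), Node("b")
a.login("alice")
seen_by_b = b.is_logged_in("alice")

# Externalized state: both nodes consult the same store.
store = {}
c, d = Node("c", store), Node("d", store)
c.login("alice")
seen_by_d = d.is_logged_in("alice")
```

Node b has no idea alice logged in, while node d does. The shared store fixes visibility but brings back the network calls and consistency questions raised above.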

This is where horizontal scaling in systems stops being just about adding servers and becomes a question of how state is managed across them.

I will explore these questions in my next article.

Latency vs Throughput

Surya Sathi — Tue, 23 Dec 2025 07:07:51 GMT

Background

Assume your browser made a request for a webpage. It flows like this:

flowchart TD
    subgraph Client
        CApp[Application]
        COS[OS]
        CNIC[Network Hardware]
        CApp --> COS --> CNIC
        CNIC --> COS --> CApp
    end

    subgraph Network
        WR[Wi-Fi Router]
        ISP[ISP Link]
        IR[Internet Routers]
        CNIC --> WR --> ISP --> IR
        IR --> ISP --> WR --> CNIC
    end

    subgraph Server
        SNIC[Network Hardware]
        SOS[OS]
        SApp[Application]
        IR --> SNIC --> SOS --> SApp
        SApp --> SOS --> SNIC --> IR
    end

When an application wants to communicate using TCP, it asks the OS to create a socket. The OS creates a socket data structure that holds state and buffers. When the application writes data, the data goes into the socket’s send buffer. The OS networking stack breaks this data into packets and places them into kernel network buffers.

The network hardware (for example, a Wi-Fi card) pulls packets from these buffers and transmits their bits sequentially over the physical link (Wi-Fi radio channel). Because the link can only transmit one stream of bits at a time, packets from different applications and sockets are serialized and wait in queues.

Each packet is received by the Wi-Fi router, queued, and forwarded over the ISP’s fiber optic link. The packets pass through multiple routers on the internet, where they may again be queued and delayed, until they reach the server.

As packets arrive, the server’s OS acknowledges them and delivers the data to the server application. Acknowledgments travel back along the reverse path (not necessarily through the same intermediate routers). While acknowledgments are returning, the client continues sending more packets.

The server begins processing the request before all data arrives. When it generates a response, that response is broken into packets again and sent back through the same sequence of links, queues, and buffers in reverse, until the client receives it.

Meanwhile, many other people are making requests just like you, so the server accommodates all of them by spreading them across event loops, threads and processes (to understand these topics, please check the previous articles).

Bottlenecks

Now if you see the above flow, there are several bottlenecks:

The link from your system’s Wi-Fi card to your Wi-Fi router.
Your Wi-Fi router to your ISP fiber optic cable
Internet to the server’s network hardware
Inside the server, how it manages multiple requests

Definitions

Latency is the end-to-end time between sending a request and receiving the response, including network delay, queueing, and server processing.

Network Throughput is the amount of data that can be sent per unit of time. In our case, to measure a server’s efficiency, you can think of it as the number of requests the server can handle per unit of time; call it Server Throughput.

How latency occurs

From a client’s point of view, latency is mostly due to hardware limits. Even if your OS and applications can handle thousands of requests simultaneously, in the end they have to be sent through your system’s network layer, which can send only a finite amount of data at a time. So after your OS packages the data, your network layer serializes the packets and sends them one by one. As a result, when there is more data, it is queued in buffers, which means waiting time, which means latency.

Now, even if you are sending just one unit of data, it still needs to travel over networks to reach the server, then the server needs to compute and send a response, which again goes through networks to reach your system. This always takes some amount of time, typically at least milliseconds. So latency will always be there.

So we don’t try to remove latency, which is impossible; instead we try to reduce it.

What about throughput

Now if you want your request data to be sent as quickly as possible, logically you want your buffer queue to be empty so that the request is handled immediately. But if your queue always holds as little data as possible, you are sending very little data per second, which means very low throughput. Since you want your hardware to be used to its full capacity, you want your queue to be filled, meaning some requests need to wait in the queue before being sent, meaning higher latency. So you notice the pattern:

flowchart LR
    subgraph Client[Single Client on Network]
        direction LR
        LT[Low Network Throughput]
        EQ[Empty Queue]
        NW[No Wait Time]
        LL[Low Latency]
        LT --> EQ --> NW --> LL
    end

flowchart LR
    subgraph Client[Single Client on Network]
        direction LR
        HT[High Network Throughput]
        FQ[Filled Queue]
        WT[Wait Time]
        HL[High Latency]
        HT --> FQ --> WT --> HL
    end

Think of a network as a road. The number of cars that can pass through per unit of time is the network throughput. How long a car takes to go from start to end is the latency.

Now it might seem like latency and throughput are proportional, but they aren’t. The difference comes from perspective: what we saw above is the view from your system when it has the network to itself. Now look at it from a shared network’s perspective.

If your Wi-Fi is shared by 5 devices, all 5 devices can’t send data to the router at the same time, since they share the same physical medium even though there are 5 different logical links. So while one device sends data, the others have to wait for their turn. The same happens when you are downloading: while your device is receiving its packets, the other devices need to wait to receive theirs.

So ultimately, when more and more devices share a network, the throughput per device decreases and the latency increases. This is why your YouTube video’s quality drops when someone on your Wi-Fi network downloads a movie.

Similarly, no matter how much data is flowing between your Wi-Fi router and your systems, the throughput of the ISP’s fiber optic cable to your router will again limit how much data can flow.

Add to that, as utilization approaches the capacity of a link or server, queueing delay increases non-linearly, which is why latency can suddenly spike even when throughput only increases slightly.
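One way to see the non-linearity is the textbook M/M/1 queueing model, where the mean time a request spends in the system is the service time divided by (1 - utilization). Real links and servers rarely match that model exactly, so treat the numbers as an illustration; the 10 ms service time is an assumption:

```python
def mean_latency_ms(service_ms: float, utilization: float) -> float:
    """Mean time in system for an M/M/1 queue: service_time / (1 - utilization)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)


# Mean latency for a 10 ms request at increasing utilization levels.
table = {u: round(mean_latency_ms(10, u), 1) for u in (0.5, 0.8, 0.9, 0.95, 0.99)}
```

Under this model, going from 50% to 90% utilization raises the mean latency from 20 ms to 100 ms, and the last few percent before saturation cost far more than that.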

flowchart TD
    subgraph Network[Shared Network]
        direction LR
        MD[More Devices]
        LT[Low Throughput per device]
        WT[Higher Wait Time]
        HL[High Latency]
        MD --> LT --> WT --> HL
    end

    subgraph ISP
        direction LR
        R[Router]
        FO[Fiber Optic Cable]
        PCT[Pre-configured Throughput]
        IWT[Wait time before entering Internet]
        L[Latency]
        R --> FO --> PCT --> IWT --> L
    end

    Network --> ISP

Role of a server in latency and throughput

Now if a server receives just a single request, it will process it immediately and send a response. But if it gets multiple requests at a time, then depending on how the code was written, it will distribute them among its event loops, threads and processes to handle them concurrently and, if possible, in parallel as well. This concurrency makes it seem like they are being handled simultaneously and moderately reduces the response time of a request, thereby directly affecting latency.

But if it gets overwhelmed by requests and the hardware limits on threading are reached, requests will be queued, so wait time increases and latency goes up. Here, the number of requests it can handle per unit of time is called the throughput.
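A tiny simulation of that effect, assuming every request arrives at time 0, takes the same time to serve, and the server has a fixed number of workers. The function name and the numbers are made up for illustration:

```python
import heapq


def completion_times(num_requests: int, workers: int, service_time: float):
    """Completion time of each request when all of them arrive at t=0."""
    free_at = [0.0] * workers            # when each worker becomes free
    heapq.heapify(free_at)
    done = []
    for _ in range(num_requests):
        start = heapq.heappop(free_at)   # earliest available worker
        finish = start + service_time
        heapq.heappush(free_at, finish)
        done.append(finish)
    return done


few = completion_times(num_requests=4, workers=4, service_time=1.0)
many = completion_times(num_requests=12, workers=4, service_time=1.0)
throughput = len(many) / max(many)       # requests finished per time unit
```

With 4 requests nobody waits and each finishes after 1 time unit. With 12 requests the throughput stays at 4 per time unit, but the last requests finish after 3 time units because they sat in the queue.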

flowchart LR
    subgraph Server
        direction LR
        CP[Efficient Concurrency and Parallelism]
        HR[Higher number of requests]
        HT[High Throughput]
        LRT[Lower Response Time]
        LL[Low Latency]
        CP --> HR --> HT --> LRT --> LL
    end

How to maximize throughput and reduce latency

Similar to the previous traffic example, say you have 100 cars.

If you have a single road, the 100 cars might take 5 minutes on average to cross the road. The throughput here is 20 cars/minute and latency is 5 minutes.

But if you have 5 roads, they might take only 2 minutes on average. So the throughput increased to 50 cars/minute and latency reduced to 2 minutes.

Adding roads both increases throughput and reduces latency because it increases capacity.

You can increase the throughput of a single server by increasing its hardware limits, which will let it handle more requests simultaneously, reducing latency. But there’s only so much you can do that way.

Here’s where the vertical scaling vs horizontal scaling comes into the picture, which I will cover in the next article.

From Your System to the Cloud

Surya Sathi — Wed, 27 Aug 2025 06:30:00 GMT

Imagine you’ve built a brilliant new web application. It runs flawlessly on your laptop. But now comes the harder question: how do you make it available to the world?

This is the central problem of application deployment, one that has evolved dramatically over the last two decades. In the past, teams bought and managed physical servers — a slow, capital-heavy, and inflexible process. Today, cloud providers like AWS, GCP, and Azure offer abstraction layers that let you focus less on hardware and more on your code.

Broadly, there are three primary ways to package and run your applications in the cloud: Virtual Machines, Containers, and Serverless Runtimes. Each represents a step in the journey toward higher abstraction and efficiency.

Virtual Machines: The Starting Point

A VM is essentially a full emulation of a physical computer — complete with its own operating system (OS), CPU, memory, and storage. A single physical host can run many VMs, each isolated from the others.

Before the rise of microservices and rapid deployment, Virtual Machines (VMs) were the default way to run applications in the cloud.

Cloud providers still offer VMs as their most fundamental compute service:

AWS offers Amazon EC2 (Elastic Compute Cloud), where you launch instances based on Amazon Machine Images (AMIs).
GCP offers Google Compute Engine (GCE).
Azure offers Azure Virtual Machines.

Instance types (e.g., AWS’s t3.micro vs. c5.4xlarge) let you tune CPU, memory, and networking for your workload. You get full control over the OS and software stack — great for workloads that need custom environments.

Tradeoff: That control comes at a cost. VMs are heavy, slow to boot, and resource-inefficient compared to newer options. If you need fast scale-up or scale-down, VMs often lag behind.

Containers: No more "it works on my machine" (mostly)

The limitations of VMs drove adoption of containers.

A container packages an app and all its dependencies into an isolated unit. Unlike VMs, containers share the host OS kernel, which makes them dramatically lighter and faster. You can spin up thousands of containers in seconds.

Instead of having a full OS for each instance, containers run on a shared kernel, making them incredibly lightweight, portable, and fast to start.

Think of it like this: a VM is a house with its own foundation, walls, and plumbing, while a container is an apartment within a building. The apartment shares the building's foundation and common utilities but is completely separate from other apartments. This shared-resource model allows for much more efficient resource utilization.

This efficiency underpins microservices architectures and modern DevOps practices.

A key player in this container revolution is Docker.

Docker: Standardizing Containers

Docker is an open-source platform that standardizes how applications are packaged and distributed using containers. It lets you define the environment (libraries, dependencies, config) alongside your application code using a Dockerfile. From this file, Docker builds an image - a blueprint containing everything needed to run the app.

When this image runs, it becomes a container.

Because the container includes the full runtime context, it behaves consistently across environments. This solves the classic “it works on my machine” problem (for the most part): the containerized application behaves the same way wherever Docker is installed, from a developer's laptop to a cloud server.

Docker builds images in layers, allowing shared caching between images. This reduces build times and storage — a key performance optimization. Advanced Docker topics like volume mounts for persisting data outside containers, multi-stage builds to reduce image size by separating build and runtime dependencies, and caching strategies offer further optimization, depending on your deployment needs.

While Docker makes it easy to create and run a single container, what happens when you need to run hundreds or even thousands of them? This is where container orchestration comes in, and the industry standard for this is Kubernetes (often abbreviated as K8s).

Kubernetes: Container Orchestrator

Kubernetes is an open-source system that automates the deployment, scaling, and management of containerized applications. It acts as a conductor for your containers, ensuring that your application is always running smoothly.

Imagine you're managing a fleet of delivery trucks (your containers). Docker provides you with the standardized trucks and the ability to load them. However, you'd need a dispatcher to manage the entire fleet: to decide which trucks go where, to replace a broken truck with a new one, to add more trucks during a busy season, and to ensure all trucks are working.

Kubernetes is that dispatcher. It groups containers into logical units called Pods, schedules them to run on available machines (Nodes), performs automatic rollouts and rollbacks for updates, and provides a way for containers to communicate with each other. This level of automation is what enables the massive scale and resilience of modern cloud-native applications.

But K8s adds operational complexity and has a fairly steep learning curve. Cloud providers try to reduce that pain by offering managed services.

Containerization in Cloud

AWS offers several container services, most notably Amazon EKS (Elastic Kubernetes Service) and Amazon ECS (Elastic Container Service). ECS is AWS's own container orchestration service, which is simpler and deeply integrated with the AWS ecosystem.
GCP also offers GKE (Google Kubernetes Engine) for using Kubernetes and Cloud Run for deploying individual containers.
Azure offers AKS (Azure Kubernetes Service).

Serverless: Somebody else's server

For certain types of applications—specifically, small, event-driven functions—even containers can be overkill. This is where runtimes come into play through a concept known as Serverless Computing.

A runtime is a language-specific environment (e.g., Node.js, Python, Java) that executes your code. In a serverless model, you simply upload your code, and the cloud provider handles everything else: provisioning a machine, providing the runtime environment, and scaling the function based on demand. You don't have to think about servers, VMs, or even containers.

AWS offers AWS Lambda.
GCP offers Google Cloud Run Functions.
Similarly, Azure offers Azure Functions.

Serverless functions are triggered by events, such as an HTTP request, a file upload to an S3 bucket, or a message in a queue. You only pay for the compute time your code uses, down to the millisecond.
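As a rough sketch, a function reacting to a file upload could look like the following. The `handler(event, context)` signature follows the convention of AWS Lambda's Python runtime, and the event is a heavily trimmed S3 notification; the bucket and file names are invented, and the actual work is left as a comment:

```python
def handler(event, context):
    """Entry point the platform calls once per event."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Real work (for example, resizing the uploaded image) would go here.
        processed.append(f"{bucket}/{key}")
    return {"statusCode": 200, "processed": processed}


# Local invocation with a hand-written event, for illustration only.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "photos"}, "object": {"key": "cat.png"}}}
    ]
}
response = handler(sample_event, context=None)
```

There is no server setup anywhere in the code: the platform decides when to start the runtime, how many copies to run, and when to shut them down.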

Runtimes are ideal for stateless, short-lived, event-driven tasks like processing data from a database, resizing an image after a user uploads it, or handling API requests for a simple backend.

But serverless isn't a free lunch. Cold starts (delay when a function is invoked after being idle), limited execution time, and observability challenges can complicate performance tuning and debugging.

Conclusion

The journey from a local application to a global-scale service is now relatively simple, thanks to cloud providers like AWS, Azure and GCP.

The problem of getting your code to run reliably and at scale has been solved through different levels of abstraction. While Virtual Machines offer the most control and isolation, Containers provide a more agile and efficient alternative, and Runtimes represent the peak of abstraction and cost efficiency for event-driven tasks.

The choice depends on your specific needs, but a modern developer now has great tools to deploy their applications, no longer worrying about the physical hardware but instead focusing on what they do best: writing great applications.

Food for Thought

While the cloud offers scalability and agility, it's not the ideal solution for every problem. The convenience of "pay-as-you-go" can lead to unpredictable costs, and the most cutting-edge technologies aren't always the most efficient for your specific use case.

Some organizations with predictable, high-volume workloads have found that the long-term cost of public cloud services exceeds the cost of managing their own data centers. This has led to a trend known as cloud repatriation.

Similarly, even within cloud, there are tradeoffs. For example, adopting a serverless architecture, while good for some applications, can become expensive and complex for certain types of workloads.

In 2016, Dropbox moved a significant portion of its data storage from AWS back to its own on-premises infrastructure to gain more control and reduce costs. You can read more about their decision and the technical details of their move here:

The Amazon Prime Video team detailed their decision to re-architect from a serverless, microservices-based system to a monolithic application, which resulted in a 90% cost reduction. You can find the summary of their case study here:

So, how do you decide?

Use VMs when you need full OS control, custom environments, or legacy workloads.
Use Containers for microservices, DevOps pipelines, and scalable web apps.
Use Kubernetes when you need container orchestration at scale and can afford the operational complexity.
Use Serverless Runtimes for event-driven, stateless, bursty workloads where cost and simplicity matter more than control.

Each layer of abstraction trades control for convenience.

Modern developers don’t have to worry about racking servers in a datacenter anymore but they do need to think critically about which cloud model actually fits their application. Because convenience at the wrong scale can be more expensive than running your own servers.

When Can You Break Software Design Principles?

Surya Sathi — Wed, 20 Aug 2025 06:30:00 GMT

You must have heard of the SOLID, DRY and KISS design principles at some point. If you don't know what they mean, let's go through them now. But first, note that they are design principles and not rules: you are not strictly required to follow them, but they are generally considered to make code more readable and maintainable, free of repetition, and open to extension without severe breakage.

Uncle Bob's SOLID Principles

Imagine you are building with LEGOs. The SOLID design principles are like good practices for your LEGO pieces and how you connect them, so your final creation is strong, easy to modify and doesn't collapse if you change one block.

S: Single Responsibility Principle (SRP)

A class should have only one reason to change.

Let’s say you have a User class. Its job should be to manage user-related data—like name, email, etc. It should not be responsible for sending emails.

Why? Because:

Sending emails is unrelated to the core idea of what a user is.
If the way you send emails changes, your User class would have to change too—introducing unnecessary fragility.

Instead, you’d have a User class and a separate EmailSender class. Each has one job. Now, if the email logic changes, only EmailSender needs updating—User stays untouched. That makes your code easier to understand, test, and maintain.
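A minimal sketch of that split. The `outbox` list is a stand-in for a real mail transport, and the user details are made up:

```python
from dataclasses import dataclass


@dataclass
class User:
    """Holds user data only; knows nothing about how email is delivered."""
    name: str
    email: str


class EmailSender:
    """Owns email delivery; the only class that changes if delivery changes."""

    def __init__(self):
        self.outbox = []   # stand-in for a real mail transport

    def send(self, to: str, subject: str) -> None:
        self.outbox.append((to, subject))


user = User("Asha", "asha@example.com")
sender = EmailSender()
sender.send(user.email, "Welcome!")
```

Swapping the outbox for a real mail service would touch `EmailSender` alone.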

O: Open/Closed Principle (OCP)

Software entities should be open for extension, but closed for modification.

Imagine you’re building a house. You want to add a new room later without tearing down the whole structure, right?

OCP means: You should be able to add new behavior to code without changing existing working code.

For example, suppose you’re calculating the area of shapes. If you have an AreaCalculator with methods like calculateCircleArea() and calculateSquareArea(), you will need to modify it to add a calculateTriangleArea() method later. Instead, you can have a Shape interface with the method area(), and create Circle, Square, and Triangle classes that each implement Shape.

An AreaCalculator can now use shape.area() without caring about which shape it is. If you add a Hexagon, you don’t modify the calculator—just implement a new Hexagon class.

The core logic stays stable. New functionality gets added on top, not inside old code. This reduces bugs and makes extension easy.
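A minimal sketch of the shape example, assuming Python's abc module for the Shape interface. The total_area() method name is an assumption chosen for illustration.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self):
        """Return the area of this shape."""


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


class AreaCalculator:
    # Depends only on the Shape interface, never on concrete shapes.
    def total_area(self, shapes):
        return sum(shape.area() for shape in shapes)


# Added later: a new shape, with no change to AreaCalculator.
class Triangle(Shape):
    def __init__(self, base, height):
        self.base = base
        self.height = height

    def area(self):
        return 0.5 * self.base * self.height


total = AreaCalculator().total_area([Square(2), Triangle(4, 3)])
```

Triangle was added without editing AreaCalculator, which is the "closed for modification" half of the principle.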

L: Liskov Substitution Principle (LSP)

Subclasses should be replaceable for their base classes without breaking the app.

Suppose you have a Vehicle class, and a Car extends it. Any code expecting a Vehicle should be able to work with a Car without issues.

This isn’t just about types; it's about behavioral compatibility. A subclass should honor the same “contract” as the parent class: it should not demand more from its callers (stricter inputs) or deliver less than the parent promises (weaker results).

Bad LSP: If Bird has fly(), and Penguin extends Bird, but Penguin.fly() throws an error—your inheritance model is broken. Penguin isn't substitutable for Bird.

Use inheritance only when it makes logical sense. If the behaviors differ significantly, consider composition over inheritance. If a subclass violates expectations of the base class, the base class might be modeling the wrong abstraction.

“At its heart, LSP is about interfaces and contracts, and when to extend a class versus using another strategy such as composition to achieve your goal.” – A Stack Overflow answer
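The Bird and Penguin case can be sketched like this. The migrate() helper and the Sparrow class are hypothetical, added only to show code that expects any Bird to be able to fly.

```python
class Bird:
    def fly(self):
        return "flying"


class Sparrow(Bird):
    pass


class Penguin(Bird):
    def fly(self):
        # Breaks the contract that Bird.fly() promises to callers.
        raise NotImplementedError("penguins cannot fly")


def migrate(birds):
    # Written against Bird; assumes every bird can fly.
    return [bird.fly() for bird in birds]


works = migrate([Sparrow(), Sparrow()])

try:
    migrate([Sparrow(), Penguin()])
    substitution_ok = True
except NotImplementedError:
    substitution_ok = False
```

migrate() is correct for Bird but fails as soon as a Penguin is passed in, which is what "not substitutable" means in practice.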

I: Interface Segregation Principle (ISP)

Don’t force classes to implement methods they don’t use.

Think of a TV remote with 30 buttons: HDMI, lights, fans, Blu-ray controls, volume, channels. But if you only ever use volume and channels, the rest is just clutter.

ISP says: Split large interfaces into smaller, specific ones.

If you have a TaskWorker interface with:

startTimer()
reportProgress()
sendEmailNotification()
printDocument()

…but a class only needs the first two, it shouldn’t be forced to implement email and printing methods too. Instead, split into:

WorkerInterface
EmailNotifierInterface
PrintableInterface

This improves flexibility, reduces unnecessary dependencies, and keeps things clean and understandable.
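One way to sketch the split in Python is with small abstract base classes. The interface names come from the list above; the snake_case method names and the two concrete classes (ProgressTracker, OfficeAssistant) are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class WorkerInterface(ABC):
    @abstractmethod
    def start_timer(self): ...

    @abstractmethod
    def report_progress(self): ...


class EmailNotifierInterface(ABC):
    @abstractmethod
    def send_email_notification(self): ...


class PrintableInterface(ABC):
    @abstractmethod
    def print_document(self): ...


class ProgressTracker(WorkerInterface):
    # Implements only what it actually needs.
    def start_timer(self):
        return "timer started"

    def report_progress(self):
        return "50% done"


class OfficeAssistant(WorkerInterface, PrintableInterface):
    # Opts in to printing by implementing a second small interface.
    def start_timer(self):
        return "timer started"

    def report_progress(self):
        return "10% done"

    def print_document(self):
        return "printing"


tracker = ProgressTracker()
assistant = OfficeAssistant()
```

ProgressTracker never has to provide dummy email or printing methods, while OfficeAssistant picks up printing by combining interfaces.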

D: Dependency Inversion Principle (DIP)

High-level modules shouldn’t depend on low-level modules. Both should depend on abstractions.

Abstractions shouldn’t depend on details. Details should depend on abstractions.

Suppose you have an OrderProcessor that:

Uses a PayPalPaymentGateway to process payments.
Uses a PostgreSQLOrderSaver to save the order.

If you later switch to GooglePay and MongoDB, you’d have to modify OrderProcessor directly. That’s tight coupling.

Instead:

Create an abstract PaymentGateway interface with process_payment().
Create an abstract OrderSaver with save_order().

OrderProcessor should depend only on these abstractions.

Then you do:

PayPalPaymentGateway, GooglePayGateway → both implement PaymentGateway
PostgreSQLOrderSaver, MongoOrderSaver → both implement OrderSaver

Now, you can swap implementations easily—without touching business logic. This promotes loose coupling, testability, and flexibility.
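A rough sketch of this arrangement, under some assumptions: the gateways below only return placeholder strings (no real PayPal or Google Pay API is called), and InMemoryOrderSaver is a hypothetical stand-in for the PostgreSQL and MongoDB savers.

```python
from abc import ABC, abstractmethod


class PaymentGateway(ABC):
    @abstractmethod
    def process_payment(self, amount): ...


class OrderSaver(ABC):
    @abstractmethod
    def save_order(self, order): ...


class PayPalPaymentGateway(PaymentGateway):
    def process_payment(self, amount):
        return f"PayPal charged {amount}"  # placeholder, no real API call


class GooglePayGateway(PaymentGateway):
    def process_payment(self, amount):
        return f"GooglePay charged {amount}"  # placeholder, no real API call


class InMemoryOrderSaver(OrderSaver):
    # Hypothetical stand-in for a database-backed saver.
    def __init__(self):
        self.saved = []

    def save_order(self, order):
        self.saved.append(order)


class OrderProcessor:
    # Depends only on the abstractions it is handed.
    def __init__(self, gateway, saver):
        self.gateway = gateway
        self.saver = saver

    def process(self, order):
        receipt = self.gateway.process_payment(order["total"])
        self.saver.save_order(order)
        return receipt


order = {"id": 1, "total": 25}
saver = InMemoryOrderSaver()
paypal_receipt = OrderProcessor(PayPalPaymentGateway(), saver).process(order)
google_receipt = OrderProcessor(GooglePayGateway(), saver).process(order)
```

Swapping the gateway is a one-argument change at construction time; OrderProcessor itself is untouched. The same property makes it easy to pass in fakes during tests.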

DRY: Don't Repeat Yourself

This principle is probably the easiest to grasp and incredibly important.

It's pretty self-explanatory: every piece of knowledge must have a single, unambiguous and authoritative representation within a system.

Think about writing an essay. You don't write the exact same paragraph over and over again in different sections. Instead, you'd write that paragraph once and then refer to it later.

In programming, this means if you have the same piece of code (like a calculation, a validation rule, or a way to format data) appearing in multiple places, you should extract it into a function, a method, or a class, and then call that single piece of code wherever you need it.
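As a small sketch, here is a validation rule written once and reused. The function names are hypothetical and the email check is deliberately simplistic; it is not a real-world validator.

```python
def is_valid_email(email):
    # Single source of truth for the rule (intentionally simplistic).
    return "@" in email and "." in email.split("@")[-1]


def register_user(email):
    if not is_valid_email(email):
        return "rejected"
    return "registered"


def update_email(old_email, new_email):
    if not is_valid_email(new_email):
        return old_email
    return new_email


registered = register_user("asha@example.com")
rejected = register_user("not-an-email")
kept = update_email("asha@example.com", "broken")
```

If the rule ever changes, it changes in is_valid_email() and both callers pick it up.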

But excessive DRY can reduce readability. You may also find yourself extracting code that merely looks similar into one shared utility function, even though the pieces are logically independent and should stay separate. Do keep that in mind.

KISS: Keep It Simple, Stupid

This is a classic and very practical piece of advice.

The idea is that simplicity should be the key goal in design and unnecessary complexity should always be avoided.

When you're designing or writing code, avoid the urge to over-engineer. This applies to naming conventions, function logic, comments, almost everywhere. Put another way: even when someone stupid looks at your program, they should still be able to understand what it's doing.

This also means: don't spend a lot of time making something super-efficient if it's not a bottleneck right now. Don't use a complex design pattern if a simple function call will suffice. And if you find a part of your code becoming overly complex, break it down into smaller, simpler pieces.

KISS says that simple code is easier to understand, easier to debug and less prone to error. And when you yourself or someone else has to work with your code later, it makes both your lives easier.

YAGNI: You Aren't Gonna Need It

The name says it all. Do not add functionality until it's actually required.

Think of it like packing for a trip. You might be tempted to pack all sorts of "just in case" items. Before you know it, you have a massive, heavy suitcase, and you probably won't use half the stuff in it.

It's the same in software development. It's about resisting the temptation to add code or features that you think you might need in the future, but aren't explicitly required right now for the current problem you're solving.

Adding extra code wastes time and effort, makes the code harder to understand later, bloats the code base and increases the risk of bugs, all because you made assumptions about how the software will evolve without concrete requirements. So follow YAGNI unless you know with high certainty that the extra work will be helpful in the near future.

How to know when to violate a principle

"Pragmatism over Purity"

Before you ever deviate, ask yourself:

What problem does violating this principle solve right now? (e.g., "It saves us 2 days of development time for a critical deadline.")
What are the long-term consequences of this violation? (e.g., "The code will be harder to change in this one spot, but we'll likely rewrite this module anyway in 2 or 3 months.")
Are these consequences acceptable given the current situation? (e.g., "Yes, because missing this deadline means losing the client.")

If you can't articulate clear, justifiable answers to these questions, then you probably shouldn't violate the principle.

When can you violate SOLID?

When responsibilities are tightly coupled and always change together in smaller applications, you may break SRP.
For very small and stable modules, the overhead of introducing complex abstractions for trivial changes often outweighs the benefit of keeping the module closed. If the code is already simple, and the change is small and unlikely to break anything else, you may break OCP.
Also when there is a fundamental change in a core component, it is not an extension but re-architecture. You will need to break OCP.
When you implement interfaces from third party libraries, you may need to implement methods you don't use, potentially breaking ISP.
You may also need to weigh YAGNI against OCP in the initial stages of development and find a balance, as YAGNI says don't build for the future while OCP says keep the future in mind when building.
You may also break all of the SOLID principles during rapid prototyping with throwaway code.

When can you violate DRY?

Sometimes, two pieces of code look identical now but are logically independent and will likely evolve differently. So you write it twice, and only if the logic is still identical the third time do you abstract.
If extracting a very small, simple piece of logic makes the code harder to follow and it is clearly understood inline, then repeating it might be more readable.

When can you violate KISS?

Only when it is performance critical. Otherwise, always try to follow it.

When can you violate YAGNI?

When you know a future requirement with high certainty.
When changing things in future takes considerably more effort than doing it now.

Final Thoughts

As mentioned before, these are principles, not strict laws. They’re tools to help you write better software.

SOLID helps you avoid fragility and rigidity

DRY helps you avoid redundancy

KISS helps you avoid complexity

Try to understand them. Apply them when they make sense. Be cautious when breaking them.

Why is OOP criticized?

Surya Sathi — Wed, 13 Aug 2025 06:30:00 GMT

We have previously talked about Object-Oriented Programming when discussing why so many programming languages exist, in this article:

Why are there so many programming languages?

There, we noted that OOP was popularized by C++ (the idea itself dates back to Simula and Smalltalk) and that other languages gradually adopted it as a feature. We also briefly went through what it means: OOP is a design where you write code similar to how you view the real world. Let us go into a bit more detail here before getting to why people criticize it.

Object-Oriented Programming

OOP is a way to mimic the real-world entities, their state and their behavior.

If you want to code about a car, instead of writing like this:

car1 = {
    'color': 'red',
    'brand': 'Toyota',
    'speed': 40
}

car2 = {
    'color': 'blue',
    'brand': 'Toyota',
    'speed': 60
}

car3 = {
    'color': 'white',
    'brand': 'Tata',
    'speed': 50
}

def change_color(car, target_color):
    car['color'] = target_color

change_color(car1, 'white')
change_color(car2, 'red')

def increase_speed(car, increment):
    car['speed'] = car['speed'] + increment

increase_speed(car1, 1)
increase_speed(car2, 2)

def brake(car):
    car['speed'] = 0

brake(car1)
brake(car2)

Instead, you can make a class called Car, which is like a blueprint. It takes color and brand as state when you create it and gives you an object, i.e., an instance of the class (or blueprint), whose properties and behaviors you can access when required.

class Car:
    def __init__(self, color, brand):
        self.color = color
        self.brand = brand
        self.__speed = 0

    def change_color(self, color):
        self.color = color

    def increase_speed(self, increment):
        self.__speed = self.__speed + increment

    def decrease_speed(self, decrement):
        self.__speed = self.__speed - decrement

    def brake(self):
        self.__speed = 0

car1 = Car('red', 'Toyota')
car2 = Car('blue', 'Toyota')
car3 = Car('white', 'Tata')

# when you need to access a property, say, car1's color, you just do car1.color
# when you need to access a behavior, you can just do:

car1.change_color('white')
car1.increase_speed(2)

car1.brake()
car2.brake()

This makes the code more readable and much more intuitive. Now OOP also has some concepts you will need to know to use it properly.

Abstraction

Abstraction means providing a simplified, high-level view of an object, exposing only what's relevant to the user while hiding the unnecessary internal details and complex implementations. It's closely related to another concept of OOP, called encapsulation.

Encapsulation

Encapsulation also means hiding the internal details of how an object works and only exposing what's necessary for other parts of the program to interact with it. This protects the data from accidental external modification. What is the difference between abstraction and encapsulation?

Abstraction is a design, and encapsulation is its implementation. Encapsulation tells us how exactly you can implement abstraction in the program. Abstraction is a concept while encapsulation is a mechanism.

In the above example, __speed is encapsulated. The double underscore marks it as private: Python mangles the name, so other parts of the code can't read or modify it as car1.__speed from outside the Car class. (This is a strong convention rather than a hard barrier.) We only provide the increase_speed, decrease_speed and brake methods to modify the __speed property in a controlled manner.
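To see what the double underscore actually does, here is a small sketch using a trimmed-down Car. The current_speed() accessor is a hypothetical addition for illustration; it is not part of the example above.

```python
class Car:
    def __init__(self, color, brand):
        self.color = color
        self.brand = brand
        self.__speed = 0  # stored on the object as _Car__speed

    def increase_speed(self, increment):
        self.__speed += increment

    def current_speed(self):  # hypothetical read-only accessor
        return self.__speed


car = Car('red', 'Toyota')
car.increase_speed(10)

# Outside the class, the plain name does not exist on the object.
try:
    car.__speed
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

# The mangled name still works, so this is privacy by convention.
mangled_value = car._Car__speed
```

So Python discourages outside access rather than forbidding it. Reaching for _Car__speed is possible, but it is a clear signal that you are bypassing the class's intended interface.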

Inheritance

Imagine you have a basic Vehicle blueprint. It has wheels, an engine, and can move. Now, you want to create a Car and a Truck. Instead of starting from scratch, you can say, "A Car is a type of Vehicle," and "A Truck is also a type of Vehicle." They can inherit the common features of a Vehicle and then add their own specific characteristics and behaviors.

Inheritance allows a new class (called the child or derived class) to inherit attributes and methods from an existing class (called the parent or base class). This promotes code reusability and establishes a "is-a" relationship (e.g., a "Car is a Vehicle").

Example:

# Parent class
class Vehicle:
    def __init__(self, make, model):
        self.make = make
        self.model = model

    def start_engine(self):
        print(f"The {self.make} {self.model}'s engine is starting.")

    def stop_engine(self):
        print(f"The {self.make} {self.model}'s engine is stopping.")

# Car is a child class inheriting from Vehicle
class Car(Vehicle):
    def __init__(self, make, model, num_doors):

        # Call the parent class's __init__ method to handle make and model
        super().__init__(make, model)

        # Property specific to the child class
        self.num_doors = num_doors

    def drive(self):
        print(f"The {self.make} {self.model} with {self.num_doors} doors is driving.")

# Truck is another child class inheriting from Vehicle

class Truck(Vehicle):
    def __init__(self, make, model, bed_capacity):
        super().__init__(make, model)
        self.bed_capacity = bed_capacity

    def haul_cargo(self):
        print(f"The {self.make} {self.model} with {self.bed_capacity} capacity is hauling cargo.")

my_car = Car("Tesla", "Model 3", 4)
my_truck = Truck("Ford", "F-150", "1000 lbs")

my_car.start_engine() # Inherited from Vehicle
my_car.drive()      # Specific to Car

my_truck.start_engine() # Inherited from Vehicle
my_truck.haul_cargo()   # Specific to Truck

Polymorphism

The word "polymorphism" comes from Greek. It means "many forms".

Polymorphism means objects of different classes can be treated as objects of a common type. Or, put simply, you can have a single way of doing something, and different objects will respond in their own specific way.

Think of a "play" button on different media players. The "play" button on a music player makes music, while the "play" button on a video player plays a video. The action, pressing "play", is the same, but the outcome is different depending on the device.

Example:

from abc import ABC, abstractmethod

# An abstract base class for any kind of Media Player
class MediaPlayer(ABC):
    def __init__(self, title):
        self.title = title
        self._is_playing = False

    @abstractmethod
    def play(self):
        pass

    @abstractmethod
    def pause(self):
        pass

    def get_status(self):
        return f"{self.title} is {'playing' if self._is_playing else 'paused'}."

class AudioPlayer(MediaPlayer):
    def __init__(self, title, artist):
        super().__init__(title) # Initialize the MediaPlayer part
        self.artist = artist

    def play(self):
        self._is_playing = True
        print(f"Playing audio: '{self.title}' by {self.artist}.")
    # Can have its own implementation here

    def pause(self):
        self._is_playing = False
        print(f"Pausing audio: '{self.title}'.")
    # Can have its own implementation here

class VideoPlayer(MediaPlayer):
    def __init__(self, title, resolution):
        super().__init__(title) # Initialize the MediaPlayer part
        self.resolution = resolution

    def play(self):
        self._is_playing = True
        print(f"Playing video in {self.resolution} resolution.")
    # Can have its own implementation here

    def pause(self):
        self._is_playing = False
        print(f"Pausing video: '{self.title}'.")
    # Can have its own implementation here
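The payoff of polymorphism shows up at the call site. The sketch below uses deliberately trimmed-down versions of the players (they return strings instead of printing, and the play_all() helper is hypothetical) so that it can run on its own.

```python
from abc import ABC, abstractmethod


class MediaPlayer(ABC):
    def __init__(self, title):
        self.title = title

    @abstractmethod
    def play(self):
        """Start playback and describe what is happening."""


class AudioPlayer(MediaPlayer):
    def play(self):
        return f"Playing audio: {self.title}"


class VideoPlayer(MediaPlayer):
    def play(self):
        return f"Playing video: {self.title}"


def play_all(players):
    # One call, many forms: each object decides what play() means.
    return [player.play() for player in players]


playlist = [AudioPlayer("Song A"), VideoPlayer("Clip B")]
results = play_all(playlist)
```

play_all() never checks which kind of player it has. Adding a new player type only requires a new subclass with its own play().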

Composition

Take a look at the below example code.

class Bird:
    def fly(self):
        print("This bird can fly.")

    def lay_eggs(self):
        print("This bird lays eggs.")

class Eagle(Bird):
    def hunt(self):
        print("Eagle is hunting.")

class Penguin(Bird):
    # Penguins can't fly, but they inherit the fly() method
    def fly(self):
        print("Penguins cannot fly! This is awkward.")

    def swim(self):
        print("Penguin is swimming.")

class Ostrich(Bird):
    # Ostriches can't fly either
    def fly(self):
        print("Ostrich cannot fly! I'm a runner.")

    def run_fast(self):
        print("Ostrich is running very fast.")

eagle = Eagle()
penguin = Penguin()
ostrich = Ostrich()

eagle.fly()
eagle.hunt()

penguin.fly() # We had to override 'fly' just to say it can't fly.
penguin.swim()

ostrich.fly() # Same problem here.
ostrich.run_fast()

What's the problem here?

The Bird class has a fly() method because most birds fly. But then we have Penguin and Ostrich which are birds, but they cannot fly. We're forced to override the fly() method in their classes just to state that they can't do what their parent class implies they can. It indicates a design flaw: not all Bird objects can fly(), so fly() shouldn't be a core behavior of Bird.

Now with composition we do the following. Instead of saying "A Penguin is a Bird that can fly (but actually can't)", let's think about what capabilities an animal has.

We can define different "behaviors" as separate, smaller objects.

# Define behaviors as separate classes

class CanFly:
    def fly(self):
        print("This animal can fly by flapping wings.")

class CannotFly:
    def fly(self):
        print("This animal cannot fly.")

class CanSwim:
    def swim(self):
        print("This animal is swimming.")

class CanLayEggs:
    def lay_eggs(self):
        print("This animal lays eggs.")

class CanHunt:
    def hunt(self):
        print("This animal is hunting prey.")

class CanRunFast:
    def run_fast(self):
        print("This animal can run very fast.")

# Now, let's build our birds by "composing" these behaviors

class Eagle:
    def __init__(self):
        self.flying_behavior = CanFly()
        self.hunting_behavior = CanHunt()
        self.egg_laying_behavior = CanLayEggs()

    def fly(self):
        self.flying_behavior.fly()

    def hunt(self):
        self.hunting_behavior.hunt()

    def lay_eggs(self):
        self.egg_laying_behavior.lay_eggs()

class Penguin:
    def __init__(self):
        self.flying_behavior = CannotFly()
        self.swimming_behavior = CanSwim()
        self.egg_laying_behavior = CanLayEggs()

    def fly(self):
        self.flying_behavior.fly()

    def swim(self):
        self.swimming_behavior.swim()

    def lay_eggs(self):
        self.egg_laying_behavior.lay_eggs()

class Ostrich:
    def __init__(self):
        self.flying_behavior = CannotFly()
        self.running_behavior = CanRunFast()
        self.egg_laying_behavior = CanLayEggs()

    def fly(self):
        self.flying_behavior.fly()

    def run_fast(self):
        self.running_behavior.run_fast()

    def lay_eggs(self):
        self.egg_laying_behavior.lay_eggs()

eagle = Eagle()
penguin = Penguin()
ostrich = Ostrich()

eagle.fly()
eagle.hunt()
eagle.lay_eggs()

penguin.fly() # This correctly says it cannot fly
penguin.swim()
penguin.lay_eggs()

ostrich.fly() # This correctly says it cannot fly
ostrich.run_fast()
ostrich.lay_eggs()

Why is this better?

We can easily create new types of birds by mixing and matching these behavior objects. Want a flying penguin? Just give it CanFly behavior! You don't have to redefine an entire inheritance tree. Penguin and Ostrich no longer "pretend" to fly and then awkwardly tell you they can't. Their "flying" behavior is explicitly defined as CannotFly. And if we ever change how CanFly works, it only affects objects that have a CanFly object, not every class in a deep inheritance hierarchy.

This is the essence of composition: instead of inheriting features from a parent class, you build your objects by including instances of other objects that provide the desired behaviors. You're giving the object "parts" that define what it can do, rather than inheriting a whole "blueprint" that might contain unwanted features.
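The "flying penguin" idea can be sketched by passing the behavior in when the object is created. The generic Bird class and its constructor parameters are assumptions for illustration; the behaviors return strings here instead of printing.

```python
class CanFly:
    def fly(self):
        return "flies by flapping wings"


class CannotFly:
    def fly(self):
        return "cannot fly"


class Bird:
    # The bird is handed its parts instead of inheriting them.
    def __init__(self, name, flying_behavior):
        self.name = name
        self.flying_behavior = flying_behavior

    def fly(self):
        return f"{self.name} {self.flying_behavior.fly()}"


penguin = Bird("Penguin", CannotFly())
flying_penguin = Bird("Penguin", CanFly())

normal = penguin.fly()
unusual = flying_penguin.fly()
```

The same class produces both birds; only the part that was plugged in differs. No inheritance tree had to change.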

Things to be careful about

While there are static methods, which act like utility functions on a class and can't access an object's state directly, you will notice that the vast majority of methods you write in a class deal directly with some state of an object. After all, the fundamental idea of OOP is to encapsulate state and behavior into a self-contained unit, an object. This core philosophy is where functional programming differs. But that's a topic for another time.

Meanwhile, you will notice that I have said that OOP lets you write more readable and intuitive code. But that doesn't mean that using OOP automatically makes your code great. You will find many people criticizing OOP.

One of the key reasons many people give is exposure to bad usage of OOP in code bases, which made them hate it. The usual complaints are:

Endless inheritance trees that make the code difficult to understand (ironic, when OOP is meant to make code intuitive).
The fragile base class problem: the same base class is used by too many different classes, so when you need to change it in the future, you have to go through all the child classes to make sure nothing breaks (though this is what unit testing is for).
Uncontrolled state changes: different parts of your program can modify the same state, making it hard to track how it will change.
Over-engineering: using classes and objects where they are not necessary.

These, along with some other reasons, have led to increased disillusionment with OOP.

So you can avoid that by properly understanding when and how to use OOP:

Try to model your classes based on intuitive entities.
Use it only when necessary. Not everything needs to be a class. Sometimes a simple function will do the job.
Encapsulate state properly.
Use small classes with a single responsibility instead of a single class with too many methods.
Try to use composition over inheritance when useful.

Conclusion

You don't need to use every design principle of OOP in your code just because you can. Based on the problem you are dealing with, if only a few features of OOP can do the job, then use just those. Always look for the simplest, cleanest and most efficient answer. When you don't need composition, don't use it. When you don't need inheritance, don't use it. Sometimes you may not even need OOP, and a regular function might do. Then don't use OOP.

Asynchronous Programming in Python and NodeJS

Surya Sathi — Wed, 06 Aug 2025 06:30:00 GMT

In the last article, we talked about concurrency and parallelism at the OS level. Concurrency is when tasks can start and run seemingly at the same time. Parallelism is when tasks actually run at the same time. Now let's talk about how you can access them at the language level, which will also show you that different languages implement these features differently because of their core philosophies and how they've been designed. We will go through how Python and NodeJS implement them in this article. Before that, let us go through the foundational concept: asynchronous programming.

Asynchronous Programming: When You're Waiting

Now, let's talk about event loops and asynchronous programming. Imagine you're a coffee shop manager. When a customer orders a coffee, they have to wait for it to be prepared. But you don't wait with them. You take orders from other customers in the meantime, and when the coffee is prepared, you give it to the customer.

This "doing something else while waiting" is the essence of asynchronous programming. When you write an application, many operations involve waiting: waiting for data from a network, waiting for a file to be read from disk, waiting for a database query to return results. These are called I/O-bound tasks (Input/Output bound).

This is where asynchronous programming comes into play. You typically write code using the async/await syntax.

At its core, this concept uses an event loop.

Event Loop

Think of the event loop as the coffee shop manager in our example. When you place an order (start an I/O operation), the manager notes it down and goes to the next customer. When your coffee is ready, a signal is sent to the manager, who then comes back to you.

In asynchronous programming, when your code encounters an await keyword, it means "I'm going to start an I/O operation here, and while I wait for it to complete, the event loop can go off and do other tasks." When that I/O operation finishes, the event loop then comes back to where you awaited and resumes your code. This way, a single thread can manage many concurrent I/O operations without getting blocked. It's like a single barista handling multiple orders efficiently by constantly switching tasks.

So, while threads are about potentially running multiple things in parallel, event loops are about running many I/O-bound tasks concurrently within a single thread. It's about maximizing the utilization of that single thread by not letting it sit idle during waiting periods.
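To make the switching concrete, here is a toy event loop built from plain generators. It is a deliberately simplified sketch, not how asyncio is implemented: each yield plays the role of an await, and the loop only interleaves tasks in round-robin order, whereas a real event loop also waits for I/O to become ready. The task() and run() names are hypothetical.

```python
from collections import deque


def task(name, steps, log):
    for i in range(steps):
        log.append(f"{name} step {i}")
        yield  # like "await": hand control back to the loop


def run(tasks):
    ready = deque(tasks)  # tasks that can make progress
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run until the task's next yield
            ready.append(current)  # not finished, queue it again
        except StopIteration:
            pass                   # task finished, drop it


log = []
run([task("A", 2, log), task("B", 2, log)])
```

After this runs, log holds the steps interleaved (A, B, A, B) even though everything happened on one thread. That interleaving is the concurrency the event loop provides.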

Examples:

Python:

import asyncio

async def fetch_data(): # declares a coroutine (an async function)
    print("Start fetching...")

    # Simulate I/O: pauses for 2 seconds. Meanwhile other tasks continue.
    await asyncio.sleep(2)

    print("Done fetching!")

async def main():
    # runs them seemingly at the same time
    await asyncio.gather(fetch_data(), fetch_data())

asyncio.run(main()) # starts the event loop and runs the main coroutine

NodeJS:

const fs = require('fs/promises');

// declares an asynchronous function

async function readFile() {
    console.log("Reading file...");

    // the event loop continues handling other tasks while this waits.
    const content = await fs.readFile('example.txt', 'utf8');

    console.log("File content:", content);
}

readFile();
readFile();

Python

Before we jump ahead, we absolutely have to talk about something called the Global Interpreter Lock, or GIL.

Global Interpreter Lock (GIL)

GIL is a locking mechanism in CPython (the standard Python interpreter) which makes sure that only one thread can execute Python bytecode at any given time.

Now, why does Python have this GIL? Well, it's mainly there to make memory management simpler and safer. Without it, multiple threads could modify the interpreter's internal bookkeeping for the same object (such as its reference count) simultaneously, which could lead to unpredictable and hard-to-debug issues. The GIL prevents these kinds of race conditions inside the interpreter. It does not make your own multi-step operations on shared data atomic, so you may still need locks for those.

However, the downside is that even if you have a powerful multi-core processor, the GIL prevents multiple threads from executing Python bytecode in parallel. This means if your program is heavily dependent on CPU-bound tasks (tasks that spend most of their time crunching numbers), then using multiple threads in Python won't magically make it run faster across all your CPU cores.

But GIL only applies to Python bytecode. If your code calls out to underlying C libraries (which many popular Python libraries do, like NumPy for numerical computations), then the GIL can be released, allowing those C operations to run in parallel. This is a crucial point to remember!

Concurrency: Threading

Now with GIL in place, how does threading work in Python? Python's threading module allows you to create and manage multiple threads within a single process.

When you use threads in Python, you are indeed achieving concurrency. Multiple threads can exist and execute seemingly at the same time. However, due to the GIL, only one thread can be executing Python bytecode at any given moment. The operating system's scheduler rapidly switches between these threads, giving the illusion of parallel execution.

So, when is threading useful in Python?

I/O-bound tasks: If your task involves a lot of waiting (like fetching data from the internet, reading a large file, or waiting for a database response), then threads can be very effective. While one thread is waiting for an I/O operation to complete, the GIL is released, allowing another thread to execute Python bytecode. This means you're still making progress on other parts of your program.

Tasks that release the GIL: As mentioned earlier, if your code calls out to underlying C libraries that release the GIL (like many numerical processing libraries), then you can truly achieve parallelism with threads for those specific operations.

However, if your task is CPU-bound (e.g., heavy mathematical calculations, image processing, complex algorithms that primarily involve Python code), then using multiple threads in Python might not give you the performance boost you expect. In fact, the overhead of managing threads and the GIL switching can sometimes even make CPU-bound threaded programs slower than their single-threaded counterparts.
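You can check this on your own machine with a rough sketch like the one below. It runs the same pure-Python loop twice sequentially and then in two threads, and prints both timings. The count_up() function and the value of N are arbitrary choices for illustration, and the timings will vary by machine and Python version, so no particular numbers are promised here.

```python
import threading
import time


def count_up(n, results, index):
    # Pure-Python CPU-bound work: holds the GIL while it runs.
    total = 0
    for i in range(n):
        total += i
    results[index] = total


N = 2_000_000

# Run the work twice, one after the other.
start = time.perf_counter()
sequential = [None, None]
count_up(N, sequential, 0)
count_up(N, sequential, 1)
sequential_time = time.perf_counter() - start

# Run the same work in two threads.
start = time.perf_counter()
threaded = [None, None]
threads = [
    threading.Thread(target=count_up, args=(N, threaded, i))
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.3f}s, threaded: {threaded_time:.3f}s")
```

On a standard CPython build, the threaded run is generally expected to take about as long as the sequential one, rather than half the time, because the two threads take turns holding the GIL.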

It’s also important to note that not all libraries behave the same way with regard to GIL. Just because a library is implemented in C doesn’t guarantee it will release the GIL.

Example:

import threading
import time

def download_file():
    print("Start downloading...")
    time.sleep(3)  # Blocking, simulates I/O
    print("Download complete!")

# Start two threads to run the download_file function
t1 = threading.Thread(target=download_file)
t2 = threading.Thread(target=download_file)


# runs both functions concurrently.
t1.start()
t2.start()

# waits for both threads to finish
t1.join()
t2.join()

print("Both downloads finished.")

Parallelism: Multiprocessing

If you want to achieve true parallelism for CPU-bound tasks in Python, where different parts of your program run simultaneously on different CPU cores, then the multiprocessing module is your choice.

Multiprocessing, as the name implies, creates separate processes instead of threads. Think of each process as a completely independent instance of the Python interpreter, with its own memory space and its own GIL. Because each process has its own GIL, they can all run simultaneously on different CPU cores without interference.

The multiprocessing module handles all the complexities of creating these processes and provides ways for them to communicate with each other (if needed), for example, through queues or pipes.

That said, using multiprocessing has trade-offs: processes are heavier than threads, and inter-process communication can be slower than sharing memory between threads.

Example:

from multiprocessing import Process

def heavy_computation():
    print("Start computing...")
    total = 0

    for i in range(10**7):  # CPU-intensive loop
        total += i
    print("Computation done:", total)

# The __main__ guard matters on platforms that start each process in a fresh
# interpreter (Windows, macOS): the child re-imports this module, and without
# the guard it would try to start processes again.
if __name__ == "__main__":
    # Creates two new OS processes. They can run in parallel on different CPU cores.
    p1 = Process(target=heavy_computation)
    p2 = Process(target=heavy_computation)

    p1.start()
    p2.start()

    p1.join()
    p2.join()

    print("Both computations finished.")

NodeJS

NodeJS achieves the same concepts of concurrency and parallelism quite differently from Python. To know why, you should first get to know the main philosophy of NodeJS.

Single-Threaded Nature of NodeJS

The most fundamental concept to grasp about NodeJS is that your JavaScript code runs on one single main thread. There's no GIL here in the Python sense: instead of having many threads take turns behind a lock, NodeJS by default simply runs all your JavaScript on that one thread.

Then how can it handle so many users and tasks if it's single-threaded? This is where the event loop and non-blocking I/O come in.

Concurrency: With Non-Blocking I/O

Node.js is designed with non-blocking I/O. This means that when your Node.js code needs to do something that takes time (reading a file from disk, making a request to another server, or fetching data from a database), it doesn't wait. Instead, Node.js tells the OS or some background libraries, "Hey, can you do this for me? When you're done, let me know!" And then it immediately moves on to the next task.

Those background libraries themselves might use multiple threads.

Simplified Flow

When your Node.js application starts, the event loop begins its cycle.

It executes any synchronous JavaScript code first (like setting up variables or simple calculations). When it encounters a file system asynchronous operation like reading a file, it offloads that task to an underlying C++ library called libuv. But when it encounters a network I/O task, it uses OS async APIs.

Libuv maintains a thread pool, a small group of actual operating system threads, to handle these heavy, blocking I/O tasks. So, when your Node.js code says "read this file," libuv picks an available thread from its pool to do the actual file reading. Once that background thread finishes the I/O operation, it doesn't return the result directly. Instead, it places a callback function—a piece of code that should run when the task is complete—into an event queue.

The event loop continuously checks if the call stack (where synchronous code runs) is empty. If it is, it picks the next callback from the event queue and pushes it onto the call stack for execution.

This continuous cycle ensures that the main JavaScript thread is almost always busy doing something useful, either processing new requests or handling the results of completed asynchronous operations, instead of waiting idly.

This model is incredibly efficient for I/O-bound applications (like web servers) because the single JavaScript thread is never blocked waiting for slow I/O operations. It's always free to accept new connections and process other requests.

Example:

const fs = require('fs');

function readFileAsync(filename) {

// fs.readFile is non-blocking — it doesn't wait for the result. The event loop continues running after calling both reads. The callbacks are triggered once each file is read.

  fs.readFile(filename, 'utf8', (err, data) => {
    if (err) throw err;
    console.log(`Read ${filename}:`, data);
  });
}

readFileAsync('file1.txt');
readFileAsync('file2.txt');

// Node doesn't start a new OS thread for each read here; libuv runs the reads on its thread pool.

console.log("Files are being read...");
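
The flow described above (a background thread does the blocking work, then hands a callback to a queue that the main thread drains) can also be modelled in a few lines of Python. This is only a toy sketch, not how Node.js or libuv is implemented: `ToyEventLoop`, `submit`, `run`, and `slow_upper` are made-up names, a `ThreadPoolExecutor` stands in for libuv's thread pool, and the sketch assumes the blocking functions never raise.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor

class ToyEventLoop:
    """Toy model: a worker pool does blocking work, one thread runs callbacks."""

    def __init__(self, pool_size=4):
        self._pool = ThreadPoolExecutor(max_workers=pool_size)  # stand-in for libuv's pool
        self._events = queue.Queue()  # the "event queue" of ready callbacks
        self._pending = 0             # operations started but not yet handled

    def submit(self, blocking_fn, callback, *args):
        """Start blocking_fn(*args) in the pool; callback(result) runs later."""
        self._pending += 1

        def work():
            result = blocking_fn(*args)
            # The worker never runs the callback itself; it only queues it.
            self._events.put((callback, result))

        self._pool.submit(work)

    def run(self):
        """Run queued callbacks on the calling (main) thread until none are pending."""
        while self._pending:
            callback, result = self._events.get()  # waits until a result is ready
            self._pending -= 1
            callback(result)
        self._pool.shutdown()

def slow_upper(text):
    time.sleep(0.01)  # pretend this is a slow file read
    return text.upper()

results = []
loop = ToyEventLoop()
loop.submit(slow_upper, results.append, "file1")
loop.submit(slow_upper, results.append, "file2")
loop.run()
print(sorted(results))
```

Both "reads" are in flight at the same time, yet every callback runs on the one thread that called `run()`, which mirrors the single JavaScript thread in the description above.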

Parallelism: With Worker Threads

While Node.js's single-threaded nature simplified programming (no need to worry about shared memory, race conditions), this model had a limitation: if you had a CPU-bound task, it would block the entire Event Loop. In our previous flow, by the time the I/O task is complete and a callback function is set in the event queue, if the main execution stack still keeps running for a long time, this callback will not get executed until the call stack is free.

To address this, worker threads were introduced. Worker threads allow you to create separate, isolated JavaScript execution environments that run on different threads. Each worker thread has its own V8 engine instance, its own event loop, and its own memory space.

This means you can offload computationally intensive tasks to a worker thread, keeping the main thread (and its event loop) free and responsive to handle incoming requests. The worker thread will have its own event loop, but it will still be in the same process. The benefit is that the main thread won't be blocked by the long running heavy computation happening in the worker thread.

That said, worker threads come with their own complexities. They are heavier than you might expect (each runs its own V8 instance), and communication between threads happens via message passing.

Example:

const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  function runWorker() {
    return new Promise((resolve, reject) => {
      const worker = new Worker(__filename);
      worker.on('message', resolve);
      worker.on('error', reject);
    });
  }

  runWorker().then(result => {
    console.log("Worker result:", result);
  });

} else {
  let total = 0;
  for (let i = 0; i < 100; i++) {
    total += i;
  }

  parentPort.postMessage(total);
}

Parallelism: With Clustering

In addition to worker threads, Node.js also supports another technique for achieving parallelism across CPU cores: clustering.

The cluster module in Node.js allows you to spawn multiple child processes, each running an instance of your server. These child processes share the same server port and can handle incoming requests in parallel. Essentially, you’re replicating the same Node.js process multiple times, with each process operating independently and utilizing a different CPU core.

Imagine you have an 8-core machine. With clustering, you can start 8 separate Node.js processes (workers), all managed by a master process. When a new connection is received, the master process can distribute it to one of the worker processes, often using a round-robin or OS-level load balancing strategy.

This approach provides true parallelism at the process level without changing your application code much. It’s especially useful for scaling I/O-bound web applications horizontally within a single machine.

However, like multiprocessing in Python, clustering comes with its own caveats: each worker has its own memory space, so sharing data across workers requires inter-process communication via messaging or external data stores.

Example:

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  const numCPUs = os.cpus().length;
  console.log(`Master PID ${process.pid}, spawning ${numCPUs} workers`);

  for (let i = 0; i < numCPUs; i++) {
    cluster.fork(); // Spawn worker
  }

} else {
  http.createServer((req, res) => {
    res.end(`Handled by PID ${process.pid}`);
  }).listen(3000);
}

Conclusion

It's crucial to understand the strengths and limitations of different programming languages and use them accordingly. You might choose Node.js for building a web server due to its efficient non-blocking I/O model, which is great for high-concurrency I/O-bound workloads, such as handling multiple HTTP requests, reading/writing to databases, or working with file systems.

However, when it comes to CPU-bound or computation-heavy tasks like data processing, image manipulation, or machine learning, Node.js is not ideal due to its single-threaded event loop. In such cases, it's often better to offload these tasks to a separate service written in a language more suited for heavy computation, such as Python or C++.

If you're using Python, you may worry about the GIL restricting parallelism in multithreaded programs. While it's true that the GIL limits CPU-bound concurrency in threads, I/O-bound concurrency (e.g. network calls, disk I/O) can still benefit from multithreading. For CPU-bound tasks, using multiprocessing, which spawns separate processes, is often the better approach.

More importantly, you can combine both NodeJS and Python to play to each of their strengths — using Node.js for handling high-throughput asynchronous I/O operations, and Python for offloading compute-heavy tasks and leveraging its rich ecosystem in data science, AI, and numerical computing.

How Your OS Runs Applications

Surya Sathi — Wed, 30 Jul 2025 06:30:00 GMT

When you use your computer—whether for browsing, gaming, coding, or anything else—everything ultimately comes down to calculations and instructions being carried out. The part that does all the actual calculations and follows every single instruction is the Central Processing Unit, or CPU. You can think of the CPU as the entire brain chip of your computer, tirelessly handling every task.

CPU

Cores: Workers of CPU

Inside this CPU chip, there are typically multiple independent working units called cores. Each core is like an independent worker. The crucial point is that each individual CPU core can execute instructions from only one single task at any given instant. Even if a core is incredibly fast, it processes one instruction after another, one at a time.

So, a CPU with four cores can perform four distinct sets of instructions simultaneously, one on each core.

Generally, in most modern personal computers you'd buy today – like laptops or desktop PCs – you'll commonly find CPUs with 4 to 8 cores. High-end consumer computers or specialized workstations might have 10, 12, or even 16 cores. Servers, which are powerful computers designed for heavy tasks, can have many more, sometimes 32, 64, or even hundreds of cores across multiple CPU chips.

Registers: CPU's Own Memory

Each CPU core also has its own very small, super-fast storage areas right inside it, called CPU Registers. When a core is actively working on instructions, it uses these registers to hold the numbers and pieces of information it needs right at that exact moment. They are the fastest place for the core to access data it's currently manipulating.

32-bit vs 64-bit Architecture

When we talk about 32-bit and 64-bit CPUs, we are referring to a fundamental aspect of their design/architecture: the size of the data chunks a CPU core can process in a single operation.

Performance wise, a 64-bit processor can be faster than a 32-bit one, for applications that require large memory or deal with large data types. As an example, if you add two large numbers that require 64 bits to represent, a 64-bit CPU can do that in a single operation. A 32-bit CPU, on the other hand, would need to break down a 64-bit number into two 32-bit pieces and perform multiple operations to achieve the same result, making it slower for such tasks.
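
To make the "two 32-bit pieces" idea concrete, here is a small Python sketch of how a 64-bit addition can be built from 32-bit steps. It is an illustration only: the function name `add64_with_32bit_ops` is made up, and Python integers are not limited to 32 bits, so the masks simulate that limit.

```python
MASK32 = 0xFFFFFFFF  # keeps only the lowest 32 bits of a value

def add64_with_32bit_ops(a, b):
    """Add two 64-bit numbers using only 32-bit-sized pieces."""
    # Split each number into a low half and a high half.
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Step 1: add the low halves and remember whether they overflowed.
    lo_sum = a_lo + b_lo
    lo = lo_sum & MASK32
    carry = lo_sum >> 32

    # Step 2: add the high halves plus the carry from step 1.
    hi = (a_hi + b_hi + carry) & MASK32

    # Glue the two halves back together.
    return (hi << 32) | lo

print(hex(add64_with_32bit_ops(0xFFFFFFFF, 1)))  # 0x100000000
```

A 64-bit core gets the same answer with one add instruction, while the 32-bit version needs two additions plus the carry handling.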

Impact of CPU Architecture on RAM

Another benefit of increased bit capacity is the amount of memory a CPU can utilize. Imagine your computer's main memory (RAM) as a vast collection of tiny memory boxes, and each box has a unique "address".

A 32-bit CPU uses 32 digital bits (which are 0s or 1s) to create these memory addresses. With 32 bits, you can create exactly 2^32 unique addresses. If each address refers to one byte of memory, then 2^32 bytes is approximately 4 Gigabytes (GB). This means a 32-bit CPU can only directly "point to" or use a maximum of about 4 GB of your computer's main memory. Even if you install more than 4 GB of RAM, a 32-bit CPU cannot physically address or utilize that extra memory beyond its 4 GB limit.

However, a 64-bit CPU uses 64 bits to create memory addresses. This allows for an astronomically larger number of unique addresses – 2^64. This vastly expanded addressing capability means a 64-bit CPU has the capacity to use a huge amount of RAM, easily reaching into Terabytes (TB) of RAM and beyond, limited mainly by how much RAM you can provide as hardware. This capability is essential for modern software, which often requires significant amounts of memory to run efficiently.
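
The arithmetic behind those limits is easy to check. A quick sketch, assuming one byte per address as in the explanation above (`addressable_bytes` is just an illustrative helper name):

```python
GIB = 1024 ** 3  # bytes in one gibibyte
TIB = 1024 ** 4  # bytes in one tebibyte

def addressable_bytes(address_bits):
    """Number of distinct byte addresses you can form with this many bits."""
    return 2 ** address_bits

print(addressable_bytes(32) // GIB, "GiB")  # 4 GiB
print(addressable_bytes(64) // TIB, "TiB")  # 16777216 TiB
```

That second number is the theoretical ceiling. In practice the usable amount is bounded by the hardware and the OS well before that.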

With this understanding of what the CPU can do, let’s look at how the software—particularly the operating system—takes advantage of it.

Operating System (OS): The Coordinator

The Operating System is a very special, complex program, a piece of software, that acts as the coordinator of your entire computer system. It is crucial to understand that the OS is software, while the CPU is hardware. They are distinct but work very closely together.

The OS is responsible for controlling and coordinating all the hardware components (including the CPU and its cores, input devices like keyboards, output devices like screens) and all the other software programs you run. It acts as the bridge between you, your programs, and the computer's hardware. One of its key roles is directly managing the cores of the CPU. The OS tells each individual CPU core what instructions to execute and when.

In the very early days of computers, only one program could run at a time. You had to finish one task before starting another. The ability to run multiple programs seemingly at once, called multitasking, was a major advancement implemented by the OS.

The OS achieves multitasking through a technique called time-sharing. A key part, known as the OS scheduler, is responsible for this. The scheduler doesn't let one program run continuously. Instead, it gives each running program's active worker (a "thread," which we'll discuss next) a very small slice of CPU time on a core.

After that tiny slice of time, the OS takes control back from that thread and then gives the CPU core to another thread for its own small slice of time, and so on. This switching happens incredibly fast, often hundreds or thousands of times per second. Because of this rapid switching, it creates the illusion that all programs are running simultaneously, even if your computer only has one CPU core.

How does the OS manage to take control back from a running program? It uses hardware mechanisms called interrupts. When a CPU core receives an interrupt or even when a thread voluntarily yields control, it immediately stops what it's currently doing and temporarily hands control over to the OS. This allows the OS to regain control from the running thread, even if that thread is very busy. Once the OS has control, its scheduler can then decide which thread gets to run next on that CPU core. This entire process of pausing one thread and starting another, including saving and loading their states, is called context switching.
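
Time-sharing is easier to picture with a tiny simulation. The sketch below is a simplified round-robin model, not how any real OS scheduler is written: `round_robin` and the task names are invented for illustration, and "units" are an abstract stand-in for CPU time.

```python
from collections import deque

def round_robin(work_units, time_slice):
    """Simulate one core sharing its time between several tasks.

    work_units maps a task name to the units of CPU time it still needs.
    Returns the order in which the core ran things, as (name, units_run).
    """
    ready = deque(work_units.items())  # tasks waiting for the core
    timeline = []
    while ready:
        name, remaining = ready.popleft()   # scheduler picks the next task
        ran = min(time_slice, remaining)    # it runs for at most one slice
        timeline.append((name, ran))
        remaining -= ran
        if remaining > 0:
            ready.append((name, remaining))  # not done: back of the line
    return timeline

timeline = round_robin({"browser": 5, "editor": 2, "music": 3}, time_slice=2)
for name, ran in timeline:
    print(f"{name} ran for {ran} unit(s)")
```

With these numbers the core runs browser, editor, music, browser, music, browser. No task waits for another to finish completely, which is what creates the illusion of everything running at once.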

Processes and Threads

A process is an isolated environment for a running program, created and managed by the OS. It includes dedicated memory, file access, network permissions, and more.

When you open any program on your computer, the OS creates a process for it. It also assigns and manages all the resources a process needs, such as its dedicated segment of memory, access to specific files on the hard drive, and network connections, and prepares it for execution. When you close the program, the OS cleans up and removes its process, freeing up all its resources.

The OS uses special hardware capabilities to enforce strict boundaries between processes. This means one process cannot directly read from or write to the memory space of another process without explicit permission from the OS. This isolation is fundamental for system stability and security; if one program crashes, its issues are contained within its own process, preventing it from corrupting or crashing other running programs or the entire operating system.

Inside each program (process), there can be one or many threads. A thread is a smaller, individual sequence of instructions or a specific "worker" within a process. All threads within the same process share that process's allocated memory space and resources. This makes them very efficient for tasks that need to cooperate closely and share data within one program. The OS is also responsible for creating, destroying, and managing these threads within their respective processes.

It is crucial to understand that processes and threads are concepts and structures created and managed entirely by the Operating System. They are not inherent features of the CPU hardware itself. The CPU simply executes the instructions that the OS tells it to run, which come from threads.

How Processes, Threads, and Cores Work Together

Let’s make the relationship clear:

Only one thread runs on a single CPU core at any given instant. This is a fundamental rule. A CPU core can execute instructions from only one thread at a time. The OS scheduler assigns a thread to a specific CPU core to execute its instructions. The OS tells a core, "Now, execute instructions from this thread."

A thread does not always run on the same core. When a thread is paused (during a context switch) and then later resumed, the OS scheduler can (and often does, for load balancing) choose to run that thread on any available CPU core, not necessarily the same one it was on before. This flexibility helps the OS efficiently manage the workload across all cores.

During context switching, the complete state of the thread being paused is saved. This includes all the values currently held in the CPU's registers for that thread, the exact instruction it was about to execute next (its program counter), and any other crucial information about its current progress. This entire state is copied from the CPU's registers and stored into a specific memory area designated for that thread, which resides in the computer's RAM. When the OS decides to resume that thread later, it retrieves this saved state from RAM and loads it back into the CPU's registers, allowing the thread to continue executing precisely from where it left off.

And multiple threads of the same process can run on different cores simultaneously. This is a key benefit of multi-core CPUs. If your program (process) has multiple threads, the OS scheduler can distribute those threads across different available CPU cores. This allows different parts of your single program to genuinely execute at the same moment, speeding up its overall performance.

This is how modern applications take full advantage of multi-core CPUs. Threads allow programs to scale their workload, and the OS ensures they are efficiently distributed across available cores.
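
The save-and-restore step of a context switch described above can be sketched in Python. This is a deliberately tiny model with made-up names (`Core`, `SavedContext`, `context_switch`); a real core has many more registers and the OS does this in low-level code.

```python
from dataclasses import dataclass, field

@dataclass
class SavedContext:
    """Per-thread state kept in RAM while the thread is not running."""
    program_counter: int = 0
    registers: dict = field(default_factory=dict)

@dataclass
class Core:
    """Toy CPU core: one program counter and one set of registers."""
    program_counter: int = 0
    registers: dict = field(default_factory=dict)

def context_switch(core, outgoing, incoming):
    # 1. Save the running thread's state from the core into its area in RAM.
    outgoing.program_counter = core.program_counter
    outgoing.registers = dict(core.registers)
    # 2. Load the next thread's saved state into the core.
    core.program_counter = incoming.program_counter
    core.registers = dict(incoming.registers)

# Thread A is running; thread B was paused earlier.
core = Core(program_counter=120, registers={"r0": 7})
thread_a = SavedContext()
thread_b = SavedContext(program_counter=48, registers={"r0": 99})

context_switch(core, outgoing=thread_a, incoming=thread_b)  # B runs now
context_switch(core, outgoing=thread_b, incoming=thread_a)  # back to A
print(core.program_counter, core.registers)  # 120 {'r0': 7}
```

After the second switch, the core holds exactly what thread A had before it was paused, so A continues as if nothing happened.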

Concurrency and Parallelism

Now that we understand cores, threads, and the OS's role, it's important to define two often-confused concepts, concurrency and parallelism:

Concurrency is about the ability of the OS to manage and make progress on many tasks over time, even if there are not enough CPU cores to execute them all at the exact same instant. It relies on time-sharing and rapid context switches between various threads, creating the illusion of simultaneous execution. This is possible on any computer, even those with only one CPU core.

Parallelism is about actually doing multiple tasks at the exact same moment. This is achieved when the OS schedules different threads to run on different, available CPU cores. So, if you have a 4-core CPU, the OS can run up to four distinct threads truly in parallel. This is possible only if you have a multi-core CPU.

Modern systems combine both: they use concurrency to manage more tasks than available cores, and parallelism to speed up those tasks on multicore hardware.

Example: Running Multiple Tabs on a Browser

Let’s make this all even more tangible by walking through a real-world scenario: opening multiple tabs in a web browser.

When you launch your web browser, say Chrome, the Operating System creates a process for it. This process includes everything the browser needs: memory to hold the pages you view, access to the network for loading websites, and access to files for caching or downloads.

Now, when you open multiple tabs within that browser, the architecture becomes more layered:

Each tab is often handled by a separate process (or at least a separate thread, depending on the browser's internal design). For example, Chrome is well known for creating a separate process per tab. This isolation means that if one tab crashes due to a faulty script, it doesn’t take down other tabs or the entire browser—because the OS has enforced process-level isolation.

Within each browser process or tab, there are multiple threads. For instance, one thread may handle rendering the web page (graphics), another may handle JavaScript execution, while another manages network communication. All these threads share the same memory and work closely together, allowing the web page to load smoothly and respond to user interactions.

The OS scheduler assigns these threads to CPU cores. If your computer has four cores, four threads—possibly from four different tabs or background tasks—can run truly in parallel. If there are more threads than cores (which is usually the case), the scheduler uses time-sharing to rapidly switch threads in and out of each core.

Context switching ensures that if you're watching a video in one tab, loading another page in a second tab, and downloading a file in the background, your system appears responsive—even if those operations are switching in and out of a single core every few milliseconds.

Conclusion

Understanding this coordination between the CPU, its cores, the OS, processes, and threads is crucial to understand how your computer gets work done. Whether you're writing code, using software, or just curious about how things work, this is how it's done.

Understanding Big O

Surya Sathi — Tue, 22 Jul 2025 18:30:00 GMT

There are often many ways to solve a problem. These different approaches are called algorithms. An algorithm is just a clear, well-defined set of step-by-step instructions that a computer follows to perform a computation. For example, if you want to find a specific word in a dictionary, you could start from the very first page and read every word until you find it. Or, you could open the dictionary roughly in the middle, see if your word is before or after, and then narrow down your search. Both are algorithms to find a word. But one is more efficient than the other.

This is why we look into solving the same problem using different algorithms. Because some are just more efficient than others. So, how do you figure out which one’s better? You usually look at two things: how much memory it eats up (space complexity) and how long it takes to run (time complexity). In this one, we’re just talking about time complexity.

What is it?

Big O is a way to describe this time complexity. Think of it as a way to describe how much "work" an algorithm has to do as the amount of "stuff" i.e., input size, it's working with grows. It's not about the exact number of seconds an algorithm takes, because that can change depending on how fast your computer is or what other programs are running in the background. Instead, Big O focuses on how fast the amount of work grows as the input size grows.

Big O helps us categorize algorithms based on how their performance scales. It gives us a kind of a "worst-case scenario" for how long an algorithm might take. We use a special notation, often with a capital 'O' followed by parentheses, like O(n) or O(n^2).

Let's go through some examples:

O(1): Constant Time

Imagine, in the above dictionary example, you already know the page number and the position of the word you are looking for. Then you don't need to go through all the words one by one. If you already know the page number is 20, it doesn't matter whether the dictionary has 100 pages or 1000 pages, you will always open page 20 as soon as you open the dictionary. This type of getting what you want in the same time no matter the input size, is O(1).

Example:

Accessing an element in an array by its index - it doesn't matter how many elements there are, you can just jump to its index in memory.
Adding an element at the front of a singly linked list - you are just adding an element and pointing it at the previous first item, and it doesn't matter how many elements there are after it.
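
Both examples can be sketched in Python. The `Node` class and `push_front` helper are illustrative names, and a Python list stands in for an array.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def push_front(head, data):
    """Return the new head: one new node and one pointer, whatever the length."""
    return Node(data, next=head)

scores = [90, 85, 78, 92]
second = scores[1]  # index access: one step, no matter how long the list is
print(second)       # 85

head = None
for value in (30, 20, 10):
    head = push_front(head, value)  # the list is now 10 -> 20 -> 30
print(head.data, head.next.data)    # 10 20
```

Neither operation contains a loop over the existing elements, which is the practical sign of O(1).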

O(n): Linear Time

In the same example, if you don't know the page number, then you are not just accessing the word, you are searching for it. Suppose you read every word. If you have a small dictionary with only 100 pages, in the worst case scenario, the word will be on the 100th page and you will have to read all 100 pages. If you have 1000 pages, you will have to read all 1000 pages. That means the work grows directly with the number of pages - double the pages, double the time. So, you can say the time complexity in this case is O(n).

Example:

Searching for an item in an unsorted list (we used a dictionary example, which is technically a sorted list, just for a good analogy. But there are much better algorithms to find elements in a sorted list).
Printing all the elements in an array - you are going through all the elements of an array.
Counting the number of items in a list - again, you are going through all the elements of an array.
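
Here is a small sketch of the first example, with a step counter added so the growth is visible. `linear_search` is a hypothetical helper, and the "pages" are just numbers.

```python
def linear_search(items, target):
    """Return (index, steps). index is -1 when the target is absent."""
    steps = 0
    for index, item in enumerate(items):
        steps += 1  # one look at one item
        if item == target:
            return index, steps
    return -1, steps

small = list(range(1, 101))    # pages 1..100
large = list(range(1, 1001))   # pages 1..1000

print(linear_search(small, 100))   # (99, 100)
print(linear_search(large, 1000))  # (999, 1000)
```

Ten times the pages means ten times the steps in the worst case, which is exactly what O(n) says.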

O(logn): Logarithmic Time

In our dictionary example, suppose you open the book in the middle and decide whether to go left or right, then open the remaining half in the middle and decide again, repeating until you find the word. That is the binary search algorithm. It only works if the input elements are sorted, and it gives you O(logn).

"Logarithm" sounds scary, but it really isn't. Think of it this way: with O(logn) algorithms, you're not looking at every item. Instead, you're cutting the problem size in half (or some other fraction) with each step.

If you have a million pages in a dictionary:

First step: You cut it to 500,000 pages.

Second step: You cut it to 250,000 pages.

...and so on.

You can find your word incredibly fast because with each step, you eliminate a huge chunk of possibilities. This is why O(logn) is considered very efficient, especially for large inputs.

Example:

Binary search - like our dictionary example, where the data must be sorted.
Finding an item in a balanced binary search tree.

Why is binary search O(logn)?

If n (your input size) is 100:

After 1 step, you're looking at 50 elements.
After 2 steps, you're looking at 25 elements.
After 3 steps, you're looking at 12-13 elements.
After 4 steps, you're looking at 6-7 elements.
After 5 steps, you're looking at 3-4 elements.
After 6 steps, you're looking at 1-2 elements.
After 7 steps, you're looking at 1 element (and you'll find it or it's not there).

So, for 100 items, it takes about 7 steps.

Now consider n = 1,000,000 (one million).

1st step: 500,000
2nd step: 250,000
3rd step: 125,000
...
10th step: about 1,000
...
20th step: about 1 element.

So, for a million items, it takes about 20 steps.

The number of steps is basically: how many times can you divide n by 2 until you're left with 1?

This "how many times do you divide by something" question is answered by a logarithm. Specifically, we're talking about the base-2 logarithm (log2).

log2(100)≈6.64 (which rounds up to 7 steps)
log2(1,000,000)≈19.93 (which rounds up to 20 steps)

So, the number of operations grows proportionally to the logarithm (base 2) of the input size n. That's why we say its time complexity is O(log n).
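
You can check those step counts yourself. Below is a minimal binary search sketch that also counts how many times it looks at an element. The counter is there only for illustration, and `range(...)` is used as a ready-made sorted sequence.

```python
def binary_search(sorted_items, target):
    """Return (index, steps). index is -1 when the target is absent."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1   # discard the left half
        else:
            hi = mid - 1   # discard the right half
    return -1, steps

print(binary_search(range(100), 99))             # (99, 7)
print(binary_search(range(1_000_000), 999_999))  # (999999, 20)
```

The input grew ten thousand times, but the number of steps only went from 7 to 20.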

O(n^2): Quadratic Time

Imagine you're trying to find all possible pairs of students in a classroom. For every student, you have to pair them with every other student. If there are 'n' students, each of them pairs with (n-1) others, which gives n × (n-1) pairings (or half that if you count each pair only once). Either way, the work grows in proportion to n multiplied by n, or n^2 operations.

When you see nested loops in a computer program (a loop inside another loop), it often points to O(n^2) complexity. If you double the input size, the time taken doesn't just double; it quadruples! This can get very slow for large inputs.

Example:

Bubble Sort (a simple but inefficient sorting algorithm where you repeatedly step through the list, compare adjacent elements and swap them if they are in the wrong order).
Finding all possible pairs of elements in a list.
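
The pairs example looks like this as nested loops. `all_pairs` is an illustrative helper that counts each pair once, so it produces n(n-1)/2 pairs, which still grows like n^2.

```python
def all_pairs(items):
    """Return every unordered pair of items, using a loop inside a loop."""
    pairs = []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):  # only partners not paired yet
            pairs.append((items[i], items[j]))
    return pairs

students = ["Ana", "Ben", "Cy", "Dee"]
print(len(all_pairs(students)))          # 6

print(len(all_pairs(list(range(100)))))  # 4950
print(len(all_pairs(list(range(200)))))  # 19900
```

Going from 100 to 200 items roughly quadruples the work, from 4950 to 19900 pairs.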

There are more like O(n^3) and O(2^n) and so on, which we are not covering in this article.

Why is Big O important for Data Structures?

Data structures are ways of organizing data in a computer so that it can be used efficiently. The choice of data structure directly impacts the time complexity of the algorithms you use with it.

If you need to quickly check if an item exists, a hash table might give you an average O(1) search time. If you need to keep your data sorted and frequently search for items, a binary search tree might give you O(logn) search time. If you just need a simple list and don't care much about super-fast searching, a basic array might be fine for O(n) search time.
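
As a rough sketch of those three choices in Python: a `set` is hash-based, a plain list needs a scan, and binary search over a sorted list (via the standard `bisect` module) stands in here for the binary search tree, since Python has no built-in one. The helper names and usernames are made up.

```python
from bisect import bisect_left

def linear_contains(items, target):
    """O(n): look at the items one by one."""
    for item in items:
        if item == target:
            return True
    return False

def sorted_contains(sorted_items, target):
    """O(log n): binary search over an already sorted list."""
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target

usernames = ["dev", "asha", "zoe", "kiran"]
as_set = set(usernames)        # hash table: average O(1) membership test
as_sorted = sorted(usernames)  # sorted data: O(log n) membership test

found = ("zoe" in as_set, sorted_contains(as_sorted, "zoe"), linear_contains(usernames, "zoe"))
missing = ("bob" in as_set, sorted_contains(as_sorted, "bob"), linear_contains(usernames, "bob"))
print(found, missing)
```

All three give the same answers. What changes is how the cost of each lookup grows as the collection gets bigger.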

Understanding Big O helps you choose the algorithms to make your software perform well. It can be the difference between an app that runs in milliseconds and one that takes seconds… or minutes… or just crashes your computer.

Organizing Data in Structures

Surya Sathi — Fri, 18 Jul 2025 18:30:00 GMT

Let's try to understand the place where your computer remembers everything: its memory. It plays a huge part in building efficient software.

Bit: The Smallest Piece of Information

Let's start with the absolute smallest element your computer can remember. It's called a bit. Think of a bit like a switch. It can be either on or off. That's it! Just two states. In the world of computers, "on" is usually represented by the number 1, and "off" by the number 0.

The idea of using just two states wasn't invented for computers. Back in the 17th century, a German mathematician and philosopher named Gottfried Wilhelm Leibniz was fascinated by this binary system. He saw it as a beautiful reflection of creation – everything coming from nothing and one (like God). He believed it could simplify all of logic and reasoning.

"Everything comes from nothing."

Byte: A Group of Bits

Now if you want to store something meaningful, you can group these ons and offs together in different combinations. When 8 bits are grouped together, it's called a byte. Like 00000000, 00000001, 11111111 etc. So a byte can represent 256 different combinations of 0s and 1s and hence 256 different values.

For example letter 'A' can be represented by one combination, letter 'B' by another combination, letter 'a' by some other combination and so on. This is how a computer stores everything. This is where you get the words KB, MB, GB, TB etc. from, when you store much larger files like notes, images, audio, video etc.
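
You can look at these combinations directly in Python. The sketch below assumes the ASCII code for each letter (the text above doesn't name an encoding), and `to_bits` is just an illustrative helper.

```python
def to_bits(char):
    """Show the 8-bit pattern for a single ASCII character."""
    return format(ord(char), "08b")

print(2 ** 8)        # 256 possible combinations in one byte
print(to_bits("A"))  # 01000001
print(to_bits("B"))  # 01000010
print(to_bits("a"))  # 01100001
```

Each letter gets its own pattern of eight 0s and 1s, and upper-case and lower-case letters are different patterns.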

Data Structures: Organizing Data

Think of memory as a bookshelf filled with bytes. If you throw all the books around in some random order, it will be difficult to find anything, right? The same goes for computer memory. We have some specific ways of organizing data that make it easy to find, add or remove information. These are called data structures.

Think of data structures as different kinds of shelves or cabinets on our bookshelf, each designed for a specific purpose. Let's look at a few of them.

Array: A Simple List

Imagine you have a series of identical boxes, all lined up neatly in a row. Each box can hold one piece of information. And you know exactly where each box is because they are numbered: Box 1, Box 2, Box 3, and so on.

This is very similar to an array. An array is like a contiguous block of memory where you store similar types of data, one after the other. Each piece of data has a specific "address" or position (like the box number), which makes it super-fast to find any specific piece of data if you know its position.

scores = [90, 85, 78, 92]
print(scores[1]) # outputs 85

Note: Python lists behave like dynamic arrays, but allow mixed data types unlike typical arrays in C/C++/Java.

Why is this useful?

If you have a list of student scores, an array is perfect. You can quickly get the score of the 5th student, or the 100th student, because you just jump directly to their position.

Linked List

Now, imagine you have a bunch of individual boxes, but they are not lined up neatly. Instead, each box has a little reference inside it that tells you where the next box in the sequence is located. So, Box A tells you where Box B is, Box B tells you where Box C is, and so on.

This is like a linked list. Unlike an array, where items are stored right next to each other, a linked list stores data items in different, possibly scattered, locations in memory. But each item (called a "node") has two parts: the actual data, and a little piece of information called a pointer that "points" to the next item in the list.

Pointer

Think of a pointer as a tiny sticky note with an address written on it. This address tells your computer exactly where to find another piece of data in its memory. It's like saying, "Go to shelf number 5, slot 3, and you'll find the next part of this story." Pointers are how different pieces of data, even if they are physically far apart in memory, can be linked together logically.

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

# Creating linked list: 10 -> 20 -> 30

node1 = Node(10)
node2 = Node(20)
node3 = Node(30)

node1.next = node2
node2.next = node3

current = node1
while current:
    print(current.data)
    current = current.next

# Outputs
# 10
# 20
# 30

# Inserting 40 between 20 and 30

node4 = Node(40)
node4.next = node2.next
node2.next = node4

current = node1
while current:
    print(current.data)
    current = current.next

# Outputs
# 10
# 20
# 40
# 30

Why is this useful?

Imagine you have a large collection of data. If you use an array, and you want to insert a new piece in the middle, you'd have to shift every single piece after that to make space. But with a linked list, you just create a new piece, change the "next piece" tag of the previous piece to point to your new piece, and then make your new piece point to the next one in the original sequence. It's much easier to add or remove elements in the middle without shuffling everything around.
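To make the comparison concrete, here is a minimal sketch that wraps the relinking step in a helper. The names insert_after and to_list are hypothetical, made up for this illustration. Note that only the relinking itself is cheap; you still have to walk the list to find the node you want to insert after.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


def insert_after(node, data):
    """Insert a new node holding `data` right after `node` (hypothetical helper)."""
    new_node = Node(data)
    new_node.next = node.next  # new node points at the old successor
    node.next = new_node       # predecessor now points at the new node
    return new_node


def to_list(head):
    """Walk the chain from `head` and collect the values in order."""
    values = []
    current = head
    while current:
        values.append(current.data)
        current = current.next
    return values


# Build 10 -> 20 -> 30, then insert 40 after the node holding 20.
head = Node(10)
second = insert_after(head, 20)
insert_after(second, 30)
insert_after(second, 40)
linked_result = to_list(head)

# The same insertion on a Python list: every item from index 2 onward shifts right.
array_like = [10, 20, 30]
array_like.insert(2, 40)

print(linked_result)  # [10, 20, 40, 30]
print(array_like)     # [10, 20, 40, 30]
```

Both end up with the same order; the difference is how much work happens behind the scenes as the collection grows.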

Stack: LIFO Abstraction on List

Imagine you're "stacking" plates. How do you usually stack them? You put one on top of the other, right? And when you want a plate, which one do you take first? The one on top. This "last in, first out" idea is exactly how a stack works.

Putting something on is called "pushing" onto the stack. You just add the new item right on top. Taking something off is called "popping" from the stack. You always remove the very last item you put on.

stack = []
stack.append("typed 'hello'")
stack.append("bolded text")
last_action = stack.pop()

print(last_action) # Outputs 'bolded text'

Why is this useful?

Think of the "Undo" action in a word processor. Each action you take (typing a letter, bolding text) is "pushed" onto an undo stack. When you hit undo, the last action is "popped" off and reversed.

Queue: FIFO Abstraction on List

Imagine you're at a store, and there's a "queue" of people waiting to pay. Who gets served first? The person who arrived first. And who's next? The person who joined the line right after them.

This "first in, first out" (FIFO) idea is how a queue works. You add a new item to the back of the queue (enqueueing). When you remove one, it's from the front (dequeuing).

from collections import deque

queue = deque()
queue.append("Print job 1")
queue.append("Print job 2")
next_job = queue.popleft()

print(next_job) # Outputs 'Print job 1'

Why is this useful?

Queues are essential for managing tasks where order matters. When you browse the internet, your computer sends requests to websites. These requests might get put into a queue to be processed in order.

Tuple

A tuple is an ordered collection, just like a list: you can look at its items and count them, but you can't change them, add new ones, or remove old ones.

coordinates = (10, 20)
print(coordinates[0]) # outputs 10

Why is this useful?

Tuples aren’t just immutable lists. They’re often used when heterogeneity and data grouping are needed (e.g., (x, y) coordinates, function returns).
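Here is a small sketch of those two uses, plus what happens when you try to modify a tuple. The function name min_max and the sample values are made up for illustration.

```python
def min_max(values):
    """Return the smallest and largest value together as one tuple."""
    return (min(values), max(values))


# A tuple can group values of different types.
record = ("Alice", 30, 5.4)

# Unpacking splits a returned tuple back into separate names.
lowest, highest = min_max([90, 85, 78, 92])

# Assigning to a position raises TypeError because tuples are immutable.
try:
    record[1] = 31
    changed = True
except TypeError:
    changed = False

print(lowest, highest)  # 78 92
print(changed)          # False
```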

Dictionary

Think about a set of index cards. On each card, you write a keyword (like "Apple"). And then, on the same card, you write its definition (like "a round fruit"). If you want to find the definition of "Apple," you quickly look for the "Apple" card.

This "keyword and definition" idea is what we call a key-value pair.

Key: This is the unique identifier, like the keyword on the index card. It's what you use to look up the information.

Value: This is the actual data or information associated with that key, like the definition on the index card.

So, a dictionary is a collection of these key-value pairs.

person = {"name": "Alice", "age": 30}
print(person["age"]) # Outputs 30

Why is this useful?

When you look up a customer by their ID, or a product by its unique code, a dictionary (also known as a hash map) is very likely working behind the scenes.

When you define variables in a program (e.g., age = 30), the programming language often uses a hash map to store the variable name (age is the key) and its value (30 is the value).

Imagine a website that serves millions of users. Instead of going to the main database every time someone asks for a popular piece of information, the website might store that information in a "cache" using a hash map. The "key" is the request, and the "value" is the answer, allowing for lightning-fast delivery.
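The caching idea can be sketched with a plain dictionary. This is only an illustration: slow_lookup is a stand-in for a real database call, and db_calls exists just to show how often that "database" is actually hit.

```python
cache = {}
db_calls = 0


def slow_lookup(product_id):
    """Stand-in for a database query (hypothetical, for illustration only)."""
    global db_calls
    db_calls += 1
    return {"id": product_id, "name": f"Product {product_id}"}


def get_product(product_id):
    """Return the product, asking the 'database' only on a cache miss."""
    if product_id not in cache:
        cache[product_id] = slow_lookup(product_id)
    return cache[product_id]


first = get_product(42)   # miss: calls slow_lookup
second = get_product(42)  # hit: served straight from the dictionary

print(db_calls)  # 1
```

A real cache also has to decide when entries expire or get evicted, which this sketch leaves out.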

Tree

A tree data structure is a way of organizing data in a hierarchical fashion (like a family tree).

The very top item is called the root (like the main ancestor or the tree's base). Each item can have "children" (items below it, connected by a line), and these children can also have their own children, and so on. Items at the very bottom, with no children, are called leaves (like the leaves on a real tree).

class TreeNode:
    def __init__(self, data):
        self.data = data
        self.children = []

root = TreeNode("C:")
docs = TreeNode("Documents")
down = TreeNode("Downloads")
pics = TreeNode("Pictures")
movs = TreeNode("Movies")

root.children = [docs, down]
docs.children = [pics]
down.children = [movs]

def print_children(prefix, node):
    print(prefix, node.data)
    for child in node.children:
        print_children(prefix + '---', child)

print_children('', root)

# Outputs
#  C:
# --- Documents
# ------ Pictures
# --- Downloads
# ------ Movies

There are different types of trees like Binary Trees, Binary Search Trees etc.
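As one example of those types, here is a rough sketch of a binary search tree, where each node has at most two children and smaller keys go to the left. The names BSTNode, bst_insert, bst_contains and in_order are my own, and the sketch makes no attempt to keep the tree balanced.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(node, key):
    """Insert `key` below `node` and return the subtree root; duplicates are ignored."""
    if node is None:
        return BSTNode(key)
    if key < node.key:
        node.left = bst_insert(node.left, key)
    elif key > node.key:
        node.right = bst_insert(node.right, key)
    return node


def bst_contains(node, key):
    """Follow one branch per step until the key is found or the branch ends."""
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def in_order(node):
    """Visit left subtree, node, right subtree: returns keys in sorted order."""
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)


bst_root = None
for key in [50, 30, 70, 20, 40]:
    bst_root = bst_insert(bst_root, key)

print(bst_contains(bst_root, 40))  # True
print(bst_contains(bst_root, 60))  # False
print(in_order(bst_root))          # [20, 30, 40, 50, 70]
```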

Why is this useful?

Trees are incredibly powerful for organizing data that has a natural hierarchy.

File systems are a perfect example! Your main drive (C:) is the root, then you have folders like "Documents," "Pictures," "Programs," etc., which are its children. Each of those folders can contain more folders or files, extending the branches of the tree.

Graph

Imagine a map with many cities. Some cities are connected by roads, others are not. A road might go both ways, or just one way. And some roads might be longer or shorter (representing distance or time).

That's a graph! It's a collection of "nodes" (which are like our cities) and "edges" (which are like the roads connecting the cities). The edges can be one-way or two-way, and they can even have "weights" (like the distance of a road).

There are different types like directed/undirected, weighted/unweighted, cyclic/acyclic, sparse/dense graphs.

graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"]
}

# This is a directed unweighted graph: there is an edge from A to B, A to C and C to A, but not from B to A. Similarly, there is an edge from D to C but not from C to D.

graph = {
    "HYD": [("BLR", 480), ("CHN", 600)],
    "BLR": [("HYD", 480), ("CHN", 360), ("KOC", 540)],
    "CHN": [("HYD", 600), ("BLR", 360)],
    "KOC": [("BLR", 540)]
}

# Now, this is an undirected weighted graph.

Why is this useful?

Graphs are for representing complex relationships like:

Social Networks: Think of Facebook or Instagram. Each person is a "node," and if two people are friends, there's an "edge" connecting them. You can use graphs to find out who's friends with whom, or to recommend new friends.

Maps: When you use Google Maps to find the shortest route between two places, it's using graph algorithms. Each intersection or landmark is a node, and the roads are edges, with their lengths as weights.

The Internet itself: Websites are nodes, and the links between them are edges! This is how search engines like Google "crawl" the web.
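To give a feel for the "shortest route" use case, here is a sketch of Dijkstra's algorithm run on the weighted city graph from earlier. I am not claiming this is what any particular maps product runs; it is one classic shortest-path algorithm, and it assumes no edge has a negative weight. The function name shortest_distances is made up for this example.

```python
import heapq


def shortest_distances(graph, start):
    """Dijkstra's algorithm over {node: [(neighbor, weight), ...]}."""
    distances = {start: 0}
    heap = [(0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > distances.get(node, float("inf")):
            continue  # stale entry: a shorter path was already found
        for neighbor, weight in graph[node]:
            candidate = dist + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return distances


road_graph = {
    "HYD": [("BLR", 480), ("CHN", 600)],
    "BLR": [("HYD", 480), ("CHN", 360), ("KOC", 540)],
    "CHN": [("HYD", 600), ("BLR", 360)],
    "KOC": [("BLR", 540)],
}

distances = shortest_distances(road_graph, "HYD")
print(distances["KOC"])  # 1020, going via BLR (480 + 540)
```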

Closing

Understanding these different ways of organizing data is a huge step in truly understanding how computers work efficiently.

If you need to quickly add and remove items from the end (like undoing actions), a stack is great. But if you need to process tasks in the order they arrive (like print jobs), a queue is the way to go.

If you need to store hierarchical data (like files on your computer), a tree is ideal. But if you need to represent complex relationships (like friends on a social network), a graph is perfect.

It's not just about storing information; it's about storing it in a way that makes it easy and fast to find, add, delete, and process, which is at the heart of building powerful and responsive software.

"The art of memory is the art of attention." – Samuel Johnson

Why Are There So Many Programming Languages? - Part 2

Surya Sathi — Sun, 06 Jul 2025 18:30:00 GMT

"So, if C was such a powerful language, why didn't it remain the sole dominant language ruling every domain, every program, and every software ever created? Why were Python, JavaScript, Java, and Go and a hundred other languages created?"

This was the question which we ended the previous part with. To give you an analogy, it's like asking why we have bicycles, bikes, cars, ships and airplanes when horses could get us around. Each serves a different purpose.

The world kept expanding, and solving some of its problems with C was painfully long and inefficient. This is where the story of specialization begins. Think of it like this: early humans did everything themselves - farming, building shelters, hunting and cooking. But as civilizations evolved, we got different people taking up different occupations, specializing in one thing - we got farmers, builders, doctors, chefs etc. Programming languages, sort of, followed the same path.

Rise of Object-Oriented Thinking

One of the biggest shifts after C was OOP (Object-Oriented Programming). It is about writing code based on how we see the world. If you were to describe a dog in C, you would write something like dog_color, dog_weight, dog_name and some functions like walk_dog, feed_dog etc. But in OOP, we get all of this together by creating objects, which bundle both properties and functions.

"The whole is greater than the sum of its parts." - Aristotle

So in OOP, you'd define a Dog blueprint (which is called a Class) which says "Every dog has a name, age and a color. It can eat and walk and sleep.". Then, whenever you need to create a dog in your code, you will reference this class to create a dog object with that particular dog's properties. Every dog will have its own name, age and color. It's like having self-contained units.
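In Python, that Dog blueprint might look like the sketch below. The method bodies and the two example dogs are invented purely for illustration.

```python
class Dog:
    """Blueprint: every dog has a name, an age and a color."""

    def __init__(self, name, age, color):
        self.name = name
        self.age = age
        self.color = color

    def eat(self):
        return f"{self.name} is eating"

    def walk(self):
        return f"{self.name} is walking"


# Two objects built from the same class, each with its own properties.
bruno = Dog("Bruno", 3, "brown")
milo = Dog("Milo", 5, "white")

print(bruno.eat())  # Bruno is eating
print(milo.color)   # white
```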

This concept makes code much easier to understand, maintain and reuse in large programs. That said, it shouldn't always be the go to. Think before you apply. Sometimes simple functions can do :-)

C++: "C with Classes"

The first big step in that direction was C++, created by Bjarne Stroustrup at Bell Labs, the same place C was born. He initially called it C with Classes, because that's what it is.

C was good for system-level programming – things close to the hardware, like operating systems. But as software grew more complex, especially for large applications, managing all the interactions with just C became difficult. C++ allowed programmers to keep C's efficiency and control over hardware while making organization easier with OOP.

C++ became the go-to for complex applications, game development, and high-performance computing where speed mattered, but organization was also crucial.

While C++ was making big applications more manageable, a whole new area was opening up: the World Wide Web. Suddenly, people wanted programs that could run on different types of computers, over networks, and easily interact with information spread across the globe.

Java: "Write Once, Run Anywhere"

Created in the mid-1990s, it had a revolutionary idea at its core, "Write Once, Run Anywhere".

Remember how C programs needed to be compiled specifically for the type of computer they run on? If you wrote a C program on Windows, it wouldn't run on Mac or Linux without recompiling it.

Java solved this by introducing something called a Java Virtual Machine (JVM) between the code and the system. It is like a universal translator mini-computer running on your actual computer. When you write Java code, it's compiled into an intermediate form called bytecode. This bytecode isn't tied to any specific type of computer. Instead, the JVM on any computer will take the trouble to understand and run this bytecode on the particular type of computer it is running on.

Java quickly became immensely popular for enterprise applications and large-scale systems because of its portability, security features and its strong support for OOP.

JavaScript: "Interactive Webpages"

Around the same time as Java, another language was born with the specific purpose of making webpages interactive. Before this, webpages were static: you could view them, and they could be colorful, but they were just text and images. You couldn't click a button and have something happen on the page. JavaScript changed that by running in the browser and allowing dynamic content like menus and forms, bringing websites to life.

Without JavaScript, the web as we know it today – with its rich, interactive experiences like social media, maps and streaming services – simply wouldn't exist.

Fun Fact: A common misconception is that Java and JavaScript are related because of the "Java" in the name. But they are two completely different languages made for different purposes. It was just a marketing gimmick by Netscape, the company that created JavaScript, to capitalize on the growing popularity of Java at the time.

Over a decade later, Node.js was born with the intention of running JavaScript on servers, not just in browsers, so it could power the backend as well. With Node.js, JavaScript became a full-stack language, meaning one can use the same language for both frontend and backend applications.

This is an advantage when you need faster development: you don't need developers who know different languages, and nobody has to switch between two languages when working on the frontend and the backend.

Python: "Code should be Readable"

Python, created in the late 1980s (I am mentioning it after Java, but it is the older language), initially didn't see the explosive growth that Java did. It gained immense popularity later, thanks to its philosophy that code should be readable first and that developers should spend their time on the actual logic, in fewer lines than C or C++ would need.

Python also compiles code to bytecode behind the scenes and runs it on the Python Virtual Machine (PVM). Unlike Java, there is no explicit compilation step; the compilation happens on the fly when the program runs, which is why Python carries the tag of "interpreted language". The tradeoff is that Java, with its static typing and explicit compilation step, lets you catch many issues before runtime, whereas in Python those issues tend to surface only when the code actually runs.
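You can peek at that bytecode yourself with the standard dis module. This sketch assumes CPython, the most common Python implementation; the exact instruction names differ between Python versions, so I am not showing an expected output.

```python
import dis


def add(a, b):
    return a + b


# CPython has already compiled the function body; the raw bytecode is a bytes object.
raw = add.__code__.co_code

# Collect human-readable instruction names; these vary across Python versions.
opnames = [instr.opname for instr in dis.get_instructions(add)]
print(opnames)
```

Nothing here asked for a compile step explicitly; defining the function was enough for the bytecode to exist.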

But still, Python is much easier to learn, and it has a vast ecosystem with countless pre-built libraries for data science, AI, web development etc. Due to its simple syntax and the libraries available, you can easily prototype and test your ideas.

Python became the language of choice for data scientists, AI researchers and basically anyone who needed to get things done quickly without sacrificing much performance.

Go: "Minimalism and Concurrency"

Google faced a massive problem: their systems were incredibly complex, highly distributed, and needed to handle millions of requests concurrently (at the same time). Existing languages often made this difficult or error-prone. So, Go, often called Golang, was created at Google in 2009.

Its philosophy is to be simple and efficient, with concurrency as a core part of its design through "Goroutines". While other languages support concurrency with the concept of "threads", Goroutines are generally much lighter (they consume less CPU and memory) and faster.

It compiles very quickly, and its built-in concurrency features make it ideal for building things like web servers and microservices (small, independent parts of a large application). It also avoids many complex features found in other languages (like inheritance from traditional OOP) and introduces simpler ways to handle the same needs, making code easier to maintain.

So, Why So Many?

Just like one wouldn't go to a dentist for a stomach problem, different problems require different specializations with different tools. C++ is great for operating systems, JS is great for webpages, Python is great for data science and Go is great for highly concurrent services. Each language comes with a set of rules and a philosophy: some prioritize speed, some readability and others safety. Just like a vibrant community and libraries made Python a de facto standard language for data science, the ecosystem plays a role in which language you choose for your application.

As the world evolves, needs change and new languages will keep emerging.

You don't need to learn every language you have heard of. Knowing why a language exists, what problem it's trying to solve, and its strengths and weaknesses helps you choose the right tool for the job.

"For every tool there is a task, and a tool for every task."

Why Are There So Many Programming Languages? - Part 1

Surya Sathi — Thu, 19 Jun 2025 18:30:00 GMT

Abhay just joined his bachelor's program, and he started learning to code during his first year of college. Being a complete beginner, he thought:

"Once I learn a programming language, I'm good to go."

He began with Python because his friends said that it is clean, readable, and easy. Then came C in his college course. It felt raw and mechanical, with memory pointers and manual management. Later, he had to pick up JavaScript, for a side project. His classmates were learning Java, Go, and Rust, and he was feeling pressured to learn them as well.

That's when he paused and thought:

"Why are there so many programming languages? And why do people keep inventing new ones instead of just improving the old ones?"

Now, to make a list, we have C, C++, Python, Java, JavaScript, R, Rust, Go, Ruby, Kotlin, .NET, etc. The names above are just the ones I have heard of and can recall at the moment. There are many, many more. And people needed to learn multiple languages to build proper applications.

Now the question

Why..? Why make life hard with so many of them?

To understand this better, try to think of the theory of evolution, not the biological kind in full, but an evolution guided by humans. Life evolved from single-celled organisms to multicellular organisms living in water, some stepped onto land, evolved into multiple types of organisms, each having its own features to fit and survive in the environment it's living in. Among them are primates, which evolved into apes, a particular species then evolved into humans.

So, once humans were here, did all the other apes disappear? Or did the fish just vanish because some organism stepped onto land? Of course not. Each species adapted to its niche.

Programming is a human-driven evolution, but as in the biological one, one language didn't simply replace another. One after another, languages emerged to solve different problems: scientific, business, systems, web and so on. Some became obsolete, yes, like the species that couldn't adapt. But those that stuck around have accumulated, and now we have a variety of languages existing together.

The Evolution of Computer Language

Where it all started

Now, before stepping into programming languages, let's start with where it all started: Computation. Alan Turing introduced a concept called the Turing Machine, a mathematical model that could simulate the logic of any algorithm. The idea was to show that any computable problem can be solved by a general purpose machine, what we now think of as separating what needs to be done (software, in today's terms) from the machine executing it (the bare metal, hardware). This is one of the most important contributions to the computer world, and nearly all modern programming languages are Turing-complete.

Painful but still a proof of concept

Turing helped design a machine, Bombe (it was not based on the Turing Machine model though), during World War II, to decipher German codes. You can say it's the first attempt to automate logic-based tasks. Though he had a tragic death, he did a great service to humanity with his inventions. Around the same time, ENIAC was born. One had to rewire physical cables and flip switches to program it. Here, programs weren't lines of code but hardware configurations.

Architecture for programming

Then came the Von Neumann Architecture, which introduced the idea of storing both data and instructions in memory. Here, the term "program" takes on a better meaning. One can load different instructions into memory without rewiring the machine like ENIAC. You just need to change the bits in the memory.

BUT, you needed to write the program in pure binary, 1s and 0s. That was very painful and error-prone, which led to the creation of assembly languages, giving symbolic, human-readable names to instructions and memory locations. Assembly is still extremely low-level, but better than writing binary. The catch was that every processor had its own assembly language.

Let's make it readable

But the idea took hold: why shouldn't humans write human-readable terms and let the machine do the work of translating it to 1s and 0s? That is the idea behind a compiler.

Making it easy for scientists

Initially, IBM created FORTRAN (FORmula TRANslation - 1957), as the name suggests, to translate algebraic formulas into machine code to help scientists write math-heavy programs without needing to know assembly language. This made it possible to write expressions like X = A + B * C; the compiler would then turn them into machine code and make them work.

Making it easy for businesses

Then there was COBOL (COmmon Business Oriented Language - 1959), which was more business-focused. COBOL emphasized English-like syntax, with statements like IF HOURS > 40 THEN COMPUTE OVERTIME-PAY = HOURS * OTPAY, so non-programmers could read it more easily. It is said to have dominated enterprise applications and can still be found in some legacy systems.

Laying the foundation

Before COBOL, there was Lisp (1958), built around lists (the hint is in the name: LISt Processing). It was the first language to implement garbage collection, meaning the language automatically manages memory so you don't need to free it yourself when you no longer need it. It also popularized recursion and the idea of functional programming. It is said to be the foundation of early AI programming.

There was also ALGOL (ALGOrithmic Language - 1960), which wasn't as widely used commercially, but introduced language design ideas such as scopes, block structure with a beginning and an end, and a syntax that influenced nearly all future languages. It was the first to separate syntax, how the program looks, from semantics, what it does. This lineage eventually led to the development of C.

Next came BASIC (Beginner's All-purpose Symbolic Instruction Code - 1964), programming for everyone. It was meant for education, simple enough to teach students. It exploded on home computers and helped democratize programming.

Software Crisis

Then came the period known as the Software Crisis. Computers were evolving rapidly, but software wasn't. Codebases everywhere became messy and hard to maintain, a phenomenon that later became known as 'spaghetti code'. There was no structured discipline, which led to a need for structured programming with type safety and better compilers.

Around this time, at Bell Labs, Ken Thompson needed a language to rewrite some parts of the UNIX OS. So he created B from BCPL. It was typeless and minimalistic. Dennis Ritchie, working with Thompson, extended B into C, adding types, structure, better memory control, etc., for systems development. They used it to rewrite the UNIX kernel.

Why C?

C (1972) succeeded because it hit the sweet spot. It talks directly to hardware, similar to assembly language, so it is efficient and powerful. Yet it is abstract and readable enough to manage larger programs, with variables and data structures, algebraic expressions, control flow (conditional statements and iterations), reuse through function calls, and memory management with pointers.

Also, it was highly portable compared to its predecessors; you can write it once and compile it on other machines (though with minor changes). It follows structured programming with scopes and other concepts from ALGOL.

C is one of the oldest programming languages that you can still find widely being taught. While many people recommend that beginners start their programming journey with Python, as it is much more readable and very high-level, I personally think C is better for starting, as I believe it is the most grounded one, without any wrappers, which helps you understand what programming is and how logic works at a lower level. It can be brutal though.

What's Next?

So, if C was such a powerful language, why didn't it remain the sole dominant language ruling every domain, every program, and every software ever created? Why were Python, JavaScript, Java, and Go and a hundred other languages created?

To answer that, we need to look at why programming languages began not just evolving, but diverging, a phenomenon you must have observed by now in this article, even after the rise of C. And that's what we will explore in "Why Are There So Many Programming Languages? - Part 2." We will go over how different programming languages emerged one after the other, over time, each designed to solve specific challenges, reflect unique philosophies, or serve a particular domain or community, ultimately understanding why we need a multitude of programming languages.