Organizing Data in Structures

Let's try to understand the place where your computer remembers everything, its memory. As this plays a huge part in building efficient software.

Bit: The Smallest Piece of Information

Let's start with the absolute smallest element your computer can remember. It's called a bit. Think of a bit like a switch. It can be either on or off. That's it! Just two states. In the world of computers, "on" is usually represented by the number 1, and "off" by the number 0.

The idea of using just two states wasn't invented for computers. Back in 17th century, a German mathematician and philosopher named Gottfried Wilhelm Leibniz was fascinated by this binary system. He saw it as a beautiful reflection of creation – everything coming from nothing and one (like God). He believed it could simplify all of logic and reasoning.

"Everything comes from nothing."

Byte: A Group of Bits

Now if you want to store something meaningful, you can group these ons and offs together in different combinations. When 8 bits are grouped together, it's called a byte. Like 00000000, 00000001, 11111111 etc. So a byte can represent 256 different combinations of 0s and 1s and hence 256 different values.

For example letter 'A' can be represented by one combination, letter 'B' by another combination, letter 'a' by some other combination and so on. This is how a computer stores everything. This is where you get the words KB, MB, GB, TB etc. from, when you store much larger files like notes, images, audio, video etc.

Data Structures: Organizing Data

If you think of the memory like a bookshelf filled with bytes, if you throw all the books around in some random order, it will be difficult to find anything, right? Same goes for computer memory. We have some specific ways of organizing data, that makes it easy to find, add or remove information. That is called data structures.

Think of data structures as different kinds of shelves or cabinets on our bookshelf, each designed for a specific purpose. Let's look at a few of them.

Array: A Simple List

Imagine you have a series of identical boxes, all lined up neatly in a row. Each box can hold one piece of information. And you know exactly where each box is because they are numbered: Box 1, Box 2, Box 3, and so on.

This is very similar to an array. An array is like a continuous block of memory where you store similar types of data, one after the other. Each piece of data has a specific "address" or position (like the box number), which makes it super-fast to find any specific piece of data if you know its position.

scores = [90, 85, 78, 92]
print(scores[1]) # outputs 85

Note: Python lists behave like dynamic arrays, but allow mixed data types unlike typical arrays in C/C++/Java.

Why is this useful?

If you have a list of student scores, an array is perfect. You can quickly get the score of the 5th student, or the 100th student, because you just jump directly to their position.

Linked List

Now, imagine you have a bunch of individual boxes, but they are not lined up neatly. Instead, each box has a little reference inside it that tells you where the next box in the sequence is located. So, Box A tells you where Box B is, Box B tells you where Box C is, and so on.

This is like a linked list. Unlike an array, where items are stored right next to each other, a linked list stores data items in different, possibly scattered, locations in memory. But each item (called a "node") has two parts: the actual data, and a little piece of information called a pointer that "points" to the next item in the list.

Pointer

Think of a pointer as a tiny sticky note with an address written on it. This address tells your computer exactly where to find another piece of data in its memory. It's like saying, "Go to shelf number 5, slot 3, and you'll find the next part of this story." Pointers are how different pieces of data, even if they are physically far apart in memory, can be linked together logically.

class Node:
    def init(self, data):
        self.data = data
        self.next = None

# Creating linked list: 10 -> 20 -> 30

node1 = Node(10)
node2 = Node(20)
node3 = Node(30)

node1.next = node2
node2.next = node3

current = node1
while current:
    print(current.data)
    current = current.next

# Outputs
# 10
# 20
# 30

# Inserting 40 between 20 and 30

node4 = Node(40)
node4 .next = node2.next
node2.next = node4

current = node1
while current:
    print(current.data)
    current = current.next

# Outputs
# 10
# 20
# 40
# 30

Why is this useful?

Imagine you have a large collection of data. If you use an array, and you want to insert a new piece in the middle, you'd have to shift every single piece after that to make space. But with a linked list, you just create a new piece, change the "next piece" tag of the previous piece to point to your new piece, and then make your new piece point to the next one in the original sequence. It's much easier to add or remove elements in the middle without shuffling everything around.

Stack: LIFO Abstraction on List

Imagine you're "stacking" plates. How do you usually stack them? You put one on top of the other, right? And when you want a plate, which one do you take first? The one on top. This "last in, first out" idea is exactly how a stack works.

Putting something on is called "pushing" onto the stack. You just add the new item right on top. Taking something off is called "popping" from the stack. You always remove the very last item you put on.

stack = []
stack.append("typed 'hello'")
stack.append("bolded text")
last_action = stack.pop()

print(last_action) # Outputs 'bolded text'

Why is this useful?

"Undo" action in a word processor. Each action you take (typing a letter, bolding text) is "pushed" onto an undo stack. When you hit undo, the last action is "popped" off and reversed.

Queue: FIFO Abstraction on List

If you're at a store, and there's a "queue" of people waiting to pay. Who gets served first? The person who arrived first. And who's next? The person who joined the line right after them.

This "first in, first out" (FIFO) idea is how a queue works. You add a new item to the back of the queue (enqueueing). When you remove one, it's from the front (dequeuing).

from collections import deque

queue = deque()
queue.append("Print job 1")
queue.append("Print job 2")
next_job = queue.popleft()

print(next_job) # Outpus 'Print job 1'

Why is this useful?

Queues are essential for managing tasks where order matters. When you browse the internet, your computer sends requests to websites. These requests might get put into a queue to be processed in order.

Tuple

This is an ordered collection, just like a list, where you can look at them, you can count them, but you can't change them, add new ones, or remove old ones from that specific list.

coordinates = (10, 20)
print(coordinates[0]) # outputs 10

Why is this useful?

Tuples aren’t just immutable lists. They’re often used when heterogeneity and data grouping are needed (e.g., (x, y) coordinates, function returns).

Dictionary

Think about a set of index cards. On each card, you write a keyword (like "Apple"). And then, on the same card, you write its definition (like "a round fruit"). If you want to find the definition of "Apple," you quickly look for the "Apple" card.

This "keyword and definition" idea is what we call a key-value pair.

Key: This is the unique identifier, like the keyword on the index card. It's what you use to look up the information.

Value: This is the actual data or information associated with that key, like the definition on the index card.

So, a dictionary is a collection of these key-value pairs.

person = {"name": "Alice", "age": 30}
print(person["age"]) # Outputs 30

Why is this useful?

When you look up a customer by their ID, or a product by its unique code, a hash map is very likely working behind the scenes.

When you define variables in a program (e.g., age = 30), the programming language often uses a hash map to store the variable name (age is the key) and its value (30 is the value).

Imagine a website that serves millions of users. Instead of going to the main database every time someone asks for a popular piece of information, the website might store that information in a "cache" using a hash map. The "key" is the request, and the "value" is the answer, allowing for lightning-fast delivery.

Tree

A tree data structure is a way of organizing data in a hierarchical fashion (like a family tree).

The very top item is called the root (like the main ancestor or the tree's base). Each item can have "children" (items below it, connected by a line), and these children can also have their own children, and so on. Items at the very bottom, with no children, are called leaves (like the leaves on a real tree).

class TreeNode:
    def init(self, data):
        self.data = data
        self.children = []

root = TreeNode("C:")
docs = TreeNode("Documents")
down = TreeNode("Downloads")
pics = TreeNode("Pictures")
movs = TreeNode("Movies")

root.children = [docs, down]
docs.children = [pics]
down.children = [movs]

def print_children(prefix, node):
    print(prefix, node.data)
    for child in node.children:
        print_children(prefix + '---', child)

print_children('', root)

# Outputs
#  C:
# --- Documents
# ------ Pictures
# --- Downloads
# ------ Movies

There are different types of trees like Binary Trees, Binary Search Trees etc.

Why is this useful? Trees are incredibly powerful for organizing data that has a natural hierarchy.

File Systems is a perfect example! Your main drive (C:) is the root, then you have folders like "Documents," "Pictures," "Programs," etc., which are its children. Each of those folders can contain more folders or files, extending the branches of the tree.

Graph

Imagine a map with many cities. Some cities are connected by roads, others are not. A road might go both ways, or just one way. And some roads might be longer or shorter (representing distance or time).

That's a graph! It's a collection of "nodes" (which are like our cities) and "edges" (which are like the roads connecting the cities). The edges can be one-way or two-way, and they can even have "weights" (like the distance of a road).

There are different types like directed/undirected, weighted/unweighted, cyclic/acyclic, sparse/dense graphs.

graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"]
}

# This is a direct unweighted graph that says there is an edge from A to B, A to C and C to A but not B to A. Similarly there is an edge from D to C but not from C to D.

graph = {
    "HYD": [("BLR", 480), ("CHN", 600)],
    "BLR": [("HYD", 480), ("CHN", 360), ("KOC", 540)],
    "CHN": [("HYD", 600), ("BLR", 360)],
    "KOC": [("BLR", 540)]
}

# Now, this is an undirected weighted graph.

Why is this useful?

Graphs are for representing complex relationships like:

Social Networks: Think of Facebook or Instagram. Each person is a "node," and if two people are friends, there's an "edge" connecting them. You can use graphs to find out who's friends with whom, or to recommend new friends.

Maps: When you use Google Maps to find the shortest route between two places, it's using graph algorithms. Each intersection or landmark is a node, and the roads are edges, with their lengths as weights.

The Internet itself: Websites are nodes, and the links between them are edges! This is how search engines like Google "crawl" the web.

Closing

Understanding these different ways of organizing data is a huge step in truly understanding how computers work efficiently.

If you need to quickly add and remove items from the end (like undoing actions), a stack is great. But if you need to process tasks in the order they arrive (like print jobs), a queue is the way to go.

If you need to store hierarchical data (like files on your computer), a tree is ideal. But if you need to represent complex relationships (like friends on a social network), a graph is perfect.

It's not just about storing information; it's about storing it in a way that makes it easy and fast to find, add, delete, and process, which is at the heart of building powerful and responsive software.

"The art of memory is the art of attention." – Samuel Johnson

Organizing Data in Structures

Bit: The Smallest Piece of Information

Byte: A Group of Bits

Data Structures: Organizing Data

Array: A Simple List

Linked List

Stack: LIFO Abstraction on List

Queue: FIFO Abstraction on List

Tuple

Dictionary

Tree

Graph

Closing

Comments

Foundations

Understanding Big O

More from this blog

Consistency Models

Exactly-Once Delivery is Impossible

Idempotency

Why Distributed Systems Can't Know What's Happening

Caching: Performance vs Consistency

Command Palette

Bit: The Smallest Piece of Information

Byte: A Group of Bits

Data Structures: Organizing Data

Array: A Simple List

Linked List

Stack: LIFO Abstraction on List

Queue: FIFO Abstraction on List

Tuple

Dictionary

Tree

Graph

Closing

Comments

Foundations

Understanding Big O

More from this blog