Threading and TCP Connection Management in Backend Applications


YouTube video ID: x9iHwoAbwiA

Source: YouTube video by Hussein Nasser


Introduction

In this episode of the Backend Engineering Show, host Hussein Nasser dives deep into how multi‑threaded applications handle networking, with a focus on TCP connection management. He explains why accepting and managing many client connections is a core challenge for backend services such as web servers, SSH servers, custom protocols, gRPC, and more.

From Single‑Core to Multi‑Core CPUs

  • Early days: One CPU, one process – the process monopolized the sole core.
  • Modern CPUs: Multiple cores (dual, quad, octa, etc.) allow a single process to run on separate cores, reducing contention between applications.
  • Threading emergence: Developers added multiple worker threads to a single process so each thread could run on a different core, achieving parallelism.

The Downside of Multithreading

  1. Shared memory contention – All threads of a process share the same heap. Simultaneous reads/writes to the same variable cause race conditions.
  2. Example: Two threads increment a counter; without proper locking both read 0, increment to 1, and store 1 – the expected 2 never appears.
  3. Complex resource management – Mutexes/locks are required to serialize access, adding overhead and making the code harder to reason about.
  4. Load‑balancing issues – Threads may receive unequal workloads; a “greedy” client can overload one thread while others stay idle, leading to unfair CPU usage.
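The lost‑update scenario from point 2 can be reproduced directly. The sketch below (plain Python `threading`; the function name `demo` is illustrative) runs two threads that each increment a shared counter, with and without a mutex. With the lock, every read‑increment‑store is serialized and the final count is exact; without it, updates can interleave and be lost.

```python
import threading

def demo(use_lock: bool) -> int:
    """Increment a shared counter from two threads; return the final value."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(100_000):
            if use_lock:
                with lock:          # serialize the read-modify-write
                    counter += 1
            else:
                counter += 1        # not atomic: two threads can both read
                                    # the old value and store the same result

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(demo(use_lock=True))   # exactly 200000
print(demo(use_lock=False))  # may be lower: lost updates
```

This is exactly the trade‑off described in point 3: the lock makes the result correct but adds overhead and serializes the hot path.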

TCP Listening, Queues, and the OS

  • When an application calls listen(port), the OS creates two kernel queues:
      • SYN queue – holds half‑open connections (a SYN has arrived, the three‑way handshake is not yet complete).
      • Accept queue – holds fully established connections waiting for the application to call accept().
  • The default listen address 0.0.0.0 (or :: for IPv6) binds to all network interfaces, which can unintentionally expose admin APIs to the public internet.
  • A SYN flood attack can fill the SYN queue with half‑open connections, preventing legitimate clients from completing the handshake.
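These mechanics can be seen from userspace with a minimal sketch (Python's stdlib `socket`; port 0 is used so the OS picks a free port). Note that the `backlog` argument to `listen()` sizes the accept queue, while SYN‑queue limits are a kernel setting (e.g. `net.ipv4.tcp_max_syn_backlog` on Linux), and that the handshake completes before the server ever calls `accept()`:

```python
import socket

# Bind to loopback only, not 0.0.0.0, to avoid exposing the service publicly.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port

# backlog caps the accept queue of fully established connections
# waiting for accept(); SYN-queue sizing is a kernel concern.
server.listen(128)
host, port = server.getsockname()

# connect() returns once the three-way handshake is done - the connection
# then sits in the accept queue until the application pops it off.
client = socket.create_connection((host, port))
conn, addr = server.accept()    # pops the connection off the accept queue
print("accepted from", addr)

conn.close(); client.close(); server.close()
```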

Single‑Threaded Model (e.g., Node.js Networking)

  • Node.js runs the networking stack on a single thread; the event loop continuously calls accept() and dispatches I/O events.
  • If a request triggers a blocking operation (heavy computation, synchronous DB call, etc.), the listener thread stalls, becoming a bottleneck.
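The single‑threaded model can be sketched with Python's `selectors` module (a stand‑in for the epoll/kqueue loop a runtime like Node.js uses internally; the handler names `accept`/`echo` are illustrative). One thread multiplexes the listening socket and all client sockets, so a blocking call inside any handler stalls every connection:

```python
import selectors
import socket

sel = selectors.DefaultSelector()

def accept(server_sock: socket.socket) -> None:
    conn, _ = server_sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, echo)

def echo(conn: socket.socket) -> None:
    data = conn.recv(4096)
    if data:
        conn.sendall(data)   # any slow/blocking work here stalls the whole loop
    else:
        sel.unregister(conn)
        conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

# Drive one client through the loop: first readiness event accepts,
# second one echoes the payload back.
client = socket.create_connection(server.getsockname())
client.sendall(b"ping")
for _ in range(2):
    for key, _ in sel.select(timeout=1):
        key.data(key.fileobj)   # dispatch to accept() or echo()

reply = client.recv(4096)
print(reply)
client.close(); sel.close(); server.close()
```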

Multithreaded Architectures

  1. One listener thread + worker thread per connection – Accepts a socket, then spawns a dedicated thread to handle it. Works for modest connection counts; beyond a few thousand threads, per‑thread stack memory and context‑switch overhead cause thread explosion.
  2. One listener thread + thread pool – Listener accepts connections and hands the file descriptor to a pool of worker threads. Each thread may handle many connections, improving scalability.
  3. Listener thread reads, workers process – The listener reads the request, determines its cost, and forwards the request to a worker thread for heavy processing. This separates I/O from CPU‑intensive work and enables dynamic load balancing.
  4. Multiple listener threads via SO_REUSEPORT – By enabling the socket option, several threads (or processes) can bind to the same port and each call accept() concurrently, dramatically increasing accept throughput. Proxies like HAProxy and Envoy use this technique.
  5. Container‑level horizontal scaling – Run many single‑threaded instances in separate containers, each on a different port, and front them with a layer‑4 load balancer or iptables DNAT rules. This keeps each instance simple while leveraging all CPU cores.
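Architecture 2 (listener + thread pool) can be sketched in a few lines with Python's `ThreadPoolExecutor` (the function names `handle`/`serve` and the three‑connection cap are illustrative; a real server would accept forever). The listener thread does nothing but `accept()` and hand the socket off, so it is never blocked by slow request processing:

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

def handle(conn: socket.socket) -> None:
    """Worker: process one connection off the pool."""
    with conn:
        data = conn.recv(4096)
        conn.sendall(b"echo:" + data)

def serve(server: socket.socket, pool: ThreadPoolExecutor, n_conns: int) -> None:
    """Listener: accept sockets and dispatch them; never does heavy work."""
    for _ in range(n_conns):        # a real server would loop forever
        conn, _ = server.accept()
        pool.submit(handle, conn)   # hand off; go straight back to accept()

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(128)
addr = server.getsockname()

pool = ThreadPoolExecutor(max_workers=4)
listener = threading.Thread(target=serve, args=(server, pool, 3))
listener.start()

replies = []
for i in range(3):
    with socket.create_connection(addr) as c:
        c.sendall(str(i).encode())
        replies.append(c.recv(4096))

listener.join()
pool.shutdown()
server.close()
print(replies)
```

Because the pool is fixed‑size, this avoids the thread explosion of architecture 1 while still using multiple cores for request handling.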

Choosing the Right Model

  • Workload profile: I/O‑bound services benefit from event‑driven single‑threaded designs; CPU‑bound workloads need worker threads or processes.
  • Resource limits: Thread stacks consume memory; a thread‑per‑connection model can exhaust RAM under high load.
  • Complexity vs. performance: Adding mutexes, thread pools, and load‑balancing logic increases code complexity. Simpler designs are easier to maintain but may require more hardware (e.g., more containers).
  • Security considerations: Always bind services to the minimal required interfaces; avoid the default 0.0.0.0 unless you truly need public exposure.

Practical Tips

  • Set an appropriate backlog size when calling listen() to avoid SYN queue overflow.
  • Use SO_REUSEPORT when you need high accept rates and your language/runtime supports it.
  • Prefer asynchronous I/O (e.g., io_uring on Linux) for high‑throughput servers.
  • Profile your application to determine whether it is CPU‑bound or I/O‑bound and choose threading or async models accordingly.
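The SO_REUSEPORT tip can be demonstrated directly: the sketch below (stdlib `socket`; the helper names are illustrative) opens two listening sockets on the same address and port, which only succeeds when both set the option before binding. In a real server each listener would live in its own thread or process, and the kernel would distribute incoming connections across them.

```python
import socket

def reuseport_listener(port: int) -> socket.socket:
    """Create a listening socket with SO_REUSEPORT set before bind()."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(128)
    return s

def two_listeners_same_port() -> bool:
    """Return True if two sockets can listen on the same port concurrently."""
    if not hasattr(socket, "SO_REUSEPORT"):
        return False                       # e.g. older kernels / Windows
    a = reuseport_listener(0)              # OS picks a free port
    port = a.getsockname()[1]
    try:
        b = reuseport_listener(port)       # second listener on the SAME port
    except OSError:
        a.close()
        return False
    a.close(); b.close()
    return True

print(two_listeners_same_port())
```

SO_REUSEPORT is available on Linux 3.9+ and the BSDs (including macOS); this is the mechanism proxies like HAProxy and Envoy use to scale `accept()` across threads.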

Conclusion

Understanding how the operating system handles TCP handshakes, queues, and socket acceptance is essential for building scalable backend services. Multithreading can unlock multi‑core performance, but it introduces shared‑memory contention, load‑balancing challenges, and added complexity. By selecting the appropriate architecture—whether a single‑threaded event loop, a thread‑pool model, SO_REUSEPORT listeners, or container‑level horizontal scaling—you can balance performance, simplicity, and security for your specific workload.

Effective TCP connection management hinges on matching your application's workload characteristics with the right concurrency model—balancing multi‑core utilization, thread safety, and simplicity to achieve scalable, secure backend services.

