Uber’s Real‑Time Location Architecture


YouTube video ID: gHIs0Mdow8M

Source: YouTube video by Philipp Lackner


Introduction

When you open the Uber app, enter your pickup and drop‑off points, and tap “Request a ride,” two things happen instantly: your location is sent to Uber, and the positions of nearby drivers are sent back to you. Making this interaction feel seamless for hundreds of millions of users, even on unreliable networks, requires a sophisticated backend architecture.

The Original Polling Solution

  • Uber initially used a polling‑based approach.
  • The client repeatedly asked the server, “Are there new locations for nearby drivers?” in a tight loop.
  • Problems that emerged:
  • Unnecessary server load – most polling requests returned no new data.
  • Battery drain on the device.
  • Extra overhead from request headers.
  • At one point, 80 % of network requests from the app were polling calls.
  • Cold‑start time increased dramatically because many concurrent polling calls competed for resources, delaying UI rendering.
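The wastefulness of polling is easy to see in a sketch. Everything here (the `fetch` callback, the simulated responses) is illustrative, not Uber's client code; a real client would also sleep between rounds:

```python
def poll_for_updates(fetch, rounds=5):
    """Naive polling loop: ask the server on every round whether there
    is new data, regardless of whether anything changed."""
    updates, wasted = [], 0
    for _ in range(rounds):
        data = fetch()        # one full request/response cycle per call
        if data is None:
            wasted += 1       # most polls return nothing new
        else:
            updates.append(data)
    return updates, wasted

# Simulated server: new data arrives only on the 4th of 5 polls,
# so 4 of the 5 round trips are pure overhead.
responses = iter([None, None, None,
                  {"driver": "d1", "lat": 52.52, "lon": 13.40}, None])
updates, wasted = poll_for_updates(lambda: next(responses))
```

At Uber's scale, that overhead ratio is what showed up as 80 % of the app's network requests.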

Moving to a Push‑Based Architecture

To eliminate the inefficiencies of polling, Uber switched to a push‑based communication model built around a system called Ramen (Realtime Asynchronous Messaging Network).

Key Components

  1. Fireball (microservice) – decides when to push data. It listens to events such as:
     • A rider requesting a ride.
     • A driver accepting a ride.
     • New location updates for riders or drivers.
     It may compare a new location with the previous one and push only if the change is significant.

  2. API Gateway – receives the minimal push payload from Fireball, enriches it with additional context (e.g., user locale, operating system), and forwards it to the client.

  3. Ramen Server – delivers the final payload directly to the client.
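The "push only on significant change" idea can be sketched with a plain haversine distance check. The 25 m threshold and function names are assumptions for illustration, not Uber's actual logic:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS points."""
    r = 6_371_000  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def should_push(prev, curr, threshold_m=25.0):
    """Push an update only if the driver moved more than threshold_m."""
    if prev is None:
        return True  # first fix for this driver: always push
    return haversine_m(prev[0], prev[1], curr[0], curr[1]) > threshold_m

# ~111 m per 0.001 degree of latitude, so this move exceeds 25 m:
moved = should_push((52.5200, 13.4050), (52.5210, 13.4050))
```

Filtering at this stage keeps insignificant GPS jitter from ever reaching the gateway or the client.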

How Ramen Works Internally

  • Initially built on Server‑Sent Events (SSE), guaranteeing at‑least‑once delivery of pushed events.
  • Later migrated to gRPC, enabling bidirectional streaming so the client can also send data to the server over the same channel.
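At-least-once delivery means the server may resend an event, so the client must tolerate duplicates. A minimal sketch of client-side deduplication by sequence number (the class and field names are invented for illustration):

```python
class AtLeastOnceReceiver:
    """Drops redelivered events by tracking the highest sequence
    number seen so far (assumes in-order sequence assignment)."""

    def __init__(self):
        self.last_seq = -1
        self.delivered = []

    def on_event(self, seq, payload):
        if seq <= self.last_seq:
            return False  # duplicate redelivery: drop it
        self.last_seq = seq
        self.delivered.append(payload)
        return True

rx = AtLeastOnceReceiver()
# Event 1 is delivered twice by the server; the client keeps one copy.
for seq, payload in [(0, "a"), (1, "b"), (1, "b"), (2, "c")]:
    rx.on_event(seq, payload)
```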

Spatial Partitioning for Efficient Driver Queries

Uber processes real‑time location updates for millions of drivers each minute. Sending every driver’s location to every rider would be infeasible at that scale.

Naïve Distance Calculation (Rejected)

  • Calculating the distance from a rider’s position to all drivers would create an unmanageable server load at scale.

The Spatial Partitioning Solution

  • Uber divides the world into smaller geographic regions, mapping each driver’s position to a specific region.
  • This avoids per‑driver distance calculations; the server only checks which drivers reside in the rider’s region and neighboring regions.
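A toy version of region-based lookup, using a square grid and a Python dict as the index. Cell size, names, and coordinates are illustrative (Uber's production index is H3, covered next):

```python
from collections import defaultdict

CELL_DEG = 0.01  # ~1.1 km of latitude per cell in this toy grid

def cell_of(lat, lon):
    """Map a GPS coordinate to a (row, col) grid cell."""
    return (int(lat // CELL_DEG), int(lon // CELL_DEG))

# Index: cell -> set of driver ids currently in that cell.
index = defaultdict(set)

def update_driver(driver_id, lat, lon):
    index[cell_of(lat, lon)].add(driver_id)

def nearby_drivers(lat, lon):
    """Scan only the rider's cell and its 8 neighbors,
    never the global driver set."""
    row, col = cell_of(lat, lon)
    found = set()
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            found |= index.get((row + dr, col + dc), set())
    return found

update_driver("d1", 52.5200, 13.4050)  # Berlin, near the rider
update_driver("d2", 52.5205, 13.4049)  # Berlin, near the rider
update_driver("d3", 48.1371, 11.5754)  # Munich: far away, never scanned
near = nearby_drivers(52.5201, 13.4050)
```

The work per query now depends on how many drivers are in the nine scanned cells, not on the total fleet size.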

From Squares to Hexagons – H3

  • Square grids suffer from corner bias: a cell’s diagonal neighbors are farther from its center than its edge‑adjacent neighbors, so a fixed neighborhood is not a uniform search radius.
  • Uber adopted H3, an open‑source hexagonal spatial index:
    • The globe is tiled with hexagons; each cell’s neighbors are all equidistant from its center.
    • Any GPS coordinate is mapped to an H3 index (the hexagon it falls in).
    • Queries use a K‑ring: all hexagons within K steps of the rider’s hexagon.
      • K = 1 → 7 hexagons (the central one plus its six immediate neighbors).
      • K = 2 → adds the second ring of neighbors, 19 hexagons in total.
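The ring sizes follow a closed form: ring i contains 6i hexagons, so a K-ring holds 3K² + 3K + 1 cells. A quick sketch verifying the counts above (the h3 library exposes the actual K-ring query; this is just the arithmetic):

```python
def kring_size(k):
    """Number of hexagons within k steps of a center hexagon:
    1 center cell plus 6*i cells in each ring i = 1..k,
    which sums to 3k^2 + 3k + 1."""
    return 1 + sum(6 * i for i in range(1, k + 1))

sizes = [kring_size(k) for k in range(3)]  # K = 0, 1, 2
```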

Complexity Reduction

  • The algorithm’s time complexity drops from O(N) (where N could be millions of active drivers) to O(K² + M), where K is the radius (number of hex rings) and M is the number of nearby drivers.
  • Example: If 100 drivers are nearby, Uber iterates over those 100 instead of millions.

Additional Uses of the Spatial Index

  • Creating dynamic pricing zones.
  • Predicting estimated time of arrival (ETA).
  • Demand forecasting.

Optimizing Network Latency with Edge Servers

Mobile users often rely on cellular connections, which can be unstable. Uber reduces latency by deploying edge servers:

  • Hundreds or thousands of servers positioned globally, each serving nearby users.
  • A rider in Berlin talks to a server in Frankfurt rather than one in California.
  • Edge servers also cache relevant data, speeding up access.
  • On a typical 4G connection, this can shave ≈ 100 ms off request times compared with a distant server.
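Edge selection can be approximated by geographic proximity; real deployments typically rely on anycast routing and latency probes. A toy picker with made-up points of presence:

```python
# Hypothetical points of presence: name -> (lat, lon).
EDGES = {
    "frankfurt": (50.11, 8.68),
    "virginia": (38.95, -77.45),
    "singapore": (1.35, 103.82),
}

def closest_edge(lat, lon):
    """Pick the geographically nearest edge server. Squared-degree
    distance is crude (it ignores longitude compression) but is
    enough to rank candidates this far apart."""
    return min(EDGES, key=lambda name: (EDGES[name][0] - lat) ** 2
                                       + (EDGES[name][1] - lon) ** 2)

edge = closest_edge(52.52, 13.40)  # rider in Berlin
```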

Smoothing Location Updates on Unreliable Networks

Even with edge servers, location updates may arrive irregularly. Uber ensures the map marker moves smoothly by:

  1. Dead reckoning – predicts the driver’s next position from the last known speed and direction.
  2. Kalman filters – blend the predicted coordinates with actual measured coordinates.

The combination prevents abrupt jumps when a new measured point differs significantly from the prediction, resulting in fluid motion of the car marker.
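A minimal sketch of the two steps, using a fixed blend gain in place of a real Kalman gain (which would be computed from the prediction and measurement uncertainties); all numbers are illustrative:

```python
def dead_reckon(pos, velocity, dt):
    """Predict the next position from the last known speed/direction."""
    return (pos[0] + velocity[0] * dt, pos[1] + velocity[1] * dt)

def smooth_position(predicted, measured, gain=0.3):
    """One Kalman-style correction step: move the prediction a
    fraction of the way toward the noisy measurement, so a large
    measurement jump never becomes a large marker jump."""
    return tuple(p + gain * (m - p) for p, m in zip(predicted, measured))

pos = (0.0, 0.0)          # last known position (meters, local frame)
velocity = (10.0, 0.0)    # 10 m/s heading east
predicted = dead_reckon(pos, velocity, dt=1.0)   # expect (10.0, 0.0)
measured = (14.0, 2.0)                           # noisy GPS fix
smoothed = smooth_position(predicted, measured)
```

Between fixes the marker keeps moving along the predicted path, and each new fix pulls it gently back toward the measured position.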

The “Phantom Cars” Rumor

  • Some users claim Uber shows non‑existent “phantom” cars to make the map look populated and discourage switching to competitors.
  • Uber denies these allegations.
  • The discussion highlights how seemingly simple UI elements can mask a highly complex, scalable backend.

Conclusion

Uber’s journey from a naïve polling system to a sophisticated push‑based architecture illustrates the challenges of delivering real‑time location data at massive scale. By introducing services like Fireball, adopting Ramen with gRPC, leveraging hexagonal spatial indexing (H3), deploying edge servers, and applying dead reckoning with Kalman filters, Uber provides a responsive and fluid experience even under poor network conditions. The system’s intricacy underscores why modern mobile apps often rely on elaborate backend engineering to meet user expectations.

Uber transformed its architecture by replacing a heavy polling model with a push‑based system that uses Fireball, an API Gateway, and Ramen, first over SSE and later gRPC, dramatically cutting unnecessary traffic and latency. Spatial partitioning with the H3 hexagonal index reduces driver‑search complexity from linear to a small, region‑based computation, enabling fast nearby‑driver queries and supporting features like dynamic pricing and ETA estimation. Deploying edge servers close to users trims round‑trip times by roughly a tenth of a second, improving responsiveness on cellular networks. Dead reckoning combined with Kalman filtering smooths driver location updates despite irregular network delivery, ensuring fluid map animations. Together, these engineering choices illustrate how Uber’s backend handles massive real‑time location data while delivering a seamless user experience.

  Takeaways

  • Polling accounted for 80 % of network requests and caused unnecessary server load, battery drain, and increased cold‑start time.
  • Uber replaced polling with a push‑based model that uses Fireball, an API Gateway, and Ramen, initially built on Server‑Sent Events and later on gRPC for bidirectional streaming.
  • Spatial partitioning with the H3 hexagonal index reduces driver query complexity from O(N) to O(K² + M), allowing efficient nearby‑driver searches.
  • Edge servers positioned near users reduce latency by about 100 ms compared with distant data centers.
  • Dead reckoning and Kalman filters are used together to predict and smooth driver location updates on unreliable networks.
  • Uber denies the existence of “phantom” cars, highlighting the gap between UI perception and backend complexity.

