A large pot of biryani being prepared on a stove in a home kitchen, steam rising, with a smaller bowl of plain leftover rice sitting on the counter nearby

Why are maps broken in India? It is the data layer.

My 7-year-old was not wrong. Biryani is a rice dish. We had rice. Why make more? The only thing a 7-year-old does not understand is that based on the dish you are planning to make, the process of making the rice changes. This is also the story of every mapping company that launched on delivery GPS data.

The question that started this · at home, before dinner
My daughter, 7 years old
"Why are you making rice again for the Biryani? We already have so much leftover rice from lunch."

She was not wrong. After all, it was rice, and biryani is a rice-based dish, so why not re-use it? The only thing that a 7-year-old does not understand is that based on the dish that you are planning to make, the process of making the rice changes. Biryani rice is parboiled, seasoned differently, cooked to a specific texture. Leftover lunch rice is fully cooked, already starchy, compressed in the container. You can put it in the pot. The biryani will not taste like biryani.

Well, you may be thinking I am talking about cooking. Actually I am not.

What I want to bring to light is that any information, data, or system that we build is designed for a specific purpose, and if the output is used for a different purpose, it may not be optimal... or even advisable.

A lot of mapping companies are mushrooming, and quite a few have been around for years. I am not going to name any of them, as almost all of them have this critical flaw... which no one notices. Or maybe we are just used to mediocrity.

When you are making maps, the technology is there. It has been there for a while: routing, geocoding, reverse geocoding, ETA and more. What drives this technology is the underlying data. And when the data itself is not accurate, no technology can help. Generally this is the part that gets skipped over in the excitement of building a product.

I recently saw some companies launch their own mapping products. Delivery companies. Cab companies. They saw that OSM was there, a brilliant technology platform, and they had data. After all, such companies have a huge fleet running across Mumbai, Bengaluru, Delhi and dozens of other cities, so they know the routes; they know the pickup and drop locations along with the address. So it makes perfect sense to overlay the data on OSM and make a map of your own. Someone told me quite casually in a product review once: "we basically have the map already, we just need to render it."

Here is the catch.

They never gathered the data on locations, traffic, and routes for the purpose of building maps. They gathered it as a by-product of proof of delivery, route optimisation, billing. Not for making maps. In other words, they boiled the rice for lunch, and now they want to use the leftover rice to make biryani.

A delivery person on a motorbike stopped in a narrow Indian street, looking at a mobile phone mounted on the handlebar showing a map, buildings close on either side
The route looked correct on the map. The map was built from the wrong data. The delivery person found out the hard way.

Let me give you one example. How does a delivery company get location data?

A delivery person has an address where a parcel needs to be delivered, so the system has the address. When the delivery person delivers the parcel, they mark it as delivered. So you get the lat-long of the delivery... and boom: you have both the address and the GPS location. What more do you need for geocoding and reverse geocoding, right?

The answer is: no, you do not have data. And more importantly... you have the wrong data.

Actually, let me stop here and be more precise, because this is the part that matters. Let us understand how the system actually works. The system takes the lat-long of the place where the delivery person marks the item as delivered. So how do you know that the location where it was marked as delivered is actually the location where the delivery happened?

The person can make 5 deliveries in a locality, come back to the truck, and then mark all 5 items as delivered. So what happens? The delivery location is wrong. The address-to-GPS mapping is wrong. Some may say: we geofence the location so that the delivery person marks it at the right location and not somewhere else. Fair point. But how do you set the geofence without knowing the exact location of the address? So you allow a geofence radius of at least 1 to 2 kilometres. Which means the location can be marked anywhere in that circle.
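To make the geofence argument concrete, here is a minimal sketch in Python. The coordinates, the function names `haversine_m` and `inside_geofence`, and the radii are all illustrative assumptions, not any company's actual system. It shows that a mark made roughly half a kilometre from the geofence centre still passes a 1 km geofence.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat-long points, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_geofence(center, marked, radius_m):
    """True if the marked point lies within radius_m of the geofence centre."""
    return haversine_m(*center, *marked) <= radius_m


# Hypothetical coordinates, chosen only for illustration.
geofence_center = (12.9716, 77.5946)   # rough guess at where the address is
marked_at_truck = (12.9766, 77.5946)   # 0.005 degrees further north

distance = haversine_m(*geofence_center, *marked_at_truck)
print(f"marked {distance:.0f} m from the geofence centre")
print("accepted by 1 km geofence:",
      inside_geofence(geofence_center, marked_at_truck, 1000))
```

The check passes, so the row is stored as a valid address-to-GPS pair even though the mark is hundreds of metres from the assumed location. Tightening the radius only helps if the centre is already known accurately, which is the very thing the data was supposed to provide.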

The uncertainty in numbers
3.14 sq km minimum uncertainty (radius = 1 km)
12.57 sq km maximum uncertainty (radius = 2 km)
π × r² for r = 1 km and r = 2 km respectively. This is how large the uncertainty zone is. The geocoordinate stored in the database could be anywhere inside this circle.

That 1 to 2 km radius translates into an uncertainty zone of roughly 3.14 to 12.57 square kilometres (pi r squared). This is how large the uncertainty is.
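The arithmetic is easy to check. A small sketch, where the function name `uncertainty_area_sq_km` is just an illustrative label for the area of a circle:

```python
import math


def uncertainty_area_sq_km(radius_km):
    """Area of a circular geofence with the given radius, in square kilometres."""
    return math.pi * radius_km ** 2


for radius_km in (1, 2):
    area = uncertainty_area_sq_km(radius_km)
    print(f"radius {radius_km} km -> {area:.2f} sq km")
# radius 1 km -> 3.14 sq km
# radius 2 km -> 12.57 sq km
```

Doubling the radius quadruples the area, which is why a "slightly looser" geofence is a much larger uncertainty zone than it sounds.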

The data looks clean in the database. Every row has an address, a lat-long, a timestamp, a status of "delivered." It looks like mapping data. It is not mapping data. It is delivery confirmation data that has been mistaken for mapping data, in exactly the same way that leftover rice has been mistaken for biryani rice.
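One way to see that clean-looking rows can still be suspect is to look for the back-at-the-truck pattern described earlier: several different addresses marked as delivered from almost the same spot within a couple of minutes. The sketch below is a rough heuristic under assumed inputs. The row layout, the sample values, the function name `flag_batch_marked`, and the thresholds are all hypothetical, and a real pipeline would need something more careful.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical rows: (address, lat, lon, marked_at). All values are made up.
rows = [
    ("Flat 12, A Wing",   19.07600, 72.87770, datetime(2024, 1, 5, 11, 0, 5)),
    ("Shop 3, Main Road", 19.07601, 72.87771, datetime(2024, 1, 5, 11, 0, 20)),
    ("House 45, Lane 2",  19.07600, 72.87770, datetime(2024, 1, 5, 11, 0, 40)),
    ("Office 7, Tower B", 19.08110, 72.88200, datetime(2024, 1, 5, 12, 30, 0)),
]


def flag_batch_marked(rows, decimals=4, window=timedelta(minutes=2), min_addresses=3):
    """Return addresses that were marked from nearly the same spot in a short window.

    Coordinates are bucketed by rounding to `decimals` places
    (4 decimals is roughly 11 m in latitude). Points that straddle a
    bucket boundary can be missed; this is a sketch, not a clustering method.
    """
    buckets = defaultdict(list)
    for address, lat, lon, marked_at in rows:
        key = (round(lat, decimals), round(lon, decimals))
        buckets[key].append((marked_at, address))

    suspicious = set()
    for entries in buckets.values():
        entries.sort()  # order by timestamp
        addresses = {address for _, address in entries}
        span = entries[-1][0] - entries[0][0]
        if len(addresses) >= min_addresses and span <= window:
            suspicious |= addresses
    return suspicious


flagged = flag_batch_marked(rows)
print(sorted(flagged))
```

Three different addresses cannot all sit at the same point, so at most one of those three coordinates can be right, and possibly none. The heuristic only flags rows as doubtful; it does not recover the true locations.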

While I wish these maps success, I still believe we need to focus on the data layer rather than just bringing our own data to OSM and making a map.

Below is a real example. One shipment delivered to my office address: the location of the update is 550 metres away from the actual location. So the next time the map routes a delivery person to my office, the poor person would need to try 4 different directions, without knowing which one is correct, and may traverse 4 extra kilometres just to reach me. All because someone marked a package as delivered from the wrong side of the road.

I am not sure I got this right across every edge case. There are delivery companies doing good work on data quality, and I may be underestimating how some of them handle the geofence problem. But I did not delete the draft either, because the 550-metre gap in my own office delivery is real, and that gap exists in a dataset that is now being called a map.

The biryani looked like biryani. It just tasted wrong.
