The Blind Spot: What Three Years of Fighting Fraud in Ride-Hailing Actually Taught Me

There is a joke that circulates among fraud teams at ride-hailing companies. It is told in war rooms at two in the morning when a new exploit has just surfaced and nobody quite knows how big it is yet. It goes like this:

The best way to stop fraud is to stop the business. No business, no fraud.

Everyone laughs. Then everyone goes back to work. Because the joke contains the most honest description of the job that anyone has ever articulated. Fraud is not a bug in a ride-hailing platform. It is a property of one. The moment you build a two-sided marketplace with money moving through it, incentives attached to behaviour, and millions of strangers transacting with each other at speed, you have created the precise conditions under which fraud becomes rational, scalable, and inevitable.

You did not make a mistake. You built a business. The fraud came with it.

I spent two years as a Risk PM at a large ride-hailing company in India. This is the piece I wish had existed when I walked into that role. Not a technical reference. Not an engineering post. Something more uncomfortable than either of those things: an honest account of what the job actually is, what makes it unlike any other PM role in the company, what the hardest problems really are, and why the most important cost of fraud never appears on any dashboard anywhere.

I am going to name what I learned, including the things the engineering blogs leave out. Because the engineering blogs are written by people who built impressive systems. This is written by someone who also had to explain to a driver why he was suspended, know that the suspension might be wrong, and live with the fact that there was no clean answer to that question.


The Role Nobody Fully Explained to Me

Every other PM at a ride-hailing company has a product surface. The driver app PM owns the driver app. The customer app PM owns the customer app. The payments PM owns the payments flow. The incentives PM owns the incentive engine. Each of them goes deep on their surface. Each of them builds expertise in a defined domain. Each of them has a clear metric that tells them whether they are winning.

The Risk PM has none of that.

The Risk PM's product surface is every other PM's blind spot. His job is to look at the entire company, every system, every flow, every feature, every incentive structure, and ask a single question: where can this be gamed?

That requires a cognitive posture that is fundamentally adversarial. Not adversarial toward colleagues. Adversarial toward the product itself. You have to think like someone trying to break every other PM's work. You have to understand the driver app not as a tool for drivers but as a surface with exploitable edges. You have to understand the payment system not as infrastructure but as a set of race conditions waiting to be found. You have to understand the incentive engine not as a supply management tool but as an attack surface that someone, somewhere, is already probing.

Every other PM asks: how do people use this?

The Risk PM asks: how do people break this?

That is a different question. It requires different thinking. And it means the Risk PM needs to go as deep on every system in the company as the PM who owns that system, but in a specific and unusual dimension: not how it works, but where it fails.

There is also a loneliness to the role that nobody warns you about. The driver app PM has a clear success metric. Driver engagement. Completion rate. Session quality. When things go well, there is a number that goes up. The Risk PM's success is largely invisible. When fraud is low, nobody knows if it is because the system is working or because fraud actors took a holiday or because the fraud is happening in a way the system cannot yet see. When fraud spikes, everyone notices. The job is structurally thankless in a way that no other PM role at the company is. You are celebrated when fraud is absent. You are blamed when it is present.

And yet it is the most intellectually demanding PM role I have encountered. Because it combines technical depth, behavioural economics, legal reasoning, marketplace strategy, and moral philosophy in ways that no other role requires simultaneously. I will try to explain why.


The Fraud Taxonomy: What You Are Actually Dealing With

Fraud in ride-hailing has two primary actors. Drivers and customers. Their motivations are different, their techniques are different, and they require fundamentally different responses. Understanding the taxonomy precisely matters because the action you take should match the nature of the fraud, not just the fact of it.

Driver Fraud

Drivers are the more sophisticated fraud actors in this industry, for a structural reason. Drivers have more touchpoints with the system. They interact with allocation algorithms, incentive engines, payment systems, and GPS infrastructure simultaneously. Each of those touchpoints is a potential attack surface. And because driver income depends directly on how well they navigate the system, the incentive to find exploits is high and constant.

Incentive fraud is the dominant category.

Most ride-hailing companies manage supply and demand through incentive programs. Complete N rides today and earn a bonus. Travel X kilometres this week and unlock a higher tier. Maintain a certain acceptance rate and qualify for peak pricing. These programs are necessary. They are also, in almost every design I have seen, exploitable.

When the incentive pays per number of completed rides, the optimal fraud strategy is to maximize ride count while minimizing time per ride. The driver creates or acquires a set of fake customer accounts. He allocates rides to himself from those accounts, performs the minimum GPS movement required to satisfy the platform's distance validator, marks the ride complete, and moves to the next. Each cycle takes three minutes instead of fifteen. In a six-hour session the math is devastating.

The velocity signatures this creates are distinctive. Extremely low variance in ride duration across a session. The same customer device fingerprint appearing repeatedly across rides for the same driver. Pickup and drop-off coordinates clustering within a radius too small to represent genuine passenger trips. Account creation dates that cluster suspiciously close to the start of the incentive period.
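A minimal sketch of how three of these signatures could be computed for one driver session. The field names (duration_s, customer_device, pickup, dropoff) and the sample data are assumptions for illustration, not any platform's actual schema, and real thresholds would have to be calibrated per city.

```python
from collections import Counter
from math import asin, cos, radians, sin, sqrt
from statistics import pstdev


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    earth_radius_m = 6_371_000
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


def session_signals(rides):
    """Summarise one driver session (a non-empty list of ride dicts)."""
    durations = [r["duration_s"] for r in rides]
    devices = Counter(r["customer_device"] for r in rides)
    trip_lengths = [haversine_m(*r["pickup"], *r["dropoff"]) for r in rides]
    return {
        # Near-zero spread in ride duration is unusual for genuine trips.
        "duration_stdev_s": pstdev(durations),
        # Share of rides requested from the single most frequent customer device.
        "top_device_share": max(devices.values()) / len(rides),
        # Longest straight-line trip in the session, in metres.
        "max_trip_m": max(trip_lengths),
    }


# Hypothetical session: three identical three-minute rides from one device.
sample_rides = [
    {"duration_s": 180, "customer_device": "dev-A",
     "pickup": (12.9716, 77.5946), "dropoff": (12.9720, 77.5950)}
    for _ in range(3)
]
signals = session_signals(sample_rides)
```

None of these numbers proves fraud on its own. They are inputs to a rule or a model, and the account-creation-date signal would need a separate join against customer records.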

In Jakarta, four drivers were arrested in one documented case, each controlling up to thirty fake accounts from which they placed hundreds of fake orders over a two-month period. Each was earning the equivalent of roughly US$678 per day. They were caught when a device intelligence tool detected phones cycling through multiple accounts, a signal that looks unremarkable in isolation but becomes unmistakable at the network level.

When the incentive pays on kilometres travelled, the problem inverts. The driver's goal is now to log kilometres without driving them. Three techniques dominate. Mock GPS, where a spoofing application feeds fabricated coordinate streams to the driver app while the driver sits in a parking lot. Transit fraud, where the driver places his phone on a bus and lets genuine city roads generate a legitimate-looking GPS trace. And parallel-platform fraud, where the driver accepts a ride on your platform and simultaneously drives for a competitor, using phantom customer accounts to generate the trip data on your side.

Mock GPS detection is well understood technically, but evasion has evolved alongside detection. Early spoofing apps produced unnaturally smooth trajectories with no road noise, no signal drift, no acceleration variance. Modern detection uses logistic regression on GPS trace features: speed consistency, acceleration profiles, signal noise characteristics, cross-referenced against network triangulation. The spoofing apps got better in response. The detection models got more sophisticated. The arms race continues.

Underlying all incentive fraud is a single structural exploit: self-allocation. The driver controls which customer account requests his service. This is the foundational vulnerability. Every incentive fraud variant depends on it. The countermeasure is not just detecting self-allocation after the fact. It is making self-allocation structurally expensive at the allocation engine level: enforcing that a driver's registered device cannot share a fingerprint with any customer account, checking whether customer accounts have ever logged in from the driver's device, requiring minimum transaction history before a customer account can contribute toward incentive calculations.
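The three checks in that paragraph can be sketched as an allocation-time guard. Everything here is hypothetical: the CustomerAccount shape, the function names, and the minimum-history value of five rides are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Assumed value: rides a customer account needs before it counts toward incentives.
MIN_RIDES_FOR_INCENTIVE = 5


@dataclass
class CustomerAccount:
    account_id: str
    # Every device fingerprint this account has ever logged in from.
    device_fingerprints: set = field(default_factory=set)
    completed_rides: int = 0


def can_allocate(driver_device, customer):
    """Refuse the match if the customer has ever used the driver's device."""
    return driver_device not in customer.device_fingerprints


def counts_toward_incentive(driver_device, customer):
    """A ride counts only if allocation was clean and the account has history."""
    return (can_allocate(driver_device, customer)
            and customer.completed_rides >= MIN_RIDES_FOR_INCENTIVE)


fresh = CustomerAccount("c1", {"fp-driver"}, completed_rides=0)
regular = CustomerAccount("c2", {"fp-other"}, completed_rides=40)
```

The point of putting the check in the allocation path rather than in a nightly job is that the fraudulent ride never happens, so there is nothing to claw back.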

Off-platform fraud is older and simpler.

The driver's commission to the platform is typically 15 to 25 percent of the fare. The exploit is simple: the driver approaches the customer after accepting the ride and proposes to cancel in the app and complete the trip privately, charging slightly less than the app would. The customer saves, say, 10 percent. The driver earns 10 percent more. The platform loses its entire commission and all data about that trip.

Detection signal: a pattern of cancellations that are geographically and temporally too convenient. Off-platform fraud cancellations tend to occur after the driver has arrived at the pickup point and the OTP has been shared. A cancellation when the driver's GPS shows him within 50 metres of the customer's pin, especially followed by a period of inactivity, is a strong signal. The harder detection: cross-referencing driver earnings per hour against completion rate. A driver who completes fewer rides but whose hourly cash collection is disproportionately high is a candidate for off-platform transaction review.
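As a rough sketch, the cancellation signal could be scored like this. The 50-metre radius comes from the text above; the 15-minute idle window and the additive scoring are my assumptions, and a production rule would look at the pattern across many cancellations rather than one.

```python
def cancellation_suspicion(distance_to_pin_m, otp_shared, idle_minutes_after,
                           max_distance_m=50, min_idle_minutes=15):
    """Score one cancellation from 0 (unremarkable) to 3 (worth review)."""
    score = 0
    if distance_to_pin_m <= max_distance_m:
        score += 1  # driver had effectively arrived at the pickup pin
        if otp_shared:
            score += 1  # rider and driver had already met
        if idle_minutes_after >= min_idle_minutes:
            score += 1  # driver went quiet for roughly a trip's length
    return score


def drivers_to_review(cancellations, min_score=2, min_events=3):
    """cancellations: iterable of (driver_id, distance_m, otp_shared, idle_min)."""
    counts = {}
    for driver_id, distance_m, otp_shared, idle_min in cancellations:
        if cancellation_suspicion(distance_m, otp_shared, idle_min) >= min_score:
            counts[driver_id] = counts.get(driver_id, 0) + 1
    return sorted(d for d, n in counts.items() if n >= min_events)
```

A single high-scoring cancellation is a customer who changed their mind. Three in a week from the same driver is a pattern.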

Feature abuse fraud is the category that embarrasses product teams.

It exploits design decisions rather than external behaviour. Two examples. Wallet fraud exists when an Auth and Capture mechanism is not implemented, so that a bad actor can exploit the race condition between the ride-status update and the payment-deduction call. Chargeback fraud is enabled by the absence of two-factor authentication: organized rings acquire stolen card numbers, sell rides through WhatsApp groups to genuine customers who pay cash, complete the rides, and leave the platform holding the chargeback sixty days later.

The WeChat syndicate pattern in Australia is the clearest documented case of this. Chinese syndicates exploited the absence of mandatory OTP authentication on Australian card payments to build what was effectively a grey market ride resale operation. The platform was the unwitting infrastructure for fraud it had no mechanism to detect at transaction time and only discovered retrospectively through chargeback patterns.

Customer Fraud

Customer fraud is less technically sophisticated but spikes sharply and predictably around two triggers: new user acquisition campaigns and referral programs.

First-ride discounts are the standard acquisition mechanism. The exploit response is multi-accounting: creating multiple fake accounts to claim the discount repeatedly. The sophistication ladder runs from same device, different email addresses, which device fingerprint matching catches easily, up to coordinated multi-device farm operations where dozens of physical devices create accounts in synchronized time windows from the same IP subnet, completing first rides to identical destinations.

Referral fraud uses the same multi-accounting infrastructure against referral programs. Detection is graph-based. Legitimate referral graphs branch outward with low clustering coefficients. Fraudulent referral graphs form tight closed clusters: the same accounts referring each other in loops, with shared device and IP attributes, GPS coordinates that never diverge across rides.
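The clustering coefficient is simple enough to compute by hand, which makes it a useful first pass before any heavier graph tooling. A minimal sketch, treating referrals as undirected edges and representing the graph as a dict of neighbour sets (both are simplifying assumptions):

```python
from itertools import combinations


def local_clustering(graph, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked_pairs = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2 * linked_pairs / (k * (k - 1))


def average_clustering(graph):
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(graph, n) for n in graph) / len(graph)


# One genuine user referring three friends who do not know each other.
organic = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
# Three accounts all referring each other.
ring = {"X": {"Y", "Z"}, "Y": {"X", "Z"}, "Z": {"X", "Y"}}
```

Here average_clustering(organic) is 0.0 and average_clustering(ring) is 1.0. Real referral graphs sit between those extremes, so the useful signal is a component whose coefficient is far above the platform norm.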

Vomit fraud belongs in its own category, because here the driver is the perpetrator and the customer is the one falsely accused. The driver falsely claims a customer damaged or soiled the vehicle and files a cleanup reimbursement. The mechanism is a generous damage claim process, necessary in litigious markets, that becomes exploitable when claimed repeatedly. Detection tracks claim frequency per driver against overall ride volume. A single claim is plausible. Three claims in a month from a driver with an otherwise normal profile is a flag.


The Payments Fraud Layer

Payments fraud sits orthogonally to the driver-customer taxonomy. It is not about gaming rides. It is about gaming the money.

Card testing uses ride-hailing platforms as verification infrastructure. A fraudster with a batch of stolen card numbers needs to know which are live before selling them or using them for high-value transactions. Small ride transactions on newly added cards are unlikely to trigger bank-level fraud alerts and complete quickly. Detection: flag any card that generates more than two transactions within 24 hours of being added, especially across multiple customer accounts.
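That rule is small enough to write out. A sketch, assuming transactions arrive as (timestamp, account_id) pairs for a single card; the function name and return shape are mine:

```python
from datetime import datetime, timedelta


def card_testing_flags(card_added_at, transactions,
                       max_txns=2, window=timedelta(hours=24)):
    """Return (flagged, multi_account) for one card.

    transactions: iterable of (timestamp, account_id) for that card.
    """
    cutoff = card_added_at + window
    early = [(ts, acct) for ts, acct in transactions if card_added_at <= ts <= cutoff]
    flagged = len(early) > max_txns
    # The same new card active on several accounts is the stronger signal.
    multi_account = flagged and len({acct for _, acct in early}) > 1
    return flagged, multi_account


added = datetime(2024, 1, 1, 9, 0)
txns = [
    (added + timedelta(minutes=5), "acct-1"),
    (added + timedelta(minutes=40), "acct-2"),
    (added + timedelta(hours=3), "acct-1"),
    (added + timedelta(days=2), "acct-1"),  # outside the window, ignored
]
```

With this sample, three transactions fall inside the first 24 hours across two accounts, so both flags come back true.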

Account takeover compromises user accounts through credential stuffing, phishing, or SIM swapping. The attacker adds a new payment method, changes registered contact details, and uses the account for rides or resale. Uber's response, penny drop verification, is instructive in its design philosophy. Rather than immediately banning accounts that trigger high-risk signals, which generates false positives and creates customer service burden, Uber routes suspected high-risk users to verification challenges calibrated to the specific risk type. The penny drop flow asks the user to confirm two small, random authorization hold amounts against their registered card within a time window. The legitimate cardholder opens their bank app and reads the amounts. The fraudster, who has the card number but not the bank login, cannot.

The design principle: minimize false positives by allowing genuine users to self-resolve while making it structurally impossible for fraudsters to pass. Friction that is easy for the legitimate user and hard for the attacker is the ideal design. Friction that is equally hard for both is a bad security measure and a bad product decision simultaneously.
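To make the asymmetry concrete, here is a toy version of the challenge. This is not Uber's implementation. The class, the amount range, the 30-minute expiry, and the three-attempt limit are all assumptions, and a real flow would place actual authorization holds through a payment processor.

```python
import secrets
from datetime import datetime, timedelta, timezone


class PennyDropChallenge:
    """Two random hold amounts the cardholder must read back in time."""

    def __init__(self, now, ttl=timedelta(minutes=30), max_attempts=3):
        # Amounts in minor currency units, each between 1 and 199.
        self.amounts = sorted(secrets.randbelow(199) + 1 for _ in range(2))
        self.expires_at = now + ttl
        self.attempts_left = max_attempts

    def verify(self, guess_a, guess_b, now):
        """True only for the right pair, before expiry, within the attempt limit."""
        if now > self.expires_at or self.attempts_left <= 0:
            return False
        self.attempts_left -= 1
        return sorted((guess_a, guess_b)) == self.amounts


issued_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
challenge = PennyDropChallenge(issued_at)
```

The legitimate user reads two numbers off a bank statement. An attacker guessing blindly has three tries against roughly twenty thousand unordered pairs. That gap is the whole design.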

Social engineering account takeover in India deserves its own treatment.

It is categorically different from every other fraud type in this piece. Every other fraud has the platform or another driver or a customer as the victim. Social engineering account takeover has the driver or customer themselves as the victim. The platform is the unwitting instrument of the crime, not the target.

The pattern is well documented and devastatingly effective. A fraudster calls a driver pretending to be platform support. The script is polished. He knows the driver's name, his city, sometimes his recent trip history, sourced from data leaks or social media. He tells the driver there is a problem with his account, his payment is stuck, his account will be deactivated unless he verifies immediately. Then he asks for the OTP.

The driver, who is not technically sophisticated, who trusts authority, who is anxious about losing his primary income source, reads the OTP out loud. The fraudster uses it to log into the account from a new device, changes the registered phone number and email, and the driver is locked out of his own livelihood.

This is not a technical vulnerability. The authentication system worked exactly as designed. The human was the vulnerability.

Several forces make India specifically susceptible. Low digital literacy among a significant portion of the driver base. High trust in authority figures. Widespread availability of partial account data from multiple leak events. And, critically, the legacy of a cash-first driver base that joined the platform before they were familiar with digital financial security norms.

The countermeasures are product decisions, not just security decisions. Number masking eliminates the most common sourcing channel for social engineering attacks. Two-factor authentication for login on new devices stops credential stuffing. But the most important intervention is the language in the OTP message itself: "This code is for your login. Never share it with anyone, including platform support. Platform support will never ask for this code." That sentence, in the local language, in the SMS itself, reduces successful social engineering attacks. It is a product decision that most teams treat as a legal disclaimer and most users never read, and that should instead be designed as the primary defence layer for the population most at risk.

The emergency account freeze mechanism matters as much as the prevention. When a driver realises he has been socially engineered, the window between realisation and permanent damage is minutes. He needs to freeze his account instantly, before the fraudster changes his registered details and locks him out permanently. A panic button. A short code. A support flow that prioritizes account freeze above everything else and requires no verification to initiate, only to reverse. The asymmetry matters: freezing an account incorrectly costs the driver a temporary inconvenience. Failing to freeze costs him his livelihood, potentially permanently.


The Detection Engine

The BRMS: Why Rules Are Still the Foundation

The core of any mature fraud system is a Business Rules Management System, a configurable framework that allows detection logic to be authored, tested, and deployed without requiring a full engineering cycle. The central insight is that fraud techniques change weekly. If every countermeasure requires an engineering sprint, a code review, and a production deployment, the platform is permanently behind.

Grab's Griffin rule engine is the most technically detailed public description of how this is built in practice. Rather than using a commercial rule engine like Drools, which Grab found inadequate because of its Java-based DSL, limited expressive power, and inability to handle dynamic datasets, the team built Griffin from scratch in Python. Data scientists and analysts write Python-based rules in a web portal and deploy them to production without engineering involvement. The system polls a dirty key timestamp and reloads rules only when they have changed, keeping evaluation entirely in-memory to achieve sub-6 millisecond prediction latency. At peak, Griffin processes over 100,000 queries per second on six EC2 instances.
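The reload-only-when-changed idea can be sketched in a few lines. This is not Griffin's code. The store interface, the class names, and the use of a simple counter in place of a real timestamp are all assumptions made to keep the example self-contained.

```python
class InMemoryRuleStore:
    """Stand-in for the rule database (hypothetical interface)."""

    def __init__(self):
        self._rules = {}
        self._dirty_marker = 0

    def save(self, name, predicate):
        self._rules[name] = predicate
        self._dirty_marker += 1  # a real store would write a timestamp here

    def dirty_marker(self):
        return self._dirty_marker

    def load_rules(self):
        return dict(self._rules)


class RuleCache:
    """Keeps rules in memory and reloads only when the dirty marker moves."""

    def __init__(self, store):
        self.store = store
        self.loaded_marker = None
        self.rules = {}

    def refresh(self):
        """Cheap poll: one marker read, full reload only on change."""
        marker = self.store.dirty_marker()
        if marker == self.loaded_marker:
            return False
        self.rules = self.store.load_rules()
        self.loaded_marker = marker
        return True

    def evaluate(self, features):
        """Names of all rules that fire for this feature dict, from memory only."""
        return [name for name, rule in self.rules.items() if rule(features)]


store = InMemoryRuleStore()
cache = RuleCache(store)
store.save("short_ride_burst", lambda f: f["short_rides_per_hour"] > 12)
```

A background thread would call refresh() every few seconds. The request path only ever calls evaluate(), which never touches the database.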

Uber's Mastermind operates on similar principles. Rules are authored on a front-end interface, validated for syntax, testable against feature values before deployment, and refreshed from the database into memory periodically. By the second version of the system, a front-end rule change was in production in approximately one minute, down from the hour required in the first version. The fraud scope covers payment fraud, account takeover, driver-rider collusion, and promotion abuse.

What both systems share is the recognition that fraud teams are not engineering teams. The people who understand fraud patterns are data scientists, analysts, and fraud ops investigators. The BRMS is the mechanism that lets their knowledge become production code without being filtered through an engineering backlog.

Grab's Counter Service is the feature aggregation layer that feeds Griffin and is worth describing because it solves a problem that most fraud teams underestimate. Computing real-time aggregates like "number of cashless payments between this driver-passenger pair in the last seven days" at marketplace scale is not trivial. The Counter Service uses a multi-bucket storage strategy: events are pre-aggregated into 15-minute, 1-hour, and 1-day buckets stored in ScyllaDB, chosen over Redis for being ten times cheaper at comparable stability, with p99 read latency under 150 milliseconds. When a rule needs an aggregate over a time window, the service decomposes the window into the appropriate bucket sizes, queries them in parallel, and assembles the result. This approach avoids real-time database aggregation under load while maintaining 15-minute precision across an arbitrarily large number of counter types.
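The window decomposition is the interesting part, so here is a sketch of it. It is a greedy illustration of the idea, not Grab's implementation. Timestamps are epoch seconds, both ends of the window are assumed to be aligned to 15 minutes, and the in-memory counts dict stands in for the bucket store.

```python
# Largest bucket first so long windows need few reads.
BUCKETS = [("1d", 86_400), ("1h", 3_600), ("15m", 900)]


def decompose(start, end):
    """Split [start, end) into aligned buckets, using large ones where they fit."""
    if start % 900 or end % 900 or start > end:
        raise ValueError("window must be ordered and aligned to 15 minutes")
    pieces = []
    t = start
    while t < end:
        for name, size in BUCKETS:
            # A bucket is usable only if it starts on its own boundary
            # and does not run past the end of the window.
            if t % size == 0 and t + size <= end:
                pieces.append((name, t))
                t += size
                break
    return pieces


def window_total(counts, start, end):
    """Sum pre-aggregated counts; counts maps (bucket_name, bucket_start) to int."""
    return sum(counts.get(piece, 0) for piece in decompose(start, end))


# 23:45 on day 0 through 01:15 on day 2.
pieces = decompose(85_500, 177_300)
```

That window of more than a day resolves to four reads: one 15-minute bucket, one full day, one hour, and one more 15-minute bucket. In a real service the reads would be issued in parallel.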

The Threshold Gaming Problem

Every rules-based system has a structural vulnerability: once a fraud actor is flagged or penalized, he studies the trigger and operates just below it. A driver blocked for completing 12 short rides per hour will come back doing 9. Your detection curve goes down not because fraud declined but because the adversary adapted.

The solution is building rules that detect the shape of fraudulent behaviour relative to a peer cohort, not absolute volume. A driver completing 9 short rides per hour, when the city-zone average for his vehicle class and time window is 2.3, is still a 98th percentile anomaly. The absolute-volume rule misses him. The percentile rule catches him. And the percentile rule is significantly harder to game because the adversary does not have visibility into the distribution of the cohort.
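The difference between the two rules is easy to see in code. A sketch with a made-up cohort of 100 drivers whose mean is close to the 2.3 in the example above; the limit of 12 and the cutoff of 98 are illustrative, not recommended values.

```python
from bisect import bisect_left


def percentile_rank(value, cohort):
    """Percentage of the cohort strictly below value."""
    ordered = sorted(cohort)
    return 100 * bisect_left(ordered, value) / len(ordered)


def absolute_rule(rides_per_hour, limit=12):
    """The gameable version: a fixed volume threshold."""
    return rides_per_hour >= limit


def percentile_rule(rides_per_hour, cohort, cutoff=98):
    """The harder-to-game version: position within the peer cohort."""
    return percentile_rank(rides_per_hour, cohort) >= cutoff


# Hypothetical short-rides-per-hour for 100 peers in one zone and time window.
cohort = [2] * 50 + [3] * 30 + [1] * 18 + [10, 11]
```

For a driver doing 9 short rides per hour, absolute_rule returns False and percentile_rule returns True. To slip under the second rule he would have to know what his peers are doing, and he does not.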

Uber described this framing precisely: fraud is like a chameleon, trying to blend in over time. The detection model must therefore look not just for known patterns but for deviation from what blending looks like.

The Anomaly Detector: Finding What You Do Not Know to Look For

Rules catch known fraud. You write a rule because you saw a pattern. But on day zero of a new fraud type, there is no pattern yet. There is just a faint signal in the data that something is slightly off. A distribution that has shifted. A metric behaving unusually. Something that does not fit.

This is what ML-based anomaly detection is built for. Not to catch known fraud. Rules do that. But to surface the unknown. To look at the entire population of drivers or customers and say: these entities are behaving in a way that deviates from what normal looks like, and we do not yet know why.

Uber's Risk Entity Watch platform is built explicitly around this principle, using unsupervised anomaly detection to flag entities whose behaviour deviates from the normal population without requiring a prior hypothesis about what the fraud pattern looks like. The system generates what Uber calls entity-specific features: aggregates computed across every event type for every entity type, including rider, driver, payment instrument, device, and city zone. The feature space is intentionally massive, generating thousands of features per entity across multiple aggregation windows, dozens of entity types, hundreds of metrics, and fifty-plus checkpoints. The anomaly detection models then operate on this feature space to surface entities whose behaviour is unusual relative to the population.

The human review step is deliberate and important. Uber routes all anomaly flags to operations agents for review before actioning, because fraud actions have serious consequences, such as denied income and account restrictions, and because anomaly detection by its nature surfaces unusual behaviour without confirming fraud. The agent who understands the local market context looks at the flag and makes the call.

The relationship between rules and anomaly detection is not competitive. It is sequential. The anomaly detector surfaces something strange. A human analyst investigates. If it is a new fraud pattern, a rule gets written. The rule then handles that pattern at scale and at speed going forward. The anomaly detector moves on to finding the next unknown.

Fraud detection is therefore a loop. Anomaly detection surfaces unknown patterns. Human investigation confirms and characterizes them. Rules encode the confirmed pattern. Rules suppress the fraud wave. Anomaly detection watches for the next deviation.

The speed of that loop is one of your most important operational metrics.

Graph-Based Detection: When Individual Accounts Look Clean

Per-account rules are blind to coordinated networks. A network of fifty fake customer accounts, each used by a different driver, looks normal in isolation. Each account has a plausible transaction history. Each driver's ride metrics pass individual velocity checks. The fraud is invisible to any rule that evaluates entities one at a time. It becomes visible only when you map the relationships between entities and look for structural anomalies in the graph.
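Before any neural model, the simplest way to see this is to group accounts that share any attribute. The sketch below uses union-find over shared attributes. It is a much cruder technique than the graph neural networks described next, and it ignores edge types entirely, but it shows why the network view finds what the per-account view cannot. The input shape and the minimum ring size are assumptions.

```python
from collections import defaultdict


def find_rings(accounts, min_size=3):
    """Group accounts linked by any shared attribute.

    accounts: dict of account_id -> set of (attribute_type, value) tuples.
    Returns a list of sets, one per linked group of at least min_size.
    """
    parent = {acct: acct for acct in accounts}

    def find(x):
        # Walk to the group root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner = {}
    for acct, attrs in accounts.items():
        for attr in attrs:
            if attr in first_owner:
                parent[find(acct)] = find(first_owner[attr])
            else:
                first_owner[attr] = acct

    groups = defaultdict(set)
    for acct in accounts:
        groups[find(acct)].add(acct)
    return [g for g in groups.values() if len(g) >= min_size]


sample_accounts = {
    "a": {("device", "d1")},
    "b": {("device", "d1"), ("ip", "10.0.0.7")},
    "c": {("ip", "10.0.0.7")},
    "d": {("device", "d9")},
}
```

Accounts a and c share nothing directly, yet both are tied to b, so all three land in one group. The obvious weakness is that a shared office Wi-Fi links everyone on it, which is exactly the problem that typed, weighted edges address.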

Uber applies Relational Graph Convolutional Networks to the driver-rider relationship graph. Drivers and riders are represented as nodes with shared information forming typed edges between them: shared device fingerprint, shared IP at registration, shared phone number. The key insight from Uber's published work: distinguishing between different types of connections amplifies the fraud detection signal. A shared device fingerprint is a stronger signal than a shared IP address, which can be coincidental. The model learns different weights for each connection type during training and uses those weights to identify dense subgraphs that represent collusion rings even when individual nodes look clean.

Grab uses a semi-supervised Relational Graph Convolutional Network trained on a graph with millions of nodes and edges where only a small percentage have fraud labels. The semi-supervised property matters operationally: in fraud detection, labels are expensive and often biased. You only know something was fraud if you caught it. A model that achieves high performance with a small percentage of labeled nodes is more robust to label scarcity. Grab's observation: fraudsters share physical properties, phone devices, Wi-Fi routers, delivery addresses, bank accounts, because resource sharing reduces cost. That cost-minimization logic creates detectable graph structure that would be invisible in any per-account view.

Gojek built GoSage, a hierarchical attention GNN specifically to detect collusion at platform scale. GoSage uses a two-layer attention mechanism: node-level attention processes each type of relationship between an entity and its neighbours independently, and relation-level attention weights different connection types by their fraud-relevance. Shared device connections are weighted differently from shared financial patterns, and the model learns these weights from data. Since deploying GoSage, Gojek reports improved detection of organized collusion networks alongside reduced false positive rates, because the graph model surfaces multi-entity patterns that per-account rules would flag too broadly.

Graph models are not replacing rule engines. They are finding what rule engines cannot see by design.

Real-Time vs. Post-Ride Detection

Most ride-hailing fraud detection operates post-ride. This is better than nothing but it means the platform has already incurred the cost. The most powerful mode is real-time detection that interrupts the payment trigger at ride completion.

The latency requirement is aggressive. When a driver taps "End Ride," the fraud evaluation must complete before the payment authorization is released, typically within 200 milliseconds to avoid perceptible delay in the app. This requires pre-computed features in low-latency caches rather than real-time database queries, and a rule engine deployed as an in-memory service. Griffin achieves sub-6 millisecond prediction latency. Uber describes making decisions within a fraction of a second.

The driver-side experience design for held payments matters. When a ride is flagged, the driver sees a "payment processing" state rather than an immediate payout. The customer's receipt is issued normally. The hold on the driver's payment is invisible to the customer. This prevents drivers from knowing that a specific ride triggered a fraud flag, which would leak detection logic. The design is deliberately opaque in exactly the right direction.


The KPI Framework That Actually Matters

This is the section I wish someone had handed me on my first day. Because the obvious KPIs for a fraud PM are almost all wrong, and building your team around the wrong metrics is how you create a system that looks healthy and is quietly failing.

What Not to Measure

Fraud rate going down. Sounds good. Means nothing in isolation. You could drive fraud rate to zero by blocking every suspicious account, including thousands of legitimate ones. The number looks great. The business is bleeding drivers and customers.

Number of fraud cases detected. Perverse incentive. You can inflate this by lowering your detection threshold until you are flagging everything.

Total fraud loss in absolute terms. Scales with the business. A platform that doubled its GMV and kept fraud loss flat in absolute terms reduced its fraud rate by half. Measuring absolutes hides this.

The Three Primary KPIs

One: GMV loss as a percentage, measured per use case.

Not aggregate fraud percentage. Per use case. Because aggregate fraud is a blended number that hides what is actually happening underneath. One use case could be exploding while three others are suppressed and the aggregate looks fine. You need per use case visibility to know where you are on each individual curve.
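A small worked example of the blending problem, with invented numbers. I am assuming each use case's loss is expressed as a percentage of total platform GMV; a team could equally choose the GMV exposed to that use case as the denominator.

```python
def loss_pct_by_use_case(losses, gmv):
    """Fraud loss per use case as a percentage of total GMV."""
    return {use_case: 100 * loss / gmv for use_case, loss in losses.items()}


def aggregate_loss_pct(losses, gmv):
    """The single blended number most dashboards show."""
    return 100 * sum(losses.values()) / gmv


# Hypothetical weekly figures in the same currency unit.
GMV = 10_000_000
week_1 = {"incentive": 40_000, "referral": 30_000, "chargeback": 30_000}
week_2 = {"incentive": 70_000, "referral": 15_000, "chargeback": 15_000}
```

Both weeks show an aggregate of 1.0 percent. Underneath, incentive fraud went from 0.4 to 0.7 percent while the other two halved. The blended number reports a calm week during the start of a wave.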

And the shape of that curve matters as much as the number. Fraud GMV loss is always sinusoidal in nature. Every fraud use case follows the same wave. It starts small, a few actors testing a new exploit. It grows as word spreads through fraud networks. It peaks when the platform catches it and takes action. Then it drops. Then a variant emerges and the curve starts again.

The Risk PM's job is to compress the amplitude and shorten the wavelength. Catch it faster, suppress it harder, so each wave is smaller than the last.

One important clarification on this metric. If you have identified something as fraud, you stop it. Recognised fraud that you allow to pass through is an operational failure. The GMV loss percentage reflects undetected fraud, the fraud you have not yet caught, not fraud you have identified and are tolerating. That distinction matters because it shapes the measurement methodology. You are not measuring what you know. You are estimating what you do not know. Which is a significantly harder problem that most fraud teams do not invest in seriously enough.

Two: Repeat fraud behaviour rate.

The same use case should not come back. If it does, your fix was a patch, not a solution. You closed the exploit but did not fix the underlying vulnerability. This is the metric that distinguishes reactive fraud management from mature fraud management. A team with a low repeat fraud rate has built structural fixes. A team with a high repeat fraud rate is playing whack-a-mole forever.

Every time a fraud wave ends, ask not just how do we stop this but why was this possible in the first place, and how do we make it structurally impossible. The driver who exploited the wallet race condition was not the problem. The wallet architecture without Auth and Capture was the problem. The driver who found it was doing you a favour by finding it before someone worse did.

Three: Time to detection.

How quickly do you move from a fraud wave starting to your system catching it? This is the day zero problem. And it is the hardest one because rules catch known fraud. On day zero of a new fraud type there is no rule. There is just an anomaly signal that something is slightly off.

The faster you can move from anomaly signal to confirmed pattern to deployed rule, the smaller each fraud wave is. The pre-JARVIS detection window at Gojek was at least 30 minutes just to surface to an analyst. That 30 minutes of blindness is 30 minutes of undetected GMV loss per fraud event. At the scale of a platform doing millions of rides, 30 minutes is expensive.

Minimizing time to detection requires investment at every stage of the loop. Better anomaly detection surfaces signals earlier. Better investigation tooling confirms patterns faster. A BRMS that deploys rules in minutes rather than days compresses the window between confirmation and suppression.

The Guardrail Metric

False positive rate, per use case, with a ceiling of 1 percent.

This is not a primary KPI. It is the guardrail that prevents your offensive system from destroying the thing it is supposed to protect.

The 1 percent ceiling is not a statistical nicety. It is a human cost calculation. If you have a million active drivers, a 1 percent false positive rate means ten thousand legitimate drivers wrongly penalised per use case. Ten thousand people who lost income. Who may have missed rent. Who told their networks that the platform treats you like a criminal.

The dispute mechanism that allows drivers to challenge a fraud flag is both a fairness mechanism and a calibration signal. When a dispute comes in and the ops team overturns the flag, that is a labelled data point: a confirmed false positive on a specific rule firing under specific feature conditions. That label feeds back into rule calibration. If the same rule is generating overturned disputes repeatedly under similar conditions, the threshold is wrong. Teams that treat disputes as pure operational overhead and never connect them back to rule performance are leaving their most honest feedback data on the floor.
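
A minimal sketch of that feedback loop follows, assuming each resolved dispute is available as a (rule_id, overturned) pair. The function names and both thresholds are hypothetical. The overturn rate among disputes is also not the false positive rate itself, since most wrongly flagged drivers may never dispute. It is only a signal of which rules to look at first.

```python
from collections import defaultdict


def overturn_rates(disputes):
    """disputes: iterable of (rule_id, overturned) pairs, one per resolved dispute.

    Returns a dict mapping rule_id -> share of its disputes that ops overturned.
    """
    totals = defaultdict(int)
    overturned = defaultdict(int)
    for rule_id, was_overturned in disputes:
        totals[rule_id] += 1
        if was_overturned:
            overturned[rule_id] += 1
    return {rule_id: overturned[rule_id] / totals[rule_id] for rule_id in totals}


def rules_needing_review(disputes, max_rate=0.5, min_disputes=3):
    """Rules with enough disputes and an overturn share above max_rate.

    Both max_rate and min_disputes are illustrative values, not recommendations.
    """
    disputes = list(disputes)
    counts = defaultdict(int)
    for rule_id, _ in disputes:
        counts[rule_id] += 1
    rates = overturn_rates(disputes)
    return sorted(
        rule_id
        for rule_id, rate in rates.items()
        if counts[rule_id] >= min_disputes and rate > max_rate
    )
```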


The Explainability Paradox

Here is a tension that no engineering blog touches because it sits at the intersection of product, legal, and fraud strategy simultaneously, and it has no clean resolution.

You need full explainability internally. You need zero explainability externally. And you need to communicate just enough externally to be legally defensible without giving the fraud actor the information he needs to re-engineer around your threshold.

When a driver is suspended and demands to know why, you need a complete internal audit trail. This ride was flagged because of these specific signals. This rule fired because these feature values crossed these thresholds. This action was taken by this system at this timestamp and reviewed by this analyst. Without that trail your suspension is arbitrary in a legal sense regardless of how technically correct it was.

But the moment you tell the driver that his ride duration variance across the last eight rides was below 90 seconds, you have handed him the exact parameter he needs to adjust. He will come back doing rides with 95 seconds of variance. Your rule is now useless. You have trained your adversary at your own expense.

The communication that actually works is true enough to be legally defensible, specific enough that the driver feels he received a reason, vague enough that it reveals no threshold or feature, and consistent enough that it cannot be reverse engineered across multiple cases. Something like: our systems detected irregular activity on this trip that is inconsistent with our platform guidelines. That sentence is true. It communicates consequence. It reveals nothing actionable.

The deeper point is that this is a deliberate information asymmetry. The platform knows exactly why. The driver knows approximately why. That gap is not dishonesty. It is the operational requirement of running a detection system against an adaptive adversary.
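
The split between what is recorded and what is said can be sketched as a data structure. This is an illustration under my own assumptions, not any platform's schema: the AuditRecord fields mirror the audit trail described above, and the external notice is a single fixed sentence that deliberately ignores every field on the record.

```python
from dataclasses import dataclass
from datetime import datetime

# One fixed sentence for every case, so nothing can be reverse engineered across cases.
EXTERNAL_NOTICE = (
    "Our systems detected irregular activity on this trip that is "
    "inconsistent with our platform guidelines."
)


@dataclass(frozen=True)
class AuditRecord:
    """Internal-only trail: full explainability for legal and ops (hypothetical fields)."""
    trip_id: str
    rule_id: str
    feature_values: dict   # e.g. {"ride_duration_variance_s": 72}
    thresholds: dict       # e.g. {"ride_duration_variance_s": 90}
    action: str
    actioned_at: datetime
    reviewed_by: str

    def external_notice(self) -> str:
        # Deliberately independent of rule, features, and thresholds.
        return EXTERNAL_NOTICE
```

Keeping the notice constant is the design choice that matters here. Any wording that varies with the rule or the feature values leaks information, even if no number is ever quoted.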

And it creates a real organizational tension that the Risk PM lives inside daily. The fraud ops team wants to give drivers clear reasons because it reduces appeals and feels fair. The fraud product team wants to protect detection logic because disclosure destroys it. Legal wants documentation that can withstand scrutiny. These three pressures pull in different directions and the Risk PM sits exactly at their intersection with no clean answer.


The False Positive Problem Is Deeper Than You Think

When you wrongly suspend a driver for 48 hours, he lost 48 hours of income. He may have missed his rent payment. He may have turned down a loan based on expected earnings. The reinstatement email does not undo any of that. You gave him his account back. You did not give him his Tuesday back.

And unlike a customer false positive, where the harm is inconvenience, a driver false positive is a direct income event. These are not salaried employees with a safety net. A significant portion of drivers in Indian markets are living week to week. A 48-hour suspension is not an inconvenience. It is a crisis.

Most companies respond to this with one of three approaches.

Compensation: pay the driver for estimated lost earnings during the wrongful suspension period. Creates enormous problems. You now have to estimate earnings, which requires assumptions the driver will dispute. You have created a financial claim process that fraud actors will immediately learn to exploit. The compensation mechanism becomes the new attack surface.

Ride credit or priority allocation after reinstatement: less precise than cash compensation but harder to game. Does not fully address the opportunity cost but acknowledges it.

Do nothing formally: this is what most companies actually do. Reinstate the account, apologise, move on. The implicit logic is that the 1 percent false positive rate is a known cost of running the system and the company has made a policy decision, usually unstated, that individual drivers bear that cost.

Saying that third option out loud means acknowledging that you are knowingly causing financial harm to innocent people as an acceptable cost of fraud prevention. That is true. It is very hard to put in a policy document. So most companies never formally decide, which means the answer is always the third option by default.

The fraud system protects the platform from its worst users at a cost borne partly by its most vulnerable ones. Minimizing that cost is not just good product management. It is the right thing to do.


Fraud vs. Bad Behaviour vs. Product Design Failure

This is the distinction that most fraud teams collapse into one category because it is operationally convenient. But collapsing it costs you supply, trust, and analytical clarity simultaneously.

Fraud is intentional, mechanistic gaming of the system for undue gain.

The driver knows what he is doing. He has made a deliberate choice to exploit a loophole. There is a mechanism: a spoofing app, a fake account, a coordinated ring. There is intent. Mock GPS. Fake accounts. Off-platform trips. Chargeback rings. These are fraud. Unambiguous.

Bad behaviour is policy violation without fraudulent intent.

The driver cancels too many rides. Not because he is farming cancellation fees but because he is selective about routes. The driver takes a longer route. Not to inflate the fare but because he does not know the city well enough. These are bad behaviours. They degrade the platform. They need to be addressed. But treating them as fraud is analytically wrong and operationally damaging. A driver who receives a fraud-level intervention for behaviour he did not know was problematic leaves the platform feeling wrongly accused. Because he was wrongly accused.

The diagnostic question: Is there a mechanism? Did the driver do something that required deliberate construction? Fake accounts require construction. GPS spoofing requires a downloaded application and a conscious choice to use it. A high cancellation rate does not require construction. If there is a mechanism, it is almost certainly fraud. If there is only a pattern with no evident mechanism, investigate before you act.

Product design failure is the category nobody wants to own.

Consider a driver who consistently picks up near a competitor's surge zone and cancels rides that take him away from it. Is that fraud? He is not using a tool. He is not creating fake accounts. He found a behavioural pattern that is individually rational and collectively harmful. The platform did not prohibit it. He is optimising within the rules.

Punishing a driver for finding a loophole you left open is not fraud management. It is blame shifting. The answer is not to penalise the driver. The answer is to close the loophole and redesign the incentive.

The Risk PM who recognises this distinction and brings the incentive design team into the conversation is doing the job at a level most fraud PMs never reach.


The Driver as Signal

Fraud rate is not just a security metric. It is a platform health barometer.

The percentage of drivers being caught in a given time frame tells you something about the underlying state of the driver-platform relationship. If fraud rates are rising, it could mean your incentive structures have become too exploitable. It could mean driver earnings have dropped to the point where fraud becomes rational. It could mean a new technique has emerged. But it could also mean the platform has broken its implicit contract with drivers. That the economics are so unfavorable that gaming the system feels justified.

When you flag a driver, what happens next? There are essentially four outcomes. He stops the fraud behaviour and continues driving legitimately. He stops the fraud behaviour and leaves the platform. He continues more carefully, operating below your new threshold. He leaves temporarily and returns with a new account. Tracking which outcome follows which intervention type is genuinely valuable data. It tells you which actions are deterrents and which are merely inconveniences. Most fraud teams measure the action taken. They do not measure the behavioural outcome of the action over time.
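
Measuring the behavioural outcome rather than the action can start very simply. The sketch below assumes you can label each flagged driver with one of the four outcomes some weeks after the intervention. The outcome labels and intervention names are my own hypothetical identifiers.

```python
from collections import Counter, defaultdict

# The four outcomes described above, as hypothetical labels.
OUTCOMES = (
    "stopped_and_stayed",
    "stopped_and_left",
    "continued_below_threshold",
    "returned_with_new_account",
)


def outcome_shares(events):
    """events: iterable of (intervention, outcome) pairs, one per flagged driver.

    Returns {intervention: {outcome: share of that intervention's drivers}}.
    """
    counts = defaultdict(Counter)
    for intervention, outcome in events:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        counts[intervention][outcome] += 1
    return {
        intervention: {o: c[o] / sum(c.values()) for o in OUTCOMES}
        for intervention, c in counts.items()
    }
```

An intervention with a high share of stopped_and_stayed is a deterrent. One dominated by continued_below_threshold or returned_with_new_account is an inconvenience.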

The 8 genuine rides problem.

A driver does 10 rides a day. 8 are genuine. 2 are fraudulent. He is serving 8 real customers. He is contributing to supply. He is earning legitimately on those 8 rides. And he is stealing on 2. What do you do?

The naive answer is ban him. Zero tolerance.

The sophisticated answer is: it depends. What is the nature of the 2 fraudulent rides? If they involve a stolen credit card and a genuine customer being defrauded, that is categorically different from an incentive manipulation that only costs the platform money. What is the supply situation in his city zone? What is his trajectory? Is the fraud behaviour increasing or stable?

Progressive actioning is the answer. Match the intervention to the severity and the trajectory, not just the detection event.

First offence, low severity: warning with no penalty. You have signalled awareness without removing supply.

First offence, medium severity: penalty on the specific fraudulent rides. No suspension. The driver continues contributing genuine supply.

Repeat offence or escalating pattern: temporary suspension. Short enough that the driver can return. Long enough that the income loss is felt as a real deterrent.

Persistent repeat offender or high severity fraud: permanent ban.

The progressive ladder preserves supply from drivers who are partially legitimate. And it gives the driver a behavioural off-ramp at each stage. You are not just punishing. You are offering the choice to stop.
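
The ladder above can be written down as a small decision function. This is a sketch, not a policy: the enum names are mine, and the point at which a repeat offender becomes a persistent one (persistent_after) is an assumed parameter that the text does not specify.

```python
from enum import Enum


class Severity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Action(Enum):
    WARNING = "warning"
    RIDE_PENALTY = "penalty_on_fraudulent_rides"
    TEMP_SUSPENSION = "temporary_suspension"
    PERMANENT_BAN = "permanent_ban"


def choose_action(severity, prior_offences, escalating=False, persistent_after=3):
    """Match the intervention to severity and trajectory, not just the detection event.

    persistent_after is a hypothetical cut-off for "persistent repeat offender".
    """
    if severity is Severity.HIGH or prior_offences >= persistent_after:
        return Action.PERMANENT_BAN
    if prior_offences >= 1 or escalating:
        return Action.TEMP_SUSPENSION
    if severity is Severity.MEDIUM:
        return Action.RIDE_PENALTY
    return Action.WARNING  # first offence, low severity
```

A real version would also take inputs such as the supply situation in the driver's zone, which the text raises as part of the "it depends" answer. They are left out here to keep the ladder itself visible.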


The Legacy Trap: How a Market Capture Decision Became a Fraud Problem

Here is the most honest thing I can tell you about how fraud really originates in ride-hailing, and it has nothing to do with detection systems.

When a large ride-hailing company entered the Indian market and offered cash as a payment option to capture supply faster, the decision looked like a growth strategy. It was. But it was also an adverse selection event that would shape the fraud profile of the platform for years afterward.

Drivers who were comfortable with digital payments, who had smartphones capable of handling app-based transactions, who were already integrated into the formal financial system, found the digital-only competitor acceptable and went there. Drivers who preferred cash, who operated outside the formal financial system, who were more comfortable with informal transaction norms, came to the cash-accepting platform.

That selection effect was invisible at the time it happened. It looked like supply growth. It was actually a decision that determined which fraud types would be most prevalent, which detection systems would need to be built, and which user education investments would be most urgent, for years afterward.

And you cannot undo it easily. By the time you realize what happened, you have a marketplace built on cash. Your drivers depend on it. Your customers expect it. Your city-level economics are calibrated around it. If you flip to digital-only overnight, you lose supply in the markets where you need it most. So you carry the legacy. You manage the fraud that cash brings because the alternative is losing the market.

This is the purest expression of what makes the Risk PM role unique. He is not just looking at his own product surface. He is living with the downstream consequences of every other PM's decisions, some of which were made years before he joined the company.

The most consequential fraud decisions are not made by the fraud team.


Geography Is Destiny

The same platform, the same app, the same incentive structure, deployed in different countries produces completely different fraud landscapes. Not variations of the same fraud. Fundamentally different fraud that requires fundamentally different responses.

Three forces drive geographic fraud variation.

Payment infrastructure. Where 2FA does not exist for card payments, the WeChat fraud syndicate model flourishes. Chinese syndicates in Australia exploited exactly this gap, acquiring stolen card numbers and selling rides through WeChat groups to genuine customers who paid cash. That fraud is structurally impossible in a market with mandatory OTP authentication on every card transaction. The fraud did not travel to India because India's payment rails closed the door before the syndicate arrived.

Cultural and social norms. Vomit fraud in the US is a product of a specific cultural and legal context. American platforms respond to passenger damage claims generously because the legal environment requires it. That generosity becomes the attack surface. The fraud exists because the platform built a mechanism to respond to legitimate claims and the mechanism became exploitable. In markets where damage claims are rarely paid out, the fraud has no purchase.

Economic conditions. The fraud that makes rational sense where a driver earns the equivalent of $3 a day is different from the fraud that makes sense where he earns $30. Incentive fraud in Southeast Asia is more sophisticated and more organised than in Western markets because the incentive bonus represents a larger percentage of total income and the economic stakes per exploit are proportionally higher.

A global fraud team running a single global rule set will always be miscalibrated somewhere. The right architecture is global infrastructure with local calibration. The BRMS, the anomaly detection platform, the graph model, those are global. The thresholds, the feature weights, the specific rules, those are local.
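
In configuration terms, "global infrastructure with local calibration" can be as simple as global defaults plus per-market overrides. The sketch below uses invented rule names, market keys, and thresholds purely for illustration.

```python
# Global defaults shipped with the rule engine (all values hypothetical).
GLOBAL_RULES = {
    "rides_per_day_outlier": {"threshold": 20, "action": "review"},
}

# Per-market overrides maintained by local analysts (all values hypothetical).
LOCAL_OVERRIDES = {
    "mumbai": {"rides_per_day_outlier": {"threshold": 30}},
    "small_au_city": {"rides_per_day_outlier": {"threshold": 12}},
}


def rule_for_market(rule_name, market):
    """Return the global rule definition with any local overrides applied on top."""
    base = GLOBAL_RULES[rule_name]
    override = LOCAL_OVERRIDES.get(market, {}).get(rule_name, {})
    # Build a new dict so the global defaults are never mutated.
    return {**base, **override}
```

A market with no override falls back to the global default, which is a reasonable starting point on day one and a signal that nobody has calibrated that market yet.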

And the anomaly detector becomes not just useful but necessary in new markets precisely because you cannot write rules for fraud you have never seen. The anomaly detector does not need a hypothesis. It needs a baseline. It surfaces what does not fit. The local human analyst then looks at the anomaly and names it. The ML model provides the signal. The local human provides the interpretation. Neither works without the other.

Your anomaly model must also be trained on local baselines, not global ones. A driver completing 15 rides a day in Mumbai is normal. The same pattern in a small Australian city is an outlier. If your anomaly model is trained on global data, the Mumbai driver's normal behaviour looks suspicious and the Australian fraudster's behaviour looks normal relative to the global distribution. The peer group for anomaly comparison is not all drivers globally. It is drivers in the same city, same vehicle class, same time window.
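
Here is a minimal sketch of peer-group comparison, using a plain z-score as a stand-in for whatever anomaly model you actually run. The record fields are assumed names, and a production system would want something more robust to small groups and to the outlier inflating its own group's spread.

```python
from collections import defaultdict
from statistics import mean, pstdev


def peer_zscores(records):
    """Score each driver's ride count against their own peer group.

    records: list of dicts with keys driver_id, city, vehicle_class,
    time_window, rides (hypothetical field names).
    Returns {driver_id: z-score within the (city, vehicle_class, time_window) group}.
    """
    groups = defaultdict(list)
    for r in records:
        groups[(r["city"], r["vehicle_class"], r["time_window"])].append(r)

    scores = {}
    for members in groups.values():
        values = [m["rides"] for m in members]
        mu = mean(values)
        sigma = pstdev(values)
        for m in members:
            # A group with no spread has no outliers by this measure.
            scores[m["driver_id"]] = 0.0 if sigma == 0 else (m["rides"] - mu) / sigma
    return scores


def _rec(driver_id, city, rides):
    return {"driver_id": driver_id, "city": city, "vehicle_class": "sedan",
            "time_window": "weekday", "rides": rides}


records = [
    _rec("m1", "mumbai", 14), _rec("m2", "mumbai", 15),
    _rec("m3", "mumbai", 16), _rec("m4", "mumbai", 15),
    _rec("a1", "small_au_city", 4), _rec("a2", "small_au_city", 5),
    _rec("a3", "small_au_city", 4), _rec("a4", "small_au_city", 15),
]
scores = peer_zscores(records)
```

In this toy data the same 15 rides a day scores as perfectly ordinary for driver m2 in Mumbai and as the clear outlier for driver a4 in the small city, which is the point of local baselines.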


The Undetected Fraud Problem

There is a measurement problem at the heart of the Risk PM's job that most teams do not confront honestly.

You can measure what you caught. You cannot directly measure what you missed. But measuring what you missed is the only honest signal of whether your detection system is actually working or just giving you the illusion of control.

The methods fraud teams use to estimate undetected fraud are all imperfect. The first is random sampling: fraud ops teams manually investigate a random sample of unflagged completed rides and extrapolate the fraud rate found in the sample to the population. The second is post-hoc chargeback analysis: chargebacks that arrive 60 to 90 days later are used to backfill fraud labels onto historical data and to measure what percentage of fraud events were not flagged in real time. The third is cohort behaviour drift analysis: looking for groups of drivers whose earnings patterns shifted anomalously in periods before your rules were calibrated.

None of these gives you a precise number. All of them give you a lower bound on what you missed. The true number is permanently uncertain.
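
The random sampling method is the easiest to sketch. The example below extrapolates from a manually reviewed sample to the unflagged population and attaches a Wilson score interval, which is my choice of interval rather than anything the method requires. The sample numbers are invented, and the result inherits every weakness of the manual review, since fraud the analysts cannot recognise is still missed.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95 percent)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_undetected(sample_size, fraud_found, unflagged_population):
    """Extrapolate fraud found in a random sample of unflagged rides to all unflagged rides."""
    low, high = wilson_interval(fraud_found, sample_size)
    return {
        "point": fraud_found * unflagged_population / sample_size,
        "low": low * unflagged_population,
        "high": high * unflagged_population,
    }


# Hypothetical numbers: 10 fraudulent rides found in a sample of 2,000 unflagged rides.
estimate = estimate_undetected(2000, 10, 1_000_000)
```

The width of the interval is the useful part when talking to leadership. It makes the uncertainty in the number visible instead of hiding it behind a single figure.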

The Risk PM is therefore making consequential decisions, suspending accounts, blocking payments, designing entire system architectures, based on a metric he can never fully see. He is navigating by instruments that are always partially broken. And he has to be honest about that with his leadership, his team, and himself.

Every other PM can pull a dashboard and see their north star metric. The Risk PM's north star is partially obscured by definition.


The Cost Nobody Calculates

I want to end with the thing that never appears on any fraud dashboard anywhere, because it cannot. And yet it is the most important cost in the entire analysis.

The GMV loss you can calculate. The chargeback you can attribute. The fraudulent incentive payout you can trace. These are the numbers that go on the dashboard. These are the numbers the fraud team is held accountable for.

But they are not the most important numbers.

A customer who booked a ride, got taken on a longer route, felt cheated, and never opened the app again. That GMV loss is not in your fraud dashboard. It is somewhere in your retention numbers, attributed to some vague category called churn, investigated by a completely different team that has no idea fraud was the root cause.

A driver who got wrongly suspended, disputed it, got reinstated, but told every driver in his network that the platform treats you like a criminal. That supply damage is not in your fraud numbers. It is in your driver acquisition cost, months later, in a city where word of mouth turned against you.

A passenger who heard from a friend that drivers on this platform take your money and cancel. Who never downloaded the app in the first place. That GMV was never created. It has no line item anywhere. It is permanently invisible.

This is the true cost of fraud. Not what you lost. What you never gained. What was never built. The customer relationships that never started. The driver trust that was poisoned before you even had a chance to earn it. The market reputation that took years to damage and will take longer to repair.

And unlike GMV loss, which responds to better detection systems and faster rules, trust loss does not respond to technical solutions at all. You cannot write a rule that restores a customer's confidence. You cannot deploy an ML model that repairs a driver community's perception of the platform. Trust is built slowly, through thousands of ordinary interactions that go exactly as expected. It is destroyed quickly, through a handful of interactions that go catastrophically wrong.

That asymmetry means the true ROI of fraud prevention is always higher than the dashboard shows. Every fraudulent trip you prevented is not just a GMV saving. It is a trust preservation event. You kept the customer's faith intact for one more ride. You kept the driver's belief in the platform alive for one more day. You protected something that cannot be rebuilt once it is gone.

The fraud system protects the platform from its worst users at a cost borne partly by its most vulnerable ones. The undetected fraud lives in the dark corners of your data that your rules engine was never pointed at. And the deepest cost of fraud is not in any system at all. It lives in the decisions that customers and drivers make quietly and alone, to leave, to warn others, to never arrive in the first place.

The Risk PM's job is not to protect the GMV. It is to protect the thing that makes GMV possible.

That thing is trust. And it has no metric.