API Rate Limiting and Abuse Prevention for Indian Fintech

API rate limiting is the primary defence against credential stuffing, scraping, denial-of-service, and account enumeration attacks targeting Indian fintech platforms. Without it, a single unauthenticated attacker can drain your OTP budget, enumerate user accounts, run up charges on paid third-party services, and exhaust RBI-mandated transaction limits, all within minutes. This guide covers the main rate-limiting algorithms, the layers where limits should be enforced, and the fintech-specific controls your team needs to implement now.

74% increase in API-targeted attacks on financial services globally (Akamai State of the Internet 2024)

60% of Indian financial sector incidents involved unauthorized access via APIs (CERT-In Annual Report 2023)

Why Fintech APIs Are High-Value Targets

Indian fintech APIs are uniquely exposed. A single endpoint — say /api/v1/send-otp — can simultaneously serve as a credential-stuffing vector, a phone-number enumeration tool, and a bulk SMS cost drain. Unlike traditional web apps, APIs return structured JSON responses that make enumeration trivially scriptable.

Four categories of abuse dominate the threat landscape:

Credential stuffing and brute force. Attackers replay leaked username-password pairs from past breaches against your login and OTP endpoints. Even a 0.1% success rate across a million-credential list translates to a thousand compromised accounts.

Account enumeration via BOLA. Broken Object-Level Authorization (BOLA) lets attackers probe /api/accounts/{id} with sequential integers. Without object-level authorization checks, and without rate limiting to slow the probing, a script can enumerate every customer account in your database within hours.

Scraping and competitive intelligence theft. Product pricing APIs, loan eligibility calculators, and interest-rate feeds are scraped by bots for competitive intelligence. This also inflates your API gateway costs.

Cost runaway. Every call to an SMS gateway, a credit bureau, or a KYC verification service costs money. An unthrottled endpoint that fronts one of these services gives attackers an uncapped spending line on the platform operator's account.

🚨

DANGER

An unprotected OTP endpoint costs real money. Even a small per-SMS charge adds up at bot volumes: 10,000 OTP requests per hour is 240,000 messages a day, a bill that can quickly outgrow the revenue from legitimate users. RBI's guidelines on OTP-based authentication also make this a compliance concern, not just a cost one.

Rate-Limiting Algorithms: Which One to Use

All three mainstream algorithms appear in production fintech systems. The right choice depends on your traffic pattern and precision requirement.

Token Bucket

A bucket holds a fixed number of tokens. Each request consumes one token. Tokens refill at a constant rate. If the bucket is empty, the request is rejected with HTTP 429.

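Strengths: Absorbs bursts naturally, since a user can spend up to the bucket's full capacity at once before being held to the refill rate. In Redis it is usually implemented as a small Lua script that stores each key's token count and last-refill timestamp and updates both atomically. A plain INCR + TTL counter is a fixed-window limiter, not a token bucket.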

Best for: Payment confirmation endpoints where a user legitimately needs to retry a few times quickly.
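This is a minimal in-memory sketch of a token bucket. It assumes a single process and takes an injectable clock so that it can be tested. A multi-server deployment would keep the same two fields (token count and last-refill time) per key in a shared store such as Redis. The class and parameter names are illustrative.

```python
import time
from typing import Optional


class TokenBucket:
    """In-memory token bucket for one rate-limit key (single process only)."""

    def __init__(self, capacity: float, refill_per_sec: float, now: Optional[float] = None) -> None:
        self.capacity = capacity              # largest burst a client may send at once
        self.refill_per_sec = refill_per_sec  # sustained rate once the burst is spent
        self.tokens = capacity                # start full so a first burst is allowed
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Credit tokens earned since the last check, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller responds with HTTP 429


# Usage: a payment-confirmation retry budget of 3 quick attempts, refilling 1 token per 10 s.
bucket = TokenBucket(capacity=3, refill_per_sec=0.1, now=0.0)
print([bucket.allow(now=0.0) for _ in range(4)])  # [True, True, True, False]
print(bucket.allow(now=10.0))                     # True: one token has refilled
```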

Leaky Bucket

Requests enter a queue and are processed at a fixed output rate. Excess requests overflow and are dropped.

Strengths: Produces perfectly smooth outbound traffic — critical when your API calls a third-party service (credit bureau, UPI switch) that has its own upstream rate limits.

Best for: Any endpoint that wraps a rate-limited downstream service such as NPCI's UPI APIs or CIBIL credit checks.
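The queue-and-drain behaviour can be sketched as follows. The sketch assumes a single process in which something, such as a background worker on a timer, calls `drain` and forwards the released requests to the downstream service. `LeakyBucket` and its methods are hypothetical names, not a library API.

```python
from collections import deque
from typing import Any, Deque, List


class LeakyBucket:
    """Leaky bucket as a bounded queue drained at a fixed rate (single-process sketch)."""

    def __init__(self, capacity: int, leak_per_sec: float, now: float = 0.0) -> None:
        self.capacity = capacity          # max requests allowed to wait in the queue
        self.leak_per_sec = leak_per_sec  # fixed outbound rate toward the downstream API
        self.queue: Deque[Any] = deque()
        self.last_leak = now
        self.credit = 0.0                 # fractional send allowance carried between drains

    def offer(self, request: Any) -> bool:
        """Enqueue a request; False means the bucket overflowed (drop it / HTTP 429)."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def drain(self, now: float) -> List[Any]:
        """Release, in FIFO order, as many requests as the fixed rate allows since the last drain."""
        self.credit += max(0.0, now - self.last_leak) * self.leak_per_sec
        self.last_leak = now
        released = []
        while self.queue and self.credit >= 1:
            released.append(self.queue.popleft())
            self.credit -= 1
        if not self.queue:
            self.credit = 0.0  # an idle bucket does not bank sends for a later burst
        return released


bucket = LeakyBucket(capacity=3, leak_per_sec=2.0, now=0.0)
print([bucket.offer(f"bureau-call-{i}") for i in range(5)])  # [True, True, True, False, False]
print(bucket.drain(now=1.0))  # ['bureau-call-0', 'bureau-call-1']
print(bucket.drain(now=1.5))  # ['bureau-call-2']
```

Resetting the carried-over credit when the queue empties is what keeps the output smooth: an idle period cannot let a burst through later.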

Sliding Window

The request count is measured over a rolling time window (e.g., last 60 seconds), not a fixed epoch boundary.

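Strengths: No boundary-burst problem. Take a limit of 100 requests per minute enforced with a fixed per-minute window: a user could fire 100 requests at 11:59:59 and another 100 at 12:00:01, which is 200 requests in two seconds. A sliding window counts both batches in the same 60-second span, so the second batch is rejected.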

Best for: Public-facing endpoints (loan applications, account lookup) where consistent fairness across all users is required.
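Here is a minimal exact sliding-window ("sliding log") limiter. It assumes in-memory state in one process and stores one timestamp per accepted request, so memory grows with the limit. The usage lines replay the 11:59:59 / 12:00:01 scenario as seconds 59 and 61.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLog:
    """Exact sliding-window limiter: one timestamp per accepted request, per key."""

    def __init__(self, limit: int, window_sec: float) -> None:
        self.limit = limit
        self.window_sec = window_sec
        self.log: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        entries = self.log[key]
        # Evict timestamps that have slid out of the trailing window.
        while entries and entries[0] <= now - self.window_sec:
            entries.popleft()
        if len(entries) >= self.limit:
            return False      # rejected requests are not recorded
        entries.append(now)
        return True


# Usage: 100 requests/minute; the burst straddling the minute boundary is caught.
limiter = SlidingWindowLog(limit=100, window_sec=60)
first = sum(limiter.allow("user-42", now=59.0) for _ in range(100))
second = sum(limiter.allow("user-42", now=61.0) for _ in range(100))
print(first, second)  # 100 0
```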

```mermaid
graph TD
    A[Incoming API Request] --> B{Auth present?}
    B -- No --> C[Apply strict IP-based limit]
    B -- Yes --> D{Rate limit store lookup}
    C --> E{Limit exceeded?}
    D --> E
    E -- No --> F[Decrement token / Increment counter]
    F --> G[Forward to service handler]
    G --> H[Return 2xx response]
    E -- Yes --> I{Is bot score high?}
    I -- Yes --> J[Block - 429 with Retry-After header]
    I -- No --> K{Is user verified customer?}
    K -- Yes --> L[Throttle with 429 - backoff hint]
    K -- No --> M[Challenge - CAPTCHA or OTP re-verify]

    style A fill:#1e3a5f,stroke:#3B82F6,color:#e2e8f0
    style B fill:#1e3a5f,stroke:#3B82F6,color:#e2e8f0
    style C fill:#5f1e1e,stroke:#EF4444,color:#e2e8f0
    style D fill:#1e3a5f,stroke:#3B82F6,color:#e2e8f0
    style E fill:#1e3a5f,stroke:#3B82F6,color:#e2e8f0
    style F fill:#1e3d2f,stroke:#10B981,color:#e2e8f0
    style G fill:#1e3d2f,stroke:#10B981,color:#e2e8f0
    style H fill:#1e3d2f,stroke:#10B981,color:#e2e8f0
    style I fill:#5f1e1e,stroke:#EF4444,color:#e2e8f0
    style J fill:#5f1e1e,stroke:#EF4444,color:#e2e8f0
    style K fill:#1e3a5f,stroke:#3B82F6,color:#e2e8f0
    style L fill:#5f1e1e,stroke:#EF4444,color:#e2e8f0
    style M fill:#5f1e1e,stroke:#EF4444,color:#e2e8f0
```

Where to Enforce: Gateway, WAF, or Application Layer

Rate limiting is most effective when enforced at multiple layers. Each layer catches a different attack class.

Layer	Tool Examples	What It Stops	Limitations
API Gateway	AWS API Gateway, Kong, Apigee	Volumetric floods, per-key limits	No business logic context
WAF	AWS WAF, Cloudflare WAF	Bot signatures, geo-blocking, IP reputation	Cannot see auth context
Application	Custom middleware in Node/Python	Per-user, per-account, per-OTP limits	Costlier per-request compute
Service Mesh	Istio, Envoy	East-west service-to-service abuse	Overkill for most SMBs

The correct answer for Indian fintech is to layer the gateway, WAF, and application controls together. The gateway enforces raw volume caps. The WAF handles bot signatures and known malicious IP ranges published by CERT-In. The application layer enforces business-rule limits, for example "no more than 3 OTP requests per phone number per 10 minutes."

🛡️

SECURITY

Never rely on a single enforcement layer. Attackers who discover your API gateway limits will switch to low-and-slow attacks that stay below the threshold. Application-layer limits based on business context (e.g., per-user transaction frequency) catch these.


Per-IP vs Per-User vs Per-API-Key

The identifier used as the rate-limit key determines what the limit actually protects.

Per-IP limits are easy to implement but easily bypassed. Sophisticated attackers rotate through residential proxy networks with hundreds of thousands of IPs. Use per-IP limits only as a first-pass filter at the gateway/WAF layer.

Per-user limits (keyed on authenticated session or JWT subject) are essential for protecting individual accounts from abuse. An attacker who has compromised one account should not be able to use that account to hammer your downstream services.

Per-API-key limits are the correct control for B2B fintech APIs where partners integrate directly. Each key gets its own quota; a misbehaving partner cannot affect others. Publish quota exhaustion webhooks so partners can respond programmatically.

Per-phone-number / per-PAN / per-account-number limits are fintech-specific and critical. An attacker can rotate IPs but not the victim's phone number. Binding OTP limits to the destination phone number, not the requester's IP, closes this gap.
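The sketch below keys one endpoint on several identifiers at once. For the phone limit it uses the "3 OTP requests per phone number per 10 minutes" rule from the layering discussion above. The policy numbers, the `allow_otp_request` name, and the in-memory store are assumptions for illustration. A request is rejected if any identifier is over its limit, and it is recorded against all of them only when every check passes.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple

# Hypothetical policy for the send-OTP endpoint: identifier type -> (limit, window seconds).
OTP_POLICY: Dict[str, Tuple[int, float]] = {
    "ip": (20, 600),     # loose first-pass filter; carrier NAT means many users share one IP
    "phone": (3, 600),   # business rule: 3 OTPs per destination phone number per 10 minutes
}

_hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)


def allow_otp_request(identifiers: Dict[str, str], now: float) -> bool:
    """Allow only if every identifier is under its limit; then record the hit on all of them."""
    checked = []
    for dim, value in identifiers.items():
        limit, window = OTP_POLICY[dim]
        hits = _hits[(dim, value)]
        while hits and hits[0] <= now - window:
            hits.popleft()  # drop hits older than the window
        if len(hits) >= limit:
            return False    # one exhausted identifier is enough to reject
        checked.append(hits)
    for hits in checked:    # record only after every check has passed
        hits.append(now)
    return True


# An attacker rotating IPs still hits the per-phone cap on the 4th request.
results = [
    allow_otp_request({"ip": f"203.0.113.{i}", "phone": "+919800000001"}, now=float(i))
    for i in range(5)
]
print(results)  # [True, True, True, False, False]
```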

Handling 429 Responses — What Good Looks Like

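A 429 Too Many Requests response should always include a Retry-After header specifying when the client may retry. This is not a courtesy. Without it, clients have to guess, and many fall back to immediate or fixed-interval retries that keep hammering the endpoint. Clients should also add a little random jitter to the Retry-After value. Otherwise everyone throttled at the same moment returns at the same instant and creates a new spike (a "thundering herd").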

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1719000000
Content-Type: application/json

{"error": "rate_limit_exceeded", "retry_after_seconds": 60}
```

For authenticated users, distinguish between a soft throttle (slow the request down, allow it with delay) and a hard block (reject entirely). Reserve hard blocks for endpoints with high fraud risk — OTP generation, password reset, KYC document upload.

⚠️

WARNING

Never silently drop rate-limited requests. If you signal throttling with a non-standard response (200 with an error body, or 503 without a Retry-After header), your partner integrations are likely to retry immediately and create a denial-of-service against your own platform.
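On the client side, a well-behaved integration honours `Retry-After`. When the header is missing, it falls back to capped exponential backoff with jitter. The sketch below assumes the caller supplies hypothetical `send` and `sleep` callables, so it runs without a network. It handles only the delta-seconds form of `Retry-After`, not the HTTP-date form.

```python
import random
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str]]  # (status code, headers)


def retry_delay(attempt: int, retry_after: Optional[str], base: float = 1.0,
                cap: float = 60.0, rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    rng = rng or random.Random()
    if retry_after is not None and retry_after.strip().isdigit():
        # Honour the server's delay, plus up to 10% jitter so clients throttled
        # at the same moment do not all come back at the same instant.
        wait = float(retry_after)
        return wait + rng.uniform(0, 0.1 * wait)
    # No usable header: capped exponential backoff with "full jitter".
    return rng.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retries(send: Callable[[], Response], sleep: Callable[[float], None],
                      max_attempts: int = 4) -> Response:
    """Call `send`, backing off on 429/503 until it succeeds or attempts run out."""
    status, headers = send()
    for attempt in range(max_attempts - 1):
        if status not in (429, 503):
            break
        sleep(retry_delay(attempt, headers.get("Retry-After")))
        status, headers = send()
    return status, headers
```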

Bot Detection for Fintech APIs

Pure rate limiting stops volumetric attacks but misses sophisticated bots that stay under per-IP thresholds. Layer in bot detection signals:

Headless browser fingerprinting. Legitimate fintech mobile apps and web frontends send consistent TLS fingerprints, HTTP/2 settings, and device metadata. Anomalies can indicate automation, such as Selenium/Playwright-driven browsers or scripted HTTP clients.

Behavioral analysis. Human users exhibit jitter in inter-request timing. Bots send requests at perfectly uniform intervals. Measure coefficient of variation in request timing per session.
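The coefficient-of-variation check can be sketched as follows. The 0.1 threshold is a placeholder assumption to be tuned on real traffic. A low CV is only one signal among several, since some bots add random delays.

```python
import statistics
from typing import Sequence


def timing_cv(timestamps: Sequence[float]) -> float:
    """Coefficient of variation (stdev / mean) of the gaps between consecutive requests."""
    ts = sorted(timestamps)
    if len(ts) < 3:
        raise ValueError("need at least 3 timestamps (2 gaps)")
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        raise ValueError("all requests share one timestamp")
    return statistics.stdev(gaps) / mean_gap


def looks_scripted(timestamps: Sequence[float], threshold: float = 0.1) -> bool:
    """Flag suspiciously regular timing; the threshold is a placeholder to tune."""
    return timing_cv(timestamps) < threshold


bot = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]        # metronomic: one request per second
human = [0.0, 1.3, 4.1, 4.9, 9.2, 11.0]     # irregular gaps
print(looks_scripted(bot), looks_scripted(human))  # True False
```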

Honeypot fields. Add a hidden form field to your web flows that no human fills in but bots reliably populate. Any submission with a populated honeypot is blocked without revealing the mechanism.

CERT-In threat intelligence feeds. CERT-In publishes advisories and indicators of compromise, including malicious IP addresses, at cert-in.org.in. Ingesting these indicators and blocking known-bad IPs at the gateway is a baseline control that auditors expect of any RBI-regulated entity.
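Once indicators have been extracted from advisories into a plain list, Python's `ipaddress` module makes matching client IPs against them straightforward. The one-entry-per-line format below is an assumption, not a format CERT-In is known to publish. A linear scan is fine for a few thousand entries; larger lists are better pushed into the WAF's IP sets.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def load_blocklist(lines: Iterable[str]) -> List[Network]:
    """Parse one IP or CIDR per line, skipping blanks, '#' comments and malformed entries."""
    nets: List[Network] = []
    for raw in lines:
        entry = raw.split("#", 1)[0].strip()
        if not entry:
            continue
        try:
            nets.append(ipaddress.ip_network(entry, strict=False))  # bare IPs become /32 or /128
        except ValueError:
            continue  # in production, log and alert on unparseable indicators
    return nets


def is_blocked(client_ip: str, blocklist: List[Network]) -> bool:
    ip = ipaddress.ip_address(client_ip)
    # IPv4 addresses never match IPv6 networks (and vice versa); `in` returns False.
    return any(ip in net for net in blocklist)


feed = ["# sample indicators (hypothetical)", "203.0.113.0/24", "198.51.100.23", "not-an-ip"]
blocklist = load_blocklist(feed)
print(is_blocked("203.0.113.99", blocklist), is_blocked("192.0.2.1", blocklist))  # True False
```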

Fintech-Specific Concerns: RBI and Account Enumeration

RBI's Master Directions on Digital Payment Security Controls (2021) and subsequent guidelines on mobile and internet banking impose specific requirements that intersect with rate limiting:

OTP generation limits. RBI mandates that OTP-based authentication channels be protected against brute force. While the exact numeric limit is implementation-defined, a maximum of 3–5 OTP attempts per session with mandatory lockout is industry standard and auditor-expected.
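The following is a sketch of per-session attempt limiting. It assumes 3 attempts (the low end of the range above), and all names are hypothetical. Here "lockout" burns the OTP session: once the limit is hit, even the correct code is refused. The user must then request a new OTP, which falls under the per-phone-number generation limit.

```python
import hmac
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # assumed value, at the low end of the 3-5 range


@dataclass
class OtpSession:
    code: str
    expires_at: float   # same clock as `now` below
    failed: int = 0
    used: bool = False


def verify_otp(session: OtpSession, submitted: str, now: float) -> str:
    """Return 'ok', 'invalid', 'locked', 'expired' or 'used'."""
    if session.used:
        return "used"                 # OTPs are single-use
    if session.failed >= MAX_ATTEMPTS:
        return "locked"               # burned: the user must request a fresh OTP
    if now >= session.expires_at:
        return "expired"
    # Constant-time comparison so response timing does not leak matching digits.
    if hmac.compare_digest(session.code.encode(), submitted.encode()):
        session.used = True
        return "ok"
    session.failed += 1
    return "locked" if session.failed >= MAX_ATTEMPTS else "invalid"


s = OtpSession(code="482913", expires_at=300.0)
print([verify_otp(s, guess, now=10.0) for guess in ("000000", "111111", "222222", "482913")])
# ['invalid', 'invalid', 'locked', 'locked']
```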

Transaction velocity checks. UPI and card transaction APIs must apply velocity controls — limits on transaction count and value per unit time per customer. These are business-logic rate limits enforced at the application layer, distinct from infrastructure-level throttling.
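A velocity check tracks both count and value per customer over a rolling window. The limits below are hypothetical placeholders, not RBI or NPCI figures. Amounts use `Decimal` to avoid floating-point rounding on money. This sketch records a transaction when it is approved; a production system would also reconcile failures and reversals.

```python
from collections import defaultdict, deque
from decimal import Decimal
from typing import Deque, Dict, Tuple

# Hypothetical per-customer policy; real limits come from your risk and compliance teams.
MAX_TXNS = 10
MAX_VALUE = Decimal("100000")   # rupees per window
WINDOW_SEC = 3600

_history: Dict[str, Deque[Tuple[float, Decimal]]] = defaultdict(deque)


def check_velocity(customer_id: str, amount: Decimal, now: float) -> Tuple[bool, str]:
    """Approve only if both the count and the total value stay within the rolling window."""
    history = _history[customer_id]
    while history and history[0][0] <= now - WINDOW_SEC:
        history.popleft()  # forget transactions older than the window
    if len(history) + 1 > MAX_TXNS:
        return False, "count_limit"
    if sum((amt for _, amt in history), Decimal("0")) + amount > MAX_VALUE:
        return False, "value_limit"
    history.append((now, amount))
    return True, "ok"


print(check_velocity("cust-1", Decimal("60000"), now=0.0))    # (True, 'ok')
print(check_velocity("cust-1", Decimal("50000"), now=60.0))   # (False, 'value_limit')
print(check_velocity("cust-1", Decimal("40000"), now=120.0))  # (True, 'ok')
```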

Account enumeration prevention. Registration and account lookup APIs should return identical responses for existing and non-existing accounts. Timing differences are a well-known enumeration side channel: the response for a valid account can be measurably slower or faster, for example because a password hash is checked only when the account exists. Use constant-time comparison for secrets and normalize response times so both paths take the same duration.
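Both techniques can be sketched with the standard library: `hmac.compare_digest` for comparing secrets, and a wrapper that pads every response up to a fixed minimum duration. The 0.25 s floor, the handler, and the outbox are hypothetical. Padding hides differences only if the floor exceeds the slowest legitimate path. Slow side effects such as sending email should therefore happen asynchronously.

```python
import hashlib
import hmac
import time
from typing import Callable, Dict, List

MIN_RESPONSE_SEC = 0.25       # hypothetical floor; must exceed the slowest legitimate path
OUTBOX: List[str] = []        # stand-in for an asynchronous email queue


def tokens_match(presented: bytes, stored_sha256: bytes) -> bool:
    """Compare a presented secret with a stored SHA-256 digest in constant time."""
    return hmac.compare_digest(hashlib.sha256(presented).digest(), stored_sha256)


def start_password_reset(email: str, known_emails: set) -> Dict[str, str]:
    """Return the same body whether or not the account exists."""
    if email in known_emails:
        OUTBOX.append(email)  # enqueue only; the actual send happens off the request path
    return {"message": "If an account exists for this address, a reset link has been sent."}


def respond_uniformly(handler: Callable[[], Dict[str, str]], floor: float = MIN_RESPONSE_SEC,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Run the handler, then sleep so every call takes at least `floor` seconds."""
    start = clock()
    result = handler()
    remaining = floor - (clock() - start)
    if remaining > 0:
        sleep(remaining)
    return result


print(respond_uniformly(lambda: start_password_reset("user@example.com", {"user@example.com"})))
```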

Audit logging. Every 429 response, every blocked request, and every triggered bot-detection rule should be logged to an immutable audit store. SEBI's guidelines for regulated entities (sebi.gov.in) require incident traceability. CERT-In's April 2022 directions require logs of ICT systems to be kept for a rolling 180 days within Indian jurisdiction. Your audit log is the evidence.
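True immutability comes from the storage layer (for example, write-once object storage). The application can still make tampering detectable by hash-chaining entries, as in the minimal sketch below; the class and event fields are illustrative. Each record's SHA-256 covers the previous record's hash, so editing or deleting any entry breaks verification of everything after it.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


class AuditLog:
    """Append-only, hash-chained log: editing or deleting any entry breaks verify()."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    @staticmethod
    def _digest(prev: str, event: Dict[str, Any]) -> str:
        payload = json.dumps(event, sort_keys=True, separators=(",", ":"))  # canonical form
        return hashlib.sha256((prev + payload).encode()).hexdigest()

    def append(self, event: Dict[str, Any]) -> Dict[str, Any]:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        record = {"event": dict(event), "prev": prev, "hash": self._digest(prev, event)}
        self.entries.append(record)
        return record

    def verify(self) -> bool:
        prev = GENESIS
        for rec in self.entries:
            if rec["prev"] != prev or rec["hash"] != self._digest(prev, rec["event"]):
                return False
            prev = rec["hash"]
        return True


log = AuditLog()
log.append({"ts": "2024-06-21T10:15:00Z", "type": "rate_limited",
            "route": "/api/v1/send-otp", "key": "phone:+9198XXXXXX01", "status": 429})
log.append({"ts": "2024-06-21T10:15:02Z", "type": "bot_rule", "rule": "timing_cv", "ip": "203.0.113.9"})
print(log.verify())                       # True
log.entries[0]["event"]["status"] = 200   # simulate tampering
print(log.verify())                       # False
```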

```mermaid
xychart-beta
    title "API Abuse Type Distribution in Fintech Platforms - Illustrative"
    x-axis ["Credential Stuffing", "Account Enumeration", "OTP Flooding", "Data Scraping", "DoS-DDoS"]
    y-axis "Share of Incidents Percent" 0 --> 40
    bar [34, 22, 18, 16, 10]
```

Implementation Checklist

Control	Layer	Priority
Global request cap per IP	Gateway/WAF	P0 — deploy first
Per-user authenticated limit	Application	P0
OTP generation limit per phone number	Application	P0 — RBI aligned
Per-API-key quota with webhook alerts	Gateway	P1
Retry-After header on every 429	Application	P1
Bot score signal integration	WAF	P1
Honeypot fields on web forms	Frontend	P2
CERT-In IP feed subscription	Gateway/WAF	P2
Audit log for all blocked requests	Application	P0
Constant-time response for account lookup	Application	P1

Practical Redis Implementation Pattern

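For teams building in Node.js or Python, Redis is the standard backing store for rate-limit state. The sliding-window log pattern keeps one sorted set per key, with one member per request scored by its timestamp. It is exact, but it is atomic only when the steps below run together inside a single Lua script:

ZREMRANGEBYSCORE key 0 <window_start_ms>: drop entries that have left the window
ZCARD key: count the requests still inside the window
Compare the count against the limit; if it has been reached, return 429 without recording the request
ZADD key <timestamp_ms> <request_id>: record the accepted request (request_id must be unique, or requests in the same millisecond overwrite each other)
PEXPIRE key <window_ms>: let idle keys auto-delete

Run as one script, this pattern avoids the race conditions that plague naive GET/INCR-then-EXPIRE counters. In those, two requests can pass the same check before either is counted, or a crash between INCR and EXPIRE leaves a key that never expires. A script that touches a single key also works under Redis Cluster, because all of its work happens in one hash slot. For broader architecture guidance, NIST SP 800-204 and its companion SP 800-204A (service-mesh deployments) cover security strategies for microservices-based applications; both are available at nist.gov.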
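The sliding-window check can be packaged as a single Lua script invoked through redis-py's `eval`, which makes trim, count, and conditional add one atomic operation. The sketch assumes redis-py and a reachable Redis server, and the key naming is hypothetical. Timestamps come from the application server's clock, so clock skew between servers slightly blurs the window. This version checks the count before recording, so rejected requests are not stored.

```python
import time
import uuid
from typing import Any, Optional

# Runs atomically inside Redis: trim, count, conditionally record, refresh expiry.
SLIDING_WINDOW_LUA = """
local key    = KEYS[1]
local now    = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit  = tonumber(ARGV[3])
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
if redis.call('ZCARD', key) >= limit then
  return 0
end
redis.call('ZADD', key, now, ARGV[4])
redis.call('PEXPIRE', key, window)
return 1
"""


def allow_request(client: Any, key: str, limit: int, window_ms: int,
                  now_ms: Optional[int] = None) -> bool:
    """True if the request fits in the window. `client` is e.g. a redis.Redis instance."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    member = f"{now_ms}-{uuid.uuid4().hex}"  # unique, so same-millisecond requests don't collide
    return int(client.eval(SLIDING_WINDOW_LUA, 1, key, now_ms, window_ms, limit, member)) == 1


# Usage (requires `pip install redis` and a running server):
#   import redis
#   r = redis.Redis()
#   if not allow_request(r, "rl:otp:phone:+919800000001", limit=3, window_ms=600_000):
#       ...  # respond 429 with Retry-After
```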

🎯 Key Takeaway

Rate limiting in fintech is not optional infrastructure. It is a primary defence against credential stuffing, account enumeration, and OTP cost drain, and a key secondary control against BOLA attacks, whose primary fix is object-level authorization. Enforce at three layers (gateway + WAF + application), key limits to business identifiers (phone number, user ID, API key) rather than IP alone, always return Retry-After on 429, and log every blocked request to an immutable audit trail for RBI and SEBI compliance.

Getting Your API Defences Audited

Even a well-designed rate-limiting implementation can have gaps: misconfigured bypass rules, missing limits on internal endpoints, or Redis clustering issues that allow limit evasion under failover. Bachao.AI by Dhisattva AI Pvt Ltd runs automated VAPT scans that specifically test rate-limit bypass, BOLA enumeration, and OTP flooding against your live API surface. You can start with a free VAPT scan to identify which of your endpoints are exposed before an attacker does.

For more security implementation guides tailored to Indian tech companies, visit the Bachao.AI blog.

Frequently Asked Questions

What is the best rate-limiting algorithm for a fintech OTP API?

The sliding window algorithm is recommended because it prevents boundary bursts: attackers cannot fire 100 requests at the end of one fixed minute and another 100 at the start of the next. Pair it with a per-phone-number key (not per-IP) and a hard limit of 3–5 OTP requests per 10-minute window.

Does RBI require rate limiting on payment APIs?

RBI's Master Directions on Digital Payment Security Controls mandate protection against brute force and credential stuffing on authentication channels. While specific numeric thresholds are implementation-defined, any RBI-regulated entity should enforce OTP attempt limits, transaction velocity controls, and audit logging of blocked requests — all of which are forms of rate limiting.

What HTTP status code should a rate-limited API return?

Always return HTTP 429 Too Many Requests with a Retry-After header. Never use 200 with an error body, 503, or 200 with a blank response — these cause client SDKs to retry immediately, turning your rate limit into a self-inflicted denial of service.

How do I stop credential stuffing if attackers use rotating IPs?

Shift your rate-limit key from IP address to the target identifier — the phone number, email address, or account ID being authenticated against. An attacker can rotate IPs but cannot change the victim's phone number. Also integrate CERT-In threat intelligence feeds and layer in bot detection signals such as TLS fingerprinting and request timing analysis.

What is BOLA enumeration and how does rate limiting help?

BOLA (Broken Object-Level Authorization) attacks probe sequential or guessable object IDs in API paths, for example /api/loans/1001, /api/loans/1002, to access records belonging to other users. Rate limiting reduces enumeration speed, but the primary fix is proper authorization checks on every object access. Combine those checks with rate limiting, consistent response times, and non-sequential IDs (UUIDs) as defence in depth.

Should rate limits be enforced at the API gateway or in application code?

Both. The API gateway enforces global volumetric limits before requests reach your application, protecting downstream services and reducing compute cost. Application-layer limits enforce business rules — per-user, per-account, per-OTP constraints — that the gateway cannot enforce without business context. A single enforcement point is insufficient against layered attack strategies.
