API rate limiting is the primary defence against credential stuffing, scraping, denial-of-service, and account enumeration attacks targeting Indian fintech platforms. Without it, a single unauthenticated attacker can drain your OTP budget, enumerate user accounts, trigger cost runaway on third-party services, and exhaust RBI-mandated transaction limits — all within minutes. This guide covers every algorithm, enforcement layer, and fintech-specific control your team needs to implement now.
Why Fintech APIs Are High-Value Targets
Indian fintech APIs are uniquely exposed. A single endpoint — say /api/v1/send-otp — can simultaneously serve as a credential-stuffing vector, a phone-number enumeration tool, and a bulk SMS cost drain. Unlike traditional web apps, APIs return structured JSON responses that make enumeration trivially scriptable.
Four categories of abuse dominate the threat landscape:
Credential stuffing and brute force. Attackers replay leaked username-password pairs from past breaches against your login and OTP endpoints. Even a 0.1% success rate across a million-credential list translates to a thousand compromised accounts.
Account enumeration via BOLA. Broken Object-Level Authorization (BOLA) lets attackers probe /api/accounts/{id} with sequential integers. Without rate limiting, a script can enumerate every customer account in your database within hours.
Scraping and competitive intelligence theft. Product pricing APIs, loan eligibility calculators, and interest-rate feeds are scraped by bots for competitive intelligence. This also inflates your API gateway costs.
Cost runaway. Every call to an SMS gateway, a credit bureau, or a KYC verification service costs money. Unthrottled endpoints turn into unlimited credit card charges for the platform operator.
Rate-Limiting Algorithms: Which One to Use
All three mainstream algorithms appear in production fintech systems. The right choice depends on your traffic pattern and precision requirement.
Token Bucket
A bucket holds a fixed number of tokens. Each request consumes one token. Tokens refill at a constant rate. If the bucket is empty, the request is rejected with HTTP 429.
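Strengths: Absorbs natural bursts (a user can spend the whole bucket at once if it is large enough). State is small: a token count and a last-refill timestamp per key, which Redis can update atomically in a Lua script. Note that plain Redis INCR + TTL implements a fixed-window counter, not a token bucket.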
Best for: Payment confirmation endpoints where a user legitimately needs to retry a few times quickly.
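The refill-and-consume logic described above can be sketched in a few lines. This is a minimal in-memory illustration, not a production limiter: the class name `TokenBucket`, the parameters `capacity` and `refill_per_sec`, and the injectable `clock` are all assumptions made for the example, and a real deployment would keep this state in a shared store.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical in-memory token bucket for a single rate-limit key."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity          # start full so an initial burst is allowed
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then try to spend `cost` tokens."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                    # caller would answer with HTTP 429
```

A bucket with `capacity=3` and `refill_per_sec=1.0` lets three quick retries through, rejects the fourth, and recovers one token per second after that.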
Leaky Bucket
Requests enter a queue and are processed at a fixed output rate. Excess requests overflow and are dropped.
Strengths: Produces perfectly smooth outbound traffic — critical when your API calls a third-party service (credit bureau, UPI switch) that has its own upstream rate limits.
Best for: Any endpoint that wraps a rate-limited downstream service such as NPCI's UPI APIs or CIBIL credit checks.
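As a rough sketch of the queue-based behaviour described above, the class below holds pending requests and releases them at a fixed spacing. The names `LeakyBucketQueue`, `offer`, and `drain` are hypothetical, and timestamps are passed in explicitly so the example stays deterministic; a real worker would call `drain` from a scheduler loop.

```python
from collections import deque
from typing import Deque, List


class LeakyBucketQueue:
    """Hypothetical leaky bucket: bounded queue drained at a fixed rate."""

    def __init__(self, capacity: int, leak_per_sec: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.interval = 1.0 / leak_per_sec   # seconds between released requests
        self.queue: Deque[str] = deque()
        self.next_release = now

    def offer(self, request_id: str, now: float) -> bool:
        """Enqueue a request; return False when the bucket overflows."""
        if len(self.queue) >= self.capacity:
            return False
        if not self.queue:
            # After an idle period, do not let unused time turn into a burst.
            self.next_release = max(self.next_release, now)
        self.queue.append(request_id)
        return True

    def drain(self, now: float) -> List[str]:
        """Return the requests that are due to be forwarded downstream at `now`."""
        released: List[str] = []
        while self.queue and self.next_release <= now:
            released.append(self.queue.popleft())
            self.next_release += self.interval
        return released
```

The design choice here is to drop on overflow rather than block the caller, which matches the description above; queued requests then leave at no more than `leak_per_sec` regardless of how bursty the arrivals were.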
Sliding Window
The request count is measured over a rolling time window (e.g., last 60 seconds), not a fixed epoch boundary.
Strengths: No boundary-burst problem. If your limit is 100 requests per minute, a user cannot fire 100 at 11:59:59 and another 100 at 12:00:01.
Best for: Public-facing endpoints (loan applications, account lookup) where consistent fairness across all users is required.
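One way to picture the rolling window is a per-key log of request timestamps that is trimmed on every call. The sketch below is an in-memory illustration under that assumption; `SlidingWindowLimiter` and its parameters are names invented for the example.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque


class SlidingWindowLimiter:
    """Hypothetical sliding-window log limiter keyed by an arbitrary string."""

    def __init__(self, limit: int, window_sec: float) -> None:
        self.limit = limit
        self.window_sec = window_sec
        self._hits: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        cutoff = now - self.window_sec
        # Drop timestamps that have fallen out of the rolling window.
        while hits and hits[0] <= cutoff:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Because the window is anchored to each request rather than to the clock minute, a burst just before a minute boundary still counts against requests made just after it.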
graph TD
A[Incoming API Request] --> B{Auth present?}
B -- No --> C[Apply strict IP-based limit]
B -- Yes --> D{Rate limit store lookup}
C --> E{Limit exceeded?}
D --> E
E -- No --> F[Decrement token / Increment counter]
F --> G[Forward to service handler]
G --> H[Return 2xx response]
E -- Yes --> I{Is bot score high?}
I -- Yes --> J[Block - 429 with Retry-After header]
I -- No --> K{Is user verified customer?}
K -- Yes --> L[Throttle with 429 - backoff hint]
K -- No --> M[Challenge - CAPTCHA or OTP re-verify]
style A fill:#1e3a5f,stroke:#3B82F6,color:#e2e8f0
style B fill:#1e3a5f,stroke:#3B82F6,color:#e2e8f0
style C fill:#5f1e1e,stroke:#EF4444,color:#e2e8f0
style D fill:#1e3a5f,stroke:#3B82F6,color:#e2e8f0
style E fill:#1e3a5f,stroke:#3B82F6,color:#e2e8f0
style F fill:#1e3d2f,stroke:#10B981,color:#e2e8f0
style G fill:#1e3d2f,stroke:#10B981,color:#e2e8f0
style H fill:#1e3d2f,stroke:#10B981,color:#e2e8f0
style I fill:#5f1e1e,stroke:#EF4444,color:#e2e8f0
style J fill:#5f1e1e,stroke:#EF4444,color:#e2e8f0
style K fill:#1e3a5f,stroke:#3B82F6,color:#e2e8f0
style L fill:#5f1e1e,stroke:#EF4444,color:#e2e8f0
style M fill:#5f1e1e,stroke:#EF4444,color:#e2e8f0
Where to Enforce: Gateway, WAF, or Application Layer
Rate limiting is most effective when enforced at multiple layers. Each layer catches a different attack class.
| Layer | Tool Examples | What It Stops | Limitations |
|---|---|---|---|
| API Gateway | AWS API Gateway, Kong, Apigee | Volumetric floods, per-key limits | No business logic context |
| WAF | AWS WAF, Cloudflare WAF | Bot signatures, geo-blocking, IP reputation | Cannot see auth context |
| Application | Custom middleware in Node/Python | Per-user, per-account, per-OTP limits | Costlier per-request compute |
| Service Mesh | Istio, Envoy | East-west service-to-service abuse | Overkill for most SMBs |
Per-IP vs Per-User vs Per-API-Key
The identifier used as the rate-limit key determines what the limit actually protects.
Per-IP limits are easy to implement but easily bypassed. Sophisticated attackers rotate through residential proxy networks with hundreds of thousands of IPs. Use per-IP limits only as a first-pass filter at the gateway/WAF layer.
Per-user limits (keyed on authenticated session or JWT subject) are essential for protecting individual accounts from abuse. An attacker who has compromised one account should not be able to use that account to hammer your downstream services.
Per-API-key limits are the correct control for B2B fintech APIs where partners integrate directly. Each key gets its own quota; a misbehaving partner cannot affect others. Publish quota exhaustion webhooks so partners can respond programmatically.
Per-phone-number / per-PAN / per-account-number limits are fintech-specific and critical. An attacker can rotate IPs but not the victim's phone number. Binding OTP limits to the destination phone number, not the requester's IP, closes this gap.
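To make the choice of identifier concrete, the sketch below derives the set of rate-limit keys that a single OTP request might be checked against. Everything here is illustrative: the key prefixes, the numeric limits, the `LimitRule` type, and the phone normalization (last 10 digits, assuming Indian mobile numbers) are assumptions, not values taken from any regulation.

```python
import hashlib
from typing import List, NamedTuple, Optional


class LimitRule(NamedTuple):
    key: str         # rate-limit store key
    limit: int       # max requests allowed in the window (hypothetical values)
    window_sec: int  # window length in seconds


def _digest(value: str) -> str:
    """Hash identifiers so raw phone numbers or API keys never appear in store keys."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def normalize_phone(phone: str) -> str:
    """Keep the last 10 digits (assumes Indian mobile numbers)."""
    return "".join(ch for ch in phone if ch.isdigit())[-10:]


def rules_for_otp_request(ip: str, phone: str,
                          user_id: Optional[str] = None,
                          api_key: Optional[str] = None) -> List[LimitRule]:
    """Return every limit that should be checked before sending an OTP."""
    rules = [
        LimitRule(f"rl:ip:{ip}", 30, 60),                                   # first-pass filter
        LimitRule(f"rl:otp:phone:{_digest(normalize_phone(phone))}", 3, 600),  # destination-bound
    ]
    if user_id is not None:
        rules.append(LimitRule(f"rl:user:{user_id}", 60, 60))
    if api_key is not None:
        rules.append(LimitRule(f"rl:key:{_digest(api_key)}", 1000, 60))
    return rules
```

The phone-bound key is the same no matter which IP the request came from, so rotating proxies does not reset the OTP counter for a given destination number.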
Handling 429 Responses — What Good Looks Like
A 429 Too Many Requests response should always include a Retry-After header specifying when the client may retry. This is not courtesy — it prevents a "thundering herd" where every client retries immediately after a rate-limit window resets, creating a new spike.
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1719000000
Content-Type: application/json

{"error": "rate_limit_exceeded", "retry_after_seconds": 60}
For authenticated users, distinguish between a soft throttle (slow the request down, allow it with delay) and a hard block (reject entirely). Reserve hard blocks for endpoints with high fraud risk: OTP generation, password reset, KYC document upload.
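The 429 response shown above can be assembled by a small helper, and a well-behaved client can add jitter on top of `Retry-After` so that clients do not all retry at the same instant. Both functions below are sketches: `build_429` and `client_retry_delay` are hypothetical names, the `X-RateLimit-*` headers follow the common convention used in the example response, and the 20% jitter fraction is an arbitrary illustration.

```python
import json
import math
import random
from typing import Callable, Dict, Tuple


def build_429(limit: int, reset_epoch: float, now: float) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for a rate-limited request."""
    retry_after = max(1, math.ceil(reset_epoch - now))  # whole seconds, at least 1
    headers = {
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(int(reset_epoch)),
        "Content-Type": "application/json",
    }
    body = json.dumps({"error": "rate_limit_exceeded",
                       "retry_after_seconds": retry_after})
    return 429, headers, body


def client_retry_delay(retry_after: float, jitter_fraction: float = 0.2,
                       rng: Callable[[], float] = random.random) -> float:
    """Client side: wait at least `retry_after`, plus random jitter to spread retries."""
    return retry_after * (1.0 + jitter_fraction * rng())
```

Keeping the header and the JSON field derived from the same `retry_after` value avoids the two drifting apart when limits are reconfigured.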
Bot Detection for Fintech APIs
Pure rate limiting stops volumetric attacks but misses sophisticated bots that stay under per-IP thresholds. Layer in bot detection signals:
Headless browser fingerprinting. Legitimate fintech mobile apps and web frontends send consistent TLS fingerprints, HTTP/2 settings, and device metadata. Anomalies indicate Selenium/Playwright-based bots.
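Behavioral analysis. Human users exhibit jitter in inter-request timing, while naive bots tend to send requests at near-uniform intervals. Measure the coefficient of variation (standard deviation divided by mean) of the gaps between requests in each session; values close to zero suggest automation.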
Honeypot fields. Add a hidden form field to your web flows that no human fills in but bots reliably populate. Any submission with a populated honeypot is blocked without revealing the mechanism.
CERT-In threat intelligence feeds. CERT-In publishes IP reputation data and malicious actor lists at cert-in.org.in. Consuming these feeds and blocking known-bad IPs at the gateway should be treated as baseline practice for any RBI-regulated entity.
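The timing signal from the behavioral analysis item above is simple to compute. The sketch below is one possible formulation, assuming request timestamps in seconds for a single session; the function names and the `0.1` threshold are illustrative and would need tuning against real traffic before being used for blocking decisions.

```python
import statistics
from typing import Optional, Sequence


def timing_cv(timestamps: Sequence[float]) -> Optional[float]:
    """Coefficient of variation of inter-request gaps, or None if too few samples."""
    if len(timestamps) < 3:
        return None                      # need at least two gaps
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    mean_gap = statistics.fmean(gaps)
    if mean_gap <= 0:
        return None
    return statistics.pstdev(gaps) / mean_gap


def looks_automated(timestamps: Sequence[float], threshold: float = 0.1) -> bool:
    """Flag sessions whose request spacing is suspiciously regular (hypothetical threshold)."""
    cv = timing_cv(timestamps)
    return cv is not None and cv < threshold
```

A signal like this is best fed into a bot score alongside other indicators rather than used as a standalone block rule, since some legitimate clients (polling widgets, for example) are also very regular.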
Fintech-Specific Concerns: RBI and Account Enumeration
RBI's Master Directions on Digital Payment Security Controls (2021) and subsequent guidelines on mobile and internet banking impose specific requirements that intersect with rate limiting:
OTP generation limits. RBI mandates that OTP-based authentication channels be protected against brute force. While the exact numeric limit is implementation-defined, a maximum of 3–5 OTP attempts per session with mandatory lockout is industry standard and auditor-expected.
Transaction velocity checks. UPI and card transaction APIs must apply velocity controls — limits on transaction count and value per unit time per customer. These are business-logic rate limits enforced at the application layer, distinct from infrastructure-level throttling.
Account enumeration prevention. Registration and account lookup APIs should return identical responses for existing and non-existing accounts. Timing differences (where the response for a valid account is measurably faster or slower) are a side channel that leaks account existence even when the response bodies match. Use constant-time comparison and deliberate response-time normalization.
Audit logging. Every 429 response, every blocked request, and every triggered bot-detection rule should be logged to an immutable audit store. SEBI's guidelines for regulated entities (sebi.gov.in) require incident traceability, and your audit log is the evidence.
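For the account enumeration item above, the two techniques can be sketched as follows: return one generic reply regardless of whether the account exists, pad the handler to a minimum duration, and compare secrets with a constant-time function. The function names, the reply text, and the 250 ms floor are assumptions for illustration; `hmac.compare_digest` is the standard-library constant-time comparison.

```python
import hmac
import time
from typing import Callable, Dict

GENERIC_REPLY = {"status": "ok",
                 "message": "If the account exists, an OTP has been sent."}


def lookup_with_uniform_response(account_exists: Callable[[], bool],
                                 notify: Callable[[], None] = lambda: None,
                                 min_duration_sec: float = 0.25,
                                 clock: Callable[[], float] = time.monotonic,
                                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Run the lookup, then pad so both outcomes take at least `min_duration_sec`."""
    start = clock()
    try:
        if account_exists():
            notify()                     # e.g. send the OTP only for real accounts
    finally:
        remaining = min_duration_sec - (clock() - start)
        if remaining > 0:
            sleep(remaining)
    return dict(GENERIC_REPLY)           # same body for both outcomes


def tokens_match(supplied: str, expected: str) -> bool:
    """Constant-time comparison for OTPs or reset tokens."""
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))
```

Padding to a floor only hides differences smaller than the floor, so the value should sit above the slowest normal code path for the endpoint.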
xychart-beta
title "API Abuse Type Distribution in Fintech Platforms - Illustrative"
x-axis ["Credential Stuffing", "Account Enumeration", "OTP Flooding", "Data Scraping", "DoS-DDoS"]
y-axis "Share of Incidents Percent" 0 --> 40
bar [34, 22, 18, 16, 10]
Implementation Checklist
| Control | Layer | Priority |
|---|---|---|
| Global request cap per IP | Gateway/WAF | P0 — deploy first |
| Per-user authenticated limit | Application | P0 |
| OTP generation limit per phone number | Application | P0 — RBI aligned |
| Per-API-key quota with webhook alerts | Gateway | P1 |
| Retry-After header on every 429 | Application | P1 |
| Bot score signal integration | WAF | P1 |
| Honeypot fields on web forms | Frontend | P2 |
| CERT-In IP feed subscription | Gateway/WAF | P2 |
| Audit log for all blocked requests | Application | P0 |
| Constant-time response for account lookup | Application | P1 |
Practical Redis Implementation Pattern
For teams building in Node.js or Python, Redis is the standard backing store for rate limit state. The sliding window pattern using ZADD and ZREMRANGEBYSCORE is accurate, and it is atomic when the commands run inside a MULTI/EXEC transaction or a Lua script:
- ZADD key <timestamp_ms> <request_id>: record the request
- ZREMRANGEBYSCORE key 0 <window_start_ms>: expire old entries
- ZCARD key: count requests in the current window
- Compare count against limit; return 429 if exceeded
- EXPIRE key <window_seconds>: ensure the key auto-deletes
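This pattern is more accurate than fixed-window INCR + EXPIRE implementations, and running the commands in a single Lua script keeps them atomic, including on Redis Cluster, where every key a script touches must hash to the same slot. NIST SP 800-204A provides authoritative architecture guidance for API security patterns at nist.gov.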
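The command sequence above could look roughly like this in Python. The sketch assumes `client` is a redis-py `redis.Redis` instance (it is passed in rather than imported, so the block has no third-party dependency) and uses a MULTI/EXEC pipeline instead of a Lua script for brevity; the function name and the member format are hypothetical.

```python
import time
import uuid
from typing import Any, Optional


def sliding_window_allow(client: Any, key: str, limit: int, window_ms: int,
                         now_ms: Optional[int] = None) -> bool:
    """Record the request and report whether it is within the limit.

    Assumes `client` behaves like redis-py's `redis.Redis`.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    member = f"{now_ms}-{uuid.uuid4().hex}"            # unique id per request
    pipe = client.pipeline(transaction=True)           # queued commands run as MULTI/EXEC
    pipe.zadd(key, {member: now_ms})                   # record the request
    pipe.zremrangebyscore(key, 0, now_ms - window_ms)  # expire old entries
    pipe.zcard(key)                                    # count requests in the window
    pipe.expire(key, max(1, -(-window_ms // 1000)))    # auto-delete idle keys (ceil to seconds)
    _, _, count, _ = pipe.execute()
    return count <= limit                              # False means respond with 429
```

Note that this variant records rejected requests as well, so a client that keeps hammering the endpoint stays limited until it actually slows down; whether that is desirable depends on the endpoint.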
Getting Your API Defences Audited
Even a well-designed rate-limiting implementation can have gaps — misconfigured bypass rules, missing limits on internal endpoints, or Redis clustering issues that allow limit evasion under failover. Bachao.AI by Dhisattva AI Pvt Ltd runs automated VAPT scans that specifically test rate-limit bypass, BOLA enumeration, and OTP flooding against your live API surface. You can start with a free VAPT scan to identify which of your endpoints are exposed before an attacker finds them first.
For more security implementation guides tailored to Indian tech companies, visit the Bachao.AI blog.