Every endpoint in your auth system is an abuse target. The login route invites credential stuffing. The password-reset route invites enumeration and spam. And the OTP-send route invites something most teams don't think about until the invoice arrives: an attacker hammering it to make you pay for thousands of SMS messages. Rate limiting is the cheapest, highest-leverage defense against all three — and the one most homegrown auth systems get subtly wrong by reaching for a single global limit.

This is how to do it properly: the layered-window pattern that stops both bursts and slow grinding, keying on the dimension that actually matters, and the one decision — fail open or fail closed — that separates a rate limiter that protects you from one that locks out your users or runs up your bill. The examples are NamoID's actual limits.

The three things you're defending against

Different endpoints face different attacks, and the limits should reflect that:

Login → brute force and credential stuffing. Attackers replay leaked username/password pairs at scale. You want to throttle attempts hard without locking out a user who just fat-fingered their password twice.
OTP send → billing amplification. This is the one people miss. Every OTP you send over SMS costs money. An unthrottled send-otp endpoint is a button an attacker can press ten thousand times to turn your messaging budget into a denial-of-wallet attack — while also flooding real users' phones. The cost of OTP delivery makes this endpoint uniquely dangerous.
Password reset / verification → enumeration and spam. Unthrottled, these leak which accounts exist and let attackers spam reset emails.

One global "100 requests per minute" limit defends none of these well. Each needs its own budget.

Layered windows: burst plus sustained

The core mistake is a single window. If you only set "5 per minute," an attacker does 5 every minute, all day — 7,200 attempts a day, perfectly within your limit. If you only set "30 per hour," they fire all 30 in the first two seconds. You need both: a tight short window to stop bursts, and a looser long window to stop slow grinding.
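To make that arithmetic concrete, here is a small sketch that computes the worst case for each choice. The helper names `daily_cap` and `burst_cap` are illustrative only, and the model assumes fixed windows and an attacker who always spends the full allowance.

```python
DAY = 86_400  # seconds in a day


def daily_cap(rules):
    """Most requests a patient attacker can land in 24 hours.

    Each rule is (max_requests, window_seconds); the tightest rule wins.
    """
    return min(max_requests * (DAY // window) for max_requests, window in rules)


def burst_cap(rules):
    """Most requests that can land in a span shorter than every window."""
    return min(max_requests for max_requests, _window in rules)


per_minute = (5, 60)
per_hour = (30, 3600)

print(daily_cap([per_minute]))            # 7200: slow grinding gets through
print(daily_cap([per_hour]))              # 720
print(burst_cap([per_hour]))              # 30: all fired in the first seconds
print(daily_cap([per_minute, per_hour]))  # 720: sustained rule caps the day
print(burst_cap([per_minute, per_hour]))  # 5: burst rule caps the spike
```

Layering the two rules takes the smaller number on both axes: the hourly rule bounds the day and the per-minute rule bounds the spike.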

Here are NamoID's actual limits, per action. Note that every sensitive endpoint except password reset carries two rules:

Action	Burst limit	Sustained limit
Login	5 / min	30 / hour
Register	3 / hour	20 / day
Password reset	3 / hour	—
OTP send	3 / min	10 / hour
OTP resend	1 / min	5 / hour
OTP verify	5 / min	30 / hour
MFA verify	5 / min	30 / hour

The OTP send and resend limits are deliberately tight: they are the strictest of the per-minute limits. "1 resend per minute, 5 per hour" makes a billing-amplification attack pointless while staying invisible to a real user who needs one more code. The two-window shape is the pattern; the exact numbers you tune to your traffic.
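One way to carry the table into code is a mapping from scope to a tuple of rules. This is a sketch, not NamoID's actual configuration format: the `Rule` class and the scope names such as `otp_send` are assumptions, with the `max_requests` and `window` field names chosen to match the implementation sketch later in this article.

```python
from dataclasses import dataclass

MINUTE, HOUR, DAY = 60, 3_600, 86_400  # window lengths in seconds


@dataclass(frozen=True)
class Rule:
    max_requests: int  # requests allowed per window
    window: int        # window length in seconds


# Scope names are hypothetical; the numbers come from the table above.
RATE_LIMITS = {
    "login":          (Rule(5, MINUTE), Rule(30, HOUR)),
    "register":       (Rule(3, HOUR),   Rule(20, DAY)),
    "password_reset": (Rule(3, HOUR),),                  # single rule
    "otp_send":       (Rule(3, MINUTE), Rule(10, HOUR)),
    "otp_resend":     (Rule(1, MINUTE), Rule(5, HOUR)),
    "otp_verify":     (Rule(5, MINUTE), Rule(30, HOUR)),
    "mfa_verify":     (Rule(5, MINUTE), Rule(30, HOUR)),
}
```

Keeping the rules as data rather than scattering numbers through handlers makes it easy to review every budget in one place and to tune them per scope.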

Key on the right dimension — and hash it

The dimension you count against matters as much as the number. Limit only by IP and a distributed attack from a residential-proxy pool sails through. Limit only by account and a single IP can spray across thousands of accounts. So key on both: per-identifier (the email/phone/account being targeted) and per-IP, and reject the request as soon as either limit is exceeded.

One detail teams overlook: don't use raw emails or phone numbers as your Redis keys. That turns your rate-limit store into a plaintext directory of who uses your service. Hash the identifier first. NamoID keys every limit on a SHA-256 hash of the identifier, so the Redis store holds opaque digests, not personal data — the same privacy discipline you'd apply anywhere else PII shows up.
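A minimal sketch of that keying, using only the standard library. The function names, the `rl:` key layout, and the trim-and-lowercase normalization are assumptions for illustration; the 32-character truncation mirrors the implementation sketch later in this article.

```python
import hashlib


def hash_key(value: str) -> str:
    """Return a truncated SHA-256 hex digest of a normalized identifier."""
    # Assumption: trimming and lowercasing so "Alice@Example.com " and
    # "alice@example.com" share one counter.
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:32]


def bucket_prefixes(scope: str, identifier: str, ip: str) -> list[str]:
    """Build one key prefix per dimension; neither contains raw PII."""
    return [
        f"rl:{scope}:id:{hash_key(identifier)}",
        f"rl:{scope}:ip:{hash_key(ip)}",
    ]


for prefix in bucket_prefixes("login", "Alice@Example.com", "203.0.113.7"):
    print(prefix)
```

Separate `id` and `ip` namespaces keep the two counters independent, so a request is checked against both and rejected if either one is over budget.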

The decision that defines your limiter: fail open or fail closed

Your rate limiter depends on Redis. So what happens when Redis is down? You have two choices, and picking one globally is the mistake:

Fail open — if the limiter can't reach Redis, allow the request. Prioritizes availability. Right for login: a Redis blip shouldn't lock every user out of your product.
Fail closed — if the limiter can't reach Redis, deny the request. Prioritizes protection. Right for OTP send: a Redis outage must never become an unlimited SMS firehose. Failing open here means a five-minute Redis incident could become a five-figure messaging bill.

The endpoint decides. NamoID's limiter takes a fail_closed flag per call: login and most paths fail open for availability, but OTP send fails closed — because the worst case for "allow the login" is a few extra attempts, and the worst case for "allow the OTP" is a billing attack. Make this choice deliberately for every limited endpoint; the default your library happens to ship is almost never right for all of them.
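The per-endpoint decision can be sketched as a small wrapper around the storage call. Everything here is hypothetical scaffolding: `StoreUnavailable` stands in for a Redis connection error, and `FAIL_CLOSED_SCOPES` lists only the one scope the text names as failing closed.

```python
class StoreUnavailable(Exception):
    """Stand-in for a Redis connection or timeout error."""


class RateLimited(Exception):
    """Raised when a request must be rejected."""


# Only OTP send is named as fail-closed above; everything else fails open.
FAIL_CLOSED_SCOPES = {"otp_send"}


def guarded(scope, check):
    """Run the limiter check; return True if the request may proceed."""
    try:
        check()
        return True
    except StoreUnavailable:
        if scope in FAIL_CLOSED_SCOPES:
            # Protection first: no counter means no SMS goes out.
            raise RateLimited(f"{scope}: limiter storage unavailable") from None
        return True  # availability first: let the request through


def broken_store():
    """Simulate an outage of the counter store."""
    raise StoreUnavailable("connection refused")


print(guarded("login", broken_store))  # True: login fails open
try:
    guarded("otp_send", broken_store)
except RateLimited as exc:
    print("denied:", exc)              # OTP send fails closed
```

Putting the policy in one explicit set means a new endpoint fails open unless someone deliberately adds it, which is worth revisiting for any endpoint that costs money per call.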

Make it auditable

A rate limit that silently drops requests tells you nothing. When a limit trips, emit an event — who (a hashed key), which scope, when — so abuse is visible. NamoID writes a security.rate_limit_exceeded event to its append-only audit store on every trip, which turns "we have rate limiting" into "here's the record of the attack we absorbed last Tuesday." That record is also what feeds anomaly detection later.
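As an illustration, an audit record for a tripped limit might look like the following. Only the event name `security.rate_limit_exceeded` comes from the text; the field names and the JSON shape are assumptions, not NamoID's schema.

```python
import json
import time


def rate_limit_event(scope, key_hash, rule_label, now=None):
    """Build an audit record for a tripped limit: who, which scope, when."""
    return {
        "event": "security.rate_limit_exceeded",
        "scope": scope,            # which limit tripped
        "key_hash": key_hash,      # hashed key only, never the raw email or phone
        "rule": rule_label,        # e.g. "3/min"
        "occurred_at": int(time.time() if now is None else now),
    }


event = rate_limit_event("otp_send", "ab12cd34", "3/min", now=1_700_000_000)
print(json.dumps(event))
```

Because the record carries the digest rather than the identifier, the audit trail can be shared with monitoring systems without widening access to personal data.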

Implementation sketch

The mechanism is a Redis counter per (scope, key, window) bucket, incremented and given a TTL, checked against each rule:

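import hashlib
import time

from redis.asyncio import Redis
from redis.exceptions import RedisError

redis = Redis()  # assumes a reachable Redis; RATE_LIMITS, emit and RateLimited are defined elsewhere

async def enforce(scope, identifier, ip, *, fail_closed=False):
    now = int(time.time())
    # Never store raw PII: count against truncated SHA-256 digests of both dimensions.
    keys = [f"{kind}:{hashlib.sha256(value.encode()).hexdigest()[:32]}"
            for kind, value in (("id", identifier), ("ip", ip))]
    try:
        for rule in RATE_LIMITS[scope]:           # e.g. (5/min, 30/hour)
            for key_hash in keys:                 # trip on identifier or IP
                bucket = f"rl:{scope}:{key_hash}:{rule.window}:{now // rule.window}"
                count = await redis.incr(bucket)
                await redis.expire(bucket, rule.window + 10)
                if count > rule.max_requests:
                    emit("security.rate_limit_exceeded", scope, key_hash)
                    raise RateLimited(retry_after=rule.window - now % rule.window)
    except RedisError:
        if fail_closed:
            raise RateLimited("storage unavailable")  # OTP send: deny
        return                                        # login: allow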

Return a Retry-After header with the rejection so well-behaved clients back off instead of retrying into the wall.
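For a fixed window, the wait is simply the time left until the current window rolls over. A minimal sketch follows; the function names are illustrative, and pairing the header with HTTP status 429 (Too Many Requests) is an assumption about how the rejection is surfaced.

```python
def seconds_until_reset(now: int, window: int) -> int:
    """Seconds until the current fixed window ends."""
    return window - (now % window)


def too_many_requests(now: int, window: int):
    """Return a (status, headers) pair for a rejected request."""
    return 429, {"Retry-After": str(seconds_until_reset(now, window))}


# 125 seconds into the epoch is 5 seconds into a 60-second window.
print(too_many_requests(125, 60))  # (429, {'Retry-After': '55'})
```

Retry-After accepts a whole number of seconds, so the value can be passed through as a string without further formatting.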

Gotchas worth knowing

Fixed-window edge bursts. Counting per calendar window lets an attacker spend the full allowance at the end of one window and again at the start of the next, so up to 2× the limit can land in a short span across the boundary. For most auth endpoints that's acceptable; if you need precision, a sliding-window or token-bucket algorithm smooths it out at more cost.
Distributed stuffing defeats per-IP. Residential-proxy botnets rotate IPs, so per-IP limits alone won't catch them — which is why you also key per-account and layer in credential-stuffing defenses like breached-password checks and MFA.
Don't leak existence through limits. If a non-existent account hits a different limit (or error) than a real one, you've built an enumeration oracle. Keep responses and limits uniform.
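To illustrate the first gotcha, here is a sliding-window log limiter that closes the boundary gap. It is an in-memory, single-process sketch for demonstration only; the class name is hypothetical, and a shared deployment would need the timestamps in a shared store, which is the extra cost mentioned above.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests in any trailing window of `window` seconds."""

    def __init__(self, max_requests: int, window: float):
        self.max_requests = max_requests
        self.window = window
        self.hits = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        # Forget accepted requests that have aged out of the trailing window.
        while self.hits and self.hits[0] <= now - self.window:
            self.hits.popleft()
        if len(self.hits) >= self.max_requests:
            return False
        self.hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=5, window=60)

# Five requests at t=59s and five more at t=60s straddle a fixed-window
# boundary, where a per-calendar-minute counter would accept all ten.
results = [limiter.allow(59.0) for _ in range(5)]
results += [limiter.allow(60.0) for _ in range(5)]
print(results.count(True))  # 5
```

The trade-off is memory: the log stores one timestamp per accepted request per key, where a fixed-window counter stores a single integer.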

FAQ

One limit or two per endpoint? Two — a short burst window and a long sustained window. A single window is trivially gamed from one side or the other.

Should login fail open or closed on a Redis outage? Open, for availability — a Redis blip shouldn't lock out your whole user base. OTP send should fail closed, so an outage can't become a billing attack.

Why hash the rate-limit key? So your Redis store isn't a plaintext list of your users' emails and phone numbers. Hash the identifier; key on the digest.

Is rate limiting enough to stop credential stuffing? No — it's one layer. Pair it with breached-password detection, MFA, and anomaly monitoring. Rate limiting slows the attack; the other layers catch what slips through.

Throttle the abuse, not your users

Good auth rate limiting is a few deliberate choices: layered burst-plus-sustained windows, keys on both identifier and IP, hashed so they hold no PII, a fail-open/fail-closed decision made per endpoint, and an audit event when a limit trips. Get the OTP path right in particular — that's the one that's not just a security control but a guard on your bill.

NamoID ships all of this — per-scope layered limits in Redis, hashed keys, fail-closed OTP sends, and security.rate_limit_exceeded events in the audit trail — as part of the identity layer, so credential-stuffing and OTP-flood defense are defaults, not homework.