feat(#2182): mitigate cache stampede with single-flight #4146
demming wants to merge 1 commit into gofiber:master
Conversation
This fixes the cache stampede I reported in gofiber#2182 by using singleflight. I'll follow up with an extra option to serve stale responses for freshly expired entries, like often with TOTP during failover. I'll clean up a little here first, this is my 1st iteration, hence the duplication in the singleflight call.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant enhancement to the cache middleware by integrating a single-flight mechanism. This feature is designed to mitigate the "cache stampede" problem, where multiple concurrent requests for the same uncached resource can overwhelm the backend. By enabling the new `SingleFlight` option, only one request will proceed to generate the response and populate the cache, while other concurrent requests for the same key will wait and share the result, thereby improving performance and reducing server load under high concurrency. Additionally, a `StaleWhileRevalidate` option has been added for future expansion.
Code Review
This pull request introduces a single-flight mechanism to mitigate cache stampedes, which is a valuable addition. The implementation is well-structured and includes relevant tests to verify the new behavior and ensure backward compatibility. I have a couple of suggestions to improve maintainability by refactoring some duplicated code and to keep the configuration API clean by deferring the addition of an unused option.
I need to tackle proper panic handling from inside singleflight - the waiters must be notified to cancel. I'm aware of the duplication as noted in the commit message. Will clean up before marking it ready for review. This draft is intended to gather first feedback. FYI @ReneWerner87
awesome
@demming Any updates on this?
Pull request overview
Adds optional single-flight request coalescing to the cache middleware to mitigate cache stampedes (issue #2182), plus new config surface and tests.
Changes:
- Extend `cache.Config` with `SingleFlight` (and `StaleWhileRevalidate`) fields and defaults.
- Implement a `singleflight.Group`-based miss coalescing path in the cache middleware.
- Add tests intended to validate stampede prevention and default behavior; add the `golang.org/x/sync` dependency.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| middleware/cache/config.go | Adds new config fields and documents their intended behavior. |
| middleware/cache/cache.go | Implements the single-flight miss coalescing logic in the middleware. |
| middleware/cache/cache_test.go | Adds concurrency test for SingleFlight and a default-config regression test. |
| go.mod | Adds golang.org/x/sync dependency for singleflight. |
| go.sum | Updates checksums for the new dependency. |
Comments suppressed due to low confidence (1)
middleware/cache/cache.go:136
- With `SingleFlight` enabled on a cold miss, the middleware calls `manager.get(key)` (line 135) and then calls `manager.get(key)` again inside the singleflight critical section. For the in-memory store case, this can allocate an extra pooled `*item` that is never used. Consider restructuring to avoid the duplicate `get` (e.g., delay the initial `get` until after deciding hit/miss, or reuse the same `*item` when populating).
```go
key := cfg.KeyGenerator(c) + "_" + c.Method()

// Get entry from pool
e := manager.get(key)
```
```go
// Skip caching if body won't fit into cache.
if cfg.MaxBytes > 0 && bodySize > cfg.MaxBytes {
	return res, nil
}
```
Similar to the `Next` case, when `bodySize > cfg.MaxBytes` the `SingleFlight` closure returns without caching, but the outer code still sets the cache status to `miss`. The non-singleflight path marks this as `unreachable`. Please align `SingleFlight` behavior with the existing path for oversized responses.
```go
// Set cache status header.
c.Set(cfg.CacheHeader, cacheMiss)
```
The SingleFlight path always sets cfg.CacheHeader to miss after sf.Do(...), even when the request was intentionally bypassed (Next) or not cached due to MaxBytes. To match current behavior, the status header should reflect unreachable for those cases (and only be miss when the response is actually cached/populated).
```suggestion
// Set cache status header only for waiters that received a shared result.
if shared {
	c.Set(cfg.CacheHeader, cacheMiss)
}
```
```go
res := loadResult{
	Body:      utils.CopyBytes(c.Response().Body()),
	Status:    c.Response().StatusCode(),
	Ctype:     utils.CopyBytes(c.Response().Header.ContentType()),
	Cencoding: utils.CopyBytes(c.Response().Header.Peek(fiber.HeaderContentEncoding)),
```
`res := loadResult{ Body: utils.CopyBytes(c.Response().Body()), ... }` copies the full response body before the code checks `cfg.Next(c)` and `cfg.MaxBytes` (which can cause an early return without caching). Consider moving these checks before copying to avoid unnecessary allocations for bypassed or oversized responses.
```go
// keeps existing behavior: no coalescing; existing tests pass unchanged.
func Test_Cache_DefaultConfig_BackwardsCompatible(t *testing.T) {
	t.Parallel()

	app := fiber.New()
	app.Use(New()) // SingleFlight defaults to false

	app.Get("/", func(c *fiber.Ctx) error {
		return c.SendString("default")
	})
```
Test_Cache_DefaultConfig_BackwardsCompatible claims to ensure "no coalescing" when SingleFlight defaults to false, but it only asserts basic caching works. Either extend it to actually validate the stated behavior (e.g., concurrent misses invoke the handler multiple times when SingleFlight is false) or rename/remove it to avoid misleading coverage.
```suggestion
// keeps existing behavior: no coalescing for concurrent misses; existing tests pass unchanged.
func Test_Cache_DefaultConfig_BackwardsCompatible(t *testing.T) {
	t.Parallel()

	var handlerCalls int64

	app := fiber.New()
	app.Use(New()) // SingleFlight defaults to false

	app.Get("/", func(c *fiber.Ctx) error {
		atomic.AddInt64(&handlerCalls, 1)
		return c.SendString("default")
	})

	// Cold cache: many concurrent requests for the same key should not be fully
	// coalesced when SingleFlight is false; the handler should run more than once.
	const concurrency = 50
	var wg sync.WaitGroup
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func(idx int) {
			defer wg.Done()
			req := httptest.NewRequest("GET", "/", nil)
			resp, err := app.Test(req)
			if err != nil {
				t.Errorf("request %d: %v", idx, err)
				return
			}
			// Drain body to ensure handler completes.
			_, _ = io.ReadAll(resp.Body)
		}(i)
	}
	wg.Wait()

	calls := atomic.LoadInt64(&handlerCalls)
	utils.AssertEqual(t, true, calls > 1, "handler should be invoked more than once when SingleFlight is false")

	// Sequential requests still exhibit normal caching behavior.
```
```go
// StaleWhileRevalidate, when > 0, allows serving stale responses for expired
// entries while one revalidation runs (one handler run per key). 0 disables.
// Full stale-while-revalidate behavior may be added in a follow-up.
//
```
`StaleWhileRevalidate` is added to the public `Config` API and documented as if it changes behavior, but it's not used anywhere in the middleware implementation. This is a no-op knob and makes the docs misleading. Either wire it into `cache.go` or remove/clearly mark it as not implemented until a follow-up PR.
```suggestion
// StaleWhileRevalidate is reserved for future use and is currently not
// implemented in the cache middleware, so setting it has no effect.
// Intended semantics (not yet implemented): when > 0, allow serving stale
// responses for expired entries while one revalidation runs per key. 0 disables.
```
```go
}

// If middleware marks request for bypass, return result without caching.
if cfg.Next != nil && cfg.Next(c) {
```
In the `SingleFlight` closure, `cfg.Next(c)` returning true currently just returns `res` and the outer code later sets the cache status header to `miss`. In the non-singleflight path, `Next` causes `cfg.CacheHeader` to be set to `unreachable` and skips caching. Please preserve the existing cache-status semantics here (and consider skipping singleflight coalescing entirely for requests where `Next(c)` is true).
```suggestion
if cfg.Next != nil && cfg.Next(c) {
	if cfg.CacheHeader != "" {
		c.Set(cfg.CacheHeader, "unreachable")
	}
```
Description

This fixes the cache stampede I reported in #2182 by using singleflight.

I'll follow up with an extra option to serve stale responses for freshly expired entries, as is often needed with TOTP during failover.

It has a documentation impact: the `Config` now includes `SingleFlight` (and maybe `StaleWhileRevalidate`, either in this PR or in a follow-up PR). I'll follow up on the tasks below in due time.

This PR isn't ready yet.
Changes introduced
List the new features or adjustments introduced in this pull request. Provide details on benchmarks, documentation updates, changelog entries, and if applicable, the migration guide.
Type of change
Please delete options that are not relevant.
Checklist

Before you submit your pull request, please make sure you meet these requirements:

- `/docs/` directory for Fiber's documentation.

Commit formatting
Please use emojis in commit messages for an easy way to identify the purpose or intention of a commit. Check out the emoji cheatsheet here: CONTRIBUTING.md