Skip to content

304 handling for fetch() #412

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
annevk opened this issue Aug 13, 2014 · 73 comments
Closed

304 handling for fetch() #412

annevk opened this issue Aug 13, 2014 · 73 comments

Comments

@annevk
Copy link
Member

annevk commented Aug 13, 2014

Per #398 we need to decide how to handle 304 responses for fetch() and we need to decide if developers setting conditional headers has any "magic" effects. Implementations for XMLHttpRequest reportedly do different things, though I have not tested this recently. https://www.w3.org/Bugs/Public/show_bug.cgi?id=22414 has additional context.

@KenjiBaheux
Copy link
Collaborator

Currently Blink's implementation doesn't support 30x (it just fails)

Was there any discussion about this?

We believe this is a minor caveat that we could fix later. Meanwhile, affected developers should be able to avoid the issue (e.g. by avoiding any redirection). Let me know if this is narrow minded.

As for the conditional headers, I'm not sure what would happen in Blink and if/how that would be wrong. Any extra details?

Thanks.

@KenjiBaheux
Copy link
Collaborator

cc/ @annevk

@annevk
Copy link
Member Author

annevk commented Sep 5, 2014

Note that 304 is a very special case among 30x. 30x is redirects and they should work. 304 is about caching and should also do something one way or another, it cannot just fail. Either it is passed through or you get a 200 out.

Conditional headers are headers such as If-Match. Again, this is not about redirects.

@KenjiBaheux
Copy link
Collaborator

Talked to Horo-san, here is what Blink does: "if the content is in the browser cache and "Last-Modified:" is set, fetch() sends the request with "If-Modified-Since" header.
and if the server returns a 304 response, the result of fetch() will be a 200 response with the cached content."

@KenjiBaheux
Copy link
Collaborator

@annevk I think this behavior is reasonable. We could expand later for more flexibility as needed (in which case we probably don't need the impact MVP label on this issue). Let me know if I'm missing something or if there are specific scenario that you are worrying about.

@annevk
Copy link
Member Author

annevk commented Sep 17, 2014

Does Blink do the same with Etag / If-Etag-Match?

What if the web developer sets the If-Modified-Since header?

Transforming 304 into 200 is precisely the kind of "magic" that we were hoping to avoid with fetch() although we could make avoiding it an option of sorts of course. Maybe matching XMLHttpRequest's precedent is not so bad.

@KenjiBaheux
Copy link
Collaborator

@annevk re etag, I assume Blink does the same thing (will check with Horo-san and ask about the second question).

I think the magic by default, crafty as an option approach is reasonable. It sounds that this issue might not be "impacts MVP" (feels like an enhancement).

@annevk
Copy link
Member Author

annevk commented Sep 18, 2014

I'd like to hear from @sicking and @domenic and ideally some network people whether they think this is reasonable. And what the answer is to the second question of course.

@KenjiBaheux
Copy link
Collaborator

Horo-san checked fetch's behavior in Blink against the scenarios mentioned here and confirmed that it's exactly the same as XHR.

@annevk
Copy link
Member Author

annevk commented Sep 18, 2014

So currently XMLHttpRequest requires that if the developer sets the conditional header (e.g. If-Modified-Since) he will get a 304 back if the server transmitted that. Is that what happens?

@ylafon
Copy link
Member

ylafon commented Sep 18, 2014

It seems logical that setting If-Modified-Since: or If-None-Match: explicitly triggers sending back the 304 at the API level, I can imagine SW wanting to refresh cached content that way, altering only caching metadata but not the content (http://tools.ietf.org/html/rfc7232#section-4.1 and http://tools.ietf.org/html/rfc7234#section-4.3.4)

If not set, then it's up to the native browser cache to generate IMS/INM digest the (potential) 304 back and give back a 200 at the API level (where it could be detected as from the cache via an Age: header for example, if absolutely needed, see http://tools.ietf.org/html/rfc7234#section-5.1 )

@horo-t
Copy link
Member

horo-t commented Sep 18, 2014

So currently XMLHttpRequest requires that if the developer sets the conditional header (e.g. If-Modified-Since) he will get a 304 back if the server transmitted that. Is that what happens?

Yes. I'm adding tests for this.
https://codereview.chromium.org/580023002/

@domenic
Copy link
Contributor

domenic commented Sep 18, 2014

I last gave my thoughts in #348 (comment). I think making caching work transparently (which I guess means translating 304s to 200s) is a good default, and a raw-mode later would be good too.

I don't feel that strongly, but to me using the presence of If-Modified-Since or If-None-Match to switch to raw-mode seems bad. I'd rather that be done explicitly, so that there is only one flag (e.g. { ignoreCache: true }) that controls the transparent caching behavior.

@annevk
Copy link
Member Author

annevk commented Sep 18, 2014

I agree with @domenic. @horo-t, if you have not implemented the XMLHttpRequest behavior yet, would you consider not doing so and instead adding an explicit flag to get back 304 always?

By the way, if the server gave back a 304 and there's a cache miss, does the API get a 304 or a network error? It would be good if we finally nailed all of this down, and ideally in the same way for both XMLHttpRequest and fetch().

@KenjiBaheux
Copy link
Collaborator

@annevk Isn't the default behavior (XHR) just fine? I'm not sure what you meant by "not doing so and instead [...]". We should do the raw-mode flag later as an enhancement.

For the second question, I assume that this could happen if for some reason the cache got corrupted or modified/deleted. I think we should go all the way and give back the network error by default. The raw mode should give back the 304.

@ylafon
Copy link
Member

ylafon commented Sep 19, 2014

@domenic, if you prefer an explicit switch, then you should have a failure mode when setting IMS/INM in "basic" mode, at least an indication that it was not set.
@annevk, as the ETag, or the LMD used in INM/IMS is taken from a cache, it should be difficult to have a cache miss, at least the cache entry should be marked as in use to avoid it being removed from the cache while waiting for the network.

@annevk
Copy link
Member Author

annevk commented Sep 19, 2014

@KenjiBaheux no, I don't think that if the developer set one of a magic set of headers, they suddenly get back a 304 whereas they normally do not. It's somewhat weird and magical.

@KenjiBaheux
Copy link
Collaborator

@annevk I think I got it: If we went all the way to 200 in one case, we should do the same in the scenario where the developer set the headers (e.g. IMS/INM). That makes sense. @horo-t what do you think?

@horo-t
Copy link
Member

horo-t commented Sep 19, 2014

I've already implemented the same behavior of XMLHttpRequest in fetch API.

I don't have strong opinions but I think that passing the 304 response to the caller of fetch API is reasonable.
When the developers set these headers (If-None-Match, If-Modified-Since) to the request of fetch API, they must know the ETag and Last-Modified and content of the previous response.
So they don't need the 200 response and the content which is generated from the browser's cache.
And also the browser's cache may have already forgotten the previous response when fetch API is called.

@annevk
Copy link
Member Author

annevk commented Sep 19, 2014

@horo-t okay. I guess I can live with properly defining that in Fetch plus defining a flag to turn off the automatic 304 -> 200 mapping. To be clear, we only disable the automatic 304 -> 200 mapping at the moment if there's a header set whose name is one of If-None-Match, If-Modified-Since?

@domenic
Copy link
Contributor

domenic commented Sep 19, 2014

Here is how this should work.

  • fetch() with no INM/IMS and { ignoreCache: false }: INM/IMS is automatically set for you; 304s get translated to 200s. (like XHR)
  • fetch() with INM/IMS set and { ignoreCache: false }: user-set INM/IMS set, and 304s get translated to 200s. (unlike XHR)
  • fetch() with no INM/IMS set and { ignoreCache: true }: no INM/IMS automatically set; 304s returned as-is
  • fetch() with INM/IMS set and { ignoreCache: true }: user-set INM/IMS set; 304s returned as-is

The default is { ignoreCache: false }, since it is a boolean option and boolean options should always default to false.

Note how the second case is different from XHR.

@domenic
Copy link
Contributor

domenic commented Sep 19, 2014

The ignoreCache name is not perfect; other possibilities I can think of are bypassAutocache or noAutoCache or noAuto304 or similar. (Wanting to make false be the default makes the naming a little awkward, but I think it's important.)

annevk added a commit to whatwg/fetch that referenced this issue Oct 13, 2014
@annevk
Copy link
Member Author

annevk commented Oct 13, 2014

@mayhemer it's not entirely clear to me how to deal with partial content. Does us handling that mean we also pay special attention to a 206 response? What's the logic here, if we notice we have a partial cache entry, we set If-Range, then if there's a 206 response, we combine some stuff and hand out the combined response as a 200 to content? Does this only happen for media elements basically?

annevk added a commit to whatwg/fetch that referenced this issue Oct 13, 2014
@jakearchibald
Copy link
Contributor

@mayhemer if I set my cache mode to revalidate and set some If- headers, what happens if I 304 is returned?

@jakearchibald
Copy link
Contributor

I like the idea of offline, but this is a security leak right? I can work out where you've been depending on assets that are in the cache or not.

I guess we wouldn't expose fromCache on opaque responses, so no problem there.

@jakearchibald
Copy link
Contributor

(btw the rest looks great)

@annevk
Copy link
Member Author

annevk commented Oct 13, 2014

I guess it makes it easier to obtain what's already obtainable through a timing attack.

@jakearchibald
Copy link
Contributor

If it gets a "NO" from security, it could be restricted to same-origin (or if the cached response has CORS headers)

@wanderview
Copy link
Member

Not to bikeshed, but does "isFromCache" or "fromCache" refer to the HTTP cache? I assume it does. What about Response objects returned from the Cache API? I assume they shouldn't set this?

@wanderview
Copy link
Member

@jakearchibald @slightlyoff @annevk Would this cache bypass mechanism address the duplicate data concerns with the Cache API? If an origin knows it will be storing large assets in the Cache API, then it could make the decision to bypass the http cache to avoid wasting resources.

@slightlyoff
Copy link
Contributor

@wanderview : the duplication questions about Cache API aren't about the Cache API, they're about implmentation quality. The Cache API already makes it possible to entirely de-duplicate resources on-disk. If there's an issue, it's because an implementation isn't good (yet), not because the spec is any sort of problem child.

@wanderview
Copy link
Member

@slightlyoff I agree its possible, buts its added complexity. I was merely suggesting that this proposed spec change might reduce the need to pay for that complexity.

@slightlyoff
Copy link
Contributor

@wanderview : it's not added complexity for moderately advanced HTTP caches, and it's still not a spec issue. HTTP caches should already be de-duping based on hashes and only storing metadata independently. There are even filesystems and DBs which implement this, meaning that as long as the blobs are stored independently from the metadata, some impls might even get this for free.

@mnot
Copy link
Member

mnot commented Nov 24, 2014

@slightlyoff spittake

If only. De-dup was done for a brief time by Inktomi Traffic Server in the early '00s (according to their release notes and friends who worked on it), but that got ripped out sometime before it got Open Sourced as Apache Traffic Server. To my knowledge, none of the browser caches do it (happy to be proven wrong), and other proxy caches have looked at it to varying degrees, but either backed away slowly, or said "yeah, you should use a de-deplicating filesystem for that."

Anyway.

@mnot
Copy link
Member

mnot commented Nov 24, 2014

@annevk I like how that looks.

Just to be annoying, I'll map from the proposal to HTTP request cache-control directives:

  • bypass = no-store
  • revalidate = no-cache
  • force cache = max-age=[some high number]
  • offline = only-if-cached

Now, I'll be the last to defend HTTP's choice of terms for these concepts, but is it really good to introduce yet another set of terms for the same things?

@annevk
Copy link
Member Author

annevk commented Nov 24, 2014

"max-age=[some high number]" isn't really a term we could reuse, but agreed about the others. To be clear, this still needs security review as I'm not entirely convinced exposing this much of the cache is going to work.

@mnot
Copy link
Member

mnot commented Nov 25, 2014

WFM, thx.

"force cache" is almost something like "disregard freshness". If you can think of something similar that's zippier, it might be a better fit.

@mayhemer
Copy link

bypass != no-store !!!

"Cache-Control: no-store" means to cache, but not persistently. HTTP's "no-store" is actually "session only".

"bypass" in this spec draft means "there is no cache at all = don't read from cache + don't write to cache"

@annevk
Copy link
Member Author

annevk commented Nov 25, 2014

@mayhemer the description in https://httpwg.github.io/specs/rfc7234.html#cache-request-directive for no-store seems to disagree.

@mayhemer
Copy link

@annevk no, it perfectly confirms what I say and reflects how we handle no-store in Gecko.

' "MUST NOT store" in this context means that the cache MUST NOT intentionally store the information in non-volatile storage '

It means, no persistent storage media can be used. But there is nothing said about non-volatile one. Non-volatile can be used.

@mnot
Copy link
Member

mnot commented Nov 25, 2014

@mayhemer you're selectively reading the text. It very clearly says:

The "no-store" request directive indicates that a cache MUST NOT store any part of either this request or any response to it.

@mayhemer
Copy link

mayhemer commented Feb 9, 2015

"store" - what is the meaning of this word exactly?

@mayhemer
Copy link

mayhemer commented Feb 9, 2015

BTW, I just repeat what Gecko does when Cache-control: no-store header is received on a response.

The main point I originally wanted to make was that "bypass" doesn't mean "no-store" (which could be ambiguous with the cache-control header value.)

@mnot
Copy link
Member

mnot commented Feb 10, 2015

If you go on to actually read the linked text, it says:

"MUST NOT store" in this context means that the cache MUST NOT intentionally store the information in non-volatile storage, and MUST make a best-effort attempt to remove the information from volatile storage as promptly as possible after forwarding it.

@annevk
Copy link
Member Author

annevk commented Apr 5, 2015

I opened three new issues for the remainder of this thread:

If there's anything I missed please let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants