Target audience: senior front-end developers, solution architects
Background
Releasing new versions of a frontend application frequently is part of working in an agile way. It is frustrating, however, when a user’s browser does not pick up the latest version straight away, keeps using cached content, and the application ends up behaving differently from what is expected. This is a big headache for developers and users alike.
A quick but temporary fix is to force the browser to bypass its cache by pressing Ctrl + F5. That is too much to ask of users without an IT background.
This blog looks at a permanent solution that disables the browser cache so that web content is always fetched from the server. The solution should cover most browsers, including ‘ancient’ ones and their variants.
Note
1, There are three types of web applications (see Appendix 1). This blog only covers the Single Page Application (SPA).
2, Caching can happen in the browser, at the CDN, at the load balancer, or at many other layers. A CDN cache that has not been refreshed can cause similar symptoms. However, this blog only covers the browser cache, not the CDN cache or other layers. There are separate solutions for the CDN cache, such as invalidating files in AWS CloudFront ([1]) or purging content in Azure CDN.
3, This browser cache issue is more common in certain countries where many ancient browsers, such as IE6 and its variants, are still in use.
Why is the cache an issue?
The blog [2] Avoid cache trap when serving Angular app describes a scenario in which a new release has been deployed, but the old index.html cached in the browser keeps taking effect until a route is requested:
We all love cache, I think. It helps to load our apps faster, lower some load from the server, and let our users a great user experience in our app.
But sometimes, this cache is working against us and causing our users not to get the last version of the app, which is probably because of a wrong cache configuration in our server.
To avoid cache issues in our Angular application and manage app versioning, when we are building our app to production using ng build --prod, Angular adds (by default) a hash to our js files and updates our index.html file to refer to the hash files. When we deploy a new version, the hash keys are changed, and when the user asks the site again, the index.html will ask to load the new files from the server. Since the browser doesn't have those files in the cache, it will get them from the server.
So it looks like Angular guys cover us? Well, not entirely. The problems are starting when the index.html file is cached. Let’s say we just deployed a new version of our app, and the static files, and index.html file with them, are cached. In this scenario, when the user is starting to use our app from the main URL, he is getting the old version of our app because the cached index.html asks to load the old js files, and they will probably load from the browser cache.
But let’s move on with the scenario and complicate it a little. The user uses our app (remember, with the old version), moving between screens, and doing some actions. He decided to refresh at some point, and now he is getting the new version of the app.
Wait, the new version? But why? index.html is in the cache; why are we getting the new version? So this is related to the redirect we are doing when serving SPA from a server.
SPA’s handling the app navigation on the client-side, every navigation to a new route Angular router changed the URL in the address bar dynamically. When a user types in the browser address bar a route of our app, say https://some-domain.com/home, and clicks enter, we actually don’t have this route in our server, but instead of returning 404, we are configuring the server to return our index.html, and Angular router is doing the job and directing the user to the right screen.
Now we can understand what is going on in our scenario. If we enter our app from the main URL, we are getting the old version: since we have this endpoint on the server, the browser can cache it. But if we are asking for a route, we will always get the index.html from the server and not from the cache, so we will see the new version after refresh.
Nice, isn’t it? And it’s all because of one index.html in cache.
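The fallback behaviour described above can be sketched in a few lines of TypeScript. This is not code from [2]; it is a minimal model under stated assumptions: the set of deployed files and the hashed file names (main.3f2a1c.js, styles.9b7e04.css) are invented for illustration, and the Cache-Control values are one possible policy, not the only valid one.

```typescript
// Hypothetical files produced by a production build (names are made up).
const deployedFiles = new Set<string>([
  "/index.html",
  "/main.3f2a1c.js",
  "/styles.9b7e04.css",
]);

interface ServedFile {
  file: string;         // file actually sent to the browser
  isFallback: boolean;  // true when the URL was a client-side route
  cacheControl: string; // Cache-Control value sent with the file
}

// Decide what an SPA server sends for a given URL path (query strings are ignored here).
function resolveRequest(urlPath: string): ServedFile {
  const path = urlPath === "/" ? "/index.html" : urlPath;

  if (path !== "/index.html" && deployedFiles.has(path)) {
    // Hashed file: its name changes with every release,
    // so a long cache lifetime cannot serve stale code.
    return { file: path, isFallback: false, cacheControl: "max-age=31536000" };
  }

  // index.html itself, or a route such as /home that only exists in the
  // client-side router: send index.html and make the browser revalidate it.
  return {
    file: "/index.html",
    isFallback: path !== "/index.html",
    cacheControl: "no-cache",
  };
}
```

With this model, resolveRequest("/home") and resolveRequest("/") both return index.html. The trap in the scenario appears as soon as that file is sent with a long max-age instead of no-cache.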
How do we check whether it is cached?
The blog [2] also explains how to check whether the document of the frontend application is cached in the browser. The cache-control response header (together with a few others) controls whether the document is cached.
It’s pretty easy to find out if the index.html file is configured to be cached or not.
1, Open the browser Dev-tools.
2, Go to the Network tab.
3, Ensure the Disable cache checkbox is not marked.
4, Filter to Doc.
5, Refresh the screen.
6, Click on the first document.
7, Check the cache-control header.
Let’s take, for example, the angular.io/docs site:
We can see that the cache-control header is no-cache. Which values in the cache-control header are good for us and which are not?
* no-cache: This will cache our index.html file, but tell any cache system to check if there is a newer version in the server. We are OK with that.
* no-store: This will tell any cache system not to cache the index.html file. Also good.
* max-age=0: This also won't cache the index.html.
* max-age=31536000: This isn't good. The value in the max-age represents seconds, so our index.html will be cached for a year. It’s really up to you which values are OK by you, but I think we can agree we don't want the index.html in the cache for a full year.
Those are the popular values for the cache-control header; if you see something else in your response, you can check it out in [3] HTTP/1.1 the Cache-Control header.
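The manual check above can also be expressed as a small TypeScript sketch that reads a cache-control value and reports how index.html would be treated. The function names and the three verdict labels are hypothetical, and the parser is deliberately simple: it splits on commas and therefore does not handle quoted directive values that contain commas.

```typescript
type Verdict =
  | "not stored"
  | "revalidated before reuse"
  | "may be served from cache";

// Split a Cache-Control value into lower-cased directives and optional values.
function parseCacheControl(header: string): Map<string, string | undefined> {
  const directives = new Map<string, string | undefined>();
  for (const part of header.split(",")) {
    const trimmed = part.trim();
    if (trimmed === "") continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) {
      directives.set(trimmed.toLowerCase(), undefined);
    } else {
      const name = trimmed.slice(0, eq).trim().toLowerCase();
      const value = trimmed.slice(eq + 1).trim().replace(/^"|"$/g, "");
      directives.set(name, value);
    }
  }
  return directives;
}

// Classify the header the same way the list above does.
function classifyIndexHtml(header: string): Verdict {
  const directives = parseCacheControl(header);
  if (directives.has("no-store")) return "not stored";
  if (directives.has("no-cache")) return "revalidated before reuse";
  const maxAge = directives.get("max-age");
  if (maxAge !== undefined && /^\d+$/.test(maxAge) && parseInt(maxAge, 10) === 0) {
    return "revalidated before reuse";
  }
  return "may be served from cache";
}
```

For example, classifyIndexHtml("max-age=31536000") falls into the last category, which is the one to avoid for index.html.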
Options to disable cache on HTTP/1.0 and IE6
So far, the HTTP versions are HTTP/0.9, HTTP/1.0, HTTP/1.1, HTTP/2.0 (officially named HTTP/2) and, most recently, HTTP/3. HTTP/1.1 is still in common use, while the newer versions are gradually taking over.
HTTP/1.0 is still used by some old browsers, proxies and network tools (like wget). The following headers can be used to disable caching for HTTP/1.0:
Pragma: no-cache
Expires: 0
According to [4] HTTP/1.0,
The Pragma general-header field is used to include implementation-specific directives that may apply to any recipient along the request/response chain. When the “no-cache” directive is present in a request message, an application should forward the request toward the origin server even if it has a cached copy of what is being requested.
The Expires entity-header field gives the date/time after which the entity should be considered stale. A value of zero (0) or an invalid date format should be considered equivalent to an “expires immediately.”
The headers above should also work on Internet Explorer 6, although Microsoft did not strictly follow the HTTP standard at that time.
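The rule quoted from [4] for the Expires header can be illustrated with a short TypeScript sketch. It is an assumption-laden model rather than how any particular browser works: the isExpired name is hypothetical, and only the fixed-length date format (for example Thu, 01 Jan 1970 00:00:00 GMT) is accepted as valid. Everything else, including 0 and -1, is treated as an invalid date and therefore as "expires immediately".

```typescript
// Accept only dates such as "Thu, 01 Jan 1970 00:00:00 GMT".
// Other date formats allowed by the standard are treated as invalid in this sketch.
const HTTP_DATE =
  /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun), \d{2} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4} \d{2}:\d{2}:\d{2} GMT$/;

// Return true when a response with this Expires value is already stale at `now`.
function isExpired(expires: string, now: Date): boolean {
  const value = expires.trim();
  if (!HTTP_DATE.test(value)) {
    // "0", "-1" or any malformed value: expires immediately.
    return true;
  }
  const timestamp = Date.parse(value);
  if (isNaN(timestamp)) {
    return true;
  }
  return timestamp <= now.getTime();
}
```

The explicit format check matters because JavaScript date parsing is lenient: some engines turn a bare "0" into a real date instead of rejecting it.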
Options to disable cache on HTTP/1.1
The HTTP/1.1 Cache-Control header ([3]) offers a few options to disable the browser cache:
Option 1: cache-control: no-cache, recommended by the blog [2]. The response is still cached, but the browser must always check with the server for a newer version before using it.
Option 2: cache-control: max-age=0. The response is cached for 0 seconds, so it is stale immediately and has to be revalidated.
Option 3: cache-control: no-store. It disables caching completely, which is stricter than options 1 and 2.
In addition, the must-revalidate directive is recommended. When it is set, a cache must verify the status of a stale resource with the server before using it, and must not serve an expired resource without that check.
In general, a good solution to disable the browser cache for HTTP/1.1 would be:
Cache-Control: no-store, must-revalidate
Discussions on HTTP/2.0
[6] HTTP/2.0 states that
Caching responses that are pushed is possible based on the guidance provided by the origin server in the Cache-Control header field.
HTTP/2.0 changes how messages are framed and transported but keeps the existing HTTP semantics, so we can expect the cache-control header to be supported and used in the same way.
How do big organisations deal with the browser cache?
It is a good idea to learn from big organisations, since they deal with large numbers of users from a variety of backgrounds and Internet environments.
Amazon.com.au
The response headers of the user account doc of amazon.com.au
Cache-Control: no-cache
pragma: no-cache
expires: -1
Nab.com.au
The response headers of the login doc of nab.com.au
Cache-Control: max-age=0, no-cache, no-store
pragma: no-cache
expires: Thu, 01 Jan 1970 00:00:00 GMT
Alibaba.com
The response headers of the login doc of alibaba.com
Cache-Control: no-cache
Cache-Control: no-store
pragma: no-cache
HSBC online bank
The response headers of the login doc of HSBC online bank
Cache-Control: max-age=1,s-maxage=0,no-store,must-revalidate,private
Cache-Control: post-check=0, pre-check=0
pragma: no-cache
expires: Sat, 06 May 1995 12:00:00 GMT
Summary
These big organisations all use different settings to deal with the browser cache, presumably tuned to their own user bases.
Overall, the following combination is probably the strictest and should be accepted by most browsers:
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
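As a minimal sketch of how these three headers might be applied on the server, the TypeScript below writes them to any response object that exposes a setHeader(name, value) method. The HeaderSink interface and the applyNoCacheHeaders name are hypothetical. The response object of Node's built-in http module has a setHeader method of this shape, but which server actually serves your index.html is an assumption you need to check.

```typescript
// Anything that can receive response headers, e.g. a Node http response.
interface HeaderSink {
  setHeader(name: string, value: string): void;
}

// The header combination from the summary above.
const NO_CACHE_HEADERS: ReadonlyArray<readonly [string, string]> = [
  ["Cache-Control", "max-age=0, no-cache, no-store, must-revalidate"], // HTTP/1.1 and later
  ["Pragma", "no-cache"],                                              // HTTP/1.0 clients
  ["Expires", "0"],                                                    // HTTP/1.0 clients and old proxies
];

// Set all no-cache headers on a response before its body is sent.
function applyNoCacheHeaders(response: HeaderSink): void {
  for (const [name, value] of NO_CACHE_HEADERS) {
    response.setHeader(name, value);
  }
}
```

It is usually enough to call this only for the response that returns index.html. Files with a hash in their name change on every release and do not suffer from the problem described in this blog.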
I hope this helps developers who are fighting browser cache issues.
Reference
[1] Invalidate files using the CloudFront console
[2] Avoid cache trap when serving Angular app
[3] HTTP/1.1 the Cache-Control header
[4] HTTP/1.0
[5] HTTP/1.1
[6] HTTP/2.0
Appendix
1, Three types of web applications:
- Static web application
The HTML pages of a static web application were pre-generated and are stored as such on the server. It is called ‘static’ because the server sends the HTML file as-is to your browser. There are still tons of static web applications available on the internet, like some marketing pages and simple personal blogs.
- Dynamic web application
The HTML pages of a dynamic web application are created dynamically on the server side with the help of a server-side programming language. One of the most popular server-side languages is PHP. WordPress, the well-known blogging platform, uses PHP to dynamically generate a wide variety of HTML pages and their content.
- Single Page Application (SPA)
The SPA server returns one single, pre-generated HTML page, which in turn contains JavaScript code that changes the page dynamically in the browser. Because the JavaScript code can be large and critical, it is sometimes called a frontend application, as distinct from the backend (server-side) application.
Some common SPA frameworks are Angular, React and Vue.