Fix the annoying web page caching issue, permanently

Fred Gu
Mar 5, 2022

Target audience: senior front-end developers, solution architects

Background

Agile teams release new versions of a frontend application frequently. However, users' browsers often do not pick up the latest version right away. They keep using cached content, and the application ends up behaving differently from what was expected. This is a big headache for both developers and users.

A quick but temporary fix is to force the browser to bypass its cache by pressing Ctrl + F5. But that is too much to ask of users without an IT background.

This blog post looks at a permanent solution that disables the browser cache so that the web content is always fetched from the server. The solution should cover most browsers, including the 'ancient' ones and their variants.

Note

1, There are three types of web applications (please refer to Appendix 1). This post only talks about the Single Page Application (SPA).

2, A cache can exist in the browser, at the CDN, at the load balancer, or at many other layers. A CDN cache that has not been refreshed can cause similar symptoms. However, this post does not cover the CDN cache or the others, only the browser cache. Solutions for CDN caches exist elsewhere, such as invalidating files in AWS CloudFront ([1]) or purging content in Azure CDN.

3, This browser cache issue is more common in certain countries where many ancient browsers, such as IE6 or its variants, are still in use.

Why is the cache an issue?

There is a great blog post, [2] Avoid cache trap when serving Angular app, which describes a scenario in which a new release has been deployed but the old, cached index.html in the browser still takes effect until a route is requested.

We all love cache, I think. It helps our apps load faster, takes some load off the server, and gives our users a great experience in our app.

But sometimes this cache works against us and stops our users from getting the latest version of the app, probably because of a wrong cache configuration on our server.

To avoid cache issues in our Angular application and to manage app versioning, Angular adds (by default) a hash to the names of our js files when we build the app for production using ng build --prod, and updates our index.html file to refer to the hashed files. When we deploy a new version, the hashes change, and when the user requests the site again, index.html asks to load the new files from the server. Since the browser doesn't have those files in its cache, it gets them from the server.

So it looks like the Angular team has us covered? Well, not entirely. The problems start when the index.html file is cached. Let's say we just deployed a new version of our app, and the static files, index.html among them, are cached. In this scenario, when the user opens our app from the main URL, they get the old version of our app, because the cached index.html asks to load the old js files, and those will probably be loaded from the browser cache as well.

But let's continue the scenario and complicate it a little. The user uses our app (remember, the old version), moves between screens, and performs some actions. At some point they decide to refresh, and now they get the new version of the app.

Wait, the new version? But why? index.html is in the cache, so why are we getting the new version? This is related to the redirect we do when serving an SPA from a server.

An SPA handles app navigation on the client side: on every navigation to a new route, the Angular router changes the URL in the address bar dynamically. When a user types a route of our app into the browser address bar, say https://some-domain.com/home, and presses Enter, the server does not actually have this route. Instead of returning 404, we configure the server to return our index.html, and the Angular router then does its job and directs the user to the right screen.

Now we can understand what is going on in our scenario. If we enter our app from the main URL, we get the old version: since this endpoint exists on the server, the browser can cache it. But if we request a route, we always get index.html from the server and not from the cache, so we see the new version after a refresh.

Nice, isn't it? And it's all because of one index.html in the cache.
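The scenario suggests giving index.html and the hashed files different cache policies. The following is only a minimal sketch in TypeScript, not code from the blog [2]. The function name cacheControlFor, the file-name pattern used to recognise hashed files, and the one-year lifetime are all assumptions made for illustration.

```typescript
// Hypothetical helper: choose a Cache-Control value for a requested path.
// Assumption: the build names hashed files like "main.3f2a9c1b7d.js",
// i.e. a dot, at least 8 hex characters, a dot, then "js" or "css".
const HASHED_FILE = /\.[0-9a-f]{8,}\.(js|css)$/i;

function cacheControlFor(path: string): string {
  // A hashed file never changes under the same name, because a new release
  // produces new file names. Caching it for a long time is therefore safe.
  if (HASHED_FILE.test(path)) {
    return "max-age=31536000";
  }
  // Everything else, including index.html and SPA routes such as /home
  // that the server answers with index.html, must be revalidated with
  // the server before each reuse.
  return "no-cache";
}
```

With this split, a stale index.html is never reused without a check, while the hashed js files can still be served from the browser cache.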

How do we check whether it is cached?

The blog [2] also explains how to check whether the doc of the frontend application is cached in the browser. It is the cache-control response header (along with others) that controls whether the doc is cached.

It’s pretty easy to find out if the index.html file is configured to be cached or not.

1, Open the browser Dev-tools.

2, Go to the Network tab.

3, Ensure the Disable cache checkbox is not marked.

4, Filter to Doc.

5, Refresh the screen.

6, Click on the first document.

7, Check the cache-control header.

Let’s take, for example, the angular.io/docs site:

Cache-control of angular.io

We can see that the cache-control header is no-cache. Which values in the cache-control header are good for us and which are not?

* no-cache: the index.html file is cached, but any cache must check with the server for a newer version before using it. We are OK with that.

* no-store: tells any cache not to store the index.html file at all. Also good.

* max-age=0: the cached index.html is stale immediately, so it is revalidated with the server before it is reused. Also fine.

* max-age=31536000: this isn't good. The max-age value is in seconds, so our index.html will be cached for a year. It's really up to you which values are acceptable, but I think we can agree we don't want index.html in the cache for a full year.

Those are the popular values for the cache-control header. If you see something else in your response, you can look it up in [3] HTTP/1.1 the Cache-Control header.
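The rules in the list above can be sketched as a small check. This is an illustrative TypeScript sketch only: the function names parseCacheControl and canReuseWithoutRevalidation are made up for this example, and the parser is simplified (it does not handle quoted directive arguments that contain commas).

```typescript
// Parse a Cache-Control value into a map of lower-cased directive names.
// Directives without an argument (e.g. no-store) map to an empty string.
function parseCacheControl(value: string): Map<string, string> {
  const directives = new Map<string, string>();
  for (const part of value.split(",")) {
    const trimmed = part.trim();
    if (trimmed === "") continue;
    const eq = trimmed.indexOf("=");
    const name = (eq === -1 ? trimmed : trimmed.slice(0, eq)).trim().toLowerCase();
    const arg = eq === -1 ? "" : trimmed.slice(eq + 1).trim();
    directives.set(name, arg);
  }
  return directives;
}

// Hypothetical check: may the browser reuse the cached document
// without asking the server first?
function canReuseWithoutRevalidation(value: string): boolean {
  const d = parseCacheControl(value);
  if (d.has("no-store") || d.has("no-cache")) return false;
  const raw = d.get("max-age");
  const maxAge = raw === undefined ? NaN : Number(raw);
  // Assumption for this sketch: without a usable max-age we answer false,
  // although a real browser may still apply heuristic freshness.
  return Number.isFinite(maxAge) && maxAge > 0;
}
```

For the four values discussed above, only max-age=31536000 makes canReuseWithoutRevalidation return true, which is exactly the case we want to avoid for index.html.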

Options to disable cache on HTTP/1.0 and IE6

So far we have four HTTP versions: HTTP/0.9, HTTP/1.0, HTTP/1.1, and HTTP/2.0. The version in common use today is HTTP/1.1, and HTTP/2.0 is expected to take over in the future.

HTTP/1.0 is still used by some old browsers, proxies, and network tools (like wget). The following headers can be used to disable caching for HTTP/1.0:

Pragma: no-cache
Expires: 0

According to [4] HTTP/1.0,

The Pragma general-header field is used to include implementation-specific directives that may apply to any recipient along the request/response chain. When the “no-cache” directive is present in a request message, an application should forward the request toward the origin server even if it has a cached copy of what is being requested.

The Expires entity-header field gives the date/time after which the entity should be considered stale. A value of zero (0) or an invalid date format should be considered equivalent to an “expires immediately.”

The above option should also work on Internet Explorer 6, although Microsoft did not strictly follow the HTTP standard at that time.
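The Expires rule quoted above (zero or an invalid date means "expires immediately") can be sketched as follows. This is a simplified TypeScript illustration, not a full HTTP date parser: the function name isExpired is hypothetical, and only the "Thu, 01 Jan 1970 00:00:00 GMT" date format is recognised, so any other format is treated as invalid here.

```typescript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Matches dates such as "Thu, 01 Jan 1970 00:00:00 GMT".
// Simplification: the day name is not checked against the date.
const HTTP_DATE =
  /^[A-Z][a-z]{2}, (\d{2}) ([A-Z][a-z]{2}) (\d{4}) (\d{2}):(\d{2}):(\d{2}) GMT$/;

// Returns true when a response with this Expires value is already stale
// at the given time (milliseconds since the Unix epoch).
function isExpired(expires: string, nowMs: number): boolean {
  const m = HTTP_DATE.exec(expires.trim());
  // "0", "-1" or any other unrecognised value: expired immediately.
  if (m === null) return true;
  const month = MONTHS.indexOf(m[2]);
  if (month === -1) return true;
  const expiresMs = Date.UTC(
    Number(m[3]), month, Number(m[1]),
    Number(m[4]), Number(m[5]), Number(m[6]),
  );
  return expiresMs <= nowMs;
}
```

Under this rule, Expires: 0 and a date far in the past are treated the same way: the cached copy is already stale when it arrives.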

Options to disable cache on HTTP/1.1

[3] HTTP/1.1 the Cache-Control header gives a few options to disable the browser cache:

Option 1: cache-control: no-cache, recommended by the blog [2]. The browser still caches the response, but must check with the server for a newer version before every reuse.

Option 2: cache-control: max-age=0. The cached copy is fresh for 0 seconds, so it is stale immediately and is revalidated before reuse.

Option 3: cache-control: no-store. Nothing is stored at all, which is stricter than options 1 and 2.

Meanwhile, the directive must-revalidate is also recommended. When it is set, a cache must verify a stale resource with the server before using it, and must not serve the expired copy if that check fails.

In general, a good solution to disable browser cache for HTTP/1.1 would be:

Cache-Control: no-store, must-revalidate

Discussions on HTTP/2.0

[6] HTTP/2.0 mentions that

Caching responses that are pushed is possible based on the guidance provided by the origin server in the Cache-Control header field.

Hence, we can assume HTTP/2.0 still supports the cache-control header and its usage.

How do big organisations deal with the browser cache?

It is a good idea to learn from big organisations, since they deal with a large number of users from a variety of backgrounds and network conditions.

Amazon.com.au

The response headers of the user account doc of amazon.com.au

Cache-Control: no-cache
pragma: no-cache
expires: -1

Nab.com.au

The response headers of the login doc of nab.com.au

Cache-Control: max-age=0, no-cache, no-store
pragma: no-cache
expires: Thu, 01 Jan 1970 00:00:00 GMT

Alibaba.com

The response headers of the login doc of alibaba.com

Cache-Control: no-cache
Cache-Control: no-store
pragma: no-cache

HSBC online bank

The response headers of the login doc of HSBC online bank

Cache-Control:max-age=1,s-maxage=0,no-store,must-revalidate,private
Cache-Control:post-check=0, pre-check=0
pragma: no-cache
expires: Sat, 06 May 1995 12:00:00 GMT

Summary

It appears that these big organisations all use different settings to deal with the browser cache issue, each suited to its own user base.

Overall, the following combination is probably the strictest, and it should also be accepted by most browsers:

Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
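As a rough sketch, the three headers above could be applied on the server like this. The TypeScript below is illustrative only: HeaderSink and disableBrowserCache are hypothetical names, and HeaderSink simply describes any response object with a setHeader(name, value) method (Node's http.ServerResponse has a method of that shape).

```typescript
// Assumed minimal shape of a response object that accepts headers.
interface HeaderSink {
  setHeader(name: string, value: string): void;
}

// The header set recommended in the summary above.
const NO_CACHE_HEADERS: ReadonlyArray<[string, string]> = [
  ["Cache-Control", "max-age=0, no-cache, no-store, must-revalidate"], // HTTP/1.1
  ["Pragma", "no-cache"],                                              // HTTP/1.0
  ["Expires", "0"],                                                    // HTTP/1.0
];

// Apply every no-cache header to the given response.
function disableBrowserCache(res: HeaderSink): void {
  for (const [name, value] of NO_CACHE_HEADERS) {
    res.setHeader(name, value);
  }
}
```

In an SPA this would typically be called only for the response that serves index.html, so that other static files can still be cached.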

I hope this helps developers who are fighting browser cache issues.

Reference

[1] Invalidate files using the CloudFront console

[2] Avoid cache trap when serving Angular app

[3] HTTP/1.1 the Cache-Control header

[4] HTTP/1.0

[5] HTTP/1.1

[6] HTTP/2.0

Appendix

1, Three types of web applications:

  • Static web application

The HTML pages of a static web application were pre-generated and are stored as such on the server. It is called ‘static’ because the server sends the HTML file as-is to your browser. There are still tons of static web applications available on the internet, like some marketing pages and simple personal blogs.

  • Dynamic web application

The HTML pages of a dynamic web application are created dynamically on the server side, with the help of a server-side programming language. One of the most popular server-side languages is PHP. The famous blog-hosting provider WordPress uses PHP to dynamically generate a wide variety of HTML pages and their content.

  • Single Page Application (SPA)

The SPA server returns one single, pre-generated HTML page, which in turn contains JavaScript code that changes the page dynamically in the browser. Because this JavaScript code can be large and critical, it is sometimes called a frontend application, as distinct from the backend (server-side) application.

Some common SPA frameworks are Angular, React and Vue.


Fred Gu

Solution Architect, Data Scientist, Full-Stack Developer, Mobile App Maker, Consultant, Project Manager, Product Owner, A Thinker, Doer and Top-performer