AVIF has landed
Back in ancient July I released a video that dug into how lossy and lossless image compression works and how to apply that knowledge to compress a set of different images for the web. Well, that's already out of date because AVIF has arrived. Brilliant.
AVIF is a new image format derived from the keyframes of AV1 video. It's a royalty-free format, and it's already supported in Chrome 85 on desktop. Android support will be added soon, Firefox is working on an implementation, and although it took Safari 10 years to add WebP support, I don't think we'll see the same delay here, as Apple are a member of the group that created AV1.
Update: Support was added in Firefox 93, Android Chrome 89, and Safari on macOS Ventura and iOS 16.
What I'm saying is, the time to care about AVIF is now. You don't need to wait for all browsers to support it – you can use content negotiation to determine browser support on the server, or use <picture> to provide a fallback on the client:
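Here's a minimal sketch of both options, rather than the exact markup or server code behind this site. The function names, the `.avif`/`.webp`/`.jpg` file layout, and the JPEG fallback are all assumptions for illustration.

```typescript
// Sketch only: names and file layout are hypothetical.

type ImageFormat = "avif" | "webp" | "jpeg";

// Server side: pick a format from the request's Accept header.
// This ignores q-values, which a stricter implementation would honour.
// The response should also send "Vary: Accept" so caches keep the variants apart.
function pickImageFormat(acceptHeader: string): ImageFormat {
  const types = acceptHeader
    .split(",")
    .map((part) => (part.split(";")[0] ?? "").trim().toLowerCase());
  if (types.includes("image/avif")) return "avif";
  if (types.includes("image/webp")) return "webp";
  return "jpeg";
}

// Client side: the browser uses the first <source> whose type it supports,
// and falls back to the plain <img> otherwise.
// Note: `alt` is not HTML-escaped here, so only pass trusted text.
function pictureMarkup(
  basePath: string,
  alt: string,
  width: number,
  height: number
): string {
  return [
    "<picture>",
    `  <source type="image/avif" srcset="${basePath}.avif">`,
    `  <source type="image/webp" srcset="${basePath}.webp">`,
    `  <img src="${basePath}.jpg" alt="${alt}" width="${width}" height="${height}">`,
    "</picture>",
  ].join("\n");
}
```

With the `<picture>` route the server doesn't need to know anything about the browser, which makes it the easier option for static hosting.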
Also, Squoosh now supports AVIF, which is how I compressed the examples in this post.
Let's take a look at how AVIF performs against the image formats we already know and love…
F1 photo
I picked this image because it's a photo with a mixture of low frequency detail (the road) and high frequency detail (parts of the car livery). Also, there are some pretty sharp changes of colour between the red and blue. And I like F1.
Roughly speaking, at an acceptable quality, the WebP is almost half the size of JPEG, and AVIF is under half the size of WebP. I find it incredible that AVIF can do a good job of the image in just 18 kB.
Before I compare things further:
What is 'acceptable quality'?
For the majority of images on the web, my rules are:
- If a user looks at the image in the context of the page, and it strikes them as ugly due to compression, then that level of compression is not acceptable. But, one tiny notch above that boundary is fine.
- It's ok for the image to lose noticeable detail compared to the original, unless that detail is significant to the context of the image.
Context is key here. Image compression should be judged at the size it'll be presented to the user, and in a similar surrounding. If you're presenting a picture as a piece of art to be examined, quality and detail preservation become more important, but that's an edge case.
Most images I see on the web are a much higher quality than they need to be, which results in a slower experience for users. I'm generally impressed by The Guardian's use of images. Take this article. If I open the image at the top of the article and zoom in, I can see the distinctive WebP artefacts. The street has been smoothed. There's some ringing around the graffiti. But we shouldn't optimise the user experience for people who might zoom in looking for flaws. When I look at the image within the article, in the size and context it's presented, I just see someone cycling past a closed pub, which is the intent of the image. The compression used there produces a small resource size, which means I saw the image quickly.
In this article, I'm optimising images as if they were appearing in an article, where their CSS width is around 50% of their pixel width, meaning they're optimised for high-density displays.
Technique
Well, 'technique' might be too strong a word. To compress the images I used Squoosh. I zoomed the image out to 50%, dragged the quality slider down until it looked bad, then moved it back a bit. If the codec had an 'effort' setting, I set it to maximum. I also used one or two advanced settings, and I'll point those out along the way.
But these are just my reckons. I'm comparing the images using the human balls of eye I keep safely inside my skull, rather than any kind of algorithm that tries to guess how humans perceive images. And of course, there are biases with human perception.
In fact, when I showed this article to Kornel Lesiński (who actually knows what he's talking about when it comes to image compression), he was unhappy with my F1 comparison above, because the DSSIM score of the JPEG is much lower than the others, meaning it's closer in quality to the original image, and… he's right.
I struggled to compress the F1 image as JPEG. If I went any lower than 74 kB, the banding on the road became really obvious to me, and some of the grey parts of the road appeared slightly purple in a noticeable way, but Kornel was able to tweak the quantization tables in MozJPEG to get a better result:
Although I'm happy to spend time manually compressing key images of a web site, I don't really have the skills to tweak a JPEG encoder in that way. So the results in this post are also a reflection of what the codec can do with my moderate talent and perseverance.
I also realise that manually tuning codec settings per image doesn't scale. If you need to automate image compression, you can figure out the settings manually from a few representative images, then add a bit of extra quality for safety, and use those settings in an automated tool.
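As a rough sketch of that idea: take the quality values you settled on by hand, use the most demanding one as the baseline, and add some headroom. The function name, the 5-point margin, the 0-100 scale, and the choice of the maximum (rather than, say, the average) are my assumptions here, not a rule.

```typescript
// Sketch only: name, margin and quality scale are hypothetical.
function automatedQuality(
  manualQualities: number[],
  safetyMargin = 5,
  maxQuality = 100
): number {
  if (manualQualities.length === 0) {
    throw new Error("Need at least one manually tuned quality value");
  }
  // Use the most demanding representative image as the baseline...
  const baseline = Math.max(...manualQualities);
  // ...then add headroom, without exceeding the codec's maximum.
  return Math.min(baseline + safetyMargin, maxQuality);
}
```

So if three representative images looked fine at 30, 42 and 38, the automated setting would be 47.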
I'm providing the full output for each image so you can make your own judgement, and you can try with your own images using Squoosh.
*cough* Sorry about all that. Just trying to get ahead of the "what if"s, "how about"s, and "well actually"s.
Back to the F1 image
Let's take a closer look and see how the codecs work:
The fine detail of the road is lost in all of the compressed versions, which I'm ok with. However, you can see the difference in detail Kornel was talking about. Look at the red bodywork in the original: there are three distinct parts – the mirror, the wing connecting the bargeboard, and the top of the sidepod. In the AVIF, the smoothing removes the division between these parts, but they're still mostly there in the JPEG, especially the 74 kB version.
In the JPEG version you can also see the individual 8x8 blocks of the DCT, but they aren't obvious when zoomed out. WebP avoids this blockiness using decoding filters, and by, well, just being better. AVIF does much better at preserving sharp lines, but introduces smoothing. These are all ways of reducing data in the image, but the artefacts in AVIF are much less ugly.
If you're thinking "wait, what's he talking about? The AVIF is really blocky around the red/blue", well, chances are you're looking at it in Chrome 85. There's a bug in the decoder when it comes to upscaling the colour detail. This is mostly fixed in 86, although there are some edge cases where it still happens.
If you want more details on how lossy codecs work, check out my talk starting at 4:44.
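For a rough idea of what those 8x8 JPEG blocks hold, here's a small sketch of the 2D DCT that turns 64 pixel values into 64 frequency coefficients. It's the textbook orthonormal DCT-II, written for clarity rather than speed. A real JPEG encoder also subtracts 128 from each sample first and then quantizes the result, both of which are left out here.

```typescript
// Sketch of the orthonormal 8x8 DCT-II. Not optimised.
const BLOCK_SIZE = 8;

// block[y][x] holds pixel values. The result is indexed [v][u], where
// v is the vertical frequency and u is the horizontal frequency.
function dct8x8(block: number[][]): number[][] {
  const out: number[][] = [];
  for (let v = 0; v < BLOCK_SIZE; v++) {
    const row: number[] = [];
    for (let u = 0; u < BLOCK_SIZE; u++) {
      let sum = 0;
      for (let y = 0; y < BLOCK_SIZE; y++) {
        for (let x = 0; x < BLOCK_SIZE; x++) {
          sum +=
            block[y][x] *
            Math.cos(((2 * x + 1) * u * Math.PI) / 16) *
            Math.cos(((2 * y + 1) * v * Math.PI) / 16);
        }
      }
      // The zero-frequency basis functions get an extra 1/sqrt(2) scale.
      const cu = u === 0 ? Math.SQRT1_2 : 1;
      const cv = v === 0 ? Math.SQRT1_2 : 1;
      row.push(0.25 * cu * cv * sum);
    }
    out.push(row);
  }
  return out;
}
```

A completely flat block produces a single non-zero value (the top-left 'DC' coefficient) and 63 values that are effectively zero, which is why smooth areas compress so well, and why heavy quantization makes each block drift towards a flat or simple pattern.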
At equal file sizes
One way to make the differences between the codecs really obvious is to test them at roughly the same file size:
I couldn't even get the JPEG and WebP down to 18 kB, even at lowest settings, so this isn't a totally fair test. The JPEG suffers from awful banding, which started to appear as soon as I went below 74 kB. The WebP is much better, but there's still noticeable blockiness compared to the AVIF. I guess that's what a decade or two of progress looks like.
Conclusion
Unless it's automated, offering up 3 versions of the same image is a bit of a pain, but the savings here are pretty significant, so it seems worth it, especially given the number of users that can already benefit from AVIF.
Here's a full-page comparison of the results.
Ok, next image:
Flat illustration
This is an illustration by Stephen Waller. I picked it because of the sharp edges and solid colours, so it's a good test of lossless compression.
The image doesn't look like it has a lot of colours, but due to the antialiasing around the edges, it has thousands. I was able to reduce the colours to 68 before things started looking bad. This makes a huge difference for WebP lossless and PNG, which switch to 'paletted' mode when there are 256 colours or fewer, which compresses really well.
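If you want to check whether an image is within reach of paletted mode, counting the distinct colours is straightforward. This is a sketch that assumes you already have raw RGBA pixel data (for example from a canvas `ImageData`); the function names are made up.

```typescript
// Sketch only: function names are hypothetical.

// Counts distinct RGBA colours in a pixel buffer (4 bytes per pixel).
function countColours(rgba: Uint8ClampedArray): number {
  const seen = new Set<number>();
  for (let i = 0; i + 3 < rgba.length; i += 4) {
    // Pack the four channels into one unsigned 32-bit key.
    const key =
      ((rgba[i] << 24) | (rgba[i + 1] << 16) | (rgba[i + 2] << 8) | rgba[i + 3]) >>> 0;
    seen.add(key);
  }
  return seen.size;
}

// True if the image could be stored with a palette of up to 256 entries
// without any colour reduction.
function fitsPalette(rgba: Uint8ClampedArray, maxColours = 256): boolean {
  return countColours(rgba) <= maxColours;
}
```

An antialiased illustration will usually fail this check until the colours are reduced, which is the step I did by hand in Squoosh.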
In the same way AVIF is derived from the keyframes of AV1 video, WebP's lossy compression is based on the keyframes of VP8 video. However, lossless WebP is a different codec, written from scratch. It's often overlooked, but it outperforms PNG every time.
I don't have the original vector version of this image, but I created a 'traced' SVG version using Adobe Illustrator to get a very rough feel for how SVG would perform.
What's notable is how badly AVIF performs here. It does have a specific lossless mode, but it isn't very good.
But wait…
Why not lossy?
I went straight for palette reduction and lossless compression with this image, because experience has taught me lossy compression always does a bad job on these kinds of images. Or so I thought…
Turns out lossy AVIF can handle solid colour and sharp lines really well, and produces a file quite a bit smaller than the SVG. I disabled chroma subsampling in the AVIF to keep the colours sharp.
A closer look
I expected a lossy codec to destroy the edges, but it looks great! There's a very slight bit of blurring above the glasses of the guy on the left, and on the ear of the guy on the right. If anything, AVIF has introduced some sharpening – see the left-hand side of the glasses. That kind of sharpening is usually produced by palette reduction, but here it's just how AVIF works due to the directional transforms and filters.
The PNG and WebP have sharp edges particularly around the green shirt due to the palette reduction, but it isn't really noticeable at normal size.
Of course, the SVG looks super sharp due to vector scaling, but you can see where the tracing lost details around the hair and pocket of the guy on the right.
At equal file sizes
Let's push the other codecs down to the size of the AVIF:
Things aren't as bad as they were with the F1 image, but the JPEG is very noisy and changes the colours significantly, the WebP is blurry, and the PNG shows that, well, you need more than 8 colours.
Conclusion
AVIF has kinda blown my mind here. It's made me reconsider the kinds of images lossy codecs are suited to.
All said and done, a proper SVG is probably the right choice here. And even if SVG couldn't be used, the difference between the PNG and AVIF is only a few kB. In this case it might not be worth the complexity of creating different versions of the image.
Here's a full-page comparison of the results.
Right, it's time for the next image…
Heavy SVG
I find it incredible that this image was created with SVG. However, it comes at a cost. The number of shapes and filters involved means it takes a lot of CPU for the browser to render it. It's one of those edge cases where it's better to avoid the original SVG, even if the alternative is larger.
PNG struggles here due to the smooth gradients. I reduced the colours to 256, but I had to dither them to avoid visible banding, which also hurt compression.
WebP performs significantly better by mixing lossy compression with an alpha channel. However, the alpha channel is always encoded losslessly in WebP (except for a bit of palette reduction), so it suffers in a similar way to PNG when it comes to the transparent gradient beneath the car.
AVIF aces it again at a significantly smaller size, even compared to the SVG. Part of AVIF's advantage here is it supports a lossy alpha channel.
A closer look
When zoomed into the PNG, you can see the effects of the palette reduction. The WebP is getting blurry, and suffers from some colour noise.
The AVIF looks similar to the WebP, but at a much smaller size. Interestingly, the AVIF just kinda gives up drawing the bonnet, but it's hardly noticeable when it's zoomed out.
At equal file sizes
As always, let's push the other formats down to the size of the AVIF:
The PNG version looks kinda cool! Whereas the WebP version makes me want to clean my glasses.
Conclusion
Yeahhhh going from 86/50 kB down to 13 kB is a huge saving, and worth the extra effort.
Here's a full-page comparison of the results.
Ok, one more:
Illustration with gradients
This is another one from Stephen Waller. I picked this because it has a lot of flat colour and sharp lines, which usually points to lossless compression, but it also has a lot of gradients, which lossless formats can struggle with.
Even if I take the colours down to 256 and let WebP work its lossless magic, the result is still 170 kB. In this case, the lossy codecs perform much better.
I disabled chroma subsampling for the JPEG and AVIF, to keep the colours sharp. Unfortunately lossy WebP doesn't have this option, but it has "Sharp YUV", which tries to reduce the impact of the colour resolution reduction.
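To get a feel for why subsampling softens colour edges, here's a rough sketch of 4:2:0-style subsampling on a single chroma plane: every 2x2 group of samples is averaged into one. Real encoders may use different filters and sample positions, so treat this as an illustration only; the function name is made up.

```typescript
// Sketch only: a simple box-filter downsample, not any particular encoder's method.

// Halves a chroma plane in both directions by averaging each 2x2 block.
// plane[y][x] holds one chroma sample (for example Cb or Cr).
function subsampleChroma420(plane: number[][]): number[][] {
  const height = plane.length;
  const width = height > 0 ? plane[0].length : 0;
  const out: number[][] = [];
  for (let y = 0; y < height; y += 2) {
    const row: number[] = [];
    for (let x = 0; x < width; x += 2) {
      // Clamp at the edges so odd-sized planes still work.
      const x1 = Math.min(x + 1, width - 1);
      const y1 = Math.min(y + 1, height - 1);
      row.push((plane[y][x] + plane[y][x1] + plane[y1][x] + plane[y1][x1]) / 4);
    }
    out.push(row);
  }
  return out;
}
```

If a sharp colour boundary falls in the middle of a 2x2 group, say values of 200 on one side and 50 on the other, the output sample becomes 125, which belongs to neither side. That's the colour bleed you see around sharp red/blue edges.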
JPEG doesn't do a great job here – anything lower than 80 kB starts to introduce obvious blockiness. WebP handles the image much better, but again I'm staggered by how well AVIF performs.
A closer look
The JPEG is pretty noisy when zoomed in, and you can start to see the 8x8 blocks in the background.
With the reduced-palette WebP, you can start to see the effects of palette reduction, especially in the elf's hat.
The lossy WebP is pretty blurry, and suffers from colour artefacts, which are a side-effect of "Sharp YUV".
The AVIF has really clean colours, but some blurring, and even changes some of the shapes a bit – the circle looks almost octagonal due to the edge detection. But c'mon, 12 kB!
At equal file sizes
For one last time, let's push the other codecs down to AVIF's size:
At these sizes, JPEG has done its own art, and the WebP looks blocky and messy.
Conclusion
In this case, WebP offers a huge drop in size compared to the JPEG, so it's definitely worth providing the WebP to browsers that support it. However, the difference between the WebP and AVIF isn't huge, so it might not be worth creating an AVIF too.
Here's a full-page comparison of the results.
So, is AVIF the champion?
I was initially sceptical of AVIF – I don't like the idea that the web has to pick up the scraps left by video formats. But wow, I'm seriously impressed with the results above. That said, it isn't perfect.
Progressive rendering
Because AVIF is an off-cut of a video format, it's missing some useful image features and optimisations that aren't relevant to video:
The above shows a high-resolution (2000x1178), high-quality image loading at 2G speeds. To get roughly the same quality, the JPEG is 249 kB, the WebP is 153 kB, and the AVIF is 96 kB.
Although they're all loading at the same rate, the much-larger JPEG feels faster because of how it renders in multiple passes. WebP renders from top to bottom, which isn't as good, but at least you see the progress. Unfortunately, with AVIF it's all-or-nothing.
Video doesn't need to render a partial frame, so it isn't something the format is set up to do. It's possible to have a top-to-bottom render like WebP, but the implementation would be complicated, so we're unlikely to see it in browsers in the foreseeable future.
Because of this, AVIF feels better suited to smaller quicker-loading images. But that still covers most images on the web.
Maybe this could be solved if the format could provide a way to embed a 'preview' version of the image at the start of the file. The browser would render this if it doesn't have the rest of the file. Because it's a different image, the developer would get to choose the quality, resolution, and even apply filters like blurring:
Adding 5 kB to a big image like this seems worth it to get a low-quality early render. Here's what it would look like:
I've proposed this to the AVIF spec folks.
Encoding time
Encoding AVIF takes a long time in general, but it's especially bad in Squoosh because we're using WebAssembly, which doesn't let us use SIMD or multiple threads. Those features are starting to arrive in standards and browsers, so hopefully we'll be able to improve things soon.
At an 'effort' of 2, it takes a good few seconds to encode. 'Effort' 3 gives significantly better results, but that can take a couple of minutes. 'Effort' 10 (which I used for images in this article) can take over 10 minutes to encode a single image.
AVIF supports tiling images, which chops the image into smaller blocks that can be encoded and decoded separately. This is interesting for encoding, because it means the blocks can be encoded in parallel, making full use of CPU cores, although Squoosh doesn't take advantage of this yet.
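To picture what tiling means, here's a sketch that chops an image into a grid of rectangles, each of which could be handed to a separate worker. It only illustrates the idea: real AV1 tile layouts come with their own alignment and sizing constraints, which this ignores, and the names are made up.

```typescript
// Sketch only: illustrates the idea of tiling, not the AV1 tile layout rules.

interface TileRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Splits an image into columns x rows rectangles that cover every pixel once.
function splitIntoTiles(
  imageWidth: number,
  imageHeight: number,
  columns: number,
  rows: number
): TileRect[] {
  const tiles: TileRect[] = [];
  for (let row = 0; row < rows; row++) {
    const top = Math.floor((row * imageHeight) / rows);
    const bottom = Math.floor(((row + 1) * imageHeight) / rows);
    for (let col = 0; col < columns; col++) {
      const left = Math.floor((col * imageWidth) / columns);
      const right = Math.floor(((col + 1) * imageWidth) / columns);
      tiles.push({ x: left, y: top, width: right - left, height: bottom - top });
    }
  }
  return tiles;
}
```

For the 2000x1178 image used earlier, a 2x2 grid gives four 1000x589 tiles, so a four-core machine could in principle work on all of them at once.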
The command line tools are orders of magnitude faster. You can either compile libavif yourself, or on macOS, install it via Homebrew:
brew install joedrago/repo/avifenc
There's also a Rust implementation, cavif.
My current workflow is to use Squoosh to figure out decent settings at 'effort' 2, then use libavif to try the same settings at 'effort' 10. Hopefully we can speed up the Squoosh version soon!
Decoding time
There's also a question of CPU usage vs other formats when it comes to decoding, but I haven't dug into that yet. Although AV1 is starting to get hardware support, I'm told that dedicated hardware will be tuned for video, and not so great at decoding a page full of images.
What about JPEG-XL and WebPv2?
One of the reasons we built Squoosh is so developers could bypass the claims made about particular codecs, and instead just try it for themselves. JPEG-XL isn't quite ready yet, but we'll get it into Squoosh as soon as possible. In the meantime, I'm trying to take JPEG-XL's claims of superiority with a pinch of salt. However, there's a lot to get excited about.
JPEG-XL is an image format, rather than an off-cut of a video format. It supports lossless and lossy compression, and progressive multi-pass rendering. It looks like the lossless compression will be better than WebP's, which is great. However, the lossy compression is tuned for high quality rather than 'acceptable quality', so it might not be a great fit for most web images. But, the benefit of multi-pass rendering might mean it's worth taking a hit when it comes to file size. I guess we'll wait and see!
There aren't many details around WebPv2 yet, so again it's best to wait and see until we can test it with our own images.
And that's it!
Phew! I didn't expect this post to get so long. I wanted to include a dive into the more obscure settings these codecs offer, but I'll save that for another day.
I really enjoyed building the demos for this article. In case you want to dig into the details:
- I built a Preact component to handle image loading and decoding, so AVIF/WebP works even without browser support. A worker handles the actual decoding, using the WebAssembly decoders from Squoosh. I'd usually use comlink to help with worker communication, but lack of worker-module compatibility meant I went for something smaller/hackier instead.
- I wanted the demos on this page to be part of the static build to avoid layout shifting, but I didn't want to re-render the whole page with JS (a pattern you see a lot with things like Gatsby and Next.js). I hacked together a solution where my markdown contains <script type="component">, which is replaced with the HTML for that component when the markdown is parsed, and becomes live on the client.
- The full page compare view uses the two-up and pinch-zoom web components from Squoosh.
- Here's the progressive image loading demo. It uses a TransformStream in a service worker to throttle the image data.
- For the talk rather than this article, I built a tool that lets you experiment with chroma subsampling.
- Also from the talk, I built a tool to visualise the DCT patterns that form an 8x8 block.
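For the curious, throttling with a TransformStream (as in the progressive loading demo) can be sketched roughly like this. This isn't the demo's actual code: the names and the byte rate are assumptions, and a smoother version would split large chunks into smaller pieces before delaying them.

```typescript
// Sketch only: names and rates are hypothetical, not taken from the demo.

// How long a chunk of this size would take to arrive at the target rate.
function delayForChunk(byteLength: number, bytesPerSecond: number): number {
  return (byteLength / bytesPerSecond) * 1000;
}

// Passes chunks through unchanged, but holds each one back to simulate
// a slow connection.
function createThrottle(
  bytesPerSecond: number
): TransformStream<Uint8Array, Uint8Array> {
  return new TransformStream<Uint8Array, Uint8Array>({
    async transform(chunk, controller) {
      const delay = delayForChunk(chunk.byteLength, bytesPerSecond);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
      controller.enqueue(chunk);
    },
  });
}

// Inside a service worker's fetch handler, usage might look like:
//   const response = await fetch(event.request);
//   const slowBody = response.body.pipeThrough(createThrottle(10000));
//   event.respondWith(new Response(slowBody, { headers: response.headers }));
```

Because the browser's image decoder receives the data at the throttled rate, you get to see how each format behaves with a partially downloaded file.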
Thanks to Kornel Lesiński, Surma, Paul Kinlan, Ingvar Stepanyan, and Sam Jenkins for proof-reading and fact checking! And since publishing, thanks to Hubert Sablonnière and Mathias Bynens for more typo-busting.