A view on VP9 and AV1 part 1: specifications
The success of a video coding standard depends on many factors. Many articles benchmark the performance of codec implementations or comment on the codec ecosystem, but I have not seen any article about the standardization process or the bitstream format. While video quality is key, I believe the bitstream format is important too, though it is probably less accessible and harder to communicate about.
Bitstream format comparison with HEVC
The VP9 bitstream format specification was frozen in June 2013, but it seems that some features (e.g. Levels) were added afterwards. AV1 derives from VP9, with most of the additions being made by the Google teams based on the developments of VP10. AV1 is still changing a lot, but its high-level structures likely won’t. I expect the authors to add low-level tools such as transform types (ADCT, ADST, …), which are inferred in HEVC but can be customized in AV1, as well as tools from the other codec makers (Daala from Mozilla and Thor from Cisco).
VP9 and AV1 bitstreams are quite different from AVC/HEVC in several ways:
- In VP9/AV1, each frame is complete. There is no such thing as Slices, only Tiles. As a consequence, there is no high-level abstraction like the NALUs of HEVC (and AVC and derivatives), and frame fragmentation over RTP is simpler but less robust to packet loss.
- The state of a VP9/AV1 encoder or decoder can be represented by the set of reference frames plus the arithmetic context. There is no need, as there is in AVC/HEVC, to track which parameter set (SPS, PPS, …) is active.
- In VP9/AV1, a “Frame parallel decoding mode” allows frames to be parsed in parallel.
- The arithmetic coding is much simpler than in AVC/HEVC. The state of the arithmetic coder can be duplicated for a group of frames.
- Reference Picture Set (RPS) is simpler in VP9.
- There is no anti-emulation (start code emulation prevention) as required in AVC/HEVC. This simplifies the decoding process, but at the cost of not being able to transport VP9 or AV1 on MPEG-2 TS.
- The pixel reconstruction phase of VP9 requires more hardware surface than HEVC, because VP9 uses numbers with higher precision, which requires more adders. The transform coefficients are also ordered in a less predictable way in VP9 (which again results in more hardware surface).
- There is no such thing as a VP9 or AV1 file format (AVC/HEVC has raw, Annex B, and canonical/MP4). One needs to use the IVF file format or WebM.
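To make the IVF point concrete, here is a minimal sketch of an IVF reader. It assumes the commonly documented layout (a 32-byte little-endian file header starting with the signature `DKIF`, then a 12-byte header in front of each frame); the function name `parse_ivf` and the dictionary keys are my own, not part of any library.

```python
import struct

# Assumed IVF layout, all little-endian:
# signature, version, header length, FourCC, width, height,
# timebase denominator, timebase numerator, frame count, 4 unused bytes.
IVF_FILE_HEADER = struct.Struct("<4sHH4sHHIII4x")   # 32 bytes
# Each frame: 32-bit payload size, then 64-bit presentation timestamp.
IVF_FRAME_HEADER = struct.Struct("<IQ")             # 12 bytes


def parse_ivf(data: bytes):
    """Return (header, frames) where frames is a list of (pts, payload)."""
    if len(data) < IVF_FILE_HEADER.size:
        raise ValueError("truncated IVF file header")
    (signature, version, header_len, fourcc, width, height,
     timebase_den, timebase_num, frame_count) = IVF_FILE_HEADER.unpack_from(data, 0)
    if signature != b"DKIF":
        raise ValueError("not an IVF file")

    header = {
        "version": version,
        "fourcc": fourcc.decode("ascii", errors="replace"),
        "width": width,
        "height": height,
        "timebase_den": timebase_den,
        "timebase_num": timebase_num,
        "frame_count": frame_count,
    }

    frames = []
    pos = header_len  # frames start right after the file header
    while pos + IVF_FRAME_HEADER.size <= len(data):
        size, pts = IVF_FRAME_HEADER.unpack_from(data, pos)
        pos += IVF_FRAME_HEADER.size
        payload = data[pos:pos + size]
        if len(payload) != size:
            raise ValueError("truncated frame payload")
        frames.append((pts, payload))
        pos += size
    return header, frames
```

The payload of each frame is the raw VP9 or AV1 frame: there is no start code and no NALU-like wrapper inside it, which is exactly why a container has to carry the frame sizes.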
In my view, the VP9 specification shines by its simplicity, but there is one concept that seems weird to me (although I did not follow all the codec history): Superframes. The Superframe concept seems to be a workaround for a B-frame patent. A VP9 encoder first produces a frame that won’t be displayed yet, packed together with a frame that is displayed now; the encoder then produces an almost empty frame (skip) that only references the previously non-displayed frame. What is also weird is that the header of the Superframe is at the end of the frame (bug filed for AV1). A possible explanation is that Superframes were introduced when some decoders were already deployed, and that this was the way to avoid breaking those decoders.
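The trailing header is easier to picture with code. The sketch below follows my understanding of the VP9 superframe index: the last byte of the chunk is a marker whose top three bits are `110`, followed by two bits giving the size-field width minus one and three bits giving the frame count minus one; the same marker byte also opens the index, and the frame sizes in between are little-endian. The function names are hypothetical.

```python
def parse_superframe_index(chunk: bytes):
    """Return the frame sizes listed in a trailing superframe index,
    or an empty list if the chunk does not end with a valid index."""
    if not chunk:
        return []
    marker = chunk[-1]
    if (marker & 0xE0) != 0xC0:              # top three bits must be 110
        return []
    frame_count = (marker & 0x07) + 1        # 1 to 8 frames
    size_bytes = ((marker >> 3) & 0x03) + 1  # 1 to 4 bytes per size field
    index_size = 2 + size_bytes * frame_count
    # The marker byte is repeated at the start of the index.
    if len(chunk) < index_size or chunk[-index_size] != marker:
        return []
    start = len(chunk) - index_size + 1
    sizes = []
    for i in range(frame_count):
        offset = start + i * size_bytes
        sizes.append(int.from_bytes(chunk[offset:offset + size_bytes], "little"))
    if sum(sizes) > len(chunk) - index_size:
        raise ValueError("superframe index exceeds chunk size")
    return sizes


def split_superframe(chunk: bytes):
    """Split a chunk into its frames; a plain frame comes back unchanged."""
    sizes = parse_superframe_index(chunk)
    if not sizes:
        return [chunk]
    frames, pos = [], 0
    for size in sizes:
        frames.append(chunk[pos:pos + size])
        pos += size
    return frames
```

Note that the parser has to read the chunk from its last byte backwards before it can find the first frame boundary, which is the oddity described above.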
VP9 and AV1 have one serious problem: the encoding process may lead to integer overflows, and the encoder cannot predict these overflows before reconstructing the local coding block. Hence, in the worst case, if the overflow happens late in the frame encoding, the encoder might need to re-encode the entire frame. This may be OK for offline encoding, where multipass is used anyway, but it is unacceptable for real-time encoding. More specifically, these overflows may occur when encoding residuals. In practice, any intermediate value may overflow, so you need to check them all. If a check fails, you have to re-encode with another quantizer; otherwise the 18-bit intermediate registers used for the quantizer at the decoder will overflow. So, by making life simple for decoders with 18-bit intermediate registers, VP9 and the current AV1 make encoders’ lives much more difficult, because some combinations are de facto impossible to make conformant.
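As an illustration of why every intermediate value has to be checked, here is a toy sketch. It is not the VP9 or AV1 transform: it uses a made-up 4-point butterfly and takes the 18-bit signed register width from the paragraph above as an assumption. All names are hypothetical.

```python
REGISTER_BITS = 18  # assumption: width quoted in the text above


class IntermediateOverflow(Exception):
    """Raised when a value does not fit in the assumed register width."""


def checked(value: int, bits: int = REGISTER_BITS) -> int:
    """Return value unchanged if it fits in a signed `bits`-bit register."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise IntermediateOverflow(value)
    return value


def toy_inverse_butterfly(coeffs, bits: int = REGISTER_BITS):
    """Toy 4-point butterfly (NOT the real transform), checking each step."""
    a, b, c, d = coeffs
    s0 = checked(a + c, bits)
    s1 = checked(a - c, bits)
    s2 = checked(b + d, bits)
    s3 = checked(b - d, bits)
    return [checked(s0 + s2, bits), checked(s1 + s3, bits),
            checked(s1 - s3, bits), checked(s0 - s2, bits)]


def block_is_conformant(levels, quant_step: int, bits: int = REGISTER_BITS) -> bool:
    """Dequantize and run the toy transform; False on any overflow."""
    try:
        dequantized = [checked(level * quant_step, bits) for level in levels]
        toy_inverse_butterfly(dequantized, bits)
    except IntermediateOverflow:
        return False
    return True
```

With these assumptions, `block_is_conformant([20000] * 4, 4)` fails even though each dequantized coefficient (80000) fits in 18 bits on its own: the overflow only appears in the first butterfly sum. An encoder that performs such a check only learns the answer after reconstructing the block, which is the source of the re-encoding problem.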
Note that there is one bug tagged as ‘won’t fix’ in VP9. Funnily enough, the issue was present in AVC but solved in HEVC (so the VP9 issue was likely imported from AVC).
This part 1 focused on the VP9 and AV1 bitstream specifications. We like VP9 and AV1 for the simplicity of their design and bitstream format. But at the moment, for bitstream reasons, these codecs are not suited to compete with HEVC for live encoding. VP9 was frozen in 2013, but AV1 is still a few months away from its bitstream freeze (planned for Q1 2017). We hope AV1 contributors will keep up the effort and provide us with a strong competitor to HEVC.
Stay tuned for Part 2, where I’ll talk about the success factors of a codec, patents, standardization, and deployment.
And of course feel free to share and comment on this article 🙂