
Perceptual Image Codec: What Matters in Practical Learned Image Compression

Recorded: May 24, 2026, 2:01 p.m.



Kedar Tatwawadi
Parisa Rahimzadeh
Zhanghao Sun
Zhiqi Chen
Ziyun Yang
Sanjay Nair
Divija Hasteer
Oren Rippel

Apple

About

We introduce PICO (Perceptual Image Codec), the first learned codec that is both practical and optimized directly for the human visual system. To derive it, we perform a comprehensive study of modeling choices for practical learned codecs and search over millions of model configurations to jointly optimize perceptual quality and on-device runtime.

Based on large-scale subjective user studies, PICO provides 2.3-3× bitrate savings against AV1, AV2, VVC, ECM and JPEG-AI, and 20-40% bitrate savings against the best learned codec alternatives. At the same time, on an iPhone 17 Pro Max, it encodes 12MP images in as little as 230 ms and decodes them in 150 ms, faster than most top ML-based codecs run on a V100 GPU. Unlike most learned codecs, PICO also comes with cross-platform robustness guarantees.
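As a rough sanity check on these figures, the sketch below converts the quoted timings into throughput and converts between "N× savings" and percent savings. It assumes that "N× bitrate savings" means the reference codec needs N times as many bits as PICO at equal perceived quality; that reading is an interpretation, not a definition taken from the source, and the function names are hypothetical.

```python
def megapixels_per_second(megapixels: float, milliseconds: float) -> float:
    """Throughput implied by processing `megapixels` in `milliseconds`."""
    return megapixels / (milliseconds / 1000.0)


def savings_factor_to_percent(factor: float) -> float:
    """Convert an 'N x' savings factor into a percent bitrate reduction.

    Assumption: 'N x savings' means the reference codec needs N times the
    bits of the new codec at equal quality.
    """
    return (1.0 - 1.0 / factor) * 100.0


def percent_to_savings_factor(percent: float) -> float:
    """Inverse conversion: a percent bitrate reduction as an 'N x' factor."""
    return 1.0 / (1.0 - percent / 100.0)


# Timings quoted above for 12MP images on an iPhone 17 Pro Max.
encode_mps = megapixels_per_second(12, 230)  # about 52 MP/s
decode_mps = megapixels_per_second(12, 150)  # about 80 MP/s

# Under the stated assumption, 2.3x and 3x savings correspond to roughly
# 57% and 67% fewer bits, and 20% and 40% savings to 1.25x and about 1.67x.
low_pct = savings_factor_to_percent(2.3)
high_pct = savings_factor_to_percent(3.0)
low_factor = percent_to_savings_factor(20)
high_factor = percent_to_savings_factor(40)
```

Expressing both comparisons in the same unit makes them easier to set side by side: under this reading, the gap to the traditional codecs and JPEG-AI is noticeably larger than the gap to the best learned alternatives.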


Comparisons of state-of-the-art traditional and learned codecs across different considerations of practicality.

Comparisons of state-of-the-art traditional and learned codecs. Perceptual BD-rates are based on human ratings from a large-scale subjective study. Speed benchmarks on iPhone 17 Pro Max use identical compiler optimizations.
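For readers unfamiliar with the metric, a BD-rate is the average bitrate difference between two codecs at equal quality, taken over the quality range where their rate-quality curves overlap. The sketch below is a simplified, standard-library-only variant: it interpolates log-bitrate piecewise-linearly over the quality score, rather than using the cubic polynomial fit of the classic Bjontegaard method. It is not taken from the paper, whose exact procedure is not described here. The function names and sample points are hypothetical, and the quality axis is assumed to be a per-operating-point score derived from human ratings.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation of ys over strictly increasing xs."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the sampled quality range")


def _mean_log_rate(points, lo, hi):
    """Average of log(bitrate) over quality in [lo, hi] (trapezoidal rule)."""
    points = sorted(points, key=lambda p: p[1])
    qualities = [q for _, q in points]
    log_rates = [math.log(r) for r, _ in points]
    # Integrate between the interval ends and every sample strictly inside,
    # which is exact for a piecewise-linear curve.
    knots = [lo] + [q for q in qualities if lo < q < hi] + [hi]
    values = [_interp(k, qualities, log_rates) for k in knots]
    area = sum(
        (knots[i + 1] - knots[i]) * (values[i] + values[i + 1]) / 2.0
        for i in range(len(knots) - 1)
    )
    return area / (hi - lo)


def bd_rate(ref, test):
    """Percent bitrate change of `test` relative to `ref` at equal quality.

    Both arguments are lists of (bitrate, quality) pairs with distinct
    quality values. A negative result means `test` needs fewer bits.
    """
    lo = max(min(q for _, q in ref), min(q for _, q in test))
    hi = min(max(q for _, q in ref), max(q for _, q in test))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    diff = _mean_log_rate(test, lo, hi) - _mean_log_rate(ref, lo, hi)
    return (math.exp(diff) - 1.0) * 100.0


# Synthetic illustration only (not data from the study): the test codec
# reaches every quality level with half the bits of the reference.
reference = [(0.1, 1.0), (0.2, 2.0), (0.4, 3.0), (0.8, 4.0)]
candidate = [(0.05, 1.0), (0.1, 2.0), (0.2, 3.0), (0.4, 4.0)]
result = bd_rate(reference, candidate)  # close to -50 (half the bitrate)
```

Averaging in the log-bitrate domain is what makes the result a relative (percentage) difference, so a codec that consistently needs half the bits scores about -50% regardless of the absolute bitrates involved.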

Citation
If you find our work useful, please cite:

@article{tatwawadi2026pico,
  title={What Matters in Practical Learned Image Compression},
  author={Tatwawadi, Kedar and Rahimzadeh, Parisa and Sun, Zhanghao and Chen, Zhiqi and Yang, Ziyun and Nair, Sanjay and Hasteer, Divija and Rippel, Oren},
  journal={arXiv preprint arXiv:2605.05148},
  year={2026}
}


The authors introduce PICO (Perceptual Image Codec), which they present as the first learned codec that is both practical and optimized directly for the human visual system. It results from a comprehensive study of modeling choices for practical learned codecs, including a search across millions of model configurations to jointly optimize perceptual quality and on-device runtime. Based on large-scale subjective user studies, PICO achieves 2.3-3× bitrate savings against AV1, AV2, VVC, ECM, and JPEG-AI, and 20-40% bitrate savings against the best learned codec alternatives. Unlike most learned codecs, PICO also comes with cross-platform robustness guarantees.

PICO is also fast on mobile hardware. On an iPhone 17 Pro Max, it encodes 12-megapixel images in as little as 230 milliseconds and decodes them in 150 milliseconds, which the authors note is faster than most top ML-based codecs running on a V100 GPU. The comparisons against state-of-the-art traditional and learned codecs report perceptual BD-rates derived from human ratings in a large-scale subjective study, so the rankings reflect human judgments of quality. The iPhone 17 Pro Max speed benchmarks use identical compiler optimizations across the compared codecs.