LmCast :: Stay tuned in

1-Bit Bonsai Image 4B Image Generation for Local Devices

Recorded: May 31, 2026, 4 p.m.

Original Summarized

PrismML — Introducing 1-bit and Ternary Bonsai Image 4B: Image Generation for Local Devices

AboutBlogNewsCareersContactLAUNCH1001011100 001011 11010Back to all postsIntroducing 1-bit and Ternary Bonsai Image 4B: Image Generation for Local DevicesMay 26, 2026•PrismMLImages generated from Ternary Bonsai Image 4BToday we’re releasing Bonsai Image 4B, a family of compact image-generation models designed to run high-quality diffusion inference on local hardware: from laptops to phones.Bonsai Image 4B comes in two variants:1-bit Bonsai Image 4B uses binary {−1, +1} transformer weights with an FP16 group-wise scaling factor, giving 1.125 effective bits per weight. It targets maximum compression and is the right fit when memory pressure, bandwidth, and the deployment footprint are the primary constraints.Ternary Bonsai Image 4B uses {−1, 0, +1} transformer weights with an FP16 group-wise scaling factor, giving 1.71 effective bits per weight. The additional zero state gives the model more representational flexibility, improving visual quality and prompt fidelity while remaining extremely compact.The result is a new deployment regime for image generation: capable outputs, open weights, and practical local inference on devices that were previously out of reach for this class of model. To our knowledge, Bonsai Image 4B is the first image model in its parameter class to run directly on an iPhone.Built for local generationImages generated from 1-bit Bonsai Image 4BLocal image generation starts with a hard constraint: the model has to fit within the device’s memory budget.For a 4B-class image model, the diffusion transformer is the largest part of the model and the part that runs repeatedly during generation. Each denoising step invokes the transformer again, so transformer size directly shapes memory pressure, bandwidth demand, and local inference speed.Bonsai Image 4B is built from the FLUX.2 Klein 4B. It keeps the architecture intact but changes how the transformer weights are represented. By moving those weights into binary and ternary form, Bonsai reduces the part of the image pipeline that matters most for local deployment.

Model
Diffusion Transformer
Reduction vs FP16

FLUX.2 Klein 4B
7.75 GB
1.0x

1-bit Bonsai Image 4B
0.93 GB
8.3x

Ternary Bonsai Image 4B
1.21 GB
6.4x

Table I: Diffusion transformer footprint for models.The binary layers provide roughly a 14x reduction relative to full-precision transformer weights. A small set of precision-sensitive supporting tensors (~5%), called the projection layers, remains in FP16 so the final 1-bit Bonsai Image 4B transformer is 0.93 GB: an 8.3x reduction from the 7.75 GB full-precision FLUX.2 Klein 4B. The ternary variant follows the same structure. Its ternary layers provide roughly a 10x reduction and the final Ternary Bonsai Image 4B transformer is 1.21 GB, a 6.4x reduction from the full-precision transformer. It is slightly larger than the 1-bit model, but the additional zero state improves visual quality and prompt fidelity.Including the compressed text encoder and FP16 VAE, the Apple Silicon deployment payload is 3.42 GB for 1-bit Bonsai Image 4B and 3.88 GB for Ternary Bonsai Image 4B. For comparison, the full precision FLUX.2 Klein 4B requires a deployment payload of 15.97 GB. Since, at runtime, the text encoder is offloaded after prompt encoding, the mean memory usage is smaller than the total payload. When generating a 512x512 image, the mean-active memory is 1.5 GB and 1.96 GB, for the binary and ternary models, compared to 11.74 GB for the original FLUX.2 Klein 4B (a reduction of 7.8x and 6.0x, respectively). For a 1024x1024 image, the mean-active memory is 1.95 GB and 2.38 GB, for the binary and ternary models, compared to 14.39 GB for the original FLUX.2 Klein 4B (a reduction of 7.4x and 6.0x, respectively).This reduction in memory footprint changes where the model can run. Our deployment stack supports Apple Silicon iPhones, iPads and Macs and CUDA GPUs, using MLX low-bit paths on Apple hardware and Gemlite low-bit GEMM kernels on CUDA. On iPhone 17 Pro Max, the full-precision FLUX.2 Klein 4B pipeline does not fit within the device memory budget, while both Bonsai Image variants run on-device.

Video I: Image generation on Bonsai StudioIn practice, Bonsai Image 4B generates a 512x512 image in 9.4 seconds on an iPhone 17 Pro Max and about 6 seconds on Mac M4 Pro. On Mac M4 Pro, Bonsai Image 4B is up to 5.6x faster than the stock full-precision MFLUX pipeline.Benchmarking performanceCompression only matters if the model remains useful. We evaluated Bonsai Image 4B across three complementary benchmarks: GenEval for object composition and attribute binding; HPSv3 human preference and aesthetic quality; DPG-Bench dense prompt following and semantic faithfulness.Qualitative comparison across Bonsai Image and FLUX.2 Klein 4B models.

Model
DiffusionTransformerFootprint (GB)
GenEval
HPSv3
DPG-Bench
Size reductionrelative toFLUX.2 Klein 4B
Performancerelative toFLUX.2 Klein 4B

1-bit Bonsai Image 4B
0.93
0.671
11.15
0.822
8.3x
88%

Ternary Bonsai Image 4B
1.21
0.723
12.22
0.851
6.4x
95%

FLUX.2 Klein 4B
7.75
0.819
12.84
0.853
1x
100%

SDXL
5.14
0.3
10.05
0.74
1.5x
67%

BK-SDM-Small
0.98
0.297
3.05
0.559
7.9x
42%

Stable Diffusion 1.5
1.72
0.396
4.2
0.601
4.5x
51%

PixArt-Σ XL 2
1.2
0.541
11.93
0.769
6.4x
83%

Table II: Image quality benchmark comparison across Ternary Bonsai Image 4B and other models.Ternary Bonsai Image 4B is the quality-oriented variant. At 1.21 GB, it retains 95% of the FLUX.2 Klein 4B accuracy across GenEval, HPSv3, and DPG-Bench, while reducing the diffusion transformer footprint by 6.4x.1-bit Bonsai Image 4B is the footprint-oriented variant. It brings the diffusion transformer below 1 GB, an 8.3x reduction, while still delivering strong benchmark scores across the same three evaluations (it retains 88% of the accuracy of FLUX.2 Klein 4B).Together, the two variants move the quality–footprint frontier. Bonsai Image remains competitive with modern 4B-class image models while using a fraction of their diffusion-transformer footprint. At the same time, it substantially outperforms smaller models with similar memory footprints. That is the same Pareto shift we have seen in our prior Bonsai language models. Bonsai Image brings modern diffusion-transformer behavior into a memory range that previously belonged to much smaller, lower-capability models.Why this is importantImage generation is not only a model-quality problem. It is also a deployment problem.Cloud APIs will continue to be the right choice for many products. But cloud-only generation imposes certain product constraints: every prompt is a remote request, every iteration carries marginal serving cost, and every interaction adds round-trip latency.That matters because image generation is naturally iterative. Users rarely stop at one image. They revise prompts, compare outputs, generate variations, discard failures, and try again. When each attempt is a server-side job, the creative loop becomes something users have to meter and wait for. Local inference changes that. Once the model fits on the device, generation can sit directly inside the product experience. It becomes cheaper to run, faster to iterate on, and easier to use in environments where prompts, and generated assets should remain private.Bonsai Image 4B is a step toward that deployment regime: capable image generation running closer to the user, on hardware they already own.Images generated from Ternary Bonsai Image 4BAvailabilityBoth 1-bit and Ternary Bonsai Image 4B will be released with open weights and code under the Apache 2.0 license.With this launch, we are also launching Bonsai Studio, its iOS app for trying Bonsai Image 4B directly on iPhone.Join UsPrismML emerged from a team of Caltech researchers and was founded with support from Khosla Ventures, Cerberus and Google. We’ve spent years tackling one of the field’s hardest problems: compressing neural networks without sacrificing their reasoning ability.If you want to help build the next generation of state-of-the-art AI, we’d love to hear from you. Check out our careers page.ResourcesWhitepaperHugging FaceWebGPU demoBonsai Studio for iPhoneGitHubBack to all postsAnnouncing 1-bit Bonsai: The First Commercially Viable 1-bit LLMsMarch 31, 2026Today, we are announcing 1-bit Bonsai models that bring advanced intelligence to the devices where people actually live and work.PrismML Launches World's First 1-Bit AI Model to Redefine Intelligence at the EdgeMarch 31, 2026PrismML, a pioneer in high-performance AI models, today emerged from stealth to introduce the world’s first commercially viable 1-bit large language models, built on groundbreaking research developed at Caltech.ResourcesWhitepaperBonsai Studio for iPhoneModelsHugging FaceWebGPU demoBonsai Image demoGitHubFollowXDiscordLinkedInAboutCareersContact© 2026 Prism ML, Inc. All rights reserved.Terms & ConditionsPrivacy Policy

PrismML introduces Bonsai Image 4B, a family of compact image-generation models engineered to facilitate high-quality diffusion inference directly on local hardware, ranging from laptops to mobile devices. This family includes two primary variants: 1-bit Bonsai Image 4B and Ternary Bonsai Image 4B, which achieve efficiency gains by fundamentally restructuring how transformer weights are represented. The 1-bit variant utilizes binary {−1, +1} transformer weights employing an FP16 group-wise scaling factor, resulting in approximately 1.125 effective bits per weight, prioritizing maximum compression suitable for environments constrained by memory, bandwidth, and deployment footprint. Conversely, the Ternary variant employs {−1, 0, +1} transformer weights with an FP16 group-wise scaling factor, yielding 1.71 effective bits per weight. The inclusion of the zero state in the ternary representation grants the model enhanced representational flexibility, which improves visual quality and prompt fidelity while maintaining extreme compactness.

The foundation of Bonsai Image 4B is derived from the FLUX.2 Klein 4B architecture, where the core modification involves altering the representation of the transformer weights to facilitate local deployment. By converting weights into binary or ternary forms, PrismML reduces the computational footprint of the image generation pipeline, particularly the diffusion transformer which is the most resource-intensive component and executes repeatedly during the denoising steps. For example, the 1-bit variant achieves an 8.3x reduction relative to the full-precision FLUX.2 Klein 4B, while the ternary variant achieves a 6.4x reduction. This compression is achieved by encoding the majority of the transformer weights in binary or ternary form, with only a small percentage of precision-sensitive supporting tensors, such as projection layers, retained in FP16.

This reduction in memory footprint has profound implications for deployment, especially on edge devices. The deployment stack supports varied hardware, including Apple Silicon devices and CUDA GPUs, leveraging specialized low-bit paths on Apple hardware and Gemlite low-bit GEMM kernels on CUDA. This capability allows models that were previously too large to run on-device, such as the full-precision FLUX.2 Klein 4B, to execute on devices like the iPhone 17 Pro Max. For example, when generating a 512x512 image, the mean-active memory usage for the binary and ternary models is significantly lower than that of the original FLUX.2 Klein 4B, demonstrating a substantial reduction in active memory consumption.

In terms of performance, the local generation capability translates into tangible speed improvements. In practical usage, Bonsai Image 4B can generate a 512x512 image in approximately nine and a half seconds on an iPhone 17 Pro Max and about six seconds on a Mac M4 Pro. Furthermore, on the Mac M4 Pro, Bonsai Image 4B demonstrated speeds up to five and a half times faster than the standard full-precision MFLUX pipeline. This speed advantage is achieved because local inference negates the latency associated with cloud-based requests, allowing the iterative creative loop—where users revise prompts and generate variations—to occur instantaneously within the product experience rather than requiring server-side processing and round-trip latency.

The models were evaluated across benchmarks focusing on object composition and attribute binding, human preference and aesthetic quality, and dense prompt following. The Ternary Bonsai Image 4B is positioned as the quality-oriented variant; at its reduced footprint of 1.21 GB, it retains ninety-five percent of the FLUX.2 Klein 4B accuracy across the established benchmarks while reducing the transformer footprint by 6.4x. In contrast, the 1-bit Bonsai Image 4B is the footprint-oriented option, reducing the transformer to below one gigabyte, achieving an eight and a third times reduction while still maintaining eighty-eight percent of the benchmark accuracy. Together, these variants establish a new boundary for the quality-footprint trade-off, allowing image generation to be performed locally while remaining highly competitive with larger models and substantially outperforming smaller, lower-capability models that share similar memory constraints. Ultimately, Bonsai Image 4B represents a shift toward a deployment regime where capable image generation runs closer to the user on their personal hardware.