Flux 2 Klein pure C inference
Recorded: Jan. 19, 2026, 10:03 a.m.
Original
antirez/flux2.c: FLUX.2-klein-4B Pure C Implementation (MIT license, 572 stars, 25 forks). Languages: C, Objective-C, Makefile, Python. Contributors: antirez, claude.

Repository layout (42 commits): images/, test_vectors/, .gitignore, IMPLEMENTATION_NOTES.md, LICENSE, Makefile, README.md, download_model.py, flux.c, flux.h, flux_image.c, flux_kernels.c/.h, flux_metal.h/.m, flux_qwen3.c/.h, flux_qwen3_tokenizer.c, flux_safetensors.c/.h, flux_sample.c, flux_tokenizer.c, flux_transformer.c, flux_vae.c, main.c.

README highlights:

```sh
# Download the model (~16GB), then generate an image:
./flux -d flux-klein-model -p "A picture of a woman in 1960 America. Sunglasses. ASA 400 film. Black and White." -W 250 -H 250 -o /tmp/woman.png

# ...later processed with image-to-image generation:
./flux -d flux-klein-model -i /tmp/woman.png -o /tmp/woman2.png -p "oil painting of woman with sunglasses" -v -H 256 -W 256
```

- Zero dependencies: pure C implementation, works standalone. BLAS is optional for a ~30x speedup (Apple Accelerate on macOS, OpenBLAS on Linux).
- Image-to-image strength scale: 0.0 = no change (output equals input); 0.3 = subtle style transfer that preserves most details.
- Reproducibility: to reproduce the same image, reuse the printed seed.
- Building: `make mps` on Apple Silicon macOS; `make blas` on Linux after installing OpenBLAS first (the README gives a Fedora example).
- Inference steps: this is a distilled model that produces good results with exactly 4 sampling steps.
- The text encoder is automatically released after encoding, reducing peak memory during diffusion; it reloads automatically when generating further images with different prompts.
- The C implementation uses float32 throughout, while PyTorch uses bfloat16 with highly optimized MPS kernels; implementing that optimization to approach PyTorch's speed is the project's likely next step.
- The model download is ~16GB overall, of which the VAE accounts for ~300MB.

(The README's Technical Details tables, covering component sizes, per-phase memory use, and per-resolution timings, plus its Resolution Limits section, did not survive this capture; the summary below recovers their figures.)

The README also sketches C library usage (text-to-image, image-to-image, and multi-image generation) around this entry point:

```c
flux_image *flux_generate(flux_ctx *ctx, const char *prompt, const flux_params *params);
```

A reconstructed sketch of those examples follows.
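The library examples survive only as comment fragments in this capture ("Configure generation parameters. Start with defaults and customize.", "Load the input image", and so on). The sketch below recombines them and is not the repository's verbatim code: `flux_load_dir`, `flux_generate`, `flux_img2img`, `flux_get_error`, `flux_image_save`, `flux_image_free`, and `flux_free` are all named in the capture or the summary, but `flux_default_params()`, the `flux_params` field names, and the `flux_img2img` argument order are assumptions.

```c
#include <stdio.h>
#include "flux.h"   /* the repo ships flux.h; its exact contents are not in this capture */

int main(void) {
    /* Load the model directory produced by download_model.py.
     * flux_load_dir and flux_get_error are named in the summary; their
     * exact signatures are assumed here. */
    flux_ctx *ctx = flux_load_dir("flux-klein-model");
    if (!ctx) {
        fprintf(stderr, "load failed: %s\n", flux_get_error());
        return 1;
    }

    /* Configure generation parameters. Start with defaults and customize.
     * ASSUMPTION: flux_default_params() and these field names are
     * illustrative; only the flux_params type itself is confirmed. */
    flux_params params = flux_default_params();
    params.width = 256;    /* dimensions must be multiples of 16 (VAE downsampling) */
    params.height = 256;

    /* Generate the image. This handles text encoding, diffusion, and VAE decode. */
    flux_image *img = flux_generate(ctx, "A woman wearing sunglasses", &params);
    if (!img) {
        fprintf(stderr, "generate failed: %s\n", flux_get_error());
        flux_free(ctx);
        return 1;
    }

    /* Save to file. Format is determined by extension (.png or .ppm). */
    flux_image_save(img, "woman.png");

    /* Image-to-image: transform the image we just generated. Output size
     * defaults to the input size. ASSUMPTION: the argument order and the
     * strength field are guesses; only the function name is confirmed. */
    params.strength = 0.3f;   /* 0.3 = subtle style transfer, preserves most details */
    flux_image *painting = flux_img2img(ctx, img, "oil painting of a woman with sunglasses", &params);
    if (painting) {
        flux_image_save(painting, "painting.png");
        flux_image_free(painting);
    } else {
        fprintf(stderr, "img2img failed: %s\n", flux_get_error());
    }

    /* Clean up. */
    flux_image_free(img);
    flux_free(ctx);
    return 0;
}
```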
Summarized

The GitHub repository for *flux2.c* presents a pure C implementation of the FLUX.2-klein-4B image generation model, developed by Salvatore Sanfilippo (antirez) as an experiment in AI-assisted code generation and open-source software. The project aims to provide a lightweight, dependency-free inference system for diffusion models, leveraging modern AI tools like Claude Code while emphasizing accessibility and performance. The implementation operates without Python, PyTorch, or CUDA, relying solely on the C standard library, with optional BLAS and Apple's Metal Performance Shaders (MPS) backends for acceleration. The model runs directly on safetensors files, avoiding quantization in favor of plain float computation for simplicity and flexibility.

The repository's core functionality revolves around text-to-image and image-to-image generation, built on a rectified flow transformer architecture optimized for speed. Key components include a transformer with 5 double blocks and 20 single blocks, an AutoencoderKL-based VAE for latent-space compression, and a Qwen3-4B text encoder integrated directly into the system, which eliminates the need for external embedding computations. The design prioritizes memory efficiency by automatically releasing the text encoder after encoding, freeing approximately 8GB of memory during the diffusion steps; when generating multiple images with different prompts, the encoder reloads dynamically as required.

Performance benchmarks highlight the trade-offs between backends. On Apple M3 Max hardware, the C implementation with MPS acceleration takes 32.4 seconds for a 256x256 image and 49.6 seconds for 512x512, compared to PyTorch's 3.0 and 5.4 seconds respectively. Much of the gap comes down to precision and kernels: the C code uses float32 uniformly, whereas PyTorch employs bfloat16 with highly optimized MPS kernels. The generic C backend without BLAS or MPS is far slower still, taking 605.6 seconds even for a 64x64 image, underscoring the importance of hardware acceleration; the author acknowledges that future optimizations, such as adopting bfloat16, could narrow the gap. The maximum supported resolution is 1024x1024, constrained by the attention mechanism's memory requirements; the minimum is 64x64 pixels, and all dimensions must be multiples of 16 due to VAE downsampling.

The repository includes both a command-line tool and a C library API for integration into custom applications. The CLI generates images via commands like `./flux -d flux-klein-model -p "A woman wearing sunglasses" -o output.png`, with options for image-to-image transformation, seed control, and resolution adjustment. The library API lets developers embed the model in their own projects by linking against `libflux.a` and including headers like `flux.h`. Example code demonstrates loading the model, configuring parameters, generating images, and handling errors. Key functions include `flux_load_dir` for model initialization, `flux_generate` for text-to-image synthesis, and `flux_img2img` for modifying existing images. Memory management is critical: resources must be freed explicitly via `flux_free` and `flux_image_free` to prevent leaks.
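That last point is easiest to see in the README's "Generating Multiple Images" fragment, which loops over seeds. Here is a minimal reconstruction, with the caveat that `flux_default_params()` and the `params.seed` field are illustrative names; only the `flux_generate` signature is confirmed by the capture.

```c
#include <stdio.h>
#include "flux.h"

/* Generate 5 variations of one prompt by varying the seed.
 * ASSUMPTIONS: flux_default_params() and params.seed are guessed names;
 * flux_generate, flux_get_error, flux_image_save, and flux_image_free
 * are named in the capture or the summary. */
void generate_variations(flux_ctx *ctx) {
    flux_params params = flux_default_params();
    for (int i = 0; i < 5; i++) {
        params.seed = 1000 + i;   /* a different seed per variation */
        flux_image *img = flux_generate(ctx, "A mountain landscape at sunset", &params);
        if (!img) {
            fprintf(stderr, "variation %d failed: %s\n", i, flux_get_error());
            continue;
        }
        char filename[64];
        snprintf(filename, sizeof(filename), "landscape_%d.png", i);
        flux_image_save(img, filename);
        /* Free each image inside the loop so peak memory stays at one
         * decoded image no matter how many variations are produced. */
        flux_image_free(img);
    }
}
```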
Development context reveals that the codebase was largely generated by Claude Code, with the author emphasizing his role in guiding design decisions and verifying correctness. The approach reflects a broader effort to challenge the dominance of Python-centric AI toolchains, advocating for C-based systems that reduce dependency sprawl and improve accessibility. The README underscores the project's experimental nature: while the AI-generated code required human oversight, it successfully replicated core functionality found in existing projects like GGML, and the author highlights the educational value of the exercise in managing a non-trivial project in collaboration with an AI.

Technical details further elaborate on the model's architecture. The transformer employs 3072 hidden dimensions and 24 attention heads, while the VAE compresses images into 128 latent channels with an 8x spatial reduction. The Qwen3-4B text encoder, with 36 layers and 2560 hidden dimensions, is optimized for efficiency but remains the single largest memory consumer. Inference uses exactly four sampling steps, a distillation-driven design choice that balances speed and quality for FLUX.2-klein-4B. Memory requirements peak at ~16GB while both the encoder and the diffusion components are resident, roughly halving once the encoder is released; these constraints call for careful resource management, particularly on systems with limited RAM.

The repository also includes a `download_model.py` script for fetching the ~16GB model weights from Hugging Face, alongside Makefile targets for the different backends: generic C (slow), BLAS-accelerated (roughly 30x faster), or MPS (fastest on Apple Silicon). The documentation emphasizes reproducibility: seed values are printed to stderr and can be reused for consistent results. Error handling is straightforward, with functions returning NULL on failure and `flux_get_error()` providing descriptive messages for debugging.

In conclusion, *flux2.c* represents a serious effort to democratize AI model inference by leveraging C's performance and simplicity. While its current benchmarks lag behind Python-based systems, the project's minimal dependencies, open-source transparency, and AI-assisted development offer a compelling alternative for developers seeking to avoid entrenched ecosystems. The combination of image-to-image generation, deliberate memory management, and a clear API positions it as a viable tool for both experimentation and practical applications. Future work may involve further optimization (notably bfloat16), support for additional backends, or hybrid approaches that balance speed and accessibility.
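As a practical recap of the workflow described above, the end-to-end command sequence can be assembled from the README and the summary. The download and build commands come from the capture; the seed-reuse flag is not visible in it, so the `-s` option shown below is an assumption.

```sh
# Fetch the ~16GB FLUX.2-klein-4B weights from Hugging Face.
python3 download_model.py

# Build with the fastest available backend.
make mps     # macOS on Apple Silicon (Metal Performance Shaders)
make blas    # Linux, after installing OpenBLAS (e.g. on Fedora: sudo dnf install openblas-devel)
make         # ASSUMPTION: default target builds the generic (slow) pure C backend

# Generate an image; the seed used is printed to stderr for reproducibility.
./flux -d flux-klein-model -p "A woman wearing sunglasses" -W 256 -H 256 -o output.png

# Reproduce the same image by passing the printed seed back.
# ASSUMPTION: the exact flag is not shown in this capture; `-s` is a guess.
./flux -d flux-klein-model -p "A woman wearing sunglasses" -W 256 -H 256 -s 12345 -o output2.png
```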