LmCast :: Stay tuned in

Show HN: Sweep, Open-weights 1.5B model for next-edit autocomplete

Recorded: Jan. 22, 2026, 11:03 a.m.


sweepai/sweep-next-edit-1.5B · Hugging Face


Sweep Next-Edit 1.5B (GGUF)

A 1.5B parameter model for next-edit autocomplete, quantized to Q8_0 GGUF format.

Model Description

Sweep Next-Edit predicts your next code edit before you make it. It runs locally on your laptop in under 500ms (with speculative decoding) and outperforms models over 4x its size on next-edit benchmarks.

Usage

Download run_model.py and the model file, then:
uv pip install llama-cpp-python huggingface_hub
python run_model.py
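
The contents of run_model.py are not reproduced on the card, but a minimal loader built on the two listed libraries would plausibly look like the sketch below. The GGUF filename here is an assumption (check the repo's Files tab for the actual name); the prompt format is defined in run_model.py.

```python
# Sketch of a loader for the quantized weights. The real logic lives in
# run_model.py; GGUF_FILENAME below is a hypothetical name.
REPO_ID = "sweepai/sweep-next-edit-1.5B"
GGUF_FILENAME = "sweep-next-edit-1.5b.Q8_0.gguf"  # assumed; check the repo

def load_model(n_ctx: int = 8192):
    """Download the GGUF file and load it with the llama.cpp bindings."""
    # Imported lazily so the sketch can be read without the deps installed.
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    model_path = hf_hub_download(repo_id=REPO_ID, filename=GGUF_FILENAME)
    # n_ctx=8192 matches the context length stated on the model card.
    return Llama(model_path=model_path, n_ctx=n_ctx)

# Usage (requires llama-cpp-python and huggingface_hub installed):
#   llm = load_model()
#   out = llm(prompt, max_tokens=128)  # prompt format: see run_model.py
#   print(out["choices"][0]["text"])
```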

Model Details

Format: GGUF (Q8_0 quantization)
Parameters: 1.5B
Context Length: 8192 tokens
Base Model: Qwen2.5-Coder

Example

The model uses a specific prompt format with file context, recent diffs, and current state to predict the next edit. See run_model.py for a complete example.
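
The exact delimiters are defined in run_model.py and are not shown on the card; the sketch below only illustrates the kind of structure described (file context, recent diffs, current state). The `<|...|>` tags are invented for this example and are not the model's real prompt tokens.

```python
# Illustrative only: the card says the prompt combines file context,
# recent diffs, and the current state. The section markers below are
# made up for this sketch; the real format is in run_model.py.
def build_prompt(file_context: str, recent_diffs: str, current_state: str) -> str:
    return (
        f"<|file_context|>\n{file_context}\n"
        f"<|recent_diffs|>\n{recent_diffs}\n"
        f"<|current_state|>\n{current_state}\n"
        "<|next_edit|>\n"
    )

prompt = build_prompt(
    file_context="def add(a, b):\n    return a + b\n",
    recent_diffs="-    return a + b\n+    return a + b  # TODO: validate\n",
    current_state="def add(a, b):\n    return a + b  # TODO: validate\n",
)
```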

Links

Blog Post - Technical details and benchmarks
JetBrains Plugin - Sweep's plugin for JetBrains IDEs

License

Apache 2.0

Downloads last month: 71

Model size: 1.5B params
Architecture: qwen2
Quantization: Q8_0 (8-bit), 1.54 GB

Inference Providers: this model isn't deployed by any Inference Provider.

Sweep Next-Edit 1.5B, published on Hugging Face, is a locally executable model for next-edit autocomplete. Its purpose is to anticipate and suggest the next code edit a user intends to make, running on a laptop with inference times under 500 milliseconds when speculative decoding is used. The model is based on Qwen2.5-Coder, has 1.5 billion parameters, and is distributed in GGUF format with Q8_0 quantization.

Operationally, Sweep Next-Edit 1.5B relies on a defined prompt structure with several contextual elements: the file context (the relevant source code), recent diffs (changes recently made to the code), and the current state of the editing session. From this combined input the model predicts the immediately following edit. Its GGUF format and Q8_0 quantization significantly reduce memory requirements and improve inference speed compared with an unquantized model.
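
The memory claim can be sanity-checked with back-of-the-envelope arithmetic, assuming llama.cpp's Q8_0 layout (blocks of 32 int8 weights plus one fp16 scale, i.e. 34 bytes per 32 weights, about 8.5 bits per weight):

```python
# Rough Q8_0 size estimate: 34 bytes per block of 32 weights.
params = 1.5e9
bits_per_weight = 34 * 8 / 32  # = 8.5
size_gb = params * bits_per_weight / 8 / 1e9
print(f"{size_gb:.2f} GB")  # prints 1.59 GB
```

That lands close to the 1.54 GB file size listed on the page (the exact figure depends on the true parameter count and which tensors are quantized), versus roughly 3 GB for the same weights in fp16.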

Benchmark results indicate a notable advantage over much larger models: Sweep Next-Edit 1.5B outperforms models more than four times its size on next-edit benchmarks. This suggests an efficient architecture and training methodology that prioritize predictive accuracy within a compact model footprint, while speculative decoding contributes substantially to inference speed.
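
The card does not say which speculative-decoding scheme Sweep uses; one way to get it with llama-cpp-python is the built-in prompt-lookup decoding, which drafts tokens by matching n-grams already in the prompt (a good fit for code editing, where the next edit often repeats nearby text). The sketch below shows that setup under those assumptions only.

```python
# Sketch: speculative decoding via prompt-lookup in llama-cpp-python.
# This is one possible configuration, not necessarily what Sweep uses.
NUM_DRAFT_TOKENS = 10  # tokens the draft step proposes per round

def load_speculative(model_path: str):
    # Imported lazily so the sketch can be read without the deps installed.
    from llama_cpp import Llama
    from llama_cpp.llama_speculative import LlamaPromptLookupDecoding

    return Llama(
        model_path=model_path,
        n_ctx=8192,
        draft_model=LlamaPromptLookupDecoding(num_pred_tokens=NUM_DRAFT_TOKENS),
    )
```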

Technically, the model has a context length of 8192 tokens, letting it consider a substantial amount of code history and context when forming its predictions. It is distributed in GGUF format, which deploys easily across a variety of hardware. The repository includes run_model.py, a Python script that runs the model via the llama-cpp-python and huggingface_hub libraries and provides a framework for preparing inputs and processing outputs.

Deployment is further supported by a JetBrains plugin, enabling integration with the JetBrains IDE ecosystem. The model is released under the Apache 2.0 license, giving users broad freedom to use, modify, and distribute it under that license's terms. It recorded 71 downloads last month and is not currently deployed by any Inference Provider.