Show HN: Streambed – Stream Postgres to Iceberg on S3, Supports Postgres Wire

Recorded: May 31, 2026, 8:02 p.m.

viggy28/streambed: Stream Postgres to Apache Iceberg on S3 via logical replication, queryable over the Postgres wire protocol.


Streambed

Postgres-to-Iceberg CDC engine. Offload analytical queries from your production database without changing your application.
streambed streams WAL changes via logical replication, writes Parquet files to S3, and commits Iceberg metadata. Query the result with any Iceberg-compatible engine -- or use the built-in query server, which speaks the Postgres wire protocol so you can connect with psql.
See It In Action
[Demo: the same analytical query on pgbench (1M accounts, 500K history rows), with Postgres on the left and Streambed on the right.]

No ETL. No Spark. Just Postgres + S3.
Quick Start
```sh
# Start Postgres + MinIO locally
docker compose up -d

# Build
go build -o streambed ./cmd/streambed

# Start syncing + query server on :5433
./streambed sync \
  --source-url="postgres://postgres:test@localhost:5432/postgres" \
  --s3-bucket="streambed" \
  --s3-endpoint="http://localhost:9000" \
  --s3-prefix="test" \
  --query-addr=:5433

# Query your Postgres tables via Iceberg
psql -h localhost -p 5433 -U postgres -d postgres
```
Run streambed sync --help for all configuration options. Every flag can also be set through an environment variable with the STREAMBED_ prefix (e.g. STREAMBED_SOURCE_URL for --source-url).
Architecture

How It Works
```text
Postgres WAL ──▶ Decode ──▶ Buffer ──▶ Parquet ──▶ S3 ──▶ Iceberg Commit
                                                            │
                                                  DuckDB ◀──┘ (query server)
```
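The Decode and Buffer stages of the pipeline can be sketched as follows. This assumes Streambed consumes Postgres's built-in `pgoutput` plugin (the README does not name the output plugin) and uses a hypothetical row-count flush threshold. The types `Change` and `Buffer` are illustrative, not Streambed's code. Only the message header is decoded: in `pgoutput` protocol version 1, Insert, Update and Delete messages begin with a one-byte tag followed by the relation OID as a big-endian Int32.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Change is a decoded row-level change. Only the header is parsed; the
// tuple data is kept as raw bytes (a simplification).
type Change struct {
	Kind     byte   // 'I' insert, 'U' update, 'D' delete
	Relation uint32 // relation OID from the pgoutput message
	Payload  []byte // remaining bytes (tuple data), left undecoded
}

// decodeChange parses the header of a pgoutput Insert/Update/Delete message
// (protocol version 1, i.e. no streamed-transaction XID field). It returns
// ok=false for other message types such as Begin ('B') or Commit ('C').
func decodeChange(msg []byte) (Change, bool, error) {
	if len(msg) == 0 {
		return Change{}, false, errors.New("empty message")
	}
	switch msg[0] {
	case 'I', 'U', 'D':
		if len(msg) < 5 {
			return Change{}, false, errors.New("truncated change message")
		}
		return Change{Kind: msg[0], Relation: binary.BigEndian.Uint32(msg[1:5]), Payload: msg[5:]}, true, nil
	default:
		return Change{}, false, nil
	}
}

// Buffer accumulates changes per table (keyed by relation OID).
// The count-based flush threshold is a hypothetical policy.
type Buffer struct {
	maxRows int
	tables  map[uint32][]Change
}

func NewBuffer(maxRows int) *Buffer {
	return &Buffer{maxRows: maxRows, tables: make(map[uint32][]Change)}
}

// Add appends a change and returns the table's batch once it reaches
// maxRows (clearing that table's buffer); otherwise it returns nil.
func (b *Buffer) Add(c Change) []Change {
	b.tables[c.Relation] = append(b.tables[c.Relation], c)
	if len(b.tables[c.Relation]) < b.maxRows {
		return nil
	}
	batch := b.tables[c.Relation]
	delete(b.tables, c.Relation)
	return batch
}

func main() {
	buf := NewBuffer(2)
	msgs := [][]byte{
		{'B', 0, 0, 0, 0},            // Begin (truncated for brevity): not a row change
		{'I', 0, 0, 0x40, 0x00, 'N'}, // insert into relation 16384
		{'D', 0, 0, 0x40, 0x00, 'K'}, // delete from relation 16384
	}
	for _, m := range msgs {
		c, ok, err := decodeChange(m)
		if err != nil || !ok {
			continue
		}
		if batch := buf.Add(c); batch != nil {
			fmt.Printf("flush relation %d: %d changes\n", c.Relation, len(batch))
		}
	}
}
```

In a real pipeline the flush would write the batch to Parquet rather than print it; a time-based trigger is a common complement to a size threshold.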

Streambed connects to Postgres as a logical replication subscriber. It decodes WAL messages (inserts, updates, deletes), buffers rows per table, and periodically flushes them as Parquet files to S3 with Iceberg metadata commits. Updates and deletes use copy-on-write merging against existing Parquet data.
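Copy-on-write means a data file is never modified in place: the writer reads the file's rows, applies the buffered changes, and writes a replacement file that the next Iceberg commit swaps in. A minimal in-memory sketch of that merge, with Parquet I/O omitted and a single unique integer primary key assumed. `Row`, `Op` and `copyOnWrite` are hypothetical names, not Streambed's API:

```go
package main

import "fmt"

// Row is a simplified table row with a single integer primary key.
type Row struct {
	ID    int64
	Value string
}

// Op is one buffered change: 'U' replaces the row with the same ID,
// 'D' deletes it.
type Op struct {
	Kind byte
	Row  Row
}

// copyOnWrite computes the contents of a replacement data file. It never
// modifies existing (the old file stays readable by earlier snapshots), and
// changed reports whether the file needs rewriting at all. Changes for IDs
// not present in this file are ignored, since they belong to other files.
func copyOnWrite(existing []Row, ops []Op) (rows []Row, changed bool) {
	byID := make(map[int64]Row, len(existing))
	for _, r := range existing {
		byID[r.ID] = r
	}
	for _, op := range ops { // applied in WAL order
		if _, present := byID[op.Row.ID]; !present {
			continue
		}
		switch op.Kind {
		case 'U':
			byID[op.Row.ID] = op.Row
			changed = true
		case 'D':
			delete(byID, op.Row.ID)
			changed = true
		}
	}
	// Emit surviving rows in their original file order.
	rows = make([]Row, 0, len(existing))
	for _, r := range existing {
		if cur, ok := byID[r.ID]; ok {
			rows = append(rows, cur)
		}
	}
	return rows, changed
}

func main() {
	file := []Row{{1, "a"}, {2, "b"}, {3, "c"}}
	ops := []Op{{'U', Row{2, "b2"}}, {'D', Row{3, ""}}, {'U', Row{9, "z"}}}
	rows, changed := copyOnWrite(file, ops)
	fmt.Println(changed, rows) // true [{1 a} {2 b2}]
	fmt.Println(file)          // [{1 a} {2 b} {3 c}] (input untouched)
}
```

The trade-off is write amplification: a single-row update rewrites the whole file that contains it, in exchange for readers never having to merge delete files at query time.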
A query server exposes Iceberg tables over the Postgres wire protocol using embedded DuckDB, so you can query with psql or any Postgres client.
Commands

| Command | What it does |
|---|---|
| `streambed sync` | Main daemon. Streams WAL, writes Iceberg, optionally serves queries. |
| `streambed resync --table=public.users` | One-shot backfill via COPY under a consistent snapshot. |
| `streambed query` | Standalone query server (no sync). Points at existing Iceberg tables. |
| `streambed cleanup --table=public.users` | Deletes S3 objects and state for a table. Useful before resync. |
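`resync` is described only as a one-shot backfill via COPY under a consistent snapshot. One standard Postgres technique is to export a snapshot when the logical replication slot is created, then import it with `SET TRANSACTION SNAPSHOT` before running COPY. The COPY then sees exactly the data that precedes the slot's starting point. The sketch below only builds that SQL sequence. Whether Streambed obtains its snapshot this way is an assumption, `backfillStatements` is a hypothetical helper, and the snapshot name is a placeholder.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// quoteIdent quotes a Postgres identifier, doubling embedded double quotes.
func quoteIdent(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `""`) + `"`
}

// quoteLiteral quotes a string literal, doubling embedded single quotes.
func quoteLiteral(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// backfillStatements returns the statements for a snapshot-consistent COPY
// of one schema-qualified table (split at the first dot; quoted names that
// contain dots are not handled in this sketch).
func backfillStatements(table, snapshot string) ([]string, error) {
	schema, name, ok := strings.Cut(table, ".")
	if !ok || schema == "" || name == "" {
		return nil, errors.New("table must be schema-qualified, e.g. public.users")
	}
	return []string{
		// SET TRANSACTION SNAPSHOT must be the first statement of a
		// REPEATABLE READ (or SERIALIZABLE) transaction.
		"BEGIN ISOLATION LEVEL REPEATABLE READ",
		"SET TRANSACTION SNAPSHOT " + quoteLiteral(snapshot),
		fmt.Sprintf("COPY %s.%s TO STDOUT", quoteIdent(schema), quoteIdent(name)),
		"COMMIT",
	}, nil
}

func main() {
	stmts, err := backfillStatements("public.users", "00000003-00000002-1")
	if err != nil {
		panic(err)
	}
	for _, s := range stmts {
		fmt.Println(s)
	}
}
```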

Development
Requires Go 1.22+ and CGO (for go-duckdb and go-sqlite3).
```sh
# Build
go build -o streambed ./cmd/streambed

# Unit tests
go test ./internal/... ./config/...

# Integration tests (requires Docker)
./scripts/test-integration.sh
```
Integration tests use the integration build tag and run against Postgres (port 5434) and MinIO (port 9002) from test/integration/docker-compose.yml.

About

Website: streambed.dev/
Topics: postgres, parquet, iceberg, duckdb
License: Apache-2.0


Languages: Go 99.9%, Shell 0.1%


Streambed is an engine that streams data from a PostgreSQL database to Apache Iceberg tables stored on S3 (or an S3-compatible store such as MinIO). It uses logical replication for change data capture and keeps the data queryable over the standard PostgreSQL wire protocol. Its primary goal is to offload analytical queries from the production database without requiring changes to existing applications.

The core pipeline captures changes from the PostgreSQL write-ahead log (WAL), decodes them, and buffers rows per table. The buffered rows are periodically flushed to S3 as Parquet files, accompanied by Iceberg metadata commits. Updates and deletes are handled by copy-on-write merging against the existing Parquet data, so affected files are rewritten rather than modified in place.

A built-in query server exposes the resulting Iceberg tables. It embeds DuckDB to query the data in S3 directly and speaks the Postgres wire protocol. Users can therefore connect with standard PostgreSQL clients such as psql, the same interface they already use for the source database.

Operationally, the system is managed through a small set of commands. streambed sync is the main daemon: it continuously streams WAL changes, writes them to Iceberg, and can optionally serve queries. streambed resync performs a one-shot backfill of a table using COPY under a consistent snapshot. streambed query runs a standalone query server against existing Iceberg tables without syncing. streambed cleanup deletes a table's S3 objects and state, which is useful before a resync.

In short, WAL messages are decoded, buffered, converted into Parquet files, and persisted in S3, while Iceberg metadata tracks those files as tables. Changes therefore flow from the transactional PostgreSQL database into an Iceberg data lake directly from the database log, without a traditional extract-transform-load pipeline or tools like Spark. The project is implemented almost entirely in Go.