Show HN: Streambed – Stream Postgres to Iceberg on S3, Supports Postgres Wire
Recorded: May 31, 2026, 8:02 p.m.
| Original | Summarized |
GitHub - viggy28/streambed: Stream Postgres to Apache Iceberg on S3 via logical replication, queryable over the Postgres wire protocol.
Streambed

Postgres-to-Iceberg CDC engine. Offload analytical queries from your production database without changing your application. No ETL. No Spark. Just Postgres + S3.

# Build
# Start syncing + query server on :5433
# Query your Postgres tables via Iceberg

How It Works

Streambed connects to Postgres as a logical replication subscriber. It decodes WAL messages (inserts, updates, deletes), buffers rows per table, and periodically flushes them as Parquet files to S3 with Iceberg metadata commits. Updates and deletes use copy-on-write merging against existing Parquet data.

Command
streambed sync
streambed resync --table=public.users
streambed query
streambed cleanup --table=public.users

Development

# Unit tests
# Integration tests (requires Docker)

About

Stream Postgres to Apache Iceberg on S3 via logical replication, queryable over the Postgres wire protocol. streambed.dev/

Topics: postgres, s3, parquet, iceberg, duckdb
License: Apache-2.0
Languages: Go, Shell |
Streambed is an engine that streams data from a PostgreSQL database to Apache Iceberg tables on Amazon S3. It uses logical replication for change data capture and keeps the result queryable over the standard PostgreSQL wire protocol. The goal is to offload analytical queries from the production database without modifying existing applications.

The pipeline captures changes from the PostgreSQL write-ahead log (WAL), decodes them, and buffers the rows per table. Buffered rows are periodically flushed to S3 as Parquet files, and each flush is accompanied by an Iceberg metadata commit. Updates and deletes are applied by copy-on-write merging against the existing Parquet data.

A built-in query server exposes the resulting Iceberg tables. It embeds DuckDB to read the data directly from S3, so standard PostgreSQL clients such as psql can query it over the Postgres wire protocol.

The system is operated through a small set of commands. streambed sync is the main daemon: it streams WAL changes, writes them to Iceberg, and optionally serves queries. streambed resync performs a one-shot backfill of a table using COPY under a consistent snapshot. streambed cleanup deletes the S3 objects and associated state for a specific table, for example before a resync.

Because changes are read directly from the database log, no traditional extract-transform-load (ETL) pipeline or tool such as Spark is required. Streambed is implemented primarily in Go. |
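
The buffer-then-merge flow described above can be illustrated with a small Go sketch. This is not Streambed's code: every type and function name here (Change, Buffer, Snapshot, MergeCopyOnWrite) is hypothetical, an in-memory map stands in for a Parquet file on S3, and rows are assumed to have a single string primary key. It only shows the general idea of buffering decoded changes per table and producing a new snapshot on flush without mutating the old one.

```go
package main

import (
	"fmt"
	"sort"
)

// Op is the kind of row change decoded from the WAL.
type Op int

const (
	Insert Op = iota
	Update
	Delete
)

// Change is a hypothetical decoded WAL event for one row.
type Change struct {
	Table string
	Op    Op
	Key   string            // assumed single-column primary key
	Row   map[string]string // new row image; nil for deletes
}

// Snapshot stands in for the rows of one immutable data file, keyed by primary key.
type Snapshot map[string]map[string]string

// Buffer holds pending changes per table until the next flush.
type Buffer struct {
	pending map[string][]Change
}

func NewBuffer() *Buffer { return &Buffer{pending: make(map[string][]Change)} }

// Add appends a change to its table's pending list, preserving arrival order.
func (b *Buffer) Add(c Change) { b.pending[c.Table] = append(b.pending[c.Table], c) }

// Drain returns the pending changes for a table and clears them.
func (b *Buffer) Drain(table string) []Change {
	cs := b.pending[table]
	delete(b.pending, table)
	return cs
}

// MergeCopyOnWrite builds a new snapshot from the existing one plus the
// buffered changes. The existing snapshot is never modified; row maps are
// shared between versions and treated as immutable.
func MergeCopyOnWrite(existing Snapshot, changes []Change) Snapshot {
	next := make(Snapshot, len(existing))
	for k, row := range existing {
		next[k] = row
	}
	for _, c := range changes {
		switch c.Op {
		case Insert, Update:
			next[c.Key] = c.Row
		case Delete:
			delete(next, c.Key)
		}
	}
	return next
}

// SortedKeys returns the primary keys of a snapshot in sorted order.
func SortedKeys(s Snapshot) []string {
	keys := make([]string, 0, len(s))
	for k := range s {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	v1 := Snapshot{"1": {"name": "ada"}, "2": {"name": "bob"}}

	buf := NewBuffer()
	buf.Add(Change{Table: "public.users", Op: Update, Key: "1", Row: map[string]string{"name": "ada l."}})
	buf.Add(Change{Table: "public.users", Op: Delete, Key: "2"})
	buf.Add(Change{Table: "public.users", Op: Insert, Key: "3", Row: map[string]string{"name": "cy"}})

	v2 := MergeCopyOnWrite(v1, buf.Drain("public.users"))
	fmt.Println(SortedKeys(v1), SortedKeys(v2)) // prints [1 2] [1 3]
}
```

Keeping the old snapshot untouched mirrors why copy-on-write suits an Iceberg-style table: readers of the previous version are unaffected until new metadata pointing at the rewritten data is committed. A real implementation would rewrite only the affected Parquet files rather than a whole table.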