LmCast :: Stay tuned in

Durable Execution the Hard Way

Recorded: May 28, 2026, 11:01 p.m.


GitHub - hatchet-dev/durable-execution-the-hard-way: Set up a durable execution engine from scratch using Postgres with no dependencies. · GitHub

Durable execution, the hard way
Inspired by Kelsey Hightower's Kubernetes the hard way, we're going to build a durable execution engine from scratch using Go and Postgres.
Durable execution is a mechanism to incrementally checkpoint the state of a function as it makes progress, so that in the case of unexpected failure, the function can recover from where it left off. It's particularly relevant in newer stacks and projects implementing AI agents, which are long-running and stateful. A system which implements durable execution is often called a "workflow engine."
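To make the checkpointing idea concrete, here is a minimal sketch in Go. It is not code from the lessons: the `Log` type, the `Step` method, and the `workflow` function are hypothetical names, and an in-memory map stands in for the Postgres tables the guide builds. Each step's result is recorded under a key, so a second attempt skips steps that already completed.

```go
package main

import (
	"errors"
	"fmt"
)

// Log is a hypothetical in-memory stand-in for a durable event log.
// A real engine would persist these entries (this guide uses Postgres).
type Log struct {
	results map[string]string
}

func NewLog() *Log { return &Log{results: map[string]string{}} }

// Step runs fn at most once per key. If a result for key was already
// checkpointed, it is returned without calling fn again.
func (l *Log) Step(key string, fn func() (string, error)) (string, error) {
	if v, ok := l.results[key]; ok {
		return v, nil
	}
	v, err := fn()
	if err != nil {
		return "", err
	}
	l.results[key] = v // checkpoint the result
	return v, nil
}

// calls counts how many times each step body actually ran.
var calls = map[string]int{}

// workflow has two steps; failSecond simulates a crash in the second one.
func workflow(l *Log, failSecond bool) (string, error) {
	a, err := l.Step("fetch", func() (string, error) {
		calls["fetch"]++
		return "data", nil
	})
	if err != nil {
		return "", err
	}
	return l.Step("process", func() (string, error) {
		calls["process"]++
		if failSecond {
			return "", errors.New("simulated crash")
		}
		return a + "-processed", nil
	})
}

func main() {
	l := NewLog()
	_, err := workflow(l, true) // fails after "fetch" is checkpointed
	fmt.Println("first attempt error:", err)
	out, _ := workflow(l, false) // reuses the "fetch" checkpoint
	fmt.Println("result:", out)
	fmt.Println("fetch ran", calls["fetch"], "time(s)")
}
```

The second call to `workflow` re-enters the function from the top, but the "fetch" step returns its recorded result instead of running again. That is the sense in which the function "recovers from where it left off".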
This guide uses Go and templated SQL using sqlc. The only dependencies are:

Go 1.25+
Postgres (by default, created via Docker)
pgx

If you are interested in contributing support for other languages, please create a GitHub issue. I'll be sharing updates (new lessons, other languages) for this guide on Twitter if you'd like to follow along.
Target audience
You will benefit from this guide if you:

Want to understand how durable execution engines like Hatchet and Temporal work at a deeper level
Are implementing your own workflow engine and would like a simple starting point for your architecture

This guide expects that you understand the foundations of SQL databases, can read code, and are familiar with some minimal backend engineering concepts, such as queues. More advanced terminology will be introduced in each lesson.
For a motivating guide on durable execution, see the blog post How to think about durable execution.
Navigating lessons
Each directory in /lessons is set up with an identical structure:

A README.md file for navigating the lesson
A main.go file for running the example code produced by the lesson, which can be run via go run .
A sql directory which contains a schema.sql file, a queries.sql file, and some files for generating templated queries via sqlc

By the final lesson, we'll have a minimal but fully-working workflow engine. Note that these lessons are not focused on developer ergonomics: we'll be building the bare minimum to understand the fundamentals, but won't implement the typical niceties you'd see in a client SDK.
Lessons

Prerequisites
Simple task queue
Limiting concurrent tasks
Task queue improvements
Durable event log
Tracking non-determinism
Durable tasks

Opinions
This guide is a somewhat opinionated view on durable execution. Specifically, it implements:

Durable execution entirely in Postgres.
Two types of functions: durable tasks and regular tasks. These map directly to durable tasks and tasks in Hatchet, and are akin to Temporal workflows and activities.
Regular tasks invokable as standalone tasks, meaning this guide implements a simple Postgres-backed task queue as well in the first few lessons.
Multiple types of retries and replays, which are treated as distinct:

Retries will retry a durable task without resetting the event history (preserving the execution state of the function)
Replays will reset a durable task's execution history to start from scratch
Forking will reset a durable task's execution history at a given point in the execution history, effectively creating a "fork" of that task. This will be the subject of a future lesson
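The difference between the three operations can be sketched as transformations on an event history. This is an illustration only, not the schema or API used in the lessons: `Event`, `Run`, `Retry`, `Replay`, and `Fork` are hypothetical names, and modelling a fork as a new run that keeps the first `n` events is an assumption about how "resetting at a given point" might look.

```go
package main

import "fmt"

// Event is a hypothetical record of one completed step.
type Event struct {
	Step   string
	Result string
}

// Run is a hypothetical durable task run together with its event history.
type Run struct {
	ID      int
	History []Event
}

// Retry keeps the full history, so completed steps are not re-executed.
func Retry(r Run) Run {
	return Run{ID: r.ID, History: append([]Event(nil), r.History...)}
}

// Replay discards the history, so the task starts from scratch.
func Replay(r Run) Run {
	return Run{ID: r.ID}
}

// Fork keeps only the first n events and assigns a new ID (an assumption),
// so execution diverges from the original run after event n.
func Fork(r Run, newID, n int) Run {
	if n < 0 {
		n = 0
	}
	if n > len(r.History) {
		n = len(r.History)
	}
	return Run{ID: newID, History: append([]Event(nil), r.History[:n]...)}
}

func main() {
	r := Run{ID: 1, History: []Event{
		{Step: "a", Result: "1"},
		{Step: "b", Result: "2"},
		{Step: "c", Result: "3"},
	}}
	fmt.Println("retry keeps", len(Retry(r).History), "events")
	fmt.Println("replay keeps", len(Replay(r).History), "events")
	fmt.Println("fork at 2 keeps", len(Fork(r, 2, 2).History), "events")
}
```

Each function copies the slice rather than sharing it, so the original run's history is never mutated by a later retry or fork.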

Modifying lessons
You can modify the schema, queries, and code in each lesson to experiment. To regenerate the SQL files in each directory, run the following:
go run github.com/sqlc-dev/sqlc/cmd/sqlc generate --file sql/sqlc.yaml

Reporting issues
If you discover an error in the core logic of any lesson, please file a GitHub issue. We'd be happy to reward you with a baked good from a bakery near you (yes, we're serious). If a bakery isn't available, we'd be happy to send you a Hatchet tee or hat. If you understandably don't want more vendor swag, you'll have my eternal gratitude.
Use of AI
AI has not been used to write any prose in this guide. All mistakes and turns of phrase are my own. AI has been used to:

Verify that each lesson of this guide is independently runnable and instructions are easy to follow
Generate mermaid diagrams

Ideas for future lessons
If there's sufficient interest, I'd be happy to put together additional lessons, such as:

Using Postgres LISTEN/NOTIFY to speed up processing significantly
Durable sleep
Branching and forking the durable event log

License

MIT license


This project walks through building a durable execution engine from scratch in Go and PostgreSQL, keeping dependencies to a minimum so that the fundamentals stay visible. Durable execution is a mechanism for incrementally checkpointing the state of a function as it makes progress, so that after an unexpected failure the function can resume from the last recorded checkpoint. It is particularly relevant to long-running, stateful workloads such as AI agents. A system that implements durable execution is often called a workflow engine.

The implementation uses Go for application logic and templated SQL via sqlc for database access. Its only dependencies are Go 1.25+, Postgres (created via Docker by default), and the pgx driver. The guide is organized as a series of step-by-step lessons and assumes the reader understands the foundations of SQL databases, can read code, and is familiar with basic backend engineering concepts such as queues.

The engine distinguishes between two types of functions: durable tasks and regular tasks. These map directly to durable tasks and tasks in Hatchet, and are akin to workflows and activities in Temporal. Regular tasks can also be invoked as standalone tasks, so the first few lessons implement a simple Postgres-backed task queue.

The guide treats retries, replays, and forking as distinct operations, each handling the execution history differently. A retry re-runs a durable task without resetting its event history, preserving the function's execution state. A replay resets the durable task's execution history so that it starts from scratch. Forking resets the execution history at a given point, effectively creating a fork of the task; it is planned as the subject of a future lesson.

The lessons proceed in order: prerequisites, a simple task queue, limiting concurrent tasks, task queue improvements, a durable event log, tracking non-determinism, and finally durable tasks. The guide takes a somewhat opinionated approach, implementing durable execution entirely in Postgres. By the final lesson the result is a minimal but fully working workflow engine; the lessons prioritize fundamentals over the ergonomic niceties of a typical client SDK. Ideas for future lessons include using Postgres LISTEN/NOTIFY to speed up processing, durable sleep, and branching and forking the durable event log.