PostHog will train AI models with your data (opted-in by default)
Recorded: May 27, 2026, 5 p.m.
| Original | Summarized |
Training our own AI models - PostHog
By James Hawkins, May 27, 2026 (CEO diaries)

I really think we're on the verge of some of our best work through the next six months.

Over the past year, we've started building more AI-powered features into PostHog, like our AI installation wizard, PostHog AI, and our MCP. They're all wildly popular, but they're only the start.

PostHog's next chapter is about building more proactive, self-driving products. Products that surface answers and solutions for you, act on them, and improve over time. This is the vision for PostHog Code, which is now in beta. To enable this and more products like it, we want to try something new. We want to train models on data in PostHog.

What we want to build

We have two goals here:

- Make our existing products smarter, more proactive, and useful to you
- Build entirely new products, like PostHog Code, that help teams build better products, faster

The first area we're interested in is session replay analysis. PostHog AI can already detect issues in replays, but it's expensive and doesn't scale well. We want replays to be as powerful at scale as they are for diagnosing the problems of individual users, and we think a model trained on the underlying data that powers replays will help us achieve this.

Another idea I'm especially excited about is synthetic user testing – i.e. using our knowledge of user behavior to identify when users might get confused, or what flows might break, before you ship to production. As coding models improve, many people are seeing test and review workload increase hugely.
We want to automate this, so you can focus on your product.

And, if we can get better at predicting user behavior, we should be able to suggest changes that will improve conversion, and reduce user frustration, for features you've already shipped as well. If we can automate this work for you, you'll spend less time on manual analysis and burn fewer tokens in the process.

Our ideas here are experimental. It will take iteration to figure out how to train models effectively, and what data is actually useful. But, so far, every time we've added AI in a way that makes the product simpler or more powerful, it's worked well, so we think it's worth trying.

How this will work

We've spent a lot of time thinking about this from a user perspective, especially the tradeoffs.

The upside is the kinds of improvements described above. Most tools are focused on providing you with the best code; we want to focus our energy into making your product the best it can be. This is why we describe PostHog Code as a product editor.

The downside is that this involves using data in PostHog to train models.

Most companies would bury this change in a deceptively boring T&Cs update, but we value transparency, so here's what you need to know in an internet-friendly numbered list:

1. Users on our EU cloud instance are opted out by default
2. So too users with agreements that prevent training (e.g. BAA, MSA, or similar)
3. All other users on our US cloud instance are opted in by default
4. We will anonymize all data before it's used for training
5. We will only use data that already exists in your PostHog instance
6. We will do all the model training ourselves, which means...
7. We won't sell or send your data to third-party model providers
8. You can opt out at any time via your org settings in PostHog (admin access required)
9. Training won't start until June 29, so there's plenty of time to decide

In terms of comms, we are:

- Emailing all our customers and making it super obvious what the email is about
- Notifying all our users through in-app notifications (in case you don't read emails)
- Communicating our plans very publicly (like in this post)

I want to stress that our goal here is to improve PostHog as a product for our customers, not to expose or sell models trained on your data, or monetize your data.

Why this is opt out, not opt in

Put simply, because otherwise we will not have enough data to train a model that's actually useful.

If you choose to opt out, the new features that we're building with these models won't be available to you, as they'll depend on this data. If you're opted out by default (e.g. because you're on our EU cloud instance), you can choose to opt in manually provided any legal agreements you have with us don't exclude this option.

We're choosing to be upfront about this rather than quietly rolling something out, because we think that's the right way to do it.

If you want to talk about this, I'm james at you can guess it.

We're also hiring AI researchers, so get in touch if you want to work on this with us.

PostHog is an all-in-one developer platform for building successful products.
We provide product analytics, web analytics, session replay, error tracking, feature flags, experiments, surveys, AI Observability, logs, workflows, endpoints, data warehouse, CDP, and an AI product assistant to help debug your code, ship features faster, and keep all your usage and customer data in one stack. |
PostHog's next phase centers on more proactive, self-driving products that surface answers, act on them, and improve over time. To enable this, the company plans to train its own models on data in PostHog, with two goals: making existing products smarter and more proactive, and building entirely new products, such as PostHog Code (now in beta), that help teams build better products faster. The first area of interest is session replay analysis. PostHog AI can already detect issues in replays, but doing so is expensive and does not scale well, so the plan is to train a model on the underlying replay data and make replays as useful at scale as they are for diagnosing the problems of individual users. A second idea is synthetic user testing, which would use knowledge of user behavior to anticipate where users might get confused, or which flows might break, before a feature ships to production. As coding models improve, test and review workloads are growing sharply, and PostHog wants to automate that work so teams can focus on their product. Better prediction of user behavior should also let PostHog suggest changes that improve conversion and reduce user frustration for features that have already shipped. These ideas are experimental, and it will take iteration to work out how to train models effectively and which data is actually useful; the post justifies the effort by noting that earlier AI additions that made the product simpler or more powerful have worked well. Whereas most tools focus on producing the best code, PostHog wants to focus on making the customer's product as good as it can be, which is why it describes PostHog Code as a product editor. The post states the tradeoff plainly: the upside is these improvements, and the downside is that data in PostHog is used to train models.
Rather than bury the change in a terms-and-conditions update, the post lists the details openly. Users on the EU cloud instance are opted out by default, as are users with agreements that prevent training (for example a BAA or MSA); all other users on the US cloud instance are opted in by default. PostHog commits to anonymizing all data before training and to using only data that already exists in the customer's PostHog instance. All model training is done in-house, so PostHog will not sell or send customer data to third-party model providers. Users can opt out at any time through the org settings in PostHog, which requires admin access. Training will not start until June 29, giving users time to decide. PostHog is emailing all customers, sending in-app notifications to all users, and communicating the plans publicly. The stated goal is to improve PostHog as a product for its customers, not to expose or sell models trained on customer data or to monetize that data. The program is opt-out rather than opt-in because, according to the post, an opt-in approach would not yield enough data to train a genuinely useful model. Users who opt out will not get the new features built on these models, and users who are opted out by default can opt in manually provided their legal agreements with PostHog allow it. PostHog describes itself as an all-in-one developer platform offering product analytics, web analytics, session replay, error tracking, feature flags, experiments, surveys, AI Observability, logs, workflows, endpoints, a data warehouse, a CDP, and an AI product assistant. The company is also hiring AI researchers to work on this effort. |