LmCast :: Stay tuned in

Understanding ZFS Scrubs and Data Integrity

Recorded: Jan. 20, 2026, 10:03 a.m.

Original

January 14, 2026

ZFS scrubs are a core part of how ZFS ensures data integrity. By walking the entire pool, verifying every block against its checksum, and repairing minor corruption early, scrubs prevent silent data loss before it becomes catastrophic. This article explains how scrubs work, how to interpret zpool status output, and how regular scrubbing supports long-term reliability in production ZFS systems.
By Umair Khurshid, developer, open source contributor, and relentless homelab experimenter.

ZFS refuses to trust anything it cannot verify. Most filesystems assume that storage hardware will return the correct data unless it reports an error; ZFS makes no such assumption: every block must be proven correct. That difference matters, because silent corruption is one of the most dangerous failure modes in modern storage. By the time you notice it, the damage is already done. ZFS checks everything it stores, keeps the checksums inside the parent block pointers, and leans on redundancy to repair anything that does not match. Scrubs are a specialized patrol read that walks the entire pool and confirms that the data still matches the record of what should be there.

In this article, we will walk through what scrubs do, how the Merkle tree layout lets ZFS validate metadata and data from end to end, how redundancy ties into checksum repair, and why scrubs are not the same as resilvers.

What Are ZFS Scrubs?
A ZFS scrub is a pool-wide verification procedure that reads every allocated block of data and metadata and checks it against its stored checksum. This verification includes metadata blocks, user data blocks, and even the parity blocks ZFS stores to be able to recover from checksum errors. Many descriptions of scrubs incorrectly imply that only user data is checked; in fact, ZFS treats metadata with the same level of protection, and a scrub verifies both thoroughly.

During a scrub, ZFS walks the entire tree of block pointers that make up the dataset. ZFS is built around a Merkle tree structure in which each parent block contains block pointers for its children, and each block pointer contains the checksum for the block it references. The parent checksum therefore protects the child metadata. This recursive structure continues down to the physical blocks on disk. If a leaf block is corrupted, the mismatch propagates upward, making it impossible for corruption to hide. When a scrub reads a block, it recalculates the checksum from the data returned by the disk and compares the computed value to the checksum stored in the block pointer. If they match, ZFS can be sure the block is valid; if the values differ, the block is corrupt. ZFS then tries to repair the block using the available redundancy.

Scrubs differ significantly from traditional filesystem checks. Tools such as fsck or chkdsk examine logical structures and attempt to repair inconsistencies related to directory trees, allocation maps, reference counts, and other metadata relationships. ZFS does not need to perform these operations during normal scrubs because its transactional design ensures metadata consistency. Every transaction group moves the filesystem from one valid state to another. The scrub verifies the correctness of the data and metadata at the block level, not logical relationships.
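The recursive verification described above can be sketched in a few lines of Python. This is a deliberately simplified toy model, not ZFS code: real block pointers default to fletcher4 checksums inside on-disk structures, and each pointer protects the block it references (which in turn contains the pointers for its own children). SHA-256 and all class and function names here are stand-ins for illustration.

```python
import hashlib

def checksum(data: bytes) -> str:
    # ZFS defaults to fletcher4 for data; SHA-256 stands in for it here.
    return hashlib.sha256(data).hexdigest()

class BlockPointer:
    """Toy block pointer: the parent records the checksum its child must have."""
    def __init__(self, data: bytes):
        self.expected = checksum(data)
        self.children = []   # BlockPointers held inside this block
        self._data = data    # stands in for the block on disk

def verify(bp) -> bool:
    """Recompute the referenced block's checksum, then recurse into children.

    A corrupted leaf fails its own check, so the mismatch cannot hide."""
    if checksum(bp._data) != bp.expected:
        return False
    return all(verify(child) for child in bp.children)

root = BlockPointer(b"dataset metadata")
leaf = BlockPointer(b"user data")
root.children.append(leaf)
assert verify(root)

leaf._data = b"user dbta"    # simulate a silent bit flip on disk
assert not verify(root)      # the failure surfaces at the top of the tree
```

A scrub is essentially this walk performed over every allocated block in the pool, with a repair attempt whenever `verify` would return false.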
Checksums, Redundancy, and Self Healing

ZFS ensures block correctness through a combination of strong checksums and redundancy. Checksums detect corruption, and redundancy makes it possible to repair corruption. Both are necessary and neither alone is sufficient. HDDs typically have a BER (Bit Error Rate) of 1 in 10^15, meaning some incorrect data can be expected roughly every 100 TiB read. That used to be a lot, but it is only 3 or 4 full drive reads on modern large-capacity drives. Silent corruption is one of those problems you only notice after it has already done damage.

In mirrored configurations, ZFS reads from any of the copies. In a three-way mirror, ZFS can lose two copies and still recover the correct block from the third. Unlike legacy RAID mirrors, the checksum allows ZFS to determine which mirror copies are correct and which are corrupt, and to apply the relevant repair writes. Hardware RAID or mdraid would simply synchronize the two copies to make them identical again, possibly spreading the damage and destroying the remaining correct copies of the data. In RAID-Z configurations, ZFS reconstructs the block from parity and writes the repaired version back to disk. The repaired block is then available for future reads.

This behavior is the self-healing property of ZFS. When ZFS reads any block and detects data corruption, it automatically issues additional reads for other copies or parity to reconstruct the correct data, then writes it back to maintain integrity. Scrubs extend the same behavior across the entire pool, including the parity itself, which is not normally read. Scrubs ensure that blocks are corrected before corruption can accumulate to a dangerous level. Small errors appear naturally over time due to cosmic radiation, media decay, and mechanical issues. A system that never performs scrubs allows these small errors to accumulate. Once accumulated, they may exceed redundancy capacity and lead to data loss.
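The mirror repair path can be sketched as follows. Again this is an illustrative model under simplified assumptions (in-memory byte strings instead of disks, SHA-256 instead of fletcher4, invented function names), but it captures the two rules that distinguish ZFS from a blind mirror sync: the stored checksum decides which copy is correct, and if no copy verifies, an error is reported rather than bad data returned.

```python
import hashlib

def checksum(data: bytes) -> str:
    # Stand-in for the checksum kept in the parent block pointer.
    return hashlib.sha256(data).hexdigest()

def self_heal_read(copies: list[bytes], stored_checksum: str) -> list[bytes]:
    """Return the mirror with every bad copy repaired from a verified one.

    Raises if no copy verifies: like ZFS, report an error rather than
    silently serve (or propagate) corrupted data.
    """
    good = next((c for c in copies if checksum(c) == stored_checksum), None)
    if good is None:
        raise IOError("unrecoverable: no copy matches the stored checksum")
    # The rewrites below stand in for the repair writes a scrub would issue.
    return [c if checksum(c) == stored_checksum else good for c in copies]

block = b"important data"
mirror = [block, b"imp0rtant data", block]   # middle copy silently corrupted
assert self_heal_read(mirror, checksum(block)) == [block, block, block]
```

A checksum-less RAID mirror at this point could only pick one copy arbitrarily and sync the others to it, with a real chance of overwriting the good copy with the corrupt one.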
A system that scrubs regularly prevents the accumulation from ever reaching that threshold. ZFS follows a clear rule: if it does not have enough redundancy to rebuild the correct data, it reports an error instead of returning corrupted data. You see the errors listed in zpool status and can act accordingly. This combination of detection, repair, and strict failure behavior forms the foundation of ZFS reliability.

Interpreting zpool status

The zpool status command provides insight into scrub progress, scrub results, and pool health. A scrub report includes the number of blocks examined, the number of blocks repaired, the duration of the scrub, and the average scan rate. It also includes error counts for each device.

Repaired blocks indicate that ZFS found mismatched checksums and corrected the underlying data. A small number of repaired blocks is normal for large storage pools, and occasional bit rot is expected. A rising number of repaired blocks over multiple scrubs signals a problem.

Checksum errors that occur during normal operation are often more serious. If ZFS repairs the data using redundancy, the pool remains healthy, but the device involved should be examined carefully. A single device that repeatedly produces checksum errors often requires replacement.

The scan rate during a scrub is another value that administrators and storage engineers frequently examine. However, this value requires interpretation. Scrub progress estimates can be misleading because the scrub process goes through different phases that involve varying amounts (and sizes) of I/O. Early in the scrub, the estimate may promise an unrealistically short completion time. Later, it may predict an extremely long duration. Both extremes are normal. This is why it is better to focus on long-term patterns rather than instantaneous values. Historical scrub data is often more informative than a single scrub result.
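Because the long-term pattern matters more than any single scrub, a monitoring script might flag a sustained rise in repaired-block counts rather than alert on one value. A minimal sketch of that idea (the function name and the three-scrub window are arbitrary choices for illustration, not ZFS features):

```python
def repairs_trending_up(history: list[int], window: int = 3) -> bool:
    """True if repaired-block counts rose across each of the last `window` scrubs.

    A few repaired blocks are normal in a large pool; a monotonic rise over
    consecutive scrubs is the signal that warrants investigating hardware.
    """
    recent = history[-(window + 1):]
    if len(recent) < window + 1:
        return False          # not enough history to call it a trend
    return all(a < b for a, b in zip(recent, recent[1:]))

assert repairs_trending_up([0, 0, 1, 3, 9])       # rising three scrubs running
assert not repairs_trending_up([2, 0, 1, 0, 2])   # noise, not a trend
```

Feeding this from recorded scrub results (however you collect them) turns "watch the long-term pattern" into an automated check.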
A significant increase in duration may indicate fragmentation, device slowdowns, workload changes, or controller problems. Monitoring systems that record scrub durations can reveal trends that manual inspections overlook.

Below is an example of a zpool status report:

  pool: tank
 state: ONLINE
  scan: scrub in progress since Wed Dec 10 10:14:27 2025
        1.23T scanned at 1.38G/s, 842G issued, 31.2G repaired
        10 percent done, 1 days 02:13:48 to go
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda     ONLINE       0     0     5
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0

errors: No known data errors

In this report, the repaired count identifies mismatched blocks that ZFS corrected. Occasional repaired blocks are normal in large pools, but rising totals across scrubs indicate degrading hardware. The READ and WRITE counters indicate device-level I/O errors. Persistent values on a single device suggest cable problems, controller instability, or a failing disk. The CKSUM column indicates checksum mismatches detected during normal reads.

ZFS also provides the zpool events command, which records asynchronous pool events. These events capture issues like checksum errors, failed reads, or device removals, giving context to scrub results by showing which files or blocks were affected. ZFS then integrates with ZED (or zfsd on FreeBSD), which can take automatic action, such as replacing a failed disk with a spare or onlining a disk that is reattached. Additionally, these tools can be extended to allow the administrator to add their own responses to such events, such as triggering locate LEDs and sending notifications.
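To make the column meanings concrete, here is a small sketch that picks the per-device CKSUM counters out of a status report like the one above. This is purely illustrative: a real deployment would react through ZED rather than scrape text, and the name-based filtering of pool and vdev rows is a shortcut that only works for this fixed layout.

```python
import re

# Device table in the shape `zpool status` prints (shortened).
STATUS = """\
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda     ONLINE       0     0     5
            sdb     ONLINE       0     0     0
"""

def cksum_errors(status_text: str) -> dict[str, int]:
    """Map each leaf device to its CKSUM counter.

    A real script would walk the vdev tree; skipping the pool and vdev
    rows by name is enough for this fixed example.
    """
    errors = {}
    for line in status_text.splitlines():
        # name, state, then the READ / WRITE / CKSUM columns
        m = re.match(r"\s+(\S+)\s+\S+\s+(\d+)\s+(\d+)\s+(\d+)\s*$", line)
        if m and not m.group(1).startswith(("tank", "raidz", "mirror")):
            errors[m.group(1)] = int(m.group(4))
    return errors

suspect = {dev: n for dev, n in cksum_errors(STATUS).items() if n > 0}
assert suspect == {"sda": 5}   # sda's 5 checksum errors deserve a closer look
```

The same counters are available programmatically through libzfs and through ZED events, which is the sturdier path for automation.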
While zpool events does not report hardware-level occurrences such as controller resets, combining its output with scrub observations can help identify underlying data integrity or device problems.

Scrubs vs Resilvers

Scrubs and resilvers are both scanning operations, but they serve different purposes and exhibit different behaviours. A scrub verifies the integrity of blocks that already exist on disk. It reads all allocated blocks, verifies their checksums, and repairs blocks that do not match. It is a maintenance process designed to detect corruption.

A resilver rebuilds one or more devices after a failure or replacement. During a resilver, ZFS identifies which blocks need to be reconstructed on the replacement device. If there are multiple vdevs, a resilver can skip data on healthy vdevs to speed up the process, making it much faster than a scrub, whereas a scrub always reads all data, including the parity, to ensure it is correct.

The performance profile of scrubs differs from resilvers in focus rather than scope. Both traverse the entire metadata tree, but a resilver only processes blocks belonging to missing or replaced devices, skipping healthy data, while a scrub examines all allocated blocks in the pool to detect silent corruption.

Automation and Monitoring

Most systems that include ZFS schedule scrubs once per month. This frequency is appropriate for many environments, but high-churn systems may require more frequent scrubs. Archival systems that contain largely static data also benefit from more frequent scrubbing, because infrequently accessed data is the most vulnerable to silent corruption. Administrators can adjust scrub schedules through cron or through native periodic maintenance frameworks. When adjusting schedules, administrators must consider workload patterns, peak I/O periods, and the impact of scrubs on latency-sensitive applications.
Scrubs consume read bandwidth across all devices, so scheduling scrubs during low-activity periods often produces better performance and more predictable estimates. Monitoring systems should record scrub duration, repair counts, error counts, and device anomalies. ZED provides immediate notifications for pool events, including scrub completion and errors encountered during scrubs. Integrating ZED with email, Slack, or a ticketing system ensures that administrators receive timely alerts. When automation and monitoring are combined, scrubs become a predictable part of storage operations rather than an occasional task.

Best Practices

Administrators can significantly improve ZFS reliability by following consistent best practices related to scrubs, redundancy, monitoring, and hardware management.

Run Scrubs on a Regular Basis

Monthly scrubs are the most common, but every environment is different. No environment should go more than four months without a scrub. The goal is to prevent the accumulation of corruption rather than to react to it.

Respond to Scrub Findings Immediately

Even small increases in repaired counts should be investigated. Repeated checksum errors on a device indicate instability, and devices that produce repeated errors should be replaced before they cause degraded performance or unrecoverable damage.

Maintain Hardware Properly

Faulty or loose cables cause checksum errors that resemble disk failure, but they are only one category of problem. I have also run into controller firmware bugs that caused intermittent errors even when the cabling was solid. Similarly, drive trays with vibration issues can degrade device performance. You should verify hardware health regularly, especially when repeated scrub anomalies appear.

Maintain Separate Backups

Although ZFS provides strong protection against many forms of data loss, it is not a substitute for backups and proper disaster recovery policies.
Scrubs cannot repair data that was written in a corrupted state or data that was overwritten accidentally.

Wrapping Up

ZFS scrubs conduct complete, block-level verification of the entire pool. They validate checksums, repair corruption through redundancy, and warn you about developing problems. A well-planned ZFS deployment includes scrub automation, monitoring, redundancy planning, hardware validation, and operational discipline. This combination ensures predictable behavior across years of service.

Klara supports countless organizations that rely on ZFS for production workloads. Our team assists with designing scrub policies, pool architecture, and long-term scaling strategies. With Klara's ZFS Storage Design service, work with ZFS engineers to ensure you make the right choices and are well served by your storage in the long term.

Topics / Tags: health checks, ZFS, scrub

Summarized
ZFS scrubs are a critical mechanism for ensuring data integrity in ZFS storage systems, designed to proactively detect and repair silent corruption that may go unnoticed by traditional filesystems. Unlike most filesystems, which assume storage hardware reliably returns correct data unless it explicitly reports an error, ZFS operates under the principle that every block must be verified for correctness. This approach is essential because silent data corruption—where data becomes altered without the system detecting it—can lead to irreversible damage before it is noticed. ZFS mitigates this risk by embedding checksums within block pointers and using redundancy to repair discrepancies. Scrubs are a specialized form of "patrol read" that systematically verifies the entire storage pool by reading every allocated block, including metadata and parity data, and cross-checking it against its stored checksum. This process not only identifies corrupted blocks but also leverages redundancy to repair them before the corruption escalates into data loss. The article emphasizes that scrubs are not merely a routine check but an essential maintenance activity that underpins the reliability of ZFS in production environments.

At the core of ZFS’s data validation is its Merkle tree architecture, which organizes block pointers recursively to ensure end-to-end integrity. Each parent block contains checksums for its child blocks, creating a hierarchical structure that propagates errors upward if any corruption is detected. During a scrub, ZFS traverses this Merkle tree, recalculating checksums from the data read and comparing them to stored values. If a mismatch occurs, ZFS initiates repairs using available redundancy, such as mirroring or RAID-Z parity. This self-healing capability is a defining feature of ZFS, as it enables the system to automatically correct errors without human intervention. However, scrubs differ fundamentally from traditional filesystem checks like fsck or chkdsk, which focus on logical structures such as directory trees and allocation maps. ZFS’s transactional design ensures metadata consistency, so scrubs primarily address block-level correctness rather than logical inconsistencies. This distinction underscores the efficiency of ZFS’s approach, as it minimizes overhead while maximizing reliability through targeted verification.

The article highlights the interplay between checksums, redundancy, and automated repair mechanisms. For example, in mirrored configurations, ZFS can recover from corruption by reading from any of the available copies and using checksums to identify which mirror is correct. In RAID-Z setups, parity data is used to reconstruct corrupted blocks, a process that requires significant I/O but ensures data integrity. However, the article warns that both checksums and redundancy are indispensable—checksums alone cannot repair errors, while redundancy without verification would fail to detect corruption. The balance between these elements is critical, as silent data corruption can still occur if either component is lacking. The text also notes that modern storage devices, such as HDDs with a Bit Error Rate (BER) of 1 in 10¹⁵, are prone to accumulating small errors over time. Without regular scrubs, these minor issues can escalate beyond the capacity of redundancy to correct them, leading to data loss. This is why ZFS enforces a strict policy: if it cannot rebuild correct data from redundancy, it will report an error rather than risk returning corrupted information.

Interpreting the output of `zpool status` is a key skill for administrators managing ZFS pools. The command provides detailed insights into scrub progress, including the number of blocks scanned, repaired, and the duration of the operation. The "repaired" count indicates how many blocks ZFS corrected during the scrub, with a small number typically acceptable in large pools due to normal wear and tear. However, an increasing trend across multiple scrubs suggests hardware degradation, such as failing drives or faulty cables. The "READ" and "WRITE" counters track I/O errors at the device level, with persistent values on a single drive often pointing to issues like unstable controllers or failing disks. The "CKSUM" column, meanwhile, reflects checksum mismatches detected during normal reads, which may indicate uncorrected errors that require investigation. The article also mentions the `zpool events` command, which logs asynchronous pool events such as checksum errors or device failures. Integrating this with ZED (ZFS Event Daemon) or zfsd on FreeBSD allows for automated responses, such as replacing failed drives with spares or triggering alerts. While `zpool events` does not capture hardware-level occurrences like controller resets, combining its data with scrub results can help identify underlying issues.

A critical distinction in ZFS operations is the difference between scrubs and resilvers. While both involve scanning data, they serve distinct purposes. A scrub verifies the integrity of existing blocks by checking their checksums and repairing any mismatches, acting as a proactive maintenance task. In contrast, a resilver rebuilds data on a replacement or failed device, reconstructing missing blocks from redundancy. Resilvers focus on restoring functionality after a failure, whereas scrubs are about detecting and correcting corruption before it becomes critical. The performance characteristics of these processes also differ: resilvers typically target only the affected devices, skipping healthy data to accelerate the process, while scrubs examine all allocated blocks, including parity, to ensure comprehensive validation. This distinction is crucial for understanding how each operation impacts system performance and resource usage, with resilvers often being faster but scrubs more thorough.

Automation and monitoring are essential for effective ZFS management, as regular scrubs require careful scheduling to minimize disruptions. The article recommends monthly scrubs for most environments, though high-activity systems or archival storage may necessitate more frequent checks. Scheduling scrubs during low I/O periods helps reduce latency impacts, as the process consumes read bandwidth across all devices. Tools like `cron` or native maintenance frameworks allow administrators to automate this, but they must also account for workload patterns and peak usage times. Monitoring systems should track metrics such as scrub duration, repair counts, error rates, and device anomalies to identify trends that manual inspections might miss. ZED provides real-time notifications for pool events, including scrub completion and errors encountered during scans, which can be integrated with email, Slack, or ticketing systems for timely alerts. When combined with logging and analysis tools, automation transforms scrubs from occasional tasks into predictable, routine operations that enhance system reliability.

Best practices for ZFS scrub management emphasize consistency and vigilance. Administrators are advised to run scrubs at least once every four months, as longer intervals increase the risk of undetected corruption. Any rise in repair counts should prompt immediate investigation, as persistent checksum errors on a device often signal hardware failure. Maintaining physical infrastructure is equally important: faulty cables, unstable controllers, or vibration issues in drive trays can generate errors that mimic disk failure. Regular hardware health checks, especially when anomalies are detected, help prevent cascading failures. Despite ZFS’s robust protection mechanisms, the article stresses that it is not a substitute for traditional backups. Scrubs cannot recover data written in a corrupted state or overwritten accidentally, making offsite backups and disaster recovery plans indispensable.

In conclusion, ZFS scrubs are a foundational component of data integrity in modern storage systems, combining advanced checksumming, redundancy, and automated repair to safeguard against silent corruption. By systematically verifying every block and leveraging metadata structures like Merkle trees, ZFS ensures that errors are detected and corrected before they escalate. The interplay between scrubs, resilvers, and monitoring tools creates a resilient framework that supports long-term reliability. However, effective implementation requires adherence to best practices, including regular scheduling, hardware maintenance, and integration with monitoring systems. For organizations relying on ZFS for critical workloads, these measures are not just beneficial but essential for maintaining data trustworthiness. Klara Systems’ expertise in ZFS development and support underscores the importance of these practices, offering tailored solutions for designing scrub policies, optimizing pool architecture, and ensuring scalable storage strategies.