Kore: Binary File Format Optimized for Modern Data Systems (Open Source)
Recorded: May 30, 2026, 10:02 p.m.
| Original | Summarized |
GitHub - arunkatherashala/Kore (public repository)
Branch: release/v0.1.0 (93 commits)
Folders: .github/workflows, .venv, build/lib/language-bindings, cloud-connectors, hadoop, kore-binary-parser, kore_fileformat.egg-info, kore_fileformat, kore_fileformat_killer, language-bindings, python, query-optimization, rust-bindings, spark-scala, src, tools
Files: .GITHUB_ACTIONS_SETUP.md, .gitignore, .yamllint, ALL_PHASES_COMPLETE.md, CHANGELOG.md, COMPILATION_REPORT.md, Cargo.lock, Cargo.toml, DELIVERABLES.md, DEPLOYMENT_COMPLETE.md, DEPLOYMENT_GUIDE.md, DEPLOYMENT_MANIFEST.md, DETAILED_SETUP.md, Dockerfile, FINAL_TEST_REPORT.md, IMPLEMENTATION_WAVE_2_COMPLETE.md, INSTALL_MISSING_TOOLS.md, LICENSE, MANIFEST.in, MAVEN_CENTRAL_GPG_SETUP.md, PHASE2_IMPLEMENTATION.md, PHASES_2_7_PARALLEL_IMPLEMENTATION.md, PHASES_STATUS.md, PRODUCTION_STATUS.md, QUICK_GPG_SETUP.md, QUICK_START.md, README.md, README_FINAL.md, RELEASE_NOTES.md, SETUP_DOCKER_MAVEN.md, SPARK_HADOOP_INTEGRATION_PLAN.md, SPARK_INTEGRATION_DELIVERY.md, TEST_RESULTS.md, cargo_build_output.txt, deploy_all_platforms.ps1, docker_log.txt, docker_log2.txt, docker_log3.txt, docker_log4.txt, generate-gpg-key.ps1, generate-gpg-key.sh, ghcr_log.txt, "h .kore_fileformat_source_v0.1.0.zip -Algorithm SHA256", "h .kore_fileformat_source_v0.1.0.zip -Algorithm SHA256).Hash", inspect_wheel.py, integration_tests.ps1, killer_err.txt, killer_out.txt, kore-private-key.asc, kore-public-key.asc, kore_builtin_regression.killer, kore_builtin_regression_test.kore, kore_fileformat.killer, kore_fileformat_copy_test.kore, kore_fileformat_test.kore, latest_maven_log.txt, maven_full_log.txt, maven_log.txt, pyproject.toml, runlog.txt, sample_10mb.csv, sample_builtin_out.kore, temp_log.txt, temp_run.txt, test_all_phases.ps1, test_suite.ps1, warnings_build.txt, warnings_build_utf8.txt, workflow_log.txt
README: 🚀 Kore: Killer Optimized Record Exchange
38% compression ratio (vs 63% for Parquet)
Quick Start
// Write data
// Read data
// Read specific column
// Get file info
spark = SparkSession.builder.appName("KoreExample").getOrCreate()
# Read Kore file
# Write to Kore (38% compression!)
# Spark SQL support (3.5+)
Ensure Cargo.toml metadata is correct (authors, repository, keywords).
Notes
Releases: v1.2.9 (Latest). Languages: Python, Makefile, Rust, PowerShell, Java, Scala, Other. |
Kore is presented as a high-performance binary file format for analytical big data workloads. Its claimed advantages are better compression and faster queries. The README reports a 38 percent compression ratio, against 63 percent for Parquet. It also claims query speedups of up to 131 times through column pruning and predicate pushdown, and zero data loss verified across more than four hundred thousand cells.
Kore emphasizes integration with the big data ecosystem. It integrates natively with Apache Spark: Kore files can be read and written directly from PySpark, with Spark SQL support from version 3.5 onward. Dedicated Python interfaces, KoreDataFrameReader and KoreDataFrameWriter, move DataFrames into and out of Kore within Spark applications. A Rust library exposes the basic file operations: writing data, reading data, reading a specific column, and retrieving file metadata.
The repository also contains build reports, test results, a Dockerfile, and deployment guides alongside the source code, which suggests attention to compilation, testing, and deployment. |
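The summary credits Kore's query speedups to column pruning and predicate pushdown. The sketch below shows the general idea of both techniques on a toy in-memory columnar layout. It does not use Kore's API or file layout: Chunk, scan, and the sample data are hypothetical names invented for this illustration, and the min/max statistics are computed on the fly, where a real columnar format would normally store them in file metadata.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical row group: column name -> list of values, stored by column."""
    columns: dict

    def stats(self, name):
        # Min/max statistics for one column. Computed on the fly here; a real
        # columnar format would typically read these from stored metadata.
        values = self.columns[name]
        return min(values), max(values)


def scan(chunks, select, where_col, min_value):
    """Return tuples of the `select` columns for rows where where_col >= min_value.

    Also returns the number of cell values loaded, to make the savings visible.
    """
    rows, cells_read = [], 0
    for chunk in chunks:
        # Predicate pushdown: skip the whole chunk if its maximum cannot match.
        _, hi = chunk.stats(where_col)
        if hi < min_value:
            continue
        # Column pruning: load only the filter column and the selected columns.
        needed = {where_col, *select}
        loaded = {name: chunk.columns[name] for name in needed}
        cells_read += sum(len(values) for values in loaded.values())
        for i, value in enumerate(loaded[where_col]):
            if value >= min_value:
                rows.append(tuple(loaded[name][i] for name in select))
    return rows, cells_read


# Illustrative data only: two chunks of three rows and three columns each.
chunks = [
    Chunk({"id": [1, 2, 3], "price": [5, 7, 9], "note": ["a", "b", "c"]}),
    Chunk({"id": [4, 5, 6], "price": [50, 70, 90], "note": ["d", "e", "f"]}),
]
rows, cells_read = scan(chunks, select=["id"], where_col="price", min_value=60)
total_cells = sum(len(values) for c in chunks for values in c.columns.values())
print(rows, cells_read, total_cells)
```

In this toy run the first chunk is skipped entirely because its largest price is below the threshold, and the "note" column is never loaded, so the scan touches 6 of the 18 stored cells. How much a real file format gains from these techniques depends on the data layout and the query, so the figures here say nothing about Kore's reported numbers.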