Kore: Binary File Format Optimized for Modern Data Systems (Open Source)

Recorded: May 30, 2026, 10:02 p.m.

GitHub - arunkatherashala/Kore · GitHub

arunkatherashala/Kore

Branch: release/v0.1.0 (93 commits)

Folders:
.github/workflows, .venv, build/lib/language-bindings, cloud-connectors, hadoop, kore-binary-parser, kore_fileformat.egg-info, kore_fileformat, kore_fileformat_killer, language-bindings, python, query-optimization, rust-bindings, spark-scala, src, tools

Files:
.GITHUB_ACTIONS_SETUP.md, .gitignore, .yamllint, ALL_PHASES_COMPLETE.md, CHANGELOG.md, COMPILATION_REPORT.md, Cargo.lock, Cargo.toml, DELIVERABLES.md, DEPLOYMENT_COMPLETE.md, DEPLOYMENT_GUIDE.md, DEPLOYMENT_MANIFEST.md, DETAILED_SETUP.md, Dockerfile, FINAL_TEST_REPORT.md, IMPLEMENTATION_WAVE_2_COMPLETE.md, INSTALL_MISSING_TOOLS.md, LICENSE, MANIFEST.in, MAVEN_CENTRAL_GPG_SETUP.md, PHASE2_IMPLEMENTATION.md, PHASES_2_7_PARALLEL_IMPLEMENTATION.md, PHASES_STATUS.md, PRODUCTION_STATUS.md, QUICK_GPG_SETUP.md, QUICK_START.md, README.md, README_FINAL.md, RELEASE_NOTES.md, SETUP_DOCKER_MAVEN.md, SPARK_HADOOP_INTEGRATION_PLAN.md, SPARK_INTEGRATION_DELIVERY.md, TEST_RESULTS.md, cargo_build_output.txt, deploy_all_platforms.ps1, docker_log.txt, docker_log2.txt, docker_log3.txt, docker_log4.txt, generate-gpg-key.ps1, generate-gpg-key.sh, ghcr_log.txt, "h .kore_fileformat_source_v0.1.0.zip -Algorithm SHA256", "h .kore_fileformat_source_v0.1.0.zip -Algorithm SHA256).Hash", inspect_wheel.py, integration_tests.ps1, killer_err.txt, killer_out.txt, kore-private-key.asc, kore-public-key.asc, kore_builtin_regression.killer, kore_builtin_regression_test.kore, kore_fileformat.killer, kore_fileformat_copy_test.kore, kore_fileformat_test.kore, latest_maven_log.txt, maven_full_log.txt, maven_log.txt, pyproject.toml, runlog.txt, sample_10mb.csv, sample_builtin_out.kore, temp_log.txt, temp_run.txt, test_all_phases.ps1, test_suite.ps1, warnings_build.txt, warnings_build_utf8.txt, workflow_log.txt

README

🚀 Kore: Killer Optimized Record Exchange
The fastest, most compressed columnar format for big data | v0.1.0
KORE is a high-performance binary file format optimized for analytical workloads. It provides:

38% compression ratio (vs 63% for Parquet)
131x query speedup with column pruning & predicate pushdown
Zero data loss verification (400K+ cells tested)
Native Spark integration — read/write with PySpark
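
Column pruning and predicate pushdown, which the list above credits for the query speedup, are general columnar-format techniques. The following is a minimal Python sketch of the idea, not Kore's implementation: Chunk, make_chunk, and scan are hypothetical names, and the per-chunk min/max statistics are an assumption about how a columnar reader can skip data.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One group of rows stored column by column, with min/max stats per column."""
    columns: dict  # column name -> list of values
    stats: dict    # column name -> (min, max)


def make_chunk(rows, names):
    """Build a hypothetical columnar chunk from a list of row dicts."""
    columns = {n: [r[n] for r in rows] for n in names}
    stats = {n: (min(v), max(v)) for n, v in columns.items()}
    return Chunk(columns, stats)


def scan(chunks, select, filter_col, lower_bound):
    """Return the selected columns for rows where filter_col >= lower_bound."""
    out = []
    skipped = 0
    for chunk in chunks:
        # Predicate pushdown: skip the whole chunk if its max value cannot match.
        if chunk.stats[filter_col][1] < lower_bound:
            skipped += 1
            continue
        # Column pruning: touch only the filter column and the selected columns.
        for i, value in enumerate(chunk.columns[filter_col]):
            if value >= lower_bound:
                out.append({name: chunk.columns[name][i] for name in select})
    return out, skipped


names = ["id", "price", "note"]
rows = [{"id": i, "price": i * 10, "note": f"row {i}"} for i in range(8)]
chunks = [make_chunk(rows[i:i + 4], names) for i in range(0, 8, 4)]

result, skipped = scan(chunks, select=["id"], filter_col="price", lower_bound=60)
print(result, skipped)  # [{'id': 6}, {'id': 7}] 1
```

In this sketch the first chunk is never read row by row because its maximum price is below the bound, and the "note" column is never touched at all. How much a real file gains from either technique depends on the data layout and the query.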

Quick Start
Rust Library
Add this crate as a dependency (when published) or include from path:
use kore_fileformat::*;

// Write data
kore_write_simple("output.kore", schema_json, data_json)?;

// Read data
let data = kore_read_simple("output.kore")?;

// Read specific column
let col = kore_read_col_simple("output.kore", "column_name")?;

// Get file info
let info = kore_info_simple("output.kore")?;
PySpark Integration ⭐ NEW
from pyspark.sql import SparkSession
from kore import KoreDataFrameReader, KoreDataFrameWriter

spark = SparkSession.builder.appName("KoreExample").getOrCreate()

# Read Kore file
df = KoreDataFrameReader(spark).load("data.kore")

# Write to Kore (38% compression!)
KoreDataFrameWriter(df).mode("overwrite").save("output.kore")

# Spark SQL support (3.5+)
spark.read.format("kore").load("file.kore").show()
See python/README.md for full PySpark documentation.
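
A zero-data-loss claim like the one in the feature list is usually backed by a round-trip check: write a table, read it back, and compare every cell. The sketch below shows one way to do that in plain Python. It is an assumption about the kind of check involved, not the project's test code, and count_mismatched_cells is a hypothetical helper.

```python
def count_mismatched_cells(before, after):
    """Compare two tables (lists of row dicts) cell by cell.

    Returns (cells_checked, mismatches). Raises ValueError if the row
    counts differ, since rows can then no longer be paired up.
    """
    if len(before) != len(after):
        raise ValueError(f"row count changed: {len(before)} -> {len(after)}")
    checked = 0
    mismatches = 0
    for row_a, row_b in zip(before, after):
        # Use the union of keys so a dropped or added column counts as a mismatch.
        for key in row_a.keys() | row_b.keys():
            checked += 1
            if key not in row_a or key not in row_b or row_a[key] != row_b[key]:
                mismatches += 1
    return checked, mismatches


original = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
round_tripped = [{"id": 1, "name": "a"}, {"id": 2, "name": "B"}]

print(count_mismatched_cells(original, original))       # (4, 0)
print(count_mismatched_cells(original, round_tripped))  # (4, 1)
```

With Spark, the row dicts could come from df.collect() and Row.asDict(). Spark does not guarantee row order on read, so both sides would need to be sorted on a key column before comparing.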
Publishing checklist

Ensure Cargo.toml metadata is correct (authors, repository, keywords).
Add LICENSE file if required (MIT by default here).
Replace any unimplemented!() stubs with full implementations if you need runtime functionality.
Run cargo build --release and cargo test to verify compilation and tests.
Optionally add CI configuration (GitHub Actions) for cargo test and cargo clippy.
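
The third checklist item, replacing unimplemented!() stubs, is easier to act on with a list of where the stubs are. Here is a small Python sketch that scans Rust sources for them. The helper names are hypothetical, the todo!() macro is included as an assumption (the README only mentions unimplemented!()), and the comment handling is deliberately naive.

```python
import re
from pathlib import Path

# Matches the Rust stub macros unimplemented!( and todo!( .
STUB_PATTERN = re.compile(r"\b(unimplemented|todo)!\s*\(")


def find_stubs(source: str):
    """Return (line_number, line_text) pairs for lines that contain a stub macro."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Ignore line comments. Naive: this also cuts at "//" inside string literals.
        code = line.split("//", 1)[0]
        if STUB_PATTERN.search(code):
            hits.append((number, line.strip()))
    return hits


def scan_tree(root: str):
    """Scan every .rs file under root; return {path: hits} for files with stubs."""
    report = {}
    for path in Path(root).rglob("*.rs"):
        hits = find_stubs(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            report[str(path)] = hits
    return report


sample = """fn read() -> Vec<u8> {
    unimplemented!()
}
// todo!() in a comment is ignored
fn write() { todo!("later") }
"""
print(find_stubs(sample))
```

Running scan_tree("src") from the repository root would report each stubbed line per file. An empty result is a reasonable precondition before cargo build --release and cargo test.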

Notes
This workspace contains copies of the original KORE source files. Some long implementations were stubbed out in this initial export; if you want the full original source code included verbatim, I can replace the stubs with the complete implementations from the upstream project files.

About

Stars: 9
Watchers: 0
Forks: 1
Releases: 47 (latest: Release v1.2.9, May 29, 2026)
Languages: Python 90.7%, Makefile 4.1%, Rust 3.0%, PowerShell 1.3%, Java 0.4%, Scala 0.2%, Other 0.3%


Summary

Kore is presented as a high-performance binary columnar file format for analytical workloads on big data. Its README makes three headline claims: a 38% compression ratio (versus 63% for Parquet), a 131x query speedup from column pruning and predicate pushdown, and zero-data-loss verification tested on more than 400K cells.
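
The README does not define "compression ratio". A minimal worked example, assuming it means compressed size divided by uncompressed size (so lower is better), shows what the two quoted figures imply relative to each other. relative_size is a hypothetical helper, and only the 0.38 and 0.63 values come from the README.

```python
def relative_size(ratio_a: float, ratio_b: float) -> float:
    """Size of output A as a fraction of output B, for the same input data."""
    return ratio_a / ratio_b


kore_ratio = 0.38     # from the README
parquet_ratio = 0.63  # from the README

rel = relative_size(kore_ratio, parquet_ratio)
print(f"Kore output is {rel:.1%} of the Parquet output size")  # 60.3%
print(f"That is {1 - rel:.1%} smaller than Parquet")           # 39.7%
```

Under this assumption, a Kore file would be roughly 40% smaller than the Parquet file for the same data. If the README instead means space saved, the comparison would run the other way, so the definition matters.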

Kore emphasizes integration with the big data ecosystem, Apache Spark in particular. The README describes native Spark integration: Kore files can be read and written directly from PySpark, and the Spark SQL data source path, spark.read.format("kore"), is listed as available on Spark 3.5+.

The format also exposes a programming interface. The README's Rust example covers four basic operations: writing data (kore_write_simple), reading a whole file (kore_read_simple), reading a single column (kore_read_col_simple), and retrieving file information (kore_info_simple). The README notes that some long implementations were stubbed out in this export, so some functionality may need the stubs replaced before it works at runtime.

On the Python side, the KoreDataFrameReader and KoreDataFrameWriter classes move DataFrames into and out of Kore files within Spark applications. The repository also contains build reports, test results, deployment guides, and a Dockerfile, alongside source directories for Rust bindings, Spark/Scala, Hadoop, and cloud connectors, which suggests the project aims to cover development, testing, and deployment of the format.