LmCast :: Stay tuned in

Kore: Binary File Format Optimized for Modern Data Systems (Open Source)

Recorded: May 30, 2026, 10:02 p.m.

Original Summarized

GitHub - arunkatherashala/Kore · GitHub

Skip to content

Navigation Menu

Toggle navigation

Sign in

Appearance settings

PlatformAI CODE CREATIONGitHub CopilotWrite better code with AIGitHub SparkBuild and deploy intelligent appsGitHub ModelsManage and compare promptsMCP RegistryNewIntegrate external toolsDEVELOPER WORKFLOWSActionsAutomate any workflowCodespacesInstant dev environmentsIssuesPlan and track workCode ReviewManage code changesAPPLICATION SECURITYGitHub Advanced SecurityFind and fix vulnerabilitiesCode securitySecure your code as you buildSecret protectionStop leaks before they startEXPLOREWhy GitHubDocumentationBlogChangelogMarketplaceView all featuresSolutionsBY COMPANY SIZEEnterprisesSmall and medium teamsStartupsNonprofitsBY USE CASEApp ModernizationDevSecOpsDevOpsCI/CDView all use casesBY INDUSTRYHealthcareFinancial servicesManufacturingGovernmentView all industriesView all solutionsResourcesEXPLORE BY TOPICAISoftware DevelopmentDevOpsSecurityView all topicsEXPLORE BY TYPECustomer storiesEvents & webinarsEbooks & reportsBusiness insightsGitHub SkillsSUPPORT & SERVICESDocumentationCustomer supportCommunity forumTrust centerPartnersView all resourcesOpen SourceCOMMUNITYGitHub SponsorsFund open source developersPROGRAMSSecurity LabMaintainer CommunityAcceleratorGitHub StarsArchive ProgramREPOSITORIESTopicsTrendingCollectionsEnterpriseENTERPRISE SOLUTIONSEnterprise platformAI-powered developer platformAVAILABLE ADD-ONSGitHub Advanced SecurityEnterprise-grade security featuresCopilot for BusinessEnterprise-grade AI featuresPremium SupportEnterprise-grade 24/7 supportPricing

Search or jump to...

Search code, repositories, users, issues, pull requests...

Search

Clear

Search syntax tips

Provide feedback


We read every piece of feedback, and take your input very seriously.

Include my email address so I can be contacted

Cancel

Submit feedback

Saved searches

Use saved searches to filter your results more quickly

Name

Query

To see all available qualifiers, see our documentation.

Cancel

Create saved search

Sign in

Sign up

Appearance settings

Resetting focus

You signed in with another tab or window. Reload to refresh your session.
You signed out in another tab or window. Reload to refresh your session.
You switched accounts on another tab or window. Reload to refresh your session.

Dismiss alert

arunkatherashala

/

Kore

Public

Notifications
You must be signed in to change notification settings

Fork
1

Star
9

Code

Issues
0

Pull requests
0

Actions

Projects

Security and quality
0

Insights

Additional navigation options

Code

Issues

Pull requests

Actions

Projects

Security and quality

Insights


arunkatherashala/Kore

 release/v0.1.0BranchesTagsGo to fileCodeOpen more actions menuFolders and filesNameNameLast commit messageLast commit dateLatest commit History93 Commits93 Commits.github/workflows.github/workflows  .venv.venv  build/lib/language-bindingsbuild/lib/language-bindings  cloud-connectorscloud-connectors  hadoophadoop  kore-binary-parserkore-binary-parser  kore_fileformat.egg-infokore_fileformat.egg-info  kore_fileformatkore_fileformat  kore_fileformat_killerkore_fileformat_killer  language-bindingslanguage-bindings  pythonpython  query-optimizationquery-optimization  rust-bindingsrust-bindings  spark-scalaspark-scala  srcsrc  toolstools  .GITHUB_ACTIONS_SETUP.md.GITHUB_ACTIONS_SETUP.md  .gitignore.gitignore  .yamllint.yamllint  ALL_PHASES_COMPLETE.mdALL_PHASES_COMPLETE.md  CHANGELOG.mdCHANGELOG.md  COMPILATION_REPORT.mdCOMPILATION_REPORT.md  Cargo.lockCargo.lock  Cargo.tomlCargo.toml  DELIVERABLES.mdDELIVERABLES.md  DEPLOYMENT_COMPLETE.mdDEPLOYMENT_COMPLETE.md  DEPLOYMENT_GUIDE.mdDEPLOYMENT_GUIDE.md  DEPLOYMENT_MANIFEST.mdDEPLOYMENT_MANIFEST.md  DETAILED_SETUP.mdDETAILED_SETUP.md  DockerfileDockerfile  FINAL_TEST_REPORT.mdFINAL_TEST_REPORT.md  IMPLEMENTATION_WAVE_2_COMPLETE.mdIMPLEMENTATION_WAVE_2_COMPLETE.md  INSTALL_MISSING_TOOLS.mdINSTALL_MISSING_TOOLS.md  LICENSELICENSE  MANIFEST.inMANIFEST.in  MAVEN_CENTRAL_GPG_SETUP.mdMAVEN_CENTRAL_GPG_SETUP.md  PHASE2_IMPLEMENTATION.mdPHASE2_IMPLEMENTATION.md  PHASES_2_7_PARALLEL_IMPLEMENTATION.mdPHASES_2_7_PARALLEL_IMPLEMENTATION.md  PHASES_STATUS.mdPHASES_STATUS.md  PRODUCTION_STATUS.mdPRODUCTION_STATUS.md  QUICK_GPG_SETUP.mdQUICK_GPG_SETUP.md  QUICK_START.mdQUICK_START.md  README.mdREADME.md  README_FINAL.mdREADME_FINAL.md  RELEASE_NOTES.mdRELEASE_NOTES.md  SETUP_DOCKER_MAVEN.mdSETUP_DOCKER_MAVEN.md  SPARK_HADOOP_INTEGRATION_PLAN.mdSPARK_HADOOP_INTEGRATION_PLAN.md  SPARK_INTEGRATION_DELIVERY.mdSPARK_INTEGRATION_DELIVERY.md  TEST_RESULTS.mdTEST_RESULTS.md  cargo_build_output.txtcargo_build_output.txt  deploy_all_platforms.ps1deploy_all_platforms.ps1  docker_log.txtdocker_log.txt  docker_log2.txtdocker_log2.txt  docker_log3.txtdocker_log3.txt  docker_log4.txtdocker_log4.txt  generate-gpg-key.ps1generate-gpg-key.ps1  generate-gpg-key.shgenerate-gpg-key.sh  ghcr_log.txtghcr_log.txt  h .kore_fileformat_source_v0.1.0.zip -Algorithm SHA256h .kore_fileformat_source_v0.1.0.zip -Algorithm SHA256  h .kore_fileformat_source_v0.1.0.zip -Algorithm SHA256).Hashh .kore_fileformat_source_v0.1.0.zip -Algorithm SHA256).Hash  inspect_wheel.pyinspect_wheel.py  integration_tests.ps1integration_tests.ps1  killer_err.txtkiller_err.txt  killer_out.txtkiller_out.txt  kore-private-key.asckore-private-key.asc  kore-public-key.asckore-public-key.asc  kore_builtin_regression.killerkore_builtin_regression.killer  kore_builtin_regression_test.korekore_builtin_regression_test.kore  kore_fileformat.killerkore_fileformat.killer  kore_fileformat_copy_test.korekore_fileformat_copy_test.kore  kore_fileformat_test.korekore_fileformat_test.kore  latest_maven_log.txtlatest_maven_log.txt  maven_full_log.txtmaven_full_log.txt  maven_log.txtmaven_log.txt  pyproject.tomlpyproject.toml  runlog.txtrunlog.txt  sample_10mb.csvsample_10mb.csv  sample_builtin_out.koresample_builtin_out.kore  temp_log.txttemp_log.txt  temp_run.txttemp_run.txt  test_all_phases.ps1test_all_phases.ps1  test_suite.ps1test_suite.ps1  warnings_build.txtwarnings_build.txt  warnings_build_utf8.txtwarnings_build_utf8.txt  workflow_log.txtworkflow_log.txt  View all filesRepository files navigationREADMELicense🚀 Kore — Killer Optimized Record Exchange
The fastest, most compressed columnar format for big data | v0.1.0
KORE is a high-performance binary file format optimized for analytical workloads. It provides:

38% compression ratio (vs 63% for Parquet)
131x query speedup with column pruning & predicate pushdown
Zero data loss verification (400K+ cells tested)
Native Spark integration — read/write with PySpark

Quick Start
Rust Library
Add this crate as a dependency (when published) or include from path:
use kore_fileformat::*;

// Write data
kore_write_simple("output.kore", schema_json, data_json)?;

// Read data
let data = kore_read_simple("output.kore")?;

// Read specific column
let col = kore_read_col_simple("output.kore", "column_name")?;

// Get file info
let info = kore_info_simple("output.kore")?;
PySpark Integration ⭐ NEW
from pyspark.sql import SparkSession
from kore import KoreDataFrameReader, KoreDataFrameWriter

spark = SparkSession.builder.appName("KoreExample").getOrCreate()

# Read Kore file
df = KoreDataFrameReader(spark).load("data.kore")

# Write to Kore (38% compression!)
KoreDataFrameWriter(df).mode("overwrite").save("output.kore")

# Spark SQL support (3.5+)
spark.read.format("kore").load("file.kore").show()
See python/README.md for full PySpark documentation.
Publishing checklist

Ensure Cargo.toml metadata is correct (authors, repository, keywords).
Add LICENSE file if required (MIT by default here).
Replace any unimplemented!() stubs with full implementations if you need runtime functionality.
Run cargo build --release and cargo test to verify compilation and tests.
Optionally add CI configuration (GitHub Actions) for cargo test and cargo clippy.

Notes
This workspace contains copies of the original KORE source files. Some long implementations were stubbed out in this initial export; if you want the full original source code included verbatim, I can replace the stubs with the complete implementations from the upstream project files.

About

No description, website, or topics provided.

Resources

Readme

License

View license

Uh oh!

There was an error while loading. Please reload this page.


Activity
Stars

9
stars
Watchers

0
watching
Forks

1
fork

Report repository

Releases
47

Release v1.2.9

Latest

May 29, 2026


+ 46 releases

Packages
0

 

 

 

Uh oh!

There was an error while loading. Please reload this page.


Contributors

Uh oh!

There was an error while loading. Please reload this page.


Languages

Python
90.7%

Makefile
4.1%

Rust
3.0%

PowerShell
1.3%

Java
0.4%

Scala
0.2%

Other
0.3%

Footer

© 2026 GitHub, Inc.

Footer navigation

Terms

Privacy

Security

Status

Community

Docs

Contact

Manage cookies

Do not share my personal information

You can’t perform that action at this time.

Kore is presented as a high-performance binary file format specifically engineered to optimize analytical workloads for big data processing. The core advantages of the Kore format are centered on superior data compression and enhanced query efficiency. It achieves a 38 percent compression ratio when compared to formats like Parquet, offering significant space savings. Furthermore, the format is designed to accelerate data retrieval by enabling query speedups of up to 131 times through techniques such as column pruning and predicate pushdown. Data integrity is ensured through zero data loss verification, which has been tested across over four hundred thousand cells.

The architecture of Kore strongly emphasizes integration with the big data ecosystem. A key feature is its native integration with Apache Spark, allowing for direct reading and writing of Kore files using PySpark, which supports Spark SQL functionality starting from version 3.5. This native support means that data stored in the Kore format can be seamlessly utilized within the Spark environment.

The format provides a structured programming interface, exemplified by a Rust library that allows users to perform fundamental operations on Kore files, such as writing simple data, reading data, extracting specific columns, and retrieving file metadata. This indicates that Kore is designed not only for storage efficiency but also for efficient manipulation at the file format level.

The project also facilitates broader system integration through dedicated Python interfaces, specifically including KoreDataFrameReader and KoreDataFrameWriter, which enable the efficient transfer of dataframes into and out of the Kore system within Spark applications. The development process involves careful consideration of compilation, testing, and deployment, as indicated by the presence of detailed build reports, various test results, and Dockerfile configurations, demonstrating a commitment to providing robust and deployable solutions for this specialized data format. The repository structure suggests a comprehensive approach to format development, encompassing source code, testing protocols, and operational deployment guides.