LmCast :: Stay tuned in

Show HN: Gemini can now natively embed video, so I built sub-second video search

Recorded: March 25, 2026, 3 a.m.


GitHub - ssrajadh/sentrysearch: Semantic search over videos using Gemini Embedding 2.

SentrySearch
Semantic search over dashcam footage. Type what you're looking for, get a trimmed clip back.
ClawHub Skill

demo.mp4

How it works
SentrySearch splits your dashcam videos into overlapping chunks, embeds each chunk directly as video using Google's Gemini Embedding model, and stores the vectors in a local ChromaDB database. When you search, your text query is embedded into the same vector space and matched against the stored video embeddings. The top match is automatically trimmed from the original file and saved as a clip.
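The chunk-embed-store-search flow described above can be sketched in pure Python. The real project calls Google's Gemini Embedding API and stores vectors in ChromaDB; here a stub `embed` function and a plain list stand in for both, and cosine similarity stands in for the vector store's nearest-neighbor search. The tiny 3-dimensional "shared space" is entirely made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def embed(item):
    """Stub standing in for Gemini Embedding 2. The real model maps both
    video chunks and text queries into the same 768-dimensional space;
    this fakes a tiny shared space for two chunks and one query."""
    fake_space = {
        "chunk: red truck at stop sign": [0.9, 0.1, 0.0],
        "chunk: empty parking lot":      [0.0, 0.2, 0.9],
        "query: red truck running a stop sign": [0.8, 0.2, 0.1],
    }
    return fake_space[item]

# "Index": embed each chunk, store (file, start_s, end_s, vector).
index = [
    ("front.mp4", 135, 165, embed("chunk: red truck at stop sign")),
    ("front.mp4", 160, 190, embed("chunk: empty parking lot")),
]

# "Search": embed the text query into the same space, rank by cosine.
q = embed("query: red truck running a stop sign")
ranked = sorted(index, key=lambda row: cosine(q, row[3]), reverse=True)
best = ranked[0]
print(best[0], best[1], best[2])  # → front.mp4 135 165
```

The top match's `(file, start, end)` triple is exactly what the trimming step needs to cut a clip from the original footage.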
Getting Started

Clone and install:

git clone https://github.com/ssrajadh/sentrysearch.git
cd sentrysearch
python -m venv venv && source venv/bin/activate
pip install -e .

Set up your API key:

sentrysearch init
This prompts for your Gemini API key, writes it to .env, and validates it with a test embedding.
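What `init` persists can be approximated in a few lines. The variable name `GEMINI_API_KEY` and the `.env` format details are assumptions, and the validation call (a test embedding against the API) is omitted here:

```python
from pathlib import Path

def write_env(key: str, path: str = ".env") -> None:
    # Persist the key; GEMINI_API_KEY is an assumed variable name.
    Path(path).write_text(f"GEMINI_API_KEY={key}\n")

def read_env(path: str = ".env") -> dict:
    # Minimal .env parser: KEY=value lines only, no quoting rules.
    pairs = (line.split("=", 1)
             for line in Path(path).read_text().splitlines() if "=" in line)
    return {k.strip(): v.strip() for k, v in pairs}

write_env("test-key-123", "demo.env")
print(read_env("demo.env")["GEMINI_API_KEY"])  # → test-key-123
```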

Index your footage:

sentrysearch index /path/to/dashcam/footage

Search:

sentrysearch search "red truck running a stop sign"
ffmpeg is required for video chunking and trimming. If you don't have it system-wide, the bundled imageio-ffmpeg is used automatically.

Manual setup: If you prefer not to use sentrysearch init, you can copy .env.example to .env and add your key from aistudio.google.com/apikey manually.

Usage
Init
$ sentrysearch init
Enter your Gemini API key (get one at https://aistudio.google.com/apikey): ****
Validating API key...
Setup complete. You're ready to go — run `sentrysearch index <directory>` to get started.
If a key is already configured, you'll be asked whether to overwrite it.
Index footage
$ sentrysearch index /path/to/dashcam/footage
Indexing file 1/3: front_2024-01-15_14-30.mp4 [chunk 1/4]
Indexing file 1/3: front_2024-01-15_14-30.mp4 [chunk 2/4]
...
Indexed 12 new chunks from 3 files. Total: 12 chunks from 3 files.
Options:

--chunk-duration 30 — seconds per chunk
--overlap 5 — overlap between chunks
--no-preprocess — skip downscaling/frame rate reduction (send raw chunks)
--target-resolution 480 — target height in pixels for preprocessing
--target-fps 5 — target frame rate for preprocessing
--no-skip-still — embed all chunks, even ones with no visual change
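With the defaults above (30 s chunks, 5 s overlap) each chunk starts 25 s after the previous one. A sketch of that boundary computation follows; the exact last-chunk policy (clipping the final window to the video's end) is an assumption, not taken from the project's source:

```python
def chunk_windows(video_len: float, duration: float = 30.0, overlap: float = 5.0):
    """Return (start, end) windows; consecutive windows share `overlap` seconds."""
    step = duration - overlap          # 25 s stride with the defaults
    windows, start = [], 0.0
    while start < video_len:
        windows.append((start, min(start + duration, video_len)))
        if start + duration >= video_len:
            break                      # last window reached the end of the video
        start += step
    return windows

print(chunk_windows(60))  # → [(0.0, 30.0), (25.0, 55.0), (50.0, 60.0)]
```

The 5-second overlap is what gives an event straddling a chunk boundary a chance to appear whole in at least one window.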

Search
$ sentrysearch search "red truck running a stop sign"
#1 [0.87] front_2024-01-15_14-30.mp4 @ 02:15-02:45
#2 [0.74] left_2024-01-15_14-30.mp4 @ 02:10-02:40
#3 [0.61] front_2024-01-20_09-15.mp4 @ 00:30-01:00

Saved clip: ./match_front_2024-01-15_14-30_02m15s-02m45s.mp4
Options: --results N, --output-dir DIR, --no-trim to skip auto-trimming.
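The `@ 02:15-02:45` labels in the results above are just the matched chunk's start and end offsets, in seconds, rendered as MM:SS. A sketch:

```python
def fmt_window(start_s: int, end_s: int) -> str:
    """Render chunk boundaries (in seconds) as an MM:SS-MM:SS label."""
    def mmss(t: int) -> str:
        return f"{t // 60:02d}:{t % 60:02d}"
    return f"{mmss(start_s)}-{mmss(end_s)}"

print(fmt_window(135, 165))  # → 02:15-02:45
```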
Stats
$ sentrysearch stats
Total chunks: 47
Source files: 12
Verbose mode
Add --verbose to either command for debug info (embedding dimensions, API response times, similarity scores).
How is this possible?
Gemini Embedding 2 can natively embed video — raw video pixels are projected into the same 768-dimensional vector space as text queries. There's no transcription, no frame captioning, no text middleman. A text query like "red truck at a stop sign" is directly comparable to a 30-second video clip at the vector level. This is what makes sub-second semantic search over hours of footage practical.
Cost
Indexing 1 hour of footage costs ~$2.84 with Gemini's embedding API (default settings: 30s chunks, 5s overlap):

1 hour = 3,600 seconds of video = 3,600 frames processed by the model.
3,600 frames × $0.00079 = ~$2.84/hr
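Because the API samples exactly 1 frame per second, cost scales linearly with footage duration. The arithmetic above, as a function (the $0.00079 per-frame figure is the README's number and may change):

```python
def indexing_cost(video_seconds: float, price_per_frame: float = 0.00079) -> float:
    """The API bills one sampled frame per second of video, regardless of source fps."""
    frames = video_seconds * 1  # 1 frame per second
    return frames * price_per_frame

print(round(indexing_cost(3600), 2))  # → 2.84
```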

The Gemini API natively extracts and tokenizes exactly 1 frame per second from uploaded video, regardless of the file's actual frame rate. The preprocessing step (which downscales chunks to 480p at 5fps via ffmpeg) is a local/bandwidth optimization — it keeps payload sizes small so API requests are fast and don't timeout — but does not change the number of frames the API processes.
Two built-in optimizations help reduce costs in different ways:

Preprocessing (on by default) — chunks are downscaled to 480p at 5fps before uploading. Since the API processes at 1fps regardless, this only reduces upload size and transfer time, not the number of frames billed. It primarily improves speed and prevents request timeouts.
Still-frame skipping (on by default) — chunks with no meaningful visual change (e.g. a parked car) are skipped entirely. This saves real API calls and directly reduces cost. The savings depend on your footage — Sentry Mode recordings with hours of idle time benefit the most, while action-packed driving footage may have nothing to skip.
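The preprocessing step maps to a straightforward ffmpeg filter chain. This sketch only builds the argument list rather than running it; the specific flags are my guess at a reasonable invocation, not copied from sentrysearch's source:

```python
def preprocess_cmd(src: str, dst: str, height: int = 480, fps: int = 5) -> list[str]:
    """ffmpeg args to downscale a chunk to `height`p at `fps` before upload.
    scale=-2:H preserves aspect ratio with an even width; these flags are
    an assumption about what the project does, not taken from its code."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale=-2:{height},fps={fps}",
        "-an",              # audio is irrelevant to the embedding; drop it
        dst,
    ]

print(" ".join(preprocess_cmd("chunk.mp4", "chunk_small.mp4")))
```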

Search queries are negligible (text embedding only).
Tuning options:

--chunk-duration / --overlap — longer chunks with less overlap = fewer API calls = lower cost
--no-skip-still — embed every chunk even if nothing is happening
--target-resolution / --target-fps — adjust preprocessing quality
--no-preprocess — send raw chunks to the API

Limitations & Future Work

Still-frame detection is heuristic — it uses JPEG file size comparison across sampled frames. It may occasionally skip chunks with subtle motion or embed chunks that are truly static. Disable with --no-skip-still if you need every chunk indexed.
Search quality depends on chunk boundaries — if an event spans two chunks, the overlapping window helps but isn't perfect. Smarter chunking (e.g. scene detection) could improve this.
Gemini Embedding 2 is in preview — API behavior and pricing may change.
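The still-frame heuristic mentioned above works because near-identical frames compress to near-identical sizes. A sketch of that idea, using zlib on raw bytes as a stand-in for the JPEG encoding the project actually uses; the 2% tolerance is a made-up threshold:

```python
import zlib

def looks_still(frames: list[bytes], tolerance: float = 0.02) -> bool:
    """True if every sampled frame's compressed size is within `tolerance`
    of the first frame's, i.e. nothing visually changed. zlib stands in
    for JPEG encoding here; the threshold is a guess, not the project's."""
    sizes = [len(zlib.compress(f)) for f in frames]
    base = sizes[0]
    return all(abs(s - base) / base <= tolerance for s in sizes)

static = [b"\x00" * 10_000] * 4                      # identical "frames"
moving = [bytes(range(256)) * 40, b"\x00" * 10_000]  # very different content
print(looks_still(static), looks_still(moving))  # → True False
```

This also shows why the heuristic can misfire: content that changes but compresses to a similar size would be skipped, which is exactly the failure mode `--no-skip-still` exists to avoid.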

Compatibility
This works with any footage in mp4 format, not just Tesla Sentry Mode. The directory scanner recursively finds all .mp4 files regardless of folder structure.
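The recursive scan described above is essentially a one-liner with pathlib, sketched here against a temporary directory:

```python
import tempfile
from pathlib import Path

def find_videos(root: str) -> list[Path]:
    """Recursively collect every .mp4 under root, at any folder depth."""
    return sorted(Path(root).rglob("*.mp4"))

with tempfile.TemporaryDirectory() as d:
    (Path(d) / "front").mkdir()
    (Path(d) / "front" / "a.mp4").touch()
    (Path(d) / "b.mp4").touch()
    (Path(d) / "notes.txt").touch()   # non-video files are ignored
    names = sorted(p.name for p in find_videos(d))
    print(names)  # → ['a.mp4', 'b.mp4']
```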
Requirements

Python 3.10+
ffmpeg on PATH, or use bundled ffmpeg via imageio-ffmpeg (installed by default)
Gemini API key (get one free)


Soham Rajadhyaksha’s `sentrysearch` project provides a semantic search solution for dashcam footage, leveraging the Gemini Embedding 2 model. The core functionality allows users to effectively search through hours of video data by typing a textual query, receiving a trimmed clip of the relevant footage. The system operates by initially dividing the dashcam footage into overlapping chunks, embedding each chunk using the Gemini Embedding 2 model, and storing these embeddings within a ChromaDB database. When a user initiates a search, the query is also embedded, and the system retrieves the most similar video embedding, subsequently extracting a relevant clip from the original footage.

Setup involves cloning the repository, creating a Python virtual environment, installing dependencies, and initializing the `sentrysearch` tool with a Gemini API key. Indexing splits the dashcam footage into overlapping chunks, with options for adjusting chunk duration, overlap, and preprocessing settings; preprocessing downscales the video to 480p at 5fps to keep API requests small and fast, though it does not reduce the number of frames billed. The system handles any mp4 footage and recursively scans directories for all .mp4 files, regardless of folder structure.

The `sentrysearch` tool incorporates two built-in optimizations: preprocessing, which downscales video chunks to shrink uploads and prevent request timeouts, and still-frame skipping, which avoids embedding chunks with no visual change and therefore directly reduces API cost. Users can tune both to match their footage and desired accuracy. Indexing 1 hour of footage costs approximately $2.84 at the Gemini API's per-frame pricing, while search queries are negligible in cost since they embed text only.

The technology behind `sentrysearch` relies on the native video embedding capability of Gemini Embedding 2, enabling direct comparison between text queries and raw video pixels without transcription or frame captioning. This is what makes sub-second semantic search over extensive video data practical. The system uses ffmpeg for video chunking and trimming. The project acknowledges limitations: still-frame detection is heuristic, and search quality depends on chunk boundaries; future work could apply scene detection to produce smarter chunking.

The project provides a command-line interface with parameters controlling chunk duration, overlap, and preprocessing resolution and frame rate. It is compatible with any mp4 dashcam footage, not just Tesla Sentry Mode recordings. Gemini Embedding 2 is currently in preview, so API behavior and pricing may change.