Show HN: Nano PDF – A CLI Tool to Edit PDFs with Gemini's Nano Banana

Recorded: Nov. 30, 2025, 1:05 a.m.

Original

Summarized

GitHub - gavrielc/Nano-PDF: Edit PDF files with Nano Banana

Nano PDF Editor

A CLI tool to edit PDF slides using natural language prompts, powered by Google's Gemini 3 Pro Image ("Nano Banana") model.
Features

Natural Language Editing: "Update the graph to include data from 2025", "Change the chart to a bar graph".
Add New Slides: Generate entirely new slides that match your deck's visual style.
Non-Destructive: Preserves the searchable text layer of your PDF using OCR re-hydration.
Multi-page & Parallel: Edit multiple pages in a single command with concurrent processing.

How It Works
Nano PDF combines Gemini 3 Pro Image (aka Nano Banana) with PDF manipulation to enable quick, natural-language edits of PDFs:

Page Rendering: Converts target PDF pages to images using Poppler
Style References: Optionally includes style reference pages with the generation request so the model can pick up the visual style (fonts, colors, layout)
AI Generation: Sends images + prompts to Gemini 3 Pro Image, which generates edited versions
OCR Re-hydration: Uses Tesseract to restore searchable text layer to generated images
PDF Stitching: Replaces original pages with AI-edited versions while preserving document structure

The tool processes multiple pages in parallel for speed, with configurable resolution (4K/2K/1K) to balance quality vs. cost.
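The stages above can be sketched roughly as follows. This is an illustrative outline, not Nano PDF's actual code: the function names, the 150 DPI default, the worker count, and the placeholder `fake_edit` step are all assumptions. Only the `pdftoppm` and `tesseract` command forms reflect standard usage of those tools.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def render_command(pdf_path: str, page: int, out_prefix: str, dpi: int = 150) -> list[str]:
    """Poppler command that renders one page to PNG (Page Rendering stage)."""
    return ["pdftoppm", "-f", str(page), "-l", str(page),
            "-r", str(dpi), "-png", pdf_path, out_prefix]


def ocr_command(image_path: str, out_base: str) -> list[str]:
    """Tesseract command that writes a searchable PDF to <out_base>.pdf (OCR stage)."""
    return ["tesseract", image_path, out_base, "pdf"]


def edit_pages(jobs: dict[int, str],
               edit_page: Callable[[int, str], str],
               max_workers: int = 4) -> dict[int, str]:
    """Run the per-page edit step concurrently and collect results by page number."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {page: pool.submit(edit_page, page, prompt)
                   for page, prompt in jobs.items()}
        return {page: future.result() for page, future in futures.items()}


def fake_edit(page: int, prompt: str) -> str:
    # Hypothetical stand-in for the AI Generation stage; a real version
    # would send the rendered image and prompt to the image model.
    return f"page-{page}-edited.png"


if __name__ == "__main__":
    print(render_command("my_deck.pdf", 2, "page"))
    print(ocr_command("page-2-edited.png", "page-2"))
    print(edit_pages({1: "Update date to Oct 2025", 5: "Add company logo"}, fake_edit))
```

The commands are returned as argument lists so they could be passed to `subprocess.run` without shell quoting concerns.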
Installation
pip install nano-pdf
Configuration
You need a paid Google Gemini API key with billing enabled. Free tier keys do not support image generation.

Get an API key from Google AI Studio
Enable billing on your Google Cloud project
Set it as an environment variable:

export GEMINI_API_KEY="your_api_key_here"
Note: This tool uses Gemini 3 Pro Image which requires a paid API tier. See pricing for details.
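If you wrap Nano PDF in your own script, a small pre-flight check along these lines can confirm the key is visible before any pages are processed. This is a sketch: `load_api_key` is a hypothetical helper, not part of the tool, and it only checks that the variable is set, not that billing is enabled.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Gemini API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY not found; export it before running.")
    return key
```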
Usage
Basic Edit
Edit a single page (e.g., Page 2):
nano-pdf edit my_deck.pdf 2 "Change the title to 'Q3 Results'"
Multi-page Edit
Edit multiple pages in one go:
nano-pdf edit my_deck.pdf \
1 "Update date to Oct 2025" \
5 "Add company logo" \
10 "Fix typo in footer"
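For readers curious how alternating page/prompt arguments like these might be grouped, here is a minimal sketch. `parse_edit_pairs` is a hypothetical function written for illustration; the tool's real argument handling may differ.

```python
def parse_edit_pairs(args: list[str]) -> list[tuple[int, str]]:
    """Group a flat [page, prompt, page, prompt, ...] list into (page, prompt) pairs."""
    if not args or len(args) % 2 != 0:
        raise ValueError("expected one or more PAGE PROMPT pairs")
    pairs = []
    for page_text, prompt in zip(args[0::2], args[1::2]):
        page = int(page_text)  # raises ValueError if the page is not a number
        if page < 1:
            raise ValueError(f"page numbers start at 1, got {page}")
        pairs.append((page, prompt))
    return pairs
```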
Add New Slides
Insert a new AI-generated slide into your deck:
# Add a title slide at the beginning
nano-pdf add my_deck.pdf 0 "Title slide with 'Q3 2025 Review'"

# Add a slide after page 5
nano-pdf add my_deck.pdf 5 "Summary slide with key takeaways as bullet points"
The new slide automatically matches the visual style of your existing slides and uses document context by default for better relevance.
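The position argument in the commands above reads as "insert after page N", with 0 meaning the beginning of the deck. A sketch of that convention on a plain list of pages (the `insert_slide` helper is hypothetical and only illustrates the indexing):

```python
def insert_slide(pages: list[str], position: int, new_page: str) -> list[str]:
    """Insert new_page after the 1-based page `position`; 0 inserts at the start."""
    if not 0 <= position <= len(pages):
        raise ValueError(f"position must be between 0 and {len(pages)}")
    return pages[:position] + [new_page] + pages[position:]
```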
Options

--use-context / --no-use-context: Include the full text of the PDF as context for the model. Disabled by default for edit, enabled by default for add. Use --no-use-context to disable.
--style-refs "1,5": Manually specify which pages to use as style references.
--output "new.pdf": Specify the output filename.
--resolution "4K": Image resolution - "4K" (default), "2K", or "1K". Higher quality = slower processing.
--disable-google-search: Prevents the model from using Google Search to find information before generating. Google Search is enabled by default.
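The option defaults described above could be modelled with Python's standard `argparse` as in the sketch below. This is an assumption-laden illustration, not the tool's implementation: the CLI library Nano PDF actually uses is not stated here, and the positional page/prompt arguments are omitted for brevity.

```python
import argparse


def parse_style_refs(text: str) -> list[int]:
    """Turn a string such as "1,5" into [1, 5]."""
    return [int(part) for part in text.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="nano-pdf")
    sub = parser.add_subparsers(dest="command", required=True)
    # Context is off by default for `edit` and on by default for `add`.
    for name, context_default in (("edit", False), ("add", True)):
        cmd = sub.add_parser(name)
        cmd.add_argument("pdf")
        # BooleanOptionalAction also generates the --no-use-context form.
        cmd.add_argument("--use-context", action=argparse.BooleanOptionalAction,
                         default=context_default)
        cmd.add_argument("--style-refs", type=parse_style_refs, default=None)
        cmd.add_argument("--output")
        cmd.add_argument("--resolution", choices=["4K", "2K", "1K"], default="4K")
        cmd.add_argument("--disable-google-search", action="store_true")
    return parser
```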

Examples
Fixing Presentation Errors
# Fix typos across multiple slides
nano-pdf edit pitch_deck.pdf \
3 "Fix the typo 'recieve' to 'receive'" \
7 "Change 'Q4 2024' to 'Q1 2025'"
Visual Design Changes
# Update branding and colors
nano-pdf edit slides.pdf 1 "Make the header background blue and text white" \
--style-refs "2,3" --output branded_slides.pdf
Content Updates
# Update financial data
nano-pdf edit report.pdf 12 "Update the revenue chart to show Q3 at $2.5M instead of $2.1M"
Batch Processing with Context
# Use full document context for consistency
nano-pdf edit presentation.pdf \
5 "Update the chart colors to match the theme" \
8 "Add the company logo in the bottom right" \
--use-context
Adding New Slides
# Add a new agenda slide at the beginning
nano-pdf add quarterly_report.pdf 0 "Agenda slide with: Overview, Financial Results, Q4 Outlook"
Using Google Search
# Google Search is enabled by default - the model can look up current information
nano-pdf edit deck.pdf 5 "Update the market share data to latest figures"

# Disable Google Search if you want the model to only use provided context
nano-pdf add deck.pdf 3 "Add a summary slide" --disable-google-search
Requirements

Python 3.10+
poppler (for PDF rendering)
tesseract (for OCR)

System Dependencies
macOS
brew install poppler tesseract
Windows
choco install poppler tesseract
Note: After installation, you may need to restart your terminal or add the installation directory to your PATH.
Linux (Ubuntu/Debian)
sudo apt-get install poppler-utils tesseract-ocr
Troubleshooting
"Missing system dependencies" error
Make sure you've installed poppler and tesseract for your platform. After installation, restart your terminal to refresh PATH. Run which pdftotext and which tesseract to verify they're accessible.
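The same check can be done from Python, which also works where the `which` command is unavailable. This is a small sketch using the standard library's `shutil.which`; the `missing_tools` helper and the tool list are assumptions based on the commands named above.

```python
import shutil
from typing import Callable, Optional

REQUIRED_TOOLS = ("pdftotext", "tesseract")


def missing_tools(which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return the required executables that cannot be found on PATH."""
    return [tool for tool in REQUIRED_TOOLS if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("All dependencies found" if not missing else f"Missing: {', '.join(missing)}")
```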
"GEMINI_API_KEY not found" error
Set your API key as an environment variable:
export GEMINI_API_KEY="your_key_here"
"Gemini API Error: PAID API key required" error
Gemini 3 Pro Image requires a paid API tier. Visit Google AI Studio to enable billing on your project.
Generated images don't match the style
Try using --style-refs to specify reference pages that have the desired visual style. The model will analyze these pages to better match fonts, colors, and layout.
Text layer is missing or incorrect after editing
The tool uses Tesseract OCR to restore searchable text. For best results, ensure your generated images are high resolution (--resolution "4K"). Note that OCR may not be perfect for stylized fonts or small text.
Pages are processing slowly

Use --resolution "2K" or --resolution "1K" for faster processing

Running from Source
If you want to run the latest development version:
# Clone the repository
git clone https://github.com/gavrielc/Nano-PDF.git
cd Nano-PDF

# Install dependencies
pip install -e .

# Run the tool
nano-pdf edit my_deck.pdf 2 "Your edit here"
License
MIT

Nano PDF is a CLI tool for editing PDF slides with natural language prompts, powered by Google’s Gemini 3 Pro Image model (dubbed “Nano Banana”). It works through a five-stage pipeline: page rendering via Poppler, optional style reference analysis, AI-driven image generation, OCR re-hydration with Tesseract to restore the searchable text layer, and PDF stitching that replaces the original pages while preserving document structure.

The tool’s strengths are its simple command-line interface and its support for batch edits. It can update slide content, add entirely new slides, and make visual design changes, each from a natural language prompt. Style reference pages help the model match the fonts, colors, and layout of the existing deck, and multiple pages are processed concurrently for speed.

However, the tool requires a paid Google Gemini API key with billing enabled, because free-tier keys do not support image generation, so its accessibility depends on the user’s access to that service. It also depends on Poppler, Tesseract, and Python 3.10+, all of which must be installed and configured. Troubleshooting often comes down to checking these system dependencies, for example verifying that they are on the PATH.

The tool’s options offer granular control over the editing process. `--resolution` (4K, 2K, or 1K) trades image quality against processing speed, while `--use-context` and `--disable-google-search` control how much context the model receives. Although the README does not describe them, the developer, gavrielc, has also provided a Code of Conduct and contributing guidelines, which aim to maintain a positive and collaborative community.

The tool’s architecture can be viewed as modular, with distinct components handling page rendering, style reference analysis, AI generation, and OCR re-hydration. This separation leaves room for future enhancements, such as improving OCR accuracy or developing more sophisticated style reference matching. While the documentation describes a streamlined workflow, actual performance depends significantly on the chosen parameters and on the quality of the underlying Google Gemini API. The “Troubleshooting” section highlights common issues such as missing dependencies and API key errors, underscoring the need for careful setup and configuration. While the developer intends to expand the tool’s capabilities, it appears to be at a beta or development stage and requires some expertise to operate well.