How I Built an AI Receptionist for a Luxury Mechanic Shop - Part 1
Learn how I built an AI receptionist for my brother's mechanic shop

Mar 20, 2026 · #ai #coding #technical #tutorial

My brother, Dane, owns a luxury mechanic shop, and he’s losing thousands of dollars per month because he misses hundreds of calls per week. He’s under the hood all day. The phone rings, he can’t answer, the customer hangs up and calls someone else. That’s a lost job — sometimes a $450 brake service, sometimes a $2,000 engine repair — gone because no one picked up.

So I’m building him an AI receptionist. I named it Axle — like a car axle — because of course I did. 😏 This isn’t a generic chatbot. It’s a custom-built voice agent that answers his phone, knows his exact prices, his hours, his policies, and can collect a callback when it doesn’t know something. Getting this right requires a custom build, so first I scraped his website data, created a product requirements document (PRD), and scoped the project into a three-part build.
Step 1: Building the Brain (RAG Pipeline)

The first step was making sure the AI could actually answer questions accurately, without hallucinating prices or making things up. A raw LLM is dangerous here. If a customer asks “how much for brakes?” and the AI guesses $200 when the real answer is $450, that’s a broken expectation and a frustrated customer. The fix is Retrieval-Augmented Generation (RAG): instead of letting the model guess, you give it a knowledge base of real information and make it answer only from that.

Here’s what I did:

- Scraped Dane’s website — I pulled his service pages and pricing into markdown files. From there I built a structured knowledge base of 21+ documents: every service type, pricing, turnaround times, hours, payment methods, cancellation policies, warranty info, loaner vehicles, and the car makes he specializes in.
- Embedded the knowledge base into MongoDB Atlas — Each document gets converted into a 1024-dimensional vector using Voyage AI (voyage-3-large). These vectors capture the semantic meaning of each document, not just the keywords. They’re stored in MongoDB Atlas alongside the raw text, with an Atlas Vector Search index on the embedding field.
- Built the retrieval pipeline — When a customer asks a question, the query gets embedded with the same Voyage AI model and run against the Atlas Vector Search index. It returns the top 3 most semantically similar documents, so “how much for a brake job?” correctly retrieves the brake-service pricing doc even if those exact words never appear together.
- Wired up Claude for response generation — The retrieved documents get passed as context to Anthropic Claude (claude-sonnet-4-6) along with a strict system prompt: answer only from the knowledge base, keep responses short and conversational, and if you don’t know — say so and offer to take a message. No hallucinations allowed.

By the end of Part 1, I could type a question in the terminal and get a grounded, accurate answer back.
“How much is an oil change?” → “$45 for conventional, $75 for synthetic. Includes oil filter, fluid top-off, and tire pressure check. Takes about 30 minutes.”
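Under the hood, that round trip (embed the query, find the closest documents, answer from the matches) can be sketched as below. This is a minimal sketch, not the production code: the `embed` function is a toy stand-in for Voyage AI's voyage-3-large, and the index name `kb_vector_index` is an assumption.

```python
import math

def embed(text: str) -> list[float]:
    # Toy stand-in for calling Voyage AI (voyage-3-large); the real model
    # returns a 1024-dimensional vector capturing semantic meaning.
    vocab = ["brake", "oil", "price", "hours"]
    return [float(word in text.lower()) for word in vocab]

def build_vector_search_stage(query_vector: list[float], k: int = 3) -> dict:
    # Atlas Vector Search aggregation stage: return the k documents whose
    # stored embeddings sit closest to the query embedding.
    return {
        "$vectorSearch": {
            "index": "kb_vector_index",   # index name is an assumption
            "path": "embedding",          # field holding the vector
            "queryVector": query_vector,
            "numCandidates": 50,
            "limit": k,
        }
    }

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

# Toy knowledge base; in the real build these documents live in Atlas.
kb = [
    {"text": "Brake service: $450, includes pads and rotors."},
    {"text": "Oil change: $45 conventional, $75 synthetic."},
]
for doc in kb:
    doc["embedding"] = embed(doc["text"])

query = "how much for a brake job?"
qvec = embed(query)
# Local stand-in for running the $vectorSearch stage on Atlas:
top = max(kb, key=lambda d: cosine(qvec, d["embedding"]))
```

In production the `$vectorSearch` stage runs via `collection.aggregate([...])`, and the matched documents get handed to Claude as context rather than returned directly.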
💡 I’m learning as I go using the MongoDB AI Learning Hub — and you can too! Check it out to build your own AI Agents!
Step 2: Connecting It to a Real Phone Number

Next I had to get this brain onto an actual phone line that customers could call. I chose Vapi as the voice platform. It handles everything on the telephony side: purchasing a phone number, speech-to-text (via Deepgram), text-to-speech (via ElevenLabs), and real-time function calling back to my server. The whole voice infrastructure is handled — I just needed to build the webhook it calls.

- Built a FastAPI webhook server — Every time a caller asks a question, Vapi sends a tool-calls request to my /webhook endpoint with the caller’s query. The server routes that to the RAG pipeline, gets a response from Claude, and sends it back to Vapi, which reads it aloud to the caller. The whole round trip has to be fast enough to feel like a natural conversation.
- Exposed it with Ngrok — During development, the server runs locally on port 8000. Ngrok punches a tunnel through to a public HTTPS URL, which I paste into the Vapi dashboard as the webhook endpoint. Vapi can now reach my local server in real time as calls come in. For production this would move to a cloud host, but for building and testing Ngrok gets the job done in two minutes.
- Configured the Vapi assistant — In the Vapi dashboard I set up the assistant with a greeting (“Hi, thanks for calling Dane’s Motorsport, how can I help you today?”), wired up two tools (answerQuestion for RAG-backed responses and saveCallback for collecting a name and number when a question can’t be answered), and pointed both at the webhook URL.
- Added conversation memory — Vapi sends the full conversation history with each request, so the RAG pipeline gets the prior turns as context. If a caller asks “what are your hours?” and then follows up with “and how much for a tire rotation?”, the AI handles both coherently.
- Logged every call to MongoDB — Each interaction gets stored in a calls collection: the caller’s number, the query, the AI’s response, whether it escalated to a human, and the timestamp.
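The dispatch logic at the center of that webhook can be sketched as plain Python (the FastAPI route is just a thin wrapper around it). Treat the payload field names here (`message.toolCalls`, `function.arguments`, and the `results`/`toolCallId` response shape) as my reading of Vapi's tool-calls format rather than gospel, and both tool bodies are stand-ins:

```python
import json

def answer_question(query: str, history: list[dict]) -> str:
    # Stand-in for the RAG pipeline: embed the query, search Atlas,
    # and have Claude answer from the retrieved documents.
    return f"(RAG-grounded answer for: {query})"

def save_callback(name: str, phone: str) -> str:
    # Stand-in for inserting a lead into the MongoDB callbacks collection.
    return f"Got it, I'll have someone call {name} back at {phone}."

def handle_tool_calls(payload: dict) -> dict:
    """Dispatch a Vapi tool-calls webhook payload to the right tool."""
    results = []
    for call in payload.get("message", {}).get("toolCalls", []):
        fn = call["function"]
        args = fn.get("arguments") or {}
        if isinstance(args, str):  # arguments may arrive JSON-encoded
            args = json.loads(args)
        if fn["name"] == "answerQuestion":
            result = answer_question(args.get("query", ""),
                                     args.get("history", []))
        elif fn["name"] == "saveCallback":
            result = save_callback(args.get("name", ""),
                                   args.get("phone", ""))
        else:
            result = "Sorry, I don't have that information. Can I take a message?"
        results.append({"toolCallId": call["id"], "result": result})
    return {"results": results}
```

In the real server this function sits behind a FastAPI `POST /webhook` route, with Ngrok exposing it so Vapi can reach it during development.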
Callback requests from unknown questions go into a separate callbacks collection so Dane can follow up. This turns the phone system into a data asset — he can see what customers are asking most, when call volume spikes, and how often the AI hands off to a human.
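Since every interaction lands in the calls collection, the “what are customers asking most” question is one aggregation away. A sketch with toy data: the field names mirror what gets logged above, and the plain-Python rollup at the end just reproduces locally what the pipeline would compute on Atlas.

```python
from collections import Counter

# Toy examples of logged interactions in the calls collection.
sample_calls = [
    {"caller": "+1555550100", "query": "how much for brakes?", "escalated": False},
    {"caller": "+1555550101", "query": "what are your hours?", "escalated": False},
    {"caller": "+1555550102", "query": "how much for brakes?", "escalated": True},
]

# MongoDB aggregation answering "what are customers asking most?"
# In production: db.calls.aggregate(top_questions_pipeline)
top_questions_pipeline = [
    {"$group": {"_id": "$query", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
    {"$limit": 10},
]

# The same rollup in plain Python, as a quick local sanity check:
counts = Counter(call["query"] for call in sample_calls)
most_asked, n = counts.most_common(1)[0]
```

Swapping `$query` for `$escalated` in the `$group` stage gives the hand-off rate the same way.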
Step 3: Tuning for Voice

Then finally, the thing that took the most iteration: making it sound right. Text responses and voice responses are completely different. A response that reads fine on screen — with bullet points, dollar signs formatted as “$45.00”, or a sentence that starts with “Certainly!” — sounds awful spoken aloud. I had to tune the system prompt specifically for voice delivery.

- Picked the right voice — Vapi integrates with ElevenLabs and gives you access to a huge library of AI voices. I went through about 20 of them, reading the same test script each time: a greeting, a price quote, an escalation. Most sounded too robotic, too enthusiastic, or just wrong for a mechanic shop. I landed on Christopher — calm, natural, unhurried. The kind of voice that sounds like someone who actually knows cars. Getting this right mattered more than I expected; a great AI response delivered in the wrong voice still feels off.
- Rewrote the system prompt for voice — Short sentences. No markdown. No filler phrases like “Great question!” or “Certainly!”. Prices spoken naturally (“forty-five dollars” instead of “$45”). Responses capped at 2–4 sentences. The goal is to sound like a knowledgeable, friendly human — not a chatbot reading a webpage.
- Tested the escalation flow — When a caller asks something that isn’t in the knowledge base, the AI doesn’t guess. It tells the caller it doesn’t have that information, asks for their name and a good callback number, and saves that to MongoDB. Dane gets a list of callbacks to return — no lost leads.
- Wrote integration tests — I built a test suite covering the RAG pipeline, the webhook handler, and the full end-to-end flow. This was especially important for catching edge cases: what happens when Vapi sends a malformed request, when the vector search returns no results above the confidence threshold, and when the caller doesn’t leave a callback number.
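The “prices spoken naturally” rule can also be enforced in code before text ever reaches the TTS voice, rather than trusting the prompt alone. `spoken_price` below is a hypothetical helper for illustration, not part of the actual build:

```python
def spoken_price(amount: int) -> str:
    # Convert a dollar amount like 45 into "forty-five dollars" so the
    # TTS voice reads it naturally instead of a raw "$45".
    ones = ["", "one", "two", "three", "four", "five", "six", "seven",
            "eight", "nine", "ten", "eleven", "twelve", "thirteen",
            "fourteen", "fifteen", "sixteen", "seventeen", "eighteen",
            "nineteen"]
    tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty",
            "seventy", "eighty", "ninety"]

    def words(n: int) -> str:
        if n < 20:
            return ones[n]
        if n < 100:
            return tens[n // 10] + ("-" + ones[n % 10] if n % 10 else "")
        if n < 1000:
            rest = words(n % 100)
            return ones[n // 100] + " hundred" + (" " + rest if rest else "")
        rest = words(n % 1000)
        return words(n // 1000) + " thousand" + (" " + rest if rest else "")

    return (words(amount) if amount else "zero") + " dollars"
```

So `spoken_price(45)` yields "forty-five dollars" and `spoken_price(2000)` yields "two thousand dollars", which is exactly how the system prompt asks Claude to phrase quotes.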
The Stack

Here’s everything wired together:

- Vapi (with Deepgram & ElevenLabs integration) — phone number, speech-to-text, text-to-speech, tool calling
- Ngrok — local development tunnel (Vapi → localhost)
- FastAPI + Uvicorn — webhook server
- MongoDB Atlas — knowledge base storage, vector search, call logs, callback queue
- Voyage AI (voyage-3-large) — text embeddings for semantic retrieval
- Anthropic Claude (claude-sonnet-4-6) — response generation, grounded in the knowledge base
- Python — everything glued together with pymongo, voyageai, anthropic, fastapi
- Copilot CLI — for the actual build!
What’s Next

Right now the AI answers questions and takes callbacks. The next phase has a few pieces: connecting it to a real calendar so it can book appointments directly during the call; adding text message notifications so Dane gets an instant SMS whenever a new callback comes in; building a simple dashboard so he can see and manage all his pending callbacks in one place; locking down security for production; deploying to Railway so it runs on a persistent public URL; and then handing it over for him to actually use with real customers.

Dane misses 100+ calls a week. Each missed call is a potential job. Some of those jobs are $50, some are $2,000. This system runs 24/7, never puts someone on hold, and knows every price and policy as well as he does. The build took three focused sprints. The hardest part wasn’t the code — it was getting the voice tone right so it actually sounds like someone who works at a mechanic shop and not a Silicon Valley startup.

If you’re building something similar, the core insight is this: don’t use a raw LLM for a business-specific voice agent. Ground it in a real knowledge base, constrain it to answer only from that base, and design the fallback flow before anything else. The escalation path is not an edge case — it’s a core feature.
This post was written with the assistance of AI.

Written by Kedasha Kerr, Software Developer in Chicago.