The actual tradeoff
Personalized video for dealership outbound has been stuck in the same hole for a decade. Manual recording works, but rep time is the ceiling. Templates scale, but customers stopped opening them. Anyone trying to fix this had to pick a side. Generate the face, and you get scale. Hold the face, and you preserve trust. Nobody gets both for free.
Covideo, which has been in dealership video for 20+ years and serves around 3,500 rooftops, picked scale. Their newer AI Video Agent generates a synthetic face from a small library of named personas, lets the dealer script a message, and ships in minutes. No recording step. Zero salesperson coordination. The on-screen face is computed from a model, not pulled off the sales floor. That is a defensible engineering choice. For a vendor serving around 3,500 rooftops, it is arguably the only choice.
VoxRefine picked the other side. We leave the video alone. The face on screen is the actual salesperson, recorded once on camera. AI is used only on the audio segments that change per customer: the name, the specific vehicle, and the appointment time, each spoken in a clone of that same rep's voice. The bet is that for dealerships specifically, the face has to match across the entire journey, and that requires holding the video constant.
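To make that split concrete, here is a minimal Python sketch of how a script could be divided into fixed and per-customer parts. It is an illustration under assumptions, not VoxRefine's actual pipeline: the `Segment` type, the `plan_segments` function, the script wording, and the customer values are all hypothetical. Literal text stands in for the rep's one-time recording, and each `{slot}` marks audio that would be generated in the cloned voice.

```python
from dataclasses import dataclass
from string import Formatter


@dataclass(frozen=True)
class Segment:
    text: str
    synthesized: bool  # True = cloned-voice audio, False = original recording


# Hypothetical script. Text outside braces is recorded once by the rep;
# each {slot} is a per-customer value spoken in the cloned voice.
SCRIPT = "Hi {name}, your {vehicle} is ready. We have you down for {appointment_time}."


def plan_segments(script: str, values: dict[str, str]) -> list[Segment]:
    """Split a script into recorded and synthesized segments, in spoken order."""
    segments = []
    for literal, field, _spec, _conversion in Formatter().parse(script):
        if literal:
            # Fixed wording: reuse the original recording untouched.
            segments.append(Segment(literal, synthesized=False))
        if field is not None:
            # Per-customer wording: the only part marked for voice synthesis.
            segments.append(Segment(values[field], synthesized=True))
    return segments


# Made-up customer values, for illustration only.
customer = {
    "name": "Dana",
    "vehicle": "2024 Honda CR-V",
    "appointment_time": "Saturday at 10 AM",
}

for segment in plan_segments(SCRIPT, customer):
    source = "cloned voice" if segment.synthesized else "original recording"
    print(f"{source:>18}: {segment.text!r}")
```

In this sketch only three short segments per customer are marked for synthesis. Everything else, including all of the video, is reused as recorded.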
What dealership customer behavior tells us
The thing that makes automotive different from almost every other industry that uses video outbound is this: the customer physically shows up. They drive to a parking lot, walk through a glass door, and scan the showroom for someone they recognize. Five seconds between the front door and the first handshake. That is the window everything before it was building toward.
Foureyes data on dealership lead behavior shows the obvious version of this: leads who get a personal touch from the assigned salesperson before the in-store visit convert better than leads who get a generic-feeling touch. Strolid and other BDC-focused groups have been saying the same thing for years. None of this is novel. What is changing is that the customer's read on what counts as a personal touch has tightened. A synthetic face used to feel futuristic. Now it feels like automated marketing. The bar moved.
In our R&D work with Premier Automotive, the consistent pattern is that the appointment-confirmation video is the send that does the most work. Get that right and show-rate moves. Get it wrong and the rest of the funnel cannot recover. The send that decides whether the customer walks in the door is the one place a dealership cannot afford to break face continuity.
The three approaches, side by side
Two synthetic-avatar options and the real-face option, on the dimensions a GM actually decides on.
| Approach | What's synthetic | Face continuity to in-store | Best for |
|---|---|---|---|
| VoxRefine (real face + cloned voice) | Audio only: name, vehicle, appointment time | Preserved. Customer meets the same face from the video. | Appointment confirmations, no-show follow-ups, service reminders, equity mining |
| Covideo AI Video Agent | Face and voice: named AI personas (Megan, Laura) | Broken. Customer meets someone different at the door. | Recall notices, hours changes, generic announcements |
| Synthesia, HeyGen, similar avatar tools | Face and voice: fully generated speaker | Broken. Generic spokesperson, not dealer staff. | Training videos, internal comms, contexts with no in-person handshake |
Where AI avatars are the right call
The honest version of this argument has to acknowledge where synthetic avatars win, because they genuinely do win in plenty of places. Anywhere the on-screen face is never going to physically meet the viewer, the avatar tradeoff inverts.
Training videos are the cleanest example. A 12-module compliance course narrated by a synthetic spokesperson is fine. Nobody is going to look for that person in a hallway later. The same goes for corporate internal comms, multi-language localization at scale, generic awareness ads where the face is decoration, support knowledge base walkthroughs, and product explainer overlays inside a SaaS app. Every one of these is a context where face continuity does not exist as a requirement in the first place. Synthesia and HeyGen built large businesses serving exactly those use cases. Covideo moving into avatars is a reasonable expansion into that same shelf.
The argument here is narrower than "avatars are bad." It is "avatars are wrong for the specific dealership sends that decide whether a customer walks in the door."
What is actually behind the customer trust shift in 2026
Two things are happening at once. First, consumer AI literacy is up sharply since 2023. Most adults have now interacted with a chatbot, an AI photo filter, and a synthetic-voice phone tree. Pattern recognition for "this is generated" happens faster than it did even 18 months ago. Second, synthetic-content fatigue is rising in the same population. The same people who were impressed by an AI demo two years ago are now annoyed by the third synthetic customer-service video this week.
The exact rate of either shift is hard to nail down, since every vendor publishing a number has a horse in the race, this one included. But the directional trend is consistent across the industry surveys that do exist. Treat the specifics as ranges. The qualitative point holds regardless of the decimal: the bar for what reads as a personal touch is moving, and it is moving against synthetic faces.
A note on Covideo's strategy specifically
Covideo expanded into AI avatars because the market asked for it. Their core manual-record product is solid; it has been the category default in dealership video for two decades for good reason. But manual recording bottlenecks on rep time, and customer demand for at-scale video kept climbing. From a product perspective, they had two ways to scale: build a real-face, cloned-voice pipeline (technically harder, narrower category) or license a generative avatar stack (faster to ship, broader use cases including non-automotive). They picked the second. For a company serving around 3,500 rooftops and an addressable market beyond automotive, that is a reasonable call.
We picked the first because we are not trying to be a general-purpose video platform. VoxRefine exists to fix one specific problem for one specific industry: scaled outbound video for car dealerships where the face on screen is the face the customer is about to meet. Different scope, different constraint, different answer. Both can be right.