PILLAR 02 · BUYING & COST

The hidden costs of traditional audio guide production.

The line items that don't show up in an audio guide vendor's quote — re-booked voice talent, studio re-engineering for mid-tour edits, handset attrition, per-language requoting, and content amortized over short exhibition runs.

ERIC DUFFY·FOUNDER·10 MIN READ·UPDATED 2026-05-29

The cost of a traditional museum audio guide is rarely what's on the quote. The quote covers production: scripting, voice talent, studio time, editing, mastering, and a per-language multiplier. What it doesn't cover is what happens next — the corrections, the new language a board member asks for, the handsets that don't come back from a school group, the temporary exhibition you can't justify recording for. Those costs are real, they recur every year, and they're the reason five-year totals on traditional guides routinely come in 30–60% above what was approved in year one.

I run Convo, a phone-based platform that competes against this model, so I'm not a neutral observer. But the line items below are the ones that show up in customer discovery calls — almost always after a museum has already lived through a traditional production cycle and seen the bill arrive in installments. This piece is the unpacking I wish someone had handed me before my first RFP.

What's actually missing from the quote?

A traditional audio guide quote scopes production — the work that ends on opening day. Everything that happens after opening day is a separate cost, and most of the five-year total lives there. The vendor's quote will itemize scriptwriting per minute, voice talent per finished hour, studio time per hour, editing, sound design, mastering, and a translation line per additional language. What it generally doesn't itemize: the studio re-booking minimum for the first correction, the voice-talent re-engagement fee when an actor is no longer available, handset attrition at low-double-digit-percent rates per year, the per-language requote when you add Mandarin in year two, and the amortization math when the tour covers a temporary exhibition.

These aren't hidden because vendors are dishonest. They're hidden because the quote is a production budget, not an operating budget — and a museum audio guide is an operating asset, not a one-time deliverable.

What does a voice-talent re-booking actually cost?

Re-booking a voice actor for a single-stop correction routinely costs more than the original recording of that stop, because you pay the engagement floor regardless of how short the work is. Voice talent is typically booked at the $200–$275 per-finished-hour audiobook floor (Voice Over Resource Guide, 2024), with a minimum engagement that is almost never less than a full session. A correction that's thirty seconds of finished audio still incurs the minimum, plus studio time, plus an editor.

The harder problem is availability. The actor who recorded your tour in 2024 has a calendar; six months later, that calendar may not have you on it, and a different actor introduces a voice mismatch that no amount of re-mastering fixes. Most museums I've talked to have at least one stop they know is wrong (a misattributed work, a deaccessioned object, a curator quote that no longer reflects the institution's position) and have not corrected it, because the math of correcting one stop didn't pencil out.

That's the real hidden cost of voice re-booking: the corrections that don't happen.

Why does mid-tour editing require studio re-engineering?

A traditional audio tour is a mixed and mastered audio program; editing a single stop means re-engineering the program, not editing a file. When a studio cuts the original tour, they sound-design to a target loudness, EQ to a target frequency curve, and master to a target dynamic range. The result is a coherent listening experience across stops. Drop a new recording into the middle of that without matching its loudness, EQ, ambience, and mastering chain and the visitor hears the seam — a sudden room-tone change, a different vocal proximity, a level mismatch their phone or handset auto-corrects awkwardly.

Re-engineering one stop to match the rest of the mix is usually a half-day of studio work minimum: matching mics if the original session notes survive, re-creating the original mastering chain, matching ambient noise floor, and QC against adjacent stops. Vendors I've talked to charge a flat re-engineering fee in the $800–$2,500 range for any single-stop edit — independent of how short the edit is. Multiply that by the handful of corrections a typical multi-year tour accumulates, and a "small content refresh" becomes a five-figure annual line.
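The multiplication above is easy to sketch. The snippet below is a minimal illustration, not a pricing model: the fee range comes from the paragraph above, while the function name and the count of six edits per year are assumptions made for the example.

```python
# Flat per-stop re-engineering fee range quoted in the text (USD).
FEE_LOW, FEE_HIGH = 800, 2_500


def annual_reengineering_cost(edits_per_year: int) -> tuple[int, int]:
    """Return the (low, high) annual fee total for a number of single-stop edits."""
    if edits_per_year < 0:
        raise ValueError("edits_per_year must be non-negative")
    return edits_per_year * FEE_LOW, edits_per_year * FEE_HIGH


# Hypothetical year: six single-stop corrections.
low, high = annual_reengineering_cost(6)
print(f"${low:,} to ${high:,}")  # prints "$4,800 to $15,000"
```

Note that this counts the re-engineering fee only. The voice-talent minimum engagement is billed on top, which is how a year with a handful of corrections moves toward five figures even at mid-range fees.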

What is the real handset attrition rate?

Plan for a low-double-digit percentage of your handset fleet to disappear or die each year, on top of the standing operating load from upkeep, and assume a practical lifespan of two to three years for heavily used units. A mid-size museum running a 100-handset fleet should budget for ten to fifteen replacement units annually for loss, breakage, and theft alone, on top of battery replacement, sanitization labor, and charging infrastructure. Across a five-year horizon, the hardware line for a mid-size museum adds up to a meaningful five-figure to low-six-figure total before any content is produced.

Once you add cleaning labor (the equivalent of roughly a quarter of an FTE on a busy fleet), batteries and consumables, breakage and loss, and amortized charging infrastructure, a mid-size handset fleet absorbs low five figures a year in operating expense, independent of content cost. That isn't a hidden cost in the usual sense, since it's the biggest visible cost of the model, but it doesn't appear on the production quote at all.
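To turn the attrition rate into a unit count you can put in a budget, a rough sketch follows. It assumes the fleet is topped back up to full size each year; the function name is hypothetical, and no unit price is included because the text does not give one.

```python
def replacement_units(fleet_size: int, attrition_pct: int, years: int = 1) -> int:
    """Units to replace over `years`, assuming the fleet is restored to full size annually.

    attrition_pct is a whole-number percentage (e.g. 10 for 10%).
    """
    if fleet_size < 0 or attrition_pct < 0 or years < 0:
        raise ValueError("inputs must be non-negative")
    # Integer ceiling division: a fraction of a lost handset still means buying one.
    per_year = (fleet_size * attrition_pct + 99) // 100
    return per_year * years


# The 100-handset fleet from the text, at 10% and 15% annual attrition.
print(replacement_units(100, 10), replacement_units(100, 15))  # prints "10 15"
# The same fleet over a five-year horizon at the higher rate.
print(replacement_units(100, 15, years=5))  # prints "75"
```

Multiply the result by your own vendor's replacement price to get the dollar figure to request in writing.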

What does adding a language actually cost?

Adding a language to a traditional tour costs roughly 60–80% of the original English production, every time, because almost all of the work has to be redone rather than translated. A per-language production includes script translation by a domain-literate translator (not a generic vendor), a native voice cast in the target language, studio booking, recording, editing, sound design, mastering, and QA against the source. Starting from the $30,000–$150,000 per-tour production range Convo publishes, which matches what curators tell us they were quoted in their last RFPs, a mid-size museum offering even half a dozen languages can land in the mid-five-to-low-six-figure range in recording costs alone, before original script development or hardware.
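As a quick check on that arithmetic, here is a minimal sketch. The 60–80% band and the $30,000 low end of the production range come from the paragraph above; the function name and the choice of five additional languages (six in total, counting English) are assumptions for the example.

```python
# Per-language cost as a share of the original English production (from the text).
SHARE_LOW_PCT, SHARE_HIGH_PCT = 60, 80


def added_language_cost(base_production: int, extra_languages: int) -> tuple[int, int]:
    """Return the (low, high) total cost in USD of adding languages to an existing tour."""
    if base_production < 0 or extra_languages < 0:
        raise ValueError("inputs must be non-negative")
    low = base_production * SHARE_LOW_PCT // 100 * extra_languages
    high = base_production * SHARE_HIGH_PCT // 100 * extra_languages
    return low, high


# Hypothetical: a $30,000 English production plus five additional languages.
low, high = added_language_cost(30_000, 5)
print(f"${low:,} to ${high:,}")  # prints "$90,000 to $120,000"
```

Because each language is priced as a share of the original production, the total scales linearly: there is no volume discount built into this model unless your vendor writes one into the contract.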

This is why most traditional tours ship in one to three languages and stop. The marginal cost of the fourth language is roughly the same as that of the first additional language, and the audience case for it is harder to defend to a board. Meanwhile, roughly 20% of the US population speaks a language other than English at home (American Alliance of Museums, 2022), and the share is substantially higher in the cities where most museums sit. The production economics of the legacy model have silently capped how many of your visitors you could meet in their first language.

The shape of the AI-narrated model is structurally different here, which we cover in AI audio guide vs traditional audio guide.

How does amortization break temporary exhibitions?

A traditional production is amortized over the run of the exhibition it covers; for temporary and traveling shows, that math almost never works. A $40,000 production amortized over a permanent installation across, say, 1.2 million visitors over five years works out to roughly three cents per listener — fine. The same $40,000 production amortized over a twelve-week traveling exhibition that draws 80,000 visitors works out to fifty cents a listener, and that's only if every visitor uses it.

The realistic numbers are worse. Audio guide adoption on temporary shows is often in the single-digit-percent range of paid attendance, and the production lead time of six to twelve months means the tour often opens late or doesn't open at all. The result, in practice, is that most temporary and traveling exhibitions ship with no audio guide at all — not because curators don't want one, but because the production economics of the legacy model never allowed for short-run content. That's a hidden cost too: the visitor interpretation that never happened.
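The amortization arithmetic above can be reproduced in a few lines. This is a sketch under stated assumptions: the $40,000 production cost and both visitor counts come from the text, while the function name and the 5% adoption rate (one point inside the "single-digit-percent" range) are illustrative choices.

```python
def cost_per_listener(production_cost: float, visitors: int, adoption: float = 1.0) -> float:
    """Production cost divided by the number of visitors who actually use the guide."""
    if visitors <= 0 or not 0 < adoption <= 1:
        raise ValueError("visitors must be positive and adoption must be in (0, 1]")
    return production_cost / (visitors * adoption)


permanent = cost_per_listener(40_000, 1_200_000)           # five-year permanent installation
temporary = cost_per_listener(40_000, 80_000)              # twelve-week show, every visitor listens
realistic = cost_per_listener(40_000, 80_000, adoption=0.05)  # hypothetical 5% adoption

print(f"permanent: ${permanent:.2f}")
print(f"temporary: ${temporary:.2f}")
print(f"temporary at 5% adoption: ${realistic:.2f}")
```

At full adoption the two cases come out near three cents and fifty cents per listener, matching the figures above. At the assumed 5% adoption the temporary show lands around ten dollars per listener, which is the number a board would actually be approving.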

What about the costs that don't show up as line items?

The most expensive hidden costs are the ones that never appear as expenses on a P&L; they appear as missing capability. A few examples worth naming, because they're the costs museums most often describe to us in discovery calls:

  • Corrections that don't happen. The misattributed work, the curator quote that has aged badly, the donor name that needs to come off. We've already covered why: the re-booking economics make a single-stop fix uneconomic.
  • Languages that are never offered. The Mandarin tour the board has asked about for three years, the Korean track for the K-pop-era visitor who's now a real share of the audience, the Arabic version a school group requested. Each one is a new production line, and the answer is almost always no.
  • Temporary exhibitions without audio. The traveling show, the rotating gallery, the year-long loan — interpretation that ships only as wall text because the audio production timeline didn't fit.
  • The accessibility transcript that's a separate workstream. Most traditional vendors charge separately for transcripts, captions, and visual descriptions. In the AI-narrated model, those usually fall out of the same source.
  • The staff time spent managing a handset fleet. Sanitization between uses, charging, troubleshooting, training new staff to onboard visitors. This is usually invisible in the P&L because it's absorbed by existing roles.

None of these show up as line items in a vendor quote. All of them show up in five-year retrospectives as the reason the program never grew.

How should you read a traditional audio guide quote?

The right move is to insist that the vendor scope a five-year operating model, not a year-one production budget, and to require the categories below in writing before you sign. A defensible RFP response for a traditional audio guide should include, at minimum: the per-stop edit re-engineering fee (flat, in writing); the voice-talent re-engagement fee, including what happens if the original actor is unavailable; the annual handset attrition budget (units and dollars, with a stated assumption); the per-language production cost, scoped as a new production rather than a translation; and a clearly stated amortization plan for temporary exhibitions.

If a vendor can't or won't put those numbers in writing, that's the answer. The hidden costs are hidden because the operating model is not what's being sold. The production is what's being sold. The operating model is what you discover.
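One way to keep that review honest is to track the five required categories as a checklist and flag anything the vendor left blank. The sketch below is illustrative only: the field names and the sample response are hypothetical, not a standard RFP schema.

```python
# The five categories the text says a defensible RFP response should include.
# Field names are hypothetical labels chosen for this example.
REQUIRED_ITEMS = (
    "per_stop_reengineering_fee",
    "voice_talent_reengagement_fee",
    "annual_handset_attrition_budget",
    "per_language_production_cost",
    "temporary_exhibition_amortization_plan",
)


def missing_items(quote: dict) -> list[str]:
    """Return the required items that are absent or left empty in a vendor response."""
    return [item for item in REQUIRED_ITEMS if quote.get(item) in (None, "")]


# Hypothetical vendor response with two categories filled in.
sample_quote = {
    "per_stop_reengineering_fee": "flat fee, stated in contract",
    "per_language_production_cost": "scoped as new production",
}

for item in missing_items(sample_quote):
    print(f"Not in writing: {item}")
```

Anything the function returns is a number the vendor has not committed to, which is the signal described above.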

For the structural alternative, the AI audio guide vs traditional audio guide comparison covers what changes when the content layer becomes software. For the full TCO model — including five-year scenarios for both — see museum audio guide total cost of ownership. For how the same-day update capability changes operations specifically, see same-day museum tour updates.

Where this analysis doesn't apply

A few cases where the traditional model's hidden costs are either tolerable or beside the point:

  • A permanent installation that genuinely won't change for a decade. If the script is settled and the languages are fixed, the corrections-don't-happen and languages-aren't-added costs become non-costs.
  • A named-voice tour where the production is the point. If the curator, director, or a celebrity ambassador is the headline, you're not paying for narration — you're paying for a curatorial choice. Hidden costs apply, but they're part of the offer.
  • An institution mid-contract on hardware. If your hardware-and-content contract is two years into a five-year term, the migration math usually says ride it out and plan the replacement at renewal.

If none of those describe you, the line items above are probably the costs your program is already absorbing — they're just not yet on a single page.

FAQ

Roughly 20–35% on top of the production quote, depending on fleet size and tour length. The biggest contributors are handset attrition, the first round of corrections, and at least one unscoped language request from a board member, donor, or community group.

It's real, and the range varies more by visitor demographics than by hardware quality. Family-heavy and school-group-heavy institutions sit at the high end; quiet, adult-skewing collections sit at the low end. Plan for the midpoint and adjust after year one.

Sometimes, especially with smaller production vendors. The cleanest version is a flat per-stop re-engineering fee written into the original contract, capped at a number you can live with. Watch for "minimum session" language that nullifies the flat rate.

Most museums are quoted $5,000–$15,000 per additional language for a 30-stop tour with professional voice talent and studio production. The variance is mostly studio location and whether the vendor sub-contracts to lower-cost overseas studios.

Yes, and we'll cover them in their own piece — primarily curator review time and the editorial discipline required to keep grounded reference materials clean. The hidden-cost shape is different, but the principle that platform fees aren't the whole story still applies.

Because production is what they sell, and quoting operating costs means competing on a number that includes everyone else's labor too. The vendors that do quote operating costs in writing are usually a better bet, even if their headline production number is higher.

The honest summary

The traditional audio guide model isn't more expensive than it looks because vendors are deceptive. It's more expensive than it looks because the model itself separates production economics from operating economics, and the operating economics are where the real cost lives. Five-year totals on a traditional program routinely come in 30–60% above the year-one production budget once corrections, language additions, handset attrition, and short-run amortization are accounted for honestly.

If you're evaluating a traditional vendor, the move is to ask for an operating model in writing, scope corrections and languages as line items, and stress-test the five-year math against the way your collection actually changes. If you're evaluating an AI-narrated platform, the equivalent move is to look at where the hidden costs would live in a software model — curator review and reference-material discipline — and ask the platform how it handles each.

For Convo's published numbers in full, pricing is on a single page; for the broader buying and cost map, the buying and cost pillar guide is the place to start.


About the author

Eric Duffy is the founder of Convo, a platform that lets museums and cultural institutions publish multilingual audio tours their visitors can have a conversation with. He writes about the economics of museum interpretation from inside the category — drawing on RFP data, discovery calls with curators and directors, and the production economics of both the studio-and-handset model and the AI-narrated model. Reach him at eric@convo.app or on LinkedIn.
