If you're putting together a 2026 audio guide budget, the number you should walk in with is lower than the number you walked in with in 2020, by roughly an order of magnitude. The studio-produced, handset-distributed model that defined museum audio interpretation for fifty years is no longer the only credible option, and the pricing of the newer option has settled enough that you can plan against it.
This piece is the hub for our Buying & cost pillar. I'm Eric Duffy. I run Convo, a platform in this category. I have an interest in the answer, and I've tried to give you the procurement-grade version — actual numbers, where they come from, and where vendors (mine included) bury costs that don't make it into the headline quote. If you're sizing this up before an RFP, this is the piece I'd want to read first.
What does a museum audio guide actually cost in 2026?
The honest 2026 range, all-in, is roughly $15,000 to $200,000 a year depending on which model you choose. The bottom of the range is a phone-based AI platform subscription with no hardware; the top is a studio-produced tour with a rented handset fleet and three languages. The midpoint depends almost entirely on three decisions: studio or AI, hardware or BYOD, and how many languages you ship.
A defensible budget for a small or mid-size US museum looking to launch in 2026 is closer to $20,000–$60,000 a year than to the six-figure quotes that dominated the category for the last two decades. That budget assumes a phone-based AI platform, multilingual default, ten to a hundred tour stops, no rented hardware, and roughly 20–40 hours of curator review time over the launch period. Push the budget up if you need a named voice actor, custom hardware, or a white-labeled native app; push it down if you're piloting a single tour in one language before committing.
The rest of this piece breaks down where the money actually goes — and where it used to go — line by line.
What goes into a traditional studio-produced audio guide quote?
A 2026 quote from a legacy studio-and-handset vendor usually breaks down into five line items: scripting, voice talent, studio time, post-production, and hardware. Convo's published /about page puts the all-in production cost of a traditional museum audio tour at $30,000–$150,000 per tour before hardware — a range that matches what curators consistently tell us they were quoted in their last RFPs. Custom native-app development on top of that adds another order of magnitude.
A rough decomposition of where that money goes:
- Scriptwriting. $150–$300 per finished minute, whether outsourced to a specialist writer or absorbed as in-house curatorial time.
- Voice talent. Audiobook union and non-union floor rates run $200–$275 per finished hour per the Voice Over Resource Guide, with experienced SAG-AFTRA narrators at session rates of $300–$700 per hour. Museum tours sit in this category. (SAG-AFTRA audiobook agreements; GVAA Rate Guide.)
- Studio time. $150–$250 per hour with at least a 2:1 studio-to-finished ratio, so a 60-minute tour needs 120+ studio minutes (two hours or more).
- Post-production. Editing, sound design, mastering, and quality control add 30–50% on top of recording costs.
- Hardware (if applicable). A rented handset fleet for a mid-size institution typically runs tens of thousands of dollars in Year 1 capital expenditure plus a low-five-figure operating line in Year 1, dominated by cleaning labor.
What's missing from the quote, and almost always added later: per-language reproduction, content refresh cycles every 18–36 months, and the handset upkeep nobody likes to model. That is why the legacy number you remember (a $30,000 tour, say) so often became a $60,000 tour by Year 2.
What does a phone-based AI audio guide platform cost?
The phone-based AI model collapses scripting, voicing, translation, and updates into a software subscription. The shape is now standardized across newer platforms: a monthly or annual fee, scaled by institutional size or visitor volume, with unlimited tours, languages, and edits included on most paid tiers.
Some published 2026 reference points:
- Convo — Studio at $1,200/month and Institution at $3,500/month, with a Pilot tier at $0 for one published tour. All paid tiers include unlimited tours, all ten languages, and unlimited edits.
- Other AI-narrated platforms — most publish subscription pricing in a similar shape, with floor tiers from tens of dollars per month for small institutions and institutional tiers in the low-thousands per month. Custom-content services, app store submission, and concierge work are typically billed separately.
- Philanthropy-funded platforms (Bloomberg Connects) — free to qualifying museums and cultural organizations. The model is philanthropic; the trade-offs are editorial review by the platform and a shared-app distribution model rather than your own branded experience. We cover the trade-offs in detail in Convo vs Bloomberg Connects.
The all-in 2026 number for a phone-based AI program at a small or mid-size institution typically lands between $15,000 and $50,000 a year, including subscription, curator review time (the real labor line), QR signage, and a small loaner-phone fleet if you choose to offer one. There's no studio bill, no per-language recording line, and no handset CapEx.
For the dimension-by-dimension comparison against the legacy model — voice quality, control, accessibility, hardware — see the spoke on AI audio guide vs traditional audio guide. The cost story is one chapter of a larger comparison.
What do additional languages actually add to the cost?
On the legacy model, every additional language is roughly 50–80% of the cost of the original English production. A new translation has to be commissioned, a native voice has to be cast, a studio has to be booked, and the audio has to be edited and mastered. The studio side of the math is consistent across vendor disclosures; the translation side is the part with public benchmarks.
Translation alone, before voicing, is its own line item. The Slator 2024 market report on translation pricing bands the relevant rates clearly: commodity content runs $0.03–$0.08 per source word, general business content $0.06–$0.12, and specialized content requiring subject-matter expertise (which is where museum and cultural interpretation sits) lands at $0.12–$0.30 per source word. The American Translators Association Compensation Survey (6th ed.) is the canonical industry source for per-language-pair rate tables within that band. A 60-minute audio tour at typical narration pace is roughly 8,000–10,000 words. At the specialized rate, translating it into Spanish, French, German, Mandarin, Japanese, and Korean costs roughly $1,000–$3,000 per language, or about $5,800–$18,000 across all six, in translation cost alone, before any voice or studio time.
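The per-word band turns into per-language dollars with simple arithmetic. A minimal Python sketch, assuming the 8,000–10,000 word tour length and the Slator specialized band quoted above (the constant and function names are illustrative, not from any tool):

```python
# Translation cost for a 60-minute tour at the specialized-content band.
# Assumptions: 8,000-10,000 source words per tour; $0.12-$0.30 per word.

WORDS_LOW, WORDS_HIGH = 8_000, 10_000   # source words in a 60-minute tour
RATE_LOW, RATE_HIGH = 0.12, 0.30        # USD per source word, specialized band
LANGUAGES = ["Spanish", "French", "German", "Mandarin", "Japanese", "Korean"]


def translation_cost_range(words_low: int, words_high: int,
                           rate_low: float, rate_high: float) -> tuple[float, float]:
    """Return (low, high) translation cost in USD for one target language."""
    return words_low * rate_low, words_high * rate_high


per_language = translation_cost_range(WORDS_LOW, WORDS_HIGH, RATE_LOW, RATE_HIGH)
# Pair the low word count with the low rate and the high with the high,
# then multiply by the number of target languages.
all_languages = (per_language[0] * len(LANGUAGES),
                 per_language[1] * len(LANGUAGES))

print(f"Per language: ${per_language[0]:,.0f}-${per_language[1]:,.0f}")
print(f"All {len(LANGUAGES)} languages: "
      f"${all_languages[0]:,.0f}-${all_languages[1]:,.0f}")
```

It prints a per-language range of $960–$3,000 and a six-language total of $5,760–$18,000. Swapping in per-pair rates from the ATA survey would tighten the estimate for a specific language list.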
On AI platforms, the marginal cost of additional languages is effectively zero. Convo, as one example, includes all ten languages — English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic — on every paid tier from the same approved English source, re-voiced across all ten in roughly 60 seconds. The constraint shifts from production budget to editorial review: you still need a reviewer who can read the Mandarin track before it ships. But the studio-and-talent line is gone.
This is the dimension that has changed the most in the last three years, and it's the one that accounts for most of the gap between a $40,000 AI subscription and a $200,000 legacy program. For a deeper treatment, see Pillar 3 (Multilingual interpretation) when that hub publishes.
What about hardware — handsets, headphones, racks?
The 2025–2026 consensus across vendors is that rented handsets are now the accommodation, not the default. The visitor's own phone has won the BYOD argument, and the operational costs of a handset fleet are difficult to defend in a procurement review when most visitors are now actively choosing to use their own device.
For a mid-size museum, the legacy handset Year-1 number stacks up to a capital expenditure in the tens of thousands of dollars plus a low-five-figure operating line. Cleaning and turnaround labor dominates that operating line (the equivalent of roughly a quarter of an FTE on a busy fleet), with batteries and consumables, breakage and loss at a low-double-digit percentage of the fleet, and charger amortization stacked on top.
A BYOD-plus-cloud program produces a Year-1 cost roughly 80–90% lower than the legacy handset comparable. The shape is consistent with what museums report on procurement calls: the hardware line item is where legacy programs get expensive quietly, year after year.
A reasonable 2026 hardware budget for a phone-based program is $0 to $5,000 a year: zero if you commit fully to BYOD, $2,000–$5,000 if you keep a small loaner fleet of accessible phones at the front desk for visitors without smartphones or with accessibility needs that a dedicated device serves better. The remaining cases where a full handset fleet still makes sense are narrow: high-security environments where personal devices aren't allowed, accessibility-first programs where a tuned device is part of the offer, and existing hardware contracts that haven't expired.
For the broader BYOD-versus-handset trade-off, see Pillar 5 — Visitor experience — when that hub publishes.
Where do hidden costs live in audio guide procurement?
Most of the surprise in audio guide budgets isn't the headline quote — it's the line items that aren't on the headline quote. The four places to look hardest:
1. Content refresh. A studio-produced tour is effectively frozen on the day it ships. Updating a single attribution, fixing a factual error, or adding a stop for a new acquisition means re-booking the same talent (often impossible 18 months later), re-engineering the audio to match the original mix, and re-deploying. Most museums underbudget this line: they expect to edit the tour over time but don't fund the edits, and in practice most legacy tours are never updated.
2. Per-language reproduction. Quoted as an add-on, almost always after the English production is approved. Vendors who quote a $40,000 English tour and a $25,000 Spanish version are quoting the same labor cost twice — translation aside, the studio booking and voice talent are both per-language.
3. Handset upkeep. The line that almost never appears on Year 1 quotes and appears on every Year 2 invoice. Batteries, cleaning labor, breakage, replacement units, charging racks. For a meaningful fleet this lands in the low five figures a year, dominated by cleaning labor.
4. Platform development. If you're going down the custom-native-app path, expect well into the six figures to launch and a meaningful five-figure annual line to maintain — separate from any audio production cost. Most museums don't need this; the SaaS platforms have eaten the case for custom development unless you have very specific brand or integration requirements.
The corollary: on a flat-fee SaaS platform, most of these lines disappear or fold into the subscription. The exposed risk shifts from quote-creep to vendor lock-in, which is a different problem and a more manageable one — covered in the spoke on audio guide pricing models when that publishes.
What's the five-year total cost of ownership?
The interesting number isn't Year 1 — it's Year 5, where content refreshes, hardware replacement, and per-language adds compound. A rough five-year model for a mid-size museum producing one 30-stop tour in English plus three additional languages:
| Year | Traditional (studio + handset) | AI-narrated (Institution tier, $3,500/month) |
|---|---|---|
| Year 1 | $80,000 production + $20,000 handsets | $42,000 |
| Year 3 | + $25,000 (re-record one language, content update) | $42,000 |
| Year 5 | + $40,000 (refresh + handset replacement) | $42,000 |
| 5-year total | ~$165,000 | ~$210,000 (5 × $42,000) |
Note the catch in Year 5: over five years the traditional model is cheaper in raw dollars, provided you refresh no more often than the table assumes and never add a language. The moment either assumption breaks, and they almost always do, the AI model pulls ahead and stays ahead. A fifth language, a wing refresh, or a single attribution fix collapses the gap inside a year.
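The table's totals, and what one added language does to them, can be reproduced in a few lines. A minimal sketch, assuming the table's spend figures and applying the 50–80% per-language rule of thumb from the legacy section to the $80,000 production figure (an illustrative simplification; the names are hypothetical):

```python
# Five-year TCO sketch for the table above (one 30-stop tour, four languages).
# Spend figures are copied from the table; years 2 and 4 carry no traditional
# spend. One extra language is costed as a share of the original production.

PRODUCTION_COST = 80_000                       # Year-1 studio production, USD
TRADITIONAL_SPEND = {1: PRODUCTION_COST + 20_000, 3: 25_000, 5: 40_000}
AI_MONTHLY = 3_500                             # Institution tier, USD per month
YEARS = range(1, 6)


def traditional_total(extra_language_share: float = 0.0) -> float:
    """Five-year traditional spend, optionally with one more language.

    extra_language_share is the cost of that language as a fraction of the
    original production (0.0 means no language is added).
    """
    base = sum(TRADITIONAL_SPEND.get(year, 0) for year in YEARS)
    return base + extra_language_share * PRODUCTION_COST


def ai_total() -> int:
    """Five-year subscription spend: twelve monthly payments every year."""
    return AI_MONTHLY * 12 * len(YEARS)


print(f"Traditional, as modeled: ${traditional_total():,.0f}")
for share in (0.5, 0.8):
    print(f"Traditional + one language at {share:.0%}: "
          f"${traditional_total(share):,.0f}")
print(f"AI-narrated: ${ai_total():,}")
```

Adding one language at 50% of production brings the traditional column to $205,000 against $210,000; at 80% it reaches $229,000. The sketch ignores the timing of payments and any discounting, which a fuller TCO model would likely add.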
The deeper point is what the five-year number is buying. The legacy column buys four languages, one frozen tour, and a depreciating handset fleet. The AI column buys ten languages, unlimited tours, edits whenever you need them, and a platform that ships updates the same day. Different products at similar price points.
For the procurement-grade TCO model — with the assumptions, sensitivity ranges, and worked examples — see the spoke on museum audio guide total cost of ownership when that publishes.
How do you build a defensible audio guide budget?
The version of this answer I'd give a director sizing up a procurement: don't budget against what the category used to cost; budget against what your visitors now expect.
A defensible 2026 budget for a small or mid-size US museum has roughly this shape:
- Platform. $15,000–$45,000/year for a phone-based AI platform with multilingual default and unlimited tours.
- Curator review time. 20–60 hours of internal time over a launch period, depending on collection size. If billed at a fully-loaded $75–$150/hour, that's $1,500–$9,000.
- Signage and QR design. $1,000–$5,000 for label-card redesign and printing.
- Optional loaner-phone fleet. $2,000–$5,000 for a small front-desk fleet for accessibility and visitors without smartphones.
- Optional translation review. $500–$3,000 per language if you want a native speaker reviewing each track before it ships. (The translation itself is in the platform; the review is what you're buying.)
That puts a realistic launch budget for a serious multilingual program at $20,000–$60,000 in Year 1, falling to $15,000–$45,000 in Year 2 and beyond once signage and review cycles are amortized. If a vendor is quoting meaningfully more than that and isn't doing custom native-app or hardware work, the question to ask is what you're paying for.
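Summing the list is a quick check on that headline range. A minimal sketch, assuming the ranges above combine independently (lows with lows, highs with highs); `year_one_budget` and its parameters are hypothetical names:

```python
# Year-1 budget sketch built from the line items listed above.
# Core lines are always included; the loaner fleet and translation review
# are optional, and translation review is priced per reviewed language.

CORE_LINES = {
    "platform": (15_000, 45_000),
    "curator_review": (20 * 75, 60 * 150),  # 20-60 hours at $75-$150/hour
    "signage_qr": (1_000, 5_000),
}
LOANER_FLEET = (2_000, 5_000)
TRANSLATION_REVIEW_PER_LANGUAGE = (500, 3_000)


def year_one_budget(loaner_fleet: bool = False,
                    review_languages: int = 0) -> tuple[int, int]:
    """Return (low, high) Year-1 budget in USD."""
    low = sum(lo for lo, _ in CORE_LINES.values())
    high = sum(hi for _, hi in CORE_LINES.values())
    if loaner_fleet:
        low += LOANER_FLEET[0]
        high += LOANER_FLEET[1]
    low += review_languages * TRANSLATION_REVIEW_PER_LANGUAGE[0]
    high += review_languages * TRANSLATION_REVIEW_PER_LANGUAGE[1]
    return low, high


print(year_one_budget())
print(year_one_budget(loaner_fleet=True, review_languages=2))
```

Called with no optional lines it returns (17500, 59000); with a loaner fleet and two reviewed languages it returns (20500, 70000). The optional lines are what push a program past the top of the range.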
The reverse also holds: if a vendor is quoting meaningfully less than that with a serious feature set, the question to ask is what they're not telling you. Free or near-free platforms typically trade away editorial control over the visitor experience or the long-term ability to migrate off the platform. Both are recoverable but worth being explicit about up front.
Where this doesn't apply
The procurement framing in this piece assumes you're sizing a real program — multiple tours, multilingual ambitions, ongoing updates. There are cases where it doesn't apply:
- A single one-off tour with a named voice as the headline. If you've cast a celebrity ambassador or are using a specific donor or curator as the narrator, the production is the product. AI subscription pricing is the wrong frame; commission a studio production and pay for the voice.
- A program that genuinely will not change for a decade. A permanent installation at a national monument, a fixed historical site whose interpretation is settled. Amortizing a one-time studio cost across ten years can still pencil out, and the update agility of a SaaS platform is a feature you'll never use.
- An existing hardware contract that hasn't expired. Switching mid-contract usually doesn't pencil out. Ride out the contract, capture analytics on what's working, plan the migration for renewal.
- A program where the production process is part of the institution's brand. Some institutions — the Met's audio guide, for example — have a production identity that is part of why the tour matters. For those institutions, the AI model is a complement to specific exhibits, not a replacement for the program.
For most institutions outside those four cases, the 2026 budget conversation is no longer "studio versus AI." It's "which AI platform, on what terms." That's a different procurement, and the rest of this pillar is built to help you run it.
Continue reading
For the dimension-by-dimension comparison of AI versus traditional audio guides — including the five-year cost model, voice quality discussion, and the cases where studio production still wins — see AI audio guide vs traditional audio guide.
For the category-level primer on what an AI audio guide is, how it's produced, and where it fits, see the AI audio guides pillar guide.
For a deeper take on why the audio guide isn't actually the product museums are buying — and what that means for procurement — see the note on the audio guide is not the product.
When the spokes in this pillar publish, the next reads will be audio guide pricing models (subscription vs per-tour vs hybrid) and museum audio guide total cost of ownership (the full five-year procurement model). Both are in progress.
If you want the published numbers for our own platform, Convo's pricing is on a single page, including the free Pilot tier. The Pilot is genuinely free with no time limit, and it's the fastest way to put real numbers in front of your board against your own collection.
Sources and references
The numbers in this piece are anchored to public, non-competitor sources wherever possible. The main references:
- Translation rates. Slator 2024 market report on translation pricing for the specialized-content band ($0.12–$0.30/word). American Translators Association Compensation Survey, 6th edition as the canonical industry source for per-language-pair rate tables.
- Voice talent rates. Voice Over Resource Guide for the audiobook PFH band ($200–$275). SAG-AFTRA audiobook agreements for the union narration rate ranges. GVAA Rate Guide as the non-union industry-standard rate sheet.
- US museum landscape. IMLS — 35,144 active US museums as the canonical national count. IMLS Museum Data Files for analytical disaggregation by discipline.
- Visitor engagement and wall-text data. Beverly Serrell (1997), "Paying Attention: The Duration and Allocation of Visitors' Time in Museum Exhibitions," Curator: The Museum Journal 40(2): 108–125, and Serrell (2015), Exhibit Labels: An Interpretive Approach (2nd ed.). Stephen Bitgood (2013), Attention and Value: Keys to Understanding Museum Visitors. Falk & Dierking (2013/2016), The Museum Experience Revisited.
- Audience demolinguistics. US Census Bureau ACS Language Use Tables (2017–2021 5-year). National Travel and Tourism Office 2024 international travel volumes for inbound tourism markets.
Convo-specific numbers (pricing tiers, ten-language coverage, the ~$30k–$150k per-tour traditional cost range) come from convo.app/pricing and convo.app/about.
About the author
Eric Duffy is the founder of Convo, a platform that lets museums and cultural institutions publish multilingual audio tours their visitors can have a conversation with. He writes about the economics of museum interpretation from inside the category — drawing on RFP data, discovery calls with curators and directors at small and mid-size US museums, and the production economics of both the studio-and-handset model and the AI-narrated model. Reach him at eric@convo.app or on LinkedIn.