ESSAY

How long does an audio tour actually take to produce?

Audio tours take twelve to twenty weeks per language. Here's where the time actually goes, and what changes when the stages aren't serial.

ERIC DUFFY·FOUNDER·MAY 25, 2026·9 MIN READ

A professional voice-over recording booth seen through glass, empty mid-session: microphone, headphones, and a marked-up script on a stand.

The first quote a museum gets for an audio tour is usually a number of weeks and a number of dollars, sitting next to each other on a page, with very little explanation of what's inside either one. The director hands it to the curator. The curator hands it to the head of education. Nobody pushes back, because nobody quite knows what they'd push back on. The quote becomes the schedule. The schedule becomes the budget. The budget becomes the limit of what gets covered.

It's worth slowing down and asking what the weeks are actually for.

The honest answer: twelve to twenty weeks, per language

Audio tour production is the end-to-end process of turning a museum's reference materials into recorded, multilingual narration ready to play at each stop. The reference materials are typically catalogs, wall cards, exhibition notes, and curator research. A standard audio tour, produced by a competent legacy studio-and-handset vendor, takes twelve to twenty weeks from kickoff to a published English track. Add three to six weeks for each additional language. A "simple" twenty-stop tour in English plus Spanish, on the average vendor's calendar, is a four- to six-month project.
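The baseline arithmetic is simple enough to check in a few lines of Python. This is a rough sketch, assuming the ranges above are purely additive (twelve to twenty weeks for English, plus three to six weeks for each added language) and using about 4.33 weeks per month. The names are illustrative, not any vendor's estimating tool.

```python
# Rough timeline model using the ranges quoted above, assumed purely additive.
ENGLISH_WEEKS = (12, 20)       # kickoff to a published English track
PER_LANGUAGE_WEEKS = (3, 6)    # each additional language
WEEKS_PER_MONTH = 4.33         # approximate average month length in weeks


def tour_timeline_weeks(extra_languages: int) -> tuple[int, int]:
    """Return (low, high) weeks for English plus `extra_languages` more languages."""
    low = ENGLISH_WEEKS[0] + extra_languages * PER_LANGUAGE_WEEKS[0]
    high = ENGLISH_WEEKS[1] + extra_languages * PER_LANGUAGE_WEEKS[1]
    return low, high


low, high = tour_timeline_weeks(1)  # English plus Spanish
print(f"{low}-{high} weeks, about "
      f"{low / WEEKS_PER_MONTH:.1f}-{high / WEEKS_PER_MONTH:.1f} months")
```

For English plus Spanish this prints "15-26 weeks, about 3.5-6.0 months", which is roughly the four- to six-month range quoted above.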

This isn't a number we made up. The legacy vendors themselves have long described six months as the practical floor for a custom mobile guide; the four-month figure for a smaller audio-only tour is the same math at a smaller scale. The rest comes out of our own RFP and discovery work with curators and directors comparing quotes from the studio-and-handset side of the market.

That's the baseline. The range exists because the variables that move it are real: the institution's review cycle, the writer's familiarity with the collection, voice-talent availability, studio scheduling, and the number of revision rounds anyone realistically gets. None of those variables is anyone's fault. They are what the work is.

What's worth knowing, and what the quote rarely says, is that the schedule isn't long because the work is hard. It's long because the work is serial. Each stage waits on the one before it. The cost of the cycle isn't in any one stage. It's in the handoffs.

Where the time actually goes

There are four stages in a traditional production cycle. Each is reasonable on its own. Stacked, they're months.

Stage 1 — Drafting (3–6 weeks)

The writer reads the collection materials: the catalog, the wall cards, the exhibition notes, and sometimes the back-of-house research files that curators have built up over a decade and never published. A first pass of scripts comes back in two or three weeks. The curator marks it up. A second pass comes back a week later. The third pass goes to the director.

Drafting is the part that looks the most like real work to the institution, and it's the part curators most want to be present for. It's also where time vanishes most invisibly. Between draft three and draft four, there's a week where nothing seems to be happening. In fact, everyone's waiting on a single comment from one person who is also installing an exhibition.

Stage 2 — Voicing (2–4 weeks)

The approved scripts go to a voice actor. The voice actor is union talent in most major US markets, and union talent is a calendar problem before it's a craft problem. SAG-AFTRA publishes its audio commercial rate sheets annually, and the union's own voiceover contracts page makes clear that pension, health, and residual structures attach to every recorded session. The studio session has to be booked at the end of the writer's revision cycle, not the beginning, so the lead time is whatever the talent's calendar happens to be when you finally get there.

Once the booth time is booked, the recording itself is fast: a twenty-stop tour records in an afternoon or two, plus pickups. Post-production (edits, levels, mastering) adds another week. If a stop gets reworded after voicing, you go back to the booth. Or you wait for the next session. Or you accept that the tour now sounds slightly different at that stop than at the others.

Stage 3 — Translation (per language, additive)

Multilingual production is the part of the cycle most museums quietly trim back, because the math multiplies. If the museum wants Spanish, the English script goes to a translator. If it wants Spanish and French, it goes to two translators. The American Translators Association notes that professional rates in the US generally run between twelve and thirty cents per word, depending on language pair and subject-matter expertise. A twenty-stop, four-thousand-word tour translated into three languages is twelve thousand translated words, or roughly $1,440 to $3,600 in translator fees alone. That's before review, before voice talent, before the studio.
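The translator-fee figure can be reproduced the same way. A minimal sketch, assuming a flat per-word rate across all three languages at the low and high ends of the per-word range above; it ignores per-language rate differences and everything that happens after translation. The function name is illustrative.

```python
# Translator fees only, at the low and high ends of the quoted per-word range.
RATE_LOW, RATE_HIGH = 0.12, 0.30  # US dollars per word


def translation_fees(words: int, languages: int) -> tuple[float, float]:
    """Return (low, high) translator fees for `words` source words into `languages` languages."""
    total_words = words * languages  # every language translates the full script
    return total_words * RATE_LOW, total_words * RATE_HIGH


low, high = translation_fees(words=4_000, languages=3)
print(f"${low:,.0f} to ${high:,.0f}")  # $1,440 to $3,600
```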

The translation itself takes one to two weeks for a twenty-stop tour. But the curator now has to review the translated script, and most curators don't fluently read every language their institution ships in. So they don't, really. They trust the translator and hope.

Then the translated scripts go to another voice actor, native in that language, who has their own calendar. Two additional languages add roughly six to twelve weeks to the project. Four add twelve to twenty-four.

This is the stage where curators stop dreaming. "We'd love to offer it in Mandarin and Korean too" gets crossed out at the budget meeting, because the math has spoken.

Stage 4 — Review, QA, and revision (1–3 weeks)

Final listening. The director listens to the English tour. The Latin American Studies curator listens to the Spanish. Somebody notices that the second sentence at stop 7 is wrong: the painting was reattributed last year, and the script still says "Workshop of Rembrandt." The wording at stop 12 misses the new accessibility framing the institution adopted in February.

Both fixes are real. Each is also a small project. Re-voicing one sentence means booking the talent again, mastering again, and re-pushing the audio file to the hardware or the app. The institution makes a list of "things to fix in the next version" and ships what it has.


This part of the cycle is the one that visitors feel even when they don't know they are feeling it. Visitor-studies research, replicated across multiple journals, finds that the mean dwell time in front of a single artwork is roughly 27 seconds, under a third the length of a typical 90-second audio stop. The interpretive script, in other words, is built for an attention window that most visitors don't give it. When the script is also out of date, the gap widens further. Curators know this. The math of the cycle is what stops them from closing the gap.

Why the math is the math

The total isn't long because any one stage is slow. It's long because the stages can't run in parallel. The writer can't write until the curator has briefed them. The voice actor can't voice until the scripts are approved. The translator can't translate until the English is locked. The second voice actor can't record until the translation is checked. Each handoff carries calendar slippage that's invisible inside any one stage and obvious only at the end of the project, when somebody adds up the calendar and realizes that two-thirds of the elapsed weeks were spent waiting, not working.
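One way to see the handoff problem is to treat the stages as a dependency graph and compute its critical path, the longest chain of tasks that must run one after another. The sketch below is a simplified model, not a scheduling tool: the task names and week values are hypothetical, chosen from inside the ranges quoted earlier, and it leaves out the waiting time between handoffs described above.

```python
from functools import cache

# Hypothetical durations in weeks, picked from inside the ranges quoted earlier.
DURATION = {
    "draft": 4, "voice_en": 3,
    "translate_es": 2, "voice_es": 2,
    "translate_fr": 2, "voice_fr": 2,
    "qa": 2,
}

# Each task lists the tasks that must finish before it can start.
SERIAL = {  # legacy-style chain: each language waits on the one before it
    "draft": [], "voice_en": ["draft"],
    "translate_es": ["voice_en"], "voice_es": ["translate_es"],
    "translate_fr": ["voice_es"], "voice_fr": ["translate_fr"],
    "qa": ["voice_fr"],
}
PARALLEL = {  # every language fans out from the approved English script at once
    "draft": [], "voice_en": ["draft"],
    "translate_es": ["draft"], "voice_es": ["translate_es"],
    "translate_fr": ["draft"], "voice_fr": ["translate_fr"],
    "qa": ["voice_en", "voice_es", "voice_fr"],
}


def finish_week(deps: dict[str, list[str]]) -> int:
    """Length of the critical path: the week in which the last task finishes."""
    @cache
    def finish(task: str) -> int:
        # A task starts when its latest prerequisite finishes (week 0 if none).
        start = max((finish(d) for d in deps[task]), default=0)
        return start + DURATION[task]

    return max(finish(t) for t in deps)


print(finish_week(SERIAL), finish_week(PARALLEL))
```

With these made-up durations the serial chain finishes in week 17 and the fanned-out version in week 10. Only the dependency lists differ between the two runs; every task takes the same number of weeks in both, which mirrors the argument above that the length comes from the ordering, not from any one slow stage.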

This is the part vendors don't explain, because the cycle is the cost. The price of an audio tour isn't really the cost of drafting plus voicing plus translation plus QA. It's the cost of coordinating the handoffs over four months, and that coordination is the line item nobody calls out by name on a quote.

And it's the line item that decides what your museum can actually do. A six-month tour cycle, at the unit cost of a real production, is why most museums end up with audio for a fraction of their permanent collection, in one or two languages, rarely kept current. Rotating exhibits don't get audio because the next rotation begins before the last one ships. Multilingual is offered in two languages because a third would have meant adding a quarter to the project. The cycle quietly defines the ceiling on what curatorial ambition is allowed to be. Most directors don't run into that ceiling when they're told about it; they run into it when they see the budget for the second language quoted next to the budget for the first.

This isn't a problem of will or craft. In our discovery interviews with curators and directors at small and mid-size museums, the same pattern keeps coming back: the work people would like to ship is downstream of what the production cycle is willing to let them ship. It's a problem of math.

The shape of the math is easier to see laid out side by side. The left column is what a legacy cycle assumes. The right column is what the same work looks like if the stages don't hand off serially.

Stage	Legacy production cycle	Parallel production
Drafting	3–6 weeks with an outside writer	An afternoon of curatorial editing
Voicing	2–4 weeks of union talent scheduling	A click on a finished script
Translation	+3–6 weeks per added language	One regeneration in every language
Review and QA	1–3 weeks of listening passes	Sentence-level edits in seconds
Post-launch updates	A new mini-project per correction	Same-day, same script, same voice

The point of the comparison is not that one column is faster. It's that the right column never blocks on a calendar that isn't the curator's. That single change is what lets a director say yes to a third language, or to giving the rotating program the same care as the permanent collection.

Four practical changes follow once the calendar stops gating the work:

A correction noticed on Tuesday ships on Tuesday, not in the next cycle.
The third and fourth language stop being stretch goals and start being defaults.
Rotating exhibits get audio that arrives with the install, not after it.
Curators spend the time they used to spend coordinating on the parts of the job only they can do.

These changes don't depend on any one product. They depend on the production cycle no longer being serial. Peer-reviewed research on visitor-museum interaction is now exploring conversational interpretation as a complement to traditional audio: the 2025 ACM IMX study finds measurably higher engagement when visitors can ask, not just listen. That kind of design only becomes feasible when the production math underneath it gets honest.

The cost no quote shows you: updates

Here is the part of the math that quotes never name and curators run into later.

Take a correction. A reattribution. A new acquisition. A re-hang. A donor whose name has to come off a label. An accessibility revision the institution adopted six months after the tour shipped. In a traditional production cycle, any one of those is a small project. You book studio time. You schedule the talent. You re-translate, if you're multilingual. You re-master. You re-deploy.

So the changes don't ship. The tour drifts out of date. After eighteen months it's noticeably wrong in three or four places. After three years the institution either accepts that the audio is a snapshot of an older curatorial position, or it commissions a new production cycle and starts over.

This is the second cost of the math. The first cost is what you can afford to make. The second cost is what you can afford to keep current.

What changes when the stages stop being serial

The cycle is what it is because the handoffs are what they are. The interesting question is what happens when the handoffs go away.

If the first draft can be produced in ninety seconds from the reference materials a curator already has — the catalog, the wall cards, the exhibition notes — drafting collapses from three weeks to an afternoon of curatorial editing. If voicing is a click on a finished script, voicing collapses from two weeks to a minute. If translation runs from the same source in parallel for every language at once, translation collapses from additive weeks per language to a single regeneration. If a correction at stop 7 ships in seconds rather than triggering a re-booking, updates are no longer a project.

That isn't a hypothetical claim. It's the way authoring works at Convo. In our discovery interviews with curators, the same instinct came up almost verbatim: "The part of the work I want to keep is the editing, not the typing." The point isn't that the work is easier. It's that the order of the work is different. Curators decide more and coordinate less. The same expertise that goes into a draft now goes into editing one. The first pass is faster; the curatorial pass becomes the whole pass.

What that math makes possible is the part directors should care about. Audio for the new wing without it killing the budget. Multilingual as a default rather than a stretch goal. Rotating exhibits getting the same care as the permanents. A correction shipping the same afternoon it's noticed. The kind of program a director couldn't seriously plan for before — because the math, until recently, refused to let them.

A question to ask your vendor before signing

If you take one thing from this piece, take this. The next time you read a quote for an audio tour, look for the line that explains what happens when something changes after launch. Almost none of them have it. Ask the vendor, in writing, what a one-sentence correction at one stop, in three languages, costs in dollars and in days. In our experience, the answer will tell you which of the four stages above they've actually solved, and which they're hoping you won't ask about.

That's the conversation that matters. Not the headline number, not the launch date, not the number of stops they're quoting on. The conversation about what your audio program looks like in year three.

How long does a traditional audio tour take to produce?

A traditional vendor-produced audio tour takes twelve to twenty weeks for a single-language English track, and three to six additional weeks per language. A twenty-stop tour in English plus Spanish typically lands in the four-to-six-month range from kickoff to public launch. Six months has long been described inside the industry as the practical floor for a custom mobile guide on the studio-and-handset model.

Why does each additional language add so much time?

Because the second language is a second production cycle, not a translation step. The English script has to be locked first, then translated by a professional translator, then reviewed, then recorded by a different voice actor whose calendar is its own constraint. Each handoff costs calendar time the quote rarely names.

How much does a legacy audio tour cost?

In the RFPs and studio quotes we have reviewed, full-production audio tours from legacy vendors typically range from $30,000 to $150,000 per language, depending on the number of stops, voice talent, and whether hardware is bundled. The variation isn't usually in the voicing or translation; it's in the project-management overhead of coordinating the handoffs across four months.

Can an audio tour be updated after launch?

Technically yes; practically no. A correction at one stop, in one language, means re-booking the voice talent, re-mastering, and re-deploying. Most institutions accumulate a list of things to fix in the next version and ship the rest as-is. This is the part of the math curators run into after launch, not before.

About the author

Eric Duffy is the founder of Convo, a platform that lets museums and cultural institutions publish multilingual audio tours their visitors can have a conversation with. He writes about how museums could afford to be more ambitious with interpretation, drawing on discovery conversations with curators, directors, and education leads at small and mid-size US museums. Reach him at eric@convo.app or on LinkedIn.