The two-week pilot, deconstructed.
We tell prospective customers that a Convo pilot can go from contract signature to a live, multilingual tour in two to four weeks. That's not a marketing number. It's a description of a specific sequence of work, and almost all of it is on the museum's side. Here's what actually happens.
We say, on the pricing page and on most discovery calls, that a Convo pilot goes from a signed agreement to a live, multilingual tour in two to four weeks. I want to use this essay to deconstruct that number, because the framing of "two to four weeks" is doing a lot of work in those conversations, and the honest version is more interesting than the number alone.
The compressed timeline isn't a function of how fast the AI is. The AI is fast — a draft can be produced in roughly 90 seconds from uploaded reference materials, and re-voicing across ten languages takes about a minute — but those minutes were never the bottleneck in audio tour production. The bottleneck in a traditional tour was the long human steps strung together in series: voice casting, studio booking, recording, editing, mastering, then the same cycle repeated per language. The AI removes that series.
What's left, after the long human steps are gone, is the work that doesn't compress. That work is what a museum is actually committing to when they sign a pilot agreement. The pilot timeline is two to four weeks because the work that doesn't compress takes two to four weeks. It will never take less. It might take more if the museum isn't ready for it. And the part of the pitch I want to deconstruct here is what "ready for it" means, because that's the part that determines whether the timeline lands cleanly or slips.
Days one through three: source materials
The first stretch is the museum gathering the reference materials the platform is going to draw from. This is the step that varies most by institution, and the variation is the single biggest predictor of whether the pilot stays on schedule.
What we need is the institution's existing interpretation copy for the gallery or tour being piloted. That can be wall text and label copy. It can be the catalog entries for the works being interpreted. It can be exhibition essays, curatorial notes, the docent training binder, the script from an existing audio tour, a CSV export from the collection management system. It can be any combination of the above. What it cannot be is nothing — the platform drafts from the institution's voice, and the institution's voice has to exist somewhere in text before the draft can pull from it.
The institutions that move quickest through this step are the ones whose interpretation team already has the copy organized — usually because they've been running an audio tour, a school program, or a written gallery guide and have the underlying materials on hand. The institutions that move slowest are not unprepared so much as un-collected: the materials exist, they're just scattered across email, a shared drive, a curator's laptop, and a printed binder in someone's office. Days one through three are sometimes a productive scavenger hunt rather than a curated upload.
The honest part of this is that the platform cannot speed up the scavenger hunt. We can help by giving the museum a clear inventory of what we need, but consolidating the source material is operational work the institution has to do itself. Most institutions can do it in three days; some take three weeks. The pilot timeline lands cleanly when the source-material consolidation lands cleanly.
Days four through eight: drafts and the first curator review
Once the materials are in the platform, the drafting and first review happen fast.
The platform produces a first-pass script for each stop, grounded in the uploaded materials. The script comes through in the institution's register (formal or conversational, art-historical or social-historical, technical or accessible), depending on what the source materials taught it. The drafts are not, in my experience, finished writing. They are the kind of first draft a curator would recognize as something you'd hand to an editor. They get the structure right, they get most of the facts right, they capture the institution's tone, and they need real editing before they ship to a visitor.
The first review is the part of the workflow that I want to be most honest about, because it's where the most expectations get reset. The platform did not write a finished tour; it wrote a draft of one. The curator on the institution's side is now in the role of an editor — reading every stop, rewriting where needed, cutting where the AI was verbose, sharpening where the AI was generic, fact-checking where the curator knows the material better than the source documents did. This is real editorial work. It takes a curator a day, sometimes two, depending on the length of the tour and the size of the team.
The institutions that finish this step quickly are the ones who treat it as editorial work and apply the same standards they'd apply to a written gallery guide. The institutions that struggle are the ones who treat it as a check-the-box review and either miss things or over-trust the draft. The pilot timeline assumes the editorial work gets done seriously; skipping it just means paying for it later in correction cycles.
Days nine through twelve: voicing, languages, and the second review
After the curator has approved the English script, the voicing and language regeneration happen.
This is the part where the AI does, in fact, move fast. The English voicing runs on a high-quality neural text-to-speech model and produces narration that, for most listeners in most contexts, is functionally indistinguishable from a competent studio read for short-form content. The language regeneration takes the approved English script, translates it into the other nine languages we ship, and produces audio in all of them in roughly a minute end-to-end.
What happens in days nine through twelve is mostly review of that output. The curator listens to the English narration and confirms it sounds the way they wanted; usually the only adjustments are to specific pronunciations of proper names or institution-specific vocabulary, which we handle by editing the source script. For the non-English languages, we ask the institution to designate a reviewer per language who can confirm the translation is appropriate, especially for any name pronunciations, religious terminology, or cultural references that the model might handle in a way the museum hadn't expected.
The "per-language reviewer" piece is operational and isn't built into the platform's admin; we work through it with the institution directly during the pilot. Most museums don't have a native speaker on staff for every language they want to ship, and the realistic answer is that they reach out to a community partner, a docent volunteer, or a contracted reviewer for the languages that need a closer look. We help coordinate this. It is, in my experience, not the bottleneck most people expect.
Days thirteen through fourteen: publishing, signage, and going live
The last stretch is the operational work of going live. The QR codes that visitors will scan need to be generated, printed, and installed. The wall cards or signage need to be designed and produced. The institution's web team — or the institution's directorate, if the institution's web team is a person with another job — needs to know that the tour is live and how to surface it on the museum's own website if they want to.
We can help with all of this, but the signage step in particular is one that varies by institution. A larger museum with an in-house design team produces beautiful wall cards in a day. A smaller institution that's relying on a single person to print and laminate signage takes a few days longer. The pilot timeline assumes a reasonable production cadence for signage; institutions that want gallery-quality printed wall cards should budget an extra week for that piece specifically.
When the QR codes are up and the signage is in place, the tour goes live. The visitor scans, the web player opens, the narration starts, and the museum has — for the first time, in most cases — a multilingual audio tour they own, that they can update on their own cadence, and that their visitors can ask questions of at any stop.
What the timeline is really committing to
The "two to four weeks" framing is a real number. It's also a number that assumes the institution does its share of the work, on time, with the discipline the editorial review demands.
What the museum is actually committing to when they sign a pilot agreement isn't a software contract. It's a two-week sprint of curatorial work, condensed because the production steps that used to take months are now minutes, but still requiring the institution to bring its source materials, its editorial judgment, its review attention, and its operational follow-through. The AI doesn't replace any of those. It replaces the things around them.
The institutions that get a clean two-week pilot are the ones who understand this going in. The institutions that get a slipped four-week pilot are the ones who expected the AI to do work the AI was never going to do. The difference is mostly expectation-setting, which is why I wanted to write this essay — so when a curator reads our pricing page and sees the number, they have an honest picture of what they're agreeing to. The number is real. The work behind the number is also real. Both can be true at once.
About the author
Eric Duffy is the founder of Convo, a platform that helps museums and cultural institutions publish multilingual audio tours their visitors can have a conversation with. He writes about the operational practice of moving institutions onto AI-narrated tours. Reach him at eric@convo.app or on LinkedIn.