What we won't ship.
Most product strategy documents are a list of what you intend to build. The more honest document is the list of what you've decided not to build, and why. Here's ours — five features that would demo well, and would, in our view, make the product worse at what it's actually for.
The most useful conversation I have had in a long time about Convo was a conversation about something we are not going to build.
A prospective customer asked, on a discovery call, whether a particular feature that one of our competitors had announced was on our roadmap. I said no. They said, "Is that because it's too hard, or because you don't want to?" Those are different answers, and I appreciated that they asked the question that way, because most people don't.
The answer was: it isn't hard. It would, in fact, be relatively easy. We are not going to ship it because we believe shipping it would make the product worse at the thing it's actually for, even though it would make the demo look better, the feature comparison checklist longer, and the procurement spreadsheet happier. Those are the conditions under which a refusal becomes interesting. They are also the conditions under which most companies cave and ship the feature anyway, because the structural pressure to grow the comparison matrix is enormous.
This essay is the list. Five things we have decided not to build, and the reasoning behind each one. The reasoning matters more than the list, because the list will change over time as we learn things; the reasoning is the more durable artifact.
We are not going to ship a generic AI docent
The fastest thing to build in this category is a general-purpose chatbot pre-loaded with public art-history knowledge, with the museum's logo on top. Two engineers, three weeks. A demo of it would impress most procurement officers, because the demo would consist of the chatbot accurately answering questions about famous paintings — paintings that the chatbot has read about in its pretraining, paintings that have been written about extensively on the public internet, paintings that any AI of this generation can produce a serviceable answer about without ever touching the museum's actual reference materials.
That product is not what we are building, and the gap is the entire point.
The problem with a generic AI docent is structural, not technical. A generic AI docent treats the museum's collection as a query string into a model. The visitor's question goes into the model; the model's general knowledge produces an answer; the museum's brand goes around the outside. The relationship the visitor has with the artifact is not a relationship with the institution's reading of it — it is a relationship with a model's reading of it, with the institution's branding tied around it. The model is doing the curatorial work, and the museum is paying for the wrapper.
A serious AI audio guide is the inverse. The visitor's question goes through retrieval into the museum's own source materials. The answer comes from what your curators wrote, with the structure and emphasis your interpretation team chose. The model is the production tooling. The museum is the author. We refuse to ship the generic-docent version because the only thing it accomplishes is teaching visitors to have a generic relationship with what should be a specific institution. That's not a feature; it's an erosion.
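To make the contrast concrete, here is a minimal sketch of the retrieval-first flow described above. It is an illustration under stated assumptions, not a description of how Convo is built: the names `CuratorPassage`, `retrieve`, and `answer` are hypothetical, and the keyword-overlap scoring is a stand-in for whatever retrieval method a real system would use. What it shows is the shape: the answer is drawn only from curator-written text for the stop, and when nothing relevant is found the guide says so rather than falling back on general knowledge.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical stopword list, kept tiny for illustration only.
STOPWORDS = {"the", "a", "an", "of", "on", "in", "was", "is",
             "this", "how", "what", "who", "it"}


@dataclass(frozen=True)
class CuratorPassage:
    """A piece of text written and approved by the museum (hypothetical type)."""
    stop_id: str
    text: str


def _tokens(s: str) -> Set[str]:
    """Lowercase word tokens with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", s.lower())) - STOPWORDS


def retrieve(question: str, passages: List[CuratorPassage],
             stop_id: str, min_overlap: int = 2) -> Optional[CuratorPassage]:
    """Return the best-matching curator passage for this stop, or None."""
    q = _tokens(question)
    best, best_score = None, 0
    for p in passages:
        if p.stop_id != stop_id:
            continue  # only consider material written for this object
        score = len(q & _tokens(p.text))
        if score > best_score:
            best, best_score = p, score
    return best if best_score >= min_overlap else None


def answer(question: str, passages: List[CuratorPassage], stop_id: str) -> str:
    """Answer from the museum's own text, or decline. No general-knowledge fallback."""
    hit = retrieve(question, passages, stop_id)
    if hit is None:
        return "The museum's materials for this stop do not cover that."
    return hit.text


passages = [
    CuratorPassage("stop-1", "The glaze on this bowl was fired twice in a wood kiln."),
    CuratorPassage("stop-2", "This portrait was painted in oil on oak panel."),
]
grounded = answer("How was the glaze on the bowl fired?", passages, "stop-1")
declined = answer("Who owned it?", passages, "stop-1")
```

The design choice that matters is the `None` branch: the generic docent would answer the second question anyway, from whatever the model happens to know.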
We are not going to ship one-click translation at playback time
Several prospective customers have asked us whether the multilingual layer can work the way Google Translate works in a web browser — the visitor selects a language, the platform translates the script on the fly at playback, the audio is synthesized in that language at the moment the visitor presses play. This would, on paper, eliminate the need for the institution to think about per-language production at all.
We are not going to build this either, for a reason that's structural rather than technical.
The relationship the visitor is having is with the curator's voice as it was approved. A translation pipeline that fires at playback means the curator did not approve the specific language the visitor heard. The Spanish-speaking visitor and the Korean-speaking visitor and the German-speaking visitor are each receiving a version of the tour that the museum has never seen and could not, in any meaningful sense, have signed off on. The model produced it; the visitor consumed it; the museum has no record.
This is wrong in a way that doesn't show up at the surface most of the time, because the translations are mostly fine. The model is competent. It will, mostly, produce reasonable translations of reasonable English source material. The "mostly" is doing a lot of work. The specific cases where a translation is sensitive — proper names in religious contexts, dates expressed in a different calendar system, terminology that has political weight in the destination language — are precisely the cases where the museum should have approved the translation before a visitor heard it. A playback-time translation pipeline routes around that approval gate. We don't want to build the route around.
Instead we voice each language ahead of time so a curator can review what a Korean-speaking visitor will actually be told. The harder workflow lets the relationship survive translation. We chose the harder workflow.
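The approval gate can be sketched in a few lines. This is a hypothetical illustration, not Convo's actual data model: `VoicedScript`, `TourLibrary`, and the two-state `Status` are assumed names. The point it shows is that playback can only ever return a script a curator has signed off on, and that a missing or unapproved language yields nothing rather than triggering an on-the-fly translation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"


@dataclass
class VoicedScript:
    """One stop in one language, produced ahead of time (hypothetical type)."""
    stop_id: str
    language: str
    text: str
    status: Status = Status.DRAFT
    approved_by: Optional[str] = None


class TourLibrary:
    """Holds pre-produced scripts and serves only the approved ones."""

    def __init__(self) -> None:
        self._scripts: Dict[Tuple[str, str], VoicedScript] = {}

    def add_draft(self, script: VoicedScript) -> None:
        self._scripts[(script.stop_id, script.language)] = script

    def approve(self, stop_id: str, language: str, curator: str) -> None:
        """Record that a named curator reviewed this exact text."""
        script = self._scripts[(stop_id, language)]
        script.status = Status.APPROVED
        script.approved_by = curator

    def for_playback(self, stop_id: str, language: str) -> Optional[VoicedScript]:
        """Return the approved script, or None. Never translate at playback."""
        script = self._scripts.get((stop_id, language))
        if script is None or script.status is not Status.APPROVED:
            return None
        return script


library = TourLibrary()
library.add_draft(VoicedScript("stop-1", "ko", "(Korean script for stop 1)"))
before_review = library.for_playback("stop-1", "ko")
library.approve("stop-1", "ko", "curator@example.org")
after_review = library.for_playback("stop-1", "ko")
never_produced = library.for_playback("stop-1", "de")
```

The record the playback-time pipeline lacks is the `approved_by` field: for every sentence a visitor hears, someone at the museum can be named as having read it first.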
We are not going to ship richer broadcast — longer scripts, sound design, music beds
There is a version of this product that produces, instead of a two-sentence stop with an invitation to ask follow-up questions, a two-minute cinematic narration with ambient sound design, a score, and the kind of production polish that you'd associate with a high-budget podcast or documentary.
The technology to build this is here. We could do it. The voice synthesis is good enough. The audio engineering tooling is mature. We could ship the rich broadcast mode in a quarter and it would expand the demo by a meaningful amount.
We are not going to do it because we think the visit is a conversation, not a broadcast, and richer broadcast runs against our specific opinion about what a tour should feel like. The two-sentence stop with an invitation to ask follow-up questions is not a limitation of the product; it is a design choice. The unit of value we are trying to produce is the moment when a visitor asks a question and gets a grounded answer about the specific object in front of them. A cinematic two-minute stop with a music bed is the opposite of that: it asks the visitor to be passive, to receive, to listen to the curated production without interrupting.
We are leaving room for the visitor to interrupt. That is the product. Richer broadcast would crowd it out.
We are not going to ship white-label or subdomain branding today
This is the one where I want to be most careful, because it's the one we have considered most seriously and it is the one where I am least sure of the long-term answer.
A class of larger institutions has asked us whether the tour can be served under their own domain, at your-museum.org/tour rather than at convo.app with their branding applied. The argument is that they want their visitors to never see the word Convo at all; they want the whole experience to be the museum's, top to bottom.
I take this argument seriously. We are not building it today, but for a reason that is more about sequence than principle. Building white-label correctly (subdomain branding, custom domains, full-trust SSL handling, the operational tail of supporting it) is a meaningful piece of engineering, and it would take attention away from the product features that we believe will matter more, to more institutions, over the same period. We have made the tradeoff explicit: accept the cost of deferring the feature for now, and revisit when the larger institutions we want to serve become a larger share of our customer base. This is the only one of the five where I would not be surprised to see the answer change inside two years.
The honest version of "what we won't ship" includes things we will ship eventually. The point of saying so publicly is to be clear about why we haven't yet, and to avoid the trap of telling a prospect on a sales call that it's coming next quarter when it isn't.
We are not going to ship visual AI generation for stops
Several prospective customers have asked whether we will let the model generate visual content — a synthesized image of an artifact, a recreated context, a visualization of a scene the museum is describing. The technology to do this is here too; the demos look impressive.
We will not build it. The artwork is the artwork. The object is the object. We voice the words around the object; we do not synthesize the visuals.
This is the easiest of the five for me to argue, because the reasoning is essentially deontological. A museum's authority is built on the proposition that the objects in its galleries are real, that the descriptions of them are accurate, and that the institution stands behind both. A platform that lets a model generate plausible-looking images of, say, a Roman villa or a medieval workshop, and then ships those images to visitors inside the museum's branded tour, has compromised the line between the actual collection and the imagined one. We will not ship the visual-generation feature because we will not help blur that line. The point of bringing visitors into a museum is to put them in the room with the real thing. The audio is there to add interpretation around the real thing. Generated images would be there to add a fake version of the real thing. We don't do that.
How we decide
I want to close on the meta-question, because the meta-question is what makes the list possible at all.
The rule we try to apply, when a "should we build this?" question comes up internally, is something like: does shipping this make the product better at producing the moment where a visitor stands in front of an object and forms a relationship with it, mediated by the curator's interpretation? If yes, build it. If no, don't build it, even if it would demo well, even if it would expand the addressable market, even if it would close a deal.
This is a harder rule than it sounds, because most product decisions don't break cleanly on it. Most decisions involve some small drift toward "the product looks more impressive but doesn't actually do its job better." Strategy is the practice of noticing those decisions and saying no to them, on a fast enough cadence that the product doesn't gradually become something else.
That is what the list above is. It's not a marketing document. It's the running record of decisions we made, against the gravitational pull of the comparison matrix, to keep the product pointed at the thing it's actually for.
The list will change. The reasoning, we hope, won't.
About the author
Eric Duffy is the founder of Convo, a platform that helps museums and cultural institutions publish multilingual audio tours their visitors can have a conversation with. He writes about the practice of building a product for a category that hasn't decided what it wants to be yet. Reach him at eric@convo.app or on LinkedIn.