The questions visitors ask but rarely get to ask.

Visitors spend 27 seconds with an artwork and 2.47% of them download the audio app. The questions they actually have rarely reach a curator. Here is what the data shows, and what changes when there is an inbound channel.

ERIC DUFFY·FOUNDER·MAY 23, 2026·9 MIN READ

Lead image: a visitor leaning in close to a glass display case of small artifacts, studying one as if about to ask a question.

A docent at a mid-sized art museum will tell you something most visitor surveys miss. The questions visitors ask in the room rarely match the questions the labels are built to answer. For example: a visitor leans toward a portrait and asks why the dress is that color. Another asks who the painter was angry at. A third asks if the apostle on the left really has six fingers. None of those questions appear on the wall card. None appear in the audio script. None reach a curator who could answer them.

This is not a complaint about visitors. They are doing what curious people do. It is a comment on the design. Most museums have built interpretation with no inbound channel. The curator writes. The visitor hears. The moment passes. The question fades into the rest of the afternoon.

What the data says about visitor attention

Interpretation is the curatorial work of turning scholarship into something the visitor can meet in the room. For about a century, that work has assumed a broadcast model. The curator speaks. The visitor listens. The label, the audio guide, and the docent script are versions of the same one-way channel.

The field's own visitor research, however, has been quietly arguing the other way for thirty years. Dwell time is the time a visitor spends in front of one artwork or stop. Mean dwell time at art museums is about 27.2 seconds. The median is 17 seconds, per Smith & Smith, replicated in Museum Management and Curatorship in 2024. A standard audio-guide stop, by contrast, runs 90 to 180 seconds. The script is written for a window of attention most visitors do not give.
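The size of that gap can be worked out from the figures above. The following is a minimal sketch in Python. The function name `share_of_stop_heard` is illustrative, and it assumes, for simplicity, that a visitor listens for exactly as long as they stand in front of the work.

```python
def share_of_stop_heard(dwell_seconds: float, stop_seconds: float) -> float:
    """Fraction of an audio stop that fits inside a visitor's dwell time."""
    return min(dwell_seconds / stop_seconds, 1.0)


MEAN_DWELL = 27.2         # seconds, mean dwell time quoted above
MEDIAN_DWELL = 17.0       # seconds, median dwell time quoted above
STOP_LENGTHS = (90, 180)  # seconds, the standard stop range quoted above

for stop in STOP_LENGTHS:
    mean_share = share_of_stop_heard(MEAN_DWELL, stop)
    median_share = share_of_stop_heard(MEDIAN_DWELL, stop)
    print(f"{stop}s stop: mean visitor hears {mean_share:.0%}, "
          f"median visitor hears {median_share:.0%}")
```

Even against the shortest 90-second stop, the mean visitor hears about 30% of the script and the median visitor about 19%.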

< 3%: typical adoption of a museum's own native audio-guide app, per cross-industry research on cultural-organisation apps.

The numbers on the channel are no kinder. Completion rate is the share of started audio tours that reach the final stop. The more telling number, however, is the share who start at all. Years of cross-industry research on cultural-organisation apps, most consistently the long-running tracking by Frankly Green + Webb, keep finding that the average museum app pulls visitor adoption in the low single-digit percentages, with most apps under a thousand downloads and opened less than once. Hardware-rental guides do better, but most surveyed museums still report adoption under a quarter. The channel everyone builds around reaches a small share of the room.

So the picture is this. Most visitors do not press play. The ones who do are listening for less than a third of the time the script was built for. And the curiosity that brought them into the room rarely gets a second touch. In our discovery interviews, this is the first thing curators name when asked what their interpretation program is missing.

Why the broadcast model misses

Three problems compound. Each one is well-documented. Together, they describe a model that is much weaker than its reputation.

The dwell-time gap

The first problem is timing. Visitors look at art for a fraction of the time the script is built for. A 90-second stop is written for a visitor who is not in the room. In our interviews with directors and education leads, this is one of the few facts that lands without argument. Everyone has seen it happen. Few have built a program around it.

The adoption collapse

The second problem is the channel itself. Apps fail because asking a stranger to download something before they can hear about a painting is a tall request. A QR-launch tour is one delivered to the visitor's own device via a printed QR code, without an app download. That removes some of the friction, but not all of it. The visitor still has to know the tour exists. They have to want to use it. They have to want to use it now, not later. Most do not.

The unasked question

The third problem is the one curators feel and rarely name. Visitors have questions in the room. Those questions almost never become a signal anyone at the institution can read. There is no log. There is no review. There is no curatorial reply. In our experience reviewing how museums collect visitor feedback, the most valuable visitor research a museum could be sitting on is exactly this. It is the real content of visitor curiosity in front of real objects. That research is not collected, because the design has no place to put it.

These three problems are not independent. Short dwell times mean broadcast content does not get heard. Low adoption means it does not reach most visitors at all. And without an inbound channel, the curious visitor cannot ask the question they actually want answered. That is the visitor who would have listened for ten minutes, if there had been a way to ask.

What changes when visitors can ask

Recent studies on conversational interpretation describe a different pattern. For example, a 2025 study at the ACM International Conference on Interactive Media Experiences compared two groups of visitors. One used a generative chatbot. The other used a traditional museum app as a control. The chatbot group showed higher artwork engagement, longer dwell times, and richer follow-up. Notably, this is not an isolated result. The Centre Pompidou's pilot with Ask Mona showed a similar pattern. Visitors who treated the guide as a chat stayed longer with each work than those who used it as a playlist.

In addition, the University of Cambridge's "Nature Perspectives" experiment lets visitors chat with museum specimens as if they were still alive. The team reports unusually high voluntary engagement times. The setting should, by every prior expectation, have produced the opposite.

What these studies share is a reframe, and the reframe is not technological. A conversational tour is an audio tour visitors can pause to ask questions of. The answers are grounded in the curator's own reference materials. Interpretation, in this model, is not content delivered. It is curiosity met. When the inbound channel exists, visitors use it. When it does not, the curiosity fades within seconds and rarely returns.

The shape of the difference is easier to see laid side by side. The left column is the broadcast model. The right column is the question-first model. Neither is the whole answer for every gallery. The point of the comparison is to make the assumptions of each one visible.

Dimension	Broadcast model	Question-first model
Activation	Visitor presses play	Visitor asks a question
Pacing	Linear playlist	Branching, on demand
Measurement	Play count	Question count, by theme
Coverage per artwork	One angle	As many as the corpus supports
What the curator authors	A script	A reference corpus
What the institution learns	Whether visitors finished	What visitors wondered about

In our reading of the literature, the numbers in early pilots of the question-first model run an order of magnitude above broadcast-model floors. We expect those numbers to land lower at scale, as the results of small pilots almost always do. Even halved, they sit well above the 2.47% download rate the field treats as background.

How a question-first model is built

The shift is mostly architectural. Four moves do most of the work.

The curator's expertise becomes a corpus, not a script. A curatorial corpus is the body of reference material — catalog entries, exhibition notes, wall cards, research files — that grounds a tour or a curator-facing guide. The guide draws from that corpus. The curator decides what goes in it.
The visitor's question is the activation signal. Nothing plays until the visitor wants something. No autoplay. No "Welcome to the gallery."
Every question is logged and clustered. The curiosity report is the anonymized aggregate of visitor questions, clustered by theme. It surfaces what visitors are actually thinking about across an institution. Most institutions have never had this data before.
The curator keeps editorial control. The guide can be tuned to defer, to refuse questions outside its scope, or to surface a curatorial caveat alongside any answer. Grounded answers are visitor-facing responses sourced from a defined curatorial corpus rather than general model knowledge. When the guide cannot ground an answer, it says so.
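The four moves can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular product works: the `Guide` class, the keyword-matching rule, and the bracketed placeholder passages are all hypothetical, and a real guide would use far richer retrieval. What it shows is the shape: answers come only from the curator's corpus, every question is logged without a visitor identifier, and the guide says so when it cannot ground an answer.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

REFUSAL = "I don't have curatorial material on that yet."


@dataclass
class Guide:
    # theme keyword -> passage written or approved by the curator
    corpus: dict
    # (question, matched theme) pairs; no visitor identifier is stored
    question_log: list = field(default_factory=list)

    def answer(self, question: str) -> str:
        words = set(re.findall(r"[a-z']+", question.lower()))
        # Toy grounding rule (an assumption): a theme matches when its
        # keyword appears verbatim in the question.
        theme = next((t for t in self.corpus if t in words), None)
        self.question_log.append((question, theme or "ungrounded"))
        return self.corpus[theme] if theme is not None else REFUSAL

    def curiosity_report(self) -> Counter:
        """Anonymized count of questions per theme."""
        return Counter(theme for _, theme in self.question_log)


guide = Guide(corpus={
    "dress": "[curator's note on the sitter's dress]",
    "painter": "[curator's note on the painter]",
})
for q in ["Why is the dress that color?",
          "Who was the painter angry at?",
          "Does the apostle really have six fingers?"]:
    print(q, "->", guide.answer(q))
print(guide.curiosity_report())
```

The third question falls outside the corpus, so the guide declines and the report records it as ungrounded. That entry is itself useful: it tells the curator what to add to the corpus next.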

None of these moves requires throwing out the audio guide. The fastest path in practice is to keep the linear product where it works, such as a tightly authored tour of a single retrospective. The conversational layer goes alongside it for the rest of the gallery. That is the pattern several of the pilots above followed.

This is the rough shape of how our authoring tool works in practice. However, the architectural pattern is older than any one product. Centre Pompidou, Cambridge, and a handful of others have been working in this direction for several years. The point is not the vendor. The point is the inbound channel.

How to start, in your own gallery

You do not need a platform to start. You need a week, a docent, and a notebook. The first step is to find out what your visitors are already asking that nobody is hearing.

The practical sequence is short:

Run a one-week question audit. Have a staff member shadow three gallery tours. Log every visitor question, verbatim. Categorize as factual, interpretive, comparative, or personal.
Audit the inbound channels you already have. For instance: email to the education department, comment cards, the first ten minutes of every public program. The questions are not hidden. They are scattered.
Pick one gallery for a pilot. Modest curatorial complexity. High foot traffic. Run for six to eight weeks. Measure adoption, questions per session, and dwell time per artwork.
Compare against your existing guide. Do not retire the old product. Run the two channels in parallel. Let the visitors choose. The data will tell you which to invest in next.
Review the question corpus monthly. This is the most valuable output. It will reshape your next exhibition's interpretation strategy before it reshapes anything else.
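The question audit in the first step needs nothing more than a tally. The sketch below is one way to do it in Python, assuming a staff member has already tagged each logged question by hand. The function name `tally_audit` is illustrative, and the category tags on the sample questions are assumptions made for the example, not findings.

```python
from collections import Counter

CATEGORIES = ("factual", "interpretive", "comparative", "personal")


def tally_audit(entries):
    """Count hand-tagged questions per category, rejecting unknown tags."""
    counts = Counter({category: 0 for category in CATEGORIES})
    for question, category in entries:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category {category!r} for {question!r}")
        counts[category] += 1
    return counts


# Sample log: verbatim questions, each tagged by the staff member.
audit_log = [
    ("Why is the dress that color?", "interpretive"),
    ("Who was the painter angry at?", "interpretive"),
    ("Does the apostle on the left really have six fingers?", "factual"),
]

counts = tally_audit(audit_log)
for category in CATEGORIES:
    print(f"{category:<13}{counts[category]}")
```

Keeping the zero rows in the output is deliberate: a category nobody asked about is as informative as one everybody did.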

Measurement matters here. Track adoption rate, average questions per session, dwell time per gallery, and one post-visit survey question that captures whether the visitor felt their curiosity was met. Six to eight weeks is the floor for a real signal. In our experience, two weekends of foot traffic is not enough data for a strategic call.
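Those four measures reduce to simple arithmetic once sessions are recorded. Here is a minimal sketch in Python. The `Session` fields and the `pilot_metrics` function are hypothetical names, and the sample numbers are invented purely to show the calculation. They are not pilot results.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Session:
    questions: int                 # questions asked in this session
    dwell_seconds: float           # time spent in the pilot gallery
    curiosity_met: Optional[bool]  # survey answer; None if skipped


def pilot_metrics(gallery_visitors: int, sessions: list) -> dict:
    """Summarize a pilot: adoption, questions, dwell time, survey result."""
    if gallery_visitors <= 0 or not sessions:
        raise ValueError("need at least one visitor and one session")
    survey = [s.curiosity_met for s in sessions if s.curiosity_met is not None]
    return {
        "adoption_rate": len(sessions) / gallery_visitors,
        "avg_questions_per_session": mean(s.questions for s in sessions),
        "avg_dwell_seconds": mean(s.dwell_seconds for s in sessions),
        # Share of survey respondents who felt their curiosity was met
        "curiosity_met_share": sum(survey) / len(survey) if survey else None,
    }


# Invented sample data, for illustration only.
sample = [
    Session(questions=3, dwell_seconds=240.0, curiosity_met=True),
    Session(questions=1, dwell_seconds=60.0, curiosity_met=None),
    Session(questions=2, dwell_seconds=150.0, curiosity_met=False),
]
metrics = pilot_metrics(gallery_visitors=200, sessions=sample)
for name, value in metrics.items():
    print(name, value)
```

Adoption here is sessions divided by everyone who walked through the gallery, so it needs a door count or ticket scan as the denominator. Skipped survey answers are left out of the curiosity-met share and are not counted as a no.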

Where the argument is weakest

The biggest caveat is the corpus. A conversational guide is as strong as the curatorial scholarship behind it. Grounded answers are only as good as the source. If the underlying material is thin, the guide will be thin too. A thin guide is worse than a well-written label, because it disappoints a visitor who came expecting depth. In short, the model does not substitute for scholarship. It amplifies whatever scholarship the institution has invested in.

There are also contexts where the broadcast model still wins. For example, a linear retrospective benefits from authored pacing and curatorial sequencing in a way conversational tools struggle to replicate. A 45-minute biographical tour is exactly the case where a fixed script does its best work. Similarly, smaller institutions without bandwidth to maintain a corpus will find conversational tools more demanding to operate than a pre-recorded guide.

The engagement numbers themselves also deserve caution. The pilot figures in the studies above come from small cohorts. They are mostly self-selected. They run in well-resourced institutions. The replication base is still thin. The honest framing is this: the direction of the effect is consistent across pilots, and the size of the effect is not yet settled. Even so, cautious estimates sit well above the 2.47% floor.

The shift here is not technological. It is editorial. The interpretation model that defined the twentieth-century museum was built on scarcity. Curators were scarce. Recording was expensive. Distribution was hard. None of those scarcities applies anymore. The bottleneck has moved from curatorial output to visitor curiosity. Most institutions' interpretation infrastructure has not caught up.

The question worth asking, in your gallery, this week: what are visitors wondering that no one ever hears?

A model where the visitor's question is the activation signal, the curator authors a reference corpus rather than a script, and the guide draws from that corpus on demand. The curator keeps editorial control.

Mean dwell time is about 27 seconds. A standard audio stop is 90 to 180 seconds. Native audio-guide apps average 2.47% adoption. Most visitors never press play, and the ones who do are listening for less than a third of the time the script was written for.

No. A tightly authored tour of a retrospective or a guided historical narrative is the case where the broadcast model still wins. The argument is about the default for permanent collections, where visitor paths are non-linear and questions outnumber answers.

The same research finds visitors do not want more screens or more friction. The point of conversational interpretation is to make the channel invisible: the visitor's own phone, one QR code, one question. No app, no headset. The technology should disappear into the conversation.

A small question audit. Shadow three tours. Write down every visitor question. Categorize them. Within a week, the pattern is clear enough to choose a pilot gallery. A platform is the second step, not the first.

About the author

Eric Duffy is the founder of Convo, a platform that lets museums and cultural institutions publish multilingual audio tours their visitors can have a conversation with. He writes about how museums could afford to be more ambitious with interpretation, drawing on discovery conversations with curators, directors, and education leads at small and mid-size US museums. Reach him at eric@convo.app or on LinkedIn.
