Self-guided tours and dwell time in museums.

KEY TAKEAWAYS

The most reliable finding in the literature: an audio guide tends to lengthen stops at the objects it covers, while its effect on total visit duration is mixed. Adding minutes to a stop is not the same thing as adding minutes to the visit.
The baseline most museums quietly know is uncomfortable: median time spent in front of a work of art is around 17 seconds; the mean lands near 27 seconds (Smith & Smith, 2001). Carbon's 2017 follow-up found similar first-look times, with re-visits pushing the total higher.
Beverly Serrell's tracking-and-timing benchmarks — Sweep Rate Index near 300 and roughly 25–26% "diligent visitors" — are still the cleanest cross-institution comparison. Most exhibitions do not clear them.
Dwell time is a proxy, not a goal. A visitor who lingers because they're confused looks identical to a visitor who lingers because they're enthralled. The metric does not distinguish them.
A more useful signal for a self-guided tour in 2026 may be questions asked — what visitors choose to ask the tour about, in their own words. That data didn't exist five years ago. It exists now.

If you run visitor experience at a museum, you have heard the dwell-time argument from at least three directions: from a board that wants higher engagement numbers, from a vendor that wants to sell you audio that "increases time on object," and from a curator who quietly suspects the whole frame is wrong because some of their best stops are the quickest ones. All three positions have something going for them. The literature is more honest than any of them.

This piece walks through what the dwell-time research actually says about self-guided audio tours, where the effect is real, where it is overclaimed, and what a 2026 self-guided tour can measure that the previous generation of tours could not.

What does the dwell-time research actually say about audio guides?

The strongest finding is narrow and worth stating precisely: audio guides reliably extend the time visitors spend at the specific objects covered by audio, with smaller and more variable effects on total visit duration. A frequently cited observational study at a mid-size US institution found that students with an audio guide stayed longer at exhibits and displayed more inquisitive behavior than students without one. Multiple subsequent studies point in the same direction at the per-stop level.

What the literature does not support is the cleaner pitch ("audio guides make people stay longer in the museum"). The strongest case is "audio guides make people stay longer at the stops they listen to, and may slightly shorten time elsewhere as visitors budget around the audio." Total-visit effects are mixed. That distinction matters because it changes what dwell time, as a metric, is actually telling you.

What's the baseline visitors are starting from?

Most visitors look at most artworks for a very short time. This is not a moral failing — it's the structure of how museums get used. The canonical figure comes from Smith & Smith's 2001 study at the Metropolitan Museum of Art: across 150 observed visitors and six paintings, the mean time spent looking at a work of art was 27.2 seconds; the median, 17 seconds. The Louvre famously reported that visitors spend about 15 seconds in front of the Mona Lisa itself.

A 2017 follow-up by Carbon and colleagues, using tracking and a richer methodology, found mean first-look times of 33.9 seconds (median 25.1), with total viewing time across return visits closer to 50 seconds per artwork — meaningful for memorability research but still within the same order of magnitude.

The frame that follows: when a vendor tells you their audio guide "increases dwell time by 40%," ask what the baseline was. A 40% lift on a 17-second median takes the stop to about 24 seconds, roughly seven seconds more. The honest claim is not that visitors are suddenly contemplating each work; it's that they're staying through one extra beat of looking, which is a real thing and worth getting, but not the thing the pitch sometimes implies.
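
The baseline question can be checked with a few lines of arithmetic. This is a minimal sketch: the 17-second median and 27.2-second mean are the Smith & Smith (2001) figures quoted above, the 40% lift is the hypothetical vendor claim, and `lifted_dwell` is an illustrative helper name rather than anything from a real toolkit.

```python
def lifted_dwell(baseline_seconds: float, lift_percent: float) -> float:
    """Return the dwell time, in seconds, after applying a relative lift."""
    return baseline_seconds * (1 + lift_percent / 100)


# Baselines from Smith & Smith (2001); the 40% lift is a hypothetical vendor claim.
MEDIAN_BASELINE = 17.0
MEAN_BASELINE = 27.2

median_after = lifted_dwell(MEDIAN_BASELINE, 40)
mean_after = lifted_dwell(MEAN_BASELINE, 40)

print(f"median: {MEDIAN_BASELINE:.1f} s -> {median_after:.1f} s "
      f"(+{median_after - MEDIAN_BASELINE:.1f} s)")
print(f"mean:   {MEAN_BASELINE:.1f} s -> {mean_after:.1f} s "
      f"(+{mean_after - MEAN_BASELINE:.1f} s)")
```

Either way the gain is a matter of seconds, which is why the baseline matters more than the percentage.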

What are Serrell's benchmarks, and why do they still matter?

Beverly Serrell's tracking-and-timing benchmarks remain the cleanest cross-institution way to ask whether an exhibition is being used at all, and most exhibitions don't clear them. Serrell's Paying Attention (1997, expanded as a 1998 AAM volume covering 110 studies) established two indices that are now standard in the field:

Sweep Rate Index (SRI): square feet of exhibition divided by the average total time visitors spend in it, in minutes. Lower means visitors are spending more time per square foot. Her benchmark: about 300.
Percent Diligent Visitors (%DV): the percent of visitors who stop at more than half of the exhibit elements. Her benchmark: about 25%.
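
Both indices are simple enough to compute from a tracking sheet. The sketch below follows the two definitions above and assumes time is recorded in minutes; the `TrackedVisit` record, the function names, and the sample numbers are all invented for illustration and do not come from Serrell's data.

```python
from dataclasses import dataclass


@dataclass
class TrackedVisit:
    """One observed visitor in a tracking-and-timing study (hypothetical record shape)."""
    minutes_in_exhibition: float
    elements_stopped_at: int


def sweep_rate_index(square_feet: float, visits: list[TrackedVisit]) -> float:
    """Exhibition square footage divided by average total time in minutes."""
    average_minutes = sum(v.minutes_in_exhibition for v in visits) / len(visits)
    return square_feet / average_minutes


def percent_diligent(total_elements: int, visits: list[TrackedVisit]) -> float:
    """Percent of visitors who stopped at more than half of the exhibit elements."""
    diligent = sum(1 for v in visits if v.elements_stopped_at > total_elements / 2)
    return 100 * diligent / len(visits)


# Invented sample: a 3,000 sq ft exhibition with 40 elements and four tracked visitors.
sample = [
    TrackedVisit(minutes_in_exhibition=6.0, elements_stopped_at=8),
    TrackedVisit(minutes_in_exhibition=12.0, elements_stopped_at=22),
    TrackedVisit(minutes_in_exhibition=9.0, elements_stopped_at=15),
    TrackedVisit(minutes_in_exhibition=13.0, elements_stopped_at=20),
]

sri = sweep_rate_index(3000, sample)
dv = percent_diligent(40, sample)
print(f"SRI: {sri:.0f}   %DV: {dv:.0f}%")
```

Note that a visitor who stops at exactly half of the elements does not count as diligent under the "more than half" wording.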

A 2010 follow-up with an additional 50 exhibitions found nearly identical averages — SRI around 300, %DV around 26%. A 2020 Visitor Studies aggregation extending Serrell's framework reached substantially the same conclusion.

What these benchmarks do well: they let a curator ask "is our exhibition being thoroughly used at all?" without comparing apples to oranges across topics. What they don't do: tell you whether the time visitors did spend was any good. They measure quantity, not quality. Serrell herself has been careful about that distinction in print. A lot of the people quoting her have not.

Does a self-guided audio tour change these numbers?

At the per-stop level, usually yes. At the whole-visit level, sometimes — and sometimes in directions you didn't intend. The cleanest summary of the evidence: a self-guided audio tour tends to (a) lengthen time at objects the visitor chooses to listen to, (b) raise the share of visitors who behave "diligently" at the covered stops, (c) modestly shorten time at uncovered stops as visitors budget around the audio, and (d) have a small, inconsistent net effect on total visit length.
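
A toy calculation shows how a large per-stop gain can coexist with a small net effect. Every number here is invented for illustration and is not drawn from any study; the point is only the shape of the arithmetic, and it counts time at stops rather than the whole visit.

```python
# All figures are hypothetical, chosen only to illustrate the budgeting effect.
COVERED_STOPS = 10      # stops that have audio
UNCOVERED_STOPS = 30    # stops that do not


def total_stop_seconds(covered_each: float, uncovered_each: float) -> float:
    """Total seconds spent at stops, given average seconds per stop of each kind."""
    return COVERED_STOPS * covered_each + UNCOVERED_STOPS * uncovered_each


# Without audio: assume the same average at every stop.
without_audio = total_stop_seconds(covered_each=20, uncovered_each=20)

# With audio: covered stops double, uncovered stops shrink as visitors budget time.
with_audio = total_stop_seconds(covered_each=40, uncovered_each=15)

net_change_percent = 100 * (with_audio - without_audio) / without_audio
print(f"without audio: {without_audio:.0f} s, with audio: {with_audio:.0f} s, "
      f"net change: {net_change_percent:+.2f}%")
```

In this invented case a doubling at covered stops nets out to a single-digit change overall, and slightly different assumptions would flip the sign.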

The British Museum's An Audio State of Mind (Mannion, Sabiescu, and Robinson, Museums and the Web 2015) is the most honest piece I've found on what visitors actually do with audio guides. Their finding from observational work at a major institution: the dominant behavior with a traditional handset wasn't "tour following" (~10% of users), it was "code hunting" (~75%) — visitors driven by the search for audio-guide icons rather than the structure the curator designed. That's a dwell-time win and a curatorial frustration at the same time. Visitors stayed longer at stops with audio. They also skipped past stops without it.

If your goal is to lift dwell time on a specific set of objects, an audio tour is one of the most reliable tools available. If your goal is "longer visits," the evidence is weaker, and you have to be honest about that with your director.

Is dwell time even the right metric?

Dwell time is a proxy that conflates the visit you want with the visit you don't. A visitor who stops for 90 seconds because they're transfixed and a visitor who stops for 90 seconds because they can't find the exit produce the same line in the data. A visitor who blows past a stop because the wall card already told them what they came to know is indistinguishable, in the dwell numbers, from a visitor who blew past because nothing held them. The metric does not separate the visits you'd want more of from the ones you wouldn't.

This is one of the points Beverly Serrell makes most carefully in her own writing and the one most often dropped when she gets cited. Her benchmarks are diagnostics, not goals. The exhibitions that clear them tend to be good; the move from "clears the benchmark" to "is good" is not automatic in either direction.

This is also why the dwell-time-as-KPI argument tends to land badly with curators. They know it's a proxy. They know what the proxy misses. They've watched a visitor stand for three minutes in front of a piece they don't love because of an audio choice they didn't make, and they've watched a different visitor stop for ten seconds in front of the one piece in the room that actually mattered to them. Both look the same in the spreadsheet.

For a deeper take on this from a Convo angle, see The audio guide is not the product — the argument that the visit, not the audio, is the thing being measured.

What's the better signal a 2026 self-guided tour can give you?

Questions asked. What a visitor chooses to ask, in their own words, at a stop, is a richer signal than how long they stood there. A 2015-era audio guide had no way to capture it; the only thing it could record was whether the visitor pressed play. A 2026 conversational tour produces a log of what visitors actually asked. That's data with editorial signal in it: it tells the curator what the wall card didn't answer, what the audio didn't reach, what the visitor actually wanted to know.

Stop time tells you a visitor lingered. Questions asked tell you what they were lingering on. The two together are far more useful than either alone, and for the first time the second one is available without an evaluator with a clipboard.

For the methods piece on how to set this up — what to measure, what not to over-measure, how to read the data without overreading it — see measuring museum audio guide engagement.

Where this doesn't fit

A few cases where chasing dwell time, or chasing more from a self-guided tour, is the wrong move:

Crowd-flow-limited galleries. In a busy room with a single signature object, you may need visitors to move through the stop, not linger at it. A tour that lengthens the queue is a tour that makes the visit worse for the next person in line. Many institutions have a Mona Lisa problem at smaller scale.
Short, dense, well-labelled exhibitions. A 20-minute show with strong wall text is doing its job at 20 minutes. Pushing the dwell number up by lengthening the audio is gold-plating. Don't.
Visits where the visit isn't the unit. School groups, members who come weekly, visitors using the museum as a third place — none of these are well-served by treating "minutes per visit" as the headline number.
When the question rate is what you actually want to lift. If visitors are spending plenty of time and asking nothing, the problem isn't dwell time, it's permission. That's a content and design question, not an audio one.

FAQ

At the level of the specific stops covered, generally yes. At the level of the total visit, the effect is mixed and smaller than vendor marketing tends to imply. The honest answer is that audio guides lengthen stops on objects with audio and may shorten time elsewhere as visitors budget around the tour. Net effects on total time vary by institution.

There isn't a single number. The cleanest cross-institution benchmarks are Beverly Serrell's: Sweep Rate Index near 300 (lower is more time per square foot) and roughly 25% diligent visitors (visitors who stop at more than half of the elements). Most exhibitions in her dataset don't clear both. Whether they should is a separate question from whether they do.

The most-cited figure is from Smith and Smith's 2001 Metropolitan Museum study: mean of 27.2 seconds, median of 17 seconds. Carbon's 2017 follow-up found similar first-look times, with total time across re-visits closer to 50 seconds per work. The order of magnitude is consistent across studies.

It's a proxy with real limits. A long stop can mean engagement or confusion; a short stop can mean inattention or efficient understanding. Used alongside other signals — questions asked, repeat visits, recall on exit, programmatic outcomes — it earns its place. Used alone as a KPI, it tends to mislead.

Self-guided digital tours on visitor phones inherit most of the per-stop dwell-time literature on audio guides. The newer thing they make possible — capturing questions asked, language switches, and stop-by-stop drop-off — is more useful for curatorial editing than the dwell numbers themselves. Most platforms in the category, including Convo, surface both.

A defensible visitor-experience dashboard in 2026 usually combines: completion rate by stop and by language; questions asked per stop; question topics (what visitors didn't get from wall text); language distribution; and qualitative on-site or exit feedback. Dwell time stays on the list as a coarse signal — it just doesn't lead.
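
As a rough sketch of how the countable parts of that list could be rolled up from a tour log: the `StopEvent` shape, its field names, and the sample events below are assumptions made for illustration, not the export format of Convo or any other platform. Question topics and exit feedback need human reading and are left out.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class StopEvent:
    """One visitor's interaction with one stop (hypothetical log shape)."""
    visitor_id: str
    stop: str
    language: str
    completed: bool          # visitor reached the end of the stop's content
    questions: list[str]     # questions asked at this stop, in the visitor's words


def summarize(events: list[StopEvent]) -> dict:
    """Aggregate completion rate by stop, questions per stop, and languages used."""
    started = Counter()
    completed = Counter()
    questions = defaultdict(list)
    visitor_language = {}
    for e in events:
        started[e.stop] += 1
        completed[e.stop] += int(e.completed)
        questions[e.stop].extend(e.questions)
        visitor_language[e.visitor_id] = e.language  # last language seen per visitor
    return {
        "completion_rate_by_stop": {s: completed[s] / started[s] for s in started},
        "questions_per_stop": {s: len(questions[s]) for s in started},
        "language_distribution": dict(Counter(visitor_language.values())),
    }


# Invented sample log: two visitors, two stops.
events = [
    StopEvent("v1", "stop-1", "en", True, ["Who commissioned this?"]),
    StopEvent("v1", "stop-2", "en", False, []),
    StopEvent("v2", "stop-1", "es", True, []),
    StopEvent("v2", "stop-2", "es", True, ["Why is the frame unfinished?",
                                           "Is this the original?"]),
]

summary = summarize(events)
for name, values in summary.items():
    print(name, values)
```

Completion by language would follow the same pattern with `(stop, language)` as the counter key.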

The verdict

The honest position on dwell time and self-guided tours is more useful than the marketing position. Audio tours do lengthen stops at the objects they cover. They don't reliably extend visits overall. Serrell's tracking-and-timing benchmarks are still the cleanest way to ask whether an exhibition is being thoroughly used, and they're still benchmarks most exhibitions don't clear — but clearing them is a diagnostic, not a goal. A 2026 self-guided tour can do something the previous generation could not: capture what visitors actually asked, in their own words, at the stop where they asked it. That signal, paired with a sober reading of stop times, is the dashboard worth building.

If you're building the measurement layer for a self-guided program, the companion piece is measuring museum audio guide engagement. For the broader visitor-experience map, the visitor experience pillar guide is the index.

About the author

Eric Duffy is the founder of Convo, a platform that lets museums and cultural institutions publish multilingual audio tours their visitors can have a conversation with. He writes about visitor experience from inside the category — drawing on tracking-and-timing literature, curator conversations, and the engagement data self-guided tour platforms now make available. Reach him at eric@convo.app or on LinkedIn.