Measuring museum audio guide engagement: what to track

KEY TAKEAWAYS

Audio guide analytics fall into two families: broadcast metrics (start rate, completion rate, drop-off by stop) measure whether visitors listened, and conversation metrics (questions per visit, top question themes) measure what they wanted to know. A serious program tracks both.
The single most useful slide for a board is not a listen count — it's a topic cluster of the questions visitors actually asked. It's a portrait of the relationship your collection is building, not a ratings chart for the audio.
Industry baselines for native museum apps are sobering — cross-industry research consistently puts adoption in the low single-digit percentages of visitors (Frankly Green + Webb). QR-launched web guides typically clear that bar by an order of magnitude, but the right comparison is your own start rate over time, not the industry mean.
Privacy is the constraint that determines whether any of this is even legal to collect. Anonymous aggregates, consent at the QR scan, and a per-visit ephemeral session are the floor for GDPR-aligned operation.
Build the dashboard you'll use in the curators' meeting, not the one the vendor ships. If you can't draw a line from a number to a decision a curator would make, the number is a vanity metric.

Most audio guide programs are measured the way TV shows are measured: how many people pressed play, how long they listened, where they stopped. Those numbers are real, and they belong in the dashboard. They are not, by themselves, a picture of whether visitors had a good visit. That picture lives one layer deeper — in what visitors wanted from the tour, which is now measurable in a way it was not five years ago, because the tour can be asked questions.

This piece is the practical version of that argument. It walks through the metrics worth tracking, the baselines worth comparing against, the slide a board will actually read, and the privacy posture that has to sit underneath all of it. It's written for the visitor-experience lead, education manager, or director who has to defend the audio program's value at the next quarterly review.

What's the difference between broadcast and conversation metrics?

Broadcast metrics measure whether the visitor listened. Conversation metrics measure what they wanted to know. Treat them as two halves of the same picture, not as alternatives.

Broadcast metrics are the legacy set: start rate (what share of visitors launched the tour), completion rate (what share finished a stop), and drop-off by stop (which stops bleed listeners). These tell you whether the audio worked as a piece of media. They are the metrics every legacy vendor reports because they are the only ones their tours can produce.

Conversation metrics are new. They exist only on platforms where the visitor can ask a question after — or instead of — the script. They include questions asked per visit, the top themes those questions clustered into, the share of visits that included at least one question, and the share of stops that triggered a follow-up. These tell you what curiosity your interpretation actually produced.

A program that reports only broadcast metrics is reporting the audio's performance as a recording. A program that reports both is reporting the visit. This is the shift I've argued for elsewhere, under the claim that the audio guide is not the product, and analytics is where that argument gets operationalized.

What broadcast metrics should you track?

Five broadcast metrics carry their weight: start rate, completion rate, average stops per visit, drop-off by stop, and language mix. Everything else is decoration.

Start rate is the share of unique on-site visitors who launch the tour, measured against gate count for the same period. The industry comparison is bracing. Years of cross-industry research on cultural-organization apps, most consistently the long-running tracking by Frankly Green + Webb, keep finding that the average museum app reaches visitor adoption only in the low single-digit percentages, with double-digit adoption rare enough to be treated as an outlier rather than an operating expectation. A QR-launched web guide should clear that app baseline easily: a single-digit start rate is a problem to investigate, double digits are normal, and after the first quarter the right comparison is your own trend line, not the industry average.
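As a minimal sketch, assuming you can export a count of tour sessions and a gate count for the same period, the start rate and the rough reading above reduce to a few lines of Python. The function names and sample numbers are hypothetical, and the 10% cutoff simply encodes the single-digit versus double-digit rule of thumb.

```python
def start_rate(tour_starts: int, gate_count: int) -> float:
    """Share of on-site visitors who launched the tour in one period.

    tour_starts: tour sessions opened in the period (hypothetical export).
    gate_count: visitors counted at the entrance over the same period.
    """
    if gate_count <= 0:
        raise ValueError("gate_count must be positive")
    return tour_starts / gate_count


def read_start_rate(rate: float) -> str:
    # Encodes the rule of thumb above: single digits warrant a look,
    # double digits are normal for a QR-launched web guide.
    return "investigate" if rate < 0.10 else "normal"


rate = start_rate(tour_starts=1_840, gate_count=12_500)
print(f"{rate:.1%}", read_start_rate(rate))  # prints: 14.7% normal
```

Because sessions are per-visit and anonymous, a session count only approximates unique visitors: a family sharing one phone counts once.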

Completion rate is the share of stop-starts that finished. Track it per stop, not per tour. A stop with 90% completion is doing its job; a stop with 30% completion is too long, too dense, or in the wrong place.
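A sketch of per-stop completion, assuming a hypothetical event export with one record per stop-start and a flag for whether playback reached the end. Grouping by stop rather than by tour keeps a weak stop from hiding inside a healthy average.

```python
from collections import defaultdict

# Hypothetical events: one record per stop-start, with a completion flag.
plays = [
    {"stop": "A", "completed": True},
    {"stop": "A", "completed": True},
    {"stop": "B", "completed": False},
    {"stop": "B", "completed": True},
    {"stop": "B", "completed": False},
]


def completion_by_stop(plays):
    """Return {stop: share of stop-starts that finished}."""
    starts = defaultdict(int)
    finishes = defaultdict(int)
    for p in plays:
        starts[p["stop"]] += 1
        finishes[p["stop"]] += int(p["completed"])
    return {stop: finishes[stop] / starts[stop] for stop in starts}


for stop, rate in completion_by_stop(plays).items():
    print(stop, f"{rate:.0%}")  # prints "A 100%", then "B 33%"
```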

Average stops per visit tells you how the tour is being used — as a guided walk (high), as a hop-around reference (medium), or as a single-stop curiosity (low). All three are valid; the value is in the trend.
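Average stops per visit can come from the same kind of export. This sketch assumes a hypothetical mapping from each ephemeral session to the stops it played, and it counts distinct stops so a visitor who replays one stop is not mistaken for a guided walker.

```python
from statistics import mean

# Hypothetical input: the stops each ephemeral session played, in order.
sessions = {
    "s1": ["A", "B", "C", "D", "E"],  # guided walk
    "s2": ["C", "A", "C"],            # hop-around, with one replay
    "s3": ["D"],                      # single-stop curiosity
}


def avg_stops_per_visit(sessions):
    # Count distinct stops so a replayed stop doesn't inflate the number.
    return mean(len(set(stops)) for stops in sessions.values())


print(round(avg_stops_per_visit(sessions), 2))  # prints: 2.67
```

Run it per week or per month and compare the series; as the text says, the value is in the trend.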

Drop-off by stop is the chart you'll stare at most. It shows where listeners leave. The villain is rarely the script; it's usually the gap between stops. A stop that follows a long walk between rooms reliably loses listeners.
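One simple way to draw the drop-off chart, assuming a hypothetical export of each session's stops in play order: record the last stop each session played and express it as a share of all sessions, laid out in tour order.

```python
from collections import Counter

# Hypothetical inputs: the intended tour order, and per session the stops
# played in the order they were played.
TOUR_ORDER = ["A", "B", "C", "D"]
sessions = [
    ["A", "B", "C", "D"],
    ["A", "B"],
    ["A", "B"],
    ["A"],
]


def exits_by_stop(sessions, order):
    """Share of sessions whose last played stop was each stop."""
    last = Counter(s[-1] for s in sessions if s)
    total = sum(last.values())
    if total == 0:
        return {stop: 0.0 for stop in order}
    return {stop: last[stop] / total for stop in order}


for stop, share in exits_by_stop(sessions, TOUR_ORDER).items():
    print(stop, f"{share:.0%}")
```

The final stop will always collect exits, because that is where a complete tour ends. Read the other bars against the walking distance that precedes each stop.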

Language mix is the share of starts in each language. It's also a board slide on its own. We'll come back to that.
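Language mix is a count of starts by language. A minimal sketch, assuming each start is tagged with a language code (the codes and counts below are made up), also computes the share of starts outside the default language, which is the number a board sentence about access would quote.

```python
from collections import Counter

# Hypothetical quarter of tour starts, one language code per start.
starts = ["en"] * 80 + ["es"] * 10 + ["zh"] * 6 + ["fr"] * 2 + ["ko"] * 2


def language_mix(starts):
    """Share of tour starts per language, largest first."""
    counts = Counter(starts)
    return {lang: n / len(starts) for lang, n in counts.most_common()}


def non_default_share(starts, default="en"):
    """Share of starts in any language other than the default one."""
    return sum(1 for lang in starts if lang != default) / len(starts)


print({lang: f"{share:.0%}" for lang, share in language_mix(starts).items()})
print(f"non-English share: {non_default_share(starts):.0%}")  # prints: non-English share: 20%
```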

What conversation metrics should you track?

Three conversation metrics matter: questions per visit, top question themes, and the share of visits with at least one question.

Questions per visit is the simplest signal of engagement depth. Above one, the visitor has stopped passively consuming the tour; above three, they are using it to think with. Expect this number to sit below one in your first month and to grow as visitors learn that the tour invites questions.
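Questions per visit is total questions divided by sessions. The sketch below assumes a hypothetical per-session question count and maps the result onto the rough bands above; the band labels paraphrase the text and are not a standard.

```python
# Hypothetical input: questions asked in each ephemeral session this month.
questions_per_session = [0, 0, 0, 1, 0, 2, 0, 0, 4, 0]


def questions_per_visit(counts):
    """Total questions divided by the number of sessions."""
    if not counts:
        return 0.0
    return sum(counts) / len(counts)


def read_depth(qpv):
    # Bands paraphrase the rule of thumb in the text; they are not a standard.
    if qpv > 3:
        return "used to think with"
    if qpv > 1:
        return "no longer passive"
    return "mostly listen-through"


qpv = questions_per_visit(questions_per_session)
print(qpv, read_depth(qpv))  # prints: 0.7 mostly listen-through
```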

Top question themes is the slide. A grouped, anonymized cluster of what visitors asked across the museum — material and technique, provenance, symbolism, artist biography, conservation, comparison to other pieces — tells you what your interpretation primed and what it left underdeveloped. We will spend more time on this below.
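The article doesn't specify how questions get grouped into themes. As a deliberately crude stand-in, this sketch tags each question by keyword and sorts the themes by volume. The keyword lists are hypothetical, the first matching theme wins, and a real deployment would more likely use a trained classifier or embedding-based clustering with staff review. The input is question text only, with no session or visitor identifier attached.

```python
from collections import Counter

# Hypothetical keyword lists, checked in order; the first matching theme wins.
# Matching is plain substring search on lowercased text, so it is crude.
THEMES = {
    "material and technique": ("paint", "pigment", "canvas", "made", "technique"),
    "provenance": ("owned", "acquire", "bought", "provenance"),
    "artist biography": ("artist", "born", "life", "died"),
    "conservation": ("restore", "conserv", "damage"),
    "symbolism": ("symbol", "mean", "represent"),
}


def tag(question: str) -> str:
    text = question.lower()
    for theme, keywords in THEMES.items():
        if any(k in text for k in keywords):
            return theme
    return "other"


def theme_counts(questions):
    """(theme, count) pairs sorted by volume, largest first."""
    return Counter(tag(q) for q in questions).most_common()


questions = [
    "What pigment is the blue?",
    "How was this canvas prepared?",
    "Who owned it before the museum?",
    "What does the dog symbolize?",
    "Has it ever been restored?",
    "Where was the artist born?",
    "What is the painting made of?",
]
print(theme_counts(questions))
```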

Share of visits with at least one question is the engagement floor. Most visits will still be listen-through. The trend in this number, more than its absolute value, tells you whether the conversational layer is being discovered.
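Because the trend matters more than the absolute value, this sketch computes the share of sessions with at least one question for each month. The (month, question count) input shape is an assumption.

```python
from collections import defaultdict

# Hypothetical input: (month, questions asked) for each ephemeral session.
sessions = [
    ("2026-01", 0), ("2026-01", 0), ("2026-01", 1), ("2026-01", 0),
    ("2026-02", 2), ("2026-02", 0), ("2026-02", 1), ("2026-02", 0),
]


def asking_share_by_month(sessions):
    """Month -> share of sessions with at least one question."""
    total = defaultdict(int)
    asked = defaultdict(int)
    for month, n_questions in sessions:
        total[month] += 1
        asked[month] += int(n_questions > 0)
    return {m: asked[m] / total[m] for m in sorted(total)}


print(asking_share_by_month(sessions))  # prints: {'2026-01': 0.25, '2026-02': 0.5}
```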

How do you read drop-off honestly?

Independent research on art viewing has converged on short median dwell times in front of paintings, even without an audio guide in the picture. The implication for audio: if your stop runs ninety seconds and visitors drop off at the forty-second mark, the right reading is that the script outlasted the object's natural pull, not that visitors are disengaged. Cut the stop. The visitor who wants more can ask. For more on the dwell-time literature and how to read it without overreading it, see the self-guided tours and dwell time guide.
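The forty-seconds-into-ninety reading can be checked per stop by comparing the median listening time with the stop's length. This is a sketch with made-up timings, and the 0.6 cutoff is an illustrative assumption, not a published threshold.

```python
from statistics import median

# Hypothetical timings for one stop: seconds each listener stayed.
STOP_LENGTH_S = 90
listened_s = [38, 41, 90, 35, 44, 90, 40, 39]


def script_outlasts_pull(listened, stop_length, threshold=0.6):
    """True when the median listener leaves well before the audio ends.

    The median resists the few who always play to the end. threshold is an
    illustrative cutoff, not a published standard.
    """
    return median(listened) < threshold * stop_length


print(median(listened_s), script_outlasts_pull(listened_s, STOP_LENGTH_S))
# prints: 40.5 True
```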

How do you build a board-ready slide from the data?

The board slide is not the dashboard. A director presenting to a board has thirty seconds per chart. The dashboard you live in has many metrics; the slide that goes upstairs has two.

The two that earn the slide:

1. The language access chart. A simple stacked bar showing the share of tour starts by language, over the quarter. For most US institutions, this is the chart that re-frames the program around mission. "Twenty-two percent of our audio sessions last quarter were in Spanish, Mandarin, French, or Korean — visitors we previously served only in English" is a sentence a board chair understands immediately. It maps directly to the institution's access goals and to the multilingual reality of the audience.

2. The topic cluster slide. This is the one nobody could produce before. A grouped, anonymized view of what visitors asked about across the museum, sorted by volume. Material and technique. Provenance. The artist's life. Conservation. Symbolism. The cluster is a portrait of the relationship your collection is producing: what it stoked, what it left unsatisfied, what it accidentally said the loudest. It is the closest thing audio analytics produces to an answer to the question "Why does this institution exist?"

Two charts. One page. That's the slide.

Where does this stop being useful?

It stops being useful when you measure the metric instead of the visit. Three failure modes are worth naming.

The first is chasing the start rate for its own sake. A start rate inflated by aggressive QR signage at the gate, with flat completion and zero questions, is worse than a smaller, voluntary start rate from visitors who wanted the tour. The number went up; the visit got worse.

The second is treating completion rate as a quality score. A stop with 95% completion may be a stop nobody can leave — the audio is twenty seconds long and runs on autoplay. A stop with 60% completion that triggered three follow-up questions did more for the visit than the 95% stop did. The numbers without the question layer can mislead.

The third is using the topic cluster as a content-strategy autopilot. The cluster tells you what visitors asked. It does not tell you what they should have asked, or what the institution exists to put in front of them. A great audio program shows visitors things they didn't know to be curious about. Listening only to the cluster optimizes for the curiosity that's already there, not the one curatorial work creates. The cluster informs; it does not decide.
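The second failure mode above is easiest to avoid by never reporting completion alone. This sketch, using hypothetical per-stop aggregates, puts completion next to follow-up questions per start, so a 95% stop that prompts almost nothing and a 60% stop that prompts many sit side by side.

```python
# Hypothetical per-stop aggregates: starts, completed plays, and follow-up
# questions asked at that stop.
stops = {
    "short autoplay stop": {"starts": 200, "completed": 190, "questions": 2},
    "dense stop": {"starts": 200, "completed": 120, "questions": 60},
}


def stop_report(stats):
    """Completion next to follow-up questions per start, so neither is read alone."""
    return {
        name: {
            "completion": s["completed"] / s["starts"],
            "questions_per_start": s["questions"] / s["starts"],
        }
        for name, s in stats.items()
    }


for name, row in stop_report(stops).items():
    print(f"{name}: {row['completion']:.0%} complete, "
          f"{row['questions_per_start']:.2f} questions per start")
```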

This is the place to be honest: much of what matters most about a museum visit is unmeasurable and should stay that way. The painting that lodged in the visitor's head for a week, the conversation on the train home, the return trip with a friend. None of that surfaces in analytics. Measure what's measurable so the program can defend itself; resist the urge to claim what isn't.

What about visitor privacy?

Privacy is the floor, not a feature. Three principles cover most of what a museum's IT and legal reviewers will ask about.

First, default to anonymous aggregates. Individual visitor identifiers are almost never necessary to answer the questions the program needs to answer. Group-level metrics — share of visitors who started, completion distribution, question themes — produce the same insights and clear GDPR's data minimization bar. If you don't need the identity, don't collect it.
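A sketch of aggregate-only reporting, assuming events arrive with a temporary session token that the report never reads. It also applies small-cell suppression, a common statistical-disclosure practice the text does not mention: groups below a minimum count are withheld so a rare question can't point back to one visitor. The threshold of 10 is a placeholder to agree with your privacy reviewer.

```python
from collections import Counter

MIN_CELL = 10  # hypothetical threshold; agree on it with your privacy reviewer


def aggregate(events, key, min_cell=MIN_CELL):
    """Count events by one non-identifying field.

    The session token on each event is never read, so it cannot leak into
    the report. Groups smaller than min_cell are suppressed.
    """
    counts = Counter(event[key] for event in events)
    return {value: n for value, n in counts.items() if n >= min_cell}


# Hypothetical events: temporary tokens plus a question theme.
events = (
    [{"session": f"tmp-{i}", "theme": "provenance"} for i in range(12)]
    + [{"session": f"tmp-{i}", "theme": "conservation"} for i in range(12, 15)]
)
print(aggregate(events, "theme"))  # prints: {'provenance': 12}
```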

Second, treat each visit as an ephemeral session. The QR scan opens a session; the session closes when the visitor leaves the site or the browser tab. No cross-visit tracking, no persistent identifiers, no fingerprinting. This is both the right ethical posture and the easiest legal one.
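Server-side, an ephemeral session can be as simple as a random token minted at the QR scan and forgotten after a period of inactivity. This sketch uses Python's `secrets` module for the token; the class name and the three-hour idle timeout are hypothetical stand-ins for "the visitor left," since a server cannot see a browser tab close. Nothing is written to disk, and no token links one visit to the next.

```python
import secrets
import time


class EphemeralSessions:
    """In-memory, per-visit sessions with no persistent identifier."""

    def __init__(self, idle_timeout_s=3 * 60 * 60, clock=time.monotonic):
        self.idle_timeout_s = idle_timeout_s  # hypothetical "visitor left" proxy
        self.clock = clock
        self._last_seen = {}  # token -> time of last activity

    def open(self):
        """Mint a fresh random token at QR scan."""
        token = secrets.token_urlsafe(16)
        self._last_seen[token] = self.clock()
        return token

    def touch(self, token):
        """Record activity; return False if the session has already expired."""
        self.expire()
        if token not in self._last_seen:
            return False
        self._last_seen[token] = self.clock()
        return True

    def expire(self):
        """Forget every session idle longer than the timeout."""
        now = self.clock()
        for token, seen in list(self._last_seen.items()):
            if now - seen > self.idle_timeout_s:
                del self._last_seen[token]
```

On the client, keeping the token in `sessionStorage` rather than a long-lived cookie is one way to let it disappear with the tab.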

Third, surface consent at the QR scan, not in fine print. A short, plain-English notice — "This guide tracks anonymous usage to improve future tours. It doesn't track who you are or where you go after this visit." — works for most jurisdictions and most reviewers. Where stricter regimes apply (the EU, California for under-13 visitors, school-group contexts), pair it with an explicit consent action.

Privacy is also a place where the conversational layer is actually safer than legacy analytics. Anonymized question themes do not require identifying who asked. Compare that to RFID badges, Wi-Fi triangulation, or computer-vision people counting, all of which can produce useful data and all of which carry materially more privacy weight.

How does this connect to the rest of the visitor-experience program?

If you've gotten this far, you're past the question of which metrics matter and into the question of what the program is for. Analytics is the feedback layer that makes a tour a system rather than a deliverable. The visitor-experience pillar guide lays out the rest of that system — the BYOD assumptions, the dwell-time work, the QR-code logistics, and the way 2026 visitors already expect to engage with a gallery. Read it next.

FAQ

What is a good start rate?

There is no industry-wide answer because the comparison set is so noisy. The published industry research on native museum apps converges on low-single-digit adoption rates (Frankly Green + Webb); QR-launched web guides typically clear that by an order of magnitude. The useful benchmark after your first quarter is your own trend line, segmented by entrance, signage placement, and language.

Should we track individual visitor journeys or only aggregates?

Aggregates almost always. Individual journeys are rarely needed for the questions the program has to answer, and they make the privacy posture much harder to defend. If you have a research need for individual-level data, run it as a separately consented study with a clear scope and retention window, not as the default analytics layer.

How often should we review the numbers?

Monthly for the program lead, quarterly for the curatorial team, twice a year for the board. The most underrated review is the curatorial one: handing curators the topic cluster for their gallery is how analytics actually changes the writing.

Which metrics should we drop?

Vanity numbers that don't drive decisions. Cumulative-since-launch totals (they only go up). Average session length without context (a long session might mean a lost visitor). Per-stop play counts without completion (loud signage at one stop distorts the comparison). If you can't connect the number to a curator's decision, drop it from the dashboard.

Can a broadcast-only audio guide produce conversation metrics?

No. Those metrics require a guide that accepts visitor questions. A broadcast-only platform can only ever produce broadcast metrics. The category shift to conversation is what makes the topic-cluster slide possible at all, and it's the dimension worth pressing vendors on hardest in evaluation.

The line to leave you with

The metric a board remembers is not the listen count. It's the slide that shows what visitors wanted to know, grouped by theme, sorted by volume. Build the program so that slide is possible to make, and the rest of the dashboard organizes itself around it.

If you want the wider context for how visitor expectations have shifted toward BYOD, QR codes, and conversational tours, the visitor-experience pillar guide is the place to start. If you want the argument for why the conversation matters more than the broadcast, "the audio guide is not the product" is the longer version of the case this piece is operationalizing.

About the author

Eric Duffy is the founder of Convo, a platform that lets museums and cultural institutions publish multilingual audio tours their visitors can have a conversation with. He writes about the economics and craft of museum interpretation from inside the category, drawing on visitor analytics, curator conversations, and the production economics of both the legacy studio-and-handset model and the AI-narrated model. Reach him at eric@convo.app or on LinkedIn.