Six Things Your Sleep Tracker Cannot Actually Tell You
Consumer sleep trackers estimate sleep duration reasonably well. They classify sleep stages correctly about 65% of the time. Here's a precise account of where their limitations fall — and three things they're genuinely good for.
My sleep tracker told me this morning that I got 6:43 of sleep, with 1:12 of deep sleep and 1:08 of REM. I have almost no idea what to do with that.
I’ve come to understand this is the correct response — not ignorance, not dismissal, but a specifically calibrated skepticism about which questions these devices can answer and which they cannot.
Consumer sleep trackers — Oura Ring, Apple Watch, Fitbit, Garmin, WHOOP — use movement (accelerometry) and heart rate variability to estimate sleep staging. Independent validation studies find they track sleep duration reasonably well and classify sleep stages correctly about 60–70% of the time when compared to polysomnography, the clinical gold standard. That accuracy rate looks reasonable until you examine which questions people are actually asking their trackers.
1. What your sleep stages actually were
This is the central epistemological problem, and it’s buried in the fine print of every device’s accuracy documentation.
Sleep staging — identifying when you’re in light NREM, slow-wave deep sleep, or REM — is clinically assessed through polysomnography, which requires electrodes measuring brain electrical activity (EEG), eye movements (EOG), and muscle activity (EMG). Consumer wearables have none of these sensors. They infer sleep stages from movement patterns and heart rate variability — a reasonable approximation for distinguishing asleep from awake, and a poor one for distinguishing NREM substages from each other or from REM.
A 2019 systematic review by de Zambotti et al. in Sleep Medicine Reviews analyzed 22 validation studies of consumer wearables against PSG. Sensitivity for detecting REM sleep ranged from 37% to 72% across devices. Sensitivity for slow-wave sleep ranged even more widely. On average, the devices agreed with PSG on sleep staging about 65% of the time. For a clearer picture of what the stages actually represent, the "sleep architecture explained" guide is a useful reference point.
Better than random. Not better than a confident guess. What this means for your “1:08 of REM”: it’s an estimate with wide, asymmetric error bars. Not a measurement.
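To make "agreement" and "sensitivity" concrete, here is a minimal sketch of how an epoch-by-epoch comparison can be computed. It assumes both recordings have already been split into the same sequence of scored intervals (epochs) and labeled with the same stage names. The function name and the ten-epoch night are made up for illustration; they are not taken from any study or device.

```python
from collections import Counter


def agreement_and_sensitivity(psg, tracker):
    """Compare two equal-length lists of per-epoch stage labels.

    Returns (overall_agreement, {stage: sensitivity}), where sensitivity
    for a stage is the share of PSG-scored epochs of that stage that the
    tracker labeled the same way.
    """
    if len(psg) != len(tracker):
        raise ValueError("both recordings must cover the same epochs")
    if not psg:
        raise ValueError("need at least one epoch")

    pairs = list(zip(psg, tracker))
    overall = sum(1 for p, t in pairs if p == t) / len(pairs)

    psg_counts = Counter(psg)                       # epochs per stage, per PSG
    hits = Counter(p for p, t in pairs if p == t)   # epochs where tracker agreed
    sensitivity = {stage: hits[stage] / n for stage, n in psg_counts.items()}
    return overall, sensitivity


# Hypothetical ten-epoch night, for illustration only.
psg     = ["light", "light", "deep", "deep", "rem", "rem", "rem", "light", "wake", "light"]
tracker = ["light", "light", "deep", "light", "rem", "light", "light", "light", "wake", "rem"]

overall, sens = agreement_and_sensitivity(psg, tracker)
print(f"overall agreement: {overall:.0%}")
for stage, value in sorted(sens.items()):
    print(f"{stage:>5} sensitivity: {value:.0%}")
```

In this toy night the tracker agrees with PSG on 6 of 10 epochs while catching only 1 of 3 REM epochs, which is how a respectable-looking overall figure can sit on top of a weak REM estimate.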
2. Why you woke up
Your tracker knows you woke at 2:14 a.m. It registered movement, heart rate change, and deviation from expected sleep staging. What it cannot tell you: whether you woke because you were too warm, or because you heard a sound, or because your bladder reached a threshold, or because you naturally surfaced at the end of a sleep cycle, or because anxiety initiated an arousal, or because you were in a restless half-consciousness that didn’t involve full waking at all.
Context is the thing sleep trackers most conspicuously lack. All these events look identical to the accelerometer: movement, heart rate shift, staging change. The device records what. It has no access to why.
This matters because the “why” is almost always the actionable piece. Knowing you woke at 2:14 tells you nothing about whether to change your room temperature, reduce evening alcohol, limit fluids after 7 p.m., or simply accept that sleep occasionally fragments at REM transitions. The number gives you something to look at without telling you where to look.
3. Whether your score is good
“Your sleep score is 78.”
Seventy-eight out of what, exactly? Compared to whom?
Most consumer trackers provide proprietary sleep scores calibrated to your own historical data: not to population norms, not to clinical thresholds, not to any external reference that has been validated against outcomes. On a scale like that, a 78 says roughly that you slept better than your own typical night and worse than your best recent ones. It does not mean you slept well in any absolute or medically meaningful sense.
The problem is that scores read as performance metrics, so people optimize for them. A 78 triggers investigation: what changed, how to get to 85. But the score is measuring a moving average of itself, against itself. Dr. Rafael Pelayo at Stanford Sleep Medicine has noted that consumer sleep scores are difficult to interpret clinically because there is no validated reference range and no published norm population. You cannot tell a patient their sleep is abnormal on the basis of a proprietary index derived from their own prior nights.
Optimization pressure on a metric with no external anchor is how you end up rearranging your sleep to improve your score rather than your sleep.
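A toy example shows what "no external anchor" means in practice. The score below is entirely hypothetical: real scoring formulas are proprietary, and this is not any vendor's method. It simply ranks tonight's sleep duration against a user's own past nights, which is enough to show that an identical night can earn very different scores depending on whose history it is compared to.

```python
def self_referenced_score(tonight_minutes, history_minutes):
    """Hypothetical score: share of past nights that were no longer
    than tonight, scaled to 0-100. Not any real device's formula."""
    if not history_minutes:
        raise ValueError("need at least one prior night")
    at_or_below = sum(1 for m in history_minutes if m <= tonight_minutes)
    return round(100 * at_or_below / len(history_minutes))


tonight = 403  # 6:43 of sleep, expressed in minutes

# Two invented ten-night histories, in minutes of sleep per night.
short_sleeper_history = [350, 360, 375, 380, 390, 395, 400, 410, 420, 430]
long_sleeper_history = [400, 420, 440, 450, 455, 460, 470, 480, 490, 500]

score_a = self_referenced_score(tonight, short_sleeper_history)
score_b = self_referenced_score(tonight, long_sleeper_history)
print(score_a, score_b)
```

The same 6:43 scores 70 against the first history and 10 against the second. Neither number says anything about whether 6:43 was enough sleep.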
4. The difference between sleep quality and next-day function
You can get eight hours of technically adequate sleep cycling and wake up exhausted. You can sleep five hours and feel sharp. The relationship between sleep duration, staging percentages, and next-day cognitive function is not linear for any individual.
Factors that modulate how you function after a given night’s sleep include accumulated sleep debt from prior nights, where in the circadian cycle the sleep occurred, cortisol levels on waking, the emotional valence of the preceding day, body temperature during sleep, and individual recovery physiology. Sleep trackers measure one proxy (movement and heart rate) for one input (staging) to one outcome (how you feel). Several of the other inputs are invisible to them.
“1:08 of REM” could represent deeply restorative REM with full memory consolidation. It could represent fragmented, thin REM that provided little consolidation benefit. The tracker cannot distinguish these states.
5. What’s normal for you specifically
Population sleep studies provide averages: healthy adults spend roughly 20–25% of sleep in REM and 15–20% in slow-wave sleep. These figures come from PSG studies in specific populations under controlled conditions.
Your percentages are yours. Genetics, age, individual neurology, and prior sleep history produce wide variation in normal sleep architecture. Some people naturally run lower on REM and function well. Some run high on light NREM. Comparing your tracker output to a population average — or to the implicit norm the tracker algorithm is optimizing toward — may mean optimizing for the wrong target.
The tracker doesn’t know what’s normal for you. It knows what average looks like and flags deviation. Whether that deviation is a problem requires context the device doesn’t have.
6. Whether you should be worried
This is the limitation that causes the most actual harm.
Dr. Kelly Glazer Baron at Rush University Medical Center coined the term “orthosomnia” in a 2017 paper in the Journal of Clinical Sleep Medicine: clinical insomnia symptoms arising from anxiety about consumer sleep tracking data. There’s a full breakdown of orthosomnia if you recognize this pattern in yourself. The pattern: tracker reports poor sleep quality, user begins monitoring sleep more carefully, heightened monitoring increases pre-sleep arousal, arousal fragments sleep, tracker confirms poor quality, anxiety increases. A feedback loop, caused not by a sleep disorder but by giving a hypervigilant person a detailed nightly report about the thing they’re already anxious about.
There is a specific population susceptibility here. People most likely to purchase and consistently use sleep trackers overlap substantially with people who are already anxious about sleep quality. Hypervigilance about sleep is a recognized behavioral feature of insomnia. A device that produces detailed scoring and staging data, delivered every morning, may be pouring accelerant on what was originally a small fire.
Three Things They’re Actually Good At
Because this would be dishonest without them:
Long-term trend detection. A tracker is bad at telling you whether last night was good. It’s useful for noticing that your sleep quality has been declining for three weeks, or that it’s consistently lower when you travel, or that it correlates with your alcohol intake patterns. Trends across months, not scores on single nights.
Wake-time consistency tracking. This is the one thing actigraphy measures fairly reliably: when you are asleep and when you are awake. Consistent wake time is the single most evidence-backed behavioral lever for sleep quality, and it is a habit a tracker can reinforce more easily than most other practices.
Behavioral correlation. If you’re logging other variables — exercise, alcohol, stress ratings, caffeine timing — and correlating them with your tracker output over weeks, the tracker becomes an input to a more useful feedback loop. The correlation isn’t precise, but the direction of effects tends to be reliable.
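For readers who export their data, the three uses above can be sketched in a few lines. This is a rough illustration under stated assumptions, not a recommended analysis: the function names and all the numbers are invented, wake times are given as minutes after midnight and assumed not to straddle midnight, and a correlation over a handful of nights only hints at direction.

```python
from math import sqrt
from statistics import mean, pstdev


def slope_per_night(values):
    """Least-squares slope of a nightly metric against night index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two nights")
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def wake_time_spread(wake_minutes):
    """Standard deviation of wake times (minutes after midnight)."""
    return pstdev(wake_minutes)


def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series of at least two points")
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Invented data, for illustration only.
sleep_minutes = [420, 415, 410, 405, 400, 395, 390]   # one week of estimated sleep
trend = slope_per_night(sleep_minutes)                # minutes gained or lost per night

consistent_wakes = [420, 425, 415, 420, 420]          # around 7:00 a.m.
erratic_wakes = [390, 480, 420, 510, 360]             # 6:00 to 8:30 a.m.

drinks = [0, 2, 1, 3, 0, 2, 1]                        # logged evening drinks
sleep_after = [430, 400, 415, 380, 425, 405, 410]     # estimated sleep those nights
r = pearson(drinks, sleep_after)

print(f"trend: {trend:+.1f} min/night")
print(f"wake spread: {wake_time_spread(consistent_wakes):.1f} vs "
      f"{wake_time_spread(erratic_wakes):.1f} min")
print(f"drinks vs sleep: r = {r:.2f}")
```

The slope summarizes direction over the whole period rather than judging any single night, which matches how the data is most trustworthy. The sign of `r` is the part worth reading; its exact size inherits all the tracker's estimation error.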
Keep the trend. Question the score.
DontSnooze focuses on wake-time consistency — the single sleep variable trackers measure reliably and that has the clearest behavioral leverage. No sleep score required, and no anxiety about one.
Frequently Asked Questions
Are some trackers more accurate than others? Yes. Devices with more sensors — optical heart rate plus skin temperature plus SpO2, as in newer Oura Ring and Apple Watch generations — perform better on validation studies than accelerometry-only devices. But even the best consumer trackers fall substantially short of PSG accuracy for sleep staging. The gap is partly fundamental to sensor type, not only to algorithm quality.
Should I stop using my tracker? Not necessarily. If you use it for trend detection rather than nightly scoring, it can be a useful tool. If you find yourself anxious about your sleep score, modifying your bedtime behavior in response to numbers, or spending significant mental energy interpreting the data each morning, a tracker-free period is worth experimenting with.
What is polysomnography and why is it the gold standard? Polysomnography (PSG) is a supervised overnight sleep study recording brain electrical activity (EEG), eye movements (EOG), muscle activity (EMG), respiratory effort, blood oxygen, and cardiac rhythm. It’s conducted in a clinical setting with trained technicians. All consumer sleep tracker validation studies compare device output to PSG as the reference standard.
What does “sleep score” actually measure? Proprietary sleep scores vary by device, but typically combine estimated sleep duration, sleep staging estimates, and heart rate variability metrics, weighted by the device manufacturer’s algorithm. They are calibrated to your own historical data, not to population norms or clinical thresholds. There is no published validation showing that a specific score threshold corresponds to a specific level of health risk or functional impairment.