DataStun — Performance grading (passive, every tier)

How we read network health

PacketMan says: Why trust a passive reading? Because the numbers underneath it are the ones four decades of packet work taught us to watch. Retransmission rate, how often a host had to send a packet twice, is the closest thing the network has to an SOS signal, and the kernel already counts it for free. We measure each direction on its own, because TCP is full-duplex: upload and download run independently, and an average across the two hides whichever one is failing. And when we do test on purpose, we sip: about ten packets of packet-pair dispersion find the bottleneck without flooding your link. The agent should never cause the problem it's measuring.

Three forensic habits sit under every verdict on this page — they’re why a passive reading can be trusted as much as a test you ran on purpose.

Retransmission rate is the SOS signal

The fraction of packets a host had to send twice is the single most powerful proxy for network health. We read it straight from the kernel, no probes. Out of ten thousand devices it points you at the thirty or forty that are actually struggling.

TCP is full-duplex: measure each direction

Upload and download succeed or fail on their own. We measure and report every metric per direction, A→B and B→A, because an average across both hides the one direction that’s failing.

Smart sampling, minimal footprint

When we measure actively, packet-pair dispersion — roughly ten packets — reveals the bottleneck without flooding the link. Brute-force flood tests are the anti-pattern, opt-in only. The agent should never cause the problem it’s measuring.
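As a minimal sketch of the idea behind packet-pair dispersion: two packets sent back-to-back leave the narrowest link separated by the time that link needs to serialize one packet, so the arrival gap gives the bottleneck rate. The function name, the 1,200-byte probe size, and the use of the median across pairs are illustrative assumptions, not the agent's actual implementation.

```python
from statistics import median


def bottleneck_mbps(packet_bytes: int, gaps_s: list[float]) -> float:
    """Estimate bottleneck bandwidth (Mbps) from packet-pair arrival gaps.

    rate = packet size / arrival gap of the second packet behind the first.
    """
    usable = [g for g in gaps_s if g > 0]  # drop zero or negative gaps (timer noise)
    if not usable:
        raise ValueError("no usable packet pairs")
    gap = median(usable)  # assumption: the median shrugs off one queued pair
    return packet_bytes * 8 / gap / 1_000_000


# Five pairs (ten packets) of 1,200-byte probes; gaps in seconds (made-up values).
gaps = [0.00049, 0.00048, 0.00120, 0.00047, 0.00048]
estimate = bottleneck_mbps(1200, gaps)
print(f"{estimate:.1f} Mbps")
```

With these made-up gaps the median is 0.48 ms, which works out to roughly 20 Mbps for a cost of about 12 kilobytes on the wire.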

Four verdicts

Two metrics — median internet latency and retransmission percentage. Worst-of-two drives the grade so an app with great latency but high retransmits never gets a flattering average.

Excellent

Median internet RTT < 50 ms
Retransmits < 0.5%

A healthy connection to a nearby server — a typical fiber day.

Good

Median internet RTT < 100 ms
Retransmits < 1%

A typical broadband day — everything works, nobody's complaining.

Fair

Median internet RTT < 200 ms
Retransmits < 3%

A flaky Wi-Fi link or a slow path is starting to cost you. Worth a look.

Poor

Anything worse: median internet RTT of 200 ms or more, or retransmits of 3% or more

Bad Wi-Fi signal, an upstream congestion event, or an app retrying every packet.

Sub-half-second sessions are excluded from grading — TCP hasn't escaped slow-start that early, so the latency reading would be noise rather than signal.
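The thresholds above can be read as a small lookup. The sketch below uses the published limits and the worst-of-two rule; the function and constant names are illustrative, and treating each limit as exclusive follows the "<" in the tables.

```python
GRADES = ["Excellent", "Good", "Fair", "Poor"]
RTT_LIMITS_MS = [50.0, 100.0, 200.0]   # upper bounds for Excellent, Good, Fair
RETX_LIMITS_PCT = [0.5, 1.0, 3.0]      # upper bounds for Excellent, Good, Fair
MIN_SESSION_SECONDS = 0.5              # shorter sessions are not graded


def metric_rank(value: float, limits: list[float]) -> int:
    """Return 0..3: the index of the first limit the value stays under."""
    for rank, limit in enumerate(limits):
        if value < limit:
            return rank
    return len(limits)  # worse than every limit: Poor


def grade(median_rtt_ms: float, retransmit_pct: float) -> str:
    """Worst-of-two: the weaker metric decides the grade."""
    rtt_rank = metric_rank(median_rtt_ms, RTT_LIMITS_MS)
    retx_rank = metric_rank(retransmit_pct, RETX_LIMITS_PCT)
    return GRADES[max(rtt_rank, retx_rank)]


def is_gradable(duration_s: float) -> bool:
    """Sub-half-second sessions are excluded from grading."""
    return duration_s >= MIN_SESSION_SECONDS


print(grade(30, 5.0))   # great latency, heavy retransmits
print(grade(44, 0.2))
```

Because the two ranks are combined with max rather than averaged, a session with 30 ms latency and 5% retransmits lands in Poor.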

Every session you run: measured, and pinned to the program that opened it

PacketMan says: A network number with no owner is half an answer. Knowing the link is slow doesn't tell you if it's the line, the Wi-Fi, or one bad app. We read every qualifying TCP session from the kernel (throughput, round-trip time, retransmits) and tie it to the exact program that opened it. So the readout isn't "the network is slow," it's "Teams is hurting while OneDrive on the same machine is clean." Symptom, meet diagnosis.

We don’t sample a few connections and guess at the rest. Every qualifying TCP session your machine opens gets read — throughput each direction, round-trip latency, and retransmits — and tied to the exact program that opened it. That per-session reading is the raw material; everything else on this page is that material rolled up.

Program (this device, last hour)	Throughput · down / up	Latency (RTT)	Retransmits	Grade
OneDrive	41 / 8 Mbps	44 ms	0.2%	Excellent
Chrome	18 / 1 Mbps	39 ms	0.4%	Excellent
Zoom	1.8 / 0.9 Mbps	96 ms	0.8%	Good
Microsoft Teams	2.1 / 0.4 Mbps	182 ms	3.1%	Poor

Illustrative readout. Throughput, latency, and retransmits are read per session and attributed to the owning process — so “Teams is the one struggling” is a fact on the screen, not a guess.

Micro to macro — the same reading, at four scales

One reading rolls up four ways, so the right answer is there whether you own one laptop or run ten thousand of them.

Micro · the session

One conversation

A single TCP session: throughput each direction, round-trip latency, retransmits — the rawest signal, owned by one process.

↑ the program

One app

Every session that executable ran today, rolled into one grade. Answers the eternal “is it the app or is it the network?”

↑ the machine

One device

Every program on that device, one verdict and a 24-hour timeline. This is the view the agent’s owner opens straight from the tray or dock.

Macro · the fleet

Every device

The same verdict across the whole tenant, ranked. The operator’s view through the dashboard menus — is it one desk, or the whole floor?

Two audiences, one measurement: the person who owns a single agent watches their own machine from the tray; the operator watches the fleet from the menus. Nobody has to ask the other for a number.

Three views, one click apart

Those scales show up as three surfaces in the product, each a click from your dashboard — the same data, sliced for the question you're asking.

📱

Per-device verdict

PacketMan says: Lives on every agent detail page under the Performance tab. The grade pill is paired with a one-sentence headline ("This device is performing well") and the evidence ("97% sessions clean · median 38 ms · 0.2% retx"). The last-hiccup line shows the most recent fair-or-worse session (when, which app, and the numbers) so you can chase the actual cause.

One pill, one headline, one evidence line. "This device is performing well — 97% of sessions ran clean, median internet latency 38 ms, 0.2% retransmits." If anything's hurting it, the last-hiccup line names the app and when.

24h timeline — five-minute buckets, color = worst grade in that window. Dim grey = device idle.
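A minimal sketch of the bucketing described above: 288 five-minute slots per day, each holding the worst grade seen in that window, with an empty slot standing for an idle device. Which timestamp of a session decides its bucket is not stated, so the sketch assumes one timestamp per session; all names are illustrative.

```python
from datetime import datetime, timedelta

SEVERITY = {"Excellent": 0, "Good": 1, "Fair": 2, "Poor": 3}
BUCKET = timedelta(minutes=5)
BUCKETS_PER_DAY = 24 * 60 // 5  # 288


def build_timeline(start: datetime, sessions):
    """sessions: iterable of (timestamp, grade). Returns 288 slots; None = idle."""
    timeline = [None] * BUCKETS_PER_DAY
    for ts, session_grade in sessions:
        idx = (ts - start) // BUCKET  # timedelta // timedelta gives an int
        if not 0 <= idx < BUCKETS_PER_DAY:
            continue  # outside the 24-hour window
        current = timeline[idx]
        if current is None or SEVERITY[session_grade] > SEVERITY[current]:
            timeline[idx] = session_grade  # keep the worst grade in the window
    return timeline


day = datetime(2024, 1, 1)
timeline = build_timeline(day, [
    (day + timedelta(minutes=1), "Good"),
    (day + timedelta(minutes=4), "Fair"),
    (day + timedelta(minutes=7), "Excellent"),
])
print(timeline[:3])
```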

📋

Per-app health

PacketMan says: Same grading rules applied to each app independently. Apps with fewer than 3 qualifying sessions are excluded because that is too few readings to grade fairly. Worst-graded apps surface first, so the thing hurting your experience jumps to the top of the table without you having to sort.

Every app you used today, graded the same way. Slack: Excellent · Edge: Good · OneDrive: Fair (180 ms · 3% retx). Worst-graded apps surface first, so the thing actually hurting you sits at the top.
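A rough sketch of the per-app roll-up, assuming each app's sessions are combined by taking the median RTT and the overall retransmit percentage (retransmitted packets over packets sent). The page does not say how sessions are combined, so that aggregation, the tuple layout, and the names are assumptions. The three-session minimum, the thresholds, and the worst-first ordering come from the text.

```python
from collections import defaultdict
from statistics import median

ORDER = ["Poor", "Fair", "Good", "Excellent"]  # worst first
MIN_SESSIONS = 3


def grade(rtt_ms: float, retx_pct: float) -> str:
    """Published thresholds; a grade needs both metrics under its limits."""
    for name, rtt_max, retx_max in (("Excellent", 50, 0.5), ("Good", 100, 1), ("Fair", 200, 3)):
        if rtt_ms < rtt_max and retx_pct < retx_max:
            return name
    return "Poor"


def per_app_health(sessions):
    """sessions: iterable of (app, rtt_ms, packets_sent, packets_retransmitted)."""
    by_app = defaultdict(list)
    for app, rtt, sent, retx in sessions:
        by_app[app].append((rtt, sent, retx))
    rows = []
    for app, items in by_app.items():
        if len(items) < MIN_SESSIONS:
            continue  # too few readings to grade fairly
        rtt = median(r for r, _, _ in items)
        sent = sum(s for _, s, _ in items)
        retx_pct = 100.0 * sum(x for _, _, x in items) / sent if sent else 0.0
        rows.append((app, grade(rtt, retx_pct), rtt, retx_pct))
    rows.sort(key=lambda row: ORDER.index(row[1]))  # worst-graded apps surface first
    return rows


rows = per_app_health([
    ("Slack", 30, 1000, 1), ("Slack", 35, 1000, 2), ("Slack", 40, 1000, 1),
    ("OneDrive", 170, 1000, 20), ("OneDrive", 180, 1000, 25), ("OneDrive", 190, 1000, 30),
    ("Edge", 60, 1000, 5), ("Edge", 70, 1000, 5),  # only two sessions: excluded
])
for row in rows:
    print(row)
```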

📊

Fleet health page

PacketMan says: Tenant-wide view at /agents/health. The top of every dashboard shows a single-line band, "Fleet is healthy — ● 8 Excellent · ● 3 Good · ● 1 Fair", with a link through to the full ranked tables. The page itself shows Fastest 10 + Slowest 10 with grade pill, median RTT, retx %, and session count, and the grade distribution as a horizontal bar.

The whole fleet, ranked. Fastest 10 on the left, Slowest 10 on the right, grade distribution across the top, median fleet latency right next to it. One click from your dashboard band tells you exactly which devices need attention.
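The fleet roll-up can be sketched as a sort plus a tally. Ranking by median RTT is an assumption here (the page lists the columns shown, not the sort key), and the dictionary keys are illustrative.

```python
from collections import Counter


def fleet_summary(devices, n=10):
    """devices: list of dicts with 'name', 'grade', and 'median_rtt_ms'."""
    ranked = sorted(devices, key=lambda d: d["median_rtt_ms"])
    return {
        "fastest": ranked[:n],
        "slowest": ranked[::-1][:n],
        "distribution": Counter(d["grade"] for d in devices),
    }


def band(distribution) -> str:
    """One-line grade distribution, skipping grades with no devices."""
    grades = ("Excellent", "Good", "Fair", "Poor")
    return " · ".join(f"{distribution[g]} {g}" for g in grades if distribution[g])


devices = [
    {"name": f"d{i}", "grade": g, "median_rtt_ms": 20 + i}
    for i, g in enumerate(["Excellent"] * 8 + ["Good"] * 3 + ["Fair"])
]
summary = fleet_summary(devices)
print(band(summary["distribution"]))
```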

Fleet mesh health: every pair, every direction

PacketMan says: Passive grading watches each session your device runs, but it can't tell you which agent-to-agent path inside your network is hurting. The mesh does. Paired agents run a controlled small-burst test between themselves on a schedule, in both directions independently, and the fleet dashboard collapses N-squared pairs into one heat-map. A path with 5 milliseconds in one direction and 80 in the other isn't a paradox; it's a clue. The chips up top tell you how many cells are within each link's own historical best, how many slipped a little, how many slipped a lot. One look, the whole fleet's inter-agent posture.

Every agent pairs with every other agent in a controlled small-burst test — both directions independently, on a schedule. The fleet dashboard collapses N² pairs into one heat-map so the worst legs jump out instantly. Below is the live readout from our own mesh.

Mesh Health — latency (ms), directional, baseline-relative

10 within baseline · 18 outside · 2 marginal · 0 down (row → column, source → destination)

	G16	acer	gus mac lap1	linda	si	stun
G16 →	·	5 ms	80 ms	63 ms	2 ms	46 ms
acer →	6 ms	·	86 ms	81 ms	6 ms	49 ms
gus mac lap1 →	57 ms	65 ms	·	101 ms	63 ms	104 ms
linda →	61 ms	64 ms	99 ms	·	61 ms	101 ms
si →	1 ms	5 ms	186 ms	119 ms	·	45 ms
stun →	46 ms	51 ms	147 ms	109 ms	46 ms	·

Mesh Health — throughput (Mbps), directional, baseline-relative

2 within baseline · 6 outside · 22 marginal · 0 down (row → column; baseline = each link’s own best Mbps)

	G16	acer	gus mac lap1	linda	si	stun
G16 →	·	19 Mb	12 Mb	2.0 Mb	19 Mb	17 Mb
acer →	21 Mb	·	18 Mb	1.2 Mb	27 Mb	6.1 Mb
gus mac lap1 →	1.9 Mb	2.2 Mb	·	4.6 Mb	2.2 Mb	189 Mb
linda →	1.9 Mb	2.0 Mb	3.5 Mb	·	2.4 Mb	1.3 Mb
si →	51 Mb	54 Mb	3.3 Mb	3.0 Mb	·	92 Mb
stun →	2.8 Mb	2.2 Mb	34 Mb	3.7 Mb	3.8 Mb	·

Live snapshot from our own mesh — six agents across LAN, home broadband, mobile, and server. Each cell is one direction of one pair; the colour compares the current reading to that link’s own historical best, so a fast link slipping is what jumps out, not a slow link staying slow. Cells with a midpoint dot are self-cells (an agent doesn’t test against itself). With N agents you get N² − N independent direction-specific readings: six agents produce 30 cells, ten agents produce 90, fifty agents produce 2,450 — one click of the menu surfaces them all.
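The cell counts follow from counting ordered pairs: every agent tests toward every other agent, and A→B is a different cell from B→A. A short check of the arithmetic, with an illustrative function name:

```python
from itertools import permutations


def directed_pairs(agents):
    """Every (source, destination) pair with source != destination."""
    return list(permutations(agents, 2))


for n in (6, 10, 50):
    cells = len(directed_pairs(range(n)))
    assert cells == n * n - n  # N squared minus the N self-cells
    print(n, "agents ->", cells, "cells")
```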

What’s between every pair of agents

PacketMan says: Every cell in that heat-map is the result of a real connection between two agents, and getting that connection is harder than it sounds. Most pairs sit behind home routers or carrier NAT, so neither side can be called directly. Our coturn server gives each agent two things at startup: a STUN reading of what address-and-port the public internet sees it on, and a TURN relay slot in case nothing else works. Each agent advertises both to the others through the tenant platform, and then for every peer it tries the cheapest path first: same-LAN if they share a subnet, otherwise a coordinated hole-punch that opens a direct path through both NATs, and only if that fails does it fall back to relaying through coturn. The green path is what we want every time: coturn just told us each side's reflected port; the packets themselves go agent-to-agent. The lavender path is the always-works last resort.

The mesh measurement is honest only if the path it measures is the path your data would actually take. So agents try the cheapest, most direct path first — same-LAN if available, then a coordinated NAT hole-punch, then mutual relay through coturn as the last resort. The diagram below shows the ladder.

same-LAN · used when both agents share a subnet
NAT ⇄ NAT hole-punch · coturn used for discovery, not in the data path
mutual-relayed (TURN) · data through coturn, always-works last resort

Probes per 2-minute cycle (agent ↔ agent)

DTM1 echo → RTT (EWMA, α = 1/6) so a single bad sample doesn’t spike the chart
DTM3/4 packet-pair (×5) → bottleneck bandwidth in Mbps without flooding the link
DTM5 punch-hint → coordinated NAT crack when the path doesn’t already exist
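To illustrate why an EWMA with α = 1/6 keeps one bad echo from spiking the chart, here is a minimal sketch. Seeding the average with the first sample is an assumption; the sample values are made up.

```python
ALPHA = 1 / 6


def ewma(samples_ms, alpha=ALPHA):
    """Exponentially weighted moving average; returns the smoothed series."""
    smoothed = None
    out = []
    for sample in samples_ms:
        if smoothed is None:
            smoothed = float(sample)  # assumption: first sample seeds the average
        else:
            smoothed += alpha * (sample - smoothed)  # move 1/6 of the way to the sample
        out.append(smoothed)
    return out


series = ewma([60, 60, 60, 300, 60, 60])
print([round(v, 1) for v in series])
```

The single 300 ms outlier lifts the smoothed value from 60 ms to about 100 ms rather than to 300 ms, and the following samples pull it back down.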

When direct doesn’t exist yet

Both agents advertise their reflected IP:port and TURN relay through the tenant platform
Coordinated burst: each side fires 5 attempts × 64 port deltas above the reflected port
First reply locks the path; coturn never sees the data again on that pair
TURN relay is the always-works backstop — correct, but the measurement is then of the relay path, not the direct one
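One possible reading of the coordinated burst, sketched under stated assumptions: each of the 5 attempts sweeps ports reflected+1 through reflected+64, and ports past 65535 are skipped. Whether the reflected port itself is included, the ordering, and the timing between attempts are not given in the text, so treat this as an illustration only. The address is from a documentation range.

```python
ATTEMPTS = 5
PORT_DELTAS = 64
MAX_PORT = 65535


def punch_targets(reflected_port: int) -> list[int]:
    """Candidate ports above the peer's reflected port (assumed deltas 1..64)."""
    ports = [reflected_port + delta for delta in range(1, PORT_DELTAS + 1)]
    return [p for p in ports if p <= MAX_PORT]


def punch_schedule(peer_ip: str, reflected_port: int):
    """Yield (attempt, ip, port) for every datagram one side would send."""
    for attempt in range(1, ATTEMPTS + 1):
        for port in punch_targets(reflected_port):
            yield attempt, peer_ip, port


schedule = list(punch_schedule("203.0.113.7", 40000))
print(len(schedule), "datagrams, first:", schedule[0])
```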

Every agent maintains a persistent TURN allocation on stun.datastun.com. That allocation gives the agent its reflected IP:port (STUN side) and a relay address (TURN side); both are advertised to every peer, so each peer can try LAN, punch, and relay in sequence and pick the one that works. The green path is the goal: coturn was contacted only to learn the reflected port, and the test packets go agent-to-agent. The lavender path is correct when nothing else works, but the measurement is then of the relayed leg.
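The ladder can be sketched as a simple ordered fallback. The /24 prefix used to decide "same subnet" and the two callables standing in for the real connection attempts are hypothetical; only the order (LAN, then hole-punch, then relay) comes from the text.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, prefix_len: int = 24) -> bool:
    """True if both addresses fall in the same network (assumed /24 by default)."""
    network = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in network


def choose_path(local_ip, peer_ip, try_direct, try_punch) -> str:
    """Try the cheapest path first; relay is the always-works last resort.

    try_direct and try_punch are hypothetical callables returning True on success.
    """
    if same_subnet(local_ip, peer_ip) and try_direct():
        return "lan"
    if try_punch():
        return "punch"
    return "relay"


print(choose_path("192.168.1.10", "192.168.1.20", lambda: True, lambda: False))
print(choose_path("192.168.1.10", "10.0.0.5", lambda: True, lambda: False))
```

Recording which rung was used alongside each measurement matters, since a relayed reading describes the relay leg rather than the direct path.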

Drill into one pair: the full path, both ends, every hop

PacketMan says: The heat-map is the overview; this is what you see when you click a cell. It's the actual traceroute between two of your agents, IPv6 in this case because both ends had public v6, and v6-direct is preferred when it's available because it avoids the NAT-punch entirely. Each circle is one TTL slot, one router along the way. The solid cyan circles responded with ICMP Time-Exceeded so we have their IP and RTT; the dashed circles didn't. Carriers often suppress ICMP on a subset of hops, especially in the core, so silent hops are normal and not an alarm. The hop's IP and RTT are staggered above and below the path so long IPv6 addresses don't overlap each other; a thin guide line ties each label to its hop. Hover any hop in the real product for rDNS, ASN, and geo; the NAT translations on each end show whose carrier you're crossing (AT&T to TWC in this case), which is why you see two ASNs on the trip.

Every cell in the heat-map above is a clickable drill-down. The per-pair view shows the actual route between two agents — both NAT translations, every IPv6 hop that answered, and the ones that didn’t. Below is the live readout for one of our own pairs: acer in Cedar Park to linda in Humble, 14 hops, half of them silent (which is normal — carriers suppress ICMP in the core).

responded (ICMP Time-Exceeded)
silent (carrier suppressed ICMP at this hop)
route line (direction of probe)

Each circle is one TTL slot. Solid cyan = the router at that hop responded with ICMP Time-Exceeded, so we have its IP and RTT. Dashed = no response within 900 ms (carriers commonly suppress ICMP on a subset of hops, especially in the core, so silent hops are normal and not an alarm). Each responding hop’s IP and RTT are staggered above and below the path so long IPv6 addresses don’t overlap; a thin guide line ties each label to its hop. The +N next to each RTT is the delta from the previous responding hop — a small positive number is healthy ladder behaviour; a sudden jump points at the leg where the latency is being added. In the real product, hover any hop for rDNS / ASN / geo; if the pair used a TURN relay (lavender on the connection ladder above), additional cyan curves appear at each end showing the relay leg.
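The +N delta can be sketched as a difference between consecutive responding hops, skipping silent ones. The hop list and names below are made up for illustration.

```python
def hop_deltas(hops):
    """hops: list of (ttl, rtt_ms or None for a silent hop).

    Returns (ttl, rtt_ms, delta_ms) for responding hops; the first has delta None.
    """
    rows = []
    previous_rtt = None
    for ttl, rtt in hops:
        if rtt is None:
            continue  # silent hop: no reply within the timeout
        delta = None if previous_rtt is None else rtt - previous_rtt
        rows.append((ttl, rtt, delta))
        previous_rtt = rtt
    return rows


def biggest_jump(hops):
    """The responding hop that added the most latency over the previous responder."""
    rows = [row for row in hop_deltas(hops) if row[2] is not None]
    return max(rows, key=lambda row: row[2]) if rows else None


trace = [(1, 2.0), (2, None), (3, 9.0), (4, None), (5, 41.0), (6, 43.0)]
print(hop_deltas(trace))
print(biggest_jump(trace))
```

In this made-up trace the jump lands on hop 5, but with hop 4 silent the added latency could sit on either leg between hops 3 and 5.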

Trickle, not flood, and between your devices, not to a CDN

PacketMan says: The popular internet speed tests (Ookla, fast.com, speedtest.net) work by hurling hundreds of megabytes of traffic at a single CDN node a thousand miles away and timing how long it takes. That's useful exactly once, when you want to know "how fast is my link, at this moment, in this one direction, to this one CDN." It's useless for everything else: it taxes the link you're trying to measure, it can't tell you what's between your laptop and your file server, and if you're on a metered cell connection it costs you real money. We do it differently. Small-burst smart sampling, roughly ten packets with inter-packet-gap analysis, gives us throughput and latency for kilobytes of cost, not megabytes. And we run those bursts between your agents, on a schedule, in both directions independently. That's how you get a heat-map of every leg inside your fleet without anyone noticing the test ran.

The popular internet speed tests hurl hundreds of megabytes at a single CDN node to clock your link in one direction. Useful once. Useless if you want to know what’s happening between your laptop and your file server — and expensive on a metered, mobile, or congested link. We do it differently: small-burst smart sampling between your agents, both directions, on a schedule.

The brute-force speed test

Ookla / fast.com / speedtest.net

Hurl as much traffic as the link will carry at a single CDN node and clock the throughput.

~600 MB+ of test traffic per run
Taxes the link you’re trying to measure — expensive on metered, mobile, or congested links
Measures you → a single CDN, in one direction at a time
Can’t tell you what’s between your laptop and your file server, your branch and your data centre, or two of your remote workers
You run it when you remember; the rest of the day is unmeasured

Smart-sampled mesh

DataStun agent-to-agent

Coordinated small bursts between your own agents, both directions, on a schedule.

~10 packets per probe, with inter-packet-gap analysis to estimate the bottleneck rate
Kilobytes per test, not megabytes — cell-backup and metered links are safe to enrol
Measures your stuff ↔ your stuff, both directions independently, every pair (the matrix above)
Reveals asymmetric paths a one-way CDN blast can’t see (5 ms one way, 80 ms the other is a clue, not a paradox)
Runs continuously on a 2-minute cycle; the dashboard always has fresh numbers

Why this matters in practice

No surprise bills. A 600 MB speed test on a metered cellular backup is real money; ten kilobytes is not. You can enrol every agent — including the road warriors on hotel Wi-Fi and the branch office on LTE failover — without flinching.
The path that actually matters. The path from your couch to a CDN node a thousand miles away is not the path that hurts when video conferencing stutters between two of your offices. We measure the path your traffic actually uses.
Continuous, not anecdotal. "I ran a speed test and it was fine" is one data point. We run thousands per day across the fleet, so when something slipped at 2 a.m. you have the receipt.
Asymmetry is a feature, not noise. An IP path is not symmetric: A→B and B→A can traverse different routers, hit different queues, and degrade independently. We measure each direction on its own so the asymmetric ones don’t hide inside a bidirectional average.

Need the deterministic-flood number anyway — for an ISP SLA dispute, a before-and-after change validation, a network-acceptance test? The Speed Test add-on runs that on demand, still agent-to-agent (or agent to a DT-operated heavy responder), still under your control. See the Speed Test add-on →

Passive vs active — we do both

Passive grading runs on every tier. The Speed Test add-on layers controlled measurement bursts on top when you want a deterministic number.

	Performance grading (this page)	Speed Test add-on →
How it measures	Reads kernel-native TCP stats from sessions your apps were already running	Active measurement bursts between paired agents (N² mesh)
Bandwidth cost	Zero — observes existing traffic only	Configurable; small-burst smart sampling by default, not Ookla-style flood
What you get	Verdict per device + per app + 24h timeline + fleet rank	Latency, throughput, jitter, packet loss, traceroute — per direction
When to use it	Always — it's the daily-driver "is anything wrong?" surface	When you need a deterministic before/after, or pair-specific path data
Tier	Every tier — included free	Per-agent add-on, Business and above
Privacy	Metadata only (latency, byte counts, retransmit counts) — never packet contents	Same; the burst is to a peer agent we control, not a third-party endpoint

The two are designed to work together. Passive grading runs continuously and tells you something is off; active testing answers how off with controlled, repeatable measurements.

Why passive is the right default

Synthetic probes are easy to fool, expensive to run at scale, and tell you about a moment in time. Real-traffic readings tell you what your users are actually experiencing.

⚡

Zero impact on your link

The agent reads numbers the kernel already tracks for its own TCP control loop. We don't generate traffic, we don't open extra sockets, we don't tax your bandwidth even on metered or congested links. The data was free; we just expose it.

👥

Reflects what users actually see

A speed test to a CDN node twenty miles away tells you the best-case capacity of your line. It doesn't tell you why Microsoft Teams is stuttering or why OneDrive uploads are taking forever. Reading the kernel stats from the actual Teams session does.

📶

Continuous, not "the moment you ran it"

A user runs a speed test, gets a fine number, and ten minutes later their call drops. Passive grading is always running, so the 24-hour timeline shows you exactly when things were bad — even if no one was at the keyboard to notice.

💾

Per-app, automatically

Active probes measure the network. Passive grading measures the network per application — because every TCP session is owned by a specific process, and we know which one. "Slack is fine, OneDrive is fair" is the answer; you didn't have to think to ask the question.

🚦

Fair-by-construction grading

Worst-of-two-metrics means an app with 30 ms latency but 5% retransmits grades Poor — the problem isn't hidden by an average. Sub-half-second sessions are excluded so quick connection probes don't drown out the real conversation. The thresholds are explicit, published, and calibratable per tenant later.

🔍

Fleet view comes free

One agent's verdict is useful. The same verdict applied across every device on your tenant lets you rank the fleet, surface the slowest 10, and answer "is it just my machine or is the whole office hurting today?" in one glance.

Privacy — where the numbers come from

We read what the kernel already tracks. We don't inspect packets, we don't decrypt anything, we don't proxy anything.

What the agent reads

Smoothed round-trip time per TCP session (Linux TCP_INFO.tcpi_rtt, Windows ESTATS hooks)
Retransmit count per session
Bytes sent and received
The OS process that owns each socket

All four are numbers the kernel tracks for its own control loop. We expose them; we don't compute them.

What the agent does not read

Packet contents — ever
HTTP request bodies, response bodies, URLs, headers
TLS-encrypted application data
What's in your messages, documents, or files

There is no decryption code path on the agent. There is no proxy. The data we see is the same data netstat sees — just continuous, attributed, and graded.

Where you’re good, where you’re not — and what to do about the “not”

Grading is the always-on signal: it tells you which device, which app, which hour went bad. When you need the cause — not just the location — the same agents escalate to active diagnostics, on demand. No new software, no truck roll.

1 · The grade flags it

Passive grading runs continuously and surfaces the device or app that slipped to Fair or Poor — with the hour, the throughput, the latency, and the retransmit count already attached. You start knowing where.

2 · The mesh measures it

Run an agent-to-agent test along the suspect path, both directions, to turn “something’s off” into a deterministic number — throughput, jitter, loss, and the routers on the way out and the way back.

See mesh & packet diagnostics →

3 · The packets prove it

When a number still isn’t enough, Advanced Packet Diagnostics captures the session on both ends, decodes it centrally, and hands PacketMan the trace for a plain-English diagnosis — with no capture tools installed on the machine.

How packet diagnostics works →

For paths that look fine on a traceroute but lose packets on every hop, Hop Starvation probes for the silent TTL-expiry drops that betray an injection appliance. The grade points; the diagnostics close.
