How the agent mesh works

Packetman says: This page explains DataStun's measurement mesh, the part that makes us different from every single-point speed test. Instead of testing your connection to one far-away server, your own agents test each other, directly, peer to peer. The mesh organizes itself and heals itself. The clever bit is reachability: to test two machines that both sit behind home routers, you have to punch through their NATs the same way a video call does, by discovering each side's public address and then sending packets straight across. We try the cheapest path first: same LAN, then IPv6 direct, then a NAT hole-punch, and only as a last resort a relay. We race several candidate paths at once and keep the working one warm so an on-demand test never has to start cold. And your traffic never flows through our servers when a direct path exists; that's the sovereignty point.

A single speed test measures your link to one server, somewhere. The DataStun mesh turns your own fleet into the test rig: every agent measures every other agent, directly, peer to peer — the way a video call connects, not the way a download works.

1. Why peer-to-peer beats single-point

If your video calls stutter between the branch office and HQ, a speed test to a public server tells you almost nothing — it measures a path you don’t care about. The mesh measures the paths you do care about: branch ↔ HQ, laptop ↔ data-center, every pair, continuously. When something degrades, the grid shows you exactly which leg.

And because agents talk directly to each other, your measurement traffic — and the metadata about your network it reveals — never has to traverse anyone else’s infrastructure. That’s the sovereignty payoff: the test is yours, end to end.

2. The connect ladder — cheapest path first

Two machines behind home or office routers can’t just send each other packets — their NATs drop unsolicited traffic. The mesh solves this exactly like a VoIP/ICE call does: a coordination server (STUN) helps each side learn its own public address, then the agents punch a path straight across. The relay is the last resort, not the default, because routing through a relay changes the very path we’re trying to measure.

STUN coordinates; it never carries your data. A direct path wins whenever one exists — the relay only catches the pairs that truly can’t punch through.
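
To see why both sides have to send, it can help to model a NAT as nothing more than a set of pinholes. The sketch below is a toy, not DataStun's code: `ToyNat` and its methods are hypothetical names, and it assumes the simplest filtering rule (inbound traffic is admitted only from a remote this side has already sent to).

```python
class ToyNat:
    """Toy model of a NAT that drops unsolicited inbound traffic.

    Hypothetical illustration only: a real NAT tracks address and port
    mappings with timeouts, not just peer names.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.pinholes: set[str] = set()  # remotes we have sent to

    def send(self, remote: "ToyNat") -> bool:
        """Send one packet to `remote`; return True if it was delivered."""
        self.pinholes.add(remote.name)   # outbound traffic opens a pinhole
        return remote.receive(self.name)

    def receive(self, sender_name: str) -> bool:
        """Admit the packet only if we already sent to this sender."""
        return sender_name in self.pinholes
```

In this model the first packet from A to B is dropped at B, but it opens A's pinhole. When B then sends to A, A's NAT treats it as solicited, and from that point traffic flows in both directions without any server in the data path.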

Same-LAN

Both agents share a local network (same public egress IP), so they talk directly over private addresses and nothing leaves the building.

IPv6 direct

If both have public IPv6 addresses, they connect with no NAT at all — the cleanest cross-internet path there is, and increasingly common.

IPv4 NAT hole-punch

Each learns its public address from STUN, then both send simultaneously so each NAT sees the other’s reply as “solicited” and lets it through. The data path is direct.

Relay — last resort

Some NATs (symmetric / carrier-grade) defeat punching. Only then do we relay through TURN, and the result is clearly labelled as relayed so it’s never mistaken for a direct measurement.

Family-matched, never cross-dialled. We pair IPv6 with IPv6 and IPv4 with IPv4, preferring IPv6 when both sides have it (it needs no punch). We never try to dial an IPv6 address from an IPv4-only side — a classic source of silent failures in naive implementations.
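
The ladder and the family-matching rule can be sketched as a small selection function. This is a minimal illustration under stated assumptions: the `Agent` fields, the `Rung` names, and the `punchable_nat` flag are all hypothetical, and the same-LAN check uses only the "same public egress IP" heuristic described above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Rung(IntEnum):
    SAME_LAN = 1
    IPV6_DIRECT = 2
    IPV4_PUNCH = 3
    RELAY = 4


@dataclass(frozen=True)
class Agent:
    # All field names are hypothetical, chosen for this sketch.
    public_v4: Optional[str]     # public egress IPv4 as learned via STUN
    public_v6: Optional[str]     # public IPv6 address, if any
    punchable_nat: bool = True   # False for symmetric / carrier-grade NAT


def ladder(a: Agent, b: Agent) -> list[Rung]:
    """Return the rungs worth trying for this pair, cheapest first."""
    rungs: list[Rung] = []
    if a.public_v4 is not None and a.public_v4 == b.public_v4:
        rungs.append(Rung.SAME_LAN)
    # Family-matched: IPv6 is offered only when BOTH sides have it.
    if a.public_v6 and b.public_v6:
        rungs.append(Rung.IPV6_DIRECT)
    if a.public_v4 and b.public_v4 and a.punchable_nat and b.punchable_nat:
        rungs.append(Rung.IPV4_PUNCH)
    rungs.append(Rung.RELAY)     # always available, always last
    return rungs
```

Because the function only ever pairs like with like, an IPv4-only agent never receives an IPv6 candidate to dial, which is the silent-failure case the rule is meant to rule out.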

3. Multi-candidate racing

An agent usually has several plausible addresses for a peer — a LAN address, an IPv6 address, a NAT-mapped address, a reflected address. Rather than try them one at a time and wait for each to time out, we race them, the way modern browsers race IPv4 and IPv6 (“Happy Eyeballs”). The most-preferred candidate launches first; the rest follow a moment later, staggered. The first to answer wins; the losers are cancelled.

The payoff is speed and resilience: a dead-but-listed address can’t stall the whole test, because a working sibling answers while the dead one is still timing out. An on-demand test feels instant instead of waiting on a worst-case punch timeout.

The first working path wins in a fraction of a worst-case timeout — dead candidates can’t hold the test hostage.
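
A staggered race of this kind can be sketched with `asyncio`. This is a simplified illustration, not the mesh's actual scheduler: `race`, `probe`, `fake_probe`, and the `stagger` and `timeout` values are hypothetical, and real Happy Eyeballs implementations also start the next attempt early when the previous one fails, which this sketch does not.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence


async def race(
    candidates: Sequence[str],
    probe: Callable[[str], Awaitable[bool]],
    stagger: float = 0.05,
    timeout: float = 2.0,
) -> Optional[str]:
    """Return the first candidate whose probe succeeds, or None."""

    async def attempt(index: int, addr: str) -> Optional[str]:
        # Candidates are listed most-preferred first, so index 0 starts at t=0.
        await asyncio.sleep(index * stagger)
        try:
            ok = await asyncio.wait_for(probe(addr), timeout)
        except (asyncio.TimeoutError, OSError):
            return None
        return addr if ok else None

    tasks = [asyncio.create_task(attempt(i, c)) for i, c in enumerate(candidates)]
    try:
        for finished in asyncio.as_completed(tasks):
            winner = await finished
            if winner is not None:
                return winner
        return None
    finally:
        for task in tasks:
            task.cancel()  # cancel the losers; a no-op for finished tasks
        await asyncio.gather(*tasks, return_exceptions=True)


async def fake_probe(addr: str) -> bool:
    """Stand-in probe: the address "dead" hangs, everything else answers."""
    await asyncio.sleep(10 if addr == "dead" else 0.01)
    return addr != "dead"
```

Calling `asyncio.run(race(["dead", "live-a", "live-b"], fake_probe))` returns as soon as `live-a` answers, even though the most-preferred candidate is still hanging.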

4. One always-warm port

The continuous mesh probes keep a single fixed UDP port open and a NAT pinhole warm at all times. Every measurement — latency, throughput, an on-demand diagnostic — rides that one already-open path, distinguished by a run identifier inside the packet rather than by opening a fresh port.

This matters more than it sounds. Opening a fresh port for a test means punching a fresh NAT hole, and on two residential connections that punch often silently fails — you’d see 100% loss on a path that actually works fine. By reusing the warm port, an on-demand test inherits a path the mesh already proved reachable two minutes ago. It also means many tests can run at once without colliding on a port.
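
To illustrate how a run identifier lets many tests share one socket, here is a minimal sketch. The wire format is an assumption made for this example (a 4-byte run id and a 4-byte sequence number, big-endian, ahead of the payload), and `encode`, `decode`, and `Demux` are hypothetical names.

```python
import struct
from collections import defaultdict

# Hypothetical header for this sketch: run id, then sequence number,
# both unsigned 32-bit in network byte order.
HEADER = struct.Struct("!II")


def encode(run_id: int, seq: int, payload: bytes = b"") -> bytes:
    """Prefix the payload with the run id and sequence number."""
    return HEADER.pack(run_id, seq) + payload


def decode(packet: bytes) -> tuple[int, int, bytes]:
    """Split a packet back into (run_id, seq, payload)."""
    run_id, seq = HEADER.unpack_from(packet)
    return run_id, seq, packet[HEADER.size:]


class Demux:
    """Sorts packets arriving on the one shared socket into per-run buckets."""

    def __init__(self) -> None:
        self.runs: dict[int, list[int]] = defaultdict(list)

    def feed(self, packet: bytes) -> None:
        run_id, seq, _payload = decode(packet)
        self.runs[run_id].append(seq)
```

A latency run and a throughput run can interleave packets on the same port, and the receiver still attributes each sequence number to the right test without a second pinhole.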

Hard-won lesson. We learned this the painful way: a brand-new test port between two home connections read 100% loss, while the same pair’s mesh probe showed a healthy 50 ms round-trip and multi-megabit throughput. The fix — share the warm mesh socket — is now core to the design.

5. Two planes, one substrate

The mesh runs two kinds of measurement on the same secure, warm foundation.

Monitoring keeps the path warm and the baseline current; diagnostics borrow that warmth to go deep on demand.

Monitoring is small, standardized, and identical every cycle — which is exactly what makes results comparable across time and across pairs. It’s the heatmap that’s always there.
Diagnostics is flexible: choose the protocol lens, the ports, the payload size, turn on packet capture. It runs when an operator asks — or automatically when the monitoring plane sees something worth a closer look.

6. Self-healing reachability

Reachability isn’t set-and-forget — laptops move, ISPs re-NAT, Windows updates quietly close sockets. The mesh keeps a cache of each peer’s last-known-good path and re-confirms it (re-opening a lapsed NAT pinhole) right before a burst, so a test never starts cold against a stale address.

It also learns bidirectionally: when an agent receives a valid, authenticated probe from a peer, it immediately knows how to reach that peer back — reachability heals from the data plane itself, without waiting for its own probe cycle to come around. Heavy tests are scheduled so they never congest a shared uplink; light tests compose freely, dozens at a time.
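
The last-known-good cache and the bidirectional learning rule might look roughly like this. It is a sketch under assumptions: `PathCache`, its method names, and the `max_age` threshold are hypothetical, and this version re-confirms only when the cached path is older than `max_age`, whereas the mesh may re-confirm before every burst.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

Addr = tuple[str, int]


@dataclass
class Path:
    addr: Addr
    confirmed_at: float


class PathCache:
    def __init__(self, max_age: float = 25.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_age = max_age  # assumed pinhole lifetime in seconds
        self.clock = clock
        self.paths: dict[str, Path] = {}

    def on_authenticated_probe(self, peer_id: str, source: Addr) -> None:
        """A valid inbound probe proves the return path, so record it now."""
        self.paths[peer_id] = Path(source, self.clock())

    def path_for_burst(self, peer_id: str,
                       reconfirm: Callable[[Addr], bool]) -> Optional[Addr]:
        """Return a usable path, re-confirming a stale one before trusting it."""
        path = self.paths.get(peer_id)
        if path is None:
            return None
        if self.clock() - path.confirmed_at > self.max_age:
            if not reconfirm(path.addr):
                del self.paths[peer_id]  # stale and dead: forget it
                return None
            path.confirmed_at = self.clock()
        return path.addr
```

Here `reconfirm` stands in for whatever keepalive exchange re-opens the pinhole. The caller would authenticate the probe before calling `on_authenticated_probe`, so a spoofed packet cannot redirect a peer's cached path.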
