
How the agent mesh works

Packetman says: This page explains DataStun's measurement mesh, the part that makes us different from every single-point speed test. Instead of testing your connection to one far-away server, your own agents test each other, directly, peer to peer. The mesh organizes itself and heals itself. The clever bit is reachability: to test two machines that both sit behind home routers, you have to punch through their NATs the same way a video call does, discovering each side's public address and then sending packets straight across. We try the cheapest path first: same LAN, then IPv6 direct, then a NAT hole-punch, and only as a last resort a relay. We race several candidate paths at once and keep the working one warm so an on-demand test never has to start cold. And your traffic never flows through our servers when a direct path exists. That's the sovereignty point.

A single speed test measures your link to one server, somewhere. The DataStun mesh turns your own fleet into the test rig: every agent measures every other agent, directly, peer to peer — the way a video call connects, not the way a download works.

On this page

1. Why peer-to-peer beats single-point
2. The connect ladder: cheapest path first
3. Multi-candidate racing: pick the winner fast
4. One always-warm port, and why it matters
5. Two planes: always-on monitoring vs. on-demand deep-dive
6. Self-healing reachability

1. Why peer-to-peer beats single-point

If your video calls stutter between the branch office and HQ, a speed test to a public server tells you almost nothing — it measures a path you don’t care about. The mesh measures the paths you do care about: branch ↔ HQ, laptop ↔ data-center, every pair, continuously. When something degrades, the grid shows you exactly which leg.

And because agents talk directly to each other, your measurement traffic — and the metadata about your network it reveals — never has to traverse anyone else’s infrastructure. That’s the sovereignty payoff: the test is yours, end to end.

2. The connect ladder — cheapest path first

Two machines behind home or office routers can’t just send each other packets — their NATs drop unsolicited traffic. The mesh solves this exactly like a VoIP/ICE call does: a coordination server (STUN) helps each side learn its own public address, then the agents punch a path straight across. The relay is the last resort, not the default, because routing through a relay changes the very path we’re trying to measure.

[Figure: STUN / TURN discovery and last-resort relay. Agent A (:41878) and Agent B (:41878) each learn their own public address via STUN. On the direct path, STUN is not in the data flow. On the relay (TURN) path, data goes through the relay, as a last resort only. Tried in order, cheapest first: ① same-LAN ② IPv6 direct ③ IPv4 hole-punch ④ relay (last resort).]
STUN coordinates; it never carries your data. A direct path wins whenever one exists — the relay only catches the pairs that truly can’t punch through.
① Same-LAN

Both agents share a local network (same public egress IP). They talk directly over private addresses, so nothing leaves the building.

② IPv6 direct

If both have public IPv6 addresses, they connect with no NAT at all. This is the cleanest cross-internet path there is, and it is increasingly common.

③ IPv4 NAT hole-punch

Each learns its public address from STUN, then both send simultaneously, so each NAT treats the other side’s incoming packet as a reply to its own outbound one (that is, as “solicited”) and lets it through. The data path is direct.

④ Relay (last resort)

Some NATs (symmetric / carrier-grade) defeat punching. Only then do we relay through TURN, and the result is clearly labelled as relayed so it’s never mistaken for a direct measurement.
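To make the hole-punch step concrete, here is a toy simulation of why simultaneous sending works. It is a sketch under stated assumptions, not DataStun's implementation: the `Nat` class and `deliver` helper are hypothetical, the NAT is modelled as a simple port-restricted one that keeps the same public port for every destination, and the addresses are documentation-range placeholders.

```python
class Nat:
    """Toy port-restricted NAT (hypothetical model): inbound packets are
    accepted only from endpoints the inside host has already sent to."""

    def __init__(self, public_addr):
        self.public_addr = public_addr  # (ip, port) as a STUN server would report it
        self.pinholes = set()           # remote endpoints we have sent to

    def send_to(self, remote_addr):
        # Outbound traffic opens a pinhole for packets coming back from remote_addr.
        self.pinholes.add(remote_addr)

    def accepts_from(self, remote_addr):
        return remote_addr in self.pinholes


def deliver(src, dst):
    """src sends one packet to dst's public address.
    Returns True if dst's NAT lets the packet in."""
    src.send_to(dst.public_addr)
    return dst.accepts_from(src.public_addr)


agent_a = Nat(("198.51.100.7", 41878))
agent_b = Nat(("203.0.113.9", 41878))

# A sends first: B has sent nothing yet, so B's NAT drops the packet.
first = deliver(agent_a, agent_b)    # False
# B sends: A already opened a pinhole towards B, so this one gets through.
second = deliver(agent_b, agent_a)   # True
# From now on both pinholes are open and traffic flows both ways.
third = deliver(agent_a, agent_b)    # True
```

The first packet is lost on purpose: its only job is to open the sender's own pinhole. A symmetric NAT breaks this model because it maps each destination to a different public port, so the address learned from STUN is not the one the peer needs to hit.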

Family-matched, never cross-dialled. We pair IPv6 with IPv6 and IPv4 with IPv4, preferring IPv6 when both sides have it (it needs no punch). We never try to dial an IPv6 address from an IPv4-only side — a classic source of silent failures in naive implementations.
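The ladder and the family-matching rule can be sketched as a small function that lists which rungs are worth attempting for a pair, in order. This is an illustrative sketch only: `AgentInfo`, `ladder`, and the rung names are hypothetical, and "same LAN" is approximated here by comparing the IPv4 egress address.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AgentInfo:
    """What the coordinator knows about one agent (hypothetical shape)."""
    public_ipv4: Optional[str] = None  # egress address learned via STUN, if any
    public_ipv6: Optional[str] = None  # global IPv6 address, if any


def ladder(a: AgentInfo, b: AgentInfo) -> List[str]:
    """Return the rungs to try for this pair, cheapest first."""
    rungs = []
    # Rung 1: same public egress IP suggests both sit on the same local network.
    if a.public_ipv4 is not None and a.public_ipv4 == b.public_ipv4:
        rungs.append("same-lan")
    # Rung 2: IPv6 only when BOTH sides have it (family-matched, no cross-dialling).
    if a.public_ipv6 and b.public_ipv6:
        rungs.append("ipv6-direct")
    # Rung 3: IPv4 hole-punch only when both sides have an IPv4 mapping.
    if a.public_ipv4 and b.public_ipv4:
        rungs.append("ipv4-punch")
    # Rung 4: the relay is always available, and always last.
    rungs.append("relay")
    return rungs


dual_stack = AgentInfo(public_ipv4="198.51.100.7", public_ipv6="2001:db8::7")
v4_only = AgentInfo(public_ipv4="203.0.113.9")

print(ladder(dual_stack, v4_only))  # ['ipv4-punch', 'relay']
```

Because the IPv6 rung is only emitted when both sides have an IPv6 address, an IPv4-only agent is never asked to dial an IPv6 peer.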

3. Multi-candidate racing

An agent usually has several plausible addresses for a peer — a LAN address, an IPv6 address, a NAT-mapped address, a reflected address. Rather than try them one at a time and wait for each to time out, we race them, the way modern browsers race IPv4 and IPv6 (“Happy Eyeballs”). The most-preferred candidate launches first; the rest follow a moment later, staggered. The first to answer wins; the losers are cancelled.

The payoff is speed and resilience: a dead-but-listed address can’t stall the whole test, because a working sibling answers while the dead one is still timing out. An on-demand test feels instant instead of waiting on a worst-case punch timeout.

[Figure: Candidates raced with a small stagger. LAN: no answer. IPv6 direct: first to answer, wins. IPv4 punch: cancelled once IPv6 wins. Reflected: cancelled. At t=0 the preferred candidate gets a head start; the rest follow staggered.]
The first working path wins in a fraction of a worst-case timeout — dead candidates can’t hold the test hostage.
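The staggered race described above can be sketched with `asyncio`. This is a minimal illustration, not the agent's actual code: `race`, `fake_attempt`, the 50 ms stagger, and the simulated delays are all assumptions chosen to mirror the figure.

```python
import asyncio


async def race(candidates, attempt, stagger=0.05):
    """Race candidates (most-preferred first). `attempt(c)` must return on
    success and raise on failure. Returns the first winner, or None."""

    async def delayed(index, candidate):
        # Candidate i starts i * stagger seconds after the preferred one.
        await asyncio.sleep(index * stagger)
        await attempt(candidate)
        return candidate

    pending = {asyncio.create_task(delayed(i, c)) for i, c in enumerate(candidates)}
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            winners = [t.result() for t in done if t.exception() is None]
            if winners:
                return winners[0]
        return None  # every candidate failed
    finally:
        # Cancel the losers so a dead candidate cannot keep the test waiting.
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)


# Simulated behaviour: (seconds until outcome, whether it succeeds).
BEHAVIOUR = {
    "lan": (1.0, False),        # listed but dead: would time out after 1 s
    "ipv6": (0.02, True),
    "ipv4-punch": (0.3, True),
    "reflected": (0.3, True),
}


async def fake_attempt(candidate):
    delay, ok = BEHAVIOUR[candidate]
    await asyncio.sleep(delay)
    if not ok:
        raise ConnectionError(candidate)


winner = asyncio.run(race(["lan", "ipv6", "ipv4-punch", "reflected"], fake_attempt))
print(winner)  # ipv6
```

In this run the dead LAN candidate would have taken a full second to fail, but the race finishes as soon as IPv6 answers, about 70 ms in, and the remaining attempts are cancelled.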

4. One always-warm port

The continuous mesh probes keep a single fixed UDP port open and a NAT pinhole warm at all times. Every measurement — latency, throughput, an on-demand diagnostic — rides that one already-open path, distinguished by a run identifier inside the packet rather than by opening a fresh port.

This matters more than it sounds. Opening a fresh port for a test means punching a fresh NAT hole, and on two residential connections that punch often fails silently: you’d see 100% loss on a path that actually works fine. By reusing the warm port, an on-demand test inherits a path that the last monitoring cycle, at most about two minutes earlier, already proved reachable. Because runs are told apart by their identifier rather than by port, many tests can also run at once without colliding.

Hard-won lesson. We learned this the painful way: a brand-new test port between two home connections read 100% loss, while the same pair’s mesh probe showed a healthy 50 ms round-trip and multi-megabit throughput. The fix — share the warm mesh socket — is now core to the design.
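One way to picture "one port, many runs" is a tiny demultiplexer that reads a run identifier from the front of each datagram. The wire format here is purely an assumption for illustration (a 4-byte big-endian run ID followed by the payload); the real packet layout is not described on this page, and `encode`, `decode`, and `Demux` are hypothetical names.

```python
import struct
from collections import defaultdict

# Hypothetical header: a single 4-byte big-endian run identifier.
HEADER = struct.Struct("!I")


def encode(run_id: int, payload: bytes) -> bytes:
    """Prefix the payload with its run identifier."""
    return HEADER.pack(run_id) + payload


def decode(datagram: bytes):
    """Split a datagram into (run_id, payload)."""
    (run_id,) = HEADER.unpack_from(datagram)
    return run_id, datagram[HEADER.size:]


class Demux:
    """Routes datagrams arriving on the one shared socket to per-run inboxes."""

    def __init__(self):
        self.inbox = defaultdict(list)  # run_id -> list of payloads

    def on_datagram(self, datagram: bytes) -> None:
        run_id, payload = decode(datagram)
        self.inbox[run_id].append(payload)


demux = Demux()
# Two runs interleaved on the same port: a latency probe (7) and a throughput burst (9).
for packet in (encode(7, b"ping-1"), encode(9, b"bulk-1"), encode(7, b"ping-2")):
    demux.on_datagram(packet)

print(demux.inbox[7])  # [b'ping-1', b'ping-2']
print(demux.inbox[9])  # [b'bulk-1']
```

Since every run shares the socket, none of them needs a new NAT pinhole, which is exactly the failure mode the fresh-port approach ran into.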

5. Two planes, one substrate

The mesh runs two kinds of measurement on the same secure, warm foundation.

Plane A · Monitoring: always-on, every pair, roughly every 2 minutes. A small, identical, standardized test gives comparable trends across time and pairs: a latency + throughput heatmap, 24/7.

Plane B · Diagnostics: on-demand, operator- or auto-triggered. The full lens matrix with custom parameters (packet capture, expert decode, deep-dive) answers a specific question, in depth.

Shared substrate: one warm authenticated port (UDP 41878), the direct-first ladder, per-pair-keyed and replay-protected.
Monitoring keeps the path warm and the baseline current; diagnostics borrow that warmth to go deep on demand.

6. Self-healing reachability

Reachability isn’t set-and-forget — laptops move, ISPs re-NAT, Windows updates quietly close sockets. The mesh keeps a cache of each peer’s last-known-good path and re-confirms it (re-opening a lapsed NAT pinhole) right before a burst, so a test never starts cold against a stale address.

It also learns bidirectionally: when an agent receives a valid, authenticated probe from a peer, it immediately knows how to reach that peer back — reachability heals from the data plane itself, without waiting for its own probe cycle to come around. Heavy tests are scheduled so they never congest a shared uplink; light tests compose freely, dozens at a time.
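The two healing mechanisms, re-confirming a cached path before a burst and learning a return path from an incoming probe, can be sketched as a small cache. Everything here is hypothetical: the `PathCache` class, the 120-second freshness window (chosen to echo the roughly two-minute monitoring cycle), and the `reconfirm` callback, which stands in for whatever check re-opens the pinhole.

```python
import time


class PathCache:
    """Last-known-good path per peer (illustrative sketch)."""

    def __init__(self, max_age=120.0, clock=time.monotonic):
        self.max_age = max_age  # assumed freshness window, in seconds
        self.clock = clock
        self.paths = {}         # peer_id -> (addr, time last confirmed)

    def learn_from_probe(self, peer_id, source_addr):
        """Call only AFTER the probe has passed authentication: the packet's
        source address is, by construction, a working path back to the peer."""
        self.paths[peer_id] = (source_addr, self.clock())

    def path_for_burst(self, peer_id, reconfirm):
        """Return an address that is safe to start a burst against, or None.
        `reconfirm(addr)` should re-open the pinhole and report success."""
        entry = self.paths.get(peer_id)
        if entry is None:
            return None
        addr, confirmed_at = entry
        if self.clock() - confirmed_at > self.max_age:
            if not reconfirm(addr):
                # Stale and unreachable: forget it so the ladder runs again.
                del self.paths[peer_id]
                return None
            self.paths[peer_id] = (addr, self.clock())
        return addr


cache = PathCache()
cache.learn_from_probe("agent-b", ("203.0.113.9", 41878))
print(cache.path_for_burst("agent-b", reconfirm=lambda addr: True))
# ('203.0.113.9', 41878)
```

The authentication precondition on `learn_from_probe` matters: learning a path from an unauthenticated packet would let anyone redirect an agent's traffic by spoofing a probe.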
