What we collect.

Every category of data the platform observes, where it lives, why we keep it, and who can see it. Read this end-to-end if you want to make an informed decision; jump to a specific layer or category via the filters below.

DataStun is in beta. The service is provided “AS IS.” This page is the receipt that backs every claim in the Privacy Policy: it is binding as written today, and it is updated in the same release as any change to what we collect. We will not collect data we have not described here. If anything below would prevent your organization from using the product as it exists today, please tell us before you sign up.

The three-layer architecture, at a glance

DataStun is three independent systems. Knowing which one holds which data is the foundation of everything else on this page.

Agent

On the customer's endpoint. Reads OS / network state and reports it to ten over an outbound HTTPS connection. No inbound port, no listening socket; the agent never accepts inbound connections.

Ten

The tenant platform. Holds the customer's identity, their fleet's observations, their support tickets, and their account state. This is the system the customer's users log into.

Rep

The reputation provider. Holds verdicts on public objects (IPs, hashes, DNS names) keyed by the object itself, not by who asked. No customer record carries through to rep's caches.

The rep anonymity boundary

Reputation is a property of the target (an IP address, a file hash, a DNS name) — not of the customer who asked. Rep’s authoritative caches (reputation.ip_cache, reputation.file_cache) are keyed solely by the target. There is no tenant identifier on those records. One verdict serves every customer who looks up the same IP or hash.

The only place any tenant trace exists on rep is the pending-lookup queue, where the first asker is recorded as an audit-trail field for accountability. That field is intentionally not used in the verdict and does not appear on the cached result. The schema enforces this: there is no foreign key from rep’s caches back to a tenant.
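
A minimal sketch of that boundary, using an in-memory SQLite database in Python. The table names mirror the ones named above, but the columns (grade, status), the unqualified table names, and the helper functions are assumptions for illustration, not the production schema.

```python
import sqlite3

# Hypothetical, simplified schema. Only the table names and the
# first_requested_by_tenant field come from this page; the rest is assumed.
SCHEMA = """
CREATE TABLE ip_cache (
    ip    TEXT PRIMARY KEY,   -- keyed solely by the target, no tenant column
    grade TEXT NOT NULL
);
CREATE TABLE pending_lookups (
    target                    TEXT PRIMARY KEY,
    first_requested_by_tenant TEXT NOT NULL,   -- audit trail only
    status                    TEXT NOT NULL DEFAULT 'queued'
);
"""

def open_rep():
    """Create an empty in-memory rep database."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

def request_lookup(db, tenant_id, ip):
    """Return the cached grade, or queue the target and return None."""
    row = db.execute("SELECT grade FROM ip_cache WHERE ip = ?", (ip,)).fetchone()
    if row:
        return row[0]
    # INSERT OR IGNORE keeps only the first asker on the queue row.
    db.execute(
        "INSERT OR IGNORE INTO pending_lookups (target, first_requested_by_tenant) "
        "VALUES (?, ?)",
        (ip, tenant_id),
    )
    return None

def complete_lookup(db, ip, grade):
    """Write the verdict to the cache; there is no tenant column to fill."""
    db.execute("INSERT OR REPLACE INTO ip_cache (ip, grade) VALUES (?, ?)", (ip, grade))
    db.execute("UPDATE pending_lookups SET status = 'done' WHERE target = ?", (ip,))

def cache_columns(db):
    """List the column names of ip_cache."""
    return [r[1] for r in db.execute("PRAGMA table_info(ip_cache)")]
```

In this sketch, two tenants asking about the same IP produce one queue row (naming the first asker) and one cache row that has no column in which a tenant could be stored.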

Three options if any of this crosses a line for you

We will not pretend the choice is "use it or don’t use it." If a category of data on this page is something your organization is not comfortable having on managed infrastructure, you have three real options:

1. Don’t use the product

The simplest answer. Read this page, decide it’s not for you, walk away with no obligation. We’d rather lose you here than have you uncomfortable later.

2. Use the managed cloud, with full understanding

The default deployment. We operate ten and rep on infrastructure we manage; you run agents on your endpoints. This page is the receipt for everything we’ll see.

3. Self-host part or all of the platform

The agent, ten, and rep are independently deployable. You can run any of them on your own cloud or on-premises. Each entry below has a "Self-host" note describing what changes if you do.

What we collect, by category

Every category of data we observe, keep, or share is below. Use the filter to narrow down by layer (where it lives) or by what you most want to understand. Use Cmd / Ctrl-F to search the page if you remember a keyword.

Layer
Crosses to rep
Sensitivity

Outbound TCP / UDP connection metadata

agent Network → rep: Target only Required

For every network connection a process on your endpoint opens to the outside world, the agent records the connection metadata — not the connection contents. We never see the bytes you exchanged with the destination.

  • Destination IP address and port
  • Protocol (tcp / udp)
  • Process name and full executable path
  • SHA-256 hash of the process binary
  • Code-signing publisher (if signed)
  • Bytes transmitted / received per session
  • Session duration
  • TCP retransmission and round-trip-time stats
Why we collect it: Core visibility feature — the entire dashboard, blocklist enforcement decisions, exposure-policy alerts, and SBOM rollups are derived from this stream. Without it, the product has no observable surface.
Where it’s stored: analytics.flows table on ten
Where you can see it: /destinations, /map, /intel, /insights/sbom, /alerts
Rep boundary detail: Only the destination IP and the process hash cross to rep — and they cross stripped of tenant identifier. Rep produces a verdict on the IP / hash itself; that verdict is then served back to every customer who asks about the same IP / hash.
Self-host: If you self-host ten + rep on your own infrastructure, all of this stays on your premises. Nothing leaves your boundary.

Inbound connection metadata

agent Network → rep: Target only Feature-dependent

The mirror of outbound: when something connects TO your machine, the agent records the same five-tuple metadata, plus a flag for whether the local process was a listener (a server waiting for connections) versus an ephemeral socket. Powers the exposure-policy feature.

  • Source IP address and port
  • Local listener IP, port, and process
  • Protocol
  • Bytes / duration
  • Listener flag
Why we collect it: Detects unsolicited inbound contact and validates configured exposure policies (e.g., "this machine should not be a database server on the public internet").
Where it’s stored: analytics.flows table, direction=inbound
Where you can see it: /destinations (inbound tab), /alerts
Rep boundary detail: Source IPs reach rep for reputation lookup with the same anonymity guarantee as outbound destinations.
Self-host: Stays on-prem when ten is self-hosted.

QUIC / TLS handshake metadata

agent Network → rep: Target only Feature-dependent

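For UDP traffic on port 443 (QUIC), the Windows agent extracts the SNI hostname and ALPN protocol identifier from the ClientHello carried in the connection's Initial packet. QUIC protects Initial packets only with keys derived from values visible in the packet itself, so any on-path observer can read these fields by design. We do not decrypt the session that follows.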

  • Server Name Indication (e.g., "google.com")
  • ALPN identifier (e.g., "h3", "h2")
Why we collect it: Without SNI, all 443/UDP destinations look identical to any observer (CDN IPs front thousands of services). SNI is what lets us label a flow as "Slack" vs. "YouTube" rather than "Cloudflare on 443."
Where it’s stored: Embedded in analytics.flows.dns_name
Where you can see it: /destinations, /intel
Rep boundary detail: SNI hostnames are sent to rep for domain-category lookup; rep stores them in its global ip_cache without customer association.
Self-host: Capture happens on the endpoint regardless of where ten lives. Only the lookup destination (rep) determines whether the hostname leaves your environment.

ICMP path observations

agent Network → rep: Target only Feature-dependent

When the agent fires low-TTL probes to characterize the network path to a destination (used by Hop Starvation diagnostics and the Sources page), routers along the path return ICMP TTL-exceeded replies. We record the responding-router IPs.

  • Router IP address
  • Inner destination IP (the target the probe was aimed at)
  • Inner TTL value
  • Receive timestamp
Why we collect it: Path-stability diagnostics, traceroute-quality return-path mapping, MITM / route-rewriting detection.
Where it’s stored: path_intelligence.icmp_time_exceeded
Where you can see it: /hops
Rep boundary detail: Router IPs are queued for rep reputation lookup on the same anonymous basis as flow destinations.
Self-host: Stays in your ten when self-hosted.

TCP / UDP / IP stack counters

agent Network Operational → rep: Never Optional

OS-level network-stack counters polled at heartbeat cadence. These are totals across the host, not per-connection figures. They are used to tell path loss apart from host-internal problems and encapsulation overhead.

  • TCP segments out, in, retransmitted, with errors
  • TCP fast-retransmits, slow-start retransmits, spurious RTOs (Linux)
  • UDP packet errors, no-port-listening events
  • IP fragmentation / reassembly failures
  • Boot-time epoch (for counter-wrap detection)
Why we collect it: Distinguishes "your link to the ISP is bad" from "your machine is too busy" from "your VPN is fragmenting."
Where it’s stored: analytics.heartbeats.metrics.stack_stats
Where you can see it: /my-device, /agents/:id diagnostics
Self-host: Stays in your ten when self-hosted.
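
The boot-time epoch matters because cumulative counters restart at zero on reboot and can also wrap. A hedged Python sketch of how a consumer might compute per-interval deltas follows; the function name, the 32-bit default width, and the single-wrap assumption are illustrative and not taken from the product.

```python
def counter_delta(prev, curr, prev_boot, curr_boot, width_bits=32):
    """Delta between two cumulative counter samples.

    If the boot-time epoch changed, the counter restarted from zero, so the
    current value is the whole delta. If the boot time is unchanged and the
    counter went backwards, assume it wrapped exactly once.
    """
    if curr_boot != prev_boot:
        return curr
    if curr >= prev:
        return curr - prev
    return curr + (1 << width_bits) - prev
```

Without the boot-time check, a reboot between two samples would look like a wrap and produce a huge spurious delta.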

Host hardware inventory

agent Host → rep: Never Feature-dependent

A daily snapshot of the machine's hardware: what CPU, how much memory, which disks, which network interfaces, which USB devices. Re-collected only when something changes (mtime-gated).

  • CPU model name and core counts
  • Total RAM
  • Disk mount points, sizes, and free space
  • Network interface names, MAC addresses, IPv4 / IPv6 addresses
  • Default gateway
  • USB device vendor / product / serial
Why we collect it: Powers the SBOM and patch-coverage views, supports asset inventory for compliance reports, surfaces unexpected USB devices on workstations that should not have them.
Where it’s stored: analytics.heartbeats.metrics.host_inventory
Where you can see it: /my-device, /agents/:id Host tab
Self-host: Stays in your ten when self-hosted.

Operating-system inventory

agent Host → rep: Never Required

The OS family, version, build, and hostname. Identifies the device, locates it in the patch-currency view, and is required for any platform-specific telemetry collector to know which collector to run.

  • OS name (Windows / Linux / macOS) and distribution
  • OS version, build, kernel
  • Machine hostname
  • Last boot time
  • Locale and timezone
Why we collect it: Required for the agent itself to operate; informs platform-specific behavior throughout the system.
Where it’s stored: control.agents (canonical record) + analytics.heartbeats.metrics.host_inventory (snapshot history)
Where you can see it: /agents, /my-device
Self-host: Stays in your ten when self-hosted.

Installed-software inventory

agent Host → rep: Never Feature-dependent

List of installed applications on the machine, from each platform's native source (Windows Registry uninstall keys, macOS .app bundles, Linux package manager).

  • Application name
  • Version
  • Publisher (if available)
  • Install date
Why we collect it: Powers the Fleet SBOM, supply-chain risk analysis, vendor-concentration views, license-reconciliation features.
Where it’s stored: analytics.heartbeats.metrics.host_inventory.installed_apps
Where you can see it: /insights/sbom, /my-device installed-apps tab
Self-host: Stays in your ten when self-hosted.

Patch / update inventory

agent Host → rep: Never Feature-dependent

Pending operating-system updates and reboot-required state. Lets the patch-lag scoreboard show how far behind a fleet is.

  • Pending update titles and KB / advisory IDs
  • Severity classifications
  • Reboot-pending flag
Why we collect it: Patch lag is among the highest-leverage security signals in a fleet; this inventory is what powers the patch-lag scoreboard.
Where it’s stored: analytics.heartbeats.metrics.host_inventory.pending_updates
Where you can see it: /agents, /insights patch-lag scoreboard
Self-host: Stays in your ten when self-hosted.

Security-configuration posture

agent Host → rep: Never Feature-dependent

Whether the machine's built-in security features are turned on. Posture only: we don't read AV scan logs, keys, or passphrases.

  • Firewall enabled and provider
  • Disk encryption enabled and provider
  • Antivirus product names and enabled state
  • TPM and Secure Boot status (Windows)
  • Local administrator account list
Why we collect it: Compliance reporting and an at-a-glance "is this fleet baseline-secured" view.
Where it’s stored: analytics.heartbeats.metrics.host_inventory
Where you can see it: /agents/:id Host tab
Self-host: Stays in your ten when self-hosted.

Live performance metrics

agent Host Operational → rep: Never Optional

Lightweight performance counters sampled at heartbeat cadence (~once per minute). Costs single-digit milliseconds per sample.

  • CPU utilization average since last sample
  • RAM and swap usage
  • Disk usage per mount point
  • Logged-in user count (count only — never names)
  • Unix load average (Linux / macOS)
Why we collect it: Real-time view of fleet health, anomaly detection (a normally-idle workstation suddenly at 100% CPU).
Where it’s stored: analytics.heartbeats.metrics.host_live
Where you can see it: /my-device Stats, /agents/:id graphs
Self-host: Stays in your ten when self-hosted.

File hashes (SHA-256)

agent Files → rep: Target only Required

When a binary opens a network connection for the first time, the agent computes its SHA-256 fingerprint. This is identity, not contents — a hash uniquely names a binary if you have the original to compare to, but cannot be reversed back to the binary itself.

  • SHA-256 hash of executable file
  • File name (basename) and full path
  • File size
Why we collect it: Hash-anchored identity is the cleanest answer to "is this binary safe?" — immune to rename, copy, and superficial obfuscation. The cluster of reputation chips on the SBOM page (SIG, NSRL, MBZ, VT) all key off this hash.
Where it’s stored: analytics.flows.process_hash
Where you can see it: /insights/sbom, /destinations
Rep boundary detail: The hash crosses to rep for reputation lookup. The hash is the only thing that crosses — never the binary itself, never the customer's identity.
Self-host: If you self-host ten only, hashes are forwarded to our reputation provider for a verdict; the verdict comes back, and other customers benefit from the same lookup. If you self-host both ten and rep, hashes stay inside your boundary unless you choose to federate your rep with ours over a signed peer link or leave rep's external API lookups enabled.
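
For readers who want to check a hash themselves, this is a minimal Python sketch of computing a file's SHA-256 with the standard library. The function name and chunk size are our own choices for illustration; the agent's actual implementation is not shown here.

```python
import hashlib

def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large binaries are never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

The result is a 64-character hex string. Only a value like this would need to cross to rep, never the bytes that were read to produce it.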

Code-signing metadata

agent Files → rep: Target only Feature-dependent

For each hashed binary, the agent reads the embedded code signature on disk: Authenticode on Windows, codesign on macOS, package signatures on Linux. We extract the publisher name and certificate thumbprint, not the certificate's private key (we couldn't even if we wanted to — only the public chain is on the binary).

  • Signer publisher name (Common Name on the certificate)
  • Certificate SHA-1 thumbprint
  • Signer kind (Authenticode / codesign / dpkg / rpm)
  • Repository / package source (Linux)
Why we collect it: Powers the trusted-publisher shortcut: a Microsoft-signed binary skips paid AV-aggregator lookups, saving the budget for the long tail of unsigned and unfamiliar binaries where it actually matters.
Where it’s stored: analytics.flows.signer_*
Where you can see it: /insights/sbom, /destinations per-binary view
Rep boundary detail: Signer metadata is sent with the file-lookup request to enable the shortcut decision; rep stores it on the global file_cache row.
Self-host: Stays in your ten when self-hosted; reaches rep when the file-reputation lookup runs.

DNS resolver state

agent Network Operational → rep: Never Optional

The DNS servers the host is currently configured to use, plus diagnostic events when the agent's built-in DNS-fallback path activates (if your ISP's DNS resolver fails, the agent falls back to Cloudflare 1.1.1.1 / 1.0.0.1 to keep itself reachable).

  • Configured DNS server IP addresses
  • DNS fallback events (hostname, error reason, fallback resolver, success/failure, latency)
Why we collect it: Diagnostic — a flap in agent connectivity is often a DNS problem on the host, not the agent itself. Visibility here pinpoints the root cause quickly.
Where it’s stored: analytics.heartbeats.metrics.host_live.dns_servers + control.agent_errors (fallback events)
Where you can see it: /my-device Network card
Self-host: Stays in your ten when self-hosted.

Blocklist enforcement state

agent Enforcement → rep: Never Feature-dependent

The agent fetches a global blocklist (D / F-graded IPs from rep + curated public threat-intel feeds + tenant overrides) and pushes those into the host firewall as block rules. We record what the agent applied and any reconciliation results.

  • Blocklist version applied
  • Number of entries applied / added / removed
  • Apply duration
  • Per-CIDR attribution (source feed, reason, dispute URL — fetched on demand)
Why we collect it: Lets the customer see exactly what was blocked, when, and why — including a dispute path for any entry the customer disagrees with.
Where it’s stored: In-memory + disk cache on agent; reported in heartbeat status; stored centrally as reputation.blocklist_entries
Where you can see it: /my-device Blocked tab, /agents/:id Blocklist tab
Self-host: Customers can self-host the rep instance that publishes the blocklist; the blocklist is then their own reputation engine's output.
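
To make the three inputs concrete, here is a hedged Python sketch of merging rep grades, public feed CIDRs, and tenant overrides into one block set. The function name, the input shapes, and the rule that an override applies only to an exactly matching CIDR are assumptions for illustration; the real precedence logic is not described on this page.

```python
import ipaddress

def effective_blocklist(rep_grades, feed_cidrs, overrides):
    """Merge blocklist sources into a sorted list of CIDR strings.

    rep_grades: {ip: letter grade}
    feed_cidrs: iterable of CIDR strings from public feeds
    overrides:  {cidr: "allow" | "block"} (assumed exact-match semantics)
    """
    # D / F-graded IPs from rep become single-address networks.
    blocked = {
        ipaddress.ip_network(ip)
        for ip, grade in rep_grades.items()
        if grade in ("D", "F")
    }
    blocked |= {ipaddress.ip_network(c) for c in feed_cidrs}
    # Tenant overrides are applied last so the admin has the final say.
    for cidr, action in overrides.items():
        net = ipaddress.ip_network(cidr)
        if action == "block":
            blocked.add(net)
        elif action == "allow":
            blocked.discard(net)
    # Sort by IP version first; v4 and v6 networks are not mutually comparable.
    return [str(n) for n in sorted(blocked, key=lambda n: (n.version, n))]
```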

Tenant blocklist overrides

agent Enforcement Operational → rep: Never Feature-dependent

Per-tenant exceptions to the global blocklist — admin-defined "always allow" or "always block" rules, with audit trail.

  • CIDR or hostname
  • Action (allow / block)
  • Note from the admin
  • Added-by user
  • Added-at timestamp
Why we collect it: Customer admins need final say over what gets blocked in their environment.
Where it’s stored: control.tenant_blocklist_overrides
Where you can see it: /agents/:id Overrides tab, /staff blocklist console
Self-host: Stays in your ten when self-hosted.

Agent self-diagnostic events

agent Diagnostics Support content → rep: Never Required

The agent reports its own errors, warnings, and lifecycle events back to ten over an outbound HTTPS backchannel: installs, restarts, self-update outcomes, blocklist refresh problems, and anything else the agent itself thinks an operator should know.

  • Event category (install / lifecycle / runtime)
  • Severity (info / warn / error / critical)
  • Event source (which subsystem)
  • Stable error signature (for de-duplication)
  • Throttled message text (≤ 4000 chars, structured)
  • Structured details payload
  • Agent version at time of event
Why we collect it: The agent has no inbound port, so this is the only way operators find out the agent is having a problem. Without it, the agent is a black box.
Where it’s stored: analytics.agent_errors
Where you can see it: /agents/:id Events tab
Self-host: Stays in your ten when self-hosted.

Operator-issued diagnostic commands

agent Diagnostics Support content → rep: Never Optional

Tenant admins can issue specific diagnostic commands to a specific agent (e.g., "run a network burst test against this responder," "dump current blocklist state"). Commands are pulled by the agent on its next heartbeat and the result is returned the same way. The set of allowed verbs is hard-coded into the agent — operators cannot run arbitrary code.

  • Command verb (from a fixed allowlist: power_diagnostics, wt_burst, etc.)
  • Verb-specific arguments
  • Result payload
  • Status (succeeded / failed)
  • Failure reason (if any)
  • Issued-by user, issued-at, deadline
Why we collect it: Lets an operator triage an agent without an inbound channel. Every command is audit-logged.
Where it’s stored: control.agent_commands
Where you can see it: /agents/:id Diagnostics tab (admin only)
Self-host: Stays in your ten when self-hosted.
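
The fixed-allowlist idea can be sketched in a few lines of Python. The two verb names come from the list above; the handler bodies, the command dictionary shape, and the function name are placeholders assumed for illustration.

```python
# Hypothetical handlers; the real verbs do real diagnostic work.
def _power_diagnostics(args):
    return {"note": "placeholder power report"}

def _wt_burst(args):
    return {"note": "placeholder burst result", "target": args.get("target")}

# The allowlist is a constant compiled into the agent, not fetched at runtime.
ALLOWED_VERBS = {
    "power_diagnostics": _power_diagnostics,
    "wt_burst": _wt_burst,
}

def run_command(command):
    """Dispatch a pulled command; anything outside the allowlist is refused."""
    verb = command.get("verb")
    handler = ALLOWED_VERBS.get(verb)
    if handler is None:
        return {"status": "failed", "failure_reason": f"verb not in allowlist: {verb}"}
    try:
        result = handler(command.get("args", {}))
    except Exception as exc:  # report the failure instead of crashing the agent
        return {"status": "failed", "failure_reason": str(exc)}
    return {"status": "succeeded", "result": result}
```

Because dispatch is a dictionary lookup against a constant table, an unknown verb never reaches any code path that could execute it.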

Agent enrollment record

ten Identity → rep: Audit-trail Required

The canonical record of every agent in the system: a stable identifier, a hardware-fingerprint-based machine ID for re-enrollment detection, the tenant it belongs to, and lifecycle state.

  • Agent ID (server-generated UUID)
  • Tenant ID
  • Machine ID (stable hardware fingerprint, used to detect re-enrollment of the same physical machine)
  • Hostname
  • OS / architecture
  • Agent software version
  • IP address at enrollment time (debugging only)
  • Enrolled / first-seen / last-seen timestamps
  • Status (active / disabled / retired)
  • Per-agent dashboard token (a single-use opaque token that lets a user open /my-device for that specific agent without a login)
Why we collect it: Every other piece of data in the system foreign-keys back here. Required to operate.
Where it’s stored: control.agents
Where you can see it: /agents, /agents/:id, /my-device
Rep boundary detail: Agent ID is recorded on rep's pending-lookup queue ONLY as the "first requester" audit trail — it is not used in the lookup logic and does not appear on the verdict that other customers see.
Self-host: Stays in your ten when self-hosted.

Heartbeat delivery envelope

ten Identity → rep: Target only Required

Each heartbeat from each agent carries a timestamp, the public IP we observed it from (server-side, not agent-asserted), the agent version, and a status field. The metrics blob inside the heartbeat is everything cataloged in the agent-layer entries above.

  • Public IP at heartbeat time (server-observed)
  • Agent version (server-confirmed)
  • Heartbeat status (healthy / degraded / error)
  • Server-side observed-at timestamp
Why we collect it: Detects geographic moves (a laptop on the road) and version drift across the fleet, and short-circuits the offline detector.
Where it’s stored: analytics.heartbeats
Where you can see it: /agents, connectivity timeline
Rep boundary detail: The public IP gets queued for rep geolocation lookup with the same anonymity as flow IPs.
Self-host: Stays in your ten when self-hosted.

Agent connectivity timeline

ten Network → rep: Target only Feature-dependent

Periodic outbound probes from each agent to a small set of well-known anchor destinations (ten itself, Cloudflare, Google, the configured DNS resolver). Records geolocation and reachability of each anchor over time.

  • Reachability flags per anchor (gateway, Cloudflare, Google, DNS)
  • Geolocated country / region / city / lat-lon (from public IP at probe time)
  • ISP name
  • Probed-at timestamp
Why we collect it: Distinguishes "the agent is offline" from "the agent's ISP failed" from "the agent's machine is asleep." Critical for noisy-fleet diagnosis.
Where it’s stored: analytics.agent_connectivity
Where you can see it: /agents/:id timeline
Rep boundary detail: Public IP geolocation lookup goes to rep with no tenant identifier.
Self-host: Stays in your ten when self-hosted.

Fleet SBOM rollup

ten Files → rep: Never Feature-dependent

A daily aggregation of file-hash observations across the customer's fleet. For each unique hash: how many machines run it, what fraction of the fleet, when it was first / last seen, total sessions and bytes, what destinations it talked to, etc.

  • Hash + name + signer (mirrors flow data)
  • Agent count and active-agent count for the day
  • First-seen / last-seen day
  • Sessions per 24h, bytes out / in per 24h
  • Per-destination cohort (where this binary sent traffic)
  • Time-of-day histogram
Why we collect it: Powers the Fleet SBOM view, vendor-concentration map, beaconing detector, deviation score.
Where it’s stored: analytics.rollup_sbom and analytics.rollup_sbom_destinations
Where you can see it: /insights/sbom
Self-host: Stays in your ten when self-hosted.
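
A simplified Python sketch of this kind of rollup, grouping flow rows by process hash. The input field names (process_hash, agent_id, bytes_out) and the output keys are assumptions for illustration, and the sketch covers only a few of the columns listed above.

```python
from collections import defaultdict

def rollup_sbom(flows, fleet_size):
    """Aggregate flow rows into one summary per binary hash.

    flows: iterable of dicts with process_hash, agent_id, bytes_out.
    fleet_size: number of agents in the fleet, used for the fleet fraction.
    """
    agents = defaultdict(set)
    sessions = defaultdict(int)
    bytes_out = defaultdict(int)
    for flow in flows:
        h = flow["process_hash"]
        agents[h].add(flow["agent_id"])   # distinct machines running this binary
        sessions[h] += 1
        bytes_out[h] += flow["bytes_out"]
    return {
        h: {
            "agent_count": len(agents[h]),
            "fleet_fraction": len(agents[h]) / fleet_size,
            "sessions": sessions[h],
            "bytes_out": bytes_out[h],
        }
        for h in agents
    }
```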

Tenant, user, and access-control records

ten Account → rep: Never Required

The customer's organization, its users, their roles, and the permissions matrix. Required to gate access to all the other data in the system.

  • Tenant slug, name, lifecycle status, tier
  • User email (one per user)
  • Per-user role within each tenant (owner / admin / member / staff)
  • OAuth identity (provider name + provider-side user ID, when SSO is used)
  • Session metadata (login times, IP, browser fingerprint hash)
Why we collect it: Every authenticated request reads this; without it, we have no concept of who is allowed to see what.
Where it’s stored: control.tenants, control.users, control.user_tenants, control.user_oauth_identities, control.user_sessions
Where you can see it: /account, /tenants
Self-host: Stays in your ten when self-hosted.

Authentication credentials

ten Authentication Credential → rep: Never Required

How we verify users and agents. For password-based logins, we store an Argon2id hash — never the plaintext password. For OAuth-based logins, we store only the provider's opaque user ID, never the user's OAuth access or refresh tokens long-term. For agent / API access, we store hashed bearer keys.

  • Password hash (Argon2id, salted, never the plaintext)
  • API keys (hashed; the plaintext is shown to the user once at creation and is never persisted)
  • Enrollment tokens (single-use, expiring; hashed at rest)
  • Per-agent dashboard tokens (UUID; bound to that agent only)
Why we collect it: Required to authenticate; the hashing posture protects credentials in the event of a database compromise.
Where it’s stored: control.users.password_hash, control.api_keys.key_hash, control.enrollment_tokens.token_hash, control.agents.dashboard_token
Where you can see it: Plaintext is shown to the user once at creation, never thereafter
Self-host: Stays in your ten when self-hosted. Credentials never cross any service boundary in any deployment topology.
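
The "shown once, stored hashed" pattern for API keys can be sketched with Python's standard library. This page does not name the hash used for API keys, so SHA-256 here is an assumption, as are the function names. Password hashing with Argon2id needs a third-party library and is not shown.

```python
import hashlib
import hmac
import secrets

def issue_api_key():
    """Generate a key; return (plaintext to show once, hash to persist)."""
    plaintext = secrets.token_urlsafe(32)
    key_hash = hashlib.sha256(plaintext.encode()).hexdigest()
    return plaintext, key_hash

def verify_api_key(presented, stored_hash):
    """Hash the presented key and compare in constant time."""
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```

Only the second element of the tuple would be written to storage, so a database dump in this sketch yields hashes that cannot be replayed as bearer keys.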

Billing and subscription state

ten Account → rep: Never Required

The tier and plan a tenant is on, billing events (simulated today; real billing integration has not shipped yet), referrals, and the audit trail of any tier changes.

  • Subscription tier and plan
  • Billing events (tier changes, agent counts, prorated charges — simulated until real PSP integration ships)
  • Referral relationships (who referred whom)
  • Earnings ledger (referral payouts, in DT-internal accounting)
Why we collect it: Drives feature gating and entitlements.
Where it’s stored: control.billing_*, control.referrals_*
Where you can see it: /account/billing, /referrals
Self-host: Stays in your ten when self-hosted; the entitlement gate runs locally.

Support tickets and conversations

ten Support Support content → rep: Never Feature-dependent

Conversations a user opens about an agent or a tenant — subject, message bodies, AI-triage output, status. Every inbound message is run through a multi-pattern scrubber before storage that strips PEM key blocks, vendor-specific tokens (GitHub, Slack, AWS, Stripe, Anthropic, etc.), key-value secret fields, database connection strings, HTTP authorization headers, credit cards (Luhn-validated), social-security numbers, email addresses, and high-entropy unstructured strings. The pre-scrub text is never written to disk.

  • Ticket subject (≤ 200 chars)
  • Message body (post-scrub)
  • List of redacted secret kinds (so the UI can show "we stripped 1 credit-card pattern")
  • AI triage output (suggested category, priority, action)
  • Author kind (user / staff / agent)
  • Created-at timestamps and lifecycle state
Why we collect it: Customer support is impossible without preserving the conversation. The aggressive scrubber is what makes this safe to store.
Where it’s stored: support.conversations, support.messages
Where you can see it: /support
Self-host: Stays in your ten when self-hosted. The scrubber runs locally regardless of deployment.
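
A much-reduced Python sketch of a multi-pattern scrubber, covering four of the kinds listed above. The regular expressions, the placeholder format, and the function names are assumptions for illustration; the production scrubber covers more patterns and is not reproduced here.

```python
import re

# Illustrative patterns only; real-world patterns need far more care.
PEM_RE = re.compile(r"-----BEGIN [A-Z ]+-----.*?-----END [A-Z ]+-----", re.DOTALL)
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scrub(text):
    """Return (scrubbed text, list of redacted kinds in order found)."""
    kinds = []

    def sub(pattern, kind, s, check=None):
        def repl(match):
            if check and not check(match.group(0)):
                return match.group(0)  # looks like the pattern but fails validation
            kinds.append(kind)
            return f"[REDACTED:{kind}]"
        return pattern.sub(repl, s)

    text = sub(PEM_RE, "pem_key", text)
    text = sub(CARD_RE, "credit_card", text,
               lambda s: luhn_valid(re.sub(r"\D", "", s)))
    text = sub(SSN_RE, "ssn", text)
    text = sub(EMAIL_RE, "email", text)
    return text, kinds
```

The returned list of kinds corresponds to the "list of redacted secret kinds" field above: it records what was stripped without retaining the stripped value.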

Agent install reports

ten Diagnostics Support content → rep: Never Optional

When an agent is installed (or attempts to install), it phones home with a snapshot of the install outcome — installer version, OS, architecture, success or failure, and any error text from the installer log.

  • Installer version
  • OS, architecture
  • Success or failure flag
  • Error text from the installer log (if any)
Why we collect it: Detects deployment problems early; fed back into installer engineering.
Where it’s stored: analytics.install_reports
Where you can see it: Staff dashboards (tenant admins can see their own tenant's install reports)
Self-host: Stays in your ten when self-hosted.

Custom rules and alert configuration

ten Enforcement Operational → rep: Never Feature-dependent

Rules a tenant defines for their own environment — what destinations to allow / block, what flow patterns to alert on, what processes to flag.

  • Rule name, scope, criteria, action
  • Alert configuration (notification destinations, severity thresholds)
  • Created-by, created-at, last-modified
Why we collect it: Customer-driven policy is essential — global blocklists alone do not cover environment-specific rules.
Where it’s stored: control.rules, control.alert_configurations
Where you can see it: /rules, /alerts
Self-host: Stays in your ten when self-hosted.

IP reputation cache (global, no tenant linkage)

rep Reputation → rep: Target only Required

The authoritative reputation record for every IP rep has investigated. Keyed solely by IP. There is no tenant identifier on this record. One verdict serves every customer who asks about that IP.

  • IP address (primary key)
  • Letter grade (A+/A/A-/B+/B/B-/C+/C/C-/D/F) and numeric score
  • Findings array (blocklist hits, category hits, AI assessment, certificate observations)
  • Service identity (name, category — e.g. "Slack", "Cloudflare CDN")
  • Reverse DNS, ASN, ISP, organization, hosting-provider flag
  • Geolocation (country, region, city, lat / lon)
  • TLS certificate details (CN, SAN, issuer, expiry, self-signed flag)
  • Domain category tags (e.g. "advertising", "malware-c2", "social-media")
  • First-seen, last-analyzed, cached-at, expires-at timestamps
Why we collect it: Reputation is a property of the IP itself, not the customer asking. Sharing rep verdicts across customers is the entire reason cross-tenant reputation works.
Where it’s stored: reputation.ip_cache
Where you can see it: /intel, /destinations, /map
Rep boundary detail: This entry IS rep. The "no tenant linkage" property is enforced by schema — there is no tenant_id column on reputation.ip_cache.
Self-host: You can run your own rep instance. Your rep can either federate with ours over a signed peer link (sharing reputation across both your fleet and the broader DT customer base) or run isolated (your fleet only). The federation choice is yours.

File reputation cache (global, no tenant linkage)

rep Reputation → rep: Target only Required

The authoritative reputation record for every file hash rep has investigated. Same anonymity property as the IP cache: keyed solely by hash, no tenant identifier.

  • SHA-256 hash (primary key)
  • Verdict (signed_trusted / A / B / C / D / F / unknown)
  • Numeric score
  • Findings array (CVEs, trusted-publisher matches, VirusTotal detections, MalwareBazaar hits)
  • Signer metadata (publisher, certificate thumbprint — observed from the first-reporting agent)
  • External-DB raw responses (VT and MBZ, cached)
  • Sample-name array (e.g., "this hash is normally named chrome.exe; on one machine it appeared as svchost.exe")
  • Cached-at, expires-at
Why we collect it: File reputation is a property of the file itself, not the customer running it.
Where it’s stored: reputation.file_cache
Where you can see it: /insights/sbom
Rep boundary detail: No tenant_id column on reputation.file_cache.
Self-host: Same self-hosting story as IP reputation — bring your own rep instance, federate or stay isolated.

Pending-lookup queue (audit-trail tenant linkage)

rep Reputation Operational → rep: Audit-trail Required

When a customer first asks about an IP or hash that rep has not yet investigated, the lookup is queued. The queue records the *first asker* for accountability — but this trail is intentionally not used in the verdict and does not appear on the cached result that other tenants see.

  • Lookup target (IP or hash)
  • first_requested_by_tenant + first_requested_by_agent (audit trail only)
  • Priority, status, attempt count, last error
  • Queued-at timestamp
Why we collect it: Audit trail in case a customer asks "did you investigate IP X because we asked?" — yes, here's the record. Not used to associate the verdict with the requester.
Where it’s stored: reputation.pending_lookups, reputation.pending_file_lookups
Where you can see it: Staff dashboards only
Rep boundary detail: This is the only place on rep where any tenant identifier exists, and it is bounded to the queue row. Once the lookup completes, the verdict is written to ip_cache / file_cache without any tenant reference.
Self-host: On a self-hosted rep, the audit trail stays on your rep. On the managed rep, it stays on ours.

Threat-feed mirrors

rep Reputation → rep: Never Feature-dependent

rep mirrors several public threat-intelligence sources locally so it can serve verdicts without per-lookup external API calls. These are bulk imports from upstream sources, not data collected from customers.

  • Spamhaus DROP list
  • IPsum aggregated feed
  • UT1 / Hagezi domain category lists
  • MalwareBazaar daily malware-sample mirror
  • NSRL local SQLite (NIST known-good corpus, ~12 GB of hashes)
Why we collect it: Lets rep do most lookups without ever leaving its own network. Reduces external dependency, lowers per-lookup cost, improves latency.
Where it’s stored: reputation.threat_*, reputation.malware_bazaar_mirror, reputation.domain_categories, reputation.nsrl (SQLite)
Where you can see it: Indirectly, through any blocklist hit or category tag attributed to one of these sources
Self-host: Self-hosted rep instances pull these mirrors directly from upstream sources. No DT involvement required.

External API lookups (hash-only, never the file)

rep Reputation → rep: Target only Feature-dependent

For files not in the local mirrors, rep queries VirusTotal and MalwareBazaar by SHA-256 hash. We never upload binaries. The hash is a 32-byte fingerprint: it is sufficient to identify the file if you already have the original, but it cannot be reversed back to the binary itself.

  • SHA-256 hash sent in the lookup query
  • Response from the external service, cached locally
Why we collect it: Coverage. Local mirrors do not have every binary; the external aggregators have wider real-world coverage.
Where it’s stored: Outbound queries are not persisted as logs; the cached response lands in reputation.file_cache
Where you can see it: /insights/sbom (the MBZ and VT chips, with hover detail)
Rep boundary detail: For self-hosted rep instances, this is a deliberate egress: hashes leave your boundary to query the external service. You can disable the external-lookup path entirely if the local mirrors are enough for you.
Self-host: On a self-hosted rep, you control whether external lookups happen at all. With them disabled, your rep relies on the local mirrors only and sends nothing outbound.

Your rights and our commitments

Access. Every customer-visible surface listed under "Where you can see it" is reachable from the in-product UI without contacting us. If you want a machine-readable export of everything we have on your tenant, ask — we’ll provide it.

Deletion. Tenant deletion removes all customer-scoped records (analytics.flows, analytics.heartbeats, support, account). Reputation verdicts on IPs and hashes you happened to ask about first remain on rep, because cached verdicts carry no tenant identifier.

Portability. If you want to move from our managed cloud to a self-hosted deployment, we’ll provide the data export. If you want to move from a self-hosted deployment back to managed, same.

Changes to this page. Material changes to what we collect will be announced in-product before they ship, with at least 30 days’ notice to production-tier customers. Beta changes may move faster, but every change is reflected here within the same release that introduces it.

Found something missing or unclear? Tell us. The transparency story is only useful if it’s accurate — corrections welcome.