
How file reputation works

Packetman says: This page explains how DataStun decides whether a program on your machine is safe. The first principle is privacy: we never take a copy of your software. We compute a SHA-256 fingerprint (a one-way hash) and ask about that. The second principle is corroboration: rather than trust one antivirus vendor, we gather independent sources and look for agreement among the publisher's code signature, the US government's known-good list, a public malware corpus, and a multi-engine scanner. Then we add two questions modern attacks demand. First, is this specific version of the program vulnerable: does it have published CVEs, and is any of them on CISA's actively-exploited list? Second, do a malware scanner's YARA or ClamAV rules flag it? The last one needs discipline, because generic malware-hunting rules fire on perfectly legitimate signed software, so a YARA rule match only counts against an unsigned binary. Read on for the diagrams.

We grade every program that opens a network connection — from its fingerprint, never the file itself. “Clean” means independent sources agree; “risky” means we can show you why.

Why it works this way: we know the good, so we can focus on the suspicious. Almost everything on a normal computer is ordinary, legitimate software. The agent recognizes the millions of known programs instantly and spends its attention on the small handful of unknowns actually running on your machines — no wasted alarms on the software you use every day.

And it doesn’t judge a file on one quick look. Each unknown gets a full background check — who made it and whether the signature is valid, whether it’s a known legitimate program, whether it’s on any threat list, whether it matches the patterns of known malware families, and whether it’s a version with a hole attackers are exploiting — combined into one number, fully explained. No mysterious verdicts.

On this page

1. Hash-only — the file never leaves your machine
2. The evidence cluster — corroboration, not one opinion
3. Published vulnerabilities (CVEs)
4. Known-exploited (the red KEV chip)
5. YARA + ClamAV — and the false-positive discipline
6. The single verdict — how the signals combine

1. Hash-only — the file never leaves your machine

When the agent sees a program make a network connection, it computes a SHA-256 hash — a 64-character fingerprint that can’t be reversed back into the file — plus a little metadata the operating system already exposes (the publisher on the code signature, the owning package and version). That fingerprint is what we look up. The binary itself never leaves the endpoint. Two machines running the same program produce the same hash, so one lookup answers for the whole fleet.

[Diagram: Endpoint agent (program.exe stays here) → sha256 + signer + version (fingerprint only) → Reputation lookup (keyed on the hash; cached and shared fleet-wide) → Verdict + evidence]
A one-way fingerprint plus OS-exposed metadata — not your software — is what gets looked up.
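
As a minimal sketch of the fingerprinting step, assuming a Python agent: the function names and the payload field names (`sha256`, `signer`, `version`) are illustrative and are not DataStun's actual wire format. The file is read in chunks and only the digest and metadata end up in the payload.

```python
import hashlib
import os
import tempfile
from typing import Optional


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the 64-character hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_lookup(path: str, signer: Optional[str], version: Optional[str]) -> dict:
    """Assemble a lookup payload (hypothetical field names): no file bytes."""
    return {"sha256": sha256_of_file(path), "signer": signer, "version": version}


if __name__ == "__main__":
    # Demo on a throwaway file; a real agent would hash the running binary.
    fd, demo_path = tempfile.mkstemp()
    os.write(fd, b"hello")
    os.close(fd)
    print(build_lookup(demo_path, signer="Example Corp", version="1.2.3"))
    os.remove(demo_path)
```

Because the digest depends only on the file's bytes, two machines hashing the same program build the same lookup key, which is what lets one cached answer serve the whole fleet.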

2. The evidence cluster — corroboration, not one opinion

A single antivirus verdict is one vendor’s guess. We gather independent sources and look for agreement:

SIG: the publisher’s code signature, verified by the OS (Authenticode + catalog on Windows, codesign on macOS, package ownership on Linux).
NSRL: NIST’s National Software Reference Library, the US government’s known-good catalog of shipped software.
MBZ: a public malware corpus; a match here is a confirmed bad sample.
VT: a multi-engine scanner, consulted when the free sources are inconclusive.

“Clean” isn’t one source saying so — it’s several independent ones agreeing. And the strongest bad signals (a confirmed malware-corpus sample, a multi-engine consensus) outrank a code signature, because stolen and abused signing certificates are real.
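
The gathering order can be sketched as follows. This is an illustration under stated assumptions, not the production logic: each source is modeled as a hypothetical callable returning "good", "bad", or None, "inconclusive" is taken to mean that no free source had an opinion, and the two-source agreement threshold is a placeholder for "several".

```python
from typing import Callable, Dict, Optional

Opinion = Optional[str]  # "good", "bad", or None for "no record"
Source = Callable[[str], Opinion]


def gather_evidence(sha256: str, free_sources: Dict[str, Source],
                    fallback: Source) -> Dict[str, Opinion]:
    """Ask the free sources first; consult the multi-engine scanner only if needed."""
    evidence = {name: lookup(sha256) for name, lookup in free_sources.items()}
    # Assumption: "inconclusive" means every free source returned no opinion.
    if all(opinion is None for opinion in evidence.values()):
        evidence["VT"] = fallback(sha256)
    return evidence


def is_corroborated_clean(evidence: Dict[str, Opinion], min_agreeing: int = 2) -> bool:
    """Clean only if no source says bad and enough independent sources say good."""
    opinions = [o for o in evidence.values() if o is not None]
    return "bad" not in opinions and opinions.count("good") >= min_agreeing
```

A single "good" opinion is deliberately not enough here, which mirrors the point that clean is an agreement between sources rather than one source's say-so.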

3. Published vulnerabilities (CVEs)

Reputation answers “is this malicious?”, but a perfectly legitimate, validly signed program can still be vulnerable. So once the agent reports the program’s version, we look that version up in the public vulnerability databases (the OSV aggregator, and the distribution security trackers for Linux packages) and list the published CVEs that affect it. Each is shown with its severity (CVSS) and, for Linux packages, whether a fix is already in your installed version.

This is the difference between “that’s really Chrome” and “that’s really Chrome, three versions behind, with four known holes.” Until the agent has reported a version, this section simply waits — we never guess.
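
For illustration, a version lookup against OSV might be shaped like the sketch below. It builds the JSON body for OSV's public query endpoint (`POST /v1/query`) and pulls CVE identifiers out of a response; how DataStun actually queries OSV is not described here, and the helper names and sample data are made up. No network call is made, and with no version the query is simply not built.

```python
import json
from typing import Dict, List, Optional

OSV_QUERY_URL = "https://api.osv.dev/v1/query"  # where the body would be POSTed


def build_osv_query(name: str, ecosystem: str, version: Optional[str]) -> Optional[bytes]:
    """Return the JSON request body, or None while no version has been reported."""
    if not version:
        return None  # wait for the agent rather than guess a version
    body = {"package": {"name": name, "ecosystem": ecosystem}, "version": version}
    return json.dumps(body).encode("utf-8")


def extract_cves(osv_response: Dict) -> List[str]:
    """Collect CVE identifiers from each record's id and aliases."""
    cves = set()
    for vuln in osv_response.get("vulns", []):
        for ident in [vuln.get("id", "")] + vuln.get("aliases", []):
            if ident.startswith("CVE-"):
                cves.add(ident)
    return sorted(cves)


if __name__ == "__main__":
    # Made-up response using the placeholder CVE numbers from this page.
    sample = {"vulns": [{"id": "CVE-2024-1111"},
                        {"id": "EXAMPLE-0001", "aliases": ["CVE-2024-2222"]}]}
    print(build_osv_query("example-package", "PyPI", "1.2.3"))
    print(extract_cves(sample))
```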

4. Known-exploited — the red KEV chip

Not every CVE is equal. A handful are being actively exploited in the wild right now. CISA — the US Cybersecurity and Infrastructure Security Agency — publishes that shortlist as the Known Exploited Vulnerabilities (KEV) catalog, which we mirror daily. Every CVE we find on your binary is cross-referenced against it.

[Diagram: CVEs on this binary (CVE-2024-1111, CVE-2024-2222, CVE-2023-3333) cross-referenced against the CISA KEV catalog (actively exploited, mirrored daily). CVE-2024-2222 ← match + ransomware flag. Chips: KEV, and KEV · ransomware.]
A CVE on both lists earns a red KEV chip — deeper red when ransomware operators are documented using it. “Patched in your version” is shown but not alarmed.

This is the single most actionable line in the whole report: not “you have vulnerabilities” (everyone does) but “this one is being used in real attacks today, and it’s on a machine in your fleet.”
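
The cross-reference itself is a simple join. The sketch below assumes a locally mirrored copy of the KEV JSON feed and relies on two of its fields, `cveID` and `knownRansomwareCampaignUse`; the function name and the chip strings are illustrative, and the catalog entry is made up to match the placeholder CVE numbers in the diagram.

```python
from typing import Dict, Iterable, List


def kev_chips(cves: Iterable[str], kev_catalog: Dict) -> List[Dict[str, str]]:
    """Return one chip per CVE that also appears in the mirrored KEV catalog."""
    index = {entry["cveID"]: entry for entry in kev_catalog.get("vulnerabilities", [])}
    chips = []
    for cve in cves:
        entry = index.get(cve)
        if entry is None:
            continue  # published CVE, but not on the actively-exploited list
        ransomware = entry.get("knownRansomwareCampaignUse") == "Known"
        chips.append({"cve": cve, "chip": "KEV · ransomware" if ransomware else "KEV"})
    return chips


if __name__ == "__main__":
    # Made-up catalog entry; real entries carry more fields than shown here.
    mirror = {"vulnerabilities": [
        {"cveID": "CVE-2024-2222", "knownRansomwareCampaignUse": "Known"},
    ]}
    found = ["CVE-2024-1111", "CVE-2024-2222", "CVE-2023-3333"]
    print(kev_chips(found, mirror))
```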

5. YARA + ClamAV — and the false-positive discipline

We also check the binary’s hash against YARA (pattern-matching malware rules) and ClamAV (the open-source antivirus engine) results through a public scanning service. ClamAV signatures are precise. YARA rules, though, are mostly broad hunting rules (“contains crypto constants,” “has anti-debug tricks”), and they fire on huge amounts of perfectly legitimate, hardened software. Chrome, OneDrive, even Windows Defender’s own engine match generic YARA rules.

The discipline: a YARA match counts as a bad signal only when the binary is not validly code-signed. On a validly-signed binary the match is recorded for transparency but never moves the verdict — it can’t override a good signature. A ClamAV detection or a confirmed malware-corpus sample is different: that’s precise evidence, and it counts.

Validly signed + YARA match → suppressed, shown for transparency · verdict unchanged
Unsigned + YARA match → downgraded to Suspicious · external scan advised
Why: Generic hunting rules fire on hardened, signed software (Chrome, OneDrive, Defender). Counting them against signed binaries would bury you in false positives, so we don't.
False positives erode trust faster than catches earn it. The provenance gate keeps the YARA signal honest.
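
The provenance gate reduces to a two-input decision. A minimal sketch, with a hypothetical function name and return labels chosen for readability:

```python
def yara_effect(validly_signed: bool, yara_matched: bool) -> str:
    """Decide what a YARA match does to the verdict, given signature validity."""
    if not yara_matched:
        return "none"        # nothing to record
    if validly_signed:
        return "suppressed"  # shown for transparency, verdict unchanged
    return "suspicious"      # unsigned: downgrade and advise an external scan
```

Note that the gate keys on a valid signature, not merely the presence of one; a binary with a broken or missing signature takes the unsigned path.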

6. The single verdict — how the signals combine

All of the above rolls up into one verdict, with a clear order of precedence:

| Signal | Effect on the verdict |
| --- | --- |
| Confirmed malware sample (corpus) / multi-engine AV consensus | Overrides everything, including a valid signature (stolen certs are real). |
| ClamAV detection on an unsigned binary | Marks it bad. |
| YARA match on an unsigned binary | Downgrades to Suspicious; a deeper external scan is advised. |
| YARA match on a signed binary | Recorded, but never overrides the signature. |
| Valid signature / NIST known-good / distro-managed | Strong trust, reached only when nothing above fired. |
| A CVE on the CISA KEV list | Caps the trust score (deeper for ransomware), but never overrides a valid signature; a signed-but-vulnerable binary is trusted and flagged to patch. |

The result is an honest verdict you can defend: clean when independent sources agree, flagged when we can point at the source, and — for the vulnerable-but-legitimate case — both at once.
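
The order of precedence can be read as a chain of checks evaluated top to bottom. The sketch below is one possible encoding and not the shipped scoring code: the `Signals` fields, the label strings, and the "unknown" fallback are assumptions, and the numeric trust score is left out because its values are not given here. A ClamAV hit on a validly signed binary is also not specified above, so the sketch lets it fall through.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    corpus_match: bool = False     # confirmed malware-corpus sample
    av_consensus: bool = False     # multi-engine AV consensus
    clamav_hit: bool = False
    yara_hit: bool = False
    validly_signed: bool = False
    nsrl_known: bool = False       # NIST known-good
    distro_managed: bool = False
    kev_cve: bool = False          # at least one CVE on the CISA KEV list
    kev_ransomware: bool = False   # that CVE has documented ransomware use


def verdict(s: Signals) -> dict:
    """Apply the rules in order; the first rule that fires sets the label."""
    if s.corpus_match or s.av_consensus:
        label = "bad"          # overrides everything, even a valid signature
    elif s.clamav_hit and not s.validly_signed:
        label = "bad"
    elif s.yara_hit and not s.validly_signed:
        label = "suspicious"   # deeper external scan advised
    elif s.validly_signed or s.nsrl_known or s.distro_managed:
        label = "trusted"      # reached only when nothing above fired
    else:
        label = "unknown"      # assumption: no signal either way
    # KEV never changes the label; it is reported alongside as a patch flag.
    patch_flag = None
    if s.kev_cve:
        patch_flag = "KEV · ransomware" if s.kev_ransomware else "KEV"
    return {"label": label, "patch_flag": patch_flag, "yara_recorded": s.yara_hit}
```

For the vulnerable-but-legitimate case, `verdict(Signals(validly_signed=True, kev_cve=True))` yields a trusted label together with a KEV patch flag, which is the "both at once" outcome.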
