Why your AI agent should not guess email addresses

Ask an agent for someone’s work email without giving it a data source and it will not stop to ask. It will infer. first.last@company.com is the most common corporate pattern, so the inference is often right — and that is exactly the problem. A guess that works 70% of the time produces a list where almost a third of the addresses are wrong, and nothing in the agent’s transcript looks like an error.

This post covers what actually goes wrong when agents pattern-guess addresses, what email verification can and cannot tell you, and the workflow that replaces guessing: verify what you have, look up what you don’t.

Where guessed addresses fail

Catch-all domains accept everything

A large share of B2B domains are configured as catch-all: the mail server returns 250 OK for any recipient at the domain, real or not. On a catch-all domain your guess is unfalsifiable before sending: q.tarantino@meridianlabs.com looks exactly as deliverable as the CEO’s real address. Mail to a wrong local part is then either silently dropped or bounced by an internal hop after acceptance, and by that point the attempt has already been logged against your sending domain.

So a guessed list can pass a naive existence check and still fail in volume on the day you send. The cheapest moment to learn an address is wrong is before sending. The most expensive is after.

Bounces are charged to your domain, not your list

When a guessed address does bounce, the receiving provider records a hard bounce (550 5.1.1, user unknown) against your sending domain and IP. Bounce rate is a primary reputation input. Most sending platforms intervene above a 2% hard-bounce rate, and Google’s bulk-sender requirements set the spam-complaint ceiling at 0.3% — there is no slack budgeted for guessing.

Do the arithmetic. At a generous 70% pattern accuracy, a 200-row guessed list contains 60 wrong addresses. If each one hard-bounces, that is a 30% bounce rate, fifteen times the usual 2% intervention threshold, in one send. The penalty does not land on that campaign. It lands on every email the domain sends afterwards, including the ones to real addresses. Recovering a burned domain means weeks of reduced volume; many teams abandon it and warm a new one instead, which takes two to four weeks before it can carry meaningful volume. One careless list costs more than looking up every address on it ever would.
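A minimal sketch of that arithmetic in shell, under the simplifying assumption that every wrong guess hard-bounces. The function name bounce_math and its arguments are illustrative and not part of any tool mentioned here; integer percentages keep it within plain shell arithmetic.

```shell
# Expected hard bounces for a guessed list, assuming every wrong guess bounces.
# Usage: bounce_math ROWS ACCURACY_PCT THRESHOLD_PCT   (all whole numbers)
bounce_math() {
  rows=$1 accuracy_pct=$2 threshold_pct=$3
  bounces=$(( rows * (100 - accuracy_pct) / 100 ))   # wrong addresses in the list
  rate_pct=$(( bounces * 100 / rows ))               # hard-bounce rate in percent
  multiple=$(( rate_pct / threshold_pct ))           # how far past the threshold
  echo "bounces=$bounces rate=${rate_pct}% multiple=${multiple}x"
}

bounce_math 200 70 2
```

With the figures from the paragraph above, this prints bounces=60 rate=30% multiple=15x.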

Patterns rot silently

A pattern guess encodes an assumption about a company’s IT history, and companies do not announce their IT history. Mail migrations flip jwarren@ to joel.warren@. Rebrands move everyone to a new domain and leave the old one on forwarders that get switched off a year later. Acquisitions merge two address books overnight. And the person you are writing to may have left 14 months ago — in which case the pattern is right, the domain is healthy, and the email still reaches nobody.

Public pattern databases lag all of these events. An LLM’s training data lags them by more.

What verification actually checks

ctc verify answers one question — will mail to this address be accepted — in under a second, for 0.02 credits. The checks run in order:

Syntax. The address parses per RFC 5321/5322.
Domain. DNS resolves and at least one MX host answers.
Mailbox. An SMTP conversation up to RCPT TO, without sending mail. A 550 means the mailbox does not exist; a 250 means the server accepts the recipient, which proves the mailbox exists only when the domain is not catch-all.
Flags. Disposable domains, role accounts (info@, billing@), catch-all detection.

$ ctc verify j.warren@brightpath.io
j.warren@brightpath.io: undeliverable (mailbox does not exist)
$ echo $?
2

That guess would have been a hard bounce. 0.02 credits to find out beats finding out in a deliverability dashboard three weeks later. Note the exit code: in single mode, verify exits with a status that encodes its verdict (0 for deliverable, 2 for undeliverable or no MX, 3 for risky/catch-all/unknown, 6 for invalid syntax), so a script branches on $? without parsing anything.
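As a sketch of what branching on the exit code can look like, the helper below maps the four codes listed above to labels. The name verdict_for_exit is hypothetical, and treating any other code as error is an assumption, since only these four codes are documented here.

```shell
# Map a single-mode `ctc verify` exit code to a verdict label.
# Codes 0, 2, 3 and 6 are the ones listed above; the fallback branch is an
# assumption for anything else (for example, the command failing to run).
verdict_for_exit() {
  case "$1" in
    0) echo deliverable ;;
    2) echo undeliverable ;;     # also covers "no MX"
    3) echo risky ;;             # risky, catch-all, or unknown
    6) echo invalid_syntax ;;
    *) echo error ;;
  esac
}

# Typical use, right after a verify call:
#   ctc verify "$EMAIL" >/dev/null; verdict=$(verdict_for_exit "$?")
verdict_for_exit 2
```

Run as written, the last line prints undeliverable, matching the j.warren@brightpath.io transcript.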

The honest limit is catch-all domains. When a server accepts every recipient, no external tool can prove a specific mailbox exists — a vendor claiming otherwise is guessing too. Verification reports the situation as what it is:

$ ctc verify anyone@meridianlabs.com
anyone@meridianlabs.com: risky (catch-all domain accepts all recipients)
$ echo $?
3

risky is a policy bucket, not a verdict. A workable policy: send to deliverable freely, route risky through a low-volume and closely watched sequence, never send to undeliverable. Because the exit code carries the verdict, the whole policy can be a shell chain: ctc verify "$EMAIL" && queue_send "$EMAIL" sends only on deliverable.
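For more than one address, the same policy might be sketched as a loop that sorts addresses into per-policy files using nothing but the exit code. The function name route_list, the output file names, and the retry bucket for unlisted exit codes are assumptions made for illustration; ctc verify is called exactly as in the transcripts above.

```shell
# Read addresses from stdin (one per line) and sort them into per-policy
# files inside an existing directory. File names are illustrative.
# Usage: route_list OUT_DIR < addresses.txt
route_list() {
  out_dir=$1
  while IFS= read -r email; do
    [ -n "$email" ] || continue                 # skip blank lines
    # </dev/null keeps the command from consuming the loop's input.
    ctc verify "$email" </dev/null >/dev/null 2>&1
    case $? in
      0)   echo "$email" >> "$out_dir/send.txt"  ;;  # deliverable: send freely
      3)   echo "$email" >> "$out_dir/watch.txt" ;;  # risky: low-volume, watched sequence
      2|6) echo "$email" >> "$out_dir/drop.txt"  ;;  # undeliverable or invalid: never send
      *)   echo "$email" >> "$out_dir/retry.txt" ;;  # assumption: operational failure
    esac
  done
}
```

This suits short lists; for whole files, the batch form of verify shown later is the better fit.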

Lookup beats inference

A guessed address is one inference from one pattern. A looked-up address is a record that was observed somewhere. ctc find runs a waterfall across 20+ premium data sources and returns the first result that clears the confidence bar, verification included:

$ ctc find "Joel Warren" brightpath.io
status: found
work_email: joel.warren@brightpath.io (deliverable, confidence 0.97)

A lookup has three properties a guess cannot have:

Provenance. The address was observed in licensed data sources, not derived from a naming convention that may have changed in 2023.
Verification built in. Results arrive with deliverability status and a confidence score, so catch-all ambiguity is priced into the number instead of hidden behind it.
An honest miss. When the waterfall finds nothing, the answer is status: not_found and the charge is 0 credits. An agent that guesses never returns not-found — it returns plausible strings.

The cost is knowable before any of it runs:

$ ctc find "Joel Warren" brightpath.io --estimate
estimated_credits: 1

And it is the worst case: the estimate is what a hit costs, and a miss is free.
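That bound is simple enough to check before a batch runs. A sketch, assuming a flat per-hit price equal to the --estimate figure and free misses as described above; lookup_budget is a hypothetical helper, and the row count and cap are example numbers.

```shell
# Worst-case spend for a batch lookup: every row is a hit.
# Prints the bound and whether it fits under a spend cap (whole credits).
# Usage: lookup_budget ROWS CREDITS_PER_HIT CAP
lookup_budget() {
  rows=$1 credits_per_hit=$2 cap=$3
  worst_case=$(( rows * credits_per_hit ))
  if [ "$worst_case" -le "$cap" ]; then
    echo "worst_case=$worst_case fits_cap=yes"
  else
    echo "worst_case=$worst_case fits_cap=no"
  fi
}

lookup_budget 120 1 150
```

For a 120-row list at 1 credit per hit and a cap of 150, this prints worst_case=120 fits_cap=yes. Actual spend can only be lower, because misses cost nothing.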

The agent workflow

For a list, the same logic is one command with a spend cap. A 120-row file processes in chunks of up to 100 rows:

$ ctc find leads.csv enriched.csv --max-cost 150
contactctl: find run 8f3d2c1a-77b4-4f0e-9a52-6c1e0b9d4f21 done (100 rows, 81.00 credits)
contactctl: find run c2a7e510-94d3-4b6f-8e21-7f0b3d9c4a55 done (20 rows, 15.00 credits)

Row order is preserved, and every row comes back annotated with contactctl_status, contactctl_work_email, and per-row cost and error columns. --max-cost is a hard ceiling, so an unattended agent cannot spend past the cap. 96 of 120 rows found means 96 credits charged; the 24 misses are free. Before any send, gate the output (verify reads the email or work_email column, so rename contactctl_work_email to point it at the found addresses):

$ sed '1s/contactctl_work_email/email/' enriched.csv > sendlist.csv
$ ctc verify sendlist.csv --json | jq 'map(.status) | group_by(.) | map({(.[0]): length}) | add'
{
  "deliverable": 91,
  "invalid_input": 24,
  "risky": 5
}

The 24 not-found rows have no email to check and come back invalid_input, free. That is 96 × 0.02 = 1.92 credits to confirm that none of the 96 found addresses is a known hard bounce. The 5 risky ones remain unproven, so they go through the catch-all policy rather than the main send. Every command also exits with deterministic codes, so the agent can branch on the result instead of parsing prose.
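The verification bill is the same kind of arithmetic: only rows that carry an address are charged. A sketch using awk for the decimal math; verify_cost is a hypothetical helper, and the 0.02 price is the single-check figure quoted earlier, assumed here to apply per row in batch mode as the 1.92 total implies.

```shell
# Cost of verifying N addresses at a per-check price, to two decimals.
# Usage: verify_cost N PRICE
verify_cost() {
  awk -v n="$1" -v price="$2" 'BEGIN { printf "%.2f\n", n * price }'
}

# 120 rows in the file, 24 of them with no address to check.
charged_rows=$(( 120 - 24 ))
verify_cost "$charged_rows" 0.02
```

With the counts from the run above, this prints 1.92.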

Put it in the system prompt

The durable fix is three lines in your agent’s standing instructions:

Never construct an email address from a name and a domain.
Resolve addresses with `ctc find`; gate every send on `ctc verify`.
Treat catch-all results as a separate sending policy, not as valid.

Guessing feels free because the cost arrives later, on a different ledger — your domain’s reputation instead of your credit balance. Verification at 0.02 credits, and lookups that charge only on success, move that cost back to where an agent can see it, cap it, and decide. Agents make good decisions about costs they can measure, and bad ones about costs they can’t.