FieldView

A self-calibrating intelligence engine for enterprise sales — and a working proof of what disciplined data analytics looks like.

The one-line version: FieldView captures everything a sales team does automatically, predicts which deals will be won with a machine-learning model it trains on itself, explains every prediction, and is disciplined enough to tell you when it doesn't have enough data to be sure.

Prisma data models

Analytics engines

ML features · explainable

weekly

Recalibration cadence

self-grading

Closed learning loop

At a glance

What it is

A presales / sales-engineering operating system running predictive intelligence on auto-captured CRM + calendar activity

Data models

84 Prisma models, every one multi-tenant isolated

Intelligence

50+ distinct analytics engines across 9 families, running on every sync

The ML core

Logistic-regression win-probability model — 12 features, cross-validated, Brier-scored, fully explainable, trained per customer

The loop

Predict → expose → act → grade → recalibrate weekly. A closed learning loop, not a static rules engine

The discipline

Append-only ledgers, sample floors, confirmation gates, cold-state disclosure — all enforced by tests in CI

Stack

Next.js 16, React 19, Prisma 7, Postgres (Neon), OpenAI + AWS Bedrock, deployed on Vercel

Test suite

~530 test files — most of them architecture-enforcing guards, not happy-path checks

§1 · The thesis

What makes this genuinely hard

Most "AI for X" products are a thin model bolted onto a form. FieldView made three structural commitments that force real inference everywhere — the same three that separate a serious clinical-analytics product from a dashboard.

① ZERO ENTRY

The signal is derived, never typed

Required fields are forbidden. There's no "rate this deal's health" form to fall back on — every score, flag, and prediction is inferred from passively captured activity. The moment users hand-enter the truth, data quality collapses to whatever they bothered to type. So the system computes the truth from behavior.

② GRADES ITSELF

The product corrects its own model

It records every prediction, watches what the operator did about it, watches what actually happened — then retunes its own decision thresholds every week based on whether it was right. The model running next month is not the one running today; reality corrected it in between.

③ REFUSES TO LIE

It stays quiet when the data is thin

When data is insufficient it renders insufficient sample — 12 of 40 instead of a screenshot-able percentage. Every learning rollup is gated behind operator confirmation, so the model never trains on its own guesses. Enforced by append-only ledgers, hard sample floors, and static-analysis tests that fail the build.

§2 · The data substrate

An auto-capture pipeline in 8 layers

Everything downstream feeds off a sync that runs every 15 minutes, per customer, pulling from Salesforce and Google Calendar and writing a normalized activity graph. No human touches it.

Sync engine every 15 min · atomically locked

Salesforce (Accounts, Opportunities, Events, Tasks) + Google Calendar

Matching engine 3-tier confidence

Stamps every activity HIGH / MEDIUM / LOW / UNMATCHED

Intelligence engines ~50 in sequence

The analytics layer — health, risk, behavioral, prescriptive — see §3

Lifecycle tracker

Auto-detect evaluation-stage deals, stall scan

Evaluation workspace

Evidence tree, validation status

Learning loop

Predict → expose → act → grade → calibrate

Ritual & briefs

Weekly ritual, automated executive briefs

Dashboards

Operator cockpit, leadership, deal room

Layer 2 — the matching-tier decision

TIER 1 · HIGH

Direct link

CRM foreign-key link — unambiguous

TIER 2 · MEDIUM

Domain → account

Attendee email domain → best-fit deal. Downgrades to LOW when 3+ candidates

TIER 3 · LOW

Name match

Exact name (≥5 chars, no substring)

UNMATCHED

No confident link

Held out of scoring rather than guessed

Atomic locking

Before any work, the orchestrator atomically claims the job — SET status = SYNCING WHERE status = IDLE. If zero rows update, another sync is already running and it backs off cleanly. Stuck syncs older than 10 minutes auto-recover on the next tick. No double-processing, ever.

Stage isolation

From Layer 5 onward, every stage is wrapped in its own try/catch. A failure in the matching engine cannot abort scoring, summaries, or detection downstream. This is a rule enforced by CI — not a convention someone might forget.

§3 · The breadth

50+ analytics engines on every sync

Each engine is a mostly-pure function: it reads the deal plus its activity graph, computes a value, and writes it back so the UI renders from a single query. This is the part that makes people say "wait — this isn't a CRM skin." Hover any tile to flip it and read the method.

A representative catalog grouped by the six engine families — every name is drawn directly from the live codebase.

§4 · The depth

A real, explainable, calibrated ML model

The centerpiece — and deliberately not a black box. The win-probability engine is pure logistic regression written from scratch: no external ML library, no opaque service. Every coefficient and threshold is auditable in source.

The model

// logistic regression, batch gradient descent + L2 sigmoid(x) = 1 / (1 + e^(−x)) // clipped ±500 P(win) = sigmoid( w · x_norm + b ) learning rate η = 0.1 L2 lambda λ = 0.1 max iterations = 500 w[j] −= η·( gradW[j]/n + λ·w[j] ) b −= η·( gradB/n )

Persisted with its own honesty metrics

{ "accuracy": 75, // cross-validated % "brierScore": 0.18, // 0 = perfect "trainingSize": 40, "wonCount": 18, "lostCount": 22, "winRate": 45, // base rate % "trainedAt": "2026-05-01" }

The 12-feature vector

#	Feature	Meaning	Impute
0	healthScore	Composite deal health	50
1	threadingScore	Stakeholder coverage	30
2	championHealth	Key-contact engagement	40
3	committeeScore	Buying-group breadth	30
4	dysfunctionScore	Decision friction	20
5	technicalDebt	Unresolved eval gap	30
6	activityCount30d	Meetings + calls (30d)	3
7	meetingCount30d	Meetings only (30d)	2
8	closeDatePush	Timeline slippage	0
9	uniqueStakeholders	Distinct externals	2
10	stageProgress	Pipeline position	0.35
11	totalActivities	Lifetime engagement	5

Every feature is min-max normalized to [0,1] against stats persisted with the model. Out-of-range inputs clip to the training range; missing values impute to the training median.

The discipline in training

Refuses to train on too little. Requires ≥30 closed deals with full outcome snapshots, AND ≥8 won AND ≥8 lost — a balanced set. It then rejects its own trained model if accuracy doesn't beat the naïve majority-class base rate. A model no better than guessing the common case never ships.

Cross-validated & per-customer. Leave-one-out below 50 samples, 5-fold above — folds seeded with a deterministic PRNG so every run is reproducible. Each organization trains its own model on its own outcomes. No shared weights. True multi-tenant isolation.

Every prediction is decomposable

The UI shows the top forces pushing toward a win and toward a loss, rebuilt live at read time so the explanation always matches the deal's current state: contribution[i] = weight[i] × (norm[i] − 0.5)

P(win) · this deal

◀ pushing toward losspushing toward win ▶

The immutable prediction ledger

Every sync writes one immutable row per deal per UTC day — the probability and the exact feature vector that produced it. You can reconstruct any historical prediction from the honest inputs that existed at the time. Never recomputed retroactively, never silently revised. Batch inference eliminated an N+1 — one grouped read per aggregate instead of five queries per deal.

WinProbSnapshot winProbability Float // [0,1], 2dp featureVector Json // frozen at prediction time recordedDate DateTime // midnight UTC = dedup anchor @@unique([orgId, opportunityId, recordedDate])

§5 · The loop

Predict → expose → act → grade → recalibrate

The moat — what separates "a model" from "a system that gets smarter." FieldView closes the loop on every prediction it makes. Watch the token travel; it cannot reach grading until the confirmation gate stamps it.

①

Predict

Logged with type, value, timestamp, numeric metadata — HEALTH_SCORE, STALL_ALERT, RISK_SIGNAL, TRAJECTORY…

②

Expose

Every time a prediction is shown: surface, mode, dwell time. 30-min dedup debounce.

③

Act

The move the operator executed, with observed result (HELPED / NO_CLEAR_EFFECT…).

④

Infer

A passive cron links action → outcome in fixed windows: action ≤48h, outcome ≤10d.

⑤

Confirmation gate

An inferred link only counts toward learning if operator-CONFIRMED. Auto-grading noise can never pollute truth.

⑥

Grade

correct / correct_but_mitigated / wrong / partial / not_judgeable.

⑦

Calibrate

Weekly sweep retunes each threshold, logs every attempt to an append-only ledger. ⟲

The windows & gates

Mechanism	Value
Action inference window	48h (2h warm-up)
Outcome inference window	10d
Exposure dedup debounce	30 min
Inference batch size	500 rows
Multi-signal attribution	most-recent prior exposure wins

Per-type grading windows

Prediction type	Judge	Stale	Mitig.
HEALTH_SCORE	14d	90d	14d
STALL_ALERT	14d	60d	14d
RISK_SIGNAL	14d	60d	14d
TRAJECTORY	14d	90d	14d
POV_SIGNAL	21d	90d	21d
SYNTHESIS_PATTERN	30d	120d	21d

The clever grade — correct_but_mitigated. Suppose the engine fires a pessimistic warning (AT_RISK, STALLED) and the deal then wins. Naïvely that's "wrong." But if the operator completed a follow-up inside the mitigation window and the health score climbed ≥10 points, the system credits the warning: the alarm was right, and acting on it is why the outcome flipped.

accuracy = (correct + correct_but_mitigated) / judgeable

correct correct_but_mitigated wrong partial not_judgeable · never in a denominator

A system that punished its own true-positive warnings for being acted upon would train itself to stop warning. This one doesn't. The clinical parallel is exact: an early-deterioration alert that triggers an intervention and prevents harm is a success, not a false alarm.

§6 · The conscience

Calibration & the honesty architecture

If there's one section to linger on with a healthcare CEO, it's this. Anyone can ship a number. The trust-defining engineering is knowing when you're not allowed to.

The weekly calibration sweep

Click a lime node — a week where a threshold actually moved — to see previous → proposed → final (clamped). Grey nodes are sweeps that ran and held steady; the ledger records those too.

Governed threshold registry

Threshold	Default	Min	Max	Floor
HEALTH_SCORE	30	15	50	20
STALL_ALERT	14d	7	28	20
CHAMPION_GONE	14d	7	28	15
STAGE_STAGNATION	30d	14	60	15
FOLLOW_UP_GAP	5d	3	14	10

Adopts a new threshold only if it differs and strictly beats the incumbent. Ties never displace what's already there. No customer-specific threshold publishes below ≥20 graded outcomes — until then the UI shows a transparent cold state.

Discipline, enforced by CI

Sample floors everywhere. MIN_FOR_PUBLISHED_RATE = 40 (±15pp), MIN_FOR_TIGHT_RATE = 96 (±10pp, hero metrics), RESOLVED_PREDICTION_FLOOR = 150 before any accuracy claim. Below floor → a cold state, never a fabricated number.

Correlational language only. Precedent citations say "followed by progress in N of M cases" — never "worked" or "caused." A banned-phrase test fails the build on any causal claim the data can't support.

Append-only is real. Eleven ledgers are never updated or deleted. State changes use explicit "mark, don't remove" columns.

Static analysis is architecture. A test grep-walks every server action and fails CI if any query omits the tenant ID — the structural guarantee one customer can never see another's data.

Two append-only ledgers, by design

ThresholdHistory

Records only the thresholds that changed: previous → proposed → final (clamped). Idempotent on (orgId, sweepId, thresholdKey). An auditable lineage of how the engine retuned itself over time.

CalibrationSweepHistory

Records every sweep — including "ran, nothing changed" — with a deterministic narrative headline. Timeline honesty: you can prove it considered a change and declined, not that it was idle.

Real sweep headlines from the code: 6 thresholds evaluated · 2 changed · 1 clamped · no material changes · insufficient resolved outcomes (12) — skipped

§7 · The governance layer

Cost, memory, and grounding

Three subsystems keep the AI features honest, cheap, and consistent — each built to fail safe rather than fail loud.

LLM cost — metered, fail-closed

Every model call is metered to an append-only ledger — model, stable label, outcome, token counts, cost — and never stores prompt or response text. Cost is tracked in integer micro-dollars (1 USD = 1,000,000 µUSD) to avoid float drift.

Fail-closed daily breaker: before each call it sums today's spend against budget. Over? The call is refused before any money is spent and logged BUDGET_BLOCKED. Soft-fail — it returns nothing rather than throwing, so a budget stop never breaks the UI.

Model	Input µUSD/1M	Output µUSD/1M
gpt-4o-mini	150,000	600,000
gpt-4o	2,500,000	10,000,000

Unknown models fall back to gpt-4o-mini pricing (conservative). External text is fenced in <source_data> tags with an untrusted-source notice — prompt-injection hardening.

Memory arbitration — one truth per screen

Seven independent composers each produce one candidate line; a single deterministic arbiter picks exactly one to speak (or stays silent). No scoring, no ML — a strict precedence cascade that guarantees two contradictory "truths" never appear on one screen.

▼ exactly one truth per surface

POV claim verification — 3-path grounded fallback

When a sales engineer verifies a customer-facing claim ("does it support SAML SCIM?"), the verifier tries three paths in order, so the result is never worse than the plainest one.

①

OpenAI web search — a single query against the current online docs, grounded in the source URLs the model cites.

②

Bedrock RAG — KB retrieve → cohere.rerank-v3-5 → quality gate (MIN_USEFUL_SCORE 0.15) → grounded synthesis with citations.

③

OpenAI-direct fallback — parametric answer, empty sources, explicitly stamped MODEL_KNOWLEDGE_NOTICE so it's never mistaken for a grounded verdict.

The verdict set is deliberately graded — not a binary:

VERIFIED PARTIALLY_VERIFIED NOT_VERIFIED CONTRADICTED INSUFFICIENT_EVIDENCE NEEDS_HUMAN_REVIEW

§8 · The executive layer

Analytics that roll up to a decision

The intelligence rolls up into one trustworthy number. The flagship is the forecast-truth waterfall: it discounts raw pipeline down to what's actually been validated by evidence, first-match-wins, each step subtracting from the remaining dollars.

Forecast-truth waterfall

Quality-adjusted haircut tightening 31% → 23% over 12 weeks. The same projectForecastTruthWaterfall helper draws this on the dashboard and persists it to the weekly ledger — byte-identical, so the live view can never drift from the record.

Metric

12 wks ago

Last week

Raw pipeline

$8.4M

$9.6M

Quality-adjusted

$5.8M

$7.4M

Buyer-validated

$1.9M

$4.6M

Unmatched signoff

Customer-matched

Deals at risk

$4.5M

$3.0M

Trust is earned, not asserted

An org climbs a 4-tier ladder; the strength of language the product is allowed to use scales with it. Each rung is gated on real volume.

Buildingcold start

Observation only

Directional≥25 closed · ≥50 graded

"Directional" numeric hints

Trusted≥40 closed · ≥150 graded · ≥90d

Published accuracy claims

Compounding≥100 · ≥250 · ≥180d · 7 hygiene gates

Full moat narrative

hygiene gate at risk · 14-day grace, then downgrade

Even the credibility of the system's own self-description is governed — a compounding org that lets a gate slip gets a soft warning, then a hard downgrade after 60 days below floor.

"89% of the 4-week-old forecast has landed or is still on track; the quality haircut eased 2.1pp to 14% — the discount on raw pipeline is shrinking; prediction trust tier: Trusted." — the forecast-honesty line, narrated in one executive sentence.

§9 · Why this matters for telemedicine

The same machine, a clinical domain

Strip away the sales domain and look at the machine. FieldView is structurally a real-world-signal-to-trusted-prediction engine with a self-correcting feedback loop and a conscience. That is exactly the shape of a serious telehealth analytics platform. The mapping is one-to-one.

What FieldView does

The direct telemedicine analog

The takeaway: the genuinely hard problems in clinical analytics aren't "can you train a model" — they're can you make a prediction someone will safely act on, prove what you predicted and why, score yourself honestly, and know when to stay quiet. FieldView is a shipped demonstration that the person who built it has already solved all four.

Note on scope, kept honest: FieldView itself is a sales product and makes no HIPAA / SOC 2 / clinical-compliance claims. What transfers is the architecture and the discipline — the predictive pipeline, the calibration loop, the audit ledgers, the honesty gates — not a compliance certification.

§10 · Under the hood

Engineering rigor, for the skeptics

Multi-tenancy is absolute

All 84 models carry a tenant ID; every query filters on it; raw SQL is banned; a static-analysis test grep-walks every server action and fails CI if any query omits the tenant scope.

Encryption at rest

All third-party OAuth tokens (Salesforce, Google, Slack) are AES-256-GCM encrypted; decryption throws if the key is unset in production.

Guarded server actions

Mutations route through guarded server actions, each calling requireSession() / requireAdmin() / requireOwner() — not loose API endpoints.

Hydration discipline

No wall-clock or randomness in client render paths; all locale formatting pinned to UTC; each rule backed by a regression test born from a real production incident.

8 scheduled crons

15-min sync, hourly exposure inference (:17), hourly outcome grader (:30), hourly reality-check sweep (:42), hourly digest, a daily brief (14:00 UTC), a Friday brief, and a weekly leadership ledger (Mon 06:00).

The stack

Next.js 16 (App Router), React 19, Prisma 7 over Neon Postgres, NextAuth 5, OpenAI + AWS Bedrock, Tailwind v4, Sentry — on Vercel. ~530 test files, most architecture-enforcing guards.

Predictive depth, plus the discipline to be honest about uncertainty.

FieldView captures its own data, trains its own model per customer, explains every prediction, watches what happens next, grades itself, recalibrates weekly, and — the part almost nobody builds — refuses to state a number it hasn't earned. That combination is the whole game in clinical analytics. This page is the proof it's already been built.

Every number, formula, threshold & schema on this page is extracted from live, shipped code