FieldView

A self-calibrating intelligence engine for enterprise sales — and a working proof of what disciplined data analytics looks like.
The one-line version: FieldView captures everything a sales team does automatically, predicts which deals will be won with a machine-learning model it trains on itself, explains every prediction, and is disciplined enough to tell you when it doesn't have enough data to be sure.
84
Prisma data models
50+
Analytics engines
12
ML features · explainable
weekly
Recalibration cadence
self-grading
Closed learning loop
At a glance
What it is
A presales / sales-engineering operating system running predictive intelligence on auto-captured CRM + calendar activity
Data models
84 Prisma models, every one multi-tenant isolated
Intelligence
50+ distinct analytics engines across 9 families, running on every sync
The ML core
Logistic-regression win-probability model — 12 features, cross-validated, Brier-scored, fully explainable, trained per customer
The loop
Predict → expose → act → grade → recalibrate weekly. A closed learning loop, not a static rules engine
The discipline
Append-only ledgers, sample floors, confirmation gates, cold-state disclosure — all enforced by tests in CI
Stack
Next.js 16, React 19, Prisma 7, Postgres (Neon), OpenAI + AWS Bedrock, deployed on Vercel
Test suite
~530 test files — most of them architecture-enforcing guards, not happy-path checks
§1 · The thesis

What makes this genuinely hard

Most "AI for X" products are a thin model bolted onto a form. FieldView made three structural commitments that force real inference everywhere — the same three that separate a serious clinical-analytics product from a dashboard.

① ZERO ENTRY

The signal is derived, never typed

Required fields are forbidden. There's no "rate this deal's health" form to fall back on — every score, flag, and prediction is inferred from passively captured activity. The moment users hand-enter the truth, data quality collapses to whatever they bothered to type. So the system computes the truth from behavior.

② GRADES ITSELF

The product corrects its own model

It records every prediction, watches what the operator did about it, watches what actually happened — then retunes its own decision thresholds every week based on whether it was right. The model running next month is not the one running today; reality corrected it in between.

③ REFUSES TO LIE

It stays quiet when the data is thin

When data is insufficient it renders insufficient sample — 12 of 40 instead of a screenshot-able percentage. Every learning rollup is gated behind operator confirmation, so the model never trains on its own guesses. Enforced by append-only ledgers, hard sample floors, and static-analysis tests that fail the build.

§2 · The data substrate

An auto-capture pipeline in 8 layers

Everything downstream feeds off a sync that runs every 15 minutes, per customer, pulling from Salesforce and Google Calendar and writing a normalized activity graph. No human touches it.

01 · Sync engine (every 15 min · atomically locked)
Salesforce (Accounts, Opportunities, Events, Tasks) + Google Calendar
02 · Matching engine (3-tier confidence)
Stamps every activity HIGH / MEDIUM / LOW / UNMATCHED
03 · Intelligence engines (~50 in sequence)
The analytics layer (health, risk, behavioral, prescriptive); see §3
04 · Lifecycle tracker
Auto-detect evaluation-stage deals, stall scan
05 · Evaluation workspace
Evidence tree, validation status
06 · Learning loop
Predict → expose → act → grade → calibrate
07 · Ritual & briefs
Weekly ritual, automated executive briefs
08 · Dashboards
Operator cockpit, leadership, deal room

Layer 2 — the matching-tier decision

TIER 1 · HIGH
Direct link
CRM foreign-key link — unambiguous
TIER 2 · MEDIUM
Domain → account
Attendee email domain → best-fit deal. Downgrades to LOW when 3+ candidates
TIER 3 · LOW
Name match
Exact name (≥5 chars, no substring)
UNMATCHED
No confident link
Held out of scoring rather than guessed
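
The tier decision above can be sketched as a single pure function. This is a minimal illustration, not the shipped matcher: the input shape, the field names, and the choice to compare deal names against the activity title (case-insensitively) are all assumptions made for the example.

```typescript
type MatchTier = "HIGH" | "MEDIUM" | "LOW" | "UNMATCHED";

// Hypothetical input shape; the real matcher's inputs are not shown on this page.
interface MatchInput {
  crmLinkedDealId: string | null;   // direct CRM foreign-key link, if present
  domainCandidateDealIds: string[]; // deals reachable via an attendee email domain
  activityTitle: string;            // assumed source for the name match
  dealNames: string[];
}

function matchTier(input: MatchInput): MatchTier {
  // Tier 1: an explicit CRM link is unambiguous.
  if (input.crmLinkedDealId !== null) return "HIGH";

  // Tier 2: domain match, downgraded to LOW when 3+ deals compete.
  const candidates = input.domainCandidateDealIds.length;
  if (candidates >= 3) return "LOW";
  if (candidates >= 1) return "MEDIUM";

  // Tier 3: exact name equality only (no substring), names of 5+ characters.
  const title = input.activityTitle.trim().toLowerCase();
  const exact = input.dealNames.some((name) => {
    const n = name.trim().toLowerCase();
    return n.length >= 5 && n === title;
  });
  return exact ? "LOW" : "UNMATCHED";
}
```

Returning UNMATCHED as an ordinary value, rather than forcing a best guess, is what lets downstream scoring skip the activity.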

Atomic locking

Before any work, the orchestrator atomically claims the job — SET status = SYNCING WHERE status = IDLE. If zero rows update, another sync is already running and it backs off cleanly. Stuck syncs older than 10 minutes auto-recover on the next tick. No double-processing, ever.
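
A minimal sketch of the claim-then-recover pattern described above. An in-memory map stands in for the database row, and the function names are hypothetical; in a real deployment the claim would be one conditional UPDATE whose affected-row count plays the role of the return value here.

```typescript
type SyncStatus = "IDLE" | "SYNCING";
interface SyncJob { status: SyncStatus; startedAt: number | null }

const STUCK_AFTER_MS = 10 * 60 * 1000; // stuck syncs recover after 10 minutes

// Stand-in for: SET status = SYNCING WHERE status = IDLE.
// Returns the number of rows "updated" (0 or 1).
function claim(jobs: Map<string, SyncJob>, orgId: string, now: number): number {
  const job = jobs.get(orgId);
  if (!job || job.status !== "IDLE") return 0;
  job.status = "SYNCING";
  job.startedAt = now;
  return 1;
}

// Release any claim older than the stuck threshold so the next tick can proceed.
function recoverStuck(jobs: Map<string, SyncJob>, now: number): void {
  jobs.forEach((job) => {
    if (job.status === "SYNCING" && job.startedAt !== null && now - job.startedAt > STUCK_AFTER_MS) {
      job.status = "IDLE";
      job.startedAt = null;
    }
  });
}

function tick(jobs: Map<string, SyncJob>, orgId: string, now: number): "claimed" | "backed_off" {
  recoverStuck(jobs, now);
  return claim(jobs, orgId, now) === 1 ? "claimed" : "backed_off";
}
```

Because the check and the write are one step, two orchestrators can never both observe IDLE and both proceed.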

Stage isolation

From Layer 5 onward, every stage is wrapped in its own try/catch. A failure in the matching engine cannot abort scoring, summaries, or detection downstream. This is a rule enforced by CI — not a convention someone might forget.
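
One way to picture per-stage isolation is a runner that catches each stage's failure and records it instead of rethrowing. The `Stage` and `StageResult` shapes below are illustrative assumptions, not the project's actual types.

```typescript
interface Stage { name: string; run: () => Promise<void> }
interface StageResult { name: string; ok: boolean; error?: string }

// Runs stages in order; a throw in one stage is recorded and the next stage still runs.
async function runIsolated(stages: Stage[]): Promise<StageResult[]> {
  const results: StageResult[] = [];
  for (const stage of stages) {
    try {
      await stage.run();
      results.push({ name: stage.name, ok: true });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      results.push({ name: stage.name, ok: false, error: message });
    }
  }
  return results;
}
```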

§3 · The breadth

50+ analytics engines on every sync

Each engine is a mostly-pure function: it reads the deal plus its activity graph, computes a value, and writes it back so the UI renders from a single query. This is the part that makes people say "wait, this isn't a CRM skin."

A representative catalog grouped into six engine families; every name is drawn directly from the live codebase.

§4 · The depth

A real, explainable, calibrated ML model

The centerpiece — and deliberately not a black box. The win-probability engine is pure logistic regression written from scratch: no external ML library, no opaque service. Every coefficient and threshold is auditable in source.

The model

// logistic regression, batch gradient descent + L2
sigmoid(x) = 1 / (1 + e^(-x))        // clipped ±500
P(win) = sigmoid( w · x_norm + b )
learning rate η = 0.1
L2 lambda λ = 0.1
max iterations = 500
w[j] -= η·( gradW[j]/n + λ·w[j] )
b -= η·( gradB/n )
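
The update rule above translates almost line for line into code. The sketch below is a from-scratch reconstruction using the stated hyperparameters, not the shipped source; zero initialization and the function names are assumptions.

```typescript
interface LogisticModel { weights: number[]; bias: number }

// Sigmoid with the input clipped to ±500 so Math.exp cannot overflow.
function sigmoid(x: number): number {
  const z = Math.max(-500, Math.min(500, x));
  return 1 / (1 + Math.exp(-z));
}

function predict(model: LogisticModel, x: number[]): number {
  let z = model.bias;
  model.weights.forEach((w, j) => { z += w * (x[j] ?? 0); });
  return sigmoid(z);
}

// Batch gradient descent with an L2 penalty on the weights (not on the bias).
function train(X: number[][], y: number[], eta = 0.1, lambda = 0.1, maxIter = 500): LogisticModel {
  const n = X.length;
  const d = X[0]?.length ?? 0;
  const model: LogisticModel = { weights: new Array<number>(d).fill(0), bias: 0 };
  if (n === 0) return model;

  for (let iter = 0; iter < maxIter; iter++) {
    const gradW = new Array<number>(d).fill(0);
    let gradB = 0;
    X.forEach((row, i) => {
      const err = predict(model, row) - (y[i] ?? 0); // p - y
      row.forEach((v, j) => { gradW[j] = (gradW[j] ?? 0) + err * v; });
      gradB += err;
    });
    model.weights = model.weights.map((w, j) => w - eta * ((gradW[j] ?? 0) / n + lambda * w));
    model.bias -= eta * (gradB / n);
  }
  return model;
}
```

With λ = 0.1 the weights stay deliberately small, which trades some fit for stability on training sets of a few dozen deals.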

Persisted with its own honesty metrics

{
  "accuracy": 75,        // cross-validated %
  "brierScore": 0.18,    // 0 = perfect
  "trainingSize": 40,
  "wonCount": 18,
  "lostCount": 22,
  "winRate": 45,         // base rate %
  "trainedAt": "2026-05-01"
}
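
For readers unfamiliar with the Brier score: it is the mean squared gap between each predicted probability and the 0/1 outcome, so 0 is perfect and a constant 0.5 guess scores 0.25. A small sketch (the helper names are hypothetical) that also reproduces the base rate in the record above:

```typescript
// Mean squared error between predicted probabilities and binary outcomes (1 = won, 0 = lost).
function brierScore(probs: number[], outcomes: number[]): number {
  if (probs.length === 0 || probs.length !== outcomes.length) {
    throw new Error("probs and outcomes must be non-empty and the same length");
  }
  let sum = 0;
  probs.forEach((p, i) => {
    const diff = p - (outcomes[i] ?? 0);
    sum += diff * diff;
  });
  return sum / probs.length;
}

// Base win rate as a whole-number percentage.
function winRatePct(wonCount: number, lostCount: number): number {
  return Math.round((wonCount / (wonCount + lostCount)) * 100);
}
```

With 18 wins and 22 losses, `winRatePct` gives 45, matching the `winRate` field shown.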

The 12-feature vector

# | Feature | Meaning | Impute
0 | healthScore | Composite deal health | 50
1 | threadingScore | Stakeholder coverage | 30
2 | championHealth | Key-contact engagement | 40
3 | committeeScore | Buying-group breadth | 30
4 | dysfunctionScore | Decision friction | 20
5 | technicalDebt | Unresolved eval gap | 30
6 | activityCount30d | Meetings + calls (30d) | 3
7 | meetingCount30d | Meetings only (30d) | 2
8 | closeDatePush | Timeline slippage | 0
9 | uniqueStakeholders | Distinct externals | 2
10 | stageProgress | Pipeline position | 0.35
11 | totalActivities | Lifetime engagement | 5

Every feature is min-max normalized to [0,1] against stats persisted with the model. Out-of-range inputs clip to the training range; missing values impute to the training median.
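
The normalization step reads naturally as impute, then clip, then scale. A minimal sketch follows; the `FeatureStats` shape is an assumption, and returning 0 for a degenerate range (min equal to max) is a choice made for the example rather than documented behavior.

```typescript
// Per-feature statistics assumed to be persisted alongside the trained model.
interface FeatureStats { min: number; max: number; median: number }

function normalizeFeature(value: number | null | undefined, stats: FeatureStats): number {
  // 1. Impute: missing or NaN inputs fall back to the training median.
  const raw = value == null || Number.isNaN(value) ? stats.median : value;
  // Degenerate range: every training value was identical (assumed handling).
  if (stats.max === stats.min) return 0;
  // 2. Clip to the range seen in training.
  const clipped = Math.min(stats.max, Math.max(stats.min, raw));
  // 3. Scale to [0, 1].
  return (clipped - stats.min) / (stats.max - stats.min);
}
```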

The discipline in training

Refuses to train on too little. Requires ≥30 closed deals with full outcome snapshots, AND ≥8 won AND ≥8 lost — a balanced set. It then rejects its own trained model if accuracy doesn't beat the naïve majority-class base rate. A model no better than guessing the common case never ships.
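
The two gates in that paragraph, one before training and one after, can be sketched as follows. The function names and the `GateResult` shape are hypothetical; the thresholds are the ones stated above.

```typescript
type GateResult = { ok: true } | { ok: false; reason: string };

const MIN_CLOSED = 30;
const MIN_PER_CLASS = 8;

// Pre-training gate: enough closed deals, and enough of each outcome.
function canTrain(wonCount: number, lostCount: number): GateResult {
  if (wonCount + lostCount < MIN_CLOSED) return { ok: false, reason: "fewer than 30 closed deals" };
  if (wonCount < MIN_PER_CLASS) return { ok: false, reason: "fewer than 8 won deals" };
  if (lostCount < MIN_PER_CLASS) return { ok: false, reason: "fewer than 8 lost deals" };
  return { ok: true };
}

// Post-training gate: cross-validated accuracy must strictly beat always guessing the majority class.
function beatsBaseline(accuracyPct: number, wonCount: number, lostCount: number): boolean {
  const baselinePct = (Math.max(wonCount, lostCount) / (wonCount + lostCount)) * 100;
  return accuracyPct > baselinePct;
}
```

For the persisted example (18 won, 22 lost, 75% accuracy) the majority-class baseline is 55%, so that model clears the bar.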

Cross-validated & per-customer. Leave-one-out below 50 samples, 5-fold above — folds seeded with a deterministic PRNG so every run is reproducible. Each organization trains its own model on its own outcomes. No shared weights. True multi-tenant isolation.
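
Reproducible folds only need a seeded generator and a deterministic shuffle. The sketch below uses mulberry32, a small public-domain PRNG, as a stand-in; the page does not say which generator the code uses, and treating exactly 50 samples as 5-fold is also an assumption.

```typescript
// mulberry32: a tiny deterministic PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Leave-one-out below 50 samples, 5-fold otherwise. Returns held-out index sets.
function makeFolds(n: number, seed: number): number[][] {
  const k = n < 50 ? n : 5;
  const idx: number[] = [];
  for (let i = 0; i < n; i++) idx.push(i);

  // Fisher-Yates shuffle driven by the seeded generator.
  const rand = mulberry32(seed);
  for (let i = n - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = idx[i] as number;
    idx[i] = idx[j] as number;
    idx[j] = tmp;
  }

  const folds: number[][] = [];
  for (let f = 0; f < k; f++) folds.push([]);
  idx.forEach((sample, pos) => { folds[pos % k]?.push(sample); });
  return folds;
}
```

Calling `makeFolds` twice with the same seed yields identical folds, which is what makes a reported accuracy reproducible.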

Every prediction is decomposable

The UI shows the top forces pushing toward a win and toward a loss, rebuilt live at read time so the explanation always matches the deal's current state:

contribution[i] = weight[i] × (norm[i] − 0.5)
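
A sketch of how that decomposition could be turned into the two ranked lists. The formula is the one stated above; the function name, the `topN` parameter, and the output shape are illustrative assumptions.

```typescript
interface Contribution { feature: string; value: number }

function explainPrediction(
  names: string[],
  weights: number[],
  norm: number[],
  topN = 3,
): { towardWin: Contribution[]; towardLoss: Contribution[] } {
  // Centering at 0.5 means a mid-range feature contributes nothing either way.
  const all: Contribution[] = names.map((feature, i) => ({
    feature,
    value: (weights[i] ?? 0) * ((norm[i] ?? 0.5) - 0.5),
  }));
  const towardWin = all.filter((c) => c.value > 0).sort((a, b) => b.value - a.value).slice(0, topN);
  const towardLoss = all.filter((c) => c.value < 0).sort((a, b) => a.value - b.value).slice(0, topN);
  return { towardWin, towardLoss };
}
```

Note that a negative weight on a high-valued feature (for example, heavy decision friction) lands in `towardLoss`, which is the behavior an operator would expect to see.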

[Interactive chart: P(win) for this deal, with feature contributions plotted as pushing toward loss (left) or pushing toward win (right)]

The immutable prediction ledger

Every sync writes one immutable row per deal per UTC day — the probability and the exact feature vector that produced it. You can reconstruct any historical prediction from the honest inputs that existed at the time. Never recomputed retroactively, never silently revised. Batch inference eliminated an N+1 — one grouped read per aggregate instead of five queries per deal.

WinProbSnapshot
  winProbability  Float     // [0,1], 2dp
  featureVector   Json      // frozen at prediction time
  recordedDate    DateTime  // midnight UTC = dedup anchor
  @@unique([orgId, opportunityId, recordedDate])
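
The dedup anchor is easiest to see in code. In this sketch a `Map` stands in for the table and its unique constraint, and a later write on the same UTC day is skipped (first write wins); that conflict policy and the helper names are assumptions, since the page only states that rows are immutable.

```typescript
interface WinProbSnapshot {
  orgId: string;
  opportunityId: string;
  recordedDate: string;             // ISO timestamp at midnight UTC
  winProbability: number;           // [0, 1], rounded to 2 decimal places
  featureVector: readonly number[]; // frozen copy of the inputs
}

// Truncate any instant to midnight UTC of the same calendar day.
function utcMidnight(at: Date): Date {
  return new Date(Date.UTC(at.getUTCFullYear(), at.getUTCMonth(), at.getUTCDate()));
}

// Returns true if a row was written, false if today's row already exists.
function recordSnapshot(
  ledger: Map<string, WinProbSnapshot>,
  orgId: string,
  opportunityId: string,
  at: Date,
  probability: number,
  features: number[],
): boolean {
  const recordedDate = utcMidnight(at).toISOString();
  const key = `${orgId}|${opportunityId}|${recordedDate}`; // mirrors the @@unique tuple
  if (ledger.has(key)) return false;
  ledger.set(key, {
    orgId,
    opportunityId,
    recordedDate,
    winProbability: Math.round(probability * 100) / 100,
    featureVector: Object.freeze(features.slice()),
  });
  return true;
}
```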
§5 · The loop

Predict → expose → act → grade → recalibrate

The moat: what separates "a model" from "a system that gets smarter." FieldView closes the loop on every prediction it makes. A prediction cannot reach grading until the confirmation gate stamps it.

Predict
Logged with type, value, timestamp, numeric metadata — HEALTH_SCORE, STALL_ALERT, RISK_SIGNAL, TRAJECTORY…
Expose
Every time a prediction is shown: surface, mode, dwell time. 30-min dedup debounce.
Act
The move the operator executed, with observed result (HELPED / NO_CLEAR_EFFECT…).
Infer
A passive cron links action → outcome in fixed windows: action ≤48h, outcome ≤10d.
Confirmation gate
An inferred link only counts toward learning if operator-CONFIRMED. Auto-grading noise can never pollute truth.
Grade
correct / correct_but_mitigated / wrong / partial / not_judgeable.
Calibrate
Weekly sweep retunes each threshold and logs every attempt to an append-only ledger.

The windows & gates

Mechanism | Value
Action inference window | 48h (2h warm-up)
Outcome inference window | 10d
Exposure dedup debounce | 30 min
Inference batch size | 500 rows
Multi-signal attribution | most-recent prior exposure wins
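
The attribution rule in the last row can be sketched as a single scan. This illustration covers only the 48h window and the most-recent-prior-exposure rule; the 2h warm-up is left out because its exact semantics are not described here, and the `Exposure` shape is hypothetical.

```typescript
const ACTION_WINDOW_MS = 48 * 60 * 60 * 1000;

interface Exposure { predictionId: string; shownAt: number } // shownAt in epoch ms

// Picks the exposure an action should be credited to, or null if none qualifies.
function attributeAction(exposures: Exposure[], actionAt: number): Exposure | null {
  let best: Exposure | null = null;
  for (const exposure of exposures) {
    const age = actionAt - exposure.shownAt;
    if (age < 0 || age > ACTION_WINDOW_MS) continue; // must precede the action, within 48h
    if (best === null || exposure.shownAt > best.shownAt) best = exposure;
  }
  return best;
}
```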

Per-type grading windows

Prediction type | Judge | Stale | Mitigation
HEALTH_SCORE | 14d | 90d | 14d
STALL_ALERT | 14d | 60d | 14d
RISK_SIGNAL | 14d | 60d | 14d
TRAJECTORY | 14d | 90d | 14d
POV_SIGNAL | 21d | 90d | 21d
SYNTHESIS_PATTERN | 30d | 120d | 21d

The clever grade — correct_but_mitigated. Suppose the engine fires a pessimistic warning (AT_RISK, STALLED) and the deal then wins. Naïvely that's "wrong." But if the operator completed a follow-up inside the mitigation window and the health score climbed ≥10 points, the system credits the warning: the alarm was right, and acting on it is why the outcome flipped.

accuracy = (correct + correct_but_mitigated) / judgeable

Grades: correct · correct_but_mitigated · wrong · partial · not_judgeable (never in a denominator)

A system that punished its own true-positive warnings for being acted upon would train itself to stop warning. This one doesn't. The clinical parallel is exact: an early-deterioration alert that triggers an intervention and prevents harm is a success, not a false alarm.
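
A compact sketch of the grading arithmetic. It assumes that `partial` counts as judgeable (in the denominator but not the numerator), which the formula implies but the page does not spell out, and it simplifies the pessimistic-warning case to three inputs with hypothetical names.

```typescript
type Grade = "correct" | "correct_but_mitigated" | "wrong" | "partial" | "not_judgeable";

// accuracy = (correct + correct_but_mitigated) / judgeable; null when nothing is judgeable.
function accuracy(grades: Grade[]): number | null {
  const judgeable = grades.filter((g) => g !== "not_judgeable");
  if (judgeable.length === 0) return null;
  const hits = judgeable.filter((g) => g === "correct" || g === "correct_but_mitigated").length;
  return hits / judgeable.length;
}

interface PessimisticWarningCase {
  dealWon: boolean;
  followUpInMitigationWindow: boolean;
  healthScoreDelta: number; // change in health score after the warning
}

// Grades an AT_RISK / STALLED style warning once the deal outcome is known.
function gradePessimisticWarning(c: PessimisticWarningCase): Grade {
  if (!c.dealWon) return "correct";
  if (c.followUpInMitigationWindow && c.healthScoreDelta >= 10) return "correct_but_mitigated";
  return "wrong";
}
```

Returning `null` instead of 0 when nothing is judgeable keeps an empty sample from reading as a 0% accuracy claim.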

§6 · The conscience

Calibration & the honesty architecture

If there's one section to linger on with a healthcare CEO, it's this. Anyone can ship a number. The trust-defining engineering is knowing when you're not allowed to.

The weekly calibration sweep

In the sweep timeline, a lime node marks a week where a threshold actually moved and records previous → proposed → final (clamped). Grey nodes are sweeps that ran and held steady; the ledger records those too.

Governed threshold registry

Threshold | Default | Min | Max | Floor
HEALTH_SCORE | 30 | 15 | 50 | 20
STALL_ALERT | 14d | 7 | 28 | 20
CHAMPION_GONE | 14d | 7 | 28 | 15
STAGE_STAGNATION | 30d | 14 | 60 | 15
FOLLOW_UP_GAP | 5d | 3 | 14 | 10

Adopts a new threshold only if it differs from and strictly beats the incumbent. Ties never displace what's already there. No customer-specific threshold publishes with fewer than 20 graded outcomes; until then the UI shows a transparent cold state.
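
The adoption rule can be sketched as clamp, compare, then decide. The `score` callback is a hypothetical stand-in for however the sweep evaluates a candidate threshold against graded outcomes; higher is assumed to be better, and the per-threshold sample floor comes from the registry.

```typescript
interface ThresholdSpec { min: number; max: number; sampleFloor: number }

type SweepDecision =
  | { kind: "cold" }                  // not enough graded outcomes to publish
  | { kind: "held"; value: number }   // evaluated, incumbent kept
  | { kind: "changed"; previous: number; proposed: number; final: number; clamped: boolean };

function decideThreshold(
  spec: ThresholdSpec,
  current: number,
  proposed: number,
  gradedOutcomes: number,
  score: (threshold: number) => number,
): SweepDecision {
  if (gradedOutcomes < spec.sampleFloor) return { kind: "cold" };
  // Clamp the proposal into the governed [min, max] band first.
  const final = Math.min(spec.max, Math.max(spec.min, proposed));
  // Adopt only if it differs and strictly beats the incumbent; ties hold.
  if (final === current || score(final) <= score(current)) return { kind: "held", value: current };
  return { kind: "changed", previous: current, proposed, final, clamped: final !== proposed };
}
```

Returning "held" as an explicit decision, rather than nothing, is what allows a sweep that changed nothing to still be written to the ledger.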

Discipline, enforced by CI

Sample floors everywhere. MIN_FOR_PUBLISHED_RATE = 40 (±15pp), MIN_FOR_TIGHT_RATE = 96 (±10pp, hero metrics), RESOLVED_PREDICTION_FLOOR = 150 before any accuracy claim. Below floor → a cold state, never a fabricated number.
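
The ±15pp and ±10pp figures are consistent with a worst-case 95% margin of error for a proportion under the normal approximation, 1.96 × √(0.25 / n). That derivation is an inference from the numbers, not something the page states, and the helper names below are hypothetical.

```typescript
// Worst-case (p = 0.5) margin of error, in percentage points, for a rate estimated from n samples.
function worstCaseMarginPp(n: number, z = 1.96): number {
  return z * Math.sqrt(0.25 / n) * 100;
}

type RateDisplay =
  | { cold: true; have: number; need: number } // below floor: show the cold state
  | { cold: false; ratePct: number };          // at or above floor: show the number

// Gate a published rate behind a sample floor instead of rendering a thin-data percentage.
function publishRate(hits: number, n: number, floor = 40): RateDisplay {
  if (n < floor) return { cold: true, have: n, need: floor };
  return { cold: false, ratePct: Math.round((hits / n) * 100) };
}
```

Under this reading, 40 samples give roughly ±15.5pp and 96 samples give almost exactly ±10pp.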

Correlational language only. Precedent citations say "followed by progress in N of M cases" — never "worked" or "caused." A banned-phrase test fails the build on any causal claim the data can't support.

Append-only is real. Eleven ledgers are never updated or deleted. State changes use explicit "mark, don't remove" columns.

Static analysis is architecture. A test grep-walks every server action and fails CI if any query omits the tenant ID — the structural guarantee one customer can never see another's data.

Two append-only ledgers, by design

ThresholdHistory

Records only the thresholds that changed: previous → proposed → final (clamped). Idempotent on (orgId, sweepId, thresholdKey). An auditable lineage of how the engine retuned itself over time.

CalibrationSweepHistory

Records every sweep, including "ran, nothing changed", with a deterministic narrative headline. Timeline honesty: you can prove the engine considered a change and declined, rather than leaving it ambiguous whether it ran at all.

Real sweep headlines from the code: 6 thresholds evaluated · 2 changed · 1 clamped · no material changes · insufficient resolved outcomes (12) — skipped

§7 · The governance layer

Cost, memory, and grounding

Three subsystems keep the AI features honest, cheap, and consistent — each built to fail safe rather than fail loud.

LLM cost — metered, fail-closed

Every model call is metered to an append-only ledger — model, stable label, outcome, token counts, cost — and never stores prompt or response text. Cost is tracked in integer micro-dollars (1 USD = 1,000,000 µUSD) to avoid float drift.

Fail-closed daily breaker: before each call it sums today's spend against budget. Over? The call is refused before any money is spent and logged BUDGET_BLOCKED. Soft-fail — it returns nothing rather than throwing, so a budget stop never breaks the UI.

Model | Input (µUSD per 1M tokens) | Output (µUSD per 1M tokens)
gpt-4o-mini | 150,000 | 600,000
gpt-4o | 2,500,000 | 10,000,000

Unknown models fall back to gpt-4o-mini pricing (conservative). External text is fenced in <source_data> tags with an untrusted-source notice — prompt-injection hardening.
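
A sketch of the integer cost arithmetic and the fail-closed breaker. It is synchronous for brevity where a real model call would be async; rounding cost up to the next micro-dollar and blocking at or over budget (rather than strictly over) are assumptions, as are the type and function names.

```typescript
interface Price { input: number; output: number } // µUSD per 1M tokens

const PRICES: Record<string, Price> = {
  "gpt-4o-mini": { input: 150000, output: 600000 },
  "gpt-4o": { input: 2500000, output: 10000000 },
};
const FALLBACK_PRICE: Price = { input: 150000, output: 600000 }; // gpt-4o-mini pricing

// Integer micro-dollar cost of one call; unknown models use the fallback price.
function costMicroUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model] ?? FALLBACK_PRICE;
  return Math.ceil((inputTokens * price.input + outputTokens * price.output) / 1000000);
}

// Ledger rows carry metadata only: no prompt or response text.
interface LedgerRow { model: string; outcome: "OK" | "BUDGET_BLOCKED"; costMicroUsd: number }
interface CallResult<T> { result: T; inputTokens: number; outputTokens: number }

// Checks today's spend before calling; returns null (soft-fail) when the budget is exhausted.
function guardedCall<T>(
  todaysLedger: LedgerRow[],
  dailyBudgetMicroUsd: number,
  model: string,
  call: () => CallResult<T>,
): T | null {
  const spent = todaysLedger.reduce((sum, row) => sum + row.costMicroUsd, 0);
  if (spent >= dailyBudgetMicroUsd) {
    todaysLedger.push({ model, outcome: "BUDGET_BLOCKED", costMicroUsd: 0 });
    return null;
  }
  const r = call();
  todaysLedger.push({ model, outcome: "OK", costMicroUsd: costMicroUsd(model, r.inputTokens, r.outputTokens) });
  return r.result;
}
```

For example, 1,000 input and 500 output tokens on gpt-4o-mini cost 150 + 300 = 450 µUSD.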

Memory arbitration — one truth per screen

Seven independent composers each produce one candidate line; a single deterministic arbiter picks exactly one to speak (or stays silent). No scoring, no ML — a strict precedence cascade that guarantees two contradictory "truths" never appear on one screen.
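
A strict precedence cascade needs very little code, which is part of the point. In this sketch the composer names are invented for illustration; only the shape (ordered precedence, first non-empty candidate wins, silence as a valid outcome) comes from the description above.

```typescript
interface Candidate { composer: string; line: string | null } // null = composer has nothing to say

// Walks the precedence list and returns the first available line, or null to stay silent.
function arbitrate(precedence: string[], candidates: Candidate[]): string | null {
  for (const composer of precedence) {
    const hit = candidates.find((c) => c.composer === composer && c.line !== null);
    if (hit && hit.line !== null) return hit.line;
  }
  return null;
}
```

Because the order is fixed and no scores are compared, the same inputs always produce the same single line.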


POV claim verification — 3-path grounded fallback

When a sales engineer verifies a customer-facing claim ("does it support SAML SCIM?"), the verifier tries three paths in order, so the result is never worse than the plainest one.

OpenAI web search — a single query against the current online docs, grounded in the source URLs the model cites.
Bedrock RAG — KB retrieve → cohere.rerank-v3-5 → quality gate (MIN_USEFUL_SCORE 0.15) → grounded synthesis with citations.
OpenAI-direct fallback — parametric answer, empty sources, explicitly stamped MODEL_KNOWLEDGE_NOTICE so it's never mistaken for a grounded verdict.

The verdict set is deliberately graded — not a binary:

VERIFIED PARTIALLY_VERIFIED NOT_VERIFIED CONTRADICTED INSUFFICIENT_EVIDENCE NEEDS_HUMAN_REVIEW
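
The ordered fallback can be sketched as a loop over path functions. What counts as a path "failing" (here, throwing or returning null) and the final `NEEDS_HUMAN_REVIEW` result when every path fails are assumptions for the example; the verdict names and the notice stamp are taken from the text above.

```typescript
type Verdict =
  | "VERIFIED" | "PARTIALLY_VERIFIED" | "NOT_VERIFIED"
  | "CONTRADICTED" | "INSUFFICIENT_EVIDENCE" | "NEEDS_HUMAN_REVIEW";

interface ClaimAnswer { verdict: Verdict; sources: string[]; notice?: string }
type VerifierPath = (claim: string) => Promise<ClaimAnswer | null>;

// Tries each path in order; the first one to produce an answer wins.
async function verifyClaim(claim: string, paths: VerifierPath[]): Promise<ClaimAnswer> {
  for (const path of paths) {
    try {
      const answer = await path(claim);
      if (answer !== null) return answer;
    } catch {
      // A failing path is skipped so the next one can try.
    }
  }
  return { verdict: "NEEDS_HUMAN_REVIEW", sources: [] };
}
```

In use, the three paths would be passed in the order listed: web search, then RAG, then the parametric fallback that attaches `MODEL_KNOWLEDGE_NOTICE` and an empty source list.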
§8 · The executive layer

Analytics that roll up to a decision

The intelligence rolls up into one trustworthy number. The flagship is the forecast-truth waterfall: it discounts raw pipeline down to what's actually been validated by evidence, first-match-wins, each step subtracting from the remaining dollars.

Forecast-truth waterfall

Quality-adjusted haircut tightening 31% → 23% over 12 weeks. The same projectForecastTruthWaterfall helper draws this on the dashboard and persists it to the weekly ledger, byte-identical, so the live view can never drift from the record.

Metric | 12 wks ago | Last week
Raw pipeline | $8.4M | $9.6M
Quality-adjusted | $5.8M | $7.4M
Buyer-validated | $1.9M | $4.6M
Unmatched signoff | 9 | 5
Customer-matched | 3 | 12
Deals at risk | $4.5M | $3.0M
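
The haircut percentages follow directly from the table: 1 − 5.8/8.4 ≈ 31% and 1 − 7.4/9.6 ≈ 23%. The sketch below reproduces that arithmetic and illustrates a first-match-wins waterfall; the step labels and the `Deal` shape are invented for the example and are not the real `projectForecastTruthWaterfall` steps.

```typescript
// Share of raw pipeline discounted away, as a whole-number percentage.
function haircutPct(rawPipeline: number, qualityAdjusted: number): number {
  return Math.round((1 - qualityAdjusted / rawPipeline) * 100);
}

interface Deal { amount: number; flags: string[] }
interface WaterfallStep { label: string; matches: (deal: Deal) => boolean }
interface WaterfallRow { label: string; removed: number; remaining: number }

// Each deal is claimed by the first step it matches, so no dollar is subtracted twice.
function waterfall(deals: Deal[], steps: WaterfallStep[]): WaterfallRow[] {
  let pool = deals.slice();
  let remaining = pool.reduce((sum, d) => sum + d.amount, 0);
  return steps.map((step) => {
    const claimed = pool.filter((d) => step.matches(d));
    pool = pool.filter((d) => !step.matches(d));
    const removed = claimed.reduce((sum, d) => sum + d.amount, 0);
    remaining -= removed;
    return { label: step.label, removed, remaining };
  });
}
```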

Trust is earned, not asserted

An org climbs a 4-tier ladder; the strength of language the product is allowed to use scales with it. Each rung is gated on real volume.

Tier | Gate | Language allowed
Building | cold start | Observation only
Directional | ≥25 closed · ≥50 graded | "Directional" numeric hints
Trusted | ≥40 closed · ≥150 graded · ≥90d | Published accuracy claims
Compounding | ≥100 · ≥250 · ≥180d · 7 hygiene gates | Full moat narrative

Hygiene gate at risk: 14-day grace, then downgrade.

Even the credibility of the system's own self-description is governed — a compounding org that lets a gate slip gets a soft warning, then a hard downgrade after 60 days below floor.
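
The ladder's volume gates reduce to a top-down series of checks. This sketch reads the Compounding gate positionally as closed deals, graded outcomes, and days, which is an interpretation of the abbreviated label; it also leaves out the grace-period and downgrade timing, and the `OrgVolume` shape is hypothetical.

```typescript
type TrustTier = "Building" | "Directional" | "Trusted" | "Compounding";

interface OrgVolume {
  closedDeals: number;
  gradedOutcomes: number;
  daysOfHistory: number;
  hygieneGatesPassed: number; // out of 7
}

// Checks the highest tier first and falls through to the cold-start tier.
function trustTier(v: OrgVolume): TrustTier {
  if (v.closedDeals >= 100 && v.gradedOutcomes >= 250 && v.daysOfHistory >= 180 && v.hygieneGatesPassed >= 7) {
    return "Compounding";
  }
  if (v.closedDeals >= 40 && v.gradedOutcomes >= 150 && v.daysOfHistory >= 90) return "Trusted";
  if (v.closedDeals >= 25 && v.gradedOutcomes >= 50) return "Directional";
  return "Building";
}
```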

"89% of the 4-week-old forecast has landed or is still on track; the quality haircut eased 2.1pp to 14% — the discount on raw pipeline is shrinking; prediction trust tier: Trusted." — the forecast-honesty line, narrated in one executive sentence.

§9 · Why this matters for telemedicine

The same machine, a clinical domain

Strip away the sales domain and look at the machine. FieldView is structurally a real-world-signal-to-trusted-prediction engine with a self-correcting feedback loop and a conscience. That is exactly the shape of a serious telehealth analytics platform. The mapping is one-to-one.

What FieldView does
The direct telemedicine analog

The takeaway: the genuinely hard problems in clinical analytics aren't "can you train a model" — they're can you make a prediction someone will safely act on, prove what you predicted and why, score yourself honestly, and know when to stay quiet. FieldView is a shipped demonstration that the person who built it has already solved all four.

Note on scope, kept honest: FieldView itself is a sales product and makes no HIPAA / SOC 2 / clinical-compliance claims. What transfers is the architecture and the discipline — the predictive pipeline, the calibration loop, the audit ledgers, the honesty gates — not a compliance certification.

§10 · Under the hood

Engineering rigor, for the skeptics

Multi-tenancy is absolute

All 84 models carry a tenant ID; every query filters on it; raw SQL is banned; a static-analysis test grep-walks every server action and fails CI if any query omits the tenant scope.

Encryption at rest

All third-party OAuth tokens (Salesforce, Google, Slack) are AES-256-GCM encrypted; decryption throws if the key is unset in production.

Guarded server actions

Mutations route through guarded server actions, each calling requireSession() / requireAdmin() / requireOwner() — not loose API endpoints.

Hydration discipline

No wall-clock or randomness in client render paths; all locale formatting pinned to UTC; each rule backed by a regression test born from a real production incident.

8 scheduled crons

15-min sync, hourly exposure inference (:17), hourly outcome grader (:30), hourly reality-check sweep (:42), hourly digest, a daily brief (14:00 UTC), a Friday brief, and a weekly leadership ledger (Mon 06:00).

The stack

Next.js 16 (App Router), React 19, Prisma 7 over Neon Postgres, NextAuth 5, OpenAI + AWS Bedrock, Tailwind v4, Sentry — on Vercel. ~530 test files, most architecture-enforcing guards.

Predictive depth, plus the discipline to be honest about uncertainty.

FieldView captures its own data, trains its own model per customer, explains every prediction, watches what happens next, grades itself, recalibrates weekly, and — the part almost nobody builds — refuses to state a number it hasn't earned. That combination is the whole game in clinical analytics. This page is the proof it's already been built.

Every number, formula, threshold & schema on this page is extracted from live, shipped code