FieldView
A self-calibrating intelligence engine for enterprise sales — and a working proof of what disciplined data analytics looks like.What makes this genuinely hard
Most "AI for X" products are a thin model bolted onto a form. FieldView made three structural commitments that force real inference everywhere — the same three that separate a serious clinical-analytics product from a dashboard.
The signal is derived, never typed
Required fields are forbidden. There's no "rate this deal's health" form to fall back on — every score, flag, and prediction is inferred from passively captured activity. The moment users hand-enter the truth, data quality collapses to whatever they bothered to type. So the system computes the truth from behavior.
The product corrects its own model
It records every prediction, watches what the operator did about it, watches what actually happened — then retunes its own decision thresholds every week based on whether it was right. The model running next month is not the one running today; reality corrected it in between.
It stays quiet when the data is thin
When data is insufficient it renders insufficient sample — 12 of 40 instead of a screenshot-able percentage. Every learning rollup is gated behind operator confirmation, so the model never trains on its own guesses. Enforced by append-only ledgers, hard sample floors, and static-analysis tests that fail the build.
An auto-capture pipeline in 8 layers
Everything downstream feeds off a sync that runs every 15 minutes, per customer, pulling from Salesforce and Google Calendar and writing a normalized activity graph. No human touches it.
Layer 2 — the matching-tier decision
Atomic locking
Before any work, the orchestrator atomically claims the job — SET status = SYNCING WHERE status = IDLE. If zero rows update, another sync is already running and it backs off cleanly. Stuck syncs older than 10 minutes auto-recover on the next tick. No double-processing, ever.
Stage isolation
From Layer 5 onward, every stage is wrapped in its own try/catch. A failure in the matching engine cannot abort scoring, summaries, or detection downstream. This is a rule enforced by CI — not a convention someone might forget.
50+ analytics engines on every sync
Each engine is a mostly-pure function: it reads the deal plus its activity graph, computes a value, and writes it back so the UI renders from a single query. This is the part that makes people say "wait — this isn't a CRM skin." Hover any tile to flip it and read the method.
A representative catalog grouped by the six engine families — every name is drawn directly from the live codebase.
A real, explainable, calibrated ML model
The centerpiece — and deliberately not a black box. The win-probability engine is pure logistic regression written from scratch: no external ML library, no opaque service. Every coefficient and threshold is auditable in source.
The model
Persisted with its own honesty metrics
The 12-feature vector
| # | Feature | Meaning | Impute |
|---|---|---|---|
| 0 | healthScore | Composite deal health | 50 |
| 1 | threadingScore | Stakeholder coverage | 30 |
| 2 | championHealth | Key-contact engagement | 40 |
| 3 | committeeScore | Buying-group breadth | 30 |
| 4 | dysfunctionScore | Decision friction | 20 |
| 5 | technicalDebt | Unresolved eval gap | 30 |
| 6 | activityCount30d | Meetings + calls (30d) | 3 |
| 7 | meetingCount30d | Meetings only (30d) | 2 |
| 8 | closeDatePush | Timeline slippage | 0 |
| 9 | uniqueStakeholders | Distinct externals | 2 |
| 10 | stageProgress | Pipeline position | 0.35 |
| 11 | totalActivities | Lifetime engagement | 5 |
Every feature is min-max normalized to [0,1] against stats persisted with the model. Out-of-range inputs clip to the training range; missing values impute to the training median.
The discipline in training
Refuses to train on too little. Requires ≥30 closed deals with full outcome snapshots, AND ≥8 won AND ≥8 lost — a balanced set. It then rejects its own trained model if accuracy doesn't beat the naïve majority-class base rate. A model no better than guessing the common case never ships.
Cross-validated & per-customer. Leave-one-out below 50 samples, 5-fold above — folds seeded with a deterministic PRNG so every run is reproducible. Each organization trains its own model on its own outcomes. No shared weights. True multi-tenant isolation.
Every prediction is decomposable
The UI shows the top forces pushing toward a win and toward a loss, rebuilt live at read time so the explanation always matches the deal's current state: contribution[i] = weight[i] × (norm[i] − 0.5)
The immutable prediction ledger
Every sync writes one immutable row per deal per UTC day — the probability and the exact feature vector that produced it. You can reconstruct any historical prediction from the honest inputs that existed at the time. Never recomputed retroactively, never silently revised. Batch inference eliminated an N+1 — one grouped read per aggregate instead of five queries per deal.
Predict → expose → act → grade → recalibrate
The moat — what separates "a model" from "a system that gets smarter." FieldView closes the loop on every prediction it makes. Watch the token travel; it cannot reach grading until the confirmation gate stamps it.
The windows & gates
| Mechanism | Value |
|---|---|
| Action inference window | 48h (2h warm-up) |
| Outcome inference window | 10d |
| Exposure dedup debounce | 30 min |
| Inference batch size | 500 rows |
| Multi-signal attribution | most-recent prior exposure wins |
Per-type grading windows
| Prediction type | Judge | Stale | Mitig. |
|---|---|---|---|
| HEALTH_SCORE | 14d | 90d | 14d |
| STALL_ALERT | 14d | 60d | 14d |
| RISK_SIGNAL | 14d | 60d | 14d |
| TRAJECTORY | 14d | 90d | 14d |
| POV_SIGNAL | 21d | 90d | 21d |
| SYNTHESIS_PATTERN | 30d | 120d | 21d |
The clever grade — correct_but_mitigated. Suppose the engine fires a pessimistic warning (AT_RISK, STALLED) and the deal then wins. Naïvely that's "wrong." But if the operator completed a follow-up inside the mitigation window and the health score climbed ≥10 points, the system credits the warning: the alarm was right, and acting on it is why the outcome flipped.
accuracy = (correct + correct_but_mitigated) / judgeable
A system that punished its own true-positive warnings for being acted upon would train itself to stop warning. This one doesn't. The clinical parallel is exact: an early-deterioration alert that triggers an intervention and prevents harm is a success, not a false alarm.
Calibration & the honesty architecture
If there's one section to linger on with a healthcare CEO, it's this. Anyone can ship a number. The trust-defining engineering is knowing when you're not allowed to.
The weekly calibration sweep
Governed threshold registry
| Threshold | Default | Min | Max | Floor |
|---|---|---|---|---|
| HEALTH_SCORE | 30 | 15 | 50 | 20 |
| STALL_ALERT | 14d | 7 | 28 | 20 |
| CHAMPION_GONE | 14d | 7 | 28 | 15 |
| STAGE_STAGNATION | 30d | 14 | 60 | 15 |
| FOLLOW_UP_GAP | 5d | 3 | 14 | 10 |
Adopts a new threshold only if it differs and strictly beats the incumbent. Ties never displace what's already there. No customer-specific threshold publishes below ≥20 graded outcomes — until then the UI shows a transparent cold state.
Discipline, enforced by CI
Sample floors everywhere. MIN_FOR_PUBLISHED_RATE = 40 (±15pp), MIN_FOR_TIGHT_RATE = 96 (±10pp, hero metrics), RESOLVED_PREDICTION_FLOOR = 150 before any accuracy claim. Below floor → a cold state, never a fabricated number.
Correlational language only. Precedent citations say "followed by progress in N of M cases" — never "worked" or "caused." A banned-phrase test fails the build on any causal claim the data can't support.
Append-only is real. Eleven ledgers are never updated or deleted. State changes use explicit "mark, don't remove" columns.
Static analysis is architecture. A test grep-walks every server action and fails CI if any query omits the tenant ID — the structural guarantee one customer can never see another's data.
Two append-only ledgers, by design
ThresholdHistory
Records only the thresholds that changed: previous → proposed → final (clamped). Idempotent on (orgId, sweepId, thresholdKey). An auditable lineage of how the engine retuned itself over time.
CalibrationSweepHistory
Records every sweep — including "ran, nothing changed" — with a deterministic narrative headline. Timeline honesty: you can prove it considered a change and declined, not that it was idle.
Real sweep headlines from the code: 6 thresholds evaluated · 2 changed · 1 clamped · no material changes · insufficient resolved outcomes (12) — skipped
Cost, memory, and grounding
Three subsystems keep the AI features honest, cheap, and consistent — each built to fail safe rather than fail loud.
LLM cost — metered, fail-closed
Every model call is metered to an append-only ledger — model, stable label, outcome, token counts, cost — and never stores prompt or response text. Cost is tracked in integer micro-dollars (1 USD = 1,000,000 µUSD) to avoid float drift.
Fail-closed daily breaker: before each call it sums today's spend against budget. Over? The call is refused before any money is spent and logged BUDGET_BLOCKED. Soft-fail — it returns nothing rather than throwing, so a budget stop never breaks the UI.
| Model | Input µUSD/1M | Output µUSD/1M |
|---|---|---|
| gpt-4o-mini | 150,000 | 600,000 |
| gpt-4o | 2,500,000 | 10,000,000 |
Unknown models fall back to gpt-4o-mini pricing (conservative). External text is fenced in <source_data> tags with an untrusted-source notice — prompt-injection hardening.
Memory arbitration — one truth per screen
Seven independent composers each produce one candidate line; a single deterministic arbiter picks exactly one to speak (or stays silent). No scoring, no ML — a strict precedence cascade that guarantees two contradictory "truths" never appear on one screen.
POV claim verification — 3-path grounded fallback
When a sales engineer verifies a customer-facing claim ("does it support SAML SCIM?"), the verifier tries three paths in order, so the result is never worse than the plainest one.
cohere.rerank-v3-5 → quality gate (MIN_USEFUL_SCORE 0.15) → grounded synthesis with citations.MODEL_KNOWLEDGE_NOTICE so it's never mistaken for a grounded verdict.The verdict set is deliberately graded — not a binary:
Analytics that roll up to a decision
The intelligence rolls up into one trustworthy number. The flagship is the forecast-truth waterfall: it discounts raw pipeline down to what's actually been validated by evidence, first-match-wins, each step subtracting from the remaining dollars.
Forecast-truth waterfall
Quality-adjusted haircut tightening 31% → 23% over 12 weeks. The same projectForecastTruthWaterfall helper draws this on the dashboard and persists it to the weekly ledger — byte-identical, so the live view can never drift from the record.
Trust is earned, not asserted
An org climbs a 4-tier ladder; the strength of language the product is allowed to use scales with it. Each rung is gated on real volume.
Even the credibility of the system's own self-description is governed — a compounding org that lets a gate slip gets a soft warning, then a hard downgrade after 60 days below floor.
"89% of the 4-week-old forecast has landed or is still on track; the quality haircut eased 2.1pp to 14% — the discount on raw pipeline is shrinking; prediction trust tier: Trusted." — the forecast-honesty line, narrated in one executive sentence.
The same machine, a clinical domain
Strip away the sales domain and look at the machine. FieldView is structurally a real-world-signal-to-trusted-prediction engine with a self-correcting feedback loop and a conscience. That is exactly the shape of a serious telehealth analytics platform. The mapping is one-to-one.
The takeaway: the genuinely hard problems in clinical analytics aren't "can you train a model" — they're can you make a prediction someone will safely act on, prove what you predicted and why, score yourself honestly, and know when to stay quiet. FieldView is a shipped demonstration that the person who built it has already solved all four.
Note on scope, kept honest: FieldView itself is a sales product and makes no HIPAA / SOC 2 / clinical-compliance claims. What transfers is the architecture and the discipline — the predictive pipeline, the calibration loop, the audit ledgers, the honesty gates — not a compliance certification.
Engineering rigor, for the skeptics
Multi-tenancy is absolute
All 84 models carry a tenant ID; every query filters on it; raw SQL is banned; a static-analysis test grep-walks every server action and fails CI if any query omits the tenant scope.
Encryption at rest
All third-party OAuth tokens (Salesforce, Google, Slack) are AES-256-GCM encrypted; decryption throws if the key is unset in production.
Guarded server actions
Mutations route through guarded server actions, each calling requireSession() / requireAdmin() / requireOwner() — not loose API endpoints.
Hydration discipline
No wall-clock or randomness in client render paths; all locale formatting pinned to UTC; each rule backed by a regression test born from a real production incident.
8 scheduled crons
15-min sync, hourly exposure inference (:17), hourly outcome grader (:30), hourly reality-check sweep (:42), hourly digest, a daily brief (14:00 UTC), a Friday brief, and a weekly leadership ledger (Mon 06:00).
The stack
Next.js 16 (App Router), React 19, Prisma 7 over Neon Postgres, NextAuth 5, OpenAI + AWS Bedrock, Tailwind v4, Sentry — on Vercel. ~530 test files, most architecture-enforcing guards.
Predictive depth, plus the discipline to be honest about uncertainty.
FieldView captures its own data, trains its own model per customer, explains every prediction, watches what happens next, grades itself, recalibrates weekly, and — the part almost nobody builds — refuses to state a number it hasn't earned. That combination is the whole game in clinical analytics. This page is the proof it's already been built.