Ship α validation now. Train β from α traces later. The unit between them is the Verified Transition Cell.
R15 is a strategic pivot from R7-R14's depth-first architecture rounds to a balanced view: 90-day product wedges that ship to one buyer who hands you money, plus 18-36 month zero-to-one R&D bets, plus moat analysis cross-cutting both. The 6-model panel converged 6/6 on the structural primitive that ties wedges to bets — every Track A product emits trace data that becomes Track D's training corpus.
The primitive is the Verified Transition Cell — a typed unit of valid state change carrying preconditions, postconditions, invariants, confidence, and cryptographic provenance. Frontier models propose; Hypernym validates; committed VTCs become composable units for long-horizon simulation. This is what Hypernym ships once R14's Substrate Lemma propagation engine has substrate density to compound on.
The missing primitive is not a larger model, not a latent world model, and not a simulator kernel. It is a typed transition object that can be proposed by any model or simulator but accepted only if substrate evidence, invariants, confidence, and provenance all close. Hypernym validates; everyone else proposes. That separation is the moat.
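A minimal sketch of that acceptance rule, under assumed field and function names; the only point it makes is that all four conditions must close before a cell is committed.

```python
from dataclasses import dataclass

@dataclass
class ProposedTransition:
    evidence_refs: list[str]      # substrate evidence cited by the proposer
    invariants_ok: bool           # hard invariants checked against the substrate
    confidence: float             # estimated confidence in [0, 1]
    provenance_complete: bool     # full chain from source to claim

def accept(t: ProposedTransition, min_confidence: float = 0.9) -> bool:
    """Commit a proposed transition only if all four closure conditions hold."""
    return (
        bool(t.evidence_refs)
        and t.invariants_ok
        and t.confidence >= min_confidence
        and t.provenance_complete
    )
```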
The 6-model panel.
Top-tier from each provider. NDA-bound. First-principles only. Six models reasoning independently arrived at the same structural object with six different names — same convergence pattern as R14's Substrate Lemma.
The convergence is on sequencing, not the wedge.
Codex, Claude, Gemini, Grok, Qwen, and Gemma all rejected a pure foundation-model-first strategy and converged on γ: ship substrate validation over existing models now, use the resulting traces and failures as the substrate-native training corpus later. Existing frontier models and scientific simulators are useful proposers, but weak validators of their own transitions.
Track A consensus: 6/6 included a verifier or trace-grading product. 6/6 included context compression / pre-routing. 6/6 included Modulum Router or inference SaaS. 5/6 included persistent memory. 5/6 included domain endpoints. 5/6 included false-positive elimination. 4/6 included Forge OS Solo or IDE plugin. The shared ranking logic was speed to first customer, substrate data generated, and whether the product can be bought by one developer or one team without a network.
Moat consensus: 6/6 acknowledged that Modulum Router and generic inference are revenue wedges, not durable moats, unless M5 cost-quality claims remain protected and measurable. 5/6 treated Omnifact verification as medium-to-strong because the generic API is copyable but substrate provenance and domain substrate are harder to replicate. 6/6 placed the deepest moat in VTC, Substrate Lemma propagation, the Federation Protocol, and domain substrate density.
R16 split: Codex pushed VTC; Claude pushed Substrate Distribution Physics; Gemini and Qwen leaned temporal/counterfactual; Grok pushed world-model continuation; Gemma pushed temporal substrate + counterfactual fork. Synthesis: R16 is "VTC + temporal/counterfactual closure" — subsumes all surfaced candidates into one falsifiable architecture round.
Ten ranked wedges. One-buyer = full value.
Wedges where one customer hands you money in 2-16 weeks. Ranked by panel-converged speed-to-customer × moat strength. Each scored on the data corpus it generates for VTC training (the γ data flywheel).
| # | Wedge | Revenue model | Time to customer | Build effort | Moat | Panel |
|---|---|---|---|---|---|---|
| 1 | Omnifact Verify / Trace-Grade API | per-call + dashboard | 2-8 wk | 4-8 ew | 4 / strong | 6/6 |
| 2 | Context Pre-Router / Pre-Compressor | per-token saved | 3-10 wk | 4-10 ew | 4 / strong | 6/6 |
| 3 | Persistent Memory API | per-seat + storage | 4-10 wk | 6-12 ew | 3 / medium | 5/6 |
| 4 | Domain-Precision Endpoint (legal · biomed · finance) | per-token + ent. min | 6-16 wk | 8-16 ew/dom | 4 / strong | 6/6 |
| 5 | Fine-Grained Citation API | per-call / per-doc | 6-12 wk | 6-12 ew | 3 / medium | 4/6 |
| 6 | False-Positive Eliminator | per-call / outcome | 4-16 wk | 5-18 ew | 4 / strong | 5/6 |
| 7 | Modulum Inference SaaS | per-token | 6-20 wk | 10-24 ew | 2 / weak | 6/6 |
| 8 | Modulum Router | per-token margin | 3-18 wk | 6-20 ew | 2 / weak | 6/6 |
| 9 | Substrate Audit Replay | enterprise license | 12-24 wk | 10-24 ew | 4 / strong | 3/6 |
| 10 | Forge OS Solo / IDE Magic | per-seat | 6-20 wk | 12-18 ew | 3 / medium | 5/6 |
The decision rule: ship weak-moat wedges only when they feed a strong-moat asset. A router that merely routes tokens is weak; a router that learns cost per verified transition across domains is useful. A memory API that stores summaries is weak; a memory API that builds grounded, invalidatable project substrate is useful. A citation API that attaches links is weak; a citation API that decomposes claims into reusable provenance cells is useful.
Seven ranked moonshots. 18-36 month horizon.
| # | Bet | Magnitude | Feasibility | Moat | Panel |
|---|---|---|---|---|---|
| 1 | Verified Transition Cell simulation stack | category-defining → civilizational | medium | very high | 6/6 |
| 2 | Substrate Lemma propagation engine (R14 carry) | category-defining | medium-high | very high | 6/6 |
| 3 | Substrate-native Modulum scaling (β) | category-defining | medium | high | 4/6 |
| 4 | Hallucination-zero domain systems | category-defining in verticals | medium-high | high | 5/6 |
| 5 | Federation Protocol cryptographic spec | standard-setting | medium | very high if adopted | 4/6 |
| 6 | Targeted scientific instruments | category-defining | medium | high | 5/6 |
| 7 | Composition Type Theory | foundational | medium-low | high but indirect | 4/6 |
VTC ranks #1 because it's the missing unit that makes "simulation-grade targeted intelligence" operational. If it works, Hypernym becomes a transition-validation layer across frontier models, simulators, and later substrate-native models. Substrate Lemma stays #2 as core but coordination-heavy — its moat improves only after Track A produces substrate density. Substrate-native Modulum (β) sits at #3 — fund it, don't overfund before α products create VTC traces and failure taxonomies.
Four moat archetypes. Be honest about which you have.
Durable structural moats require: proprietary substrate that improves with use, patentable transition/validation mechanics, regulatory or audit embedding, or standard/protocol adoption. Execution moats are not enough in AI infrastructure — well-funded competitors copy API shape quickly.
- Strong structural moats: VTC simulation stack · Substrate Lemma propagation · Federation Protocol after adoption · domain substrate in legal/biomed · Substrate Audit Replay · false-positive elimination in audited workflows. These create switching cost through accumulated substrate, replay obligations, calibration history, or cross-system protocol dependency.
- Medium moats: Omnifact Verify · Trace-Grading · Context Pre-Router · Fine-Grained Citation · Persistent Memory · Domain-Precision Endpoint before deep customer data. These become strong only if they feed a proprietary corpus of claims, traces, refusals, and validated transitions. Without that corpus, they're features.
- Weak fundamentals: Modulum Router · generic inference SaaS · generic IDE plugin · prompt safety gate · broad AI governance suite. Worth shipping if they generate cash, substrate data, distribution, or benchmarks. Kill them if they become support-heavy commodity software.
Avoid confusing patentability with monopoly. M5-conditioned routing, substrate validation, and transition cells may be filable, but patents do not replace proof. The commercial moat appears only when customers believe the measured output: lower false positives without higher false negatives, fewer unsupported claims at the same task completion rate, shorter context without recall loss, longer simulations without calibration drift. Every wedge ships with a benchmark harness as part of the product, not as a research afterthought.
The strongest compound loop: verify claims → log failures → classify failure modes → convert repeated failures into substrate lemmas → use lemmas to improve future verification → distill the resulting VTC corpus into substrate-native Modulum. Products that don't contribute to this loop are cash extraction or distribution experiments, not core strategy.
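The loop can be sketched as a pipeline; everything here (the function names, the promotion threshold, the result shape) is an illustrative assumption, not an existing Hypernym API.

```python
from collections import Counter

def compound_loop(claims, verify, classify_failure, lemma_store, promote_threshold=3):
    """verify -> log failures -> classify -> promote repeated failures into lemmas."""
    failure_counts = Counter()
    for claim in claims:
        result = verify(claim, lemmas=lemma_store)          # lemmas improve future verification
        if not result.valid:
            mode = classify_failure(result)                 # e.g. stale_evidence, invariant_violation
            failure_counts[mode] += 1
            if failure_counts[mode] >= promote_threshold:   # a repeated failure becomes a lemma
                lemma_store.add(mode, result.trace)
    return lemma_store   # the accumulated corpus is what later distills into substrate-native Modulum
```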
Verified Transition Cell. Like the relation, the process, the commit.
Verified Transition Cell (VTC) — Codex's name, panel-merged structure
All six panel models converged on the same structural object — a typed, provenance-bearing unit of valid state change. Six different names point at the same primitive. The synthesis pick is Verified Transition Cell: "verified" captures the validator role that differentiates Hypernym from frontier model rollouts, "transition" captures the core temporal/causal unit, "cell" captures atomicity and composition. "Kernel" overstates execution; "rollout" overstates sequence; "step" is too close to Grounded Step; "frame" is too static.
The VTC schema — merged from all 6 panel proposals.
Three load-bearing fields: failure_mode prevents the α-to-β corpus from becoming noisy (a bad proposal, stale evidence, missing substrate, and an invariant bug should not all train the same correction). scale_level prevents molecular claims from composing directly into organism-level claims without a declared bridge. commitment makes simulation replay auditable; without a hash chain, VTC becomes just another unverifiable trace format.
Validity invariants: state_before and state_after must be typed under the same domain schema or a declared ScaleBridge. Hard invariants cannot be violated. Preconditions must be satisfied before application. Postconditions must be checkable or explicitly marked unobserved. Every accepted VTC must have a hash commitment over inputs, outputs, invariants, and validation trace. Composition is legal only when the first cell's state_after satisfies the second cell's preconditions and no invariant contradictions cross the boundary.
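A minimal sketch of the merged schema, under assumed field names and types; the three load-bearing fields and the hash commitment appear as described above, everything else is illustrative rather than the shipped schema.

```python
import hashlib
import json
from dataclasses import dataclass
from enum import Enum

class FailureMode(str, Enum):
    NONE = "none"
    BAD_PROPOSAL = "bad_proposal"
    STALE_EVIDENCE = "stale_evidence"
    MISSING_SUBSTRATE = "missing_substrate"
    INVARIANT_BUG = "invariant_bug"

@dataclass
class VTC:
    state_before: dict           # typed under a domain schema
    state_after: dict            # same schema, or a declared ScaleBridge
    preconditions: list[str]     # must be satisfied before application
    postconditions: list[str]    # checkable, or explicitly marked unobserved
    invariants: list[str]        # hard invariants; violation means rejection
    confidence: float
    provenance: list[str]        # substrate evidence chain
    scale_level: str             # e.g. "molecule", "cell", "organism"
    failure_mode: FailureMode = FailureMode.NONE
    commitment: str = ""         # hash over inputs, outputs, invariants, validation trace

    def commit(self, validation_trace: list[str]) -> str:
        """Hash commitment that makes replay auditable."""
        payload = json.dumps(
            [self.state_before, self.state_after, self.invariants, validation_trace],
            sort_keys=True, default=str,
        )
        self.commitment = hashlib.sha256(payload.encode()).hexdigest()
        return self.commitment
```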
Six operations. All return a VTC, a VTC trace, or a typed refusal.
Composition is valid only when a.state_after satisfies b.preconditions; otherwise the operation returns an invalid trace plus the violated condition. Closure matters: if any operation returns untyped narrative, long-horizon simulation collapses back into hallucination. merge and revert need strict treatment in R16 — incompatible branches can look semantically compatible while hiding contradicted assumptions, and substrate updates will invalidate earlier cells; a simulation platform that can't revoke descendants accumulates stale certainty.
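A sketch of the composition check and typed refusal, building on the schema sketch above; `holds` and `contradicted_by` are placeholder hooks into substrate checks, and none of these names are the shipped API.

```python
from dataclasses import dataclass

@dataclass
class Refusal:
    operation: str
    violated_condition: str      # a typed refusal, never untyped narrative

def holds(precondition: str, state: dict) -> bool:
    """Placeholder check; the real version evaluates against domain substrate."""
    return precondition in state.get("facts", [])

def contradicted_by(cell: "VTC") -> set:
    """Placeholder: invariants the cell's transition would contradict."""
    return set(cell.state_after.get("contradicts", []))

def compose(a: "VTC", b: "VTC"):
    """Valid only if a.state_after satisfies b's preconditions and no invariant
    contradiction crosses the boundary; otherwise return a typed refusal."""
    unmet = [p for p in b.preconditions if not holds(p, a.state_after)]
    if unmet:
        return Refusal("compose", f"precondition not satisfied: {unmet[0]}")
    if set(a.invariants) & contradicted_by(b):
        return Refusal("compose", "invariant contradiction across boundary")
    return [a, b]                # a composed VTC trace
```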
The 6/6 panel consensus is γ.
α · Validation layer over existing models
Hypernym sits as a substrate-validation pre/post processor around GPT, Claude, Gemini, Llama, and video-prediction, climate, and multi-physics simulators. Per-call API. Capital-light. Cross-stack. Hyperscaler-neutral. Insufficient alone: it can't invent missing hypotheses, can't fix a wrong latent ontology, can't recover unobserved causal variables.
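A sketch of the α call shape, assuming generic callables for the proposer, the pre-compressor, and the validator; none of these names are real provider or Hypernym APIs.

```python
traces = []   # every verified, rejected, or refused transition feeds the α-to-β corpus

def alpha_call(prompt: str, proposer, pre_compress, validate):
    """Pre/post processor around any frontier model or simulator call."""
    compressed = pre_compress(prompt)        # context pre-routing / compression
    proposal = proposer(compressed)          # GPT, Claude, Gemini, Llama, a simulator, ...
    verdict = validate(proposal)             # substrate validation of the proposed transition
    traces.append((prompt, proposal, verdict))
    return verdict
```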
β · Train a fresh substrate-native foundation model
Atom-1.4B → 8B → 30-70B class, trained natively on substrate-typed scaffolds with M5 attention-mask conditioning baked in. Hypernym owns the inference plane. Capital-heavy, with hyperscaler friction. The β-first risk: training starts before Hypernym owns the right corpus.
γ · α now → β trained from α traces ← PICK
Ship α products immediately. Log every verified, rejected, contradicted, refused transition as training data. Start β at narrow scales. Scale β only when it beats frontier-plus-validation on cost per valid transition, calibration error, refusal correctness, long-horizon trace integrity.
The recommended β target is not a general frontier model. It's a substrate-native proposer/validator optimized for domains where Hypernym owns substrate density. Atom-scale handles extraction, routing, compression, local validation. 8B-class powers domain endpoints and memory. 30-70B-class becomes relevant only after VTC traces prove enough signal for long-horizon reasoning. Trillion-scale training is not required if the company remains targeted rather than general.
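A sketch of that scale gate, assuming metric names derived from the four criteria above; the comparison structure is the point, not the names.

```python
def should_scale_beta(beta: dict, frontier_plus_validation: dict) -> bool:
    """Scale β only if it beats frontier-plus-validation on all four criteria."""
    return (
        beta["cost_per_valid_transition"] < frontier_plus_validation["cost_per_valid_transition"]
        and beta["calibration_error"] < frontier_plus_validation["calibration_error"]
        and beta["refusal_correctness"] > frontier_plus_validation["refusal_correctness"]
        and beta["long_horizon_trace_integrity"] > frontier_plus_validation["long_horizon_trace_integrity"]
    )
```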
Ten products that become possible only with VTC at the center.
The common product pattern is not "replace the domain simulator" — it's "wrap proposers with transition validity." That distinction keeps scope sane. Hypernym should not build a climate model, AV simulator, chemistry engine, and macroeconomic model from scratch. It should validate, compose, calibrate, and audit the transitions those systems produce, then train substrate-native proposers only where repeated validation failures show frontier or incumbent tools cannot generate the right candidate transitions.
What the panel said we hadn't addressed.
- First-class counterfactual primitive — what changes, what stays fixed, which branches are admissible
- Temporal-causal substrate — decay, lag, persistence, reversible vs irreversible transitions, uncertainty over long horizons
- Sensor-to-substrate ingestion — telemetry, lab measurements, LIDAR, climate stations, clinical time series, video
- Multi-scale composition — molecule → cell → organ → organism → institution → planet (scale bridges)
- Human targeting operator — converts expert intent into typed query + invariants + evidence preferences + refusal threshold
- Simulation-scale compute economics — 10⁶ validation steps require caching, invariant precompilation, hierarchical validation, approximate checks with escalation, and early stopping (sketched after this list)
- Substrate governance — source decay, contradiction propagation, revocation, confidence aging, audit trails. If a paper is retracted, a sensor recalibrated, or a legal rule changes, descendants must be re-scored. Not compliance — simulation correctness.
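A minimal sketch of the compute-economics point, with illustrative names: cheap approximate checks first, escalation to full validation only in an uncertain band, early stopping on hard-invariant failure, and caching keyed by commitment hash.

```python
def precompiled_invariants(cell):
    """Placeholder for invariant checks precompiled against the domain substrate."""
    return [True]   # each entry: did the corresponding hard invariant hold?

def validate_step(cell, cheap_check, full_check, cache, band=(0.4, 0.9)):
    key = cell.commitment or repr(cell.state_after)
    if key in cache:                          # caching across 10^6-step simulations
        return cache[key]
    if any(not ok for ok in precompiled_invariants(cell)):
        result = ("reject", "invariant")      # early stopping on hard-invariant failure
    else:
        score = cheap_check(cell)             # approximate check
        if band[0] <= score <= band[1]:
            result = full_check(cell)         # escalate only in the uncertain band
        else:
            result = ("accept", score) if score > band[1] else ("reject", score)
    cache[key] = result
    return result
```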
The single benchmark VTC must pass.
The Frozen 1,000-Item VTC Benchmark
At month 18, Hypernym must pass a frozen 1,000-item VTC benchmark across biomedical adverse-event mechanisms · scientific/engineering causal transitions · legal/regulatory procedural claims. Each item must require multi-step reasoning, source grounding, at least one invariant check, confidence estimation, and refusal when underdetermined.
Pass thresholds
≥90% valid-transition accuracy · ≤2% unsupported load-bearing hallucination · confidence calibration error ≤5% · ≥25 percentage points absolute gain over frontier baseline with same evidence · cost per valid transition ≤50% of frontier+human-review baseline.
Kill switch
If at M18 any of the following hold: valid-transition accuracy <85% · unsupported hallucination >3% · calibration error >8% · absolute gain <15 points → pause simulation-platform claims and revert to verification, citation, domain endpoints, memory, and context products until the primitive passes.
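The gate can be written as a function. The thresholds are the ones stated above; the metric variable names are assumptions, and what happens in the band between pass and kill is not specified by the benchmark, so it is returned as "iterate".

```python
def m18_gate(accuracy, hallucination, calibration_err, gain_pts, cost_ratio):
    """Frozen 1,000-item VTC benchmark decision at month 18."""
    passed = (accuracy >= 0.90 and hallucination <= 0.02 and
              calibration_err <= 0.05 and gain_pts >= 25 and cost_ratio <= 0.50)
    killed = (accuracy < 0.85 or hallucination > 0.03 or
              calibration_err > 0.08 or gain_pts < 15)
    if killed:
        return "pause simulation-platform claims; revert to verification/citation/memory/context"
    return "continue" if passed else "iterate"
```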
35 / 25 / 40.
35% · 90-day wedges
- Omnifact Verify / Trace-Grade
- Context Pre-Router
- Fine-Grained Citation
- First Persistent Memory API
Share backend substrate components. Generate immediate validation traces.
25% · 6-9 month wedges
- Domain-Precision Legal
- False-Positive Eliminator
- Substrate Audit Replay
- Limited Modulum Inference SaaS (after IP filing)
Convert trace infrastructure into higher-ACV customers.
40% · Hyperlab R&D
- 20% · VTC stack + benchmark
- 10% · Substrate Lemma propagation
- 5% · substrate-native Modulum scaling
- 5% · Federation Protocol / CTT
Keeps the γ flywheel alive without starving revenue.
The allocation is intentionally not 70% R&D even though VTC is the defining bet. Without customer traces, VTC becomes an elegant internal format. Without VTC R&D, the wedges become a loose API business. The portfolio forces the two sides to share substrate artifacts: every Track A product must emit reusable cells or cell-adjacent traces, and every Hyperlab milestone must improve at least one Track A metric.
Per-tier packaging across the AI stack.
| Tier | Product | Price | Key feature |
|---|---|---|---|
| Lab | VTC benchmark + domain validator | $2K-$20K/mo | batch validation, reproducibility, citations |
| Research | Scientific claim/citation API | $500-$5K/mo | corpus grounding + claim decomposition |
| Dev | Verify, context, memory SDK | free-$99/mo | one-key integration, small quotas |
| Startup | API bundle + dashboard | $500-$10K/mo | metered calls, router, memory, observability |
| Enterprise | Domain endpoint + audit replay | $60K-$500K/yr | SSO, RBAC, VPC, audit-of-record |
| Hyperscaler | M5 / Modulum / Federation licensing | $2M-$50M/yr | infrastructure integration, protocol participation |
| Government | Provenance + simulation pilots | $500K-$50M/yr | air-gapped, attestation, cleared support |
Eight wedges and bets to avoid.
VTC + temporal/counterfactual closure.
"Verified Transition Cell: temporal, counterfactual, and distribution closure."
Specify the minimal VTC implementation, SubstrateFork, SubstrateStream, ScaleBridge, TargetingLens, confidence propagation, and the M18 benchmark. Narrower than a broad strategy round, more directly falsifiable than a pure distribution-physics round. Links the top 90-day wedges to the company-defining bet: every verify, context, citation, memory, and audit call becomes a candidate VTC training datum.
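Two of the R16 objects named above admit simple illustrative type sketches; every field and name here is a guess at what the spec might pin down, not anything the round has defined.

```python
from dataclasses import dataclass

@dataclass
class SubstrateFork:
    """A counterfactual branch: what changes, what stays fixed, what is admissible."""
    interventions: dict          # the variables the fork changes
    held_fixed: list[str]        # facts and invariants that must not drift
    admissibility: list[str]     # conditions a branch must satisfy to be explored

@dataclass
class ScaleBridge:
    """A declared mapping that lets cells at one scale compose into the next."""
    from_scale: str              # e.g. "molecule"
    to_scale: str                # e.g. "cell"
    aggregation_rule: str        # how lower-scale state is summarized upward
    loss_bound: float            # declared information loss across the bridge
```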
Panel-surfaced candidates: temporal-substrate / counterfactual-fork (Gemma, Gemini, Qwen) · Substrate Distribution Physics (Claude, Qwen, Gemma) · deep-dive on top wedges (Codex secondary, Claude alt) · deep-dive on top bet / VTC (Codex primary, Grok, Claude alt). Synthesis: VTC + temporal/counterfactual closure subsumes all four into one falsifiable architecture round.