Confidential · Hypernym Labs · R15

Ship α validation now. Train β from α traces later. The unit between them is the Verified Transition Cell.

R15 is a strategic pivot from R7-R14's depth-first architecture rounds to a balanced view: 90-day product wedges that ship to one buyer who hands you money, plus 18-36 month zero-to-one R&D bets, plus moat analysis cross-cutting both. The 6-model panel converged 6/6 on the structural primitive that ties wedges to bets — every Track A product emits trace data that becomes Track D's training corpus.

The primitive is the Verified Transition Cell — a typed unit of valid state change carrying preconditions, postconditions, invariants, confidence, and cryptographic provenance. Frontier models propose; Hypernym validates; committed VTCs become composable units for long-horizon simulation. This is what Hypernym ships once R14's Substrate Lemma propagation engine has substrate density to compound on.

6/6 · Panel agreement on the structural primitive — same shape, six different names.
10 · Track A wedges ranked by speed-to-customer × moat strength.
7 · Track B R&D bets — VTC #1, Substrate Lemma #2.
γ · α-then-β recommendation — ship validation now, train substrate-native model from α traces later.
Panel-Convergent Synthesis · 6/6
The missing primitive is not a larger model, not a latent world model, and not a simulator kernel. It is a typed transition object that can be proposed by any model or simulator but accepted only if substrate evidence, invariants, confidence, and provenance all close. Hypernym validates; everyone else proposes. That separation is the moat.
01 · Six Models · One Object

The 6-model panel.

Top-tier from each provider. NDA-bound. First-principles only. Six models reasoning independently arrived at the same structural object with six different names — same convergence pattern as R14's Substrate Lemma.

Codex · Verified Transition Cell · 117 KB · Form A · synthesis pick
Claude · Grounded Rollout · 91 KB · Form B
Gemini-3.1 · Grounded Simulation Step · 20 KB · Form A
Grok-4.20 · Simulation Kernel · 17 KB · Form B
Qwen3-Q8 · Substrate-Driven Sim Kernel · 20 KB · Form A
Gemma-4 (MLX) · Causal Substrate Frame · 17 KB · Form A
02 · What 6/6 Agreed On

The convergence is on sequencing, not the wedge.

The strongest 6/6 agreement
Codex, Claude, Gemini, Grok, Qwen, and Gemma all rejected a pure foundation-model-first strategy and converged on γ: ship substrate validation over existing models now, use the resulting traces and failures as the substrate-native training corpus later. Existing frontier models and scientific simulators are useful proposers, but weak validators of their own transitions.

Track A consensus: 6/6 included a verifier or trace-grading product. 6/6 included context compression / pre-routing. 6/6 included Modulum Router or inference SaaS. 5/6 included persistent memory. 5/6 included domain endpoints. 5/6 included false-positive elimination. 4/6 included Forge OS Solo or IDE plugin. The shared ranking logic was speed to first customer, substrate data generated, and whether the product can be bought by one developer or one team without a network.

Moat consensus: 6/6 were honest that Modulum Router and generic inference are revenue wedges, not durable moats unless M5 cost-quality claims remain protected and measurable. 5/6 treated Omnifact verification as medium-to-strong because the generic API is copyable but substrate provenance and domain substrate are harder to replicate. 6/6 placed the deepest moat in VTC, Substrate Lemma propagation, Federation Protocol, and domain substrate density.

R16 split: Codex pushed VTC; Claude pushed Substrate Distribution Physics; Gemini and Qwen leaned temporal/counterfactual; Grok pushed world-model continuation; Gemma pushed temporal substrate + counterfactual fork. Synthesis: R16 is "VTC + temporal/counterfactual closure" — subsumes all surfaced candidates into one falsifiable architecture round.

03 · Track A — Wedges

Ten ranked wedges. One-buyer = full value.

Wedges where one customer hands you money in 2-16 weeks. Ranked by panel-converged speed-to-customer × moat strength. Each scored on the data corpus it generates for VTC training (the γ data flywheel).

#  | Wedge                                                | Revenue              | TTC      | Build       | Moat       | Panel
1  | Omnifact Verify / Trace-Grade API                    | per-call + dashboard | 2-8 wk   | 4-8 ew      | 4 / strong | 6/6
2  | Context Pre-Router / Pre-Compressor                  | per-token saved      | 3-10 wk  | 4-10 ew     | 4 / strong | 6/6
3  | Persistent Memory API                                | per-seat + storage   | 4-10 wk  | 6-12 ew     | 3 / medium | 5/6
4  | Domain-Precision Endpoint (legal · biomed · finance) | per-token + ent. min | 6-16 wk  | 8-16 ew/dom | 4 / strong | 6/6
5  | Fine-Grained Citation API                            | per-call / per-doc   | 6-12 wk  | 6-12 ew     | 3 / medium | 4/6
6  | False-Positive Eliminator                            | per-call / outcome   | 4-16 wk  | 5-18 ew     | 4 / strong | 5/6
7  | Modulum Inference SaaS                               | per-token            | 6-20 wk  | 10-24 ew    | 2 / weak   | 6/6
8  | Modulum Router                                       | per-token margin     | 3-18 wk  | 6-20 ew     | 2 / weak   | 6/6
9  | Substrate Audit Replay                               | enterprise license   | 12-24 wk | 10-24 ew    | 4 / strong | 3/6
10 | Forge OS Solo / IDE Magic                            | per-seat             | 6-20 wk  | 12-18 ew    | 3 / medium | 5/6

The decision rule: ship weak-moat wedges only when they feed a strong-moat asset. A router that merely routes tokens is weak; a router that learns cost per verified transition across domains is useful. A memory API that stores summaries is weak; a memory API that builds grounded, invalidatable project substrate is useful. A citation API that attaches links is weak; a citation API that decomposes claims into reusable provenance cells is useful.

04 · Track B — R&D Bets

Seven ranked moonshots. 18-36 month horizon.

# | Bet                                            | Magnitude                          | Feasibility | Moat                 | Panel
1 | Verified Transition Cell simulation stack      | category-defining → civilizational | medium      | very high            | 6/6
2 | Substrate Lemma propagation engine (R14 carry) | category-defining                  | medium-high | very high            | 6/6
3 | Substrate-native Modulum scaling (β)           | category-defining                  | medium      | high                 | 4/6
4 | Hallucination-zero domain systems              | category-defining in verticals     | medium-high | high                 | 5/6
5 | Federation Protocol cryptographic spec         | standard-setting                   | medium      | very high if adopted | 4/6
6 | Targeted scientific instruments                | category-defining                  | medium      | high                 | 5/6
7 | Composition Type Theory                        | foundational                       | medium-low  | high but indirect    | 4/6

VTC ranks #1 because it's the missing unit that makes "simulation-grade targeted intelligence" operational. If it works, Hypernym becomes a transition-validation layer across frontier models, simulators, and later substrate-native models. Substrate Lemma stays #2 as core but coordination-heavy — its moat improves only after Track A produces substrate density. Substrate-native Modulum (β) sits at #3 — fund it, don't overfund before α products create VTC traces and failure taxonomies.

05 · Track C — Moat Analysis

Four moat archetypes. Be honest about which you have.

Durable structural moats require: proprietary substrate that improves with use, patentable transition/validation mechanics, regulatory or audit embedding, or standard/protocol adoption. Execution moats are not enough in AI infrastructure — well-funded competitors copy API shape quickly.

Avoid confusing patentability with monopoly. M5-conditioned routing, substrate validation, and transition cells may be filable, but patents do not replace proof. The commercial moat appears only when customers believe the measured output: lower false positives without higher false negatives, fewer unsupported claims at the same task completion rate, shorter context without recall loss, longer simulations without calibration drift. Every wedge ships with a benchmark harness as part of the product, not as a research afterthought.

The strongest compound loop: verify claims → log failures → classify failure modes → convert repeated failures into substrate lemmas → use lemmas to improve future verification → distill the resulting VTC corpus into substrate-native Modulum. Products that don't contribute to this loop are cash extraction or distribution experiments, not core strategy.

06 · Track D — The Primitive

Verified Transition Cell. Like the relation, the process, the commit.

Synthesis Pick

Verified Transition Cell (VTC) — Codex's name, panel-merged structure

All six panel models converged on the same structural object — a typed, provenance-bearing unit of valid state change. Six different names point at the same primitive. The synthesis pick is Verified Transition Cell: "verified" captures the validator role that differentiates Hypernym from frontier model rollouts, "transition" captures the core temporal/causal unit, "cell" captures atomicity and composition. "Kernel" overstates execution; "rollout" overstates sequence; "step" is too close to Grounded Step; "frame" is too static.

07 · Structural Fields

The VTC schema — merged from all 6 panel proposals.

VerifiedTransitionCell {
  id: CellID
  target_question: TypedQuestion
  domain: DomainType
  state_before: TypedWorldState
  state_after: TypedWorldState | TypedCounterfactualState
  action: Action | Intervention | Observation | Policy | MechanisticTransition
  transition_type: temporal | causal | counterfactual | observational | mechanistic | policy
  preconditions: Predicate[]
  postconditions: Predicate[]
  invariants: {
    hard: Invariant[]
    soft: Invariant[]
    scale_bridge: ScaleInvariant[]
  }
  substrate_inputs: {
    pds_refs: PDSRef[]
    grounded_steps: GroundedStepRef[]
    substrate_lemmas: SubstrateLemmaRef[]
    sensor_refs: SensorRef[]
    simulator_outputs: SimulatorOutputRef[]
    model_outputs: ModelOutputRef[]
    human_annotations: ExpertAnnotationRef[]
  }
  confidence: {
    score: Float[0,1]
    interval: ConfidenceInterval
    calibration_class: CalibrationClass
    provenance: ConfidenceProvenance
    irreducible_uncertainty: UncertaintyReport
  }
  validation: {
    verdict: valid | invalid | contradicted | underdetermined | refused
    failure_mode: ProposalError | SubstrateGap | InvariantViolation | CalibrationFailure | SensorConflict | None
    refusal_reason: String?
  }
  commitment: {
    content_hash: Hash
    parent_hashes: Hash[]
    signature: AttestationSignature
    audit_replay_ref: ReplayRef
  }
  composition: {
    parents: CellID[]
    children: CellID[]
    branch_id: BranchID
    timeline_id: TimelineID
    scale_level: ScaleLevel
    compatibility_rules: Rule[]
  }
}
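As a concrete illustration, the core of the schema can be mirrored as a small Python dataclass sketch. Field names follow the schema above; the predicate representation (plain callables) and the example values are illustrative assumptions, not the production type system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VTCCore:
    """Minimal sketch of a VTC's core fields (illustrative only)."""
    id: str
    domain: str
    transition_type: str       # temporal | causal | counterfactual | ...
    state_before: dict
    state_after: dict
    preconditions: tuple = ()  # predicates over state_before
    postconditions: tuple = () # predicates over state_after
    confidence: float = 0.0    # score in [0, 1]

    def preconditions_hold(self) -> bool:
        return all(p(self.state_before) for p in self.preconditions)

    def postconditions_hold(self) -> bool:
        return all(p(self.state_after) for p in self.postconditions)

# Hypothetical biomedical cell: a dosing transition with one check each way.
cell = VTCCore(
    id="vtc-001",
    domain="biomed",
    transition_type="causal",
    state_before={"dose_mg": 0},
    state_after={"dose_mg": 50},
    preconditions=(lambda s: s["dose_mg"] == 0,),
    postconditions=(lambda s: s["dose_mg"] > 0,),
    confidence=0.9,
)
print(cell.preconditions_hold(), cell.postconditions_hold())  # True True
```

A real implementation would carry the full invariant, substrate, validation, commitment, and composition sub-objects; this sketch only shows the state-plus-predicate skeleton the rest hangs on.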

Three load-bearing fields: failure_mode prevents the α-to-β corpus from becoming noisy (a bad proposal, stale evidence, missing substrate, and an invariant bug should not all train the same correction). scale_level prevents molecular claims from composing directly into organism-level claims without a declared bridge. commitment makes simulation replay auditable; without a hash chain, VTC becomes just another unverifiable trace format.

Validity invariants: state_before and state_after must be typed under the same domain schema or a declared ScaleBridge. Hard invariants cannot be violated. Preconditions must be satisfied before application. Postconditions must be checkable or explicitly marked unobserved. Every accepted VTC must have a hash commitment over inputs, outputs, invariants, and validation trace. Composition is legal only when the first cell's state_after satisfies the second cell's preconditions and no invariant contradictions cross the boundary.
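The hash-commitment requirement can be sketched as follows, assuming JSON-serializable cell contents. The canonical-serialization choice (sorted keys, compact separators) and the function name are illustrative assumptions, not a spec.

```python
import hashlib
import json

def content_hash(cell: dict, parent_hashes: list[str]) -> str:
    """Commit to a cell's contents plus its parent commitments.

    Canonical serialization (sorted keys, fixed separators) so that
    logically equal cells produce identical digests.
    """
    payload = json.dumps(
        {"cell": cell, "parents": sorted(parent_hashes)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode()).hexdigest()

# A two-cell chain: the child's hash binds it to its parent.
root = content_hash({"state_after": {"x": 1}, "verdict": "valid"}, [])
child = content_hash({"state_after": {"x": 2}, "verdict": "valid"}, [root])

# Tampering with the parent reference changes the child's digest.
assert child != content_hash(
    {"state_after": {"x": 2}, "verdict": "valid"}, ["tampered"]
)
```

The point of the chain is exactly the property the assert demonstrates: replaying an audit trail re-derives every digest, so any edit to an upstream cell invalidates every committed descendant.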

08 · Closed Algebra

Six operations. All return a VTC, a VTC trace, or a typed refusal.

compose(a, b)
Returns ordered trace if a.state_after satisfies b.preconditions; else invalid trace + violated condition.
branch(vtc, intervention)
Counterfactual cells sharing parent state, with declared changed-vs-held-fixed variables.
merge(branches)
Combines compatible branches only when states reconcile under invariants and confidence does not hide contradictions. Defaults to refusal.
revert(vtc)
Returns prior committed state and invalidates descendants if reverted cell was load-bearing.
attest(vtc)
Signs cell, source commitments, and validation verdict for audit replay.
query(trace, target)
Extracts supported, contradicted, underdetermined, or refused claims from a VTC trace.
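A sketch of compose under these rules: composition succeeds only when the first cell's state_after satisfies the second cell's preconditions, and failure returns a typed refusal naming the violated condition. The Cell and Refusal shapes here are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    id: str
    state_after: dict
    preconditions: dict  # {name: predicate over the prior cell's state_after}

@dataclass
class Refusal:
    op: str
    violated: str

def compose(a: Cell, b: Cell):
    """Return an ordered trace [a, b], or a typed refusal."""
    for name, pred in b.preconditions.items():
        if not pred(a.state_after):
            return Refusal(op="compose", violated=name)
    return [a, b]

a = Cell("a", {"temp_k": 310}, {})
b = Cell("b", {"temp_k": 311},
         {"temp_in_range": lambda s: 300 <= s["temp_k"] <= 320})
c = Cell("c", {},
         {"temp_in_range": lambda s: s["temp_k"] >= 400})

print(compose(a, b))  # ordered trace: [a, b]
print(compose(a, c))  # typed refusal naming "temp_in_range"
```

Note that the refusal is itself a typed object, not prose — that is the closure property the section insists on: every path out of an operation stays inside the algebra.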

Closure matters: if any operation returns untyped narrative, long-horizon simulation collapses back into hallucination. merge and revert need strict treatment in R16 — incompatible branches can look semantically compatible while hiding contradicted assumptions; substrate updates will invalidate earlier cells, and a simulation platform that can't revoke descendants accumulates stale certainty.

09 · α / β / γ Recommendation

The 6/6 panel consensus is γ.

α

Validation layer over existing models

Hypernym sits as a substrate-validation pre- and post-processor around GPT, Claude, Gemini, Llama, and video-prediction, climate, and multi-physics simulators. Per-call API. Capital-light. Cross-stack. Hyperscaler-neutral. Insufficient alone — it can't invent missing hypotheses, can't fix a wrong latent ontology, can't recover unobserved causal variables.

β

Train fresh substrate-native foundation model

Atom-1.4B → 8B → 30-70B class, trained natively on substrate-typed scaffolds with M5 attention-mask conditioning baked in. Hypernym owns the inference plane. Capital-heavy, with hyperscaler friction. A β-first strategy trains before Hypernym owns the right corpus.

γ

α now → β trained from α traces ← PICK

Ship α products immediately. Log every verified, rejected, contradicted, or refused transition as training data. Start β at narrow scales. Scale β only when it beats frontier-plus-validation on cost per valid transition, calibration error, refusal correctness, and long-horizon trace integrity.

The recommended β target is not a general frontier model. It's a substrate-native proposer/validator optimized for domains where Hypernym owns substrate density. Atom-scale handles extraction, routing, compression, local validation. 8B-class powers domain endpoints and memory. 30-70B-class becomes relevant only after VTC traces prove enough signal for long-horizon reasoning. Trillion-scale training is not required if the company remains targeted rather than general.

10 · Simulation Products Enabled

Ten products that become possible only with VTC at the center.

Drug Adverse Event Simulator
Validates compound→pathway→physiology transitions. First customer: top-20 pharma or Osmium. Per-study license.
Climate Tipping-Point Confidence Engine
Validates existing climate-model transitions and assumptions. First customer: climate institution or government. Grant + enterprise license.
AV Counterfactual Safety Simulator
Validates counterfactuals for sensor failure, braking delay, occlusion, pedestrian motion. First customer: AV lab or regulator. Per-simulation or certification license.
Economic Policy Simulation Auditor
Audits structural claims in policy simulations rather than predicting the whole economy. First customer: treasury, central bank, think tank. Per-scenario license.
Bio-Defense Provenance Simulator
Cryptographically traces outbreak and intervention simulations. First customer: public-health or defense agency. Government contract.
Protein Edge-Case Validator
Checks proposed structures against invariants and known failure classes. First customer: protein design lab. Per-run.
Novel Chemistry Pathway Validator
Validates reaction plausibility and evidence. First customer: materials or chemistry lab. Per-pathway.
Emergency Response Simulator
Combines sensor streams, infrastructure, weather, policy actions. First customer: city, FEMA-like agency, insurer. Per-region deployment.
Long-Horizon Agent Simulation
Converts multi-day agent plans into checked transition traces. First customer: AI lab or enterprise agent team. Per-run infrastructure SaaS.
Multi-Physics Consistency Layer
Audits coupled solver outputs across thermal/fluid/structural/EM. First customer: aerospace or energy. Enterprise license.

The common product pattern is not "replace the domain simulator" — it's "wrap proposers with transition validity." That distinction keeps scope sane. Hypernym should not build a climate model, AV simulator, chemistry engine, and macroeconomic model from scratch. It should validate, compose, calibrate, and audit the transitions those systems produce, then train substrate-native proposers only where repeated validation failures show frontier or incumbent tools cannot generate the right candidate transitions.
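The wrap-the-proposer pattern can be sketched as a thin validation shim. The proposer, validator, and verdict vocabulary below are illustrative assumptions, not the product API; the one load-bearing design choice is that rejected and refused candidates are logged rather than discarded, because they are the γ-flywheel corpus.

```python
def wrap_proposer(propose, validate):
    """Wrap any proposer (frontier model, simulator) with transition validity."""
    def run(state, question):
        candidate = propose(state, question)
        verdict = validate(state, candidate)
        if verdict == "valid":
            return {"cell": candidate, "verdict": verdict}
        # Rejected/refused candidates are kept as raw traces --
        # these failures are the training data for a later beta model.
        return {"cell": None, "verdict": verdict, "raw": candidate}
    return run

# Toy stand-ins for a domain proposer and validator (assumptions).
def toy_proposer(state, question):
    return {"state_after": {"x": state["x"] + 1}}

def toy_validator(state, candidate):
    return "valid" if candidate["state_after"]["x"] > state["x"] else "invalid"

run = wrap_proposer(toy_proposer, toy_validator)
print(run({"x": 1}, "increment?")["verdict"])  # valid
```

The same shim wraps a climate model, an AV simulator, or a chemistry engine without Hypernym rebuilding any of them — only the validator and the logged trace format are Hypernym's.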

11 · D.4 Gaps R7-R14 Missed

What the panel said we hadn't addressed.

12 · M18 Falsifier

The single benchmark VTC must pass.

The Frozen 1,000-Item VTC Benchmark

At month 18, Hypernym must pass a frozen 1,000-item VTC benchmark across biomedical adverse-event mechanisms · scientific/engineering causal transitions · legal/regulatory procedural claims. Each item must require multi-step reasoning, source grounding, at least one invariant check, confidence estimation, and refusal when underdetermined.

Pass thresholds

90% valid-transition accuracy · ≤2% unsupported load-bearing hallucination · confidence calibration error ≤5% · ≥25 percentage points absolute gain over frontier baseline with same evidence · cost per valid transition ≤50% of frontier+human-review baseline.
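One plausible way to operationalize the ≤5% calibration threshold is expected calibration error (ECE) over equal-width confidence bins; the benchmark's actual metric is not specified here, so the bin count and data below are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-weighted gap between mean confidence and accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / total) * abs(avg_conf - accuracy)
    return ece

# Well-calibrated toy data: 80% confidence, 80% of items correct.
confs = [0.8] * 10
hits = [True] * 8 + [False] * 2
assert expected_calibration_error(confs, hits) < 0.05  # passes threshold
```

Whatever the final metric, the threshold only means something if the harness is frozen with the benchmark — a calibration number computed over a moving item set is unfalsifiable.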

Kill switch

If at M18: valid-transition accuracy <85% · unsupported hallucination >3% · calibration error >8% · absolute gain <15 points → pause simulation-platform claims and revert to verification, citation, domain endpoints, memory, and context products until the primitive passes.

13 · Recommended Portfolio Mix

35 / 25 / 40.

35% · 90-day wedges
  • Omnifact Verify / Trace-Grade
  • Context Pre-Router
  • Fine-Grained Citation
  • First Persistent Memory API

Share backend substrate components. Generate immediate validation traces.

25% · 6-9 month wedges
  • Domain-Precision Legal
  • False-Positive Eliminator
  • Substrate Audit Replay
  • Limited Modulum Inference SaaS (after IP filing)

Convert trace infrastructure into higher-ACV customers.

40% · Hyperlab R&D
  • 20% · VTC stack + benchmark
  • 10% · Substrate Lemma propagation
  • 5% · substrate-native Modulum scaling
  • 5% · Federation Protocol / CTT

Keeps γ flywheel alive without starving revenue.

The allocation is intentionally not 70% R&D even though VTC is the defining bet. Without customer traces, VTC becomes an elegant internal format. Without VTC R&D, the wedges become a loose API business. The portfolio forces the two sides to share substrate artifacts: every Track A product must emit reusable cells or cell-adjacent traces, and every Hyperlab milestone must improve at least one Track A metric.

14 · Cross-Stack Pricing

Per-tier packaging across the AI stack.

Tier        | Product                             | Price          | Key feature
Lab         | VTC benchmark + domain validator    | $2K-$20K/mo    | batch validation, reproducibility, citations
Research    | Scientific claim/citation API       | $500-$5K/mo    | corpus grounding + claim decomposition
Dev         | Verify, context, memory SDK         | free-$99/mo    | one-key integration, small quotas
Startup     | API bundle + dashboard              | $500-$10K/mo   | metered calls, router, memory, observability
Enterprise  | Domain endpoint + audit replay      | $60K-$500K/yr  | SSO, RBAC, VPC, audit-of-record
Hyperscaler | M5 / Modulum / Federation licensing | $2M-$50M/yr    | infrastructure integration, protocol participation
Government  | Provenance + simulation pilots      | $500K-$50M/yr  | air-gapped, attestation, cleared support

15 · What NOT to Build

Eight wedges and bets to avoid.

Consumer chatbot. Weak substrate leverage. Brutal distribution economics. No durable moat.
Generic OpenRouter clone. Build routing only when it uses verification, context targeting, memory, or cost per verified transition.
Broad AI governance first. Governance is too vague. Verification, citation, trace grading, audit replay are concrete.
Standalone vector DB / generic RAG framework. The moat is typed substrate, not storage plumbing.
Train 70B+ generalist before α data exists. Revisit when VTC traces prove β has a unique corpus.
Climate as primary commercial wedge. Keep it Hyperlab/grant-funded until paying institutional buyer anchors.
Vertical agents without domain experts and launch customers. Revisit only when domain substrate, benchmark, refusal policy are owned.
Over-standardize Federation before substrate density exists. Revisit when 2+ customers need cross-boundary exchange.
16 · R16 Framing Recommendation

VTC + temporal/counterfactual closure.

R16 Pick — Synthesis

"Verified Transition Cell: temporal, counterfactual, and distribution closure."

Specify the minimal VTC implementation, SubstrateFork, SubstrateStream, ScaleBridge, TargetingLens, confidence propagation, and the M18 benchmark. Narrower than a broad strategy round, more directly falsifiable than a pure distribution-physics round. Links the top 90-day wedges to the company-defining bet: every verify, context, citation, memory, and audit call becomes a candidate VTC training datum.

Panel-surfaced candidates: temporal-substrate / counterfactual-fork (Gemma, Gemini, Qwen) · Substrate Distribution Physics (Claude, Qwen, Gemma) · deep-dive on top wedges (Codex secondary, Claude alt) · deep-dive on top bet / VTC (Codex primary, Grok, Claude alt). Synthesis: VTC + temporal/counterfactual closure subsumes all four into one falsifiable architecture round.