Ship α validation now. Train β from α traces later. The unit between them is the Verified Transition Cell.
R15 is a strategic pivot from R7-R14's depth-first architecture rounds to a balanced view: 90-day product wedges that ship to one buyer who hands you money, plus 18-36 month zero-to-one R&D bets, plus moat analysis cross-cutting both. The 6-model panel converged 6/6 on the structural primitive that ties wedges to bets — every Track A product emits trace data that becomes Track D's training corpus.
The primitive is the Verified Transition Cell — a typed unit of valid state change carrying preconditions, postconditions, invariants, confidence, and cryptographic provenance. Frontier models propose; Hypernym validates; committed VTCs become composable units for long-horizon simulation. This is what Hypernym ships once R14's Substrate Lemma propagation engine has substrate density to compound on.
The missing primitive is not a larger model, not a latent world model, and not a simulator kernel. It is a typed transition object that can be proposed by any model or simulator but accepted only if substrate evidence, invariants, confidence, and provenance all close. Hypernym validates; everyone else proposes. That separation is the moat.
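A minimal sketch of that acceptance rule, under assumed field and function names; the only point it makes is that all four conditions must close before a cell is committed.

```python
from dataclasses import dataclass

@dataclass
class ProposedTransition:
    evidence_refs: list[str]      # substrate evidence cited by the proposer
    invariants_ok: bool           # hard invariants checked against the substrate
    confidence: float             # estimated confidence in [0, 1]
    provenance_complete: bool     # full chain from source to claim

def accept(t: ProposedTransition, min_confidence: float = 0.9) -> bool:
    """Commit a proposed transition only if all four closure conditions hold."""
    return (
        bool(t.evidence_refs)
        and t.invariants_ok
        and t.confidence >= min_confidence
        and t.provenance_complete
    )
```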
The 6-model panel.
Top-tier from each provider. NDA-bound. First-principles only. Six models reasoning independently arrived at the same structural object with six different names — same convergence pattern as R14's Substrate Lemma.
The convergence is on sequencing, not the wedge.
Codex, Claude, Gemini, Grok, Qwen, and Gemma all rejected a pure foundation-model-first strategy and converged on γ: ship substrate validation over existing models now, use the resulting traces and failures as the substrate-native training corpus later. Existing frontier models and scientific simulators are useful proposers, but weak validators of their own transitions.
Track A consensus: 6/6 included a verifier or trace-grading product. 6/6 included context compression / pre-routing. 6/6 included Modulum Router or inference SaaS. 5/6 included persistent memory. 5/6 included domain endpoints. 5/6 included false-positive elimination. 4/6 included Forge OS Solo or IDE plugin. The shared ranking logic was speed to first customer, substrate data generated, and whether the product can be bought by one developer or one team without a network.
Moat consensus: 6/6 acknowledged that Modulum Router and generic inference are revenue wedges, not durable moats, unless M5 cost-quality claims remain protected and measurable. 5/6 treated Omnifact verification as medium-to-strong because the generic API is copyable but substrate provenance and domain substrate are harder to replicate. 6/6 placed the deepest moat in VTC, Substrate Lemma propagation, the Federation Protocol, and domain substrate density.
R16 split: Codex pushed VTC; Claude pushed Substrate Distribution Physics; Gemini and Qwen leaned temporal/counterfactual; Grok pushed world-model continuation; Gemma pushed temporal substrate + counterfactual fork. Synthesis: R16 is "VTC + temporal/counterfactual closure" — subsumes all surfaced candidates into one falsifiable architecture round.
Ten ranked wedges. One-buyer = full value.
Wedges where one customer hands you money in 2-16 weeks. Ranked by panel-converged speed-to-customer × moat strength. Each scored on the data corpus it generates for VTC training (the γ data flywheel).
| # | Wedge | Revenue model | Time to customer | Build effort | Moat | Panel |
|---|---|---|---|---|---|---|
| 1 | Omnifact Verify / Trace-Grade API | per-call + dashboard | 2-8 wk | 4-8 ew | 4 / strong | 6/6 |
| 2 | Context Pre-Router / Pre-Compressor | per-token saved | 3-10 wk | 4-10 ew | 4 / strong | 6/6 |
| 3 | Persistent Memory API | per-seat + storage | 4-10 wk | 6-12 ew | 3 / medium | 5/6 |
| 4 | Domain-Precision Endpoint (legal · biomed · finance) | per-token + ent. min | 6-16 wk | 8-16 ew/dom | 4 / strong | 6/6 |
| 5 | Fine-Grained Citation API | per-call / per-doc | 6-12 wk | 6-12 ew | 3 / medium | 4/6 |
| 6 | False-Positive Eliminator | per-call / outcome | 4-16 wk | 5-18 ew | 4 / strong | 5/6 |
| 7 | Modulum Inference SaaS | per-token | 6-20 wk | 10-24 ew | 2 / weak | 6/6 |
| 8 | Modulum Router | per-token margin | 3-18 wk | 6-20 ew | 2 / weak | 6/6 |
| 9 | Substrate Audit Replay | enterprise license | 12-24 wk | 10-24 ew | 4 / strong | 3/6 |
| 10 | Forge OS Solo / IDE Magic | per-seat | 6-20 wk | 12-18 ew | 3 / medium | 5/6 |
The decision rule: ship weak-moat wedges only when they feed a strong-moat asset. A router that merely routes tokens is weak; a router that learns cost per verified transition across domains is useful. A memory API that stores summaries is weak; a memory API that builds grounded, invalidatable project substrate is useful. A citation API that attaches links is weak; a citation API that decomposes claims into reusable provenance cells is useful.
Seven ranked moonshots. 18-36 month horizon.
| # | Bet | Magnitude | Feasibility | Moat | Panel |
|---|---|---|---|---|---|
| 1 | Verified Transition Cell simulation stack | category-defining → civilizational | medium | very high | 6/6 |
| 2 | Substrate Lemma propagation engine (R14 carry) | category-defining | medium-high | very high | 6/6 |
| 3 | Substrate-native Modulum scaling (β) | category-defining | medium | high | 4/6 |
| 4 | Hallucination-zero domain systems | category-defining in verticals | medium-high | high | 5/6 |
| 5 | Federation Protocol cryptographic spec | standard-setting | medium | very high if adopted | 4/6 |
| 6 | Targeted scientific instruments | category-defining | medium | high | 5/6 |
| 7 | Composition Type Theory | foundational | medium-low | high but indirect | 4/6 |
VTC ranks #1 because it's the missing unit that makes "simulation-grade targeted intelligence" operational. If it works, Hypernym becomes a transition-validation layer across frontier models, simulators, and later substrate-native models. Substrate Lemma stays #2 as core but coordination-heavy — its moat improves only after Track A produces substrate density. Substrate-native Modulum (β) sits at #3 — fund it, don't overfund before α products create VTC traces and failure taxonomies.
Four moat archetypes. Be honest about which you have.
Durable structural moats require: proprietary substrate that improves with use, patentable transition/validation mechanics, regulatory or audit embedding, or standard/protocol adoption. Execution moats are not enough in AI infrastructure — well-funded competitors copy API shape quickly.
- Strong structural moats: VTC simulation stack · Substrate Lemma propagation · Federation Protocol after adoption · domain substrate in legal/biomed · Substrate Audit Replay · false-positive elimination in audited workflows. These create switching cost through accumulated substrate, replay obligations, calibration history, or cross-system protocol dependency.
- Medium moats: Omnifact Verify · Trace-Grading · Context Pre-Router · Fine-Grained Citation · Persistent Memory · Domain-Precision Endpoint before deep customer data. These become strong only if they feed a proprietary corpus of claims, traces, refusals, and validated transitions. Without that corpus, they're features.
- Weak fundamentals: Modulum Router · generic inference SaaS · generic IDE plugin · prompt safety gate · broad AI governance suite. Worth shipping if they generate cash, substrate data, distribution, or benchmarks. Kill them if they become support-heavy commodity software.
Avoid confusing patentability with monopoly. M5-conditioned routing, substrate validation, and transition cells may be filable, but patents do not replace proof. The commercial moat appears only when customers believe the measured output: lower false positives without higher false negatives, fewer unsupported claims at the same task completion rate, shorter context without recall loss, longer simulations without calibration drift. Every wedge ships with a benchmark harness as part of the product, not as a research afterthought.
The strongest compound loop: verify claims → log failures → classify failure modes → convert repeated failures into substrate lemmas → use lemmas to improve future verification → distill the resulting VTC corpus into substrate-native Modulum. Products that don't contribute to this loop are cash extraction or distribution experiments, not core strategy.
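The loop can be sketched as a pipeline; everything here (the function names, the promotion threshold, the result shape) is an illustrative assumption, not an existing Hypernym API.

```python
from collections import Counter

def compound_loop(claims, verify, classify_failure, lemma_store, promote_threshold=3):
    """verify -> log failures -> classify -> promote repeated failures into lemmas."""
    failure_counts = Counter()
    for claim in claims:
        result = verify(claim, lemmas=lemma_store)          # lemmas improve future verification
        if not result.valid:
            mode = classify_failure(result)                 # e.g. stale_evidence, invariant_violation
            failure_counts[mode] += 1
            if failure_counts[mode] >= promote_threshold:   # a repeated failure becomes a lemma
                lemma_store.add(mode, result.trace)
    return lemma_store   # the accumulated corpus is what later distills into substrate-native Modulum
```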
Verified Transition Cell. Like the relation, the process, the commit.
Verified Transition Cell (VTC) — Codex's name, panel-merged structure
All six panel models converged on the same structural object — a typed, provenance-bearing unit of valid state change. Six different names point at the same primitive. The synthesis pick is Verified Transition Cell: "verified" captures the validator role that differentiates Hypernym from frontier model rollouts, "transition" captures the core temporal/causal unit, "cell" captures atomicity and composition. "Kernel" overstates execution; "rollout" overstates sequence; "step" is too close to Grounded Step; "frame" is too static.
The VTC schema — merged from all 6 panel proposals.
Three load-bearing fields: failure_mode prevents the α-to-β corpus from becoming noisy (a bad proposal, stale evidence, missing substrate, and an invariant bug should not all train the same correction). scale_level prevents molecular claims from composing directly into organism-level claims without a declared bridge. commitment makes simulation replay auditable; without a hash chain, VTC becomes just another unverifiable trace format.
Validity invariants: state_before and state_after must be typed under the same domain schema or a declared ScaleBridge. Hard invariants cannot be violated. Preconditions must be satisfied before application. Postconditions must be checkable or explicitly marked unobserved. Every accepted VTC must have a hash commitment over inputs, outputs, invariants, and validation trace. Composition is legal only when the first cell's state_after satisfies the second cell's preconditions and no invariant contradictions cross the boundary.
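A minimal sketch of the merged schema, under assumed field names and types; the three load-bearing fields and the hash commitment appear as described above, everything else is illustrative rather than the shipped schema.

```python
import hashlib
import json
from dataclasses import dataclass
from enum import Enum

class FailureMode(str, Enum):
    NONE = "none"
    BAD_PROPOSAL = "bad_proposal"
    STALE_EVIDENCE = "stale_evidence"
    MISSING_SUBSTRATE = "missing_substrate"
    INVARIANT_BUG = "invariant_bug"

@dataclass
class VTC:
    state_before: dict           # typed under a domain schema
    state_after: dict            # same schema, or a declared ScaleBridge
    preconditions: list[str]     # must be satisfied before application
    postconditions: list[str]    # checkable, or explicitly marked unobserved
    invariants: list[str]        # hard invariants; violation means rejection
    confidence: float
    provenance: list[str]        # substrate evidence chain
    scale_level: str             # e.g. "molecule", "cell", "organism"
    failure_mode: FailureMode = FailureMode.NONE
    commitment: str = ""         # hash over inputs, outputs, invariants, validation trace

    def commit(self, validation_trace: list[str]) -> str:
        """Hash commitment that makes replay auditable."""
        payload = json.dumps(
            [self.state_before, self.state_after, self.invariants, validation_trace],
            sort_keys=True, default=str,
        )
        self.commitment = hashlib.sha256(payload.encode()).hexdigest()
        return self.commitment
```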
Six operations. All return a VTC, a VTC trace, or a typed refusal.
Composition is valid only when a.state_after satisfies b.preconditions; otherwise the operation returns an invalid trace plus the violated condition. Closure matters: if any operation returns untyped narrative, long-horizon simulation collapses back into hallucination. merge and revert need strict treatment in R16 — incompatible branches can look semantically compatible while hiding contradicted assumptions, and substrate updates will invalidate earlier cells; a simulation platform that can't revoke descendants accumulates stale certainty.
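A sketch of the composition check and typed refusal, building on the schema sketch above; `holds` and `contradicted_by` are placeholder hooks into substrate checks, and none of these names are the shipped API.

```python
from dataclasses import dataclass

@dataclass
class Refusal:
    operation: str
    violated_condition: str      # a typed refusal, never untyped narrative

def holds(precondition: str, state: dict) -> bool:
    """Placeholder check; the real version evaluates against domain substrate."""
    return precondition in state.get("facts", [])

def contradicted_by(cell: "VTC") -> set:
    """Placeholder: invariants the cell's transition would contradict."""
    return set(cell.state_after.get("contradicts", []))

def compose(a: "VTC", b: "VTC"):
    """Valid only if a.state_after satisfies b's preconditions and no invariant
    contradiction crosses the boundary; otherwise return a typed refusal."""
    unmet = [p for p in b.preconditions if not holds(p, a.state_after)]
    if unmet:
        return Refusal("compose", f"precondition not satisfied: {unmet[0]}")
    if set(a.invariants) & contradicted_by(b):
        return Refusal("compose", "invariant contradiction across boundary")
    return [a, b]                # a composed VTC trace
```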
The 6/6 panel consensus is γ.
α · Validation layer over existing models
Hypernym sits as a substrate-validation pre/post processor around GPT, Claude, Gemini, Llama, and video-prediction, climate, and multi-physics simulators. Per-call API. Capital-light. Cross-stack. Hyperscaler-neutral. Insufficient alone: it can't invent missing hypotheses, can't fix a wrong latent ontology, can't recover unobserved causal variables.
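A sketch of the α call shape, assuming generic callables for the proposer, the pre-compressor, and the validator; none of these names are real provider or Hypernym APIs.

```python
traces = []   # every verified, rejected, or refused transition feeds the α-to-β corpus

def alpha_call(prompt: str, proposer, pre_compress, validate):
    """Pre/post processor around any frontier model or simulator call."""
    compressed = pre_compress(prompt)        # context pre-routing / compression
    proposal = proposer(compressed)          # GPT, Claude, Gemini, Llama, a simulator, ...
    verdict = validate(proposal)             # substrate validation of the proposed transition
    traces.append((prompt, proposal, verdict))
    return verdict
```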
β · Train a fresh substrate-native foundation model
Atom-1.4B → 8B → 30-70B class, trained natively on substrate-typed scaffolds with M5 attention-mask conditioning baked in. Hypernym owns the inference plane. Capital-heavy, with hyperscaler friction. The β-first risk: training starts before Hypernym owns the right corpus.
γ · α now → β trained from α traces ← PICK
Ship α products immediately. Log every verified, rejected, contradicted, refused transition as training data. Start β at narrow scales. Scale β only when it beats frontier-plus-validation on cost per valid transition, calibration error, refusal correctness, long-horizon trace integrity.
The recommended β target is not a general frontier model. It's a substrate-native proposer/validator optimized for domains where Hypernym owns substrate density. Atom-scale handles extraction, routing, compression, local validation. 8B-class powers domain endpoints and memory. 30-70B-class becomes relevant only after VTC traces prove enough signal for long-horizon reasoning. Trillion-scale training is not required if the company remains targeted rather than general.
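A sketch of that scale gate, assuming metric names derived from the four criteria above; the comparison structure is the point, not the names.

```python
def should_scale_beta(beta: dict, frontier_plus_validation: dict) -> bool:
    """Scale β only if it beats frontier-plus-validation on all four criteria."""
    return (
        beta["cost_per_valid_transition"] < frontier_plus_validation["cost_per_valid_transition"]
        and beta["calibration_error"] < frontier_plus_validation["calibration_error"]
        and beta["refusal_correctness"] > frontier_plus_validation["refusal_correctness"]
        and beta["long_horizon_trace_integrity"] > frontier_plus_validation["long_horizon_trace_integrity"]
    )
```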
Ten products that become possible only with VTC at the center.
The common product pattern is not "replace the domain simulator" — it's "wrap proposers with transition validity." That distinction keeps scope sane. Hypernym should not build a climate model, AV simulator, chemistry engine, and macroeconomic model from scratch. It should validate, compose, calibrate, and audit the transitions those systems produce, then train substrate-native proposers only where repeated validation failures show frontier or incumbent tools cannot generate the right candidate transitions.
What the panel said we hadn't addressed.
- First-class counterfactual primitive — what changes, what stays fixed, which branches are admissible
- Temporal-causal substrate — decay, lag, persistence, reversible vs irreversible transitions, uncertainty over long horizons
- Sensor-to-substrate ingestion — telemetry, lab measurements, LIDAR, climate stations, clinical time series, video
- Multi-scale composition — molecule → cell → organ → organism → institution → planet (scale bridges)
- Human targeting operator — converts expert intent into typed query + invariants + evidence preferences + refusal threshold
- Simulation-scale compute economics — 10⁶ validation steps require caching, invariant precompilation, hierarchical validation, approximate checks with escalation, and early stopping (sketched after this list)
- Substrate governance — source decay, contradiction propagation, revocation, confidence aging, audit trails. If a paper is retracted, a sensor recalibrated, or a legal rule changes, descendants must be re-scored. Not compliance — simulation correctness.
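A minimal sketch of the compute-economics point, with illustrative names: cheap approximate checks first, escalation to full validation only in an uncertain band, early stopping on hard-invariant failure, and caching keyed by commitment hash.

```python
def precompiled_invariants(cell):
    """Placeholder for invariant checks precompiled against the domain substrate."""
    return [True]   # each entry: did the corresponding hard invariant hold?

def validate_step(cell, cheap_check, full_check, cache, band=(0.4, 0.9)):
    key = cell.commitment or repr(cell.state_after)
    if key in cache:                          # caching across 10^6-step simulations
        return cache[key]
    if any(not ok for ok in precompiled_invariants(cell)):
        result = ("reject", "invariant")      # early stopping on hard-invariant failure
    else:
        score = cheap_check(cell)             # approximate check
        if band[0] <= score <= band[1]:
            result = full_check(cell)         # escalate only in the uncertain band
        else:
            result = ("accept", score) if score > band[1] else ("reject", score)
    cache[key] = result
    return result
```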
The single benchmark VTC must pass.
The Frozen 1,000-Item VTC Benchmark
At month 18, Hypernym must pass a frozen 1,000-item VTC benchmark across biomedical adverse-event mechanisms · scientific/engineering causal transitions · legal/regulatory procedural claims. Each item must require multi-step reasoning, source grounding, at least one invariant check, confidence estimation, and refusal when underdetermined.
Pass thresholds
≥90% valid-transition accuracy · ≤2% unsupported load-bearing hallucination · confidence calibration error ≤5% · ≥25 percentage points absolute gain over frontier baseline with same evidence · cost per valid transition ≤50% of frontier+human-review baseline.
Kill switch
If at M18 any of the following hold: valid-transition accuracy <85% · unsupported hallucination >3% · calibration error >8% · absolute gain <15 points → pause simulation-platform claims and revert to verification, citation, domain endpoints, memory, and context products until the primitive passes.
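The gate can be written as a function. The thresholds are the ones stated above; the metric variable names are assumptions, and what happens in the band between pass and kill is not specified by the benchmark, so it is returned as "iterate".

```python
def m18_gate(accuracy, hallucination, calibration_err, gain_pts, cost_ratio):
    """Frozen 1,000-item VTC benchmark decision at month 18."""
    passed = (accuracy >= 0.90 and hallucination <= 0.02 and
              calibration_err <= 0.05 and gain_pts >= 25 and cost_ratio <= 0.50)
    killed = (accuracy < 0.85 or hallucination > 0.03 or
              calibration_err > 0.08 or gain_pts < 15)
    if killed:
        return "pause simulation-platform claims; revert to verification/citation/memory/context"
    return "continue" if passed else "iterate"
```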
35 / 25 / 40.
35% · 90-day wedges
- Omnifact Verify / Trace-Grade
- Context Pre-Router
- Fine-Grained Citation
- First Persistent Memory API
Share backend substrate components. Generate immediate validation traces.
25% · 6-9 month wedges
- Domain-Precision Legal
- False-Positive Eliminator
- Substrate Audit Replay
- Limited Modulum Inference SaaS (after IP filing)
Convert trace infrastructure into higher-ACV customers.
40% · Hyperlab R&D
- 20% · VTC stack + benchmark
- 10% · Substrate Lemma propagation
- 5% · substrate-native Modulum scaling
- 5% · Federation Protocol / CTT
Keeps the γ flywheel alive without starving revenue.
The allocation is intentionally not 70% R&D even though VTC is the defining bet. Without customer traces, VTC becomes an elegant internal format. Without VTC R&D, the wedges become a loose API business. The portfolio forces the two sides to share substrate artifacts: every Track A product must emit reusable cells or cell-adjacent traces, and every Hyperlab milestone must improve at least one Track A metric.
Per-tier packaging across the AI stack.
| Tier | Product | Price | Key feature |
|---|---|---|---|
| Lab | VTC benchmark + domain validator | $2K-$20K/mo | batch validation, reproducibility, citations |
| Research | Scientific claim/citation API | $500-$5K/mo | corpus grounding + claim decomposition |
| Dev | Verify, context, memory SDK | free-$99/mo | one-key integration, small quotas |
| Startup | API bundle + dashboard | $500-$10K/mo | metered calls, router, memory, observability |
| Enterprise | Domain endpoint + audit replay | $60K-$500K/yr | SSO, RBAC, VPC, audit-of-record |
| Hyperscaler | M5 / Modulum / Federation licensing | $2M-$50M/yr | infrastructure integration, protocol participation |
| Government | Provenance + simulation pilots | $500K-$50M/yr | air-gapped, attestation, cleared support |
Eight wedges and bets to avoid.
VTC + temporal/counterfactual closure.
"Verified Transition Cell: temporal, counterfactual, and distribution closure."
Specify the minimal VTC implementation, SubstrateFork, SubstrateStream, ScaleBridge, TargetingLens, confidence propagation, and the M18 benchmark. Narrower than a broad strategy round, more directly falsifiable than a pure distribution-physics round. Links the top 90-day wedges to the company-defining bet: every verify, context, citation, memory, and audit call becomes a candidate VTC training datum.
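Two of the R16 objects named above admit simple illustrative type sketches; every field and name here is a guess at what the spec might pin down, not anything the round has defined.

```python
from dataclasses import dataclass

@dataclass
class SubstrateFork:
    """A counterfactual branch: what changes, what stays fixed, what is admissible."""
    interventions: dict          # the variables the fork changes
    held_fixed: list[str]        # facts and invariants that must not drift
    admissibility: list[str]     # conditions a branch must satisfy to be explored

@dataclass
class ScaleBridge:
    """A declared mapping that lets cells at one scale compose into the next."""
    from_scale: str              # e.g. "molecule"
    to_scale: str                # e.g. "cell"
    aggregation_rule: str        # how lower-scale state is summarized upward
    loss_bound: float            # declared information loss across the bridge
```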
Panel-surfaced candidates: temporal-substrate / counterfactual-fork (Gemma, Gemini, Qwen) · Substrate Distribution Physics (Claude, Qwen, Gemma) · deep-dive on top wedges (Codex secondary, Claude alt) · deep-dive on top bet / VTC (Codex primary, Grok, Claude alt). Synthesis: VTC + temporal/counterfactual closure subsumes all four into one falsifiable architecture round.