The Dual-Emit Paper Pattern: Data-Driven Manuscripts
Ensuring that your paper’s text matches your interactive plots is a constant struggle in ML research. You update a benchmark, the number changes in the JSON, but you forget to update the \newcommand in your LaTeX source.
In the lean-mining project, we’ve solved this with the Dual-Emit Paper Pattern.
The Concept: Single-Source Truth
The “Dual-Emit” pattern ensures that every number in our manuscript is a direct export from the experiment pipeline. We generate both LaTeX macros and microsite JSON simultaneously.
flowchart TD
E["Experiment Runner"] --> R["Result Reporter"]
R --> L["values.tex (LaTeX Macros)"]
R --> J["site_data.json (Microsite JSON)"]
L --> P["PDF Manuscript"]
J --> W["Interactive Web App"]
P -.->|"1:1 Parity"| W
When a benchmark run completes, our reporter emits two files:
values.tex: For the paper, e.g., \newcommand{\MoEBetaRange}{0.92--0.97}.
site_data.json: For the interactive charts, e.g., {"beta_range": [0.92, 0.97]}.
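A minimal sketch of such a reporter might look like the following. The file names and the \MoEBetaRange macro come from the example above; the function names `emit` and `format_range`, the shape of the `results` dict, and the choice to round to two decimals are assumptions made for illustration, not the lean-mining project's actual code.

```python
import json
from pathlib import Path


def format_range(lo: float, hi: float, digits: int = 2) -> str:
    """Format a closed range as LaTeX text, e.g. 0.92--0.97."""
    return f"{lo:.{digits}f}--{hi:.{digits}f}"


def emit(results: dict, out_dir: Path, digits: int = 2) -> None:
    """Write values.tex and site_data.json from the same results dict.

    `results` is assumed to hold {"beta_range": [lo, hi]} (hypothetical shape).
    """
    lo, hi = results["beta_range"]
    # Round once, then feed the rounded values to BOTH outputs so the
    # paper and the site can never disagree on the displayed digits.
    lo_r, hi_r = round(lo, digits), round(hi, digits)

    out_dir.mkdir(parents=True, exist_ok=True)

    # LaTeX side: one \newcommand per reported quantity.
    macro = "\\newcommand{\\MoEBetaRange}{" + format_range(lo_r, hi_r, digits) + "}\n"
    (out_dir / "values.tex").write_text(macro, encoding="utf-8")

    # Web side: the same rounded numbers as JSON.
    payload = {"beta_range": [lo_r, hi_r]}
    (out_dir / "site_data.json").write_text(
        json.dumps(payload, indent=2) + "\n", encoding="utf-8"
    )
```

Calling `emit({"beta_range": [0.9213, 0.9687]}, Path("build"))` under these assumptions writes both files in one step. Rounding before either file is written is the design choice that matters: if each output rounded independently, the two could diverge on a borderline value.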
1:1 Parity by Design
This ensures that the “0.92” in the abstract and the “0.92” in the chart’s interactive hover state are rendered from the same pipeline value, rather than from two copies maintained by hand.
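The parity claim can also be checked mechanically, for instance as a CI step. The sketch below re-parses both emitted files and confirms they agree; `parse_macros`, `check_parity`, and the two-decimal format are hypothetical names and choices for illustration, and the regex only handles simple, brace-free macro bodies like the one shown above.

```python
import json
import re

# Matches \newcommand{\Name}{value} where value contains no nested braces.
MACRO_RE = re.compile(r"\\newcommand\{\\(\w+)\}\{([^}]*)\}")


def parse_macros(tex: str) -> dict:
    """Return {macro_name: macro_body} for every simple \\newcommand in tex."""
    return dict(MACRO_RE.findall(tex))


def check_parity(tex: str, site_json: str, digits: int = 2) -> bool:
    """True if the \\MoEBetaRange macro matches beta_range in the JSON."""
    macros = parse_macros(tex)
    lo, hi = json.loads(site_json)["beta_range"]
    # Re-render the JSON numbers the way the LaTeX macro is assumed to be formatted.
    expected = f"{lo:.{digits}f}--{hi:.{digits}f}"
    return macros.get("MoEBetaRange") == expected
```

A check like this is redundant when both files come from one reporter run, but it catches the case where someone edits values.tex by hand or commits a stale copy of one file.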
Intuition: By eliminating manual copy-pasting, we remove the usual source of “drift” in our research reports: a generated number cannot go stale in one output while being updated in the other. The manuscript isn’t a description of the research; it’s a view of the data.
Connectivity to the Microsite
As we discussed in the microsite-as-publication post, the future of research is interactive. By using the Dual-Emit pattern, we can build “live” papers that allow readers to explore the full distribution of results while maintaining the rigor of a static LaTeX manuscript.
This is the engineering foundation for the Verified Neural Compilation track. If we’re going to claim a proof is machine-checked, the numbers supporting it should be machine-generated.
Next in this series: Verified Security Gates: Safe ML Deserialization, our final engineering milestone.