Posts tagged #ml-research

The Five-Minute Daily Drift Check

May 10, 2026

Solo research programs drift one small exception at a time. Three shell commands, run daily, catch the most common protocol violations before they compound into reproducibility failures.

#ml-research #methodology #research-operations #tooling

Theorem-Screened Experiments

May 9, 2026

A three-step decision rule for running fewer, better experiments: check your theorem library before you touch a GPU. Calibration checks, falsifier traps, and parameter compression from the physicist mode of ML research.

#ml-research #methodology #lean4 #experimental-design

Naming What Fails: The Obstacle Taxonomy

May 8, 2026

25+ preregistered kills over six weeks of compression research. The tempting story is "compression is hard." The physicist story is better: 25+ kills, ~10 structural failure patterns, one Lean theorem per class.

#ml-research #methodology #lean4 #compression #negative-results

Two Research Modes, and Why the Second One Needs Lean 4

May 7, 2026

AI makes hypothesis generation cheap. Evaluation stays expensive. Lean 4 proofs are the filter that changes the economics: a proved theorem screens an entire family of candidates before GPU time is allocated.

#ml-research #methodology #research-style #lean4

Zero-Sorry Discipline: What a Lean 4 Appendix Actually Costs

May 6, 2026

Two theorems in this paper — MoEGauge and JensenFloor — had to reach zero sorries before the paper shipped. What that process looks like, why sorry is dangerous, and what JensenFloor actually says.

#ml-research #lean4 #formal-verification #methodology

What Experimental Design Actually Means

May 5, 2026

Theoretical physicists barely need it. Experimental physicists cannot live without it. Life sciences rewrote it for complexity. Pharma made it law. ML borrows the wrong one.

#ml-research #methodology #experimental-design #statistics

A Survey as a Living Document

May 3, 2026

What it means to maintain a formal proof corpus that stays in sync with its own coverage badge, and what happened when RWKV was added as a new architectural family without invalidating existing theorems.

#ml-research #lean4 #formal-verification #RWKV #methodology

How to Honestly Test if a Neural Network Can Be Compressed

Apr 28, 2026

Pre-registration, trap cells, τ-hardened baselines, and kill-fast protocols: a field methodology for compression research that tries to kill its own ideas. With actual results from OLMoE-1B-7B.

#ml-research #compression #methodology #quantization #MoE

Hypothesis Testing from Scratch, and Its Bayesian Analogue

Apr 27, 2026

Frequentist hypothesis testing rebuilt from first principles for ML researchers who half-remember p-values. Then: the Bayesian reframe, why it fits the kill-ladder better, and what each one actually buys you.

#ml-research #methodology #statistics #bayesian #hypothesis-testing

A Catalogue of Symmetries Compression Must Respect

Apr 26, 2026

Compression schemes regularly violate algebraic invariants of weight structure, producing models that pass perplexity checks but fail on downstream tasks. Here are the five core symmetry types that a formally verified survey is collecting.

#ml-research #compression #symmetry #quantization #RoPE

Part E Pivot: FFN Rotation and the Narrow-d Falsification

Apr 25, 2026

After the KV-cache gauge, the obvious next move was applying β-lift to FFN weights. We tested it. It failed. Here is what the RAdam convergence probe and the 1-bit generation test actually showed.

#ml-research #quantization #MoE #compression #MLP

The Microsite as Interactive Publication

Apr 24, 2026

Building a GitHub Pages research microsite with d3 widgets, a Lean 4 theorem status page, and a reproducibility shim. One gotcha with Jekyll and markdown-inside-divs. One real answer to whether live widgets are worth the effort.

#ml-research #d3 #jekyll #lean4 #reproducibility #visualization

Phase-Collapse Defragmentation: Why MoE KV-Cache Resists 1-bit Quantization

Apr 23, 2026

Attention head activations in Mixture-of-Experts models cluster around expert routing patterns. Quantizing the KV-cache destroys this signal. The MoEGauge framework builds provable bounds on exactly how much.

#ml-research #quantization #MoE #compression #KV-cache

Adversarial Passes That Killed Claims

Apr 22, 2026

Two hypotheses that started as clean ideas and ended as documented failures: the DPO-CLM orthogonal-complement hypothesis, and the cross-probe srank claim retracted as a length-bias artifact.

#ml-research #lora #dpo #falsification #methodology #negative-results

Pre-Registration for Solo ML Researchers

Apr 20, 2026

How to borrow the clinical trial discipline of writing down what "pass" looks like before running the experiment — and why a SHA256 hash is the cheapest honesty enforcement mechanism available.

#ml-research #methodology #pre-registration #experimentation

The Manuscript-as-Codebase Pattern

Apr 19, 2026

Hierarchical Makefiles, data-driven macro generation, and paper-scoped .gitignore: how treating a research paper like a software project caught hardcoded inconsistencies before reviewers did.

#ml-research #reproducibility #latex #makefile #workflow

Stable Rank as an Overfitting Signature in LoRA Fine-Tuning

Apr 18, 2026

Why we picked stable rank to detect overfitting geometry in DPO vs CLM fine-tuning, how it connects to "alignment geometry," and what the BitFit baseline was there to check.

#ml-research #lora #fine-tuning #dpo #geometry #stable-rank