MLE and Bayesian: Which Is More “Correct”?

From Statistics to the Nature of Reality: A Philosophical Thread

Apr 22, 2026

This essay began with a question from my mathematical statistics course: which is more “correct”, MLE or Bayesian estimation? It seemed like a technical question, but the textbook’s author mentioned that these are two camps in theoretical statistics that have been debating the issue for a long time. The question hit me again when I met similar topics in UC Berkeley’s DATA 100. And it turned out to be a thread that, once pulled, unravels into one of the deepest problems in the history of philosophy: where does knowledge come from, are the laws we observe real, and is the world we see actually the world itself?

I. The Starting Point: Two Ways to Estimate a Parameter

When you study mathematical statistics, you encounter two fundamentally different approaches to estimating unknown parameters. They often give similar answers, but their underlying logic could not be more different.

Maximum Likelihood Estimation (MLE)

Suppose you observe a dataset X = (x₁, x₂, …, xₙ), generated by some probability model with an unknown parameter θ. MLE asks:

Which value of θ makes the probability of observing this data as large as possible?

Formally, define the likelihood function:

\(\mathcal{L}(\theta) = P(X \mid \theta) = \prod_{i=1}^{n} P(x_i \mid \theta)\)

(This uses the assumption that samples are i.i.d.)

The MLE answer is the θ̂ that maximizes this function:

\(\hat{\theta}_{MLE} = \arg\max_\theta \ \mathcal{L}(\theta)\)

In practice, we take the log (a monotone transformation that doesn’t change the location of the maximum, but converts the product into a tractable sum):

\(\hat{\theta}_{MLE} = \arg\max_\theta \sum_{i=1}^{n} \log P(x_i \mid \theta)\)
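To make the recipe concrete, here is a minimal sketch for the simplest case I can think of, a coin with unknown probability of heads θ. The dataset and the grid resolution are assumptions made up for illustration. The sketch scans candidate values of θ, keeps the one with the largest log-likelihood, and compares it with the known closed-form answer for this model, the sample proportion.

```python
import math

def log_likelihood(p, data):
    """Sum of log P(x_i | p) for i.i.d. Bernoulli observations (1 = heads)."""
    return sum(math.log(p) if x == 1 else math.log(1.0 - p) for x in data)

# Hypothetical sample: 7 heads in 10 flips.
data = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]

# Candidate values of theta on an open grid (0 and 1 excluded so log() is defined).
grid = [i / 1000 for i in range(1, 1000)]

# argmax over the grid of the summed log-likelihood
theta_hat = max(grid, key=lambda p: log_likelihood(p, data))

print(theta_hat)               # grid maximizer
print(sum(data) / len(data))   # closed-form Bernoulli MLE: heads / flips
```

A grid search is only for transparency here; in real problems the maximum is usually found analytically or with a numerical optimizer.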

MLE’s philosophical stance is clear: θ is a fixed, unknown constant that exists somewhere in the world, and we use data to approximate it. Once the model is chosen, no belief about θ enters beyond the data itself. The data is the only evidence that matters.

Bayesian Estimation

Bayesian inference starts from a completely different place. It says: before observing any data, you already have some belief about θ — expressed as a prior distribution P(θ). After observing data, you update that belief using Bayes’ theorem to get the posterior distribution:

\(P(\theta \mid X) = \frac{P(X \mid \theta) \cdot P(\theta)}{P(X)}\)

Written proportionally, this is more intuitive:

\(P(\theta \mid X) \propto P(X \mid \theta) \cdot P(\theta)\)

That is, the posterior (updated belief) is proportional to the product of the likelihood (how well each value of θ explains the data) and the prior (what you believed before).

Rather than a point estimate, Bayesian inference gives you an entire distribution over θ — reflecting your degree of belief in each possible value after seeing the data.
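The proportionality above can be sketched numerically. The following is an illustration under assumptions of my own: a coin-flip model, a coarse grid of candidate θ values, and two made-up priors (a flat one and one shaped like θ(1 − θ), which mildly favors a fair coin). The function name `grid_posterior` is hypothetical. It multiplies likelihood by prior at each grid point and normalizes, so the output is a whole distribution rather than a single number.

```python
def grid_posterior(data, prior, grid):
    """Return P(theta | X) on a grid: likelihood * prior, normalized to sum to 1."""
    heads = sum(data)
    tails = len(data) - heads
    unnormalized = [
        (theta ** heads) * ((1.0 - theta) ** tails) * prior(theta)
        for theta in grid
    ]
    evidence = sum(unnormalized)  # plays the role of P(X) on the grid
    return [u / evidence for u in unnormalized]

def flat_prior(theta):
    return 1.0                    # assumed prior: no preference among values

def fair_prior(theta):
    return theta * (1.0 - theta)  # assumed prior: mild belief that the coin is fair

grid = [i / 100 for i in range(1, 100)]
data = [1, 1, 1, 0, 1]            # hypothetical: 4 heads, 1 tail

post_flat = grid_posterior(data, flat_prior, grid)
post_fair = grid_posterior(data, fair_prior, grid)

# Most probable grid value under each posterior
mode_flat = grid[post_flat.index(max(post_flat))]
mode_fair = grid[post_fair.index(max(post_fair))]
print(mode_flat, mode_fair)
```

With the flat prior the posterior peaks where the likelihood does; with the fairness-leaning prior the peak is pulled toward 0.5.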

The philosophical stance here is that θ is a random variable, or more precisely, our knowledge of θ is inherently incomplete, and the prior distribution encodes that uncertainty. Knowledge is a combination of prior belief and evidence — not evidence alone.

The Core Difference

| | MLE (Frequentist) | Bayesian |
| --- | --- | --- |
| Nature of θ | Fixed constant | Random variable / object of belief |
| Role of prior | Does not exist | Must be explicitly specified |
| Role of data | The only evidence | Updates the prior |
| Output | Point estimate θ̂ | Posterior distribution \(P(\theta \mid X)\) |
| Philosophical root | Empiricism | Rationalism |

The last row is the most important one. The disagreement between MLE and Bayesian inference is not merely a technical choice — it is the echo of a centuries-old philosophical debate about where knowledge comes from.

II. The First Philosophical Layer: Empiricism vs. Rationalism

Empiricism: All Knowledge Comes from Observation

MLE is rooted in the tradition of empiricism, whose major figures include Locke, Hume, and Mill. The central claim is:

The mind begins as a blank slate (tabula rasa). All knowledge derives from sensory experience and observation.

In statistical terms, this means: beyond the data in front of you, there is no reliable source of knowledge. Prior beliefs are subjective and potentially contaminating. MLE takes this to its logical extreme — let the data speak for itself, without injecting any external beliefs.

The strength of this position is objectivity and reproducibility: two researchers given the same data will arrive at the same MLE estimate, because there is no subjective prior to disagree about.

Rationalism: Reason Is Also a Source of Knowledge

Bayesian inference is closer in spirit to rationalism, associated with Descartes, Leibniz, and Kant. The central claim is:

Sensory experience alone is insufficient to build knowledge. Reason itself is a source of knowledge. Some knowledge exists prior to experience (a priori).

The prior distribution is the concrete realization of this idea in statistics. We know things before we look at the data — from past experience, domain expertise, or logical reasoning. Discarding this knowledge entirely, rationalists would argue, is wasteful. Bayesian inference systematically incorporates it.

When data is scarce, Bayesian methods can be strikingly more sensible. Flip a coin three times and get three heads; MLE tells you p = 1.0, a 100% probability of heads — clearly absurd. A Bayesian with a prior centered around a fair coin would give a much more reasonable estimate.
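The three-heads example can be worked through in a few lines. This is a sketch under one assumption of mine: the “prior centered around a fair coin” is taken to be a Beta(2, 2) distribution, which is only one of many reasonable choices. For a Beta(a, b) prior and coin-flip data, the posterior mean has the closed form (a + heads) / (a + b + flips).

```python
def mle(heads, flips):
    """Maximum likelihood estimate of P(heads): the sample proportion."""
    return heads / flips

def beta_posterior_mean(heads, flips, a=2.0, b=2.0):
    """Posterior mean under a Beta(a, b) prior; Beta(2, 2) is an assumed prior centered on 0.5."""
    return (a + heads) / (a + b + flips)

print(mle(3, 3))                     # 1.0
print(beta_posterior_mean(3, 3))     # 5/7, about 0.714

# With far more data (hypothetical 300 heads in 300 flips) the prior matters much less.
print(mle(300, 300), beta_posterior_mean(300, 300))
```

The second comparison is the other half of the story: as evidence accumulates, the two estimates move toward each other.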

The Deeper Implication

This empiricism vs. rationalism debate has persisted for centuries without a definitive winner. Its projection into statistics — frequentism vs. Bayesianism — has been equally long-running and unresolved. But both positions face a more fundamental challenge, one raised by Hume himself.

III. The Second Layer: Hume’s Problem of Induction — A Crack in the Foundation

David Hume (1711–1776) was a Scottish philosopher and the most rigorous of the empiricists. He did something unusual: he pushed empiricism to its extreme, and in doing so, discovered that it undermines its own foundations.

What Is Induction?

Both science and statistics depend on induction: drawing general conclusions from a finite set of observations.

Observe 1,000 swans, all white → conclude “all swans are white”
Measure 1,000 samples → use MLE to estimate a population parameter
Run 100 physics experiments → formulate a physical law

Hume’s Challenge

Hume pointed out that induction has no logical foundation:

From a finite number of observations, it is logically impossible to derive a universal conclusion.

No matter how many white swans you have seen, you cannot prove the next swan will be white — its color is logically independent of every previous observation. More fundamentally, induction rests on the assumption of the Uniformity of Nature: “the future will resemble the past.” But this assumption can only be justified inductively (nature has always been uniform in the past), which is circular reasoning.

The farmer’s chicken from The Three-Body Problem (a turkey, in the novel) makes this viscerally clear: the chicken observes, every day for a thousand days, that food arrives in the morning. It inductively derives a “law”: food arrives every morning. On day one thousand and one, the farmer arrives and kills the chicken. The chicken was not stupid. It did the most rational thing available to it. Induction simply failed it.

The implication for statistics: MLE’s estimate θ̂ depends on the assumption that future data comes from the same distribution as past data. This assumption cannot be proven — it is a belief. Bayesian methods don’t resolve this problem either; the prior and the likelihood function are both assumptions made by the human modeler.

Hume’s conclusion was bleak: we cannot rationally justify induction. Our belief that the sun will rise tomorrow is a habit, not a logical necessity. This crack runs through the entire foundation of scientific knowledge, and it has never been fully repaired.

IV. The Third Layer: Popper’s Falsificationism — A Different Way to Understand Science

Faced with Hume’s challenge, 20th-century philosopher Karl Popper proposed a different solution. Rather than defend induction, he accepted its limitations and reframed how science actually works.

The Core Idea

Scientific theories do not earn their authority by being “proven true.” They earn it by surviving repeated attempts to prove them false.

A theory qualifies as scientific only if it satisfies falsifiability: there must exist some possible observation that could refute it.

“All swans are white” — falsifiable, because finding a black swan disproves it. This is a scientific claim.
“God exists” — not falsifiable, because any event can be interpreted as consistent with God’s will. This is not a scientific claim.
“Tomorrow it might rain or it might not” — not falsifiable, because either outcome is compatible. This is not a scientific claim.

Scientific progress is not the accumulation of proven truths — it is the elimination of false theories. We can never prove a theory is true, but we can prove it is false. Surviving theories are our best guesses so far.

The Statistical Counterpart

Popper’s thinking has a direct counterpart in statistics: hypothesis testing.

We don’t say “the data proves H₁ is true.” We say “the data provides sufficient evidence to reject H₀.” This is exactly the falsificationist spirit: science advances by rejecting false hypotheses, not by confirming true ones.

The logic of the p-value is Popperian: assuming H₀ is true, how unlikely is data at least as extreme as what we observed? If sufficiently unlikely, we treat H₀ as falsified.
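A small sketch of that logic, using a made-up experiment: 9 heads in 10 flips, with H₀ being “the coin is fair.” The function name and the one-sided form of the test are my own choices for illustration. The p-value is the probability, computed under H₀, of seeing at least as many heads as were observed.

```python
from math import comb

def binomial_p_value(heads, flips, p0=0.5):
    """One-sided p-value: P(at least `heads` heads in `flips` flips | H0: P(heads) = p0)."""
    return sum(
        comb(flips, k) * p0 ** k * (1.0 - p0) ** (flips - k)
        for k in range(heads, flips + 1)
    )

p_value = binomial_p_value(9, 10)   # 11/1024, about 0.0107
alpha = 0.05                        # conventional threshold, chosen before looking at the data

print(p_value)
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Note the asymmetry in the last line: a large p-value never “confirms” H₀, it only fails to refute it.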

The Hidden Vulnerability

Popper’s framework is elegant, but it carries an implicit assumption: physical laws are stable. An experiment today and an experiment tomorrow operate under the same rules — otherwise falsification becomes meaningless, since you can’t distinguish “the theory is wrong” from “the rules changed today.”

This is exactly what The Three-Body Problem breaks. The sophons interfere with particle accelerator experiments, producing inexplicable anomalies. Physicists face a horrifying situation: they cannot distinguish between “our theory has a flaw” and “someone is modifying the laws of physics.” Popper’s framework silently assumes the ground is solid beneath you. The Three-Body Problem removes the ground.

V. The Fourth Layer: The Sharpshooter Paradox — Laws Themselves May Be Illusions

Liu Cixin’s thought experiment in The Three-Body Problem pushes all the previous philosophical problems to their extreme.

The Sharpshooter and the Farmer

The Sharpshooter: A marksman of impossible precision shoots at a two-dimensional plane, producing bullet holes at perfectly regular intervals. The two-dimensional creatures living on this plane observe enough holes and derive a “law of the universe”:

At fixed intervals, there must always be a large hole.

This law holds perfectly within everything they can observe. There are no exceptions. But it is not the nature of the universe. It is simply a property of the sharpshooter’s aim — a structure imposed on their world from outside.

The Farmer’s Chicken (used in parallel): The chicken inductively derives “food arrives every morning.” It’s killed on day one thousand and one.

Both parables point to the same thing:

The regularities we observe may not be intrinsic properties of the world. They may be the product of some external condition. When that condition changes, the regularity vanishes.

The Context in The Three-Body Problem

In the novel, this thought experiment is a metaphor for the condition of human physics. When physicists realize their particle accelerator experiments are producing inexplicable results — that physical constants appear to be unstable — the horrifying implication is that someone (via sophons) is manipulating the foundational conditions of physical observation.

The line that shook readers — “physics no longer exists” — does not mean nature has no rules. It means the preconditions on which human physics is built (that laws are stable and observable) have been deliberately destroyed.

One Level Deeper Than Hume

Hume said: induction is unreliable — you can’t logically derive universal laws from finite observations.

The sharpshooter paradox says something more radical: even if induction were perfectly reliable, even if you had observed a regularity infinitely many times with no exceptions, the law could still be false, because it might be the projection of some external structure you cannot see, not an intrinsic property of reality.

The statistical implications are deep:

A model that achieves perfect fit on training data may collapse entirely when the data-generating process shifts — not because the model was poorly estimated, but because the underlying structure changed.
More fundamentally: you can never know whether your model captures the true data-generating process, or merely the shadow of some higher-level structure you cannot access.
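The first point can be sketched with a toy example. Everything here is hypothetical: a process that follows y = 2x + 1 while we collect training data, then silently switches to y = −x + 5. The fitted line is estimated perfectly, yet it fails once the generating process changes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form for one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

def mean_squared_error(a, b, xs, ys):
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Regime 1 (hypothetical): the process is exactly y = 2x + 1.
train_x = list(range(10))
train_y = [2 * x + 1 for x in train_x]
a, b = fit_line(train_x, train_y)

# Regime 2 (hypothetical): the process silently becomes y = -x + 5.
new_x = list(range(10, 20))
new_y = [-x + 5 for x in new_x]

train_error = mean_squared_error(a, b, train_x, train_y)
shift_error = mean_squared_error(a, b, new_x, new_y)
print(train_error, shift_error)
```

Nothing inside the training data distinguishes “the law is y = 2x + 1” from “the law is y = 2x + 1 for now.”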

VI. The Fifth Layer: Kant — Laws May Come from the Mind Itself

The sharpshooter paradox shows that laws can be imposed from outside. Immanuel Kant (1724–1804) proposed a symmetric possibility: laws may also be imposed from inside.

The Copernican Revolution in Philosophy

Kant described his work as a “Copernican revolution” in philosophy:

The old question was: “How does our knowledge conform to the laws of the world?”

Kant reversed it: “Does the world as it appears to us reflect, in large part, the structure of our own cognition?”

His answer: yes, and profoundly so.

A Priori Synthetic Knowledge

Kant distinguished two types of knowledge:

A posteriori knowledge: dependent on experience, e.g., “this swan is white”
A priori knowledge: independent of experience, e.g., “7 + 5 = 12”

He further argued that some knowledge is a priori and synthetic — independent of experience, yet genuinely informative about the world. The most important examples:

Causality: every event has a cause. This is not inductively derived from observation; it is the necessary precondition for understanding any experience at all.
Space and time: not objective properties of the world, but the necessary framework through which we perceive it. We cannot experience anything that is not structured in space and time.

In other words: the “orderly world” we perceive is partly because our cognition can only process ordered input. If there were phenomena with no pattern whatsoever — no causal structure, no spatial or temporal organization — our cognitive apparatus might be incapable of registering them at all.

The Statistical Counterpart

This idea maps directly onto something in statistics: model assumptions.

Before you run MLE or Bayesian inference, you choose a model — linear regression, logistic regression, Gaussian distribution... This choice is not dictated by the data. It is a cognitive framework you bring to the data.

Different frameworks reveal different patterns:

A linear model finds linear structure
A neural network finds highly non-linear structure
Both may fit the same dataset reasonably well

The patterns you find depend on the framework you bring. This is Kant’s insight translated into modern statistical practice.
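A tiny illustration of the same point, with made-up measurements: the same six numbers give different “best” answers depending on the distributional framework chosen before looking. Under a Gaussian model the maximum-likelihood location is the sample mean; under a Laplace (double-exponential) model the sample median maximizes the likelihood.

```python
import statistics

# Hypothetical measurements with one extreme value.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 30.0]

# Gaussian model: the maximum-likelihood location is the sample mean.
gaussian_location = statistics.mean(data)

# Laplace model: the sample median is a maximum-likelihood location.
laplace_location = statistics.median(data)

print(gaussian_location)   # pulled toward the extreme value
print(laplace_location)    # stays near the bulk of the data
```

Neither answer is dictated by the data alone. Whether 30.0 is signal or noise is decided by the model you brought with you.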

Bayesian priors make this even more explicit: the prior directly shapes the posterior, and different priors extract different signals from identical data. You see what your framework allows you to see.

The Symmetry

These two ideas form a striking pair:

The sharpshooter: laws are imposed externally (the marksman’s aim → the “law” of the 2D world)
Kant: laws are imposed internally (our cognitive framework → the “ordered world” we perceive)

Both challenge the same assumption: do the regularities we observe actually belong to the world itself?

VII. The Sixth Layer: The Simulation Hypothesis — The Ultimate Sharpshooter

The sharpshooter paradox extends naturally to one of the most provocative thought experiments in contemporary philosophy.

Nick Bostrom’s Simulation Argument

Philosopher Nick Bostrom argued in 2003 that at least one of the following must be true:

(1) Almost all civilizations go extinct before reaching the technological capacity to simulate conscious experience
(2) Almost all civilizations that reach this capacity choose not to run such simulations
(3) We are almost certainly living in a computer simulation

The logic: if (1) and (2) are both false, a sufficiently advanced civilization would run vast numbers of historical simulations. Simulated universes would vastly outnumber real ones. By probability, we are almost certainly simulated.

The Connection to the Sharpshooter

The simulation hypothesis is the sharpshooter paradox taken to its limit:

Sharpshooter: physical laws are a projection of the marksman’s aim; the marksman exists outside the world
Simulation hypothesis: physical laws are code written by a programmer; we live inside the program

If true, “physical constants” are hard-coded parameters, and “natural laws” are program logic. They were not discovered — they were designed.

The Echo in The Three-Body Problem

The sophon interference in The Three-Body Problem resonates here: the sophons don’t merely observe or deceive — they actively alter the apparent behavior of particles at the most fundamental level. This is not the full simulation hypothesis, but it reveals the same underlying point: the stability of physical law is not necessary; it is conditional.

By the way, I have always been afraid of the idea that our world was designed by some higher-dimensional or more intelligent form of life, and I wonder whether any philosopher has thought this question through before. What should we do if it is true? This is a question for all future humans.

The Statistical Implication

If we inhabit a simulation, the data-generating process can be arbitrarily modified. Any inductive inference is inference about a program that can be rewritten at any time. This is not merely a distribution shift problem — it is fundamental: the object of inference may have no stable essence.

In practice, we don’t need to take the simulation hypothesis seriously to do good statistics. But as a thought experiment, it forces us to see the full depth of the induction problem.

VIII. The Final Layer: Scientific Realism vs. Instrumentalism

All of these threads converge on a single foundational question.

Two Positions

Scientific Realism:

Scientific theories describe things that genuinely exist. Electrons are not merely useful fictions — they are real. Physical laws do not merely predict outcomes — they reflect the actual structure of the world.

In statistics, the realist position is: there exists a true data-generating process, a true parameter θ*, and our models are approximations of it. MLE is estimating something real.

Instrumentalism / Anti-Realism:

Scientific theories are predictive tools, nothing more. There is no need to assume they correspond to anything that actually exists. “Electrons” are a concept that allows accurate predictions — whether they “really exist” is a meaningless question.

In statistics, the instrumentalist position is: models are tools, not truths. We don’t need to believe the normal distribution “really” describes how data is generated — we only need it to perform well enough on the task at hand. All models are wrong, but some are useful (George Box).

In Modern Machine Learning

This debate becomes particularly sharp in contemporary machine learning.

A deep neural network can achieve superhuman accuracy in image recognition, language translation, and protein structure prediction. Yet no one truly understands what it has “learned.” Its internal representations are almost entirely opaque to human interpretation.

The realist says: the network must have learned something about the true structure of reality; we simply don’t yet understand what.
The instrumentalist says: it works, and that’s sufficient. There is no need to assume it has learned anything “real.”

The MLE/Bayesian split maps onto this:

MLE carries a realist tendency: it assumes θ* exists, and the estimate converges to it with enough data.
Bayesian inference is closer to instrumentalism: the posterior distribution expresses a state of belief, not a claim about the ontological status of the parameter. Bayesian prediction is a direct probability statement about future data — it doesn’t require committing to what the parameter “really is.”
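The convergence claim in the MLE bullet can be checked in simulation, with one caveat worth stating loudly: in a simulation a true θ* exists by construction, which is exactly the realist assumption under discussion. The value 0.3, the sample sizes, and the function name below are hypothetical choices.

```python
import random

def simulate_mle(true_p, n, rng):
    """MLE of a Bernoulli parameter from n simulated flips: the sample proportion."""
    heads = sum(1 for _ in range(n) if rng.random() < true_p)
    return heads / n

rng = random.Random(0)   # fixed seed so the run is repeatable
true_p = 0.3             # hypothetical "true" parameter theta*

for n in (10, 1_000, 100_000):
    estimate = simulate_mle(true_p, n, rng)
    # The gap to the true value tends to shrink as n grows.
    print(n, estimate, abs(estimate - true_p))
```

The shrinking error is what the realist reading of MLE relies on; it says nothing about whether real data come from a process with a fixed θ* at all.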

Occam’s Razor

Occam’s Razor belongs here:

Entities should not be multiplied beyond necessity. Among competing hypotheses that explain the same observations, prefer the simplest.

Occam’s Razor is a mediating principle between realism and instrumentalism. It doesn’t say complex ontological commitments are wrong — it says: without additional evidence, don’t introduce unnecessary complexity.

In statistics, this is regularization: Ridge Regression’s ‖θ‖² penalty, Lasso’s sparsity preference — all saying “prefer simpler parameters.” Occam’s Razor, mathematized.
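A minimal sketch of the Ridge case, reduced to a single slope with no intercept so the arithmetic stays visible. The data and the penalty values are made up. Minimizing Σ(y − θx)² + λθ² gives the closed form θ = Σxy / (Σx² + λ), so a larger λ pulls the estimate toward zero.

```python
def ridge_slope(xs, ys, lam):
    """Minimize sum((y - theta*x)^2) + lam * theta^2 for a single slope theta (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]   # hypothetical data, roughly y = 2x

# lam = 0 is ordinary least squares; larger lam means a stronger preference for small theta.
for lam in (0.0, 1.0, 10.0, 100.0):
    print(lam, ridge_slope(xs, ys, lam))
```

On this data the slope falls from about 2.04 with no penalty to about 0.25 at λ = 100: the preference for “simpler parameters” is a dial the modeler sets, not something the data supplies.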

But the sharpshooter paradox marks the limits of this principle: the true world is not guaranteed to be simple. Sometimes the simplest explanation is exactly wrong, because you don’t know whether you’re the chicken who has observed a thousand mornings.

IX. Back to the Beginning: Which Is More “Correct”?

After this long journey, we return to the original question.

The answer is: there is no technical answer, because the question is ultimately a philosophical choice.

Choosing MLE means accepting:

Parameters have objective true values; data is the only means of approaching them
An empiricist epistemology: knowledge comes from observation
No assumptions beyond the data (a form of Occam’s Razor)
But also: the full weight of the induction problem — reliable with sufficient data, potentially absurd with little

Choosing Bayesian inference means accepting:

Uncertainty is intrinsic to knowledge; probability distributions express it properly
A rationalist epistemology: prior knowledge is legitimate knowledge
Priors are subjective — different people will reach different conclusions
But also: a philosophically more honest position — we are always looking at the world through a framework

Conclusion: Are We Discovering Laws, or Constructing Stories?

This thread began with a statistics question and passed through empiricism and rationalism, Hume’s induction problem, Popper’s falsificationism, the sharpshooter paradox, Kant’s cognitive frameworks, the simulation hypothesis, and the debate between scientific realism and instrumentalism.

These are not separate problems. They are different faces of the same core puzzle:

When we observe the world, identify patterns, and build models — are we uncovering an objective truth that exists independently of us, or constructing a story that explains our observations?

This question may be impossible to resolve from the inside. Just as the two-dimensional creatures cannot determine from within their plane whether the regularity of the bullet holes reflects the nature of their universe or the precision of a marksman outside it — we cannot, from within our own cognitive frameworks, determine whether those frameworks are reliable.

But this is not a reason for despair. Hume’s challenge has never been answered with a proof that induction is reliable, and science advanced for centuries anyway, achieving results of extraordinary power and precision. Popper showed us that science’s strength lies not in its ability to discover eternal truths, but in its mechanism for eliminating errors.

Perhaps knowledge is like this: not the possession of truth, but the continuous correction of mistakes.

In statistics, this translates into practical wisdom: MLE and Bayesian inference are both tools, each appropriate in different regimes — different sample sizes, different prior reliability, different task requirements. What matters is not which method is more “correct” in the abstract, but understanding precisely what assumptions each method makes, and when those assumptions might be wrong.

That is, perhaps, the most important lesson that statistics has to teach — not any particular technique, but a habit of mind: every method rests on assumptions, every assumption embeds a worldview, and every worldview is worth questioning.

Floating Point
