and What Scientists Do Anyway
Introduction
This follows on from my earlier essay “What Counts as Science — What’s the point of demarcation, and when does it mislead?”¹, but shifts the focus. Instead of arguing over where the boundary of science sits, I’m looking at what sits under our claims when we use induction, drawing on Alan Chalmers’ What Is This Thing Called Science? (Chs 1–7).
A common picture is that scientific theories are “derived from the facts.” But the support isn’t that direct. Facts aren’t the only substrate. Think of a bed: it’s not just legs; it has slats, springs, a mattress, blankets—layers that stabilise and shape what the bed can carry. And the mattress is important: it gives under weight, it conforms to our shape, and it dampens some movements while amplifying others. The blankets matter too: they insulate, they filter, and they change what gets in from the outside as well as what leaks out.
Likewise scientific observations are filtered and stabilised by instruments, definitions, and the field’s rules for what counts as evidence. That doesn’t make science arbitrary—it makes the premise set engineered, not pristine. By “engineered” I mean: observations are produced through instruments, calibrations, categories, and agreed measurement rules that make them shareable. That scaffolding improves reliability, but it also filters what can be seen and how it is described—so the premise set is constructed for public use, not simply handed to us “as is.” This is why “derive” can mislead: it suggests that, given the facts, a theory can be proven as a logical consequence. Chalmers argues that this strong claim cannot be substantiated. (Chalmers, p. 38)
Thesis
Science is always at risk of blind spots if it assumes that careful method can eliminate every external influence and make the premises perfectly clean. Science manages the risk by tightening claims, exposing them to failure, and revising theories—and sometimes revising what counts as relevant evidence. But it cannot ignore the mattress it rests on: the background assumptions and definitions that shape what gets counted as a fact in the first place.
Note: This isn’t just a philosophy of science problem. It’s also the shape of machine-learning and automation risk: models generalise from limited evidence, and “worked so far” can hide scope limits and measurement gaps. The practical response is lifecycle assurance—explicit scope, failure conditions, monitoring, and rollback.
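To make that concrete, here is a minimal sketch in Python of what lifecycle assurance can look like in code. Everything here is invented for illustration (the ScopedModel class, the thresholds, the baseline); it is a sketch under stated assumptions, not a real library or a prescribed method.

```python
# A minimal sketch of lifecycle assurance. All names and thresholds
# here are hypothetical, invented for illustration.
from collections import deque

class ScopedModel:
    """Wrap a predictor with an explicit scope, a stated failure
    condition, monitoring, and a rollback path to a simple baseline."""

    def __init__(self, model, baseline, in_scope, max_error=5.0, window=20):
        self.model = model            # the fitted predictor (a callable)
        self.baseline = baseline      # conservative fallback (a callable)
        self.in_scope = in_scope      # named scope: x -> bool
        self.max_error = max_error    # stated failure condition
        self.errors = deque(maxlen=window)  # monitoring buffer
        self.rolled_back = False

    def predict(self, x):
        # Refuse to extrapolate silently: out-of-scope inputs and
        # post-rollback calls go to the baseline.
        if self.rolled_back or not self.in_scope(x):
            return self.baseline(x)
        return self.model(x)

    def observe(self, x, y_true):
        # Monitoring: compare predictions with outcomes as they arrive.
        self.errors.append(abs(self.predict(x) - y_true))
        if len(self.errors) == self.errors.maxlen:
            mean_err = sum(self.errors) / len(self.errors)
            if mean_err > self.max_error:
                self.rolled_back = True  # failure condition met: roll back

# Usage: a model trained on 0 <= x <= 100, scoped to that range.
m = ScopedModel(model=lambda x: 2.0 * x,
                baseline=lambda x: 0.0,
                in_scope=lambda x: 0 <= x <= 100)
print(m.predict(50))    # in scope: use the model
print(m.predict(500))   # out of scope: fall back to the baseline
```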
Argument
1. What logic cannot do
Logic tells us what follows from what. It can’t certify that the starting premises are true. (Chalmers, pp. 39–40) So a deductively valid argument can still mislead if it begins from poorly grounded assumptions.
That said, deduction plays a vital role in science because it’s truth-preserving: if the premises are true, whatever you validly derive from them will be true as well. (Chalmers, p. 40) The weak link is the “if”—and logic alone can’t strengthen it.
Induction is different. It moves from some observed cases to claims about all cases of a kind, so it goes beyond what is contained in the premises. (Chalmers, p. 42) That is why general laws can’t be proven by deduction from observations. The best you can do is treat them as testable bets.
2. Inductive risks
Inductive generalisations are vulnerable in at least four ways:
(a) Scope risk: observations may be too narrow in range or type to justify a broad “all” claim.
(b) Measurement-limit risk: later observations (especially with improved methods) may reveal features or effects that earlier observations could not detect.
(c) Underdetermination risk (the same evidence can fit more than one theory/model): scientific laws outrun the finite evidence available, so they can’t be proven by deduction from that evidence. (Chalmers, p. 42)
(d) Assumption risk (the mattress): background assumptions shape what gets counted as an observation, and what counts as decisive evidence. If those assumptions are left implicit, evidence can drift toward self-confirmation: a practical circularity, where the test starts to assume what it is meant to test. (Chalmers, p. 34)
A classic case is resistance to Copernicus: one argument held that if the Earth orbited the sun, the moon would be left behind. That “impossibility” only looks decisive inside a background picture of motion and what can be carried along. Galileo’s telescopic observation of moons around Jupiter undermined the objection, because even Copernicus’ opponents agreed that Jupiter moves—and yet it carries its moons. The remaining dispute was then about whether the telescopic observation could be objectified and made shareable. (Chalmers, p. 21) This is the mattress problem in action: what counted as a decisive objection depended on the background picture of motion.
Induction is indispensable, but these risks show why it can’t deliver deductive proof. Three examples:
• Scope: Even if many metals expand when heated in the tested range of conditions, it still doesn’t follow logically that all metals expand when heated in every circumstance. Proper inductive practice therefore earns the right to generalise by naming the conditions—it pauses before the unqualified “all.” That leap beyond the observed cases is what makes induction useful, and what stops it being a guarantee. (Chalmers, pp. 41–42) A toy sketch of this scope risk follows these examples.
• Measurement limits: Hertz’s experiments on cathode rays concluded that they were not beams of charged particles. But with improved experimental conditions, Thomson was able to detect deflections Hertz had not observed: “Thomson was able to establish the deflections that Hertz had declared to be non-existent.” (Chalmers, p. 30)
• Background assumptions (and new information): Galileo’s observation of Jupiter’s moons introduced new, shareable evidence that undercut a key anti-Copernican objection. It mattered because it forced a revision in what opponents could treat as “impossible” within their background picture of motion—showing that a moving body can carry its moons. (Chalmers, p. 21; p. 34)
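The scope risk is easy to make concrete. Below is a toy Python sketch (all numbers invented): a straight line fitted to observations from a narrow range predicts well inside that range and drifts badly outside it, which is exactly the unqualified “all” failing.

```python
# Toy illustration of scope risk (all numbers invented). The "true"
# behaviour is nonlinear, but observations from a narrow range look
# linear, and a linear generalisation quietly inherits that scope.
def true_response(t):
    return 0.1 * t + 0.001 * t**2   # mild curvature, invisible up close

# Observe only t in [0, 50] and fit a line through the endpoints.
t_lo, t_hi = 0.0, 50.0
slope = (true_response(t_hi) - true_response(t_lo)) / (t_hi - t_lo)

def fitted(t):
    return true_response(t_lo) + slope * (t - t_lo)

for t in [25.0, 50.0, 200.0, 500.0]:
    err = abs(fitted(t) - true_response(t))
    tag = "in scope" if t_lo <= t <= t_hi else "EXTRAPOLATION"
    print(f"t={t:6.1f}  error={err:8.2f}  ({tag})")
```

Inside the observed range the error is small; at t=500 the fitted line is off by hundreds of units. Nothing in the narrow sample warned of this.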
3. Corrections: what scientists do anyway
Corrections are the practical moves that stop induction from pretending to be proof. They do two things: they raise confidence in a prediction, and they set its limits. That second part matters more than people admit. A prediction isn’t just what a claim says will happen; it’s also how far the claim is entitled to reach.
Confidence improves when the evidence base is widened and toughened: larger samples, more varied conditions, replication, and clearer rules about what counts as a valid observation. But science also needs a way to stress-test claims. Popper’s falsification pressure captures this: speculative theories should be exposed to observation and experiment in ways that could show them wrong, and theories that fail those tests should be dropped or replaced. (Chalmers, p. 56)
Practical translation: tighten the claim, name the scope, state what would count as failure, and keep a revision path open (a “rollback” in engineering terms).
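One way to hold yourself to that translation is to write the claim down as a record with those four parts. A minimal sketch in Python, assuming invented field names (this is not an established schema):

```python
# A sketch of the "practical translation" as a record. The field
# names are my own invention, not an established method.
from dataclasses import dataclass
from typing import Callable

@dataclass
class BoundedClaim:
    statement: str            # the tightened claim
    scope: str                # the named conditions it covers
    falsifier: str            # what would count as failure
    test: Callable[[], bool]  # a check that can actually fail
    revision_path: str        # the rollback: what to do on failure

claim = BoundedClaim(
    statement="Samples of metal X expand when heated",
    scope="20-200 C, ambient pressure, samples from batch A",
    falsifier="any replicated non-expansion within the stated scope",
    test=lambda: True,   # placeholder: run the replications here
    revision_path="narrow the scope, or revisit the measurement setup",
)

status = "holds so far" if claim.test() else claim.revision_path
print(claim.statement, "->", status)
```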
4. The Markov blanket as a correction device
Here I’m using “Markov blanket” loosely—borrowing the phrase from statistics and machine learning—as a practical metaphor for a rule scientists already use: be explicit about what your prediction is allowed to depend on. In practice, that means drawing a relevance boundary between what you will treat as signal (inputs you will model) and what you will treat as noise unless it enters through a specified, testable channel. I’m not invoking any broader “free energy” framework or claiming a deep ontological boundary here—just a discipline for scope and accountability. For continuity with the earlier essay, think in terms of two disciplines:
• Input discipline: thin walls (fewer blankets) — don’t reject anomalies just because they arrive in odd frameworks.
• Output discipline: thick walls (more blankets) — claims only earn standing through shared testing and correction.
Here the metaphor does double duty: it constrains what counts as relevant input, and it supports correction when prediction fails.
The key point is that this boundary doesn’t sit directly on raw facts—it sits on a mattress of background assumptions, definitions, and measurement choices. Those choices shape what variables we think to include, what counts as a valid observation, and what gets treated as admissible evidence. The boundary is engineered, not given.
A local weather forecast is a good analogy. The forecast is bounded: it leans on regional pressure systems, ocean temperatures, and nearby dynamics. For Melbourne, that’s usually Bass Strait / Southern Ocean systems—not the headline forecast for Chicago. It usually doesn’t help to import the day-to-day forecast from another continent. If distant conditions do matter, they matter through a named mechanism—some channel you can model—so that channel becomes part of what you treat as relevant.
That’s the point: a blanket doesn’t deny long-range influence; it refuses influence-by-vibes. It says: “show me the mechanism.” Or, at minimum, show me a stable, testable dependency and the limits where it breaks.
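As a sketch of what “show me the mechanism” can look like in practice, here is a toy Python version of the boundary. The variable names and the forecast rule are illustrative inventions, not real meteorology: distant drivers are admitted only with a named channel, and unnamed inputs are refused.

```python
# Sketch of a relevance boundary with named channels. All variable
# names and the toy "forecast" rule are illustrative inventions.

# The blanket: inputs the prediction is allowed to depend on, each
# with a stated mechanism for why it is admitted.
BLANKET = {
    "local_pressure_hpa": "regional pressure systems",
    "sst_bass_strait_c": "nearby ocean temperature",
    "enso_index": "distant driver admitted via a NAMED channel: "
                  "its modelled effect on regional pressure",
    # "chicago_headline_forecast" is absent: no mechanism named.
}

def forecast_rain_prob(inputs: dict) -> float:
    # Refuse influence-by-vibes: reject inputs outside the blanket.
    for name in inputs:
        if name not in BLANKET:
            raise ValueError(f"{name!r} has no named channel; "
                             "model the mechanism or leave it out")
    # Toy combination rule, for illustration only.
    p = 0.3
    p += 0.02 * (1013.0 - inputs.get("local_pressure_hpa", 1013.0))
    p += 0.05 * inputs.get("enso_index", 0.0)
    return min(max(p, 0.0), 1.0)

print(forecast_rain_prob({"local_pressure_hpa": 1005.0, "enso_index": -0.5}))
# forecast_rain_prob({"chicago_headline_forecast": 0.8})  # raises ValueError
```

Note the design choice: the distant driver is not banned, it is admitted through a modelled channel. The boundary constrains how influence enters, not whether distance is allowed.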
When prediction fails, the failure is diagnostic. At least one of three things has gone wrong:
• you missed a relevant variable/channel,
• you mis-specified the dependency, or
• your measurements were poor.
Rule of thumb: If failure persists across replications, inspect the mattress—definitions and background commitments that decide what counts as measurable or relevant.
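Here is a rough triage of those three failure modes as a Python sketch. The thresholds and decision rules are my own illustrative choices, not a method from Chalmers: residuals that track an excluded variable suggest a missed channel; replicates that disagree more than the effect size suggest poor measurement; otherwise, suspect the form of the dependency.

```python
# A rough triage sketch for the three failure modes. Thresholds and
# the decision rules are illustrative choices, not an established method.
import statistics

def pearson(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def triage(residuals, candidate_channel, replicate_spread, noise_floor=0.5):
    # (c) poor measurement: replicates disagree more than the signal.
    if replicate_spread > noise_floor:
        return "measurement: replicate spread swamps the signal"
    # (a) missed channel: residuals track a variable left outside.
    if abs(pearson(residuals, candidate_channel)) > 0.7:
        return "missed variable: residuals track an excluded channel"
    # (b) otherwise, suspect the functional form of the dependency.
    return "mis-specified dependency: revisit the model's form"
    # If the same diagnosis persists across replications, inspect the
    # mattress: the definitions and background commitments themselves.

residuals = [0.9, 1.1, 2.0, 2.9, 4.2]   # errors grow with the channel
excluded = [1.0, 1.2, 2.1, 3.0, 4.0]    # a variable left outside the model
print(triage(residuals, excluded, replicate_spread=0.1))
```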
This matters because a test plan can look “complete” while quietly becoming self-confirming. If your definitions, instruments, and success criteria are chosen so that the expected pattern is effectively built in, the system can pass every check—yet the checks are only showing that reality has been filtered to fit the theory. The result is tick-box correctness without genuine contact with the failure mode you care about. In those cases, more testing of the same kind won’t help. You need to revisit what counts as an observation, what counts as replication, and what would count as decisive failure.
So error becomes a guide to revision, not a reason to throw up your hands. And sometimes the correction goes deeper than adding a variable: repeated failure can force you to revisit the mattress itself—your definitions, instrument assumptions, or how you’ve carved the system up in the first place. Crucially, the boundary isn’t fixed: as methods improve, what counts as a relevant channel can change too.
Best objection
If induction can’t be justified, isn’t science irrational—just habit dressed up as method?
Reply
No. Science is rational because it is built to catch itself out. It trades proof for publicly contestable tests, criticism, replication, and correction—so fallible generalisations stay exposed to error and can improve over time.
So what
Induction shouldn’t pretend to deliver certainty. It supports probabilistic, scope-bounded claims. The point of scientific correction isn’t a final justification; it’s a discipline: tighten the claim, name the scope, test it hard, and revise both the theory and what counts as relevant evidence when reality pushes back.
Conclusion
Induction can’t prove laws from facts, but science can’t do without it because science must generalise to predict and explain. The best response isn’t to pretend induction yields certainty, but to discipline it: make claims precise, expose them to tests that can fail, and revise when they do. The Markov blanket metaphor captures the practical stance—constrain relevance, treat error as guidance, and update the boundary as methods improve. But the boundary doesn’t float: it rests on a mattress of background assumptions and definitions, and those need periodic inspection too.
Note
1. Ross Anderson, “What Counts as Science — What’s the point of demarcation, and when does it mislead?”, After Certainty, https://aftercertainty.net/index.php/what-counts-as-science.