Attention is not proportional to effort.

What it does is obvious. How it does it is opaque.

Humans and LLMs can both be seen as nonlinear function approximators. That does not make them the same thing. It means the interesting question moves from what is the function? to what hidden machinery realizes it?

Follow the thread
Human attention

Attention as lived allocation

A human does not merely weight inputs. A human organism allocates scarce perception, memory, emotion, action, and concern across a world that matters to it.

LLM attention

Attention as learned routing

An LLM computes relationships among tokens and routes representational influence through a trained network. The word is the same; the substrate is not.

At the level of function approximation

response = f(context, state, history, constraints)

The outside view is clean: inputs become outputs. The inside view is not clean at all. The computation is distributed, high-dimensional, and resistant to simple translation into ordinary explanation.

The seductive mistake

Because both humans and LLMs transform context into response, it is tempting to collapse them into the same category. But abstraction cuts both ways: it reveals a shared shape while hiding the thing that makes the difference.

1.Effort increases attention, but not linearly.
2.Attention can saturate, distort, or collapse under strain.
3.LLM attention is a mathematical operation, not a subject caring about the world.
4.Humans are embodied adaptive control systems with stakes.
5.The functional description can be obvious while the mechanism remains opaque.

The black box is not empty. It is too full.

We can observe behavior, measure activations, perturb parts, and build partial interpretations. But the full route from prompt to answer, perception to action, or effort to attention is not naturally written in human-readable steps.

A small change in input can produce a different path through the hidden space.

So the point is not that humans are machines, or that machines are humans. The point is sharper: once a system is powerful enough, the obviousness of its function does not make its mechanism transparent. We know what it is for. We do not fully know how the learned interior does what it does.