Last time, I talked about Leif Weatherby’s fantastic Language Machines (for my initial synopsis and thoughts on the book, see here) and his identification of a Kantian problematic behind what he calls the syntax view of language, which is prominently associated with Chomsky. Although Chomsky called his book Cartesian Linguistics, Weatherby thinks the better reference is to Kant. I think this makes a lot of sense, and it helps (this was the trajectory last time) to understand why structuralist, post-structuralist and Wittgensteinian work seems to have real traction when applied to language models.
Here I want to step back a little and note part of what motivates the Kantian account, because I think it shows the political stakes of Kantianism. On a standard epistemological reading, Kant was awakened from his dogmatic slumber by Humean empiricism. Causality demands necessity, and empiricism can’t get you there (see B123-4). I have no quarrel with the epistemological reading, but it’s worth noting that the language of the First Critique is also full of juridical terminology. For example, we need to “institute a tribunal which will assure to reason its lawful claims, and dismiss all groundless pretensions, not by despotic decrees, but in accordance with its own eternal and unalterable laws” (A xiii). As David Lachterman showed, this kind of language is all over the First Critique and is critical to the project of disciplining reason. In starting the Deduction, Kant distinguishes a question of right from a question of fact and applies the distinction to our use of the categories:
“Since experience is always available for the proof of their objective reality, we believe ourselves, even without a deduction, to be justified in appropriating to them a meaning, an ascribed significance” (B116-17).
But without a deduction, we have no right to use them. This matters because there are other ways of organizing our perceptions, and he wants to rule those out. Hence, he cites fate and fortune as usurping strategies that need to be combatted. The general problem is that both explain anything and everything, but neither explains anything specifically. Why did x happen? “It was fate” works, no matter what your x is.
As Dieter Henrich argues, Kant is relying on a very specific legal tradition here, one that attempts to determine the legitimacy of a legal claim based on its origin. To answer this quaestio juris, “one has to focus exclusively upon those aspects of the acquisition of an allegedly rightful possession by virtue of which a right has been bestowed, such that the possession has become a property” (36). Kant thought that physiological or psychological accounts for the use of the categories were inadequate, so we had to show that they could be tethered to a legal origin. The situation is sort of like a will: we may not know why someone is leaving their estate to their cats or their dog, but we can determine if the transfer is legally valid and so binding by looking at its legal origin. Looking at the prehistory of when the person sat around and cooked up the idea is not only unlikely to yield a satisfactory answer; it’s also not legally dispositive. One might also consider a property claim: that I may be in possession of something doesn’t establish that I am in possession of it by right.
What the strategy does epistemologically is pivot between concepts of necessity. Rejecting the determinist notion of natural necessity as unknowable, and also rejecting Epicurean randomness, Kant selects necessity that we create, on the model of law or legislation. It seems to me that his use of this model indicates a few things:
(1) The artifact of Kant’s critique has a politics, if not as famously as a bridge. That politics, as Lachterman argues, has to do with “the suppression, the excapacitation of theoretical desire as desire or eros and its replacement by practical needs and interests” (182). Philosophy must learn to police itself by showing the illegitimacy of theoretical reason and its inevitable lapse into transcendental illusion (as Lachterman puts the distinction between Verstand and Vernunft, “the former achieves this [relevance to sensible experience] only in concert with sensible intuition, the latter purports to achieve it without having recourse to such intuition …. Understanding stands at the base of solid science; reason, in this account, floats idly above the clouds of ‘dialectical illusion’” (191)). But, and this is the relevant point here, philosophy also “eliminates its immediately political and threatening character by showing itself to be essentially cosmopolitical” (182).
It seems to me that there is room to pursue the politics of language models in terms of Kant’s cosmopolitanism. The models themselves are much more Humean in character in that they dutifully encode whatever customary language use would indicate. But all of the apparatus designed to make them present as palatable interlocutors is an example of the cosmopolitan, Kantian impulse at work. The Kantian juridical apparatus is designed to intervene in precisely the way that techniques like RLHF are designed to intervene: to restore referentiality, a relation between the output of the model and the world it purports to describe. And as in Kant, there is a hard limit to what this sort of work can do: the model has no idea what the Ding an sich is, because it has no access to it. RLHF and other techniques attempt to mediate this problem, both by regulating the model’s internal representations along acceptable lines, and by working to reassure us that it is neither political nor threatening: that we have a “right” to them, that they successfully navigate away from confabulation, sycophancy, and so forth.
(2) Another way to explore the Kant connection would be through Kant’s anthropology – not necessarily for its own sake, but for how someone like Foucault connects it to humanism (“subject”) versus the better attempt to think about “structure.” Again, the connection to structuralism and what Weatherby calls an unfortunate “remainder humanism” seems like a good place to start; in Foucault’s case, it would mean going to his works of the mid-late 1960s especially, and the ways he talks about structuralism. Recall that he vociferously denies being a structuralist. He also denies that structuralism is a unified thing. As he replies to Sartre in 1968:
“When you ask those who are classified under the rubric of ‘structuralism’ – like Lévi-Strauss, Lacan, Althusser and the linguists, etc. – they answer that they have nothing in common with one another, or very little in common. Structuralism is a category that exists for others, for those who are not structuralists. It’s from the outside that one can say that so and so are structuralists. You must ask Sartre who the structuralists are, since he thinks that Lévi-Strauss, Althusser, Dumézil, Lacan and me constitute a coherent group, a group constituting some kind of unity that we ourselves don’t perceive” (Foucault Live, 53).
More generally, I think that Weatherby opens up a really interesting space to think about language models in the context of non-linguistic structuralisms.
(3) Hume is an under-explored resource here. I’m not actually getting that idea from Hume – I’m getting it from Deleuze’s Hume book (which I haven’t read farther into than the part I’m about to cite; this is very much a provisional thought). Deleuze suggests the following of Hume:
“Hume constantly affirms the identity between the mind, the imagination, and ideas. The mind is not nature, nor does it have a nature. It is identical with the ideas in the mind. Ideas are given, as given; they are experience. The mind, on the other hand, is given as a collection of ideas and not as a system. It follows that our earlier question can be expressed as follows: how does a collection become a system? The collection of ideas is called ‘imagination,’ insofar as the collection designates not a faculty but rather an assemblage of things, in the most vague sense of the term: things are as they appear – a collection without an album, a play without a stage, a flux of perceptions. ‘The comparison of the theatre must not mislead us; nor have we the most distant notion of the place, where these scenes are represented, or of the materials, of which it is compos’d.’ The place is not different from what takes place in it; the representation does not take place in a subject. Then again the question may be: how does the mind become a subject? How does the imagination become a faculty?” (Empiricism and Subjectivity, 22-3; the reference is to Hume’s Treatise, p. 253 in the Clarendon ed.)
(4) The Kantian strategy seems profoundly question-begging when applied to language: why do we have to have an abstract right to use grammatical concepts? Derrida seems to me to be onto the problem when he says of Hegel’s account of language in “The Pit and the Pyramid”:
“Now calculation, the machine, and mute writing belong to the same system of equivalences, and their work poses the same problem: at the moment when meaning is lost, when thought is opposed to its other, when spirit is absent from itself, is the result of the operation certain?” (Margins of Philosophy, 107).
Absent the glue of spirit and judgment, we have no certainty:
“what Hegel … could never think is a machine that would work. That would work without, to this extent, being governed by an order of reappropriation [une machine qui fonctionnerait. Qui fonctionnerait sans être en cela réglée par un ordre de réappropriation]. Such a functioning would be unthinkable in that it inscribes within itself an effect of pure loss” (Margins, 107; Marges, 126).
Language models give us a machine that works well enough. And they do it without consciousness or mind or Kantian rules. But that said, they seem to need the “cosmopolitan” aspect of the Kantian approach – there’s no acceptable language model that doesn’t build in normativity.
