In Language Machines (see here), Leif Weatherby argues that what he calls the “syntax” view of language, which is most closely associated with Chomsky, is better viewed as a Kantian system than a Cartesian one:

“Syntax, universal grammar, principles and parameters, and the more recent ‘minimalist program’ with its key idea of ‘merge’ – all these are attempts to isolate and formalize the ability to use language as a distinctively human operation shared neither by animals nor by machines. For this reason, I think that his linguistics is more Kantian than Cartesian. Chomskyan linguistics is the search for the categories of a transcendental logic as it exists extensively, to find the rules that we impose on sound or paper …. The search for the rules of that knowledge in the empirical order is futile, Kant argued, and Chomsky’s argument against statistics has its analog here, not in Descartes or in Humboldt” (46-7).

Chomsky’s aversion to empiricism (in this Kantian sense) is “at the cost of defining” language “not as actually spoken languages but as the formal production unit – in the brain or some computational formalism – that achieves the fit between knowing and saying, the internal and external aspects of the linguistic act” (51).  On the Chomskyan argument, it is not possible to bootstrap from semantics to syntax; the cost is explaining “how the deep structure of syntax actually imposes form on specific languages, like English or Lao” (51).

The situation is analogous to that of Kant, who faces “his own linking problem,” for which the categories and schemata are the solution. The categories, which outline the structural features of all possible experience, require further specification in the form of the schemata: roughly, the categories are applied to experience in general, rendering it comprehensible; the schemata then apply concepts to specific experiences. As Kant says, “the categories… without schemata, are merely functions of the understanding for concepts; and represent no object” (B187). The understanding comes from the top, and imposes the categories, and the schemata come from the bottom – from the imagination – and provide something like a generalized image that allows one to link a specific image to the concept.

Kant says that “this representation of a universal procedure of imagination in providing an image for a concept, I entitle the schema of this concept” (B179-80; readers will suspect that this is a resurrection of the medieval Aristotelian sensus communis, for exactly the same purpose). We know that this happens, but not how; as he continues, “this much only we can assert: the image is a product of the empirical faculty of reproductive imagination; the schema of sensible concepts, such as of figures in space, is a product and, as it were, a monogram, of pure a priori imagination, through which, and in accordance with which, images themselves become possible” (B181). One of Kant’s explanations involves the concept “dog:”

“The concept ‘dog’ signifies a rule according to which my imagination can delineate the figure of a four-footed animal in a general manner, without limitation to any single determinate figure such as experience, or any possible image that I can represent in concreto, actually presents” (B180).

Inquiring minds will want to know how this happens; they will also be disappointed: it is an “art concealed in the depths of the human soul, whose real modes of activity nature is hardly likely ever to allow us to discover, and to have open to our gaze” (B180-1).  Weatherby quips that “it is not much of an exaggeration to say that no one after Kant has ever been happy with this section of the Critique of Pure Reason” (52).  That sounds about right.

This retreat to “this is just the way things are” is also present in the categories.  After the transcendental deduction, Kant fields the question: why these categories, and not some others?  Inquiring minds will be disappointed again:

“This peculiarity of our understanding, that it can produce a priori unity of apperception solely by means of the categories, and only by such and so many, is as little capable of further explanation as why we have just these and no other functions of judgment, or why space and time are the only forms of our sensible intuition” (B145-6).

It seems to me that the Kantian provenance of these problems is important in part because it underscores the kinds of philosophical work that we could expect to have traction in understanding LLMs. Weatherby argues that much of what he calls the “ladder of reference” theory of language, which views language as primarily referential, dates to the ascendance of logical positivism and figures such as Carnap. It is this view of language that Weatherby takes to be fundamentally refuted by the success of LLMs. After all, they produce language and they aren’t referential (this is the vector grounding problem).

It is thus perhaps not too surprising that the theoretical work that seems to be most fruitful in understanding LLMs is itself based on an effort to rethink language and thought in the face of logical positivism. One strand derives from the late Wittgenstein and, as Lydia Liu has noted, has actually been influential on the development of linguistic translation in the Cambridge lab of Wittgenstein’s student Margaret Masterman.

The other is 20c French thought, almost in general.  As Henry Somers-Hall has argued, one thread that unifies otherwise disparate thinkers in mid-century France (he focuses on Bergson, Sartre, Foucault, Merleau-Ponty, Deleuze and Derrida) is their effort to overcome the Kantian understanding of thinking as judgment. Per Kant:

“If understanding in general is to be viewed as the faculty of rules, judgment will be the faculty of subsuming under rules; that is, of distinguishing whether something does or does not stand under a given rule (casus datae legis).  General logic contains, and can contain, no rules for judgment” (B172).

He then immediately says that judgment isn’t subject to rules in the strict sense, but rather “judgment is a peculiar talent which can be practiced only, and cannot be taught” (B172).

Getting past this, on Somers-Hall’s reading, is the One Thing that holds together a broad swath of thinkers.  As Somers-Hall notes in his introduction:

“The French tradition considered in this volume can be seen as attempting to discover a new way of understanding organization that does not rely on judgment, or notions such as subjects or predicates … this logic tends to be a logic of sense that seeks to explain how we encounter a world that is meaningful. We will find that all six of the philosophers we will be dealing with present arguments to show that sense cannot be grounded in judgement for the reason that everything resembles everything else to some degree, and hence once we are dealing with the discrete elements that make up a judgement, it is impossible to distinguish meaningful and coincidental relations. As such, we need an account of the constitution of the elements of judgement that a model of thinking as judging presupposes as simply given. Rather than the logic of association that we find in, for instance, empiricism, this logic will be responsible for constituting a field of elements with relationships of sense” (5).

In the broadest possible brush-strokes, this is the kind of logic that Weatherby finds in structuralism – and why it seems more satisfactory for language models than either the Kantian account or the empiricist one represented by the distributional hypothesis.

Next time I’ll say a little more about the Kant-French connection.

4 responses to “Language Machines, Kant, and 20c French Thought (part 1)”

  1. dmf

    I like casting Kant as a mysterian.
    Another post-Wittgenstein approach to the human side of these matters could be in the work of folks like Dan Hutto on enactivist takes on related matters like math acquisition:

    HuttoKirchhoffAbrahamson2015-EPR_.pdf

    which keep the better parts of Bert Dreyfus’ sense of what machines can’t do and why LLMs are likely not doing anything like what we are…

    1. Gordon Hull

      Interesting. I don’t know a lot about enactivism, but Abeba Birhane has invoked it to talk about the differences b/t human speech and what language models do: https://doi.org/10.1016/j.langsci.2024.101672

      (she’s always worth reading…)

      1. dmf

        thanks I’ll take a look. Hutto builds directly off of some aspects of Wittgenstein and he and his crew’s anti-representationalism is akin to Wittgenstein on the limits of rules & rule-following for organization/acting. Perhaps in that sense it is not so far from the promise of post-structuralisms I think you are circling around here. But enactivism is a kind of pragmatist instrumentalism (by my account) in that our habits serve our interests/desires in (and so make use of) the world, and obviously there is a kind of chasm as machines are worldless and without interests.
        https://uow.academia.edu/DanielDHutto

      2. dmf

        need to go back and read their work again but I think Sperber and Mercier are also working along related lines.

        MercierSperberWhydohumansreason.pdf
