• Large Language Models (LLMs) like ChatGPT are well-known to hallucinate – to make up answers that sound pretty plausible, but have no relation to reality.  That of course is because they’re designed to produce text that sounds about right given a prompt.  What sounds kind of right may or may not be right, however.  ChatGPT-3 made up a hilariously bad answer to a Kierkegaard prompt I gave it and put a bunch of words into Sartre’s mouth.  It also fabricated a medical journal article to support a fabricated risk of oral contraceptives.  ChatGPT-4 kept right on making up cites for me.  It has also defamed an Australian mayor and an American law professor.  Let’s call this a known problem.  You might even suggest, following Harry Frankfurt, that it’s not so much hallucinating as it is bullshitting.

    Microsoft’s Bing chatbot-assisted search puts footnotes in its answers.  So it makes sense to wonder if it also hallucinates, or if it does better.  I started with ChatGPT today and asked it to name some articles by “Gordon Hull the philosopher.”  I’ll spare you the details, but suffice it to say it produced a list of six things that I did not write.  When I asked it where I might read one of them, it gave me a reference to an issue of TCS that included neither an article by me nor an article of that title.

    So Bing doesn’t have to be spectacular to do better!  I asked Bing the same question and got the following:

    (more…)

  • Recall that ChatGPT a couple of months ago did a total face plant on the task of distinguishing Kierkegaard's knight of faith from the knight of infinite resignation.  Well, with the fullness of time and an upgrade, it's a lot better now: (screen grabs below the fold)

    (more…)

  • By Gordon Hull

    In the previous two posts (here and here) I’ve developed a political account of authorship (according to which whether we should treat an AI as an author for journal articles and the like is a political question, not one about what the AI is, or whether its output resembles human output), and argued that AIs can’t be properly held accountable.  Here I want to argue that AI authorship raises social justice concerns.

    That is, there are social justice reasons to expand human authorship that are not present in AI.  As I mentioned in the original post, researchers like Liboiron are trying to make sure that the humans who put effort into papers, in the sense that they make them possible, get credit.  In a comment to that post, Michael Muller underlines that authorship interacts with precarity in complex ways.  For example, “some academic papers have been written by collectives. Some academic papers have been written by anonymous authors, who fear retribution for what they have said.”  Many authors have precarious employment or political circumstances, and sometimes works are sufficiently communal that entire communities are listed as authors.  There are thus very good reasons to use authorship strategically when minoritized or precarious individuals are in question.  My reference to Liboiron is meant only to indicate the sort of issue at stake in the strategic use of authorship to protect minoritized or precarious individuals, and to gesture to the more complex versions of the problem that Muller points to.  The claim I want to make here is that, as a general matter, AI authorship isn’t going to help those minoritized people, and might well make matters worse.

    If anything, there’s a plausible case that elevating an AI to author status will make social justice issues worse.  There are at least two ways to get to that result, one specific to AI and one more generally applicable to cognitive labor.

    (more…)

  • As if Sartre didn't produce enough words all by himself!

    ChatGPT's response to the following prompt is instructive for those of us who are concerned about ChatGPT being used to cheat.  Read past the content of the answer to notice the made-up citations.  The "consciousness is a question…" line is in fact in the Barnes translation of Being and Nothingness, but is actually a term in the glossary provided by the translator (so it's not on p. 60 – it's on p. 629).  Where did the AI find this?  I'm guessing on the Wikipedia page for the book, which has a "special terms" section that includes the quote and attributes it to Barnes (I should add as an aside that Barnes puts it in quote marks, but doesn't reference any source).  The "separation" quote is, as far as I can tell, made up out of whole cloth.  It does sound vaguely Sartrean, but it doesn't appear to be in the Barnes translation, and I can't find it on Google.  It's also worth pointing out that neither quote is from the section about the cafe – both page numbers are from the bad faith discussion.

    I don't doubt that LLMs will get better (etc etc etc) but for now, bogus citations are a well-known hallmark of ChatGPT.  Watch it make up quotes from Foucault (and generally cause him to turn over in his grave) here.

    (more…)

  • By Gordon Hull

    As I argued last time, authorship is a political function, and we should apply that construction of it when asking whether an AI should be considered an author.  Here is a first reason for doing so: AI can’t really be “accountable.”

    (a) Research accountability: The various journal editors all emphasize accountability.  This seems fundamentally correct to me.  First, it is unclear what it would mean to hold AI accountable.  Suppose the AI fabricates some evidence, or cites a non-existent study, or otherwise commits something that, were a human to do it, would count as egregious research misconduct.  For the human, we have some remedies that ought, at least in principle, to discourage such behavior.  A person’s reputation can be ruined, their position at a lab or employer terminated, and so on.  None of those incentives would make the slightest difference to the AI.  The only remedy that seems obviously available is retracting the study.  But there are at least two reasons that’s not enough.  First, as is frequently mentioned, retracted studies still get cited.  A lot.  Retraction Watch even keeps a list of the top 10 most-cited papers that have been retracted.  The top one right now is a NEJM paper published in 2013 and retracted in 2018; it had 1905 cites before retraction and 950 after.  The second-place paper is a little older, published in 1998 and retracted in 2010, and has been cited more times since its retraction than before.  In other words, papers that are bad enough to be actually retracted cause ongoing harm; a retraction is not a sufficient remedy for research misconduct.  If nothing else, whatever AI gets trained on the literature is going to find and cite them.  And all of this is assuming something we know to be false, which is that all papers with false data (etc.) get retracted.  Second, it’s not clear how retraction disincentivizes an AI any more than any other penalty does.  In the meantime, there is at least one good argument in favor of making humans accountable for the output of an AI: it incentivizes them to check its work.

    (more…)

  • You know how sometimes your students don't do the reading?  And then how, when you give them a writing prompt based on it, they try to guess their way to a good answer from the everyday meaning of the words in the prompt?  And how, sometimes, the outcome is spectacularly, wonderfully wrong?

    Well, I don't know what else ChatGPT can do, but it can do an uncannily good imitation of such a student!

    (oh, and like that student, it blew through the word count, apparently on the theory that a lot of words would make up for a lack of reading)

    This was a prompt from my existentialism class (the instructions also tell them they have to quote the text, but I omitted that here, because we already know ChatGPT can't do that).  It's two images because I am technically incompetent to capture the longer-than-a-screen answer into one image:

    (more…)

    Our two-year MA program in philosophy at UNC Charlotte has a number of funded lines.  We're an eclectic, practically-oriented department that emphasizes working across disciplines and philosophical traditions.  If that sounds like you, or a student you know – get in touch!  You can email me (ghull@uncc.edu), though for a lot of questions I'll pass you along to our grad director, Andrea Pitts (apitts5@uncc.edu).  Or, there's a QR code in the flyer below.


    MA Flyer

    (more…)

  • By Gordon Hull

    Large Language Models (LLMs) like ChatGPT burst into public consciousness sometime in the second half of last year, and ChatGPT’s impressive results have led to a wave of concern about the future viability of any profession that depends on writing, or on teaching writing.  A lot of this is hype, but one issue that is emerging is the role of AI authorship in academic and other publications; there’s already a handful of submissions that list AI co-authors.  An editorial in Nature published on Feb. 3 outlines the scope of the issues at hand:

    “This technology has far-reaching consequences for science and society. Researchers and others have already used ChatGPT and other large language models to write essays and talks, summarize literature, draft and improve papers, as well as identify research gaps and write computer code, including statistical analyses. Soon this technology will evolve to the point that it can design experiments, write and complete manuscripts, conduct peer review and support editorial decisions to accept or reject manuscripts”

    As a result:

    “Conversational AI is likely to revolutionize research practices and publishing, creating both opportunities and concerns. It might accelerate the innovation process, shorten time-to-publication and, by helping people to write fluently, make science more equitable and increase the diversity of scientific perspectives. However, it could also degrade the quality and transparency of research and fundamentally alter our autonomy as human researchers. ChatGPT and other LLMs produce text that is convincing, but often wrong, so their use can distort scientific facts and spread misinformation.”

    The editorial then gives examples of LLM-based problems with incomplete results, bad generalizations, inaccurate summaries, and other easily generated problems.  It emphasizes accountability for the content of the material (the use of AI should be clearly documented) and the need for the development of truly open AI products as part of a push toward transparency.

    (more…)

  • By Gordon Hull

    Last time, I introduced a number of philosophy of law examples in the context of ML systems and suggested that they might be helpful in thinking differently, and more productively, about holding ML systems accountable.  Here I want to make the application specific.

    So: how do these examples translate to ML and AI?  I think one lesson is that we need to specify what exactly we are holding the algorithm accountable for.  For example, if we suspect an algorithm of unfairness or bias, it is necessary to specify precisely what the nature of that bias or unfairness is – for example, that it is more likely to assign high-risk status to Black defendants (for pretrial detention purposes) than to white ones.  Even specifying fairness in this sense can be hard, because there are conflicting accounts of fairness at play.  But assuming that one can settle that question, we don’t need to specify tokens or individual acts of unfairness (or demand that each of them rise to the level where they would individually create liability) to demand accountability of the algorithm or the system that deploys it – we know that the system will have treated defendants unfairly, even if we don’t know which ones (this is basically a disparate impact standard; recall that one of the original and most-cited pieces on how data can be unfair was framed precisely in terms of disparate impact).
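
    To make the disparate-impact point concrete, here is a minimal sketch of what a group-level audit might look like.  Everything in it is hypothetical: the flag rates are invented, and the "four-fifths rule" is just one conventional benchmark borrowed from US employment law, not something the disparate impact standard requires.

```python
# Minimal sketch of a group-level (disparate impact) audit of a
# risk-scoring algorithm.  All numbers here are hypothetical.

def flag_rate(decisions):
    """Fraction of a group the algorithm labeled high-risk (1 = flagged)."""
    return sum(decisions) / len(decisions)

# Hypothetical algorithm outputs for two groups of defendants.
black_defendants = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # 70% flagged
white_defendants = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]  # 30% flagged

r_black = flag_rate(black_defendants)
r_white = flag_rate(white_defendants)

# The audit question is posed at the level of the group, not of any
# individual decision: no single flag has to be provably wrong for the
# pattern to count as unfair.
print(f"high-risk rate, Black defendants: {r_black:.0%}")
print(f"high-risk rate, white defendants: {r_white:.0%}")

# One conventional benchmark (the four-fifths rule) compares rates of the
# favorable outcome -- here, *not* being flagged as high-risk.
favorable_ratio = (1 - r_black) / (1 - r_white)
verdict = "disparity" if favorable_ratio < 0.8 else "no disparity"
print(f"favorable-outcome ratio: {favorable_ratio:.2f} ({verdict} under the four-fifths rule)")
```

    The point of the sketch is only the shape of the inquiry: the audit asks about rates across groups, which is exactly the kind of accountability that an individual tort claim can't capture.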

    Further, given the difficulties of individual actions (litigation costs, as well as getting access to the algorithms, which defendants will claim as trade secrets) in such cases, it seems wrong to channel accountability through tort liability and demand that individuals prove the algorithm discriminated against them (how could they?  The situation is like the blue bus: if a group of people is 80% likely to reoffend or skip bail, we know that 20% of that group will not, and there is no “error” for which the system can be held accountable).  Policymakers need to conduct regular audits or other supervisory activity designed to ferret out this sort of problem, and demand accountability at the systemic level.

    (more…)

  • By Gordon Hull

    AI systems are notoriously opaque black boxes.  In a now standard paper, Jenna Burrell dissects this notion of opacity into three versions.  The first is when companies deliberately hide information about their algorithms, to avoid competition, maintain trade secrets, and to guard against gaming their algorithms, as happens with Search Engine Optimization techniques.  The second is when reading and understanding code is an esoteric skill, so the systems will remain opaque to all but a very small number of specially-trained individuals.  The third form is unique to ML systems, and boils down to the argument that ML systems generate internal networks of connections that don’t reason like people.  Looking into the mechanics of a system for recognizing handwritten numbers or even a spam detection filter wouldn’t produce anything that a human could understand.  This form of opacity is also the least tractable, and there is a lot of work trying to establish how ML decisions could be made either more transparent or at least more explicable.

    Joshua Kroll argues instead that the quest for potentially impossible transparency distracts from what we might more plausibly expect from our ML systems: accountability.  After all, they are designed to do something, and we could begin to assess them according to the internal processes by which they are developed to achieve their design goals, as well as by empirical evidence of what happens when they are employed.  In other words, we don’t need to know exactly how the system can tell a ‘2’ from a ‘3’ as long as we can assess whether it does, and whether that objective is serving nefarious purposes.
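
    To give a feel for what that kind of assessment looks like in practice, here's a minimal sketch.  Everything in it is stipulated for illustration: the model is any opaque object with a predict() interface (an assumption, not a particular library's API), and the audit never looks inside it.

```python
# Sketch of Kroll-style accountability without transparency: treat the
# classifier as a black box and audit only its observable behavior.
# `model.predict` is an assumed interface, invented for this example.

def audit_accuracy(model, inputs, true_labels):
    """Empirically check whether the system does what it was designed to do,
    without inspecting weights or internal connections."""
    predictions = model.predict(inputs)
    correct = sum(p == t for p, t in zip(predictions, true_labels))
    return correct / len(true_labels)

# Hypothetical usage: does the digit recognizer actually tell a '2' from a
# '3' on held-out examples?  That answer can ground accountability claims
# even if no human can follow the model's internal reasoning.
# accuracy = audit_accuracy(digit_model, held_out_images, held_out_digits)
```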

    I’ve thought for a while that there’s potential help for understanding what accountability means in the philosophy of law literature.  For example, a famous thought experiment features a traffic accident caused by a bus.  We have two sources of information about this accident.  One is an eyewitness who is 70% reliable and says that the bus was blue.  The other is the knowledge that 70% of the buses that were in the area at the time were blue.  Epistemically, these ought to be equal – in both cases, you can say with 70% confidence that the blue bus company is liable for the accident.  But we don’t treat them as the same: as David Enoch and Talia Fisher elaborate, most people prefer the witness to the statistical number.  This is presumably because when the witness is wrong, we can inquire what went wrong.  When the statistic is wrong, it’s not clear that anything like a mistake even happened: the statistics operate at a population level; when applied to individuals, the use of statistical probability will be wrong 30% of the time, and so we have to expect that.  It seems to me that our desire for what amounts to an auditable result is the sort of thing that Kroll is pointing to.
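
    The "epistemically equal" claim is easy to check with a toy simulation (the numbers are the thought experiment's; the 50/50 color split in the witness scenario is my stipulation): trusting the 70%-reliable witness and betting on the 70% base rate are each right 70% of the time.

```python
# Toy simulation of the blue bus comparison: a 70%-reliable eyewitness and
# a 70% base rate identify the culprit equally often on average.
import random

random.seed(0)
TRIALS = 100_000

# Scenario A: eyewitness only.  The bus could be either color (stipulated
# 50/50 here); the witness reports the true color 70% of the time.
witness_correct = 0
for _ in range(TRIALS):
    culprit_is_blue = random.random() < 0.5
    report_is_true = random.random() < 0.7
    witness_says_blue = culprit_is_blue if report_is_true else not culprit_is_blue
    if witness_says_blue == culprit_is_blue:
        witness_correct += 1

# Scenario B: statistics only.  70% of the buses in the area were blue, so
# always blaming the blue bus company is right 70% of the time.
baserate_correct = 0
for _ in range(TRIALS):
    culprit_is_blue = random.random() < 0.7
    if culprit_is_blue:
        baserate_correct += 1

print(f"trust the witness: {witness_correct / TRIALS:.1%}")   # ~70%
print(f"bet the base rate: {baserate_correct / TRIALS:.1%}")  # ~70%
```

    The matching accuracies are the point: the difference between the two cases lies not in reliability but in what a mistake looks like afterward.  A wrong witness report is a particular failure we can interrogate; a wrong base-rate bet is just the 30% that was built in from the start.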

    (more…)