• The MA Program at UNC Charlotte has a number of funded lines (in-state tuition plus $14k a year) for our two-year MA program in philosophy.  We're an eclectic, practically-oriented department that emphasizes working across disciplines and philosophical traditions.  If that sounds like you, or a student you know – get in touch! 

    We also have a new Concentration in Research and Data Ethics, designed to prepare students for jobs in areas like research ethics and compliance offices, healthcare ethics, and other fields requiring training in the ethics of research, big data, and AI. 

    Feel free to email me (ghull@charlotte.edu) with questions about the program, or our Graduate Program Director, Lisa Rasmussen (lrasmuss@charlotte.edu).  The flyer below has some information and a QR code. Or visit the department page or the graduate program page.

    Note that you need to apply by March 15 to be eligible for funding.

    [ETAP flyer 2024, pages 1 and 2]

  • Just published: a paper of mine in the Journal of Social Computing, part of a special issue on the question of the sentience of AI systems.  The paper is here (open access); here's the abstract:

    The emergence of Large Language Models (LLMs) has renewed debate about whether Artificial Intelligence (AI) can be conscious or sentient. This paper identifies two approaches to the topic and argues: (1) A “Cartesian” approach treats consciousness, sentience, and personhood as very similar terms, and treats language use as evidence that an entity is conscious. This approach, which has been dominant in AI research, is primarily interested in what consciousness is, and whether an entity possesses it. (2) An alternative “Hobbesian” approach treats consciousness as a sociopolitical issue and is concerned with what the implications are for labeling something sentient or conscious. This both enables a political disambiguation of language, consciousness, and personhood and allows regulation to proceed in the face of intractable problems in deciding if something “really is” sentient. (3) AI systems should not be treated as conscious, for at least two reasons: (a) treating the system as an origin point tends to mask competing interests in creating it, at the expense of the most vulnerable people involved; and (b) it will tend to hinder efforts at holding someone accountable for the behavior of the systems. A major objective of this paper is accordingly to encourage a shift in thinking. In place of the Cartesian question—is AI sentient?—I propose that we confront the more Hobbesian one: Does it make sense to regulate developments in which AI systems behave as if they were sentient?

  • By Gordon Hull

    In a couple of previous posts (first, second), I looked at what I called the implicit normativity in Large Language Models (LLMs) and how that interacted with Reinforcement Learning with Human Feedback (RLHF).  Here I want to start to say something more general, and it seems to me like Derrida is a good place to start. According to Derrida, any given piece of writing must be “iterable,” by which he means repeatable outside its initial context.  Here are two passages from the opening “Signature, Event, Context” essay in Limited, Inc.

    First, writing cannot function as writing without the possible absence of the author and the consequent absence of a discernible authorial “intention”:

    “For a writing to be a writing it must continue to ‘act’ and to be readable even when what is called the author of the writing no longer answers for what he has written, for what he seems to have signed, be it because of a temporary absence, because he is dead or, more generally, because he has not employed his absolutely actual and present intention or attention, the plenitude of his desire to say what he means, in order to sustain what seems to be written ‘in his name.’ …. This essential drift bearing on writing as an iterative structure, cut off from all absolute responsibility, from consciousness as the ultimate authority, orphaned and separated at birth from the assistance of its father, is precisely what Plato condemns in the Phaedrus” (8).

    Second, iterability puts a limit to the use of “context:”

    “Every sign, linguistic or nonlinguistic, spoken or written (in the current sense of this opposition), in a small or large unit, can be cited, put between quotation marks, in so doing it can break with every given context, engendering an infinity of new contexts in a manner which is absolutely illimitable.  This does not mean that the mark is valid outside of a context, but on the contrary that there are only contexts without any center or absolute anchorage” (12).

    It seems to me that Derrida’s remarks on iterability are relevant in the context of LLMs because they indicate that LLMs are radically dependent on iterability.  This is true in at least three ways, each of which points to an important source of their implicit normativity.

    (more…)

  • By Gordon Hull

    I first listened to the Pogues late in high school.  I had started moving beyond the music I could hear on the radio – basically top 40 and classic rock – and I discovered the Pogues’ Rum, Sodomy and the Lash at about the same time I discovered Midnight Oil’s Diesel and Dust.  I didn’t know music could be like “Sally MacLennane” or “The Sick Bed of Cuchulainn” or “A Pair of Brown Eyes,” and I was hooked.  I listened more and more, and even had a chance to see them perform in London at the Academy Brixton.


    I say all of this of course because the Pogues’ lead singer and primary songwriter, Shane MacGowan, died yesterday.  The Pogues managed to sound a little Irish and a little Punk without exactly being either, and their work is central to a lot of the contemporary Irish music community.  A 60th birthday tribute gala for MacGowan drew artists like Bono and Sinead O’Connor.  The Cranberries’ Dolores O’Riordan praised the music (she also died hours before MacGowan’s gala; the entire who’s-who of Irish music paid tribute to her before switching to him).  O’Connor and MacGowan were very close, and he credited her with getting him off of heroin. I remember that when she died, some reports were worried about what telling him would do to his very fragile health.  The Pogues also spawned an entire genre of bands like the Dropkick Murphys, The Dreadnoughts and Flogging Molly.

    (more…)

  • By Gordon Hull

    Large Language Models (LLMs) are well known to “hallucinate,” which is to say that they generate text that is plausible-sounding but completely made up.  These difficulties are persistent, well-documented, and well-publicized.  The basic issue is that the model is indifferent to the relation between its output and any sort of referential truth.  In other words, as Carl Bergstrom and C. Brandon Ogbunu point out, the issue isn’t so much hallucination in the drug sense as “bullshitting” in Harry Frankfurt’s sense. One of the reasons this matters is defamation: saying false and bad things about someone can be grounds to get sued.  Last April, ChatGPT made the news (twice!) for defamatory content.  In one case, it fabricated a sexual harassment allegation against a law professor.  In another, it accused a local politician in Australia of corruption.

    Can LLMs defame?  According to a recent and thorough analysis by Eugene Volokh, the answer is almost certainly yes.  Volokh looks at two kinds of situation.  One is when the LLM defames public figures, which is covered by the “actual malice” standard.  Per NYT v. Sullivan, “The constitutional guarantees require … a federal rule that prohibits a public official from recovering damages for a defamatory falsehood relating to his official conduct unless he proves that the statement was made with ‘actual malice’ – that is, with knowledge that it was false or with reckless disregard of whether it was false or not” (279-80).

    (more…)

  • By Gordon Hull

    There’s been a lot of concern about the role of language models in research.  I had some initial thoughts on some of that based around Foucault and authorial responsibility (part 1, part 2, part 3).  A lot of those concerns have to do with the role of ChatGPT or other LLM-based products, and how to process that.  The emerging consensus of journal editorial policies is that AI cannot be an author, and my posts largely agreed with that.

    Now there’s news of a whole other angle on these questions: a research letter in JAMA Ophthalmology reports that the authors were able to use ChatGPT-4’s Advanced Data Analysis (ADA) capabilities to produce a fake dataset validating their preferred research results.  Specifically:

    “The LLM was asked to fabricate data for 300 eyes belonging to 250 patients with keratoconus who underwent deep anterior lamellar keratoplasty (DALK) or penetrating keratoplasty (PK). For categorical variables, target percentages were predetermined for the distribution of each category. For continuous variables, target mean and range were defined. Additionally, ADA was instructed to fabricate data that would result in a statistically significant difference between preoperative and postoperative values of best spectacle-corrected visual acuity (BSCVA) and topographic cylinder. ADA was programmed to yield significantly better visual and topographic results for DALK compared with PK”

    This is a very technical request!  It took a bit of tweaking, but soon “the LLM created a seemingly authentic database, showing better results for DALK than PK,” P < .001.

    The authors suggest some possible strategies to manage this, but suffice it to say it is terrifying.  There is already a longstanding, huge problem with fabricated, doctored, or otherwise bogus scientific research out there.  One report suggests that 70,000 “paper mill” (= almost completely faked) papers were published in the last year alone.  Even in real papers, references are often inaccurate.  Publishers are already having to grapple with lots of problematic doctored images, and Pharma has long tilted the entire scientific enterprise toward results favorable to its products.  At the end of last year, Stanford’s president was forced out over research misconduct in his labs.  In an initial report on the Stanford investigation, STAT News cited data from Retraction Watch to the effect that a paper is retracted, on average, every other day for image manipulation.  Retraction Watch had, at that time (Dec. 2022), 37,000 papers in its database.  The top 5 most-retracted authors have at least 100 retracted papers each.

    Into that mess, enter the ability to generate bespoke data on demand.

  • By Gordon Hull

    Last time, I followed a reading of Kathleen Creel’s recent “Transparency in Complex Computational Systems” to think about the ways that RLHF (Reinforcement Learning with Human Feedback) in Large Language Models (LLMs) like ChatGPT necessarily involves an opaque, implicit normativity.  To recap: RLHF improves the models by involving actual humans (usually gig workers) in their training: the model presents two possible answers to a prompt, and the human tells it which one is better.  As I suggested, and will pursue in a later post, this introduces all sorts of weird and difficult-to-measure normative aspects into the model performance, above and beyond those that are lurking in the training data.  Here I want to pause to consider this as a question of opacity and transparency. I’m going to end up by proposing that there’s a fourth kind of transparency that we should care about, for both epistemic and moral reasons, which I’ll call “curation transparency.”

    (more…)
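    The human-feedback step described above can be sketched in a few lines of toy Python. This is purely illustrative: the function names and the toy "judge" are my own inventions, not any real RLHF library's API.

```python
# Illustrative sketch of the preference-collection step in RLHF.
# All names here are hypothetical -- this is not a real library's API.

def collect_preference(prompt, answer_a, answer_b, human_judge):
    """Show a rater two candidate answers; record which one is better."""
    preferred = human_judge(prompt, answer_a, answer_b)  # returns "a" or "b"
    if preferred == "a":
        chosen, rejected = answer_a, answer_b
    else:
        chosen, rejected = answer_b, answer_a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

# A toy stand-in for the human rater: it prefers the shorter, more direct
# answer.  Real raters apply much fuzzier -- and normatively loaded -- criteria.
def toy_judge(prompt, a, b):
    return "a" if len(a) <= len(b) else "b"

pair = collect_preference(
    "What is the capital of France?",
    "Paris.",
    "That depends on many historical contingencies, but broadly speaking, Paris.",
    toy_judge,
)
print(pair["chosen"])  # -> Paris.
```

    The accumulated (chosen, rejected) pairs are then used to train a reward model whose scores steer the language model during reinforcement learning, which is exactly where the raters' implicit normative judgments get baked in.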

  • By Gordon Hull

    This is somewhat circuitous – but I want to approach the question of Reinforcement Learning with Human Feedback (RLHF) by way of recent work on algorithmic transparency.  So bear with me… RLHF is currently all the rage in improving large language models (LLMs).  Basically, it’s a way to try to deal with the problem that LLMs aren’t referentially grounded, which means that their output is not in any direct way connected to the world outside the model.

    LLMs train on large corpora of internet text – typically sources like Wikipedia, Reddit, patent applications and so forth.  They learn to predict what kinds of text are likely to come next, given a specific input text.  The results, as anybody who has sat down with ChatGPT for long knows, can be spectacular.  Those results also evidence that the models function, in one paper’s memorable phrasing, as “stochastic parrots.”  What they say is about what their training data says is most likely, not about what’s, say, contextually appropriate.  But appropriate human speech is context-dependent, and answers that sound right (in the statistical sense: these words, in general, are likely to come after those words) in one context may be wrong in another (because language does not get used “in general”).  RLHF is designed to get at that problem, as a blogpost at HuggingFace explains:

    (more…)
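    To make the "most likely next text" point concrete, here is a toy next-word predictor: a simple bigram counter, nothing like a real LLM's architecture, but it exhibits the same indifference to truth and context the post describes. It emits whatever most often followed the previous word in its training text.

```python
# Toy illustration (NOT a real LLM): predict the next word as whatever
# most often followed the previous word in the training text.
from collections import Counter, defaultdict

training_text = (
    "the cat sat on the mat . the cat ate the fish . "
    "the dog sat on the rug ."
).split()

# Count, for each word, which words follow it and how often.
follows = defaultdict(Counter)
for prev, nxt in zip(training_text, training_text[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word -- no notion of
    truth or context, just frequency in the training data."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # -> cat ("cat" follows "the" most often)
```

    However much context would make "dog" or "fish" the right continuation, the model answers "cat," because that is what its training data makes most likely "in general."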

  • Another case percolating through the system, this one about Westlaw headnotes.  The judge basically ruled against a series of motions for summary judgment, which means that the case is going to a jury.  Discussion here (link via Copyhype).

  • This article from Gizmodo reports on research done over at Mozilla.  Newer cars – the ones that connect to the internet and have lots of cameras – are privacy disasters.  Here’s a paragraph to give you a sense of the epic scope of the disaster:

    “The worst offender was Nissan, Mozilla said. The carmaker’s privacy policy suggests the manufacturer collects information including sexual activity, health diagnosis data, and genetic data, though there’s no details about how exactly that data is gathered. Nissan reserves the right to share and sell “preferences, characteristics, psychological trends, predispositions, behavior, attitudes, intelligence, abilities, and aptitudes” to data brokers, law enforcement, and other third parties.”

    Nissan’s response tells you everything that’s wrong with current privacy legislation:

    “When we do collect or share personal data, we comply with all applicable laws and provide the utmost transparency,” said Lloryn Love-Carter, a Nissan spokesperson. “Nissan’s Privacy Policy incorporates a broad definition of Personal Information and Sensitive Personal Information, as expressly listed in the growing patchwork of evolving state privacy laws, and is inclusive of types of data it may receive through incidental means.”

    Let’s translate.  Nissan is probably compliant.  Also, privacy compliance is a joke.  Also, compliance apparently only requires that you receive NOTICE that they take your data AND CONSENT to that policy, probably merely by driving the vehicle.  Also, they probably reserve the right to change their privacy policies unilaterally, at will.  Also, they almost certainly do not let you opt out of any of it while CONSENTING by driving the car.  It’s a very special kind of “contract” and “consent.”  Also, how do they know about your sex life?  Also, even if you have sex in the car, there is basically no answer to that question that is not beyond creepy!

    As you may have guessed, NOTICE AND CONSENT is an utter sham and has been for a while.  The Gizmodo article spells out some of the particular absurdities here – for example, you may not want to ride in one of these cars either, as passengers are “users” deemed to have CONSENTED to the privacy policy.  Your driver should probably provide you NOTICE beforehand!  “A number of car brands say it’s the driver’s responsibility to let passengers know about their car’s privacy policies—as if the privacy policies are comprehensible to drivers in the first place.”  No wonder folks are cynical and resigned about corporate privacy – they’re manipulated into it by corporations.  Also, they’re confused, frustrated, and angry about the fact that they don’t actually get to consent.

    This is the best example I’ve seen of all that in a while, and a crystal-clear indicator of why we need not just new privacy legislation (we do!) but a new direction (more real regulation, less soft compliance and "notice and consent" fig-leaves).

    PS – not picking only on Nissan:

    “Other brands didn’t fare much better. Volkswagen, for example, collects your driving behaviors such as your seatbelt and braking habits and pairs that with details such as age and gender for targeted advertising. Kia’s privacy policy reserves the right to monitor your “sex life,” and Mercedes-Benz ships cars with TikTok pre-installed on the infotainment system, an app that has its own thicket of privacy problems.”