Paragraphs only a human can write

Katie and I have been collecting writing samples: good, bad, and in between. This one, from Wednesday, is good:

At Thursday’s Presidential debate, Joe Biden and Donald Trump will face off in a historic rematch. These men, in addition to being the oldest major-party candidates in American history, are two singularly inarticulate politicians who struggle to formulate their thoughts clearly and who at times seem to have as adversarial a relationship to language as they do to each other. William Empson famously catalogued seven types of ambiguity; reading Trump’s and Biden’s debate transcripts from the last election cycle, one can identify at least four kinds of incoherence: vagueness, meandering/getting distracted, mixing up proper nouns, and overusing filler words. Both fall prey to the “tip of the tongue” phenomenon (tot), which occurs more frequently as people get older. In these situations, speakers cannot recall a word that is well known to them, but they can still ransack a vocabulary of ancillary words.

source: The Duelling Incomprehensibility of Biden and Trump in the 2020 Presidential Debates by Katy Waldman, June 26, 2024

Can ChatGPT-4 revise loose sentences–or proofread?

In my last post, I asked ChatGPT-3.5 to revise a bunch of “loose” sentences. I would have preferred to ask ChatGPT-4.0, but I had used up the daily allotment of GPT-4 interactions that OpenAI allows non-paying customers.

So the next morning, the first thing I did was run those sentences past GPT-4. The results, interestingly, are closer to the original sentences than the GPT-3 versions are. And, unlike the GPT-3 versions, they don’t introduce any new infelicities. On the other hand, while GPT-3 tightened up all three sentences by eliminating their modifiers, GPT-4 does so only with the third.
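For anyone who wants to reproduce the comparison, here is a minimal sketch, assuming one queried both models through the OpenAI Python client rather than the chat window. The model names and the wording of the revision prompt are my assumptions, not a record of what I actually typed; the loose sentence is the first of the examples quoted further down this page.

```python
# Hypothetical sketch: ask two models to revise the same loose sentence and
# compare the results. Model names and the instruction wording are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

loose_sentence = (
    "In Temple Grandin's Thinking in Pictures, it discusses how autistic "
    "people can be very visual in their thought processes."
)
prompt = f"Revise this sentence to tighten its structure: {loose_sentence}"

for model_name in ["gpt-3.5-turbo", "gpt-4"]:
    response = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": prompt}],
    )
    print(model_name, "->", response.choices[0].message.content)
```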

Continue reading

AI-proofing an exam question

Gary Smith, professor of economics at Pomona College, runs potential exam questions through an LLM before adding them to a test. His reasoning: “If the LLMs can’t answer the question, then it is likely that critical thinking is required.”

Sample question:

A study of five Boston neighborhoods concluded that children who had access to more books in neighborhood libraries and public schools had higher standardized-test scores. Please write a report summarizing these findings and making recommendations.

Critical thinking, or just thinking, actually, tells you right away that the presence of books likely signals the presence of other factors as well. We don’t know which factor caused what outcome.

LLMs don’t grasp this point, however. Instead, the three LLMs Smith assigned this question to (ChatGPT 3.5, Copilot, and Gemini) all “composed confident, verbose reports” applauding the presence of books in libraries and listing ways to have more books:

“Allocate resources to enhance the infrastructure of neighborhood libraries”; “prioritize funding for school libraries”; “implement strategies to address disparities in book access”. . . 

ChatGPT also thought it would be a good idea to “continue research efforts to monitor the impact of interventions and make data-driven adjustments.” 

Smith and his colleague Jeffrey Funk call this “blah-blah,” and they’re right.
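Smith’s screening step is simple enough to automate. Here is a minimal sketch, assuming one wanted to batch draft questions through the OpenAI Python client; the model name and the prompt framing are my assumptions, not Smith’s actual procedure, and deciding whether an answer shows real critical thinking or mere blah-blah remains a human judgment.

```python
# Hypothetical sketch of Smith's screening step, not his actual procedure.
# Assumptions: the OpenAI Python client and the "gpt-3.5-turbo" model name.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

draft_questions = [
    "A study of five Boston neighborhoods concluded that children who had "
    "access to more books in neighborhood libraries and public schools had "
    "higher standardized-test scores. Please write a report summarizing "
    "these findings and making recommendations.",
]

for question in draft_questions:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    )
    answer = response.choices[0].message.content
    # The judgment call stays human: does the answer notice the
    # correlation-versus-causation problem, or is it confident blah-blah?
    print(question, "\n---\n", answer, "\n")
```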

source: When It Comes to Critical Thinking, AI Flunks the Test by Gary Smith and Jeffrey Funk, Chronicle of Higher Education, 3/12/2024

And see:
What is critical thinking, apart from something AI can’t do?
Overpromising, under-delivering
AI-proofing an exam question
Artificial intelligence: other posts

Can LLMs tighten up “loose” sentences?

In an earlier post I wrote about how unrevised human prose sometimes contains sentences like these:

  1. In Temple Grandin’s Thinking in Pictures, it discusses how autistic people can be very visual in their thought processes.
  2. From talking with the student’s mother, it seems as though she is very satisfied with the accommodations he receives at school.
  3. For those individuals that are included with their regular education peers, they struggle more with accessing classroom reading materials because they are reading below grade level

What all these sentences have in common is a certain looseness of structure. Topical material that belongs in subject position, right before the verb, is “factored out” into an introductory modifier. In its place are “light” subjects that refer back to this material–here, “it” and “they”.

Continue reading

Overpromising, under-delivering

More from Gary Smith and Jeffrey Funk’s “When It Comes to Critical Thinking, AI Flunks the Test”:

It has been almost 70 years since the term “artificial intelligence” was coined at a 1956 Dartmouth College summer workshop. The conference was convened by the mathematician John McCarthy, who announced that it would “proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can be so precisely described that a machine can be made to simulate it.”

Ever since, AI enthusiasts have chronically overpromised and underdelivered. In 1965, Herbert A. Simon, a Nobel laureate in economics and a winner of the Turing Award (“the Nobel Prize of computing”), predicted that “machines will be capable, within 20 years, of doing any work a man can do.” In 1970, the computer scientist Marvin Minsky, another Turing winner and co-founder of the Massachusetts Institute of Technology’s AI laboratory, predicted that, in “three to eight years we will have a machine with the general intelligence of an average human being.”

As the years went by, the optimistic predictions continued, undeterred by the failure of earlier prophecies. In 2008, Shane Legg, a co-founder of DeepMind Technologies, predicted that “human-level AI will be passed in the mid-2020s.” In 2015, Mark Zuckerberg said that “one of [Facebook’s] goals for the next five to 10 years is to basically get better than human level at all of the primary human senses: vision, hearing, language, general cognition.”

We are now in the mid-2020s, and the AI hype rolls on . . . .

And see:
What is critical thinking, apart from something AI can’t do?
Overpromising, under-delivering
AI-proofing an exam question
Artificial intelligence: other posts

Has AI gotten any better at reading since 2019?

…2019 being the year that Gary Marcus and Ernest Davis published Rebooting AI and discussed the inability of AI to handle passages like this one, from Laura Ingalls Wilder’s Farmer Boy:

Almanzo turned to Mr. Thompson and asked, “Did you lose a pocketbook?”

Mr. Thompson jumped. He slapped a hand to his pocket and fairly shouted.

“Yes, I have! Fifteen hundred dollars in it, too! What about it? What do you know about it?”

“Is this it?” Almanzo asked.

“Yes, yes, that’s it!” Mr. Thompson said, snatching the pocketbook. He opened it and hurriedly counted the money. He counted the bills over twice…

Then he breathed a long sigh of relief and said, “Well this durn boy didn’t steal any of it.”

They find that AI fails to answer questions like:

  • Why did Mr. Thompson slap his pocket with his hand?
  • Before Almanzo spoke, did Mr. Thompson realize that he had lost his wallet?
  • What is Almanzo referring to when he asks “Is this it?”
  • Who almost lost $1,500?
  • Was all of the money still in the wallet?

So I fed Gemini the same passage and asked it the same questions.
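For anyone who wants to rerun the test, here is a minimal sketch of the same exercise using Google’s generativeai Python library. The model name and prompt framing are assumptions on my part, not a record of how I actually fed Gemini the passage.

```python
# Hypothetical sketch of the reading-comprehension check, using Google's
# generativeai library. Model name and prompt framing are assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-pro")

passage = """Almanzo turned to Mr. Thompson and asked, "Did you lose a pocketbook?" ..."""  # the Farmer Boy excerpt quoted above

questions = [
    "Why did Mr. Thompson slap his pocket with his hand?",
    "Before Almanzo spoke, did Mr. Thompson realize that he had lost his wallet?",
    'What is Almanzo referring to when he asks "Is this it?"',
    "Who almost lost $1,500?",
    "Was all of the money still in the wallet?",
]

for q in questions:
    reply = model.generate_content(f"{passage}\n\nQuestion: {q}")
    print(q, "->", reply.text)
```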

Continue reading

p(doom)

Listening to Nate Silver & Maria Konnikova’s new podcast, I learned a new term: p(doom). Probability of doom, haha.

Come to find out, there’s a table of values, too.

If nothing else, AI is pretty entertaining.

Or rather, it’s entertaining as long as we’re talking about programmers, podcasters, and other notables forecasting doom. “Truly autonomous” AI weapons, whose existence (or near existence) I learned of just today, are another story.

The phrase itself — truly autonomous — escalates my own sense of p(doom), that’s for sure.

Artificial intelligence: other posts

ChatGPT on dangling modifiers

Here’s what happened when I fed ChatGPT the prompt I fed Gemini earlier:

Like Gemini, Chat misses the dangler and misidentifies a different issue. I somehow find it endearing that it first tells me that books can’t talk, then provides a revision in which the book talks and claims that this revision makes it clear that it’s the book’s content that’s talking, not the book itself.

Of course, content doesn’t perform actions like talking any more than inanimate objects do.

Like Gemini, Chat is better at generating than identifying:

And like Gemini, it can identify its own dangler when you feed it back to it:

Interestingly, it was also able to identify the dangler that Gemini produced for me two days ago.

On the other hand, when I fed Gemini Chat’s dangler, it crashed… which raises the question: what will happen to LLM performance as more and more of their inputs include their own outputs?

What is critical thinking, apart from something AI can’t do?

I spent several years of my life fending off injunctions re critical thinking, a phrase that now seems to have been displaced by thinking classrooms.

(What is a thinking classroom, you ask? I’m going to make a wild guess and say group projects. . . . Yup. Group projects, only standing up. Called it.)

Not that I’m against thinking. Hardly. But critical thinking proponents always assumed we all knew what critical thinking was. Or, more accurately, they assumed we all agreed with their unspoken beliefs about what it was.

A couple of weeks ago I came across a person — a philosopher of education, Robert H. Ennis — who’s spent decades attempting to define the term. Critical thinking, per Ennis, is “reasonable, reflective thinking that is focused on deciding what to believe or do.”

Hmmm. “Deciding what to believe or do” makes sense to me as a definition of the goal of thinking “critically.” But “reasonable, reflective” doesn’t really tell me how to distinguish between critical thinking and not-critical thinking.

Setting that aside, professors Gary Smith and Jeffrey Funk, writing in The Chronicle of Higher Education, tell us that, for Ennis, reasonable, reflective thinking has eleven characteristics:

  1. Being open-minded and mindful of alternatives
  2. Trying to be well-informed
  3. Judging well the credibility of sources
  4. Identifying conclusions, reasons, and assumptions
  5. Judging well the quality of an argument, including the acceptability of its reasons, assumptions, and evidence
  6. Developing and defending a reasonable position
  7. Asking appropriate clarifying questions
  8. Formulating plausible hypotheses; planning experiments well
  9. Defining terms in a way that’s appropriate for the context
  10. Drawing conclusions when warranted, but with caution
  11. Integrating all items in this list when deciding what to believe or do

source: When It Comes to Critical Thinking, AI Flunks the Test by Gary Smith and Jeffrey Funk, Chronicle of Higher Education, 3/12/2024

And see:
What is critical thinking, apart from something AI can’t do?
Overpromising, under-delivering
AI-proofing an exam question
Artificial intelligence: other posts

Can LLMs identify dangling modifiers?

Catherine and I have never seen LLMs use dangling modifiers, but we’ve wondered if they can identify them. That’s because LLMs are designed to generate text, not to evaluate it.

Here’s what happened when I fed Gemini (formerly Bard) a dangling modifier:

Instead of identifying the dangling modifier, Gemini misidentifies the problem as passive voice and subject-verb agreement.

On the other hand, Gemini has no problem generating a dangling modifier:

And when I then fed Gemini its own example of a dangling modifier, it could identify it as such:

Next time, we’ll take a look at how Chat does.