Claudia Neuhauser and Brian Herman
Whether we like it or not, ChatGPT, LLMs, and other Artificial Intelligence (AI) applications are with us for good. They are already ubiquitous and are making their way into our jobs, promising speed and efficiency. At times they are annoying, especially when they pop up as chatbots while we are asking for a service and can’t find a way to talk with a human. These platforms will get better and, we hope, less annoying, and we will need to learn how to integrate them into our daily lives.
ChatGPT and other LLMs have their advantages. They can synthesize vast amounts of information on a particular topic and write an understandable and convincing summary. They assist us with crafting emails, creating lists, summarizing text, and even holding elementary conversations with us. On the downside, ChatGPT and other LLMs sometimes present false information in a persuasive fashion. In short, they hallucinate. Nor are they free of bias or protective of user privacy, and they are limited to information that was available when they were trained. For instance, ChatGPT-4 can currently only access materials up until 2022. This will likely change as LLMs become commercial products and are retrained more frequently to stay current.
Higher education is not immune to the pervasiveness of ChatGPT, LLMs, and other AI applications. Their explosion of capabilities has left educators struggling to figure out how to teach students to use them. Frequently, university administrators leave it to instructors to decide how to deal with them. Some, like the University of Minnesota, provided language that instructors could use in their syllabus to “embrace,” “allow limited usage of,” or “prohibit the usage of ChatGPT.” In essence, we do not know how to deal with a platform that could render much of what we teach, and how we teach it, obsolete.
It is not only the classroom where we encounter LLMs. LLMs have joined professors in their labs, where professors must grapple with how to ensure that their graduate students and postdocs use them responsibly and understand the risks and limitations. Nobody wants someone on their team who makes things up or leaks sensitive information to the outside world. LLMs are guilty of both. To make matters worse, ChatGPT is participating in writing manuscripts and scientific grant applications: a recent survey published in Nature found that “5% use AI to help them write manuscripts” and “more than 15% use the technology to help them write grant proposals.”
Provenance and bias
Despite all these concerns, we already use LLMs and will continue to do so. They are far too convenient to forgo, even though we all know that we cannot rely on what they produce. We don’t know whether their hallucinations will disappear in later models or whether we will have to live for good with chatbots that make things up. Our guess is that since LLMs are trained on what humans have produced, and humans are known to make things up, it may be difficult to wean LLMs off the habit.
LLMs sound authoritative, which makes it difficult to spot when they hallucinate. When asked, they even provide references, further solidifying their claims. These references may come from unreliable sources or be made up entirely, so it becomes important to verify claims that come from LLMs against sources outside the LLM.
And herein lies the challenge. In our current educational pedagogy, students are taught throughout their educational journey where they can find reliable sources. This will not be sufficient when dealing with LLMs, since LLMs will decide which sources to present. Therefore, we will need to teach students how to check whether the LLM’s sources are reliable. One way to approach this is to teach students how to determine the provenance, or source, of the information. We have started to teach how to record the provenance of data in courses on data management. Now, we need to teach how to reverse engineer the provenance of any piece of information, that is, how to trace back where the information came from so that we can assess its reliability.
We not only have to teach our students how to determine the provenance of information, but also how to tell whether the information provided by an LLM is biased. Even if the corpus the LLM was trained on is unbiased (and we are likely still far from that), the way we ask questions will bias the answer, because context matters. We tried this out on ChatGPT: we asked it to pretend to be a U.S. senator and the leader of a developing nation, respectively, and then to list the benefits and risks of AI. There was overlap in the topics ChatGPT listed, but the emphasis differed. For instance, ChatGPT saw benefits in healthcare in both scenarios, but while a U.S. senator might emphasize the possibility of personalized health, a leader of a developing nation might focus AI on predicting disease outbreaks.
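For readers who want to replicate the persona experiment, it can be sketched in a few lines of code. The sketch below only builds the persona-framed prompts; the `build_messages` helper, the persona strings, and the commented-out API call are illustrative assumptions, not part of our original experiment.

```python
# Sketch of the persona experiment: the same question, framed by two
# different system prompts, tends to draw differently emphasized answers.

def build_messages(persona: str, question: str) -> list[dict]:
    """Return a chat-style message list that frames the question with a persona."""
    return [
        {"role": "system", "content": f"Pretend you are {persona}."},
        {"role": "user", "content": question},
    ]

question = "List the benefits and risks of AI."
personas = ["a U.S. senator", "the leader of a developing nation"]

for persona in personas:
    messages = build_messages(persona, question)
    # With a chat API client (e.g., the openai package), the request would
    # look roughly like:
    # response = client.chat.completions.create(model="gpt-4", messages=messages)
    print(messages[0]["content"])
```

Comparing the two responses side by side, topic by topic, makes the shift in emphasis easy to see even when the topics themselves overlap.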
Making decisions without understanding
Another challenge is that LLMs will be infinitely more knowledgeable than us. Their corpus of knowledge is potentially everything ever written. This prompted Dr. Lloyd Minor, Dean of the Stanford University School of Medicine, to voice the following opinion in an interview with Jo Craven McGinty from the Wall Street Journal: “What we’re going to be doing as educators is deciding what fundamentals of knowledge students need to have in their active memory to be excellent practitioners. And that probably is going to be much smaller than it is today because the larger breadth of knowledge is going to be readily available.” This may be foolish. If we end up with experts who have an even smaller active knowledge base, we risk that they will not even comprehend the solutions offered by the LLM. We become mere tools in the hands of AI, carrying out tasks without understanding.
In pedagogical circles, we talk about scaffolding knowledge. It means to teach in a way that provides students with a firm framework on which to build their knowledge, much as a scaffold supports a building and its construction workers during construction. The concept of scaffolding was introduced by Wood, Bruner, and Ross in 1976 in a paper, “The Role of Tutoring in Problem Solving.” It is a pedagogical tool that allows novices “to solve a problem, carry out a task or achieve a goal which would be beyond his unassisted efforts.” The assisted effort is provided by an expert. Wood, Bruner, and Ross argued in their paper that this approach to learning “may result, eventually, in the development of task competence by the learner at a pace that would far outstrip his unassisted efforts.” An important aspect of scaffolding, according to this paper, is that “comprehension of the solution must precede production.”
The current versions of LLMs provide answers without reference to the existing knowledge base of the questioner. Future versions may “know” their users better and adjust their answers accordingly. However, we expect experts to judge potential solutions and make informed decisions. Mindlessly relying on AI to drive decision-making may push us into sub-optimal solution spaces without our recognizing that the quality of decisions is eroding: the models may rely on biased data, fail to adjust their answers when new knowledge supersedes old, or simply get stuck in a suboptimal part of a high-dimensional search space, and our lack of understanding of how LLMs work means we would not notice. Should these AI tools ever develop any capability of learning and sentience, we then become mere tools in the hands of AI.
We would, therefore, argue that instead of a smaller knowledge base, we should aim for a broader knowledge base across different areas of expertise: LLMs will make connections among multiple areas of expertise, and humans will need to be able to judge the veracity and value of such connections unless they want to follow AI mindlessly. While specific detailed knowledge in a narrow field of expertise may become less valuable, a broad conceptual understanding across different areas will be needed to evaluate the responses of LLMs.