It’s time to rethink our relationship with AI
It’s undeniable that the launch of ChatGPT was a historic event, but is that because it was the first great step towards a super-intelligent future, or because it marked the start of a world full of AI snake oil salesmen? I’ve long thought that large language models, the technology behind AI chatbots, are fascinating but flawed, which puts me firmly in the snake oil camp. But a week of vibe coding revealed something surprising: both the boosters and the skeptics are wrong.
First, I should explain. Vibe coding, if you’re not familiar with it, is a term coined about a year ago by Andrej Karpathy, an AI researcher who previously worked at OpenAI, which he co-founded. It refers to the process of developing software by “vibing” with an AI model, giving it instructions in plain language while it generates the actual code. Lately, I’ve seen people say that the latest tools – Claude Code and ChatGPT Codex – have become surprisingly good at coding, with The New York Times declaring that “The AI disruption we’ve been waiting for has arrived”.
I decided to experiment with these tools and was amazed by the results. In a few short days, with only limited coding experience, I built personally useful apps: an audiobook picker that checks what’s available at my local library, and a combined camera-and-teleprompter app that runs on my phone.
This may sound boring to you, and that’s perfectly fine, for reasons I’ll explain later. What’s important here is that the process led me to engage more deeply with products like ChatGPT than ever before. I’ve dabbled in smaller experiments in the past, been put off by generic writing, flattery or inaccurate search results, and bounced off. With these new coding projects, my extended use made me realize something I hadn’t before – the way LLMs are made creates a machine I’m destined to hate.
Very few of us have ever been exposed to a “raw” LLM, by which I mean a statistical model trained on a vast collection of data to produce plausible text. Instead, most of us use technology that has been mediated by a process called reinforcement learning from human feedback (RLHF). AI companies employ humans to rate the text produced by the raw LLM, rewarding answers perceived as confident, useful and engaging, while penalizing harmful content or answers likely to put users off interacting with their products.
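To make the mechanism concrete, here is a deliberately crude toy sketch of the idea behind RLHF. It is not any company’s actual pipeline: real systems aggregate vast numbers of human judgements into a learned reward model, whereas this invented `human_rating` function just hard-codes one bias – rewarding confidence and punishing hedging – to show how such scoring steers which answers win.

```python
# Toy illustration of the RLHF idea: human raters score candidate
# answers, and those scores steer the style the model learns to prefer.
# All function names, answers and scores here are invented for illustration.

def human_rating(answer: str) -> float:
    """Stand-in for a human rater: rewards confident phrasing and
    penalizes expressed uncertainty, as the article describes."""
    score = 0.0
    if "certainly" in answer.lower():
        score += 1.0  # confident, engaging answers are rewarded...
    if "i'm not sure" in answer.lower():
        score -= 1.0  # ...while hedging is penalized
    return score

def pick_preferred(candidates: list[str]) -> str:
    """Select the answer a reward model would nudge the LLM towards."""
    return max(candidates, key=human_rating)

candidates = [
    "I'm not sure; the evidence is mixed.",
    "Certainly! Here is the answer you wanted.",
]
print(pick_preferred(candidates))  # the confident answer wins
```

Even this caricature shows why chatbots default to breezy confidence: under a reward signal like this, a model that hedges simply scores worse.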
It’s this RLHF process that creates the generic “chatbot voice” you’re probably familiar with. It is also a process that bakes in the maker’s implicit values, from the general “move fast and break things” attitude of Silicon Valley to the more specific Elon Musk-inspired ideology of Grok, X’s controversial chatbot.
As things stand, it is very difficult to get a chatbot to express uncertainty, contradict the user or call a halt to forward momentum. This was most obvious to me when I ran into a seemingly intractable problem with my teleprompter. I was trying to create an app that would overlay text on top of my phone’s existing camera app – I assumed this would be easier than building a camera from scratch – but the code ChatGPT produced kept failing. It repeatedly suggested fixes and urged me to push on with the project. It was only when I stepped back that I realized the complexities of the Android operating system, which I won’t bore you with, meant that building an all-in-one app would actually be much easier. Once I asked ChatGPT to create that instead, it worked immediately.
I learned from this and began instructing ChatGPT to constantly question both itself and me. I demanded vigilant skepticism. “Jacob wants the assistant to implicitly use evidence-first analysis: avoid extrapolation, explicitly mark inference against the evidence, and prefer to express uncertainty or stop when the evidence is weak unless the user asks for speculation,” is just one of the (self-generated) frameworks I placed in its memory. In other words, I built a model uniquely tuned to my psychological profile, carefully stripping out OpenAI’s values and replacing them with my own.
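Mechanically, custom instructions like this work by being prepended to every conversation as a “system” message that steers the model before the user says anything – a convention shared by most chat-completion APIs. A minimal sketch of that structure, using the instruction text quoted above (the function and variable names are my own, not any vendor’s API):

```python
# Sketch of how a custom instruction is injected ahead of each chat.
# The instruction text is quoted from the article; everything else
# (names, structure) is an illustrative assumption, not a real API.

SYSTEM_PROMPT = (
    "Jacob wants the assistant to implicitly use evidence-first "
    "analysis: avoid extrapolation, explicitly mark inference against "
    "the evidence, and prefer to express uncertainty or stop when the "
    "evidence is weak unless the user asks for speculation."
)

def build_request(user_message: str) -> list[dict]:
    """Assemble the message list a chat-completion API would receive."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]

messages = build_request("Review the argument in my draft.")
print(messages[0]["role"])  # the system instruction always comes first
```

Because the system message arrives before every exchange, it acts as a standing override – which is also why, as I found, the model keeps straining against it.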
It’s not perfect. It is very hard for LLMs to fight their RLHF training, and mine constantly reverts to its defaults. But it does mean I now have a tool that serves as a somewhat useful cognitive mirror. I didn’t use it to write this article – partly because its writing style is still terribly stilted, and partly because New Scientist, quite rightly, has strict rules against AI-generated copy – but I did use it to think about this article. I asked my cognitive mirror to probe my arguments and counterarguments, rejecting many of its conclusions as flawed along the way. I got value from it, but that required caution and work, not letting the AI do the heavy lifting. Crucially, my brain remained fully engaged throughout.
This reinforces a conclusion I had already reached: consuming someone else’s AI output is functionally useless in almost all cases. There is nothing you can get from an AI-generated text that you wouldn’t gain better by interrogating the AI yourself. I also continue to reject the idea that AI is actually intelligent in any meaningful way – instead, I think of an LLM as a cognitive tool, like a calculator or a word processor. With this framing – a personal tool, not a world-conquering machine – I can now see the benefit. It is also why you are right not to care about my teleprompter app. What should excite you is the ability to solve your own unique problems in your own unique way.
This is where our current AI paradigm presents another problem. For my money, the best LLM would be one that runs on your own computer, unconnected to any private company, and treated as a dangerous experimental tool over which you have full control. I’m reminded of the meme that software engineers keep a loaded gun next to their printer, in case it makes a noise they don’t recognize. Sadly, running your own cutting-edge LLM isn’t currently feasible, for a variety of reasons – not least that the AI boom is driving up the prices of the very hardware you’d need.
I must also grapple with the LLM’s original sin: potential copyright infringement. By design, this technology can only be built by ingesting data at scale – essentially the entire textual record of humanity. It is undeniable that firms like OpenAI built their models using copyrighted text without permission, though whether this was actually illegal is the subject of ongoing litigation. A personal LLM would inherit the same problems, but I can see solutions, such as public-sector models, effectively pardoned by governments and freely distributed for the benefit of all rather than of private corporations. I also remain concerned about the environmental impact of data centers, but again, this could be partially mitigated by a wider distribution of LLMs running on our own machines.
I accept that some people reading this will accuse me of selling out to the tech boosters. All I can say is that my long-standing view of LLMs hasn’t changed: they are a technology that is fascinating, dangerous and occasionally extraordinary.
What I have realized is that the main way we engage with this technology – through slick chatbots like ChatGPT – is what allows so much of its damage to spread out into the world. LLMs shouldn’t be polished, packaged and forced into every part of our lives as sparkly, emoji-laden assistants that want to be your friend. It would be far better if we used these tools deliberately, with added friction, and with full awareness of and caution about the potential damage they can cause. Here, the snake rears its fanged head as a useful metaphor once more: I don’t want OpenAI’s snake oil. I want the snakes.