We are fast approaching two years of AI-driven change, fueled by the November 2022 release of ChatGPT by OpenAI. So far it’s been a mixed bag.
OpenAI recently announced that it has hit 200 million weekly active users – nothing to be sniffed at, but it reached its first 100 million within two months of release. And a recent YouGov survey found that adding AI to a product is as likely to put off a potential customer as it is to persuade them to part with their money.
Still, money keeps flowing into the sector, and progress keeps coming. OpenAI is courting investors for a funding round that would value the company at $150 billion – on a par with Cisco, Shell and McDonald’s. And last week it unveiled its latest model, called o1, which it says represents a step change in the development of artificial intelligence.
The o1 model, previously codenamed Strawberry, is designed to reason through problems in something like the way humans do. In raw terms it is actually a step back from the latest version of the model that powers ChatGPT: it is slower to produce answers, and small for its time. Think of it as GPT-4.5, rather than the next major iteration, GPT-5, which is reportedly in development.
Task: Impossible?
Although on paper o1 is a damp squib, it does address something Alex had previously highlighted in this newsletter as an issue with LLM-based chatbots, which he called the “Tom Cruise problem”. The point was that a chatbot could answer a question asked one way – who is Tom Cruise’s mother? (Answer: Mary Lee Pfeiffer) – but would balk when asked the direct inverse of that question: who is Mary Lee Pfeiffer’s son? (Answer: Tom Cruise.)
Ask o1 those questions and it answers both. It even provides a trace of how it arrives at the answer – which OpenAI, inaccurately, given that AI models don’t have brains, calls the model’s “thoughts”. (If you want to know why anthropomorphising AI models is an issue, check out this story I wrote in February.) When asked the second question, o1 “thought” for four seconds, with steps that included tracking family relationships and pulling in supporting information.
So far, so good. OpenAI says o1 can reason. Many would dispute such a declaration, but let the company have it for the sake of argument. It could mean a big change in how generative AI is used: instead of regurgitating information from its training data, or providing answers it thinks will please users, the model can reason about information and respond accordingly.
“Can”, however, is the key word. We’re still largely in the dark about how these systems work – and “we” includes their developers. OpenAI has said that this reasoning ability is a big deal; the company has even made the dubious claim that o1 is its most dangerous model yet (a claim that sometimes looks more like marketing than anything else). Those who have tried to probe the limits of o1 seem to agree with the point about reasoning, but not the one about danger.
Pay no attention to the man behind the curtain!
Well, kind of. Because the testing can only go so far. OpenAI hides the full chain of thought behind o1’s answers – if you’re looking for a good primer, Simon Willison is always reliable – so users who want to look under the hood cannot see how the model actually arrives at its conclusions. All that is currently displayed is a brief summary of each step in the chain of thought.
And because of that, some users have been asking the model itself how it arrives at its answers – even though they have received emails from OpenAI telling them to stop, or have their accounts suspended.
All of which means we are left somewhat in the dark. This looks like a step change in the world of AI – something that could shift the technology from a product to be eyed with suspicion to one genuinely worth using. But we can only take OpenAI’s word for it.
What’s more, OpenAI’s dominance of the conversation effectively drowned out coverage of its competitors. Mistral, the well-known French rival, released its first multimodal model last week: Pixtral 12B adds image recognition to text generation. It should have received plenty of attention. But OpenAI and o1 sucked up all the oxygen.
All of which is to say: the AI train keeps moving forward, and is starting to deliver on its promise. Whether those who tried ChatGPT in its early days and found it lacking can be persuaded to come back and try the new whizz-bang models is another question.
The wider TechScape