Calvin Wankhede / Android Authority
With competition from Google’s Gemini and Anthropic’s Claude AI models heating up, OpenAI has found itself in the midst of an identity crisis. Once the undisputed leader in large language models (LLMs), it’s now scrambling to maintain its position at the top. New models like GPT-4o and 4o mini have stemmed the exodus to competing AI chatbots, but OpenAI is under constant pressure to keep innovating. The company has done just that with o1-preview, a new AI model series that excels at complex reasoning and emulating human thought. How good is it? I put it to the test to find out.
What’s the new o1-preview ChatGPT model all about?
OpenAI’s o1-preview and o1-mini are the latest models available within ChatGPT, designed for complex reasoning tasks and problem-solving. As their names suggest, these models aren’t generational successors to GPT-4 or any of OpenAI’s previous language models. In fact, GPT-4o will not only continue to exist but also remain the default model for all chats.
Unlike prior models that responded to your prompts as quickly as possible, the o1 series has been designed to spend more time thinking through problems, similar to a human’s thought process. This tends to improve accuracy on prompts related to math and coding, but it is also useful for real-world questions and scenarios, as I’ll showcase in my testing below.
We first heard about the o1 model series in July, when Reuters interviewed researchers familiar with a secretive internal project codenamed Strawberry. The goal of the project was to develop an AI capable of performing “deep research,” in line with the company’s mission to achieve artificial general intelligence (AGI). The latter refers to an AI system that’s intelligent enough to outthink humans across multiple subjects. The Strawberry project was rumored to arrive ahead of GPT-5, which is still being developed.
o1 is OpenAI’s latest model family that can break down problems and reason like a human.
The new o1 series is still a long way off from achieving true AGI. OpenAI CEO Sam Altman admitted that “o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it.” Still, it’s a big leap forward from the earliest ChatGPT release, which many believed would never succeed at solving math problems or logic exercises.
While o1-preview is the new flagship model, it’s also accompanied by a much leaner and faster o1-mini. OpenAI found that the series excels at coding, so it also released a second model that can accurately generate and debug code. Aimed largely at developers, o1-mini is 80% cheaper than o1-preview.
o1-preview vs GPT-4o tested: Is it really better?
If you’re skeptical that o1-preview is leagues ahead of prior models, there’s good news: the chatbot does pause to think, sometimes for upwards of a minute, before responding. It breaks down complex problems into chunks, which helps it correct errors.
However, there’s also bad news: the o1 series isn’t universally better across the board. In particular, it can’t search the web for new information like the older GPT-4o model, nor can it perform advanced data analysis. You also can’t upload files and images, meaning you’ll have to front-load each prompt with as much information and context as possible. OpenAI even admits that many ChatGPT users will want to stick with GPT-4o for the time being.
Setting aside these caveats, though, how does it perform? To find out, I posed a handful of complex questions to both of OpenAI’s best models. Here’s how o1-preview fared vs GPT-4o.
Prompt 1: How many legs do I have?
Starting with a simple one, I asked ChatGPT how many legs I’d have if I had 4 cows, 3 dogs, and 2 cats. The answer is obviously two, which GPT-4o put forth, but only after saying I’d have 36 animal legs. In contrast, I watched the o1-preview model “think” for five seconds before correctly (and confidently) saying I’d have two legs. It also recognized that the question was a riddle.
I also posed the same question to OpenAI’s smaller GPT-4o mini model, and it failed miserably. It simply said I’d have 38 legs, adding mine to the animals’ count.
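The riddle hinges on keeping two counts apart: the animals’ legs and your own. A tiny Python sketch, with made-up helper names (`animal_legs`, `my_legs`), shows where the 36 and 38 figures come from and why the answer is still two:

```python
# Legs per animal type; cows, dogs and cats are all four-legged.
ANIMAL_LEGS = {"cow": 4, "dog": 4, "cat": 4}


def animal_legs(owned: dict[str, int]) -> int:
    """Total legs belonging to the animals you own (hypothetical helper)."""
    return sum(ANIMAL_LEGS[kind] * count for kind, count in owned.items())


def my_legs() -> int:
    """Owning animals doesn't change how many legs *you* have."""
    return 2


pets = {"cow": 4, "dog": 3, "cat": 2}
print(animal_legs(pets))              # GPT-4o's detour: animal legs only
print(animal_legs(pets) + my_legs())  # GPT-4o mini's answer: everyone's legs
print(my_legs())                      # the answer the riddle asks for
```

Answering the riddle correctly means returning only the second function’s value, which is what o1-preview did.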
Prompt 2: Investment return calculation, while accounting for currency depreciation
Since simple prompts only require a few seconds of thinking, I decided to take things up a notch. In this prompt, I asked ChatGPT to find the better investment between two assets with differing returns and risks. This time, o1-preview took 11 seconds to think before it responded. Once again, it delivered the correct answer while explaining each step.
Interestingly, GPT-4o also arrived at the same conclusion, but it didn’t compute the figures by itself. Instead, it generated the Python code needed to perform the calculations and executed it via ChatGPT’s advanced data analysis feature. So while the output is the same, the process is more complex. Coding as a workaround also has the potential to fail quite spectacularly, as I would soon find out.
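The article doesn’t list the figures used in this prompt, so the following is only a sketch with hypothetical numbers and hypothetical helper names. It shows the core adjustment such a comparison needs: a foreign asset’s return has to be converted into your home currency by compounding it with the currency’s depreciation, i.e. (1 + r) × (1 − d) − 1. Risk isn’t modeled here.

```python
def home_currency_return(local_return: float, depreciation: float) -> float:
    """Annual return in your home currency for an asset whose currency
    loses `depreciation` (a fraction) of its value against yours each year."""
    return (1 + local_return) * (1 - depreciation) - 1


def future_value(principal: float, annual_return: float, years: int) -> float:
    """Value of `principal` after compounding `annual_return` for `years`."""
    return principal * (1 + annual_return) ** years


# Hypothetical inputs, not the figures from the article's prompt.
asset_a = home_currency_return(0.08, 0.05)  # 8% local return, currency falls 5%/yr
asset_b = home_currency_return(0.05, 0.00)  # 5% return, no currency exposure

for name, rate in (("A", asset_a), ("B", asset_b)):
    value = future_value(10_000, rate, 10)
    print(f"Asset {name}: {rate:.2%} per year, ${value:,.2f} after 10 years")
```

With these made-up numbers, the nominally higher 8% return ends up behind once depreciation is applied, which illustrates why the currency adjustment matters in this kind of comparison.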
Prompt 3: Which is better, buying a house or renting?
If you hang around financially savvy folks, you’ll know that renting vs buying a house is a super divisive topic that involves plenty of variables, both financial and otherwise. Thankfully, we can ask ChatGPT to do the math for us: the o1-preview model put 37 seconds’ worth of thought into this question and broke it down into 12 different steps.
I provided several figures, including my down payment amount, interest rate, expected return on investment if I rented instead, and more. This made the question even more complicated: ChatGPT had to first compute the cost of an $800,000 home with a $200,000 down payment. The remaining amount would be financed with a 20-year mortgage at 3.5% interest. If I rented instead, I’d be able to invest the entire $200,000 in an index fund and also invest any extra income left over after paying the rent.
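The first step is a standard mortgage calculation. As a rough sketch using the figures above ($600,000 financed over 20 years at 3.5%), and assuming monthly payments on a fixed-rate, fully amortizing loan (the article doesn’t specify the loan terms beyond that), the standard annuity formula gives the monthly payment and total interest. The function name is hypothetical.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed monthly payment on a fully amortizing loan (standard annuity formula)."""
    r = annual_rate / 12  # monthly interest rate
    n = years * 12        # number of monthly payments
    if r == 0:
        return principal / n
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)


home_price = 800_000
down_payment = 200_000
loan = home_price - down_payment  # amount financed by the mortgage

payment = monthly_payment(loan, 0.035, 20)
total_interest = payment * 20 * 12 - loan
print(f"Monthly payment: ${payment:,.2f}")
print(f"Total interest over 20 years: ${total_interest:,.2f}")
```

Under these assumptions, the payment comes to roughly $3,480 a month. The full rent-vs-buy comparison then weighs these payments against the rent and the returns on the invested $200,000, which depend on figures the article doesn’t list.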
The o1-preview model responded with a 1,000-word breakdown of the problem, concluding that my net worth would be higher by approximately $716,620 after 20 years if I rented instead of buying a home.
OpenAI’s prior GPT-4o model can’t keep up with o1-preview in advanced reasoning tasks.
Feeding the same prompt to GPT-4o yielded a much more disappointing result. The model tried to generate and run Python code to solve this problem, but failed twice before succeeding on the third try. Even then, it responded incorrectly and suggested I’d save money by buying a home instead. It only admitted fault when I pointed out a discrepancy in its calculations.
Since many more variables can be involved, I also asked o1-preview to consider factors like property appreciation, maintenance costs, and taxes if I bought a home, as well as a potential 3% annual increase in rent. This time, it took 142 seconds to think before responding with a plausible conclusion, which I think is very impressive.
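The follow-up prompt adds growth over time. As a minimal sketch, with a hypothetical starting rent and appreciation rate (the article doesn’t give either) and hypothetical helper names, here is how a 3% yearly rent increase and steady home appreciation compound over the 20-year horizon. Maintenance and taxes are left out for brevity.

```python
def total_rent(start_monthly_rent: float, annual_increase: float, years: int) -> float:
    """Total rent paid when the rent rises by `annual_increase` once per year."""
    total = 0.0
    monthly = start_monthly_rent
    for _ in range(years):
        total += monthly * 12            # twelve payments at this year's rate
        monthly *= 1 + annual_increase   # rent rises before the next year starts
    return total


def appreciated_value(price: float, annual_appreciation: float, years: int) -> float:
    """Home value after compounding `annual_appreciation` for `years`."""
    return price * (1 + annual_appreciation) ** years


# Hypothetical inputs: $3,000/month starting rent, 3% rent increases, 3% appreciation.
rent_paid = total_rent(3_000, 0.03, 20)
home_value = appreciated_value(800_000, 0.03, 20)
print(f"Rent paid over 20 years: ${rent_paid:,.0f}")
print(f"Home value after 20 years: ${home_value:,.0f}")
```

The loop is equivalent to the closed-form geometric series 12 × R × ((1 + g)^n − 1) / g, which is a quick way to sanity-check a chatbot’s arithmetic.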
How to use ChatGPT’s o1-preview and o1-mini models
As you may have guessed, the o1 model series requires copious amounts of computational power. And given that ChatGPT itself has been rumored to be unprofitable since its launch in 2022, it’s not surprising that OpenAI has locked o1-preview behind a paywall. In other words, you’ll need a ChatGPT Plus subscription to select the latest model from the model dropdown menu.
In fact, the model is so expensive that OpenAI has also placed a hard cap of 50 messages per week on top of the $20 per month paywall. Once you exhaust this quota, your only options are to wait or to pay for a second ChatGPT Plus account. OpenAI has imposed such rate limits in the past, especially around the time GPT-4 was first released, but this instance is the most aggressive one yet.
Thankfully, the vast majority of ChatGPT prompts don’t benefit from o1’s thinking capabilities. And if you’re a programmer, the o1-mini model within ChatGPT will also be rolling out to the free plan in a limited capacity.
So is o1-preview free? No, you need to pay for a ChatGPT Plus subscription to use it. The o1-mini model, however, is coming to the free tier in a limited capacity.
All in all, ChatGPT’s new o1-preview model is very impressive and worth a look if you have math and programming questions. It isn’t the best choice for most tasks, but it’s the closest we have come to emulating human reasoning and thought. Still, the vast majority of users won’t benefit from o1-preview’s improved logical reasoning or math capabilities, so I can’t recommend switching to it full time. The weekly message limit and lack of web browsing support also mean I’ll continue using GPT-4o going forward. And if you only use ChatGPT a few times a day, you can easily get by with a free account.
Perplexity’s Pro Search feature also implemented multi-step reasoning a few months ago, and it too delivered impressive results in my testing. If you want a peek at chain-of-thought AI reasoning without paying for it, I’d recommend trying it out, since you get five Perplexity Pro searches every few hours on the free tier. I haven’t tested it head-to-head against OpenAI’s o1-preview yet, but it’s clear that competition in the AI space has forced ChatGPT to evolve, and I can’t wait to see where it’s headed next.