This week OpenAI announced their newest version of ChatGPT: ChatGPT-4o. Our Founder, Dr Sarah Mitchell, shares her take on the model including an overview of its enhanced vision and voice capabilities, limitations, and how it compares to previous versions of ChatGPT.
ChatGPT-4o has arrived. It's the newest evolutionary leap in OpenAI's artificial intelligence technology. This model offers advanced intelligence, with enhanced conversational, vision, and text capabilities. Essentially, GPT-4o can talk and see, enabling a breadth of new possible use cases and applications. Let's dive in.
1. The Meaning Behind the Name
If you're curious about the name, the "o" in GPT-4o stands for "omni". That's because GPT-4o is a truly multimodal AI: it can accept any combination of text, audio, image, and video as input, and respond with text, audio, and images.
This is a big deal in the AI world.
To achieve this, OpenAI trained a single new model end-to-end across text, vision, and audio. Compare that to previous versions of ChatGPT that use different models for text, image generation, and other tasks. A single model greatly enhances the tool's speed and capability, meaning you can do more things, more quickly with this latest version of ChatGPT.
2. We're Living in a Movie
If you've seen the 2013 movie Her, you'll find Monday's OpenAI preview demos reminiscent of the AI assistant Samantha, voiced by Scarlett Johansson. It is fascinating, if a bit unnerving at times. The conversational tone of GPT-4o sounds much like a real person. To give you an idea, check out this video of Rocky asking for help with interview prep.
The model can respond with pauses, changes in inflection, laughter, whispering, singing, and even sarcasm. GPT-4o can adapt the response tone and style at your request and will pause when you interrupt it. It can understand your own tone of voice and infer emotions.
With GPT-4o, voice conversations can take place in near real time. The model responds to audio inputs in roughly 320 milliseconds on average, which is similar to human response time in conversation and significantly better than the 2.8 to 5.4 second lag of the previous Voice Mode. This means voice conversations with GPT-4o will flow more naturally, mimicking human conversation.
It's important to note that this technology is in a constant state of development. It is not perfect. GPT-4o can get things wrong, misinterpret instructions, and take several prompts to understand what you want. But used carefully, GPT-4o is a very powerful tool.
3. More Languages
OpenAI has dedicated effort to making their models accessible worldwide, and that includes advanced language capability. GPT-4o supports more than 50 languages and can produce accurate translations in near real time. Think of the possibilities, not only for language translation but also for language education. Interested in a personalised French tutor? ChatGPT-4o can help with that.
4. Images with Words
The model also has improved image capability. Not only can it better understand and correctly interpret images you share with it, but GPT-4o can also accurately create images that contain text. Take a look at the screenshot below from OpenAI's press release. See how the text within the prompt appears in the generated image?
This wasn't possible with previous model generations. The words would show up jumbled or nonsensical. GPT-4o's ability to create images that include accurate text could be extremely helpful for graphic designers, content creators, and marketers alike.
5. GPT-4o Can See
Another interesting feature is GPT-4o's vision on desktop. Essentially you can share your screen with GPT-4o so it can "see" what you see, enabling a conversation with the AI about what's on your screen. Now, there are important questions around privacy and security that would need to be addressed, but the vision capability could easily transform the way we work and learn.
Take a look at this video of Sal Khan, the founder of Khan Academy, and his son Imran, who tested out the vision and voice capability in OpenAI's GPT-4o demo. The AI took on the role of a maths tutor for Imran. Amongst a plethora of possible use cases, this technology will pave the way for a future of personalised tutors and education assistants, whether that be for school, university, or even the workplace.
My favourite GPT-4o vision demo was where the AI acted as sight for people who are vision impaired. Check out the video here. AI that can 'see' could provide accessible and useful assistance to those who need support to view the world around them.
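The screen-sharing and tutoring demos run through the ChatGPT apps, but the underlying image understanding is also exposed through OpenAI's API. If you'd like to experiment, here's a minimal sketch using the official openai Python package; the screenshot URL and prompt are placeholders, and it assumes your API key is set in the OPENAI_API_KEY environment variable.

```python
from openai import OpenAI

# Assumes the OPENAI_API_KEY environment variable is set.
client = OpenAI()

# Hypothetical screenshot URL - replace with an image you want GPT-4o to look at.
image_url = "https://example.com/screenshot.png"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this screenshot, and what should I do next?"},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

This is the same image-understanding capability behind the demos, minus the live screen capture, which the demos handle through the ChatGPT desktop app.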
6. Best of All - It's Free
One of the biggest wins for consumers is that OpenAI is making GPT-4o free to use. OpenAI are staying true to their mission of making advanced AI tools available to everyone. There will be message caps for the free version, which means ChatGPT will revert to GPT-3.5 once you've reached the message limit. Nonetheless, this is a massive leap in capability for those who have been using GPT-3.5 up to this point.
When using the free version of GPT-4o, you'll have:
GPT-4 level intelligence. This is a significant step up from the previous free model, GPT-3.5, in terms of speed and quality of responses.
Access to the web. Model responses can reference timely information such as current news or events, or you can even ask it to review a specific website in your prompt.
File upload. You can ask ChatGPT to review, summarise, or help you edit content in files that you upload. Note: make sure you don't include any sensitive information (e.g. names, financial, or health data) in the file as it may be used to train the model.
The use of custom GPTs. A custom GPT is essentially a version of ChatGPT that has been tailored by users for specific applications, such as creating recipes, learning gym exercises, or even turning photos of your dog into a cartoon.
The use of memory. You can ask ChatGPT to 'remember' facts, concepts, or instructions that you want to persist across conversations, such as your preferred response style or tone. You can even ask it to exclude specific words in responses, such as "unlock the power..." or "unleash the potential...", two catchphrases commonly generated by earlier ChatGPT models.
7. Benefits for Paid Users
ChatGPT Plus users will have message limits that are 5x greater than free users, while Team and Enterprise users will have even higher limits. If you're using GPT-4o via the API, you'll find it is twice as fast, half the price, and has 5x higher rate limits compared to GPT-4 Turbo. Not all of GPT-4o's capabilities are available just yet: voice and vision will be rolled out iteratively, with a new version of Voice Mode coming to ChatGPT Plus users in the coming weeks.
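For developers, switching to GPT-4o is largely a one-line change in an existing chat completion call: in most cases, updating the model name is all that's required, and the pricing and rate-limit improvements apply automatically. Here's a minimal sketch using the official openai Python package; the prompts are illustrative, and it assumes your API key is set in the OPENAI_API_KEY environment variable.

```python
from openai import OpenAI

# Assumes the OPENAI_API_KEY environment variable is set.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # previously you might have passed "gpt-4-turbo" here
    messages=[
        {"role": "system", "content": "You are a concise, factual assistant."},
        {"role": "user", "content": "Summarise the main new capabilities of GPT-4o in three bullet points."},
    ],
)

print(response.choices[0].message.content)
```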
8. Beware the Risks
It's important to note that this technology is still fallible. I asked GPT-4o to summarise OpenAI's press releases and it hallucinated twice, completely making up facts and quotes, despite my specifically prompting it to include only factual information from the sources provided. GPT-4o acknowledged that it had indeed constructed these 'facts', but only when I questioned it directly.
This means that even the latest and greatest AI technology will still pose risks, and GPT-4o will not eliminate the need for human oversight. I can't emphasise enough how important it is to layer your knowledge and reasoning on top of AI-generated outputs.
Wrapping Up
In summary, GPT-4o has some very interesting new features, particularly the soon-to-be-released vision and audio capability. GPT-4o is not perfect, and it is important to be aware of the inherent risks of using these tools, such as hallucinations. With that in mind, GPT-4o opens the door to more natural, human-like conversations with AI. Watch this space.
Check out more info from OpenAI below:
Are you excited about the potential of GPT-4o but unsure how to best integrate it into your business operations or workflows? Here at Anadyne IQ, we're a no-nonsense AI consultancy that specialises in helping businesses leverage the latest AI advancements, from workshops that build AI literacy to hands-on work with advanced AI tools.
Contact us today to see how we can help you harness the power of GPT-4o to drive innovation and efficiency.