OpenAI GPT-4 Omni model can interpret audio, video, and text in real time

The latest iteration of ChatGPT promises to be the most advanced one yet.

OpenAI

OpenAI has issued an update for its ChatGPT bot. The GPT-4o update promises greater ease of use for all users, as well as increased speed across the board.

"GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs," reads the OpenAI website. "It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models."

OpenAI technology chief Mira Murati spoke during a livestream on Monday about the latest ChatGPT additions. She demonstrated some of its capabilities, including some new translation features. With the latest update, ChatGPT can now operate across 50 different languages.

As noted by CNBC, Murati made sure to thank NVIDIA CEO Jensen Huang for helping power OpenAI's technology. NVIDIA has a significant amount of money invested in the AI sector, which has helped power that company to better-than-expected earnings.

Senior Editor

Ozzie has been playing video games since picking up his first NES controller at age 5. He has been into games ever since, only briefly stepping away during his college years. But he was pulled back in after spending years in QA circles for both THQ and Activision, mostly spending time helping to push forward the Guitar Hero series at its peak. Ozzie has become a big fan of platformers, puzzle games, shooters, and RPGs, just to name a few genres, but he’s also a huge sucker for anything with a good, compelling narrative behind it. Because what are video games if you can't enjoy a good story with a fresh Cherry Coke?

From The Chatty
  • reply
    May 13, 2024 11:35 AM

    Ozzie Mejia posted a new article, OpenAI GPT-4 Omni model can interpret audio, video, and text in real time

    • reply
      May 13, 2024 10:33 AM

      https://openai.com/index/spring-update/

      No OpenAI thread?

      "Any sufficiently advanced technology is indistinguishable from magic"

      Yep, this ChatGPT demo.


      https://x.com/BenBajarin/status/1790070846473523390

      OpenAI announces Her.

      • reply
        May 13, 2024 10:38 AM

        What am I missing here…?

      • reply
        May 13, 2024 10:47 AM

        is there a demo or anything? OP is just same text as this post.

        • reply
          May 13, 2024 10:58 AM

          About 9m30s in is where it starts to get... Wild.

        • reply
          May 13, 2024 11:12 AM

          this is both amazing and also annoying as fuck when it pretends to be human by sighing when responding to rapidly changing instructions, or claiming "i got too excited" when it did something wrong and is then corrected

          the future of people falling in love with this shit is closer than ever

          • reply
            May 13, 2024 11:16 AM

            [deleted]

            • reply
              May 13, 2024 11:17 AM

              oh no, global thermonuclear war is the worst thing that will emerge from this, the condescending dialog model that talks to you like you're in kindergarten is just background noise

              • reply
                May 14, 2024 5:48 AM

                Pfffftt. You WISH a nice clean fission blast and relatively quick radiation death was the worst thing.

            • reply
              May 13, 2024 12:33 PM

              Honestly, I'm very hopeful for that future. I can see a few situations where this is useful:

              - People who didn't have positive socialization skills growing up or never learned how to connect safely.
              - Poly/Mono couples.
              - Recovery from an abusive relationship.
              - People who just don't want to have a relationship with another human.

              Like all things, outside assistance will be needed, but this bridges the gap between "I need someone to confide in" and "I can't face a therapist / cohort directly."

          • reply
            May 13, 2024 11:17 AM

            Yeah I already disliked seeing how ChatGPT would add all kinds of bullshit fluff around its apologies and mistakes but not actually, say, change approaches

            (I’d get it in a loop where it would give two wrong answers and apologize and give the other one, and repeat infinitely)

          • reply
            May 13, 2024 11:51 AM

            you can obviously just ask it to have a completely flat affect and speak in a robot voice if you want an autistic computer assistant but they wanted to show off how much more massively human it's capable of sounding now which is useful for all kinds of stuff

            • reply
              May 13, 2024 12:18 PM

              Uh. That's... Not a great usage of the word autistic there, bud.

            • reply
              May 13, 2024 12:24 PM

              Yeah I get that it’s great for demos but it’s actively awful for real-world use. It doesn’t need to be monotone but it also doesn’t need to use 20 words when 3 will do. Time is money, and that shit is annoying.

        • reply
          May 13, 2024 11:22 AM

          That was a FAKE DEMO. The dialog was overlapping and she got ahead of the announcer!!!!!!!

        • reply
          May 13, 2024 11:45 AM

          I like that they always have to interrupt it because it just babbles on and on and on

      • reply
        May 13, 2024 11:18 AM

        Yep, we're not far from Her. This is so wild and awesome.

        • reply
          May 13, 2024 11:37 AM

          I'm unclear whether this is going to hasten the erosion of communication between people online, or help the loneliness problem with an always on friend that can do instant research & feedback for you.

          This thing is a few generations away from ushering us into the post-people era where y'all are obsolete and my assistant loves me more than my parents did.

          • reply
            May 13, 2024 11:48 AM

            Oh I definitely agree, I just try to take a neutral approach to it

            • reply
              May 13, 2024 11:59 AM

              I'm concerned because we saw the floodgates open with social media with barely any consideration towards the mental well-being of its users.

              It sounds like the ol' move fast & break things is back on the menu which bothers me.

              Her is totally a great story about giving up on people and leaning in to a digital companion.

              • reply
                May 13, 2024 12:11 PM

                Oh no, I have serious concerns, I've just decided there's not much I can personally do about it so fuck it

          • reply
            May 13, 2024 11:51 AM

            I think cellphones have done or are doing that more than ChatGPT. Everyone from my kids to my parents just pulls out their phones at family gatherings to browse shit. There are no social norms anymore; it’s fine to just completely zone out of wherever you are as a person. When it becomes just glasses or contact lenses, people will drop out even more

        • reply
          May 13, 2024 11:38 AM

          I'm looking forward to the point when the tech is mature enough to be a full-on digital assistant. The Voice conversations with ChatGPT are already remarkably helpful for me when it comes to brainstorming and organization, but if I could tie it into my calendar, all my documents, my home automation stuff, etc., that'd be awesome.

          • reply
            May 13, 2024 12:15 PM

            I think the concern is people wanting to have a parasocial relationship with it which will be the exact opposite of helpful. That is undoubtedly where this thing is heading.

        • reply
          May 13, 2024 12:12 PM

          [deleted]

      • reply
        May 13, 2024 11:33 AM

        [deleted]

      • reply
        May 13, 2024 11:38 AM

        ...her?

        • reply
          May 13, 2024 11:47 AM

          Probably a reference to the movie with the same name - about a female AI that the main character fell in love with. Great movie.

        • reply
          May 13, 2024 11:51 AM

          I understood that reference. I would say that often when people would mention the movie, and it would drive them crazy.

          • reply
            May 13, 2024 11:52 AM

            It is a reference as Ann as the nose on Plain's face

          • reply
            May 13, 2024 12:39 PM

            Yeah, but correct me if I'm wrong, doesn't Joaquin choose physical connection over digital because the AI runs off with the other AI or something?

            I mean I thought it was an allegory about how real life connection is superior because you can understand nuance and emotion better from a real person rather than a disembodied voice.

        • reply
          May 13, 2024 11:57 AM

          check out whos on the hog in the rearview mirror

        • reply
          May 13, 2024 12:10 PM

          is she funny or something?

        • reply
          May 13, 2024 5:06 PM

          They can call it Cortana instead.

      • reply
        May 13, 2024 12:07 PM

        [deleted]

      • reply
        May 13, 2024 12:11 PM

        [deleted]

        • reply
          May 13, 2024 12:32 PM

          That’s the impression I was getting from watching this

        • reply
          May 13, 2024 12:35 PM

          Feel like 1/3 of my prompts to ChatGPT are "please be more concise/less verbose"

        • reply
          May 13, 2024 12:36 PM

          You can coax it into the type of responses you need via prompts and memory. Tell it you're rarely interested in verbose responses, prefer to-the-point answers, and will ask when you need more assistance.

        • reply
          May 13, 2024 12:54 PM

          they haven't actually released the new voice mode, they're afraid to.

          "In the future, improvements will allow for more natural, real-time voice conversation and the ability to converse with ChatGPT via real-time video. For example, you could show ChatGPT a live sports game and ask it to explain the rules to you. We plan to launch a new Voice Mode with these new capabilities in an alpha in the coming weeks, with early access for Plus users as we roll out more broadly"

          "Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities."

          • reply
            May 13, 2024 12:55 PM

            [deleted]

          • reply
            May 13, 2024 1:26 PM

            Thanks for the explanation because I was about to blow $20 to see if that demo was real or not. I'm cautiously optimistic about the future of this.

      • reply
        May 13, 2024 12:18 PM

        [deleted]

      • reply
        May 13, 2024 1:20 PM

        maybe google will show something similar tomorrow, might be why this presentation was thrown together and released today but they haven't actually released the functionality... maybe they don't have a good way to stop legions of halfwits from falling in love with it, or using it to make zany tiktoks that will cause reputational harm (*starts camera* "hey chatgpt sing me a song about deez nuts")

        • reply
          May 13, 2024 2:04 PM

          God I hope so.

        • reply
          May 13, 2024 2:07 PM

          Trying to remember the last time Google was ahead of anyone on anything and not just being a complete clusterfuck

          • reply
            May 13, 2024 2:24 PM

            I can't remember

          • reply
            May 13, 2024 5:09 PM

            Gmail

          • reply
            May 13, 2024 5:14 PM

            2004. Search.

          • reply
            May 13, 2024 5:25 PM

            [deleted]

          • reply
            May 13, 2024 6:48 PM

            Google Photos. Still the best out there, even after they fucked up the desktop client by removing 2-way sync.

          • reply
            May 13, 2024 6:51 PM

            You're thinking of their public products and I get where you're at with that.

            If you look at the technology built at Google, they have been instrumental in moving internet technology forward. For example, they are the ones who published the paper that led to all this gen AI craziness

          • reply
            May 14, 2024 5:12 AM

            Gmail notifier was next level

        • reply
          May 13, 2024 3:11 PM

          [deleted]

        • reply
          May 13, 2024 3:14 PM

          releasing in a few weeks after an unveil isn’t some weird schedule and is in line with the usual rollout speed when they announce new stuff (excluding Sora)

          • reply
            May 13, 2024 3:17 PM

            they might also be waiting til after wwdc if there actually is a major partnership coming down the pipe

            but regardless it’d be totally sane to release it slowly just to get a better handle on what bad actors will do with it, especially api access

            • reply
              May 13, 2024 3:20 PM

              It’s pretty rare for WWDC to have anything for broad release. Any potential iOS integration surely does not become available until the fall with the new iPhone launch.

              • reply
                May 13, 2024 4:03 PM

                i just meant something like apple might want to demo ‘smart siri’ or some other upcoming integration “powered by openai” without a million youtube chuds having posted 3 weeks of videos begging chatgpt to marry them

                • reply
                  May 13, 2024 4:48 PM

                  Adds more color to stories of frustration on the Apple Vision Pro team that Siri is so useless as a tool or interface for the device. Likewise can see why Zuck is so bearish on the Meta glasses given where they are on AI research

        • reply
          May 13, 2024 5:10 PM

          This will be bard tomorrow

          https://youtu.be/ksHw6ybWJHg?si=RaQGay9l8KOa8M9k

    • reply
      May 13, 2024 5:01 PM

      I had it read my palms.

    • reply
      June 1, 2024 12:32 AM

      ChatGPT is a fantastic technology
      You can use it at: https://chatgptsv.se/
