OpenAI releases update to enable real-time reasoning across audio, vision, and text

2024-10-02 11:34
Odaily News: OpenAI announced four updates to its models in October, aimed at making its AI better at conversation and image recognition.

The headline update is the Realtime API, which lets developers build AI-generated voice applications from a single prompt, enabling natural conversations similar to ChatGPT's Advanced Voice mode. Previously, developers had to "stitch together" multiple models to create these experiences: audio input typically had to be fully uploaded and processed before a response could be generated, which meant high latency for real-time use cases such as speech-to-speech conversation. With the Realtime API's streaming capabilities, developers can now achieve instant, natural interactions, much like a voice assistant. The API runs on GPT-4o, released in May 2024, which can reason across audio, vision, and text in real time.

Another update adds fine-tuning tools that let developers improve AI responses generated from image and text inputs. Vision-based fine-tuning enables models to better understand images, improving capabilities such as visual search and object detection. The process incorporates human feedback: people supply examples of good and bad responses for training.

Beyond the speech and vision updates, OpenAI also launched "model distillation," which lets smaller models learn from larger ones, and "prompt caching," which cuts development cost and latency by reusing already-processed text.

According to Reuters, OpenAI expects revenue to rise to $11.6 billion next year, up from an estimated $3.7 billion in 2024. (Cointelegraph)
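The idea behind prompt caching, reusing the expensive processing of a repeated prompt prefix rather than redoing it on every request, can be illustrated with a toy in-process cache. This is a conceptual sketch only: all names here are hypothetical, and OpenAI's actual prompt caching happens automatically on the server side with no client code required.

```python
# Toy illustration of prompt caching: the expensive work done on a long,
# shared prompt prefix is cached, so repeat requests reuse it and only the
# varying suffix (the user's question) is handled fresh each time.
# Names are illustrative, not part of any real API.
from functools import lru_cache

@lru_cache(maxsize=128)
def process_prefix(prefix: str) -> str:
    # Stand-in for the costly step of encoding a long, shared system prompt.
    return f"processed:{len(prefix)}-chars"

def answer(prefix: str, question: str) -> str:
    # The shared prefix is processed at most once; only the question varies.
    context = process_prefix(prefix)
    return f"{context} | answering: {question}"

system_prompt = "You are a helpful assistant. " * 20  # long shared prefix
first = answer(system_prompt, "What is the Realtime API?")
second = answer(system_prompt, "What is model distillation?")

# The second call hit the cache instead of reprocessing the prefix.
assert process_prefix.cache_info().hits == 1
```

The same principle explains the cost savings the article describes: when many requests share a long system prompt or document context, caching the processed prefix means only the small, changing portion of each request incurs new work.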