OpenAI reportedly discovered a new optimization method that could reduce inference costs by more than 50%
Odaily reports that OpenAI's engineering team has recently informed some colleagues that the company has found a new system optimization method capable of reducing the "inference" costs of AI models by more than half. Inference costs refer to the computational resources consumed when a model is actually running and responding to user requests. This optimization primarily stems from improved utilization efficiency of existing server resources, rather than relying on the deployment of additional computing chips. This development reflects that while AI companies are competing for computing resources, they are also improving the efficiency of existing infrastructure through software and system-level optimizations to alleviate the pressure of rapidly growing model operating costs. (The Information)
