OpenAI reportedly discovers new optimization method that could reduce inference costs by over 50%
Odaily Planet Daily News OpenAI's engineering team recently informed some colleagues that the company has found a new system optimization method that can reduce the inference cost of AI models by more than half. Inference cost refers to the computational resource cost consumed when the model is actually running and responding to user requests. This optimization mainly comes from improving the utilization efficiency of existing server resources, rather than relying on new computing chip investments. This progress reflects that while AI companies continue to compete for computing resources, they are also improving the efficiency of existing infrastructure through software and system-level optimizations to alleviate the pressure of rapidly growing model operating costs. (The Information)
