OpenAI Launches LifeSciBench: Measuring AI Systems' Capabilities in Real Scientific Research Scenarios
Odaily Planet Daily News: OpenAI has launched LifeSciBench, a new evaluation benchmark designed to measure the capabilities of AI systems in real scientific research scenarios. According to reports, LifeSciBench consists of 750 expert-written tasks covering 7 types of research workflows and 7 biological fields. The tasks were contributed by 173 researchers who hold doctoral degrees and have experience in the biotechnology or pharmaceutical industries. Rather than testing simple factual recall, the benchmark assesses complex research capabilities, including evidence synthesis, experimental design, data analysis, scientific reasoning, and scientific communication. Over 79% of the tasks involve multi-step reasoning, with an average of approximately 4 reasoning steps per question. The benchmark also includes 1,062 data attachments drawn from real scientific research, such as papers, charts, sequence data, and structure files.
