OpenAI Releases LifeSciBench: Measuring AI Systems' Capabilities in Real-World Scientific Research Scenarios
Odaily Planet Daily News: OpenAI has released a new evaluation benchmark, LifeSciBench, designed to measure the capabilities of AI systems in real-world scientific research scenarios. Reportedly, LifeSciBench consists of 750 expert-crafted tasks covering 7 types of scientific research workflows and 7 biological domains. The tasks were contributed by 173 researchers with PhD backgrounds and experience in the biotech or pharmaceutical industries. Rather than testing isolated factual questions, the benchmark emphasizes complex research capabilities, including evidence integration, experimental design, data analysis, scientific reasoning, and scientific communication. Over 79% of the tasks involve multi-step reasoning, with an average of approximately 4 reasoning steps per question, and the benchmark includes 1,062 real-world research-related data attachments (such as papers, charts, sequence data, and structural files).
