OpenAI Releases LifeSciBench: Measuring AI System Capabilities in Real-World Scientific Research Scenarios
Odaily reports that OpenAI has released LifeSciBench, a new evaluation benchmark designed to measure the capabilities of AI systems in real-world scientific research scenarios. The benchmark reportedly comprises 750 expert-crafted tasks covering 7 types of scientific research workflows and 7 fields of biology. The tasks were contributed by 173 researchers with PhD backgrounds and experience in the biotech or pharmaceutical industry. Rather than testing isolated factual questions, the benchmark emphasizes complex research capabilities, including evidence integration, experimental design, data analysis, scientific reasoning, and scientific communication. Over 79% of the tasks involve multi-step reasoning, with each task requiring about 4 reasoning steps on average, and the benchmark includes 1,062 real-world research data attachments (such as papers, charts, sequence data, and structure files).
