Tether releases locally runnable medical AI model QVAC MedPsy

2026-05-07 12:03

Odaily reports that Tether's AI research group has released QVAC MedPsy, a new-generation medical AI model designed to run locally on low-power hardware such as smartphones and wearable devices, without relying on cloud servers. It also outperforms several larger state-of-the-art (SOTA) models across multiple medical benchmarks.

Official data shows that the 1.7-billion-parameter version of QVAC MedPsy achieved an average score of 62.62 across seven closed-set medical benchmarks, surpassing Google's MedGemma-1.5-4B-it by 11.42 points despite having fewer than half the parameters. In real-world clinical tests such as HealthBench Hard, the model even outperformed MedGemma 27B, which has nearly 16 times as many parameters.

Furthermore, the 4-billion-parameter version achieved an average score of 70.54, surpassing several large models nearly 7 times its size in multiple medical reasoning evaluations. Tether stated that the model achieves "small model, high performance" through post-training optimization for medical reasoning, reinforcement learning, and training on high-quality medical data.

Compared to traditional cloud-based AI architectures, QVAC MedPsy also significantly reduces inference costs. Its 4-billion-parameter version generates approximately 909 tokens on average, far fewer than the 2,953 tokens generated by comparable systems, which means lower latency and lower computational costs. The model is also available in a GGUF quantized version suitable for local deployment on mobile and edge devices.
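
As a rough sanity check on the figures quoted above, the ratios can be worked out directly. The sketch below uses only numbers reported in this article; the variable names are illustrative, and the MedGemma average is merely implied by the stated 11.42-point margin rather than reported here.

```python
# All inputs are figures quoted in the article; nothing is measured independently.
QVAC_SMALL_PARAMS_B = 1.7     # QVAC MedPsy small version, billions of parameters
MEDGEMMA_SMALL_PARAMS_B = 4.0  # MedGemma-1.5-4B-it
MEDGEMMA_LARGE_PARAMS_B = 27.0  # MedGemma 27B

QVAC_SMALL_AVG = 62.62        # average over seven closed-set benchmarks
MARGIN_OVER_MEDGEMMA = 11.42  # stated lead over MedGemma-1.5-4B-it

QVAC_TOKENS = 909             # average tokens generated, 4B version
BASELINE_TOKENS = 2953        # average tokens for "comparable systems"

# Parameter ratios behind "fewer than half" and "nearly 16 times".
small_vs_4b = QVAC_SMALL_PARAMS_B / MEDGEMMA_SMALL_PARAMS_B
large_vs_small = MEDGEMMA_LARGE_PARAMS_B / QVAC_SMALL_PARAMS_B

# MedGemma's average score as implied by the stated margin (an inference).
implied_medgemma_avg = round(QVAC_SMALL_AVG - MARGIN_OVER_MEDGEMMA, 2)

# Output-length comparison: how many times fewer tokens, and the fraction saved.
token_ratio = BASELINE_TOKENS / QVAC_TOKENS
token_saving = 1 - QVAC_TOKENS / BASELINE_TOKENS

print(f"1.7B vs 4B parameter ratio: {small_vs_4b:.3f}")
print(f"27B vs 1.7B parameter ratio: {large_vs_small:.2f}x")
print(f"Implied MedGemma-1.5-4B-it average: {implied_medgemma_avg}")
print(f"Token ratio: {token_ratio:.2f}x, saving: {token_saving:.1%}")
```

By these numbers, the 4B version emits roughly a third as many tokens as the comparison systems. If inference cost scaled linearly with output length (an assumption, since the article gives no cost figures), that would be about a 69% reduction in generation work.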

Paolo Ardoino stated that the model's core goal is to improve efficiency rather than simply increase parameter scale, so that medical AI can run directly on hospitals' local systems or on end-user devices and avoid uploading sensitive medical data to the cloud.