., Zhang, S., Zhang, J., & Dai, L. R. (2023). Semantic VAD: Low-Latency Voice Activity Detection for Speech
Interaction.... (2025). MinMo: A multimodal large language model for seamless voice
interaction. arXiv preprint arXiv:2501. [4] Défossez...