QAZVOICE - KAZAKH-LANGUAGE VOICE ASSISTANT
Journal: Scientific journal «Студенческий форум», issue No. 35(344)
Section: Technical Sciences

Abstract. In recent years, voice assistants have revolutionized the way humans interact with technology, enabling hands-free control and intelligent conversation with digital systems. However, most available voice assistants such as Siri, Alexa, and Google Assistant are optimized for English and Russian, leaving Kazakh-speaking users with limited access to similar technological benefits. The "QazVoice" project seeks to bridge this linguistic gap by developing a localized voice assistant that understands and responds in the Kazakh language. Through the integration of open-source libraries such as Vosk for speech recognition and Silero for text-to-speech synthesis, QazVoice demonstrates how accessible technologies can empower linguistic inclusivity. This paper explores the project’s research process, technical implementation, user feedback, and the broader implications for the digital development of the Kazakh language.
Keywords: Kazakh language, voice assistant, speech recognition, speech synthesis.
1. Introduction. Voice assistants represent one of the most transformative innovations of the 21st century, offering users efficient ways to perform everyday tasks using natural speech. Yet, despite the global proliferation of such systems, the representation of minority languages in artificial intelligence remains disproportionately low. In Kazakhstan, the dominance of Russian-language technologies has hindered the growth of native-language digital ecosystems. The QazVoice project aims to address this imbalance by developing a functional Kazakh-language voice assistant. The initiative began at Nazarbayev Intellectual School in Taldykorgan, where a group of students—motivated by the lack of linguistic diversity in technology—set out to create a prototype capable of understanding and responding to spoken Kazakh commands. The project combines linguistic research, AI integration, and user experience testing to deliver a foundational model for future development.
2. Background and Objectives
The technological marginalization of the Kazakh language is a growing concern. While Kazakh content on the Internet is slowly increasing, digital tools such as speech recognition, synthesis, and language understanding systems remain scarce.
This gap not only limits accessibility but also affects cultural and educational equity.
The primary objective of QazVoice is to create an assistant that allows users to perform voice-driven tasks such as browsing, checking weather updates, and launching applications. Beyond functionality, the project aims to promote the digital use of Kazakh, encouraging users to engage with technology in their native tongue.
Key objectives include:
1. Developing a lightweight, locally executable voice assistant that supports Kazakh.
2. Integrating open-source libraries for speech recognition and synthesis.
3. Evaluating user perception and linguistic accuracy through interviews and testing.
3. Methodology and Research Process
The project followed a structured process comprising three main phases: research, development, and evaluation.
During the research phase, the team studied available voice processing tools, identifying limitations in Kazakh language support. Among potential technologies, Vosk was selected for speech recognition due to its open-source accessibility and ability to be trained on new datasets. For speech synthesis, Silero TTS was chosen for its Cyrillic support and ease of integration.
Development began with creating a dataset of Kazakh commands and responses. Each command corresponded to a predefined function such as opening a browser, checking the weather, or greeting the user. The system was programmed to activate upon hearing trigger words like “Әли” or “Ғали.” Once activated, the assistant analyzed the spoken input, matched it to its command database, and generated an appropriate response in Kazakh.
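The activation-and-dispatch flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the command table, the responses, and the fallback phrase are invented examples, and only the trigger words come from the paper.

```python
# Minimal sketch of QazVoice-style activation and command dispatch.
# Only the trigger words are taken from the paper; the command table
# and responses below are illustrative assumptions.

TRIGGER_WORDS = ("әли", "ғали")  # wake words described in the paper

# Hypothetical phrase -> response table (responses in Kazakh).
COMMANDS = {
    "ауа райы қандай": "Ауа райын тексеріп жатырмын...",      # weather query
    "браузерді аш": "Браузер ашылып жатыр...",                # open browser
    "сәлем": "Сәлеметсіз бе! Сізге қалай көмектесе аламын?",  # greeting
}

def is_activated(transcript: str) -> bool:
    """Return True if the utterance contains a trigger word."""
    words = [w.strip(",.?!") for w in transcript.lower().split()]
    return any(trigger in words for trigger in TRIGGER_WORDS)

def dispatch(transcript: str) -> str:
    """Match the utterance against the command table, with a fallback."""
    text = transcript.lower()
    for phrase, response in COMMANDS.items():
        if phrase in text:
            return response
    return "Кешіріңіз, түсінбедім."  # "Sorry, I did not understand."

if __name__ == "__main__":
    utterance = "Әли, ауа райы қандай?"
    if is_activated(utterance):
        print(dispatch(utterance))
```

In the real system the `transcript` string would come from Vosk's recognizer output rather than a hard-coded example.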
To evaluate the project’s social and educational relevance, the team conducted interviews with teachers and curators at the Nazarbayev Intellectual School. Respondents emphasized the necessity of such tools for cultural preservation and language promotion in technology. They also noted that the project’s potential extends beyond utility—serving as a model for student-led innovation in linguistic technologies.
4. Technical Implementation Overview
QazVoice integrates several core components to achieve voice-based interaction. The Vosk model processes incoming audio data and converts it into text, while a machine learning classifier (Logistic Regression) maps recognized phrases
to corresponding commands. Silero TTS then synthesizes natural Kazakh speech responses. For external data retrieval, such as weather information, the system utilizes the OpenWeatherMap API, translating outputs into Kazakh via Google Translate.
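The phrase-to-command mapping step can be illustrated with a small sketch. The project itself uses a trained Logistic Regression classifier; as a dependency-free stand-in, the example below scores bag-of-words overlap against example phrases instead. All training phrases and intent labels here are invented for illustration:

```python
# Sketch of mapping recognized text to a command intent.
# QazVoice uses a trained Logistic Regression classifier; this
# dependency-free stand-in scores bag-of-words overlap instead.
# The training phrases and intent labels are invented examples.

TRAINING = {
    "weather": ["ауа райы қандай", "бүгін ауа райы", "ауа райын айт"],
    "browser": ["браузерді аш", "интернетті аш"],
    "greet":   ["сәлем", "қайырлы күн"],
}

def classify(text: str) -> str:
    """Return the intent whose example phrases share the most words with `text`."""
    tokens = set(text.lower().split())
    best_intent, best_score = "unknown", 0
    for intent, phrases in TRAINING.items():
        score = max(len(tokens & set(p.split())) for p in phrases)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent
```

A real classifier generalizes better than word overlap (handling inflected Kazakh word forms, for instance), which is presumably why a trained model was chosen over simple matching.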
The assistant is designed to function offline after initialization, ensuring user accessibility even in low-connectivity regions. It operates efficiently on local devices with an average response time of under one second following activation.
The graphical interface, created using CustomTkinter, enhances usability by providing a simple and visually appealing layout.
5. Results and Discussion. Testing demonstrated that QazVoice could accurately recognize and respond to a wide range of commands in Kazakh.
Weather inquiries, application launches, and greetings were processed with a success rate of approximately 85%. Minor errors in speech recognition were primarily caused by regional accents and microphone quality variations.
Interview feedback highlighted the cultural and linguistic value of the project. Teachers appreciated the initiative’s role in encouraging young people to interact in their native language through modern tools. Respondents also suggested that further development could enable educational applications—such as vocabulary practice or translation assistance—to support language learning.
The project demonstrates how small-scale initiatives can make significant contributions to digital inclusivity and linguistic diversity. QazVoice serves as both a technical prototype and a symbolic statement: that the Kazakh language deserves equal representation in artificial intelligence.
6. Challenges and Solutions. Several challenges emerged during development:
1. Limited Kazakh datasets: Few public speech datasets exist for Kazakh, complicating recognition training.
To address this, the team adapted existing multilingual models and manually tested pronunciations.
2. Text-to-speech limitations: Early synthesis attempts produced robotic or unclear speech. Using Silero improved clarity and fluency.
3. Interface design: Creating a user-friendly GUI required learning new frameworks. CustomTkinter provided a practical solution.
4. Performance optimization: Early prototypes processed commands slowly. Code restructuring and local caching reduced delays.
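The local-caching optimization mentioned in point 4 can be sketched with Python's standard `functools.lru_cache`. The weather fetcher below is a placeholder for a slow external call (such as the OpenWeatherMap request), not the project's actual function:

```python
# Sketch of the local-caching optimization: repeated queries are
# answered from memory instead of re-running a slow external lookup.
# `fetch_weather` is a placeholder, not QazVoice's real function.
from functools import lru_cache
import time

CALLS = {"count": 0}  # counts how often the slow path actually runs

@lru_cache(maxsize=32)
def fetch_weather(city: str) -> str:
    """Stand-in for a slow external lookup (e.g. a weather API call)."""
    CALLS["count"] += 1
    time.sleep(0.01)  # simulate network latency
    return f"{city}: 21°C"

# The first call pays the latency cost; repeats are served from the cache.
print(fetch_weather("Талдықорған"))
print(fetch_weather("Талдықорған"))
```

A cache like this trades a small amount of memory for response time, which matches the paper's goal of sub-second responses on local devices; for data that goes stale (weather), an expiry time would be added in practice.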
7. Conclusion and Future Work. The QazVoice project marks an important milestone in the development of AI tools for underrepresented languages. By combining open-source technologies and linguistic insight, it demonstrates how innovation can empower cultural identity and technological inclusion. The assistant successfully bridges the gap between Kazakh speakers and digital technologies, proving that language should never be a barrier to innovation. Future improvements include expanding the assistant’s functionality to control device hardware, incorporating noise reduction using AI filters, and developing a fully independent Kazakh-language speech recognition model. In the long term, the team aims to integrate QazVoice with educational platforms.
References:
1. Khassanov, Y., Varol, C., Mussakhojayeva, S., Mirzakhmetov, A., & Temirbekov, N. (2021). A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline. Proceedings of the 16th Conference of the European Chapter of the ACL (EACL 2021): https://aclanthology.org/2021.eacl-main.58/
2. Mussakhojayeva, S., Khassanov, Y., & Varol, C. (2021). A Study of Multilingual End-to-End Speech Recognition for Kazakh, Russian, and English. arXiv preprint arXiv:2108.01280: https://arxiv.org/abs/2108.01280
3. Khassanov, Y., Mussakhojayeva, S., Mirzakhmetov, A., & Varol, C. (2023). A Study of Speech Recognition for Kazakh Based on Unsupervised and Semi-Supervised Learning. PLOS ONE, 18(1): https://pmc.ncbi.nlm.nih.gov/articles/PMC9863384/
4. OpenWeatherMap. (2024). Weather API Documentation: https://openweathermap.org/api
5. Vosk AI. (2024). Offline Speech Recognition Toolkit. Alphacephei Documentation: https://alphacephei.com/vosk/
6. Silero AI. (2023). Silero Text-to-Speech Models Documentation. GitHub Repository: https://github.com/snakers4/silero-models
7. Google Developers. (2024). Google Translate API Documentation: https://cloud.google.com/translate

