Transform ChatGPT into a Voice-Activated Chatbot Using 60 Lines of Code
Introduction to Voice-Enabled Chatbots
I recently stumbled upon an article detailing a project that integrated Zapier with Alexa to utilize ChatGPT. While innovative, I realized a third-party platform wasn’t necessary for voice interactions with ChatGPT; Chrome offers everything needed natively.
Reflecting on my experience in 2017 with a voice-controlled product, I became quite familiar with the web speech recognition and synthesis APIs. By leveraging these technologies, along with some clever adaptations to interact with ChatGPT’s interface, I was able to enable voice communication with the AI.
In this guide, I'll show you how to transform ChatGPT into a voice-responsive chatbot using merely 60 lines of code. The good news is that anyone can easily replicate the script.
Steps to Create Your Chatbot
To construct this voice-enabled chatbot, follow these three straightforward steps:
- Launch the ChatGPT website in your Chrome browser.
- Access the developer console by hitting Ctrl+Shift+I or by right-clicking on the page and selecting "Inspect." Then, navigate to the “Console” tab.
- Paste the JavaScript code provided and start your conversation! You can converse naturally without needing specific keywords.
The JavaScript Code
First, create a new SpeechRecognition instance with the settings you want. Set recognition.lang to any language tag the browser supports; the Web Speech API documentation describes the remaining options.
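// Chrome exposes the constructor under a webkit prefix.
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.lang = "en-US"; // language to recognize
recognition.continuous = true; // keep listening until stop() is called
recognition.maxAlternatives = 1; // only the best guess per result
recognition.interimResults = false; // deliver final results only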
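Not every browser ships the Web Speech API, so it can help to detect it before constructing anything. The following is a minimal sketch, not part of the original script: getSpeechRecognitionCtor() and createRecognition() are hypothetical helper names, and the globalObject parameter exists only so the lookup can be exercised outside a browser.

```javascript
// Hypothetical helper: return the recognition constructor, or null if the
// environment offers neither the standard nor the webkit-prefixed name.
function getSpeechRecognitionCtor(globalObject = globalThis) {
  return (
    globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null
  );
}

// Hypothetical helper: build a recognizer with the same settings as the
// script above, or return null when speech recognition is unavailable.
function createRecognition(lang = "en-US", globalObject = globalThis) {
  const Ctor = getSpeechRecognitionCtor(globalObject);
  if (!Ctor) return null;
  const recognition = new Ctor();
  recognition.lang = lang;
  recognition.continuous = true;
  recognition.maxAlternatives = 1;
  recognition.interimResults = false;
  return recognition;
}
```

In the console you could call createRecognition("en-US") and stop early with a message if it returns null.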
The ChatGPT interface lacks descriptive class names or element IDs, so we locate the input field and the submit button by their position in the page structure instead.
const formTextarea = document.querySelector("main form textarea");
const formSubmit = document.querySelector("main form button");
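Because these selectors depend on the page's markup, they may stop matching if the interface changes. As a hedged sketch, the lookup could be wrapped so that a failed match produces a clear error instead of a null reference later on. findChatControls() is a hypothetical name, and the root parameter is an assumption added so the function can run against any object with a querySelector method.

```javascript
// Hypothetical helper: look up the input and submit button with the same
// selectors as above, and fail loudly if either one is missing.
function findChatControls(root = document) {
  const textarea = root.querySelector("main form textarea");
  const submit = root.querySelector("main form button");
  if (!textarea || !submit) {
    throw new Error(
      "ChatGPT input or submit button not found; the page markup may have changed."
    );
  }
  return { textarea, submit };
}
```

In the console, const { textarea, submit } = findChatControls(); would stand in for the two querySelector calls.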
To manage the conversation flow, we will define a couple of global variables. The isSpeaking boolean ensures that the recognition doesn’t transcribe audio that the AI is generating.
let isSpeaking = false;
let intervalRcg, intervalUtr;
When you speak into the microphone, the recognition.onresult handler fires. It stops recognition, fills the ChatGPT input with the recognized speech, and submits it.
recognition.onresult = (event) => {
  recognition.stop(); // pause listening while ChatGPT answers
  fillAndsubmitForm(event);
  // Give ChatGPT a second to start responding before polling and speaking.
  setTimeout(pollResultStatus, 1000);
  setTimeout(startVoiceSynth, 1000);
};
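The fillAndsubmitForm() function populates the ChatGPT input with the transcribed speech and submits it. If the transcript is just the word "stop", nothing is submitted.
function fillAndsubmitForm(event) {
  // Trim whitespace the recognizer may add around the transcript.
  const result = event.results[0][0].transcript.trim();
  if (result.toLowerCase() === "stop") return; // spoken "stop": skip submission
  formTextarea.value = result;
  formSubmit.click();
}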
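Recognizers may return the stop word with different capitalization, surrounding whitespace, or trailing punctuation, so an exact string comparison can miss it. The sketch below shows one way to normalize the transcript before checking. parseTranscript() and its stopWord parameter are hypothetical and not part of the original script.

```javascript
// Hypothetical helper: clean up a raw transcript and report whether it is
// the stop word on its own (ignoring case and trailing punctuation).
function parseTranscript(rawTranscript, stopWord = "stop") {
  const text = rawTranscript.trim();
  const normalized = text.toLowerCase().replace(/[.!?]+$/, "");
  return { text, isStop: normalized === stopWord };
}
```

A sentence that merely contains the word, such as "stop the music", is still passed through as a normal message because only an exact match counts.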
The onresult callback sets timeouts to execute two functions after a second: pollResultStatus() and startVoiceSynth().
Because ChatGPT disables the submit button while it is responding, we can poll the button every 500 ms and restart recognition once it is enabled again and the bot has stopped speaking.
function pollResultStatus() {
  intervalRcg = setInterval(() => {
    // Wait while ChatGPT is still answering or the answer is being read aloud.
    if (formSubmit.disabled || isSpeaking) return;
    recognition.start();
    clearInterval(intervalRcg);
  }, 500);
}
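The polling loop above runs until the button is enabled again, however long that takes. As a sketch of one possible variation, the same pattern can be written as a reusable helper that gives up after a fixed number of checks. pollUntilReady(), the maxTries limit of 120 (about a minute at 500 ms), and the injectable timers object are all assumptions introduced here for illustration.

```javascript
// Hypothetical helper: call onReady() once isBusy() returns false, checking
// every intervalMs, and stop checking after maxTries attempts.
function pollUntilReady(
  isBusy,
  onReady,
  { intervalMs = 500, maxTries = 120, timers = globalThis } = {}
) {
  let tries = 0;
  const id = timers.setInterval(() => {
    tries += 1;
    if (tries > maxTries) {
      timers.clearInterval(id); // give up instead of polling forever
      return;
    }
    if (isBusy()) return; // still answering or speaking: check again later
    timers.clearInterval(id);
    onReady();
  }, intervalMs);
  return id;
}
```

With the script's own variables, the call might look like pollUntilReady(() => formSubmit.disabled || isSpeaking, () => recognition.start());.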
The startVoiceSynth() function converts the AI’s responses into audible speech using the speechSynthesis API.
When invoked, it sets the isSpeaking variable to true, indicating the bot is speaking. An inner function, sayResult(), vocalizes the latest response.
function startVoiceSynth() {
  isSpeaking = true;
  function sayResult() {
    // No new text since the last tick: treat the response as complete.
    if (this.innerText === this.spokenText) {
      clearInterval(intervalUtr);
      isSpeaking = false;
      return;
    }
    // Speak only the text that arrived since the last tick.
    speechSynthesis.speak(
      new SpeechSynthesisUtterance(
        this.innerText.slice((this.spokenText || "").length)
      )
    );
    this.spokenText = this.innerText;
  }
  // `this` inside sayResult() is the element holding the streaming response.
  intervalUtr = setInterval(
    sayResult.bind(document.querySelector(".result-streaming")),
    500
  );
}
It would be less engaging if the entire response had to arrive before the bot began to speak. On each tick, sayResult() compares the text already spoken with the response received so far and synthesizes only the new part.
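One side effect of slicing by character count is that a tick can land in the middle of a word that is still streaming in. The sketch below illustrates one way to hold back such a partial word until the next tick. nextSpeakableChunk() and its isFinal flag are hypothetical and not part of the original script.

```javascript
// Hypothetical helper: return the not-yet-spoken part of fullText, cut at the
// last whitespace so an unfinished trailing word waits for the next tick.
// Pass isFinal = true once the response is complete to flush the remainder.
function nextSpeakableChunk(fullText, spokenLength, isFinal = false) {
  const pending = fullText.slice(spokenLength);
  if (isFinal) return pending;
  // Greedy match: everything up to and including the last whitespace.
  const match = pending.match(/^[\s\S]*\s/);
  return match ? match[0] : "";
}
```

To use it, the caller would keep a running offset, pass each non-empty chunk to a new SpeechSynthesisUtterance, and advance the offset by the chunk's length.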
Finally, don’t forget to start the recognition process:
recognition.start();
Conclusion
Once you copy and paste the above code into the console, the script will begin to listen to your voice and transcribe your speech in real-time. When you finish speaking, it will send your transcription to ChatGPT and await a response.
In summary, transforming ChatGPT into a voice-enabled chatbot is quite straightforward using the speech recognition and synthesis APIs built into Chrome. The script presented here is a basic framework that can be expanded for more advanced applications.
You can find the complete script on my blog for easy copying and pasting. Check it out here: walkthrough.ai/voice-enable-chat-gpt.
Enjoy your experience!