Lenny.is

An Idea

Building an eyes-closed AI assistant on a Raspberry Pi with Go, TTS, and OpenAI


So I have an idea.

Oftentimes when I am typing or really thinking, I need to close my eyes.

Maybe there is too much visual stimulation when I really need to lock in, but I also have trouble visualizing things while I am looking at something. So if I need to visualize or walk through a process in my head, or even a script, I actually prefer to do it with my eyes closed.

It just lets me lock in and have an uninterrupted train of thought.

At the same time I use my eyes a lot. I work on the computer, run my nonprofit on the computer, play games on the computer, etc.

I have recently come to really enjoy listening to podcasts with my eyes closed, or while not using the computer at all.

And that’s the idea.

Basically, I want to connect a computer-use-capable AI to a Raspberry Pi in my pocket, interact with it via a keyboard, and do a bunch of tasks that way with my eyes closed.

Now while I can’t do everything that way, I can:

  • Plan my tasks
  • Think through projects
  • Read articles through TTS

While I probably could build this very fast in another language using AI, I want to make this with Go.

Getting Started — Text to Speech

Let’s get started with the basics. I need to be able to hear the AI.

Initially I thought I would just stream the AI audio from OpenAI’s voice-capable models, but that gets costly.

So for now I am using a free Go TTS library called htgo-tts.

package main

import (
	htgotts "github.com/hegedustibor/htgo-tts"
	"github.com/hegedustibor/htgo-tts/voices"
)

func main() {
	speech := htgotts.Speech{Folder: "audio", Language: voices.English}
	speech.Speak("Hii wonder if this works")
}

That was all I had to do to get it to talk.

Taking Keyboard Input

Next I need to take keyboard input.

Apparently Go's built-in fmt scanning functions (like fmt.Scan) stop reading as soon as they hit a space. Since we want to read full commands, we need to keep reading until the newline instead, which bufio.Reader can do. The program below takes a line of input from the user and reads it out loud.

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"

	htgotts "github.com/hegedustibor/htgo-tts"
	"github.com/hegedustibor/htgo-tts/handlers"
	"github.com/hegedustibor/htgo-tts/voices"
)

func main() {
	reader := bufio.NewReader(os.Stdin)

	speech := htgotts.Speech{Folder: "audio", Language: voices.English, Handler: &handlers.Native{}}
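	fmt.Println("Initialized Speech Engine")
	for {
		text, err := reader.ReadString('\n')
		if err != nil {
			return // stdin was closed or failed; stop instead of looping forever
		}

		// Drop the line ending ("\n", or "\r\n" on Windows) before speaking.
		text = strings.TrimRight(text, "\r\n")
		speech.Speak(text)
	}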
}
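
To make the difference between the two reading styles concrete, here is a small sketch that uses only the standard library. The helper names firstWord, readLine, and readAll are made up for this illustration; they are not part of the program above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// firstWord reads the way fmt.Scan does: it stops at the first whitespace.
func firstWord(input string) string {
	var word string
	fmt.Sscan(input, &word) // error ignored: word stays empty on bad input
	return word
}

// readLine returns the next line without its "\n" or "\r\n" ending.
// A final line that lacks a trailing newline is still returned.
func readLine(r *bufio.Reader) (string, error) {
	line, err := r.ReadString('\n')
	if err != nil && line == "" {
		return "", err
	}
	return strings.TrimRight(line, "\r\n"), nil
}

// readAll is a demo helper that collects every line of the input.
func readAll(input string) []string {
	r := bufio.NewReader(strings.NewReader(input))
	var lines []string
	for {
		line, err := readLine(r)
		if err != nil {
			return lines
		}
		lines = append(lines, line)
	}
}

func main() {
	input := "plan my tasks\nread the article\n"
	fmt.Printf("%q\n", firstWord(input)) // only the first word survives
	fmt.Printf("%q\n", readAll(input))   // each full command is kept
}
```

The word-based read only ever sees "plan", while the line-based read keeps each full command, which is what a typed prompt needs.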

Wiring in an AI

Now I want to wire in an AI. We will keep it simple for now, where I write the prompt, and the program speaks back the AI response.

Thankfully OpenAI has a pretty straightforward Go library that was a breeze to set up. The only issue I noticed is that some characters, like numbers and symbols such as %#@!, seem to break speech.Speak. So that is something I will need to filter out down the line.

But look at that, now I have an AI assistant in my headphones: I can type to it, make mistakes, and it will still understand! This is pretty simple but hopefully will come together to be really useful.

Not today, but next I will need to build a server that hosts the AI agent, gives it various skills and abilities, and exposes it through an API.
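
One possible way to handle those characters is to clean the text before it reaches the speech engine. This is only a sketch: sanitizeForSpeech is a hypothetical helper, and both the spoken replacements and the set of punctuation it keeps are guesses, since I have not confirmed exactly which characters cause trouble.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// spoken maps a few symbols to words. The choice of words is an assumption.
var spoken = strings.NewReplacer(
	"%", " percent ",
	"&", " and ",
	"@", " at ",
	"#", " number ",
)

// sanitizeForSpeech spells out a few symbols, keeps letters, digits, and
// basic punctuation (. , ? '), turns everything else into a space, and
// collapses runs of whitespace into single spaces.
func sanitizeForSpeech(text string) string {
	text = spoken.Replace(text)
	var b strings.Builder
	for _, r := range text {
		switch {
		case unicode.IsLetter(r), unicode.IsDigit(r):
			b.WriteRune(r)
		case strings.ContainsRune(".,?'", r):
			b.WriteRune(r)
		default:
			b.WriteRune(' ')
		}
	}
	return strings.Join(strings.Fields(b.String()), " ")
}

func main() {
	fmt.Println(sanitizeForSpeech("50% off @ noon #deal!"))
}
```

The cleaned string would then be passed to speech.Speak in place of the raw text. If digits turn out to be a problem too, they would need their own spelling-out step, which this sketch does not attempt.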

package main

import (
	"bufio"
	"context"
	"fmt"
	"log"
	"os"
	"strings"

	htgotts "github.com/hegedustibor/htgo-tts"
	"github.com/hegedustibor/htgo-tts/handlers"
	"github.com/hegedustibor/htgo-tts/voices"
	"github.com/joho/godotenv"
	"github.com/openai/openai-go/v3"
)

func main() {
	err := godotenv.Load()
	if err != nil {
		log.Fatal("Error loading .env file")
	}

	reader := bufio.NewReader(os.Stdin)
	// NewClient picks up the API key from the OPENAI_API_KEY environment variable by default.
	client := openai.NewClient()

	speech := htgotts.Speech{Folder: "audio", Language: voices.English, Handler: &handlers.Native{}}
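	fmt.Println("Initialized Speech Engine")
	for {
		text, err := reader.ReadString('\n')
		if err != nil {
			return // stdin was closed or failed; stop instead of looping forever
		}

		// Drop the line ending ("\n", or "\r\n" on Windows) before sending the prompt.
		text = strings.TrimRight(text, "\r\n")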

		chatCompletion, err := client.Chat.Completions.New(context.TODO(), openai.ChatCompletionNewParams{
			Messages: []openai.ChatCompletionMessageParamUnion{
				openai.DeveloperMessage("You are a voice assistant. Your responses will be read aloud via text-to-speech. Rules: Use only plain, speakable text. No markdown, bullet points, numbered lists, code blocks, URLs, or special characters. Spell out abbreviations and acronyms. Keep responses concise and conversational. Use natural pauses by breaking ideas into short sentences. Never reference visual formatting like see below or as shown above."),
				openai.UserMessage(text),
			},
			Model: openai.ChatModelGPT5_2,
		})
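		if err != nil {
			panic(err)
		}
		// Guard against an empty result before indexing into Choices.
		if len(chatCompletion.Choices) == 0 {
			fmt.Println("AI returned no choices")
			continue
		}
		reply := chatCompletion.Choices[0].Message.Content
		fmt.Printf("AI Response: %s\n", reply)
		speech.Speak(reply)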
	}
}
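
As for the server mentioned as a next step, a very rough starting point could look like the sketch below. Everything in it is hypothetical: the Agent interface, the echoAgent stand-in, the /ask route, the JSON field names, and the port are placeholders I made up, not a design I have settled on. It uses only the standard library so it runs without an API key.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

// Agent is a hypothetical interface; a real one would call the model and its skills.
type Agent interface {
	Reply(prompt string) (string, error)
}

// echoAgent is a stand-in so the sketch runs without any external service.
type echoAgent struct{}

func (echoAgent) Reply(prompt string) (string, error) { return "echo: " + prompt, nil }

type askRequest struct {
	Prompt string `json:"prompt"`
}

type askResponse struct {
	Reply string `json:"reply"`
}

// answer decodes a JSON request body, asks the agent, and returns an HTTP
// status code together with a JSON response body.
func answer(agent Agent, body []byte) (int, []byte) {
	var req askRequest
	if err := json.Unmarshal(body, &req); err != nil || req.Prompt == "" {
		return http.StatusBadRequest, []byte(`{"error":"expected a JSON body with a prompt field"}`)
	}
	reply, err := agent.Reply(req.Prompt)
	if err != nil {
		return http.StatusInternalServerError, []byte(`{"error":"agent failed"}`)
	}
	out, err := json.Marshal(askResponse{Reply: reply})
	if err != nil {
		return http.StatusInternalServerError, []byte(`{"error":"encoding failed"}`)
	}
	return http.StatusOK, out
}

// askHandler wraps answer in an HTTP handler that accepts POST requests only.
func askHandler(agent Agent) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20)) // cap the body at 1 MiB
		if err != nil {
			http.Error(w, "could not read body", http.StatusBadRequest)
			return
		}
		status, out := answer(agent, body)
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(status)
		w.Write(out)
	}
}

func main() {
	agent := echoAgent{}
	// Run with the argument "serve" to start listening on localhost:8080.
	if len(os.Args) > 1 && os.Args[1] == "serve" {
		http.Handle("/ask", askHandler(agent))
		log.Fatal(http.ListenAndServe("localhost:8080", nil))
	}
	// Without arguments, push one request through the same code path and exit.
	status, out := answer(agent, []byte(`{"prompt":"plan my day"}`))
	fmt.Println(status, string(out))
}
```

Keeping the request handling in a plain function (answer) separate from the HTTP wrapper means the Pi client and the server could be tested independently, and swapping echoAgent for something that calls OpenAI would not touch the routing code.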