Building My First Google Sheets Add-on
The unexpected challenges of building, testing, and publishing a Google Sheets add-on
Introduction
As part of my latest side-project release, I built and published my first Google Sheets add-on. I wanted to solve a problem I faced while evaluating AI prompts and answers at scale. At first, I thought it would be a quick coding project that would take a few hours. But as I went deeper, I realized the real work wasn’t in writing the code. It was in testing, debugging, deploying, and publishing it so that others could use it too.
This post is a look at that process. It’s about how “vibe coding” makes it easier to get started, and how taking that code and making it work for others is still complicated. Here’s what happened along the way.
The Motivation Behind the Add-on
The motivation for building this add-on came from my work at Port. I needed to help the team move faster when adding AI features to our site. Specifically, I had to build a prompt for an AI agent and wanted to add example questions and answers for it (evals).
I did this manually - first writing the prompt, then listing expected questions and ideal answers, and finally running each through ChatGPT to see how it responded. It was a lot of manual work and didn’t scale when I wanted to test more than five questions. I found myself constantly updating the prompt, checking if the changes worked, and repeating the process over and over.
I wanted a way to automate this testing and make it easier to understand how well the prompt worked across a bigger set of questions. That was the starting point for this add-on.
What the Product Does
The add-on gives you a template in Google Sheets where you can define a prompt, a set of questions, and the expected answers. With a few clicks, it connects to an LLM that acts as a judge, running these questions through the prompt and returning the answers. It then generates a report showing the accuracy and how well the prompt performed.
The idea is to take what I was doing manually (adding questions, running them through ChatGPT, checking if the answers made sense) and make it scalable and repeatable. Now, instead of testing five questions by hand, I can test 30 or more at once and see a clear report of how each answer compares to what I expected.
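To make that concrete, here’s a rough Apps Script sketch of what the core loop looks like. The layout (a named “Prompt” cell, questions in column A, expected answers in column B) and the callModel and judgeAnswer helpers are illustrative placeholders, not the add-on’s actual code:

```javascript
// Illustrative sketch of the evaluation loop, not the add-on's real code.
// Assumes a named range "Prompt", questions in column A, expected answers in
// column B, and writes the model's answer and the judge's score to columns C and D.
function runEvaluation() {
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  const sheet = ss.getSheetByName('Evals');
  const prompt = ss.getRangeByName('Prompt').getValue();
  const rows = sheet.getRange(2, 1, sheet.getLastRow() - 1, 2).getValues();

  const results = rows.map(([question, expected]) => {
    const answer = callModel(prompt, question);              // placeholder LLM call
    const verdict = judgeAnswer(question, expected, answer); // placeholder judge call
    return [answer, verdict.score];
  });

  sheet.getRange(2, 3, results.length, 2).setValues(results);
}
```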
Key Challenges and Hurdles
The first challenge was figuring out where to build this add-on. I wasn’t sure if Google Sheets was the right interface or if it should be something completely separate, like a custom app. I knew I needed something that felt like a database but was also flexible and easy to use.
The second challenge was all the backend work. Once the code was running for me, I had to make it work for others. This meant setting up a Google Cloud Project and dealing with all the permissions and APIs that come with Google Workspace add-ons. It wasn’t straightforward, and there was a lot of trial and error.
One thing that added friction was the lack of a native integration between Apps Script (the scripting platform behind Sheets) and tools like Claude and Cursor. I had to do a lot of copying and pasting between tools, which was clunky and slowed things down.
I also ran into an error because I was logged into multiple Google accounts at the same time. It took some deep research and a few Stack Overflow threads to figure out that this was a common problem and how to work around it.
Finally, getting it published in the Google Workspace Marketplace was more complicated than I expected. Google’s review process had its own limitations and took more time than I thought (about two weeks). At each step (building, testing, publishing), I ran into new issues that required me to dig deeper and understand how everything really works.
Exploring Existing Solutions
Before deciding to build my own add-on, I looked for existing solutions to see if they could do what I needed. I searched the Google Workspace Marketplace for AI extensions and found a few options, like Claude for Sheets and GPT for Sheets. Some of them had a formula-based approach, while others were more template-driven.
I installed a few of these add-ons to see how they worked. Some offered basic integrations, but none of them fit my use case without a lot of extra manual work. Some were paid, which I didn’t want to commit to for a quick experiment.
The fact that these add-ons existed showed me that the technology was there; it just didn’t fully solve the problem I had. That gave me more confidence that I could build something better suited to my needs.
Deciding on Google Sheets
I decided to build the add-on in Google Sheets because that’s where I was already doing this work. I was writing questions, expected answers, and running the tests manually in a spreadsheet. Google Sheets was the natural place for it as it’s a lightweight database and was already part of my workflow.
Another reason was that I had some experience with Apps Script, which made it easier to add custom functions and logic. I knew I could connect it to external APIs and build the backend around it.
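To give a feel for what that looks like, here’s a simplified version of an Apps Script call to an external LLM API. The endpoint, model, and script property name are just examples; the add-on isn’t tied to this exact setup:

```javascript
// Simplified example of calling an LLM API from Apps Script with UrlFetchApp.
// OpenAI's chat completions endpoint is used only as an example provider.
function callModel(systemPrompt, question) {
  const apiKey = PropertiesService.getScriptProperties().getProperty('OPENAI_API_KEY');
  const response = UrlFetchApp.fetch('https://api.openai.com/v1/chat/completions', {
    method: 'post',
    contentType: 'application/json',
    headers: { Authorization: 'Bearer ' + apiKey },
    payload: JSON.stringify({
      model: 'gpt-4o-mini',
      messages: [
        { role: 'system', content: systemPrompt },
        { role: 'user', content: question }
      ]
    })
  });
  return JSON.parse(response.getContentText()).choices[0].message.content;
}
```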
After testing existing add-ons and seeing they were mostly built on top of Google Sheets, I felt confident that Sheets could be the right platform for my own add-on, too.
Defining the Product’s Scope
At this point, I had a clear picture of what I wanted the add-on to do. I’d say about 70-80% of the scope was already defined in my head. I needed to let users add a prompt, list questions and expected answers, and then run them through an LLM to see how well the prompt worked.
It also needed to create a report that showed how accurate the responses were, so I could see whether prompt changes actually improved the results. I didn’t want it to be complicated; it should take just a few simple steps to go from a rough prompt to a clear evaluation.
From there, the plan was to iterate: build something simple, test it myself, and then see what else was needed to make it work for others.
The Development Timeline
The timeline for building this add-on was pretty straightforward, though not without its hiccups. I started by exploring if there were any similar add-ons already in the Marketplace. Once I confirmed that nothing fit my needs, I started working on a rough prototype using Google Apps Script.
I worked with Claude to help write and refine the initial code. After that, I iterated on the structure and logic until I had something that worked for me. At that point, it was about 70-80% done in terms of the core logic.
From there, the focus shifted to getting it ready for others to use. This meant setting up the Google Cloud project, configuring APIs, and dealing with all the backend details that aren’t obvious at first. I also did some real-world testing with additional accounts to make sure everything worked as expected.
The final push was preparing it for publication in the Marketplace. That part took the most time, mainly because of Google’s review process and some unexpected requirements that came out of it.
Testing and Debugging
Once I had the core logic in place, I started testing it with more questions and different prompts to see how well it worked. Running tests manually was easy at first, but it got more complicated as I added more questions and tried different LLMs.
A big part of testing was checking how the LLM handled the questions and how close the answers were to what I expected. I found a few bugs right away: some answers weren’t being parsed correctly, and some of the logic in comparing answers didn’t work the way I wanted.
Debugging was about going step by step and making sure each part did exactly what it was supposed to. Sometimes that meant adding new functions in Apps Script, and sometimes it meant tweaking the API calls to make sure they were giving back the data in the right format.
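One common way to deal with that, sketched below, is to ask the judge to reply with strict JSON and parse it defensively. The verdict schema and helper here are illustrative, not the add-on’s exact format:

```javascript
// Illustrative judge call: request strict JSON and parse it defensively.
// The {"score": 0-1, "reasoning": "..."} schema is an example, not the real format.
function judgeAnswer(question, expected, actual) {
  const judgePrompt =
    'Compare the actual answer to the expected answer and reply with JSON only, ' +
    'in the form {"score": <0 to 1>, "reasoning": "<one sentence>"}.\n' +
    'Question: ' + question + '\nExpected: ' + expected + '\nActual: ' + actual;

  const reply = callModel('You are a strict evaluator.', judgePrompt);
  try {
    // Models sometimes wrap JSON in extra text, so extract the outermost object first.
    const json = reply.slice(reply.indexOf('{'), reply.lastIndexOf('}') + 1);
    return JSON.parse(json);
  } catch (e) {
    return { score: 0, reasoning: 'Could not parse judge output: ' + reply };
  }
}
```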
Even small changes in the prompt or in how the LLM was called made a difference in the results. So I spent a lot of time going back and forth, changing one thing at a time to see what worked best.
Besides the prompt, I also fixed issues around rate limiting, added clearer error handling, and made the user interface more consistent and appealing.
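For the rate-limiting part, the usual pattern in Apps Script is a retry-with-backoff wrapper around UrlFetchApp. Here’s a rough sketch; the retry count and delays are arbitrary examples rather than the add-on’s actual values:

```javascript
// Sketch of retry-with-backoff around UrlFetchApp to soften rate limits (HTTP 429)
// and transient server errors. The retry count and delays are arbitrary examples.
function fetchWithRetry(url, params, maxRetries) {
  maxRetries = maxRetries || 3;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = UrlFetchApp.fetch(url, Object.assign({ muteHttpExceptions: true }, params));
    const code = response.getResponseCode();
    if (code !== 429 && code < 500) {
      return response; // success, or a client error that retrying won't fix
    }
    Utilities.sleep(1000 * Math.pow(2, attempt)); // back off 1s, 2s, 4s, ...
  }
  throw new Error('Request failed after ' + maxRetries + ' retries: ' + url);
}
```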
Publishing the Add-on
After testing and fixing bugs, the next step was to publish the add-on in the Google Workspace Marketplace. I thought it would be a quick step: just filling in some forms and clicking “Publish.” But it turned out to be more complicated than expected.
First, I had to learn how Google Cloud actually works. That meant spending over an hour researching how to set it up, including watching three different YouTube videos to make sure I didn’t miss anything. There were settings in the Google Cloud console that didn’t have clear documentation, and I spent time figuring out how to make the add-on accessible to others without giving away too many permissions.
Then, I had to record a demo and fill out all kinds of details I hadn’t thought about before. The first feedback from Google was about the scopes I was using, with instructions to avoid certain scopes in order to get approved. It took three rounds of submissions before I got a link to a much simpler way of dealing with permissions. None of the LLMs I was using had been able to figure that part out.
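For context, one place these scopes can be declared explicitly is the project’s appsscript.json manifest. I won’t reproduce the exact back-and-forth here, but the direction Google pushes you in is declaring only the narrowest scopes the add-on really needs. A narrow set for a Sheets add-on that calls an external API might look something like this (an illustration, not my actual manifest):

```json
{
  "oauthScopes": [
    "https://www.googleapis.com/auth/spreadsheets.currentonly",
    "https://www.googleapis.com/auth/script.container.ui",
    "https://www.googleapis.com/auth/script.external_request"
  ]
}
```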
As part of publishing, I also needed a landing page for the add-on. I built that using Bolt, but deploying it to my domain through Netlify wasn’t straightforward; it required a more complicated DNS setup that alone took three days to figure out.
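For anyone hitting the same wall: a typical custom-domain setup for a Netlify site comes down to an apex record plus a CNAME for www, roughly like this (placeholder names only; the actual values come from Netlify’s dashboard and docs):

```text
; Illustrative DNS records for serving a Netlify site on a custom domain (placeholders only)
yourdomain.com.       A       <apex / load balancer address from Netlify's docs>
www.yourdomain.com.   CNAME   your-site.netlify.app.
```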
In the end, publishing took longer than the actual coding and testing (around two weeks). But once it was done, the add-on was live in the Marketplace and ready for others to use.
Lessons Learned
This project reminded me that building something useful is more than just writing code. While vibe coding made it easier to get started, making it work for others required real curiosity and a deep understanding of the tools and systems involved.
Testing and debugging were harder than expected. Even small changes in the prompt or the API calls changed the results in ways I didn’t always expect. The process of making sure everything worked smoothly—and worked the same way for other people—took time and effort.
Publishing was the biggest surprise. I had assumed it would be just a few clicks, but it turned out to be a real project on its own. Understanding Google’s requirements and going through their review process made me realize that sharing your work is its own skill.
In the end, the most important lesson was that while it’s easier than ever to start building, taking something from “just built” to “ready for real users” still takes real work.
As with my MCP building experience, in software the unknown is always much larger than the known. But that’s also the beauty and the challenge of it!