Check out the conversation on Apple, Spotify and YouTube.
What We’re Covering Today (0:00) Aakash: There might have been some news pieces that you read about what the ChatGPT App Store is, but nobody has broken it down in terms of what it means for product builders. So that’s what we’re gonna do today. I brought in my friend Colin Matthews. He is one of my go-to sources for technical topics on product management.
Colin: I’ve been building ChatGPT apps, probably 4 or 5 different ones this year. I actually built my own prototyping tool specifically for this. What’s cool is that this is a really underrated way to get more distribution. ChatGPT has 900 million weekly active users, and we’re seeing 26% higher conversion rates when users come from AI sources because they have higher intent.
Aakash: Today we’re gonna break down exactly what ChatGPT apps are, the architecture behind them, and we’re gonna build one live together so you can see the entire process from start to finish.
What Are ChatGPT Apps? (3:09) Aakash: Let’s start with the basics. Most people probably watched the announcement video about a month ago at OpenAI’s Developer Conference, but they’ve already forgotten about it. So let’s refresh everyone’s memory. Colin, how would you describe ChatGPT apps? What are they and what is the ChatGPT app store?
Colin: ChatGPT apps are basically a way for companies to bring in their own designs, their own kind of way users should interact with it, directly into ChatGPT. So rather than, you know, maybe giving a text summary of something or you’re recommending something from a web search, you can have this kind of built-in experience where you can interact with an application directly in your conversation.
There are a couple of companies that partnered with the initial release—Expedia, Sigma, Booking.com. Other ones coming soon like Uber is on that list as well, planning to release an app sometime in the near future.
Aakash: Why does this really matter? Because I guess it’s kind of hard to discover. When I think about the iOS App Store, for instance, the discoverability was there. It was one of the few apps that people actually got pre-bundled with their new $1,000 phones. They’re highly likely to open it up at least once or twice, and then when they open it up, it’s showcasing different apps. It feels like ChatGPT apps are kind of hidden. I haven’t heard about them.
Colin: I would agree. At the moment—and again, you know, we’re recording this in late 2025, so we expect it to change sometime soon—they are kind of hidden. But there are definitely plans to bring together a full app store experience, very similar to what you have in iOS or Android. You’d be able to browse apps, find ones that you like, and then download them and use them inside of ChatGPT.
There’s one other mode of discovery that we will get into a bit more later, but one thing that ChatGPT promises is that if you put in a request that relates to an app, they might actually decide to kind of surface that app to you. So for example, if I say, without installing the app, I’m looking for a hotel, I might get the Expedia app kind of surfaced in line, even though I didn’t install it or ask for it.
Aakash: OK, that would be super cool. So they might be building some sort of tool calling system that automatically figures out, OK, here’s a reliable app to help service this particular request. I’ve actually seen nowadays when I do search for a hotel that they’ll pull in some Expedia search results, although they don’t always pull in from the app, it seems like they pull in from web search.
Colin: Exactly. And it gives companies a little bit more control, right? There’s a very large kind of panic around getting your content into ChatGPT because people know that it converts well. There was some news I saw today that it’s like 26% increase in conversion when a user comes from an AI source because they have higher intent.
Aakash: I see that too in my own sites that the LLM traffic is much, much smaller in volume compared to SEO but really high conversion rates.
Colin: Exactly. So companies want to be present, but playing this game of whack-a-mole with ChatGPT web search is hard. And so now you have from an enterprise perspective, a deterministic way to show up in the application. Your app is gonna show up, especially if the user has it installed, but even if they’re trying to do something and it’s a relevant application, it can show up in their chat, and you’ll be able to control what that experience is like. It’ll be branded and you can even pop them back out.
A good example of this is Target. Target has recently announced that they’re building an app where you can’t actually complete a purchase inside of ChatGPT, but you can build a cart. So you can say, hey, help me find holiday items or Christmas presents for my siblings. It’ll build the cart for you, and then you click out and it brings you into Target to complete the purchase.
So I think that’s another good example of building this kind of deterministic, catered experience that feels really great inside of ChatGPT rather than relying on web search to do the job.
What About Regular Builders? (7:06) Aakash: Makes sense. So in that example, at least, I totally grok it, kind of like the Expedia example—OK, I’m offering some high-ticket service. I might have been getting a lot of traffic from search in many cases, so I need to get into ChatGPT, access its 900 million weekly active users. I understand that case. What’s the case for other products, regular people like you and me? What sort of ChatGPT apps have you been building?
Colin: Very similar to the App Store, you’re probably gonna see the eventual Ubers, right, which didn’t start at the very beginning, and then you’ll see the flashlight-app equivalents. If you remember back to the beginning of iOS, a flashlight was an app you could download.
And so I think we’ll follow similar paradigms here. For example, I’ve been messing around and building some apps and one you can build is like a spreadsheet. So ChatGPT can use spreadsheets and you can collaborate back and forth. It’s kind of like what you’d expect out of the AI experience in Google Sheets, but it doesn’t quite work correctly in Google Sheets yet.
You can build a little spreadsheet app inside of ChatGPT or like a to-do list app that gets pinned to the top. So if you want to complete multiple tasks with ChatGPT, maybe you have like three things you want to do, I can check off those tasks for you, so you get like a visual indicator.
It’s those little utility-type things that you could build and then you could build more fully featured experiences. Things like apps that have maps, navigation, search, and integrations with whatever you want on the backend.
I guess that’s the last thing I’ll mention here: there’s actually no strict limitations in terms of what the apps can or can’t do. It’s just that they’re so early that most companies are releasing very bare-bones versions of them to get into the marketplace, but I think we’ll see a lot more complex apps than what exists today.
Architecture: The MCP Protocol (8:25) Aakash: I love that you’ve been building these. You’ve even built a platform to build these. So you really understand the technical details. What do we need to know about how ChatGPT apps are made and built?
Colin: Underlying ChatGPT apps is this protocol called MCP or Model Context Protocol. This is invented by Anthropic, it’s about a year old. Basically what it allows is AI agents—so things like ChatGPT and Claude, as well as Gemini, Cursor, really anywhere where you would be talking to an AI—it allows those tools to reach out to other things over the internet, other tools, and use them for whatever purpose.
So you can think about web search as an example of this, like a common tool that would be built into a chat application. Any other tool that you might want to think of could also be defined as tools that ChatGPT or Claude or any other AI chat could call over this protocol of MCP. Things like booking a stay with Expedia or getting Sigma to maybe do some design work for you.
So a quick diagram here. This one basically just shows us what it might look like to help book a short-term stay. As the user, you would say something like, I want to book a stay in New York for this time period. ChatGPT is gonna decide first, does this request need an app or would it be beneficial to use an app? And if it does want to use an app, then what are the available tools that we can use in order to help facilitate this request?
The first thing it’s gonna do is actually gonna go ask for the list of tools that are currently available. And you can see that we have two tools available: we have “book a listing” and then “browse listings.” ChatGPT does cache this information, which basically just means that it holds on to it and it doesn’t refresh unless you kind of force it to refresh, but there is this kind of underlying need to know what tools are available before we actually go ahead and call the tool.
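The tool-discovery handshake Colin describes maps onto MCP’s JSON-RPC format. Here’s a hedged sketch of the `tools/list` exchange—the tool names match the booking diagram, but the schemas are illustrative, not Expedia’s actual definitions:

```python
import json

# What the client (ChatGPT) sends to discover tools (MCP uses JSON-RPC 2.0).
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# A sketch of what a booking-style server might return. The input schemas
# tell ChatGPT what parameters each tool expects.
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "browse_listings",
                "description": "Search available stays by city and dates.",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"},
                        "check_in": {"type": "string"},
                        "check_out": {"type": "string"},
                    },
                    "required": ["city", "check_in", "check_out"],
                },
            },
            {
                "name": "book_listing",
                "description": "Book a specific listing by id.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"listing_id": {"type": "string"}},
                    "required": ["listing_id"],
                },
            },
        ]
    },
}

print(json.dumps([t["name"] for t in list_response["result"]["tools"]]))
```

This is the response ChatGPT caches: it only re-fetches the tool list when you force a refresh, which is why tool description changes don’t always show up immediately.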
[Sponsor Break: Colin’s Course – 10:02] Aakash: If you’re enjoying this episode, Colin literally teaches a course on this. The next cohort starts January 30th. It is on Maven. You can use my code to get a special discount off this course. I highly recommend Colin’s content and courses. You all seem to love what he’s doing and I personally love reading it. I gained so many epiphanies just out of this one podcast recording. So think about what you could get if you’re working with Colin extensively over a live cohort course. Check out his course and now back to today’s episode.
[Sponsor Break: Vanta – 10:32] Today’s episode is brought to you by Vanta. As a founder, you’re moving fast toward product-market fit, your next round, or your first big enterprise deal. But with AI accelerating how quickly startups build and ship, security expectations are higher earlier than ever. Getting security and compliance right can unlock growth or stall it if you wait too long. With deep integrations and automated workflows built for fast-moving teams, Vanta gets you audit-ready fast and keeps you secure with continuous monitoring as your models, infra, and customers evolve. Fast-growing startups like LangChain, Writer, and Cursor trusted Vanta to build a scalable foundation from the start. So go to Vanta.com/Akash to save $1,000 and join over 10,000 ambitious companies already scaling with Vanta.
How Tool Calling Works (11:24) Colin: And the last thing is ChatGPT’s gonna decide which tool to use for this request. So the “browse listing” one might make sense to start because we need to know what listings are available so we can show that back to the user. And so we could ask for New York on a specific date and we get back a list of listings from our MCP server, and then ChatGPT would kind of describe that information.
So this is the bare-bones version of MCP, right? There’s no actual UI or app being involved here, but you could build this if you wanted to and just have it say, here are the top five short-term rentals in New York for that date, and it would literally just write it out as a description.
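That bare-bones version—tool call in, raw data out, text description back—looks roughly like this on the wire. The arguments and listing data below are made up for illustration:

```python
# What ChatGPT sends once it has picked a tool (MCP tools/call, JSON-RPC 2.0).
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "browse_listings",
        "arguments": {"city": "New York", "check_in": "2026-01-09", "check_out": "2026-01-12"},
    },
}

# A sketch of the server's reply: a human-readable summary the model can
# echo, plus structured data it can reason over and describe to the user.
call_response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "content": [{"type": "text", "text": "Found 2 stays in New York."}],
        "structuredContent": {
            "listings": [
                {"id": "nyc-101", "name": "Midtown Loft", "price_per_night": 240},
                {"id": "nyc-102", "name": "SoHo Studio", "price_per_night": 310},
            ]
        },
    },
}

print(len(call_response["result"]["structuredContent"]["listings"]), "listings returned")
```

With no widget attached, ChatGPT would just summarize `structuredContent` as text—“here are the top short-term rentals in New York for those dates.”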
The addition here on top of MCP is this thing that actually OpenAI kind of invented and is now being incorporated into the MCP spec, which is the idea of widgets or these little interfaces.
So in addition to the raw data—the listings—it can also return a pointer, a URL, for the widget we want to render. And so the last thing that ChatGPT is gonna say is, OK, now that I know that there’s a UI element or a widget that goes with this, let me go get that code and then render it inside the chat.
And so that’s how we end up with that code that shows up or your app that shows up inside the chat. And it’s still gonna respond with something, so it can say like, here are the best options, some small description. But what we’re really gonna be interacting with is the UI that you see here as the main interface rather than the text.
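That widget pointer rides along in the tool’s metadata. In OpenAI’s Apps SDK, the tool advertises an output template at a `ui://` resource URI via a `_meta` key; the exact shape below follows the published convention but should be read as a hedged sketch, not a spec quote:

```python
# Sketch: how a tool points ChatGPT at a widget. The "openai/outputTemplate"
# key and the ui:// resource URI follow OpenAI's Apps SDK convention; treat
# the exact fields here as illustrative.
tool_descriptor = {
    "name": "browse_listings",
    "description": "Search available stays by city and dates.",
    "_meta": {"openai/outputTemplate": "ui://widget/listings.html"},
}

# The server also registers a resource at that URI containing the bundled
# widget code, which ChatGPT fetches and renders inline in the chat.
widget_resource = {
    "uri": "ui://widget/listings.html",
    "mimeType": "text/html",
    "text": '<div id="listings-root"></div><script src="/widget.js"></script>',
}

# The template URI on the tool must match a registered resource,
# or ChatGPT has nothing to render.
assert tool_descriptor["_meta"]["openai/outputTemplate"] == widget_resource["uri"]
print("widget wired to tool")
```

The widget gets the tool’s structured output as its data, which is why the text reply can stay short while the UI carries the real interaction.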
Building Your First App: The Easy Way (12:42) Aakash: Makes sense. So how do we build one of these ourselves?
Colin: There’s, let’s say, the easy way and the hard way. As you mentioned, I’ve been working on a platform to make this a little bit easier. It’s called Chippy, and we can go ahead and take a look at maybe an example really quick before we hop into building it.
Basically what Chippy does is it spins up everything you would need in order to build a ChatGPT app for you. So it spins up an MCP server for you, and when you prompt it, it’s gonna basically be specialized in building tools—so not full-stack web applications, but literally just what you need to build a ChatGPT app. And then there’s some nice UI/UX stuff that I built into it to help you build the app.
This example here is a coffee guide. I kind of wanted something where I can look at a map and see maybe a good place to get coffee. As you can see on the left-hand side, inside of Chippy, I just asked it to make me a quick location guide, and this is what it decided to build.
So on the right-hand side, we can see the component that it built. This is a tool, by the way. It has a little left-hand pane, we can click through on this, and it’ll pull up the right side for me and I can get directions.
The other nice thing about using this inside of Chippy is you can kind of get a preview of what it’s gonna look like inside a chat experience. So I can say, “Where should I get coffee?” And there’s an LLM working in the background that’ll actually call that tool and then throw it into the UI for us. So it’s actually a full-screen UI by default. It’s kind of like a simulated ChatGPT, right? A quick way to test what you built to see if you like it or not.
Aakash: OK, makes sense. So just to play that back, right, the easier way—what you’re doing is you’re bundling together an MCP server, which if we recall from the diagram, that’s how ChatGPT is gonna get connected. It’s the universal USB-C plug for LLMs to call tools like this. This is a tool, and then it’s got the right understanding of what needs to be built to create one of these tools. So it simplifies the tool process to basically just prompting with an LLM. That’s the easy way. What’s the hard way?
Colin: The hard way would be basically spinning up your own MCP server. That’s the first thing, getting that hosted on the internet somewhere, and then understanding how to write the code to build these tool definitions as well as to build the UI. And there’s something called bundling that has to happen, which is when it translates your UI code into something that ChatGPT can actually understand and render. The code that you write doesn’t just get downloaded and rendered in the same format inside of ChatGPT—it has to go through this little process, bundling.
And so you’d have to also bundle your UI code. And then the last thing is just understanding what the options are—how to interact with the full-screen version of apps versus, say, picture-in-picture or inline. All the guidelines that OpenAI provides on how apps should be built—all that stuff is kind of built into this agent. But if you want to do it the hard way, you have to learn some of that stuff yourself, then host it, build it, and eventually connect it to ChatGPT as the last step.
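Stripped of hosting, auth, and bundling, the server half of the “hard way” is a JSON-RPC dispatcher over two methods. A stdlib-only sketch—real servers would use an MCP SDK and an HTTP transport, and the `view_reviews` tool here is hypothetical:

```python
import json

# Hypothetical tool table for a reviews server; the schema is illustrative.
TOOLS = [
    {
        "name": "view_reviews",
        "description": "List recent reviews for a hospital.",
        "inputSchema": {
            "type": "object",
            "properties": {"sort": {"type": "string"}, "filter": {"type": "string"}},
        },
    }
]

def handle(message: dict) -> dict:
    """Dispatch one JSON-RPC message the way an MCP server would."""
    reply = {"jsonrpc": "2.0", "id": message.get("id")}
    method = message.get("method")
    if method == "tools/list":
        reply["result"] = {"tools": TOOLS}
    elif method == "tools/call":
        name = message["params"]["name"]
        args = message["params"].get("arguments", {})
        # A real tool would query a database here; we just echo the call.
        reply["result"] = {
            "content": [{"type": "text", "text": f"{name} called with {json.dumps(args)}"}]
        }
    else:
        reply["error"] = {"code": -32601, "message": f"unknown method {method}"}
    return reply

print(handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})["result"]["tools"][0]["name"])
```

Everything Chippy automates—the hosting, the bundling step, the widget wiring—sits around a core loop shaped like this.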
Aakash: OK, so you’d probably be using a Cursor or Claude Code sort of workflow, versus here you have more of an AI prototyping interface to build that.
Colin: Yeah, exactly. The reason I built this actually is because I was working on a completely different app for ChatGPT and it was such a pain to go through the iterations of, every time I wanted to make a small UI tweak, I had to rebundle the code, make the change, go back into ChatGPT, update it, and then see if I liked it. Whereas here I can at least visually see it and then kind of play with it in the UI without having to go through that whole process every single time.
Live Demo: Building a Coffee Map (16:10) Aakash: Cool. Awesome. So what might be some interesting examples that we can build live?
Colin: Sure. So obviously we have this one here. We’ll just quickly spin this one up inside of ChatGPT. So in order to do this, there’s one last step which is just connecting. If we go up to “test” here, we’ll get a little URL that’s generated for us. I’ll copy that. And then we’ll head over into ChatGPT.
We’ll go into our connections in our settings. And you’ll see here that I have a bunch of enabled apps because I play with these all the time. So you can see I have a few that are built elsewhere and then a few that are built by myself as well.
But basically the last step here is click “create,” paste in the URL in this MCP URL field, turn off the authentication unless you really want authentication, and then also give it a name of some kind. So this one we’ll call “Coffee Map.” I think I already have one, so I’m gonna call this one “Coffee Map 2.” And then finally, click this little button.
So it’s a little bit involved when you’re testing. Obviously for installing apps, it’s a lot easier—the end consumer experience. You think about this more like the developer experience, right? I’m a developer, I want to build my own app. Those would be the steps that I go through to test it before I release it to everyone else. Obviously installing an app like Canva isn’t as involved—you just go in, click install, and that’s pretty much it.
And then we’ll give this one a try. Just to show you, there are a few different ways to invoke apps. The first way is to just type out the name. So if I say “Coffee Map,” you’ll see that it pops up automatically, that it knows that there’s an app and I want to use this app.
The second way is to actually tag it manually. So if I go into my apps here, I can click Coffee Map, and again, it comes up.
And the last way, theoretically—we can give it a try afterwards—is if I don’t even say something alluding to it like “I want to get a coffee, where’s a good location,” ChatGPT may decide to use my app. And that’s a lot of where the finesse comes in in terms of getting it to be better. You want your app to show up on relevant queries, and so you’re gonna have to play with that and actually go through an eval process very similar to other AI tools.
The SEO Analogy for Tool Discovery (18:05) Aakash: Say a little bit more about that. I guess I was thinking it was almost like another SEO process. You need to somehow develop the reputation through queries over time that ChatGPT is feeling like you’re a good tool amongst the millions of other tools trying to get called for this query.
Colin: Yeah, that might become the case. There might even be ads for tools and stuff like that. But for now, really what it is is: when you type something in, is ChatGPT gonna do a good job of calling your tool? And that’s just based on the very limited set of tools that even exist, right? I mean, there are fewer than 20 different apps right now, so can it…
Aakash: And anyone can get access to publish a public tool?
Colin: Right now there’s no marketplace where you can publish them publicly. You kind of have to be part of the launch partners—some of these large companies. But in the very near future, there’ll be this public marketplace where you can launch your own apps directly, very similar to what we’re doing here.
Aakash: OK, so right now if you build one of these, you can’t launch it.
Colin: Correct, not to the public. I mean, you can always do what I’m doing here, which is give someone a URL that they can play with. But yeah, ChatGPT has said—or OpenAI said—by the end of the year. So we’re getting there. It’s December now, so we’ll see if that comes through or not. But yeah, around the end of the year.
Aakash: So we’re basically learning how to build for a platform that’s about to become available, and so you’re kind of just on the bleeding edge of the distribution of this and making a bet that OpenAI will support it.
Colin: Yeah, exactly.
Testing the Coffee Map (19:39) Colin: Cool. So yeah, here is our little coffee map. Again, it’s a nice little demo application, doesn’t do too much. But yeah, why don’t we go ahead and build something new? So we’ll flip back over to Chippy here. Any thoughts on what might be interesting to give a try?
Aakash: I feel like I want to do something in the healthcare space, healthcare or legal. I feel like those two spaces are just like infinite value for me on ChatGPT. It doesn’t really have anything to do with product management, but I could imagine, let’s say you’re a healthcare product manager at a hospital and you want to be able to give access to your customers like some information about your hospital system through ChatGPT. How would we think about it? What would be a good unit for an app for that product manager?
Colin: Actually, hospital reviews and surgeon reviews are a really big thing. There are actually SaaS companies that help hospitals and surgeons manage this because it’s related to revenue. Obviously, if you have really garbage reviews, you won’t have as many customers.
So maybe what we’ll say is something like: “Build a solution that helps hospitals manage and share their Google reviews.”
Something like this. And I’m actually gonna turn on plan mode so that we can kind of see what we get back before we kick it off, just so that we don’t end up building something that’s completely unrelated.
Aakash: Cool. So plan mode’s gonna give us that thinking, reasoning model that gives us the plan first before it executes.
Colin: Yeah, exactly. And under the hood here, I’m using Opus 4.5 for pretty much everything, so it’s a brand-new model that just came out last week. The nice thing about it is that it has this effort parameter that you can turn down, so you get a really high-quality model. You can reduce the amount of time that it spends on a task. And so for things like this, I’m using a higher quality model—so better thinking—but kind of low effort so that it doesn’t spend forever spinning its own wheels. It gets back the response pretty quickly.
Building the Hospital Reviews App (21:38) Colin: Cool. So here’s what we have. We proposed three different tools to build. So one is viewing reviews, one is sharing reviews, and then one is review analytics. So we’ll be able to see a dashboard of our reviews of some kind, a shareable card that we can share with other people, and then see some summary stats and so on. What do you think about that? Does that sound good?
Aakash: Yeah, I love it. OK, this is what I was trying to figure out—what is the takeaway for PMs? One thing that’s just kind of on my mind though, is the PM really gonna be building it? The PM is mainly probably gonna create the spec for this. So this could be like they could create their prototype here.
Colin: Yeah, exactly. I think it’s a little bit hard to really understand, if you’re a PM, how would you spec this out without ever using one of these or even testing how they might work? So yeah, I think using this as a prototyping tool is a great use case. And then in the long run I think that there’s an opportunity for solo builders to also build apps and distribute this exact same way as the iOS App Store, where they’ll be building their own apps.
So that’s the way I’m thinking about this platform: prototyping primarily for PMs, and then for solo builders or people who want to build their own apps, you could do the whole end-to-end of hosting your application on here as well.
Aakash: Makes sense. So there’s a really big opportunity here, I think, for anybody who wants as a PM to create a side project or something like that, a portfolio project to improve their AI PM credentials. But in terms of actually coding up the production version of your ChatGPT app, you’re probably not gonna be doing that. You’re gonna be creating the prototype here. And then your engineering team is gonna take that and they’re gonna create the real version.
Colin: Yeah, exactly. And actually kind of funny—a lot of the things that you would do in a normal AI project, as you mentioned, you have to do those here as well. So things like running evals on the prompts that are triggering your tools to make sure that the right kind of phrases are triggering the right tools, and you might have to tweak the tool descriptions a little bit to try to improve that. This type of request should trigger this tool, or even have it where someone writes a request and it doesn’t trigger your tools at all because it’s not relevant.
So you need to kind of go through the very similar process of what you might choose to do to improve a regular AI application or AI agent—you do in the same case here.
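The eval process Colin is describing—prompt in, did the right tool fire?—can be sketched as a tiny harness. In practice you’d send each prompt to the real model and log which tool it called; the keyword router below is a stand-in so the harness runs, and the tool names are the hypothetical ones from the reviews demo:

```python
# Each case: a user prompt and the tool we expect it to trigger
# (None means no tool should fire at all).
cases = [
    ("How are my reviews doing for Saint Mercy Healthcare?", "review_analytics"),
    ("I want to share a review", "share_review"),
    ("Show me the latest reviews", "view_reviews"),
    ("What's the weather tomorrow?", None),  # irrelevant query: no tool
]

def route(prompt: str):
    """Stand-in for the model's tool choice; a real eval records actual calls."""
    p = prompt.lower()
    if "share" in p:
        return "share_review"
    if "how are my reviews" in p or "analytics" in p:
        return "review_analytics"
    if "reviews" in p:
        return "view_reviews"
    return None

correct = sum(route(prompt) == expected for prompt, expected in cases)
print(f"{correct}/{len(cases)} routed correctly")
```

When a case fails, the usual fix is the one Colin mentions: tweak the tool descriptions until the right phrases trigger the right tools—and irrelevant phrases trigger none.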
Behind the Scenes (23:53) Colin: Cool. So it built us our three different tools. Again, I’m gonna just for fun share a little bit about the behind-the-scenes here so you can see it actually viewed some examples of code that I’ve built in the background. So this agent that I built, what it does is it can choose to look at relevant files to kind of get inspiration for what it should be doing to build the thing that we’ve asked it for.
And so it took a look at some technical stuff as well as a list example that covers some UI/UX for lists. And then it decided to build those different tools for us. So here’s a little preview. I don’t know if I love the UX, so I have to see how it looks inline when it actually has some data in it. It’s hard to tell when there’s no data in here.
But anyway, we have “view reviews,” we have “share”—nothing in there—and then we have “review,” and yeah, so we’ll give this a try in a second.
I’m wondering though, if this has data that’s gonna be passed in by the model or if it needs better mock data. So I’m gonna go ahead and ask that same question. We’ll say: “Does this rely on ChatGPT passing in the data or should we have mock data?”
And the nice thing about this is because the agent is an expert in ChatGPT apps, it can answer questions like this one. Maybe you’re unsure about the best pattern for how this should work. You can just ask the agent. It’ll read through, read your code, take a look at how everything works, and then make a decision for you.
So here it’s saying that it already has some built-in mock data and if it’s using the correct pattern, it should be passing in data from ChatGPT and then falls back to the mock data. So I’m just gonna tell it the mock data is not very good if there’s mock data. And so I’m gonna say, “Please improve the mock data,” just to get a little bit more information in here so we can take a look at these components.
Understanding the Build Interface (25:42) Colin: Cool. And then, yeah, the only other thing I’ll mention while we’re waiting here—this will just take a second—is you can kind of see that the UX for this building experience is a little bit different. I can see each individual tool. I can also modify the parameters, but these would be the same parameters that ChatGPT would be using when it decides to call this tool. It’s gonna be passing in data for these, so it would be deciding what the filter is, what the sort is, or any other parameters that are necessary.
And so, yeah, you can kind of mess around with the prompts directly in here or the parameters in order to get actually different experiences that might be rendered inside of ChatGPT depending on what gets passed in.
Aakash: OK, I’m keen to see this with the real data because then I want to actually—you mentioned you gave us that teaser that really perked up my ears around—was the evals for the prompt calling it, because I think that part is really interesting.
Connecting to ChatGPT (26:29) Colin: Cool, so we’ll hop over into ChatGPT and then we’ll get this hooked up. First thing we need to do is just go back over to our settings, our apps and connectors. And then create a new connector here, or a new app. We’ll paste in the URL. Again, we’ll turn off authentication for now, just keep things simple, and we’ll call this one “Healthcare Reviews.”
Cool, and connect. So this will just take one second to connect and then we’ll give it a try, and we’ll see if it works. And then after that, as mentioned, I’ll go back and look at the logs. We’ll create an eval really quick and see how that performs.
So we’re gonna spin up a new chat—you don’t have to, but I just don’t like to see the old ones—and I’m gonna tag it this time. So we’ll say “Healthcare Reviews,” we’ll say, “How are my reviews doing for Saint Mercy Healthcare?”
And we’ll see what happens here. I’m actually unsure—the tools are a little bit janky in that one didn’t have any mock data. It didn’t look like… but OK, so ChatGPT is actually generating the mock data and filling it in. We saw for a second there, it popped back out for some reason.
Aakash: So yeah, I swear I saw it.
Colin: Yeah. So I think what happened maybe is that there was some underlying mock data that tried to override it there. Probably have to do a little bit of iteration on this one, but let’s see if we can try to call one of the other ones. So let’s say, “I want to share a review.”
And this should call the other tool, maybe, hopefully. Right, so the tool that kind of generates… Yeah, there you go, our reviews here. So you can see our Mercy Hospital. There’s a little problem with the underlying data there you saw for a second and then it disappeared. I guess we have to clean that up. But yeah, so you can see the different tool calls kind of in action.
[Sponsor Break: Mobin – 27:29] Before we dive deeper, let’s talk about something every PM faces: getting alignment on product decisions. You know that feeling when you’re trying to explain a user flow to engineering or justify a design choice to leadership and you’re just describing it with your hands? That’s where Mobin comes in. Mobin is the world’s largest library of real-world mobile and web app designs from industry-leading apps like Airbnb, Uber, and Pinterest. Instead of spending hours taking screenshots or hunting for inspiration, you can instantly find exactly how successful products handle onboarding, paywalls, checkout flows, whatever you’re facing. Over 1.7 million product builders use Mobin to benchmark against best-in-class products and show their teams proven solutions. Whether you need to convince stakeholders there’s a better way to handle user activation or research how top apps approach feature discovery, Mobin gives you the visual proof to back up your product decisions. Check out Mobin.com/Akash to get 20% off your first year.
[Sponsor Break: Jira Product Discovery – 30:01] Today’s episode is brought to you by Jira Product Discovery. As a PM, you’re constantly balancing what to build next with limited resources. Jira Product Discovery helps you capture ideas from across your organization, prioritize with confidence using customizable views and frameworks, and keep stakeholders aligned throughout the process. It seamlessly connects to Jira Software so delivery stays in sync with your product strategy. Whether you’re running discovery sprints or managing a quarterly roadmap, Jira Product Discovery gives you one place to organize the chaos. Plan with purpose, ship with confidence. Check out Jira Product Discovery at atlassian.com/software/jira/product-discovery.
Viewing Tool Calls in ChatGPT (30:11) Colin: And then one last thing I’ll show you is if we go back over into the connector here, we can actually see those different tool calls directly inside the connector. So we have our review analytics tool, we have our share review tool, and we have our view reviews tool. So those three different tools that we’ve set up.
Aakash: Oh wow. OK.
Colin: Cool. So yeah, let’s go ahead and look at the other side—the logs—and we’ll create a quick eval for this.
Aakash: Yes, this is like that full-stack AI product building, which is why I think this is a pretty cool portfolio or side or learning project for an AI PM. In two prompts, we spun up something, then we’re testing it. Now we’re already getting to the evals process, so it’s simulating a lot of the things. Some people may not have access to building AI features in their current job, but they want to get that job building AI features. This is one way to simulate those learnings.
Colin: Yeah, exactly. You kind of do the whole end-to-end, but it’s simplified. You don’t have to worry about as much complexity.
Aakash: Yeah, so we’ll take a look—no worrying about time in between the steps or waiting on somebody else to do something.
Observability and Evals (31:13) Colin: So we head back over to our main view here. We do have an observability tab and this will show us all the various tool calls that we have so far for our tools. So you can see we have the hospital reviews manager, which was called twice, once for view reviews and once for analytics, and then our other ones from before with our coffee guide and so on.
If we click on this one, we can see a little bit of information about what happened. So you can see the input: the sort was “newest” and the filter was “all.” ChatGPT is the one who decided these. Just to be clear, those are parameters for the tool call, and ChatGPT decided, based on the user’s request, that these are the correct, relevant parameters.
And then I also stored the user prompt here. So what I typed in was “I want to share a review,” and we have that tracked here. And this is what gives us the ability to run an eval. We can say, does “I want to share a review”—should that result in the tool call of view reviews or not? Is there a different tool call that would be better?
We can also see some of the output data and so on and so forth. But from here, there’s kind of two options that you have. So one is you can run some quick annotations. This is really just if you have a team of experts who you want to quickly label the data with, you can do that.
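What one of these log entries contains can be sketched as a small record: the user’s prompt, the tool ChatGPT chose, and the parameters it filled in on its own. The field names and the `summarize` helper below are illustrative assumptions for the sketch, not the tool’s actual schema.

```python
# An illustrative tool-call log record like the one inspected above.
# Field names are made up for this sketch, not a real schema.
log_record = {
    "user_prompt": "I want to share a review",
    "tool_called": "view_reviews",
    "parameters": {"sort": "newest", "filter": "all"},  # chosen by ChatGPT
    "output_preview": "Showing 10 reviews sorted by newest...",
}

def summarize(record: dict) -> str:
    """One-line summary for scanning a list of logged tool calls."""
    return (f"{record['user_prompt']!r} -> {record['tool_called']}"
            f"({record['parameters']})")
```

Scanning these one-liners is usually enough to spot the mismatch discussed next: a “share” prompt that triggered a “view” tool.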
[Sponsor Break: Naya One – 32:30] Today’s episode is brought to you by Naya One. In tech buying, speed is survival. How fast you can get a product in front of customers decides if you will win. If it takes you nine months to buy one piece of tech, you’re dead in the water. Right now, financial services are under pressure to get AI live. But in a regulated industry, the roadblocks are real. Naya One changes that. Their air-gapped, cloud-agnostic sandbox lets you find, test, and validate new AI tools much faster—from months to weeks. If you’re ready to accelerate AI adoption, check out Naya One at Nayaone.com/Akash.
Building Your Eval Set (33:15) Colin: And the second thing is to start to build up your evals or your golden sets. So let’s say for example that this was correct. We want this prompt to trigger this tool. I can add that to my set of evals, and I just click this button here, and I have three different types of evals, and this comes directly from OpenAI guidance.
So there’s a direct, an indirect, and a negative. Direct meaning that the user actually typed in the name of the product. So this would be like, “Canva, can you do X for me?” That’d be a direct request. Indirect would be they typed in something that’s relevant to the tool. So what we did here, “I want to share a review”—it’s not naming the tool or naming the application.
And then a negative eval would be where the user types in something completely unrelated like, “I want to go shopping this weekend,” and if it called your tool, that’d be a bad thing because you don’t want that to happen.
And so in this case we’d say that this one is an indirect. The user describes the outcome without naming the tool, and we’ll go ahead and add that to our evals.
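The three categories can be captured in a small golden set. This is a rough sketch: the prompts and tool names mirror the demo app in this episode and are illustrative, not a real eval API.

```python
# A minimal golden set following the direct / indirect / negative
# eval categories described above. Names are illustrative only.
GOLDEN_SET = [
    # Direct: the user names the app or tool outright.
    {"prompt": "Hospital Reviews Manager, show me the latest reviews",
     "kind": "direct", "expected_tool": "view_reviews"},
    # Indirect: the user describes the outcome without naming the tool.
    {"prompt": "I want to share a review",
     "kind": "indirect", "expected_tool": "share_review"},
    # Negative: unrelated request; no tool should fire at all.
    {"prompt": "I want to go shopping this weekend",
     "kind": "negative", "expected_tool": None},
]

def grade(actual_calls: list) -> float:
    """Fraction of cases where the observed tool call matched."""
    hits = sum(1 for case, actual in zip(GOLDEN_SET, actual_calls)
               if actual == case["expected_tool"])
    return hits / len(GOLDEN_SET)
```

For example, `grade(["view_reviews", "view_reviews", None])` reproduces the failure discussed later in the episode: the indirect “share” prompt triggered the view tool instead.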
Cool. And then last thing we have is the actual eval. So now we have one eval that we can run. We have this one set up as an indirect, and the way that I built this is there’s two ways to run the evals. So the first way is if we open it up here, you can run it on auto.
So what this does is it literally sends the same prompt and the same set of tools to an LLM and asks the LLM to decide which tool to call, and it can also decide to call no tools. And so this is a very quick way to test a bunch of different prompts at the same time. You can basically run your whole eval set through auto and you’ll see what will happen.
And so, for example, this one failed when we passed it back over to GPT-4o. And we’ll take a look at the reason why. So, let’s take a look at… cancel out of here. OK.
So when it said “I want to share a review,” we had the expected tool of “view reviews” because that’s what happened inside of ChatGPT, but this is telling us that the correct tool to use probably would have been “share review,” which makes sense. So we have these two different tools. One is called “view,” one is called “share.” Inside of ChatGPT what happened is it called “view,” but what we would have wanted is for it to call “share review,” and it’s picking up on that issue for us automatically.
And so this is a good example of: OK, in order to fix this, we have to go back into our tools and probably modify the description of the tools to be more accurate so ChatGPT has a better idea about when to use this “view reviews” tool, because it accidentally used this in the case where the prompt was “I want to share a review.”
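The auto-eval loop Colin describes can be sketched in a few lines: send the user prompt plus the available tool descriptions to a model and ask it which tool it would call, or none. Here `call_llm` is a placeholder for whatever model API you use, and the keyword stub below only exists so the sketch runs; neither is a real API.

```python
# Sketch of an "auto" eval: the same prompt and the same set of tool
# descriptions are sent to an LLM, which picks a tool or "none".
TOOLS = {
    "view_reviews": "Display hospital reviews with filtering and sorting options.",
    "share_review": "Share a hospital review with another user.",
}

def pick_tool(user_prompt, call_llm):
    tool_list = "\n".join(f"- {name}: {desc}" for name, desc in TOOLS.items())
    answer = call_llm(
        f"Tools available:\n{tool_list}\n\n"
        f"User said: {user_prompt!r}\n"
        "Reply with the single best tool name, or 'none'."
    )
    choice = answer.strip().lower()
    return choice if choice in TOOLS else None

# A trivial keyword stub stands in for the model so the sketch runs;
# it only looks at the "User said:" line, not the tool descriptions.
def stub_llm(full_prompt):
    user_line = next(l for l in full_prompt.splitlines()
                     if l.startswith("User said:"))
    return "share_review" if "share" in user_line else "view_reviews"
```

Running the whole golden set through a loop like this gives the quick, directional signal described above; it won’t match ChatGPT one-to-one, which is why manual runs still matter.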
Improving Tool Descriptions (35:23) Aakash: Yep, so let’s—what does that look like? Maybe we can just look at the full cycle of improving performance.
Colin: Absolutely, yeah. So, now that we’ve run the eval, I’ll show you one more thing in here. So it was an auto eval. The auto evals are a great way to get a quick kind of directional input, but it’s not necessarily gonna match one-to-one with what ChatGPT provides.
And so if you want to actually manually run your evals, you could basically build an eval set and then go through and type the prompt into ChatGPT, “I want to share a review,” and just log what happened. So that’s what this is here for—to literally go through one at a time and log what happens with each one.
Aakash: Nice.
Colin: Yeah, so let’s say we want to make that change. What we do is we go back into our application, our hospital reviews manager, and we can either prompt our way through this or we can just edit it manually. So in this case I’m gonna edit manually. I’m gonna go into the config. I’m gonna find “view reviews”—that’s my tool call—and we can see that this is the description that my LLM or my agent decided to write for this, which is “display hospital reviews with filtering, sorting, and sharing options.”
And so what likely happened here is because it has the word “sharing” in the tool description, when I said “I want to share a review,” it decided to call this one by accident. And so we’ll just say get rid of the word “sharing” just to clean that up. So we’ll say “filtering and sorting options.”
Aakash: So basically improving the metadata. If we think about it from an SEO sort of standpoint—they type in a keyword, they’re using the title and the metadata to match it—ChatGPT is doing the same thing with these MCP tools it has available to it. So we’re trying to give it the right metadata here.
Colin: Yeah, exactly, and these descriptions can be pretty verbose. I mean there are character limits, but you can put in things like examples of how to use the tool. For example, I built a spreadsheet tool before and it supported formulas, but I needed to tell ChatGPT what those formulas were so that it knew how to use those formulas inside the tool if it was gonna be writing any data to that spreadsheet.
So it’s not just about necessarily SEO, it’s really just: how should ChatGPT use or behave with that tool?
Aakash: Makes sense.
Colin: Yeah. So clean that up a little bit and then I’ll probably just add something here like, “This is intended to fetch existing reviews for the purpose of showing the user…” It’s probably not the best description, but trying to get more at the idea that this is not for sharing, this is for retrieving information.
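The fix amounts to a metadata edit. A rough sketch of the before and after descriptions follows; the dict shape is illustrative, not the exact config schema the tool uses.

```python
# Before: the word "sharing" in the description is what likely caused
# ChatGPT to call view_reviews for a "share" request.
before = {
    "name": "view_reviews",
    "description": ("Display hospital reviews with filtering, "
                    "sorting, and sharing options."),
}

# After: drop "sharing" and state explicitly what the tool is NOT for,
# so the model has clearer metadata to match against.
after = {
    "name": "view_reviews",
    "description": ("Display hospital reviews with filtering and "
                    "sorting options. Intended to fetch existing "
                    "reviews to show the user; not for sharing or "
                    "creating reviews."),
}
```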
Aakash: Yep, makes sense, modifying that metadata to just get it called at the right time. And then we’re gonna keep iterating on that and that’s how we have this end-to-end cycle on evals, and we can run those auto evals as you said. So, is there another category of evals then about how effective the reviews were if we pulled the right reviews? How would we write that category of evals?
Colin: Yeah, so that’s less about “did the tool get called based on the prompt,” but more around “did the user get the expected result” or “was the behavior good?” So very similarly, if we go back over into our observability, we can take a look at the logs.
And that’s really the best way to get a good idea of what’s happening—you can see again what the user requested, what tool got called, and then what some of the data was that basically got filled in. And using these logs, we can kind of get an idea for what happened.
Again, you kind of have to have more context on reading through these logs—what did you want to happen? And it’s the same thing with any type of eval. You have to have an idea of what ground truth is—what do we want to occur when a user types something in. And so again, you look at this data to get a feel for what did happen, but separately from that you’ll have to decide what you wanted to happen in whatever case.
The PM Role & Prototyping (38:58) Aakash: This is kind of expanding my conception of what a good AI prototype is. I think some people might have the tendency to want to ship the AI prototype like we did right at the beginning—all right, two prompts in, we’re good to go. Let’s ship it. We had our initial prompt, we changed it to add in some dummy data, good to go. Let’s see it.
But it seems like actually going through this eval process along the major categories of evals—here, the major categories were discoverability and result quality—tweaking it and improving it. This is gonna help you really understand the corner cases and edge cases, some of the things we used to have in a deeper PRD, and help you understand what’s gonna move the needle in this feature’s success or not.
Colin: Yeah, and a lot of these things, honestly, you can’t really predict—how is ChatGPT going to interpret the way that you wrote your tool description? And so you could spend a very long time trying to figure out what the best thing is, but really you should just test it and run evals against it and see what works rather than thinking about it for a long time.
And so, I totally agree—getting into the process of this type of iteration provides a lot more information than thinking about it or writing it in a PRD and then handing off to the engineering team. Because eventually your team’s gonna go through this iteration anyway. It’s just a matter of: are you getting through some of that on your own quickly, or you can obviously bring in your counterparts, but you have a mechanism to do it quickly versus the full handoff between teams back and forth, back and forth, which can take weeks or months or even longer.
The Itamar Gilat Critique (40:19) Aakash: So, I consider you one of the leading experts on AI prototyping and since we’re talking about it, I wanted to bring up this alternative view that I saw from Itamar Gilat a couple months ago. Went pretty viral on LinkedIn, where he talked about, well, what are all the other things that a PM could be doing? Researching the market, talking to customers, talking to stakeholders, talking to partners, looking at user and business data, identifying opportunities and threats, setting goals, evaluating ideas.
Sometimes I wonder, are we just endlessly expanding the PM role? What is the right way to think about the prioritization of this work that we’ve been going over so far versus some of the other work that Itamar has listed here?
Colin: Yeah, so to start, we’ll set aside the ChatGPT app side of things and just address this first. I think it’s a skill, the same way that a PM who knows how to use Figma is probably more useful in certain contexts, such as talking to design stakeholders or even spinning something up really quick to show to a customer or someone like that. You’re not dependent on other people to do every single kind of touch point or element for you.
And so I wouldn’t say that using Figma should be an extra line item in here. Using Figma is a skill that supports talking to customers and talking to stakeholders, you know what I mean? So they’re not independent things. Yes, these are the responsibilities of a PM, and then they have a way to do them, the same way that talking to customers involves some type of skill around interviewing, and they needed to learn that skill. Talking to stakeholders involves a lot of skill around stakeholder management and managing up and stuff like that.
I would say using some of these tools is complementary, so I personally wouldn’t advocate for AI prototyping to be an extra line item or vibe coding to be an extra line item on here. I think these are tools that we can use to support these ideas or these tasks. And that to me at least, it’s obvious that if it’s very difficult for you to visually communicate something, that that’s a great use case for AI prototyping.
If I want to explain, hey, this is how I think our AI product should work, I’m building some type of agent that’s gonna do some task—it could be hard for me to kind of explain that to my stakeholders or to my customers. And so, spinning up a quick prototype in whatever prototyping tool you like is an easy way for me to start to have that conversation and improve the fidelity of the information that I’m sharing. I can be like, this is kind of what I was thinking, does this resonate?
And so that’s how I would kind of have a rebuttal to this: it’s not an extra line item, vibe coding is not something a PM should do for no purpose. It should be related to some reason that they’re building that prototype.
Aakash: So, AI prototyping enhances some of the activities on this list. It’s the way you should think about it, and you shouldn’t necessarily think about not doing the stuff on this list. This stuff is important, but how can AI prototyping help you do some of this stuff better?
Colin: Yeah, exactly, and this isn’t really new. PMs have been trying to brainstorm and communicate ideas forever. So Balsamiq was popular for a long time. It’s the same thing. It’s just helping someone who’s not a designer communicate something visually to get the idea out of their head and onto some form of paper that people can see.
And so, yeah, I would say it’s literally for the exact same purpose. And so, again, if someone’s vibe coding first—if you’re a PM and you don’t have these other skills and the only thing you know how to do is vibe coding, I don’t think that that will be a way to be super successful in the long run. Maybe there’s some short-term gain because it’s popular at the moment, but these skills are super critical. And vibe coding can support some of those, or AI prototyping can support some of those.
Who Should Build ChatGPT Apps? (43:49) Aakash: Amazing. So I want to do some mind mapping together. What are the benefits for creating a ChatGPT app for your product? What would you put those major groups as?
Colin: I think we’ll classify this into two categories. I think there’s some benefits from the perspective of learning how to build agents basically or build tools that agents are interacting with. So there’s a career benefit or a skills benefit for an individual person—a PM, a designer, an engineer. That’s one classification.
I think the main one is enterprise-focused, to be honest. I think the vast majority of early adopters of this are actually gonna be large enterprises, not small companies. And the main thing is getting clicks or views. It’s basically growth. I want people to see my app, see my product, and ChatGPT has hundreds of millions of active users per week, and the intent when a user comes from ChatGPT is higher than the intent when they come in from SEO or another channel.
And so, every company on earth, given the proper tools to capitalize on that, will do so, I think. And so I think that’s really the main benefit: what is the right form factor for us to get in front of customers, get in front of users, and help them interact with our brand, interact with our company, so that we can pull them into our ecosystem.
Aakash: All right. And then the next part of this mind map I want to understand is who should be building a ChatGPT app. How would you create the major buckets or—if I’m a PM, how should I understand if I should be?
Colin: In a typical enterprise setting—we think about the Canva app or any of the ones that exist today—I would expect it to be like a pod, to be honest. So you’ll probably have a designer who needs to understand what are the form factors. And it’s actually for a designer, I think, an exciting place because it’s very unique. You have these little micro-apps that you can build. And each one can do a very small amount of things. You can build more than one if you want to, they can communicate back and forth. So understanding the form factor is pretty critical.
Second to that would be the engineering team—so how do we actually ship this thing? There are a couple of technical complexities—authentication here is quite a bit more complicated than regular authentication, so you need to make sure you get that right. And so engineering is gonna figure out how we actually get this into the world, how we support these different types of tool calls that are coming in.
And then lastly it would be the PM. And as a PM the reason I might choose to do this—what is the guiding light? It is growth. I would decide: is a ChatGPT app a good method for us to drive higher conversions from AI basically—AI search or AI chat. And maybe this is a priority that we have that we want to capture more market from OpenAI, from that type of search.
And so the PM would prioritize this as something that is relevant. And then they’ll also hopefully be involved in this process of building evals, shipping small incremental changes to the application, understanding how users are using that application, and then sharing back with anyone who cares about it internally, what’s happening with that application.
So, it’s really—it’s kind of its own form factor of software. It’s not like it belongs to one persona or group. In my mind, it would be a pod and then they’re gonna ship this together and each one should have some skill around this new form factor.
Evaluating the Opportunity (47:02) Aakash: So, if you’re a PM deciding whether this is an important opportunity, how do you decide that?
Colin: I think for now, definitely give it a try first of all, so that you have some familiarity with what the options are. So building full-screen applications, how that differs from building just a quick inline card.
And then the second thing I would say is really pay attention to what other people are doing. Larger companies like Target and Uber are coming into the space. When you interact with ChatGPT, is it pulling up Target for you? Is it pulling up Uber for you? Is it pulling up Coursera for you? And if it is, you can see in real time what the benefit is of having these apps.
And then lastly, the thing I would think about is: is this an opportunity to re-engage customers outside of your product? So for example, if 10% of your customers are using ChatGPT, they might not be logging into your application, like Coursera. But they could be using your micro-app and still getting benefit from your product and still feeling like they’re connected to your product.
And so I think there might be a value out here around—you kind of think about retention. You have to think about how it affects retention a little bit more, but something around that space of consumers or users interacting with your brand and your product without necessarily having to go directly into your app experience.
Aakash: Makes sense.
Ideas for Solo Builders (48:18) Aakash: All right, I could play with that arrow infinitely. So we got a little bit of this package here of a ChatGPT app if you’re building it for an existing product, here on the right. Now, I want to go to the other side on the left and talk about: what are the good ChatGPT app ideas to build if you are a solopreneur or a side project person?
Colin: To start with, I would think about unique ways that ChatGPT can interact with your application. So we saw a brief demo of that here—we had a little bit of a data issue in the background, but ChatGPT actually can fill in the data for you. It is the one deciding what to call your tool with.
And so a good example of this, going back to what I referred to earlier, is a spreadsheet application where ChatGPT kind of partners with me on it. So for example, a spreadsheet app that has financial modeling support. Some person drops in some financial data into ChatGPT and says, “Hey, can you help me model this?” And it pulls up the spreadsheet app, puts in the relevant formulas, generates some nice charts and graphs—all that kind of stuff that’s deterministic, so the user can go back and actually change the data on the spreadsheet and say, “Oh, you know, you got that number wrong, let me just quickly fix it.”
That’d be a small example of a utility or an application that is embedded with ChatGPT. It’s not just showing you stuff, it’s not a search tool—it actually has the ability to collaborate directly with ChatGPT in some form factor like a spreadsheet or a task list or a whiteboard or whatever.
You can imagine a ChatGPT that has memory of you and knows you really well and has access to all the tools that you want to use. Then you don’t need to hop into Miro or Google Sheets—you can do a lot of work very quickly directly with these embedded applications.
So I think there’s a lot of potential for these types of embedded apps to take over smaller use cases of where ChatGPT kind of falls short right now, but it’d be useful for ChatGPT to help you with these types of tasks.
Aakash: OK, so maybe a domain that ChatGPT is already strong in, like healthcare or legal or productivity or writing, but a use case within that that’s neglected.
Colin: Yeah, exactly. And you can think about any example out in the world where there’s an AI company building a product for this. Another good example is Gamma. Gamma is a very large company building an AI presentation tool. Theoretically, we could build a ChatGPT app that also makes presentations. And so you could provide a really great experience building presentation software inside of ChatGPT.
You maybe won’t be as good as Gamma. I expect probably not, but you also don’t have to be. You just have to be good enough that someone who’s already inside ChatGPT goes, “Oh yeah, this presentation is a good starting point.”
And so really the strength of the distribution with these embedded applications is what I think will kind of win the day there.
Aakash: Yeah, anything that might benefit from embedded distribution. What else? It seems like there was a lot of e-commerce examples.
Colin: Yeah, I think—well, I mean, there’s a ton of work going on right now in terms of shopping. So I think it’ll be pretty common that people like Target or other consumer-facing companies want to be in this space if people are searching for products inside of ChatGPT.
They want you to be able to build a cart, they want you to be able to check out, because they want to be in front of you the same way that they’re in front of you on Google or any other product.
It’ll be interesting to see if Amazon does this because Amazon has their own LLM activities going on, but I think Amazon would be an obvious example. Imagine you just hop into ChatGPT and say, “Hey, reorder me the Thursday order”—it fills out your cart for you from whatever you ordered last Thursday or whatever you recurringly get, and then you just buy it.
There are a lot of good examples in the e-commerce space that I think would be consumer-friendly.
Aakash: And then we saw Figma and Canva in there. So I guess those apply if you have any sort of media or content creation tool?
Colin: These are still early days. So I think Canva is the best example from a functionality perspective, so I’d encourage you to give it a try. But basically the idea here is you can use some mini version of the Canva app directly inside of ChatGPT. So it’s more fully featured than just showing you information—you can actually interact with the application, move stuff around—a mini version of Canva.
And again, you’re reliant on ChatGPT to help you do that. So rather than me clicking through everything or all that, I use ChatGPT as an agent that understands Canva. And so in some ways we’re moving towards a future where ChatGPT is this operating system or it’s the universal agent. And then these are all different applications that I can call or use as needed rather than every company building their own agents.
MCP: The Universal Protocol (52:01) Aakash: Yep, and all this is built on MCP. So theoretically if Claude or Gemini win they could also pull into these, or what about that?
Colin: Yeah, exactly. So this is probably the best part, icing on the cake to some degree: this started as an OpenAI initiative, this idea of apps inside of chat. But it’s built on MCP, which Anthropic is responsible for, and they pretty quickly amended the MCP protocol and standards. So now Claude is—today—working on the same thing.
You can see screenshots of it if you look around on Twitter or LinkedIn of the team sharing how these apps are gonna work inside of Claude. And so you’re not just building for one distribution channel, you’re actually building for any distribution channel that supports MCP. Which right now primarily is Claude and ChatGPT, but Lovable actually supports MCP, Cursor supports MCP. There’s a lot of tooling that supports MCP already.
Gemini does not, interestingly enough, but maybe they will at some point in time. It was just not the focus. But I think that there’s a potential future here where you can get yourself plugged into multiple different chat applications through one app that you’ve built on top of MCP.
Aakash: OK, so it currently isn’t in Gemini. So that is one sort of downside here, but Claude is working on it. It’s not released yet. Maybe by the time the podcast goes live, it will be, but…
Colin: Yeah, there’s basically the engineering builds of it. They’re working on it. It has been approved. It’s part of the MCP spec. It just needs to go through the actual development process at this point in time.
Aakash: OK. Is there anything else people need to know or add to this mind map to get a good understanding of ChatGPT apps?
Colin: I think this is pretty comprehensive. Obviously, as we talked about today, it’s still early days. I think maybe that’s the last thing I’ll mention: I don’t want to try to hype it up too much. It’s a cool form factor. It gives you a great experience in terms of testing and building AI apps without having all the infrastructure yourself, so it’s a great way to learn. But obviously there are fewer than a dozen companies that are currently partnered with ChatGPT.
And so I think a lot of this is gonna depend on OpenAI’s ability to execute. Can they actually get this marketplace over the line? Do people start to use these apps? What does the discovery experience really feel like and look like?
And so, yeah, just—I guess with a grain of salt, I’m super excited about this space, obviously. I think there’s a ton of potential, but it does depend on a couple of things getting across the finish line. But I’d say we’re like 70 to 80% of the way there, and I would guess by March we’ll know one way or the other if this is the case.
Aakash: So I think this is what you’re highlighting, that critical PM skill: is this an important opportunity? That’s what you guys all need to think about for yourself in your unique situation.
Colin’s Solopreneur Year (54:38) Aakash: That is our masterclass on ChatGPT apps for PMs—hopefully the best guide on YouTube that you have seen yet. Colin, I want to talk a little bit about you because you’re one of the most interesting men in the PM content and tech space.
You just finished a year as a solopreneur, so you were a PM for a long time. We all know you are very highly ranked on the Maven leaderboard, so that could be one thing you did with all your time, I imagine, and you’d be financially fine. But you didn’t really stop there. You did some experiments this year. You launched a podcast. You launched a couple of SaaS apps. Obviously Chippy, you’ve built. What is the pie chart of Colin’s time and attention and focus these days?
Colin: Yeah, so as mentioned, it’s been one year as of maybe a week or two ago, which is really exciting because I’ve actually wanted to work for myself for literally my entire career. So it’s a dream come true for me.
But yeah, in terms of time and attention, I would say it’s maybe 40% keeping the whole thing running, and I have some help with that, which is great. My wife actually helps me a ton with operations. I have a TA named Paulo who helps me with some of the core stuff. So I have some support with that.
And then the other like 60-ish, maybe 50% of my time is spent on new bets. And when I think about bets now, it’s actually changed over time. So partially it’s: is this gonna work commercially? And then the second part is: do I actually want to do it?
And sometimes I don’t know if I want to do it until I try. So podcast is a good example of that. I tried podcasting for a bit. I think I was creating interesting things, but the thing I think about is: am I gonna be in the top 5 to 10% of this thing? And if I’m not, I kind of drop it and I think of something else that I want to do.
And so for podcasting, I don’t think I’m gonna be in the top 5 or 10%. Maybe I’ll come back to it one day. But for now, it’s not really a bet that I have the same amount of conviction about. And so yeah, I tried podcasting for a bit. I’ve built probably 4 or 5 different SaaS apps this year. This is actually my second AI prototyping tool that I’ve built—my own prototyping tool. This one obviously catered to this use case, but I built one earlier in the year as well.
A lot of learning—obviously knowing how to build these tools takes a little bit of time, so learning how to build agents, learning how to run these systems, and yeah, testing other stuff to be honest. I spent time this year on a RAG app for a little while that I built from scratch. What else? I don’t know, a bunch of different stuff. I just try different things.
Obviously, I have a Substack, so it’s been a little bit scattered, to be honest, which I don’t like, but it is nice to try different things, fail quickly, and then move on, rather than just doing the one thing forever.
And I think—you didn’t ask this, but my end goal is to have a software product and continue teaching. And kind of balance those two things, but I’d love to have a software product that’s super cool, that’s valuable, that people want to use, so that’s what I’m trying to shoot for at this time.
Aakash: Got it. So you’re like a Marc Lou plus course instructor.
Colin: Sure, something like that, yeah. I just mess around with stuff a lot. I write about stuff when I find it interesting, mostly. And then that’s pretty much it. I do have to get better at marketing. This is an aside, but just being transparent, that’s one muscle that in the next year I’m hoping that I improve on because my marketing activities are very haphazard at the moment. And so I need to get more consistent at marketing and just showing up so people know I exist. But yeah.
Colin’s Tech Stack (59:02) Aakash: And what’s your stack for building these SaaS apps? How do you build the app we just saw today?
Colin: Yeah, actually this might be really interesting for you all. I built the UX entirely on Replit, with Gemini 3 and the new design mode they released recently. But in terms of actually shipping stuff, I use VS Code and Claude Code for the vast majority of codegen. I’m obviously using Git for version control. I use a database provider called Neon. I use a hosting platform called Render.
And then there’s lots of different libraries depending on what I’m doing. For the RAG app that I built, RAG is a whole world on its own that’s kind of complicated and hard to optimize. And so I was using this vendor called Voyage for the embedding models and different vendors for different things. So you end up with this whole stack of random stuff that you learned about and try to build and then maybe it works, maybe it doesn’t.
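For listeners curious what the retrieval core of a RAG app actually looks like, here’s a minimal sketch. This is purely illustrative, not Colin’s code: `fake_embed` is a toy stand-in for a real embedding model (the role a vendor like Voyage fills), and a production system would chunk documents properly and store vectors in a real database instead of a Python list.

```python
import math

def fake_embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model (e.g. one from Voyage).
    # Real embeddings are learned dense vectors; here we just hash
    # character trigrams into a small fixed-size, normalized vector.
    vec = [0.0] * 64
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % 64] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

class TinyRAGIndex:
    """Embed documents once up front, then rank them per query."""

    def __init__(self, docs: list[str]):
        self.docs = docs
        self.vectors = [fake_embed(d) for d in docs]

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        q = fake_embed(query)
        scored = sorted(
            zip(self.vectors, self.docs),
            key=lambda pair: cosine(q, pair[0]),
            reverse=True,
        )
        return [doc for _, doc in scored[:k]]
```

The retrieved passages would then be pasted into the LLM prompt as context. This is also where the "hard to optimize" part Colin mentions lives: chunk size, embedding model choice, and reranking all change which documents come back.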
But anyway, my core tools are Claude Code, VS Code, GitHub, and Render for building stuff.
Aakash: Why VS Code and not Cursor?
Colin: Yeah, so I don’t use Cursor. I use Claude Code predominantly. I use the Codex tool as well sometimes, and I just find the integration inside of VS Code is a little bit nicer for those tools than it is inside of Cursor. Cursor has its own AI obviously, and it tries to use that AI, and I don’t want to. I want to use Claude Code or Codex.
And so yeah, I don’t really use Cursor. I find Cursor at times kind of tempts me back because they release new features. For example, they released embedded websites, so you can interact with whatever web app you’re building inside of Cursor, and the AI has some context on it and can actually debug for you. But more often than not, those things don’t really move the needle for me.
What really moves the needle is quality of codegen. That’s like 95% of what I need and care about, and all the other stuff is just bells and whistles. And so right now for me, Claude Code is the highest quality codegen with the fastest pace. And so that’s my daily driver.
Closing Thoughts (1:01:16) Aakash: Fascinating stuff, man. We’re gonna have to have you back again. Maybe you can show us the stack of how people can build stuff. I’m sure there’s a million different other episode ideas we could come up with. You guys leave a comment below. Should we have Colin back on for a third episode?
By the way, for those who don’t know, he used to have our number one episode of all time when he did the top 5 AI prototyping tools. Since then we’ve managed to release some episodes that did even better, thank God for us as a podcasting team. But hopefully we can break the records with this one.
Drop a comment below what you liked about this episode, whether we should have Colin back on. Colin, thank you so much for dropping all this sauce.
Colin: Yeah, yeah, happy to be here.
Aakash: All right, everyone, see you later.
Final Thoughts (1:01:42) Aakash: I hope you enjoyed that episode. If you could take a moment to double-check that you have followed on Apple and Spotify podcasts, subscribed on YouTube, left a rating or review on Apple or Spotify, and commented on YouTube, all these things will help the algorithm distribute the show to more and more people.
As we distribute the show to more people, we can grow the show, improve the quality of the content and the production to get you better insights to stay ahead in your career.
Finally, do check out my bundle at bundle.aakashg.com to get access to 9 AI products for an entire year for free. This includes Dovetail, Mobbin, Linear, Reforge Build, Descript, and many other amazing tools that will help you as an AI product manager or builder succeed.
I’ll see you in the next episode.