What I Learned Losing the AI Builders Hackathon

Failure is simply the opportunity to begin again, this time more intelligently.
— Henry Ford

The Assignment

On May 8th, I competed at AI Builder Day, hosted by JustBuild in Salt Lake City. Around 100 builders showed up across multiple tracks. The JobNimbus track had one prompt: take a property address and produce roof measurements plus a quote-ready estimate. Stack was open. Prize was $10,000.

Simple brief. Genuinely hard problem. I was in.

What I Built

The UX was about as minimal as it gets: one address field, one button. You type an address, press Get Quote, and roughly four seconds later the roof area, pitch, material breakdown, and line-item estimate appear on the page with a Download Quote button at the bottom. The PDF it generates is two pages — a clean customer-facing quote on page one, a satellite image with the full per-segment calculation detail on page two.

Under the hood, the pipeline is: geocode the address → query the Google Solar API for measured pitch and area per roof segment → run deterministic Python math for materials and cost → render via ReportLab. Claude vision stays on standby as a fallback for any address the Solar API doesn't cover. The live demo is still up at jobnimbus.ahcomputing.com if you want to try it. The full build journey and iteration log are on GitHub.
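To make the "deterministic Python math" step concrete, here is a minimal sketch of what it could look like. It is an illustration, not the code from the repo: the function names, the 10% waste factor, and the three-bundles-per-square figure are assumptions picked for the example. The unit conversions (square meters to square feet, 100 sq ft per roofing square, degrees to rise-over-12) are standard.

```python
import math

SQFT_PER_SQM = 10.7639    # square feet in one square meter
SQFT_PER_SQUARE = 100     # one roofing "square" covers 100 sq ft


def pitch_rise_per_12(pitch_degrees: float) -> float:
    """Convert a pitch angle in degrees to rise per 12 units of run."""
    return 12 * math.tan(math.radians(pitch_degrees))


def material_estimate(segment_areas_m2, waste_factor=0.10, bundles_per_square=3):
    """Turn per-segment roof areas (m^2) into squares and shingle bundles.

    waste_factor and bundles_per_square are illustrative assumptions,
    not values taken from the actual project.
    """
    total_sqft = sum(segment_areas_m2) * SQFT_PER_SQM
    squares = total_sqft * (1 + waste_factor) / SQFT_PER_SQUARE
    return {
        "total_sqft": round(total_sqft, 1),
        "squares": round(squares, 2),
        # Bundles are sold whole, so round up.
        "bundles": math.ceil(squares * bundles_per_square),
    }
```

Because nothing here depends on a model, the same segment areas always produce the same quote, which is the point of keeping this step out of the LLM.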

The Iteration Story

I joined about ninety minutes late — assignments went out at 1:30 PM, and I didn't start building until 3:00. Out of a roughly 24-hour build window, I used four hours. I hit my accuracy goal, called it done, and walked away. That decision is on me, and it's the clearest lesson I'm taking out of this. I ran four rounds of iteration in those four hours, each one diagnosing a different failure mode.

Round 1 was Claude vision on a single satellite image at zoom 19. Average area error: 53%. The Humble, TX house came back nearly double the reference because Claude was measuring the entire cul-de-sac cluster instead of just the target structure. Pitch was defaulting to 6/12 on almost everything.
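For readers who want to reproduce this kind of number: I'm assuming "average area error" means the mean absolute percentage error against the reference area for each test house. A small sketch under that assumption, with a hypothetical function name:

```python
def mean_abs_pct_error(estimates, references):
    """Average of |estimate - reference| / reference, as a percentage."""
    if len(estimates) != len(references) or not estimates:
        raise ValueError("need two non-empty lists of equal length")
    errors = [abs(e - r) / r for e, r in zip(estimates, references)]
    return 100 * sum(errors) / len(errors)
```

One estimate that comes back at nearly double its reference contributes close to 100% on its own, so a single outlier can dominate the average of a five-house test set.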

Round 2 added a dual-zoom approach, a regional pitch floor table, and a sanity ceiling. Better — down to 34% average error — but the sanity threshold wasn't tight enough to catch the Humble outlier, and the 4/12 pitch floor overcorrected on steeper roofs.
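A rough sketch of the two guards from this round, to show how they can fail in the ways described. Every value here is hypothetical: the table entries, the ceiling ratio, and the idea of comparing against an `expected_sqft` figure are stand-ins, not the values or logic from the actual build.

```python
# Hypothetical floor table: minimum rise per 12 of run, keyed by state.
REGIONAL_PITCH_FLOOR = {"TX": 4.0, "UT": 4.0}
DEFAULT_PITCH_FLOOR = 4.0


def apply_guards(area_sqft, pitch_rise, state, expected_sqft, max_area_ratio=2.0):
    """Clamp pitch up to a regional floor and flag implausibly large areas.

    Returns (guarded_pitch, area_ok). expected_sqft is an assumed rough
    size for the structure; max_area_ratio is the sanity ceiling.
    """
    floor = REGIONAL_PITCH_FLOOR.get(state, DEFAULT_PITCH_FLOOR)
    guarded_pitch = max(pitch_rise, floor)
    area_ok = area_sqft <= expected_sqft * max_area_ratio
    return guarded_pitch, area_ok
```

With the ceiling at 2.0, an estimate at 1.9 times the expected size still passes, which is the kind of gap that lets a "nearly double" outlier through. Tightening the ratio catches it, at the cost of rejecting some legitimately large roofs.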

Round 3 tightened the zoom levels, tightened the sanity ceiling, and gave Claude a more detailed climate pitch table baked directly into the prompt. Average error dropped to 16%. Pitch was still wrong on every test house. The fundamental issue became clear: pitch estimation from aerial shadows is not reliably solvable with vision models. Sun angle, cloud cover, time of capture, and roof color all corrupt the shadow depth. No amount of prompt engineering was going to fix it.

Round 4 was the pivot. Instead of asking a vision model to guess what Google had already measured with better data, I wired in the Solar API's buildingInsights endpoint directly. That gave me real measured pitchDegrees and areaMeters2 per segment, derived from high-resolution aerial imagery. Average error: 2.1%. Three of five test houses landed within 1% of the commercial EagleView/Geospan reference data.
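Here is a minimal sketch of that lookup, split into building the request URL and pulling the per-segment fields out of the response. The endpoint path and the nesting of the response (solarPotential → roofSegmentStats → stats) reflect my reading of the public Solar API docs and should be checked against the current reference; the helper names are made up for this example, and the actual HTTP call is left out.

```python
from urllib.parse import urlencode

SOLAR_ENDPOINT = "https://solar.googleapis.com/v1/buildingInsights:findClosest"


def building_insights_url(lat, lng, api_key):
    """Build the request URL for the building closest to a lat/lng."""
    params = {
        "location.latitude": lat,
        "location.longitude": lng,
        "key": api_key,
    }
    return f"{SOLAR_ENDPOINT}?{urlencode(params)}"


def roof_segments(insights):
    """Extract (pitch_degrees, area_m2) pairs from a parsed JSON response.

    Returns an empty list if the response has no roof segment data,
    which is the cue to fall back to another method.
    """
    stats = insights.get("solarPotential", {}).get("roofSegmentStats", [])
    return [(s["pitchDegrees"], s["stats"]["areaMeters2"]) for s in stats]
```

An empty result from `roof_segments` is a natural trigger for the vision fallback, since it means the API had nothing measured for that building.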

At that point, I considered it done. That was a mistake.

Why I Lost

The judging criteria had five categories: Accuracy, Product, Experience, Craft, and Demo. I was competitive on the first two and only passable on the third. I did nothing for the last two.

Category	My position
Accuracy	2.1% avg error — strong
Product	Clean quote PDF, downloadable, live URL — solid
Experience	Fast, but bare. No visual feedback, no animation, no personality
Craft	Pipeline is clean; presentation of the code was not a priority
Demo	No showcase video. No story. Just the tool running.

The winning team's tool took about 30 seconds end-to-end versus my four. Their UI had loading animations, visual progress indicators, and a flow that made waiting feel intentional. Watching their showcase, I understood immediately: they hadn't just built a tool, they'd built an experience. What separated our entries was polish, not accuracy. We were probably pulling from the same Solar API by the end.

I'm used to building tools. I build things that work, things that are fast, things that are correct. What I didn't prioritize — and what the judges were explicitly scoring — was how it felt to use. Those are different skills, and I underweighted them here.

Speed alone isn't enough. Slower is better, within reason, if the user gets a richer experience along the way.

What I'm Taking Forward

I don't regret entering. Losing with something I actually built and iterated on is worth more than not entering at all. I shipped something real, it's still running, and I walked away with a clear-eyed picture of where I need to grow.

At the next hackathon: I use all of my allotted time. Hitting the accuracy goal is the floor, not the finish line. The remaining hours go toward presentation — loading states, visual feedback, a demo video, a narrative that makes someone want to root for what I built. I'll also go above the baseline requirements instead of stopping at them.

I'm genuinely glad I was in the room. The experience was invaluable — not just the build, but seeing what everyone else made, watching the winners showcase, and calibrating where I actually stand. That calibration is only possible if you show up and compete. Even if you lose.

The tool is still live. Try it.