Rewriting a carbon footprint platform to handle 6 million calculations

I just took on the challenge of leading a team of 6 software engineers to build a product that measures a company's carbon footprint across different sources: electricity, fuel, freight, employees, and more. The goal is to help companies understand and offset their environmental impact.
It sounds straightforward. It's not.
The inheritance
When I look at what we actually have as a product, it's far from what we need to build.
The company's first iteration was a Shopify plugin. The idea was simple: e-commerce businesses would install the plugin, customers would see a carbon calculator at checkout, and companies would get a dashboard to track their measurements and offsets.
They validated it. It didn't work. They pivoted to a larger platform targeting enterprise clients.
Now I'm standing in front of 6 engineers, most of them junior, with a legacy Python codebase built on AWS Lambda microservices. The architecture made sense for the plugin. It doesn't make sense for what we need to build now.
Designing for survival
The challenge is to design something new while salvaging what we can from the existing code.
My decision: build a central API that mediates all user requests. I call it the mediator. We're using Django because it's fast to prototype and flexible enough to integrate other technologies as we grow. The team already knows Python, so we can move quickly.
We're rewriting all the calculation algorithms as separate Lambdas using the Chalice framework. This is the hard part. Converting complex mathematical formulas into code while maintaining performance is not trivial. Carbon calculations depend on emission factors, conversion rates, distance matrices, and dozens of variables that change by region and fuel type.
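To make that concrete, here's roughly what a single calculation reduces to. The real Lambdas are Python, but the shape of the math is language-agnostic, so I'll sketch it in Go to match the examples later in this post. The names and fields are illustrative, not our actual code: activity data, converted into the right unit, multiplied by a region- and fuel-specific emission factor.

```go
package emissions

// EmissionFactor is kg of CO2e per unit of activity (per liter of diesel,
// per kWh, per tonne-km...). It varies by region and fuel type.
type EmissionFactor struct {
	Region   string
	FuelType string
	KgCO2e   float64 // kg CO2e per unit of activity
}

// Activity is one measured quantity, e.g. liters of diesel burned in São Paulo.
type Activity struct {
	Amount         float64        // in the unit the client reported
	UnitConversion float64        // rate from the reported unit to the factor's unit
	Factor         EmissionFactor // looked up by region and fuel type
}

// CO2e converts the activity into the factor's unit and applies the factor.
func (a Activity) CO2e() float64 {
	return a.Amount * a.UnitConversion * a.Factor.KgCO2e
}
```

The formula itself is trivial. The hard part is everything around it: looking up the right factor for the right region, fuel, and year, and doing that millions of times without falling over.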
For the frontend, we're building with Next.js and Tailwind CSS. Our UI/UX designer already created the entire design system, which makes it easy to transfer everything to Storybook and start building components. The sales team is active, so we're validating every feature with potential leads in real time.
Four months of intense work. The MVP is done.
The first real client
The timing is perfect. Sales just closed our first major contract. We're going to process all the data from TIM, one of Brazil's largest telecom companies, starting with their São Paulo operation.
Then the first spreadsheet arrives.
TIM wants us to calculate the last year of emissions retroactively. Just for one small third-party logistics operation. I open the file.
6 million freight records.
Each record requires multiple calculations. Each calculation depends on external variables. Many of them require API calls to get emission factors, distance data, fuel coefficients. Every call has latency.
We run the import.
Nothing processes. The system chokes. The MVP breaks on one of its most critical features: freight calculation.
The bottleneck
I call a meeting with the founders.
The problem is clear: Lambda overhead is killing us. Freight calculation alone involves at least 4 Lambdas that call 8 more. The cold starts, the invocation overhead, the serialization between functions. It all adds up. For a few hundred records, it's fine. For 6 million, it's impossible.
My proposal: extract the entire freight calculation from Lambda and rewrite it in Go.
Why Go? I need raw performance and easy concurrency. With Go, I can process 4 rows per second per thread and spin up parallel workers to handle the volume. Lambda's execution model doesn't give me that control.
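The back-of-envelope math: at 4 rows per second, a single thread needs roughly 1.5 million seconds, about 17 days, to get through 6 million records. A couple hundred workers running in parallel bring that down to a few hours. Here's a minimal sketch of that worker-pool shape; the record fields and numbers are illustrative, not our production code.

```go
package main

import (
	"fmt"
	"sync"
)

// FreightRecord is one row from the spreadsheet (illustrative fields).
type FreightRecord struct {
	ID       int
	Distance float64 // km
	Weight   float64 // tonnes
	Factor   float64 // kg CO2e per tonne-km
}

// calculate does the per-record work; in the real service this is where
// emission factors, distances, and fuel coefficients come in.
func calculate(r FreightRecord) float64 {
	return r.Weight * r.Distance * r.Factor
}

func main() {
	const workers = 200

	jobs := make(chan FreightRecord)
	results := make(chan float64)

	// Spin up a fixed pool of workers pulling from the same job channel.
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range jobs {
				results <- calculate(r)
			}
		}()
	}

	// Close the results channel once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Feed the queue; in production the rows come from the parsed spreadsheet.
	go func() {
		for i := 0; i < 1000; i++ {
			jobs <- FreightRecord{ID: i, Distance: 120, Weight: 2.5, Factor: 0.1}
		}
		close(jobs)
	}()

	var total float64
	for co2e := range results {
		total += co2e
	}
	fmt.Printf("total: %.2f kg CO2e\n", total)
}
```

The real service persists results and reports progress instead of summing into one number, but the concurrency pattern is the same: a bounded pool of goroutines, a channel feeding them rows, and backpressure for free.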
The founders agree. We have a client waiting.
Three more months
We spend the next three months rewriting the freight calculation engine in Go.
It's not just the calculation logic. We build a WebSocket server, also in Go, that pushes real-time status updates to the frontend. Users can see exactly where their calculation is: how many records processed, estimated time remaining, any errors encountered.
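A rough sketch of what that status push can look like, assuming the gorilla/websocket package; the endpoint, payload fields, and the simulated progress loop are illustrative, not our actual server.

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/gorilla/websocket"
)

// Progress is the status payload pushed to the frontend (illustrative fields).
type Progress struct {
	JobID     string `json:"job_id"`
	Processed int    `json:"processed"`
	Total     int    `json:"total"`
	Errors    int    `json:"errors"`
}

var upgrader = websocket.Upgrader{
	// In production, restrict origins instead of allowing everything.
	CheckOrigin: func(r *http.Request) bool { return true },
}

func progressHandler(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		log.Println("upgrade:", err)
		return
	}
	defer conn.Close()

	// In the real service these updates come from the calculation workers;
	// here we just simulate a job ticking forward.
	for processed := 0; processed <= 100; processed += 10 {
		p := Progress{JobID: "demo", Processed: processed, Total: 100}
		if err := conn.WriteJSON(p); err != nil {
			log.Println("write:", err)
			return
		}
		time.Sleep(time.Second)
	}
}

func main() {
	http.HandleFunc("/ws/progress", progressHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```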
We add Kafka for messaging between services. The calculation workers pull jobs from the queue, process them, and publish results. The system is decoupled, scalable, and light.
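And a simplified sketch of one calculation worker on the Kafka side, assuming the segmentio/kafka-go client; topic names and message shapes are illustrative.

```go
package main

import (
	"context"
	"encoding/json"
	"log"

	"github.com/segmentio/kafka-go"
)

// FreightJob and FreightResult are illustrative message shapes.
type FreightJob struct {
	ID       int     `json:"id"`
	Distance float64 `json:"distance_km"`
	Weight   float64 `json:"weight_t"`
	Factor   float64 `json:"factor_kg_co2e_per_tkm"`
}

type FreightResult struct {
	ID   int     `json:"id"`
	CO2e float64 `json:"co2e_kg"`
}

func main() {
	ctx := context.Background()

	// Pull jobs from the shared consumer group.
	reader := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		GroupID: "freight-workers",
		Topic:   "freight.jobs",
	})
	defer reader.Close()

	// Publish results for downstream services (persistence, progress updates).
	writer := &kafka.Writer{
		Addr:  kafka.TCP("localhost:9092"),
		Topic: "freight.results",
	}
	defer writer.Close()

	for {
		msg, err := reader.ReadMessage(ctx)
		if err != nil {
			log.Fatal("read:", err)
		}

		var job FreightJob
		if err := json.Unmarshal(msg.Value, &job); err != nil {
			log.Println("bad message:", err)
			continue
		}

		result := FreightResult{ID: job.ID, CO2e: job.Weight * job.Distance * job.Factor}
		payload, _ := json.Marshal(result)

		if err := writer.WriteMessages(ctx, kafka.Message{Value: payload}); err != nil {
			log.Println("write:", err)
		}
	}
}
```

Because the workers sit in one consumer group, Kafka spreads the partitions across however many of them we start. Scaling up is mostly a matter of starting more workers.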
Finally, we're ready to test with the same spreadsheet that broke us before.
We run it.
Hours pass. The progress bar moves steadily. No crashes. No timeouts. No memory explosions.
It works.
What I learned about technical decisions
The easy choice would have been to optimize the Lambdas. Add more memory. Increase timeouts. Batch the requests. We could have spent months squeezing incremental improvements out of an architecture that wasn't designed for this workload.
Instead, we rewrote the critical path in a different language.
This isn't always the right call. Rewrites are expensive. They introduce new bugs. They require the team to learn new tools. But sometimes the architecture itself is the constraint. No amount of optimization will fix a fundamental mismatch between what you're building and how you're building it.
The decision framework I'm using now:
- Identify the actual bottleneck, not the perceived one
- Ask whether optimization can solve it or if the constraint is architectural
- If architectural, isolate the problem and rebuild only that piece
- Choose the right tool for that specific problem, even if it's different from the rest of the stack
We didn't rewrite the entire system in Go. We didn't throw away Django or the Lambdas that work fine for other calculations. We identified the one piece that couldn't scale and rebuilt it with the right tool.
The mediator still mediates. The Lambdas still calculate electricity and fuel. The frontend still runs on Next.js. But when 6 million freight records come in, they flow through a Go service designed specifically for that job.
Sometimes the best architecture is the one that lets different parts of your system be built differently.