8080: The next-generation AI inference cloud

Intelligence, everywhere.

Introduction

“Give me a place to stand, and I shall move the Earth.” – Archimedes

Technology is the lever with which humanity moves the world, and AI might be our biggest lever yet. It has the potential to change everything.

But there’s one huge challenge facing the next Bezos, Page, Andreessen, or Zuckerberg of the AI era: intelligence is still too slow, too expensive, and too fragmented. Modern AI chains are multi-stage, multi-pass, and multi-modal, bouncing data across CPUs, GPUs, and the network. Latency and cost, not model quality, now set the ceiling on what builders can ship.

We started 8080 to empower developers to create the world-moving companies of the AI era. 8080 is the first cloud built from the ground up for AI-native applications. We’re deploying mixed-compute racks that colocate purpose-built inference ASICs with CPUs, GPUs, and storage at the edge. By keeping every stage of a pipeline in-rack and just milliseconds from your core app servers or your end-users, we unlock applications that simply couldn’t exist before.
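To make that concrete, here is a minimal sketch of the kind of multi-stage chain we mean. This is not 8080’s API; the infer type and model names are hypothetical stand-ins for in-rack model calls. The point is the shape: three dependent inference stages that, on a conventional cloud, would each pay a full network round trip.

```typescript
// Hypothetical stand-in for an in-rack inference call; not 8080's API.
type Infer = (model: string, prompt: string) => Promise<string>;

// A three-stage chain: classify, draft, moderate. Each stage feeds the
// next, so inter-stage latency is paid three times over. In-rack, each
// hop costs microseconds instead of a cross-network round trip.
async function answerTicket(infer: Infer, ticket: string): Promise<string> {
  const intent = await infer("classifier-small", `Classify the intent of: ${ticket}`);
  const draft = await infer(
    "generator-large",
    `Intent: ${intent}. Draft a reply to: ${ticket}`,
  );
  const verdict = await infer("moderator-small", `Reply OK or FLAG for policy: ${draft}`);
  if (!verdict.trim().startsWith("OK")) {
    return infer("generator-large", `Rewrite this reply to fix policy issues: ${draft}`);
  }
  return draft;
}
```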

“If I have seen further, it is by standing on the shoulders of giants.” – Sir Isaac Newton

We envision a world where intelligence is everywhere, incorporated into every object, making human life easier and better every second of every day. Intelligence that is too cheap to meter and too fast to notice.

The Googles and Amazons of the AI era haven’t been built yet. They will be built on 8080, and because of 8080. We’ll be the shoulders upon which they stand.

High-performance inference

Next-generation inference ASICs are the cornerstone of our infrastructure. These chips run inference 100x faster than today’s GPUs and 2000x+ more efficiently on a tokens-per-watt basis. 8080 will be the first cloud built from the ground up around this transformative technology.

Intelligence, everywhere (the future of software)

We talk of companies being “AI-native,” but few have really scratched the surface. AI can do so much more. When model inference and the infrastructure surrounding it are fast and cheap enough, intelligence can be present in every layer of the stack and even in every function.
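As one sketch of what “intelligence in every function” could look like, consider an ordinary utility that consults a tiny model inline. The infer type and model name here are hypothetical; this pattern only makes sense once a call returns in well under a millisecond and costs effectively nothing, which is exactly the regime we are building for.

```typescript
// Hypothetical stand-in for a sub-millisecond in-rack model call.
type Infer = (model: string, prompt: string) => Promise<string>;

// An "intelligent" validator: instead of a brittle regex zoo, ask a tiny
// model. Only viable when inference is too fast and cheap to notice.
async function isPlausibleAddress(infer: Infer, raw: string): Promise<boolean> {
  const verdict = await infer(
    "tiny-validator", // hypothetical model name
    `Answer YES or NO: is this a plausible postal address?\n${raw}`,
  );
  return verdict.trim().toUpperCase().startsWith("YES");
}
```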

We want to support the most ambitious and perhaps crazy applications that will redefine “AI-native.” Somewhere, a dev is dreaming about adding AI into every frontend page render like some sort of intelligent CDN, another is dreaming about building a new Salesforce or Amazon that uses AI to do everything from search to generation to database operations, and a third is dreaming of something completely new that will change everything.

That future is ready to compile — build it on 8080.


Working at 8080

We are not building a normal company. We’ve done that before, and we believe that in the AI era there must be a better way. We are designing 8080 to accomplish our mission and are capping headcount at 10 until we reach $100M in revenue. That might not be possible, but we are going to try.

Open Roles

If you’re interested in joining us, send us an email at join[at]8080[dot]io.

We’re looking for partners who want to build for building’s sake. Every partner is a full-stack builder first, but each also brings, whether from experience or passion, expertise that augments the rest of the team. We’re especially looking for partners with expertise in the following areas:

Infrastructure & Systems

North Star: server utilization and latency. Constructing state-of-the-art LLM inference infrastructure from scratch that handles millions of requests per second, maximizes hardware utilization, and intelligently routes each request to the optimal edge PoP for the lowest possible latency. This includes designing and implementing the global routing engine that decides, in microseconds, where every request should execute. Leveraging expertise in high-performance, concurrent, and distributed systems; proficiency in systems programming languages like Rust, C++, or Zig; and experience with Postgres, AWS, Redis, Kafka, Zipkin, or Jaeger to architect a robust, scalable backend that integrates seamlessly with novel hardware, edge datacenters, and API services.
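As a toy illustration of that routing decision (the production engine would be written in a systems language and decide in microseconds; the PoP fields below are hypothetical telemetry), the core idea is to minimize estimated completion time rather than network distance alone:

```typescript
// Hypothetical per-PoP telemetry; a real engine's inputs would be richer.
interface Pop {
  id: string;
  rttMs: number;        // measured round-trip time to the client
  queueDepth: number;   // requests queued on the rack's accelerators
  tokensPerSec: number; // current serving throughput
}

// Estimated completion time: network RTT plus a rough queueing estimate
// derived from queue depth, expected output length, and throughput.
function estimateMs(pop: Pop, expectedTokens: number): number {
  const queueMs = ((pop.queueDepth + 1) * expectedTokens * 1000) / pop.tokensPerSec;
  return pop.rttMs + queueMs;
}

// Route each request to the PoP with the lowest estimate.
// Assumes at least one PoP is available.
function route(pops: Pop[], expectedTokens: number): Pop {
  return pops.reduce((best, pop) =>
    estimateMs(pop, expectedTokens) < estimateMs(best, expectedTokens) ? pop : best,
  );
}
```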

Finance & Operations

North Star: cost per token and revenue capacity. Managing the operations and finances of a company that is rapidly scaling hardware infrastructure and obsessed with keeping customer costs as low as possible. Controlling the end-to-end flow of capital, from equity financing to debt leverage to capex to opex to pricing strategies and customer contracts. Building automated systems to scale to hundreds of millions in revenue with very few people.

Developer Experience

North Star: time to value. Crafting and enhancing all aspects of developer tooling and experience—from CLIs, documentation, and libraries to demos and community engagement. Building automation to support millions of developers, leveraging a passion for improving the ease with which they can build, thereby fostering a vibrant developer community.


FAQ

When will you launch?

As we build out capacity, we’ll start bringing on select customers in the Fall of 2025.

What kind of performance can I expect?

Metric                         Est.
Input Tokens Per Second        300,000
Output Tokens Per Second       30,000
Time to First Token (Metal)    50 µs
Time to First Token (Cloud)    20 ms

How much will it cost?

Our goal is to make intelligence too cheap to meter. Right now, we’re targeting a price below $0.05 per million tokens, input or output, fine-tuned or not. At that rate, an application generating a billion tokens a month would pay under $50.

Why are you called 8080?

How can I learn more?

We’ll be adding more detail here as we get closer to launch. Until then, you can add your email here to stay in touch, and follow us on X.