I recently built a Harvest clone in 18 minutes, a Trello clone in 19 minutes, and a Confluence clone in 16 minutes. All three were generated entirely by Claude Opus 4.6 from requirements documents. All run in Docker. All work.
But the interesting thing isn't that AI can write code. It's what happens to your relationship with code when regenerating an entire application costs less than a cup of coffee and takes less time than drinking one.
The Shift
Building apps and services is no longer about engineers writing code. It's about writing really good descriptions of apps and services -- requirements documents and technical design documents -- and then letting AI generate the entire application. The code becomes an intermediate artifact, like object files during compilation. You don't edit object files. You edit the source and recompile.
We're approaching the same inflection point for applications themselves. The requirements document is the source. The running application is the compiled output. The code in between is just a transient byproduct of the build process.
What I Mean by Ephemeral
I've started treating some of my applications as ephemeral -- meaning that today's version of the app is whatever was generated from the latest version of the requirements. Want to add a new screen? Don't open the codebase and hunt for the right component file. Update the requirements document and regenerate the entire application.
This would have sounded insane a year ago. But the economics have changed. When generating 7,000 lines of working code costs a few dollars in API tokens and takes 19 minutes, the expense of regeneration is no longer material. The code is disposable. The requirements are the investment.
Here's the workflow I've settled into: I have a requirements document and a technical design document for each application. When I want to change something, I update the requirements, hand both documents to Claude, and tell it to build the app. It does. I test it. If something's wrong, I refine the requirements and regenerate. The code never accumulates the kind of entropy that makes traditional codebases increasingly painful to maintain over time, because there is no long-lived codebase. Every build is a clean room.
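For concreteness, here's a minimal sketch of what that regeneration step can look like, assuming the Anthropic Python SDK. The file names, prompt wording, and model id are placeholders, and in practice the generation runs as an agentic coding session that writes many files and runs its own tests, not a single API call -- treat this as the shape of the loop, not a working one-shot generator.

```python
# Schematic sketch of the "regenerate from requirements" step.
# Assumes the Anthropic Python SDK; file names and model id are placeholders.
import pathlib
import anthropic

requirements = pathlib.Path("requirements.md").read_text()
design = pathlib.Path("technical_design.md").read_text()

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-opus-4-20250514",   # placeholder model id
    max_tokens=8192,
    messages=[{
        "role": "user",
        "content": (
            "Build the application described below. "
            "Treat the requirements as the source of truth.\n\n"
            f"# Requirements\n{requirements}\n\n# Technical Design\n{design}"
        ),
    }],
)
print(response.content[0].text)  # generated output, to be written to disk by the agent
```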
The Requirements Are the Product
This inverts a relationship that's been stable for decades. In traditional software development, the code is the product and the spec is a planning artifact that gets stale the moment development begins. In ephemeral development, the requirements document is the product. The code is a disposable rendering of that document, regenerated as needed, potentially by a different AI model each time.
This means version control shifts too. You still version the code -- it's useful for debugging and diffing between generations. But the requirements document is your source of truth. It's what you invest time in, iterate on, and protect.
Writing Requirements Is the Hard Part
If the code is ephemeral, then the quality of the requirements determines the quality of the application. This is where the real skill lies now.
I posted on LinkedIn recently about my process for this:
When I am creating something new, something I really want to turn out well, I give the AI a short description of the app I want to build and then ask it to assume the role of an analyst and interview me. I have it ask me as many questions as needed to get a full and complete set of requirements including UI, accessibility, security, authentication, compliance, onboarding, workflows, admin UI, frameworks, and anything else. The AI then begins a very long and tedious process of questioning and it is very thorough and asks excellent questions. Often, it asks me about design details or edge cases I hadn't even thought of. This process can take hours but, in the end, it generates the finest requirements document you could hope for. That document is the deliverable. This is where I invest my time because I now know that I can regenerate the code at any time.
The AI-as-analyst pattern is powerful because it forces completeness. A human writing requirements alone will skip things they consider obvious -- error states, empty states, edge cases in date handling, what happens when the user clicks Back, what the loading state looks like. The AI asks about all of it. It's like pair programming, except for requirements instead of code.
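If you want to try the pattern, here's roughly what the kickoff prompt looks like. The wording below is illustrative, not canonical:

```python
# A hedged example of an AI-as-analyst kickoff prompt.
# The exact wording is illustrative; adapt it to your own app.
KICKOFF_PROMPT = """\
I want to build <one-paragraph description of the app>.

Act as a business and technical analyst and interview me. Ask as many
questions as you need, a few at a time, to produce a complete requirements
document: UI, accessibility, security, authentication, compliance,
onboarding, workflows, admin UI, frameworks, data model, error and empty
states, and anything else a thorough analyst would cover.

Do not start writing requirements until you have no open questions.
When the interview is done, produce the full requirements document.
"""
```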
For the Trello clone, I took this even further: I had Claude write the requirements document and the technical design document, then build the application from both. My total input was two prompts. The AI interviewed itself, designed the architecture, and then implemented it. The result was a 6,800-line application with 52 tests, drag-and-drop, markdown editing, dark mode, and a command palette. From two sentences of human input.
The Language Question
If requirements documents become the primary input to software development, an uncomfortable question follows: does this only work in English?
Today's leading AI models -- Claude, GPT, Gemini -- were trained predominantly on English-language data, with programming tutorials, Stack Overflow answers, API documentation, and open-source codebases that are overwhelmingly in English. When I write a requirements document in English, the model draws on that vast training corpus to infer intent, apply best practices, and generate idiomatic code.
But what about a product owner in Tokyo writing requirements in Japanese? Or a startup in São Paulo writing in Portuguese? The models do understand these languages and can generate code from non-English prompts. But the quality gap is real. English-language requirements benefit from tighter alignment with the model's training distribution. A requirement like "the dashboard should lazy-load widgets as the user scrolls" maps directly to patterns the model has seen thousands of times in English-language React tutorials. The same requirement in Japanese may produce correct code, but the model has seen fewer examples of that specific pattern described in Japanese, so it may make different -- sometimes worse -- architectural choices.
This is a temporary problem, not a permanent one. Training datasets are becoming more multilingual with every generation. And the requirements document itself is a structured, technical artifact -- closer to a specification than to prose literature. The more structured and precise the document, the less the natural language matters. A well-organized requirements document with clear data models, explicit workflows, and unambiguous acceptance criteria will produce good results regardless of whether the headings are in English, Japanese, or Spanish.
That said, for now, English remains the lingua franca of AI-assisted development, just as it has been the lingua franca of programming itself. If you're writing requirements in another language and getting inconsistent results, consider writing the technical sections -- data models, API specifications, architecture decisions -- in English, even if the feature descriptions and user stories are in your native language. This hybrid approach gives the model the best of both worlds: your domain expertise expressed naturally, and technical precision in the language the model knows best.
The Compilation Analogy
Think about how software compilation works today. An engineer writes source code in a high-level language. They click Build. A compiler transforms their code into machine code -- a lower-level representation that the machine can execute. The engineer never opens the compiled binary in a hex editor to make changes. If there's a bug, they fix the source code and recompile. If they want a new feature, they write more source code and recompile. The compiled output is ephemeral -- regenerated on every build, never manually modified, treated as a disposable artifact of the build process.
Now replace "source code" with "requirements document," "compiler" with "AI model," and "machine code" with "application source code." The workflow is nearly identical:
| Traditional Compilation | Ephemeral App Generation |
|---|---|
| Engineer writes source code | Designer/engineer writes requirements |
| Clicks "Build" / "Compile" | Clicks "Generate" / hands to AI |
| Compiler transforms source to machine code | AI transforms requirements to application code |
| Tests the compiled executable | Tests the generated application |
| Finds a bug → edits source, recompiles | Finds an issue → edits requirements, regenerates |
| Wants a new feature → writes more source, recompiles | Wants a new screen → describes it in requirements, regenerates |
| Never edits the compiled binary | Never edits the generated code |
| Build time: seconds to minutes | Build time: minutes (soon: seconds) |
The parallel is striking because it's not a metaphor -- it's the same pattern at a different level of abstraction. In both cases, a human works in a high-level representation (source code or requirements), a tool transforms it into a lower-level representation (machine code or application code), and the lower-level representation is treated as disposable output. The discipline is the same: invest your time in the input, not the output. Trust the build process. If the output is wrong, fix the input and rebuild.
The only real difference today is speed. Compilation takes seconds. Generation takes minutes. But that gap is closing fast.
The analogy isn't perfect, and one oddity deserves attention: compilation is deterministic; generation is not. If I take the same requirements document and generate an application twice, I get two different codebases. Different variable names, different component structures, sometimes different libraries. Both implement the same requirements. Both work. But they're not the same code.
Why LLMs Produce Different Code Each Time
This non-determinism isn't a bug -- it's a fundamental property of how large language models work. An LLM doesn't "look up" the correct code for a given requirement the way a compiler looks up the correct machine instruction for a given operation. Instead, it generates code one token at a time, with each token selected probabilistically from a distribution of likely next tokens. The model assigns probabilities to thousands of possible continuations at each step, and a sampling process -- controlled by parameters like temperature -- introduces randomness into which token is actually chosen.
When the model is deciding what to name a variable, there might be a dozen reasonable options: boardList, boards, allBoards, boardData. Each has a similar probability. The one that gets selected depends on the random seed at that moment. And that single early choice cascades: if the variable is called boardList instead of boards, every subsequent reference to it throughout the codebase will differ. Multiply this by thousands of naming decisions, structural choices, and library preferences, and you get a codebase that's functionally equivalent but structurally distinct on every generation.
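Here's a toy sketch of that sampling step. The candidate names and scores are invented, but the mechanics -- softmax over scores, temperature scaling, random selection -- are the real ones:

```python
# Minimal sketch of temperature-controlled sampling over next-token candidates,
# a toy model of why identical prompts yield different code.
# Candidate names and logits are made up for illustration.
import math
import random

def sample(candidates: dict[str, float], temperature: float = 0.8) -> str:
    """Pick one candidate token from raw scores (logits) via softmax sampling."""
    if temperature == 0:                      # greedy: always the top candidate
        return max(candidates, key=candidates.get)
    weights = [math.exp(logit / temperature) for logit in candidates.values()]
    return random.choices(list(candidates), weights=weights, k=1)[0]

variable_name_logits = {"boardList": 2.1, "boards": 2.0, "allBoards": 1.7, "boardData": 1.5}
print(sample(variable_name_logits))  # different runs can print different names
```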
Even with temperature set to zero (greedy, nominally deterministic sampling), different runs can produce different outputs, because floating-point arithmetic on GPUs is not perfectly deterministic and results can depend on how requests are batched during inference. In practice, no two generations of a non-trivial application will be identical.
This is genuinely strange if you come from a traditional engineering mindset. Compile the same C file twice and you get identical binaries. Generate the same app twice and you get two different apps that do the same thing. It's as if you had a compiler that produced functionally equivalent but structurally distinct machine code on every run.
In practice, this matters less than you'd think. If the code is truly ephemeral -- if you never edit it directly and always regenerate from requirements -- then the specific variable names and component boundaries don't matter. What matters is that the generated application satisfies the requirements and passes the tests. The code is an implementation detail of the build process, and like all implementation details, you shouldn't depend on its specifics.
But it does mean that diffing between generations is sometimes meaningless. You can't always look at a git diff between Tuesday's generation and Wednesday's generation and understand what changed, because the AI might have restructured half the codebase for no functional reason. The diff you care about is between the requirements documents, not the code.
The Consistency Problem
There's a practical tension here worth noting. While the code itself can vary freely between generations, certain aspects of the application shouldn't change unless the requirements change. Users expect branding to be consistent -- the logo in the same place, the same color scheme, the same navigation structure. If you regenerate the app to add a new report page and the sidebar navigation moves from left to right, or the primary button color shifts from blue to green, that's a regression even though nothing about navigation or branding changed in the requirements.
Today's models don't guarantee this kind of visual and structural consistency between generations. A future improvement to the ephemeral model would be deterministic anchoring of certain application properties -- layout conventions, branding elements, navigation patterns, component styling -- so that these remain stable across regenerations unless explicitly changed in the requirements. Think of it as a design system that the AI must respect, baked into the requirements as constraints rather than suggestions. We're not there yet, but it's a solvable problem, and solving it would make the ephemeral model dramatically more practical for applications with real users.
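As a hypothetical illustration, the anchored properties might be expressed as a design-token manifest that the requirements reference as hard constraints. Every name and value below is invented:

```python
# Hypothetical design-token manifest pinned by the requirements so that
# branding and layout survive regeneration. All names/values are illustrative.
DESIGN_CONSTRAINTS = {
    "layout": {"navigation": "left sidebar, fixed", "max_content_width": "1200px"},
    "branding": {"primary_color": "#2563EB", "logo_position": "top-left"},
    "components": {"primary_button": "filled, primary color, 8px corner radius"},
    "typography": {"font_family": "Inter, sans-serif", "base_size": "16px"},
}
```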
Despite that gap, the workflow keeps converging with traditional compilation: the layer of abstraction that engineers primarily work in is moving up from code to requirements.
The Timeline
Right now, regenerating a full-stack application takes 15-20 minutes with Claude Opus 4.6. That's fast enough to be practical -- I regenerate my apps today -- but it's not fast enough to feel like compilation. You can't iterate at the speed of thought when each cycle takes 20 minutes.
But consider the trajectory:
- 2024: Generating a working app from a prompt was unreliable. It usually needed significant manual fixes. You couldn't walk away.
- 2025: Generating a working app from a detailed requirements document takes 15-20 minutes and usually works on the first try. You can walk away.
- 2026-2027: Build times will compress as models get faster and inference costs continue dropping. Speculative: full app regeneration in 2-5 minutes.
- Beyond: Regeneration in the time it takes to refill your coffee mug. Seconds, not minutes.
When regeneration takes seconds, the workflow changes fundamentally. Product owners and designers will work directly in requirements documents. They'll describe a new screen, click Build, and see it running. The feedback loop between intent and result will be almost instantaneous. The distinction between "designing" an app and "building" an app will blur into nothing.
What Happens to Software Engineers?
Software engineers will still matter. But the job description shifts.
The engineers who thrive will be the ones who are also excellent writers of requirements. Not just "can write a user story" but genuinely skilled at specifying complex systems: data models, state machines, error handling, security boundaries, performance constraints, accessibility standards, edge cases. The ability to think precisely about what software should do -- and articulate it completely enough that an AI can build it -- becomes the core skill.
Engineers also remain essential for:
- Architecture decisions that requirements documents can't fully capture -- choosing between event-driven and request-response, deciding on consistency models, evaluating infrastructure tradeoffs
- Debugging generated code when the AI produces something subtly wrong -- understanding why the code doesn't match the intent requires deep engineering knowledge
- Performance optimization when the generated code is correct but slow -- profiling, identifying bottlenecks, and specifying performance requirements precisely enough that the next generation avoids those bottlenecks
- Security review of generated code -- AI models can and do produce code with security vulnerabilities, and catching these requires the same expertise it always has
- Infrastructure and deployment -- Terraform configurations, CI/CD pipelines, monitoring, alerting. The AI can generate these too (and does, in my workflow), but someone needs to understand what's being provisioned and why
The engineers who struggle will be the ones whose primary value is translating requirements into code -- because that's exactly the task being automated. If your job is taking a Jira ticket and writing the React component it describes, the timeline for that job is measured in years, not decades.
The Implications
If applications become ephemeral -- regenerated from requirements on demand -- several things change:
Technical debt disappears as a concept. There's no accumulated cruft when every build is a clean generation. No legacy code to maintain. No "we should refactor this someday" conversations. The requirements document either describes what you want or it doesn't. If it doesn't, you update it and regenerate.
Framework lock-in weakens. Your requirements document isn't coupled to React, or Flask, or PostgreSQL. Today you generate a React app. Tomorrow you might generate a SwiftUI app from the same requirements. The day after, maybe something that doesn't exist yet. The requirements are portable in a way that code never is.
Every new AI model is a free upgrade. When a new model is released -- faster, smarter, better at code generation -- you don't have to do anything special. You just regenerate your application using the new model. The output will be cleaner code, better performing functions, more idiomatic patterns, more thorough test coverage. Every improvement in AI code generation flows directly into your application the next time you rebuild it. In traditional development, benefiting from better tooling requires a conscious refactoring effort. In ephemeral development, you get the improvements for free simply by regenerating. Your requirements stay the same; the quality of the generated output ratchets upward with each model generation.
The "rewrite vs. refactor" debate ends. You always rewrite. Every generation is a rewrite. The cost of a rewrite drops to nearly zero, so the question of whether to invest in incremental improvement versus starting fresh answers itself.
Onboarding new team members gets easier. Reading a well-written requirements document is dramatically easier than reading a complex codebase. When the requirements are the source of truth, a new team member can understand the entire system by reading a document, not by spelunking through thousands of lines of code across dozens of files.
Testing strategy changes. You stop writing unit tests for implementation details (those change every generation) and focus entirely on integration and end-to-end tests that validate the requirements. The test suite becomes a machine-readable version of the requirements document -- which, come to think of it, is what tests should have been all along.
The translation chasm disappears. In certain domains -- financial modeling, actuarial science, quantitative research, scientific simulations -- there has always been a painful gap between the domain expert who understands the math and the software engineer who implements it. A financial analyst specifies a Monte Carlo simulation with stochastic volatility models, mean-reversion parameters, and correlation matrices. A software engineer translates that specification into Python or C++. Every step of that translation is an opportunity for misinterpretation. The engineer doesn't fully understand the finance. The analyst doesn't fully understand the code. Bugs hide in the gap between them, and they're the worst kind of bugs -- the ones where the code runs fine but produces subtly wrong numbers.
Ephemeral generation eliminates the middleman. The analyst who understands the Black-Scholes variations, the Greeks, the term structure models -- that person writes the requirements. They describe the formulas, the edge cases, the numerical precision requirements, the validation checks. The AI generates the implementation. The analyst can verify the output against known results without ever reading a line of code. If the numbers don't match, they refine the requirements and regenerate. The domain expert becomes the developer, not because they learned to code, but because the barrier between domain knowledge and working software has been removed.
This applies anywhere complex domain knowledge gets lost in translation to code: bioinformatics pipelines, structural engineering simulations, pharmacokinetic models, energy grid optimization. The people who understand the problem best are finally the ones who can build the solution directly.
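To make that verification step concrete, here's a minimal sketch of the kind of check an analyst can run without reading the generated code: a Monte Carlo pricer should converge to the Black-Scholes closed-form price for a European call. The parameters and tolerance are illustrative:

```python
# Verifying generated pricing code against a known result: a Monte Carlo
# estimate of a European call should match the Black-Scholes closed form.
import math
import random

def black_scholes_call(S, K, r, sigma, T):
    """Closed-form Black-Scholes price for a European call."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))  # standard normal CDF
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def monte_carlo_call(S, K, r, sigma, T, paths=200_000):
    """Monte Carlo price under risk-neutral geometric Brownian motion."""
    total = 0.0
    for _ in range(paths):
        z = random.gauss(0, 1)
        S_T = S * math.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
        total += max(S_T - K, 0.0)
    return math.exp(-r * T) * total / paths

analytic = black_scholes_call(S=100, K=105, r=0.03, sigma=0.2, T=1.0)
simulated = monte_carlo_call(S=100, K=105, r=0.03, sigma=0.2, T=1.0)
assert abs(simulated - analytic) < 0.15, (simulated, analytic)  # tolerance is illustrative
```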
Jupyter Notebooks Were a Preview
There's a predecessor to this model that's been hiding in plain sight: Jupyter notebooks.
Data scientists and researchers have been working this way for years. A Jupyter notebook interleaves prose descriptions -- explaining the methodology, the assumptions, the mathematical reasoning -- with executable code cells that implement each step. The notebook is the requirements document and the implementation simultaneously. You read the markdown cell that says "Apply a 30-day rolling average to smooth the signal" and immediately below it is the code that does exactly that.
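For example, the code cell under that markdown cell might look like this -- the DataFrame and column names are assumptions for illustration:

```python
# A synthetic signal stands in for real data; the point is the cell pattern:
# prose describes the step, the cell below implements exactly that step.
import numpy as np
import pandas as pd

dates = pd.date_range("2025-01-01", periods=120, freq="D")
df = pd.DataFrame({"signal": np.random.randn(120).cumsum()}, index=dates)

# Apply a 30-day rolling average to smooth the signal
df["signal_smoothed"] = df["signal"].rolling(window=30, min_periods=1).mean()
```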
Jupyter notebooks are essentially ephemeral development at the cell level. Researchers routinely delete a code cell, rewrite the description of what they want, and regenerate the implementation -- sometimes by hand, increasingly with AI assistance. The notebook is versioned and shared. The code cells are treated as somewhat disposable; the markdown cells explaining the intent are the durable part.
The ephemeral app model extends this pattern from individual code cells to entire applications. Instead of interleaving requirements and code in a single notebook, you separate them entirely: the requirements document is one artifact, the generated application is another. But the philosophy is the same -- the description of what you want is the primary artifact, and the code that implements it is secondary, regenerable, ephemeral.
If you've ever worked in a Jupyter notebook and found yourself spending more time on the markdown cells than the code cells -- making sure the reasoning is clear, the methodology is documented, the assumptions are explicit -- you've already been practicing ephemeral development. You just didn't have a name for it yet.
The Defect-Free Prerequisite
There's an elephant in the room: this entire premise depends on code generation that is defect-free.
If clicking a button does nothing when it was supposed to show a popup, that's not a requirements problem. The requirements said "show a popup." The AI just didn't implement it correctly. You can't fix that by editing the requirements document -- the requirements were already right. You'd have to either debug the generated code (which defeats the purpose of treating it as ephemeral) or regenerate and hope the next attempt gets it right (which is unreliable if the model has a blind spot).
The ephemeral app model only works when the AI can reliably translate requirements into working code and verify that it works. That means the AI must be able to test and verify everything in the application -- from backend API responses to frontend user interactions. Not just "does the code compile" but "does clicking this button actually show the popup, with the right content, in the right position, dismissable by the right actions."
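That's exactly the kind of check an end-to-end test encodes. Here's a minimal sketch using Playwright's Python API -- the URL, button label, and popup copy are placeholders:

```python
# Sketch of the kind of E2E check the model must be able to write and pass:
# clicking a button shows a popup with the right content and can be dismissed.
# URL, labels, and text are placeholders for illustration.
from playwright.sync_api import sync_playwright, expect

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://localhost:3000/projects")             # placeholder URL

    page.get_by_role("button", name="New Project").click()
    dialog = page.get_by_role("dialog")
    expect(dialog).to_be_visible()
    expect(dialog).to_contain_text("Create a new project")  # placeholder copy

    page.keyboard.press("Escape")                           # dismissable via Escape
    expect(dialog).to_be_hidden()

    browser.close()
```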
Today, we're partially there. Claude writes backend tests that cover API endpoints and data integrity. It writes Playwright E2E tests that cover navigation, form submission, and basic user workflows. But the test coverage isn't exhaustive. In my Harvest clone, Claude wrote 32 tests. A thorough QA engineer would have written 200. The gaps between what's tested and what's not are where defects hide -- and those defects break the ephemeral model because they require manual intervention to diagnose and fix.
The full vision -- where you never touch the generated code, where every regeneration produces a perfectly working application -- requires AI that can test every interaction, every edge case, every error state, every visual layout, every accessibility requirement, and every performance target. It requires AI that can look at a rendered screen and judge whether it matches the design intent. It requires AI that can simulate real user behavior and catch the bugs that only appear when you click things in the wrong order.
A year ago, this was unimaginable. Today, it's inevitable. The trajectory of AI-assisted testing -- visual regression testing, AI-driven E2E test generation, model-based testing that explores state spaces automatically -- is converging with AI code generation. When those two capabilities merge completely, the ephemeral app model stops being aspirational and becomes the default way software is built.
We're Not There Yet
I want to be honest about the gaps. Ephemeral app generation works today for a specific category of software:
- Single-user tools with straightforward data models (my Single Serving Applications)
- CRUD applications where the business logic is well-understood
- Greenfield projects where there's no existing data or integrations to preserve
It doesn't yet work well for:
- Large, complex systems with hundreds of screens and intricate business rules
- Systems with critical state where data migration between generations is non-trivial
- Real-time systems with demanding performance requirements
- Systems that integrate with many external services where the integration points are fragile
The gap will close. Models will get better at maintaining consistency across large codebases. Data migration between generations will become a solved problem (or databases will be generated alongside the code, with schema continuity enforced by the requirements). But today, ephemeral generation is practical for small-to-medium applications. For enterprise systems, we're still in the "traditional compilation" era -- editing source code directly and rebuilding incrementally.
The Bottom Line
The most important artifact in software development is no longer the code. It's the requirements document. The code is ephemeral -- generated, tested, deployed, and regenerated when the requirements change. The requirements are durable -- versioned, iterated, and maintained as the single source of truth.
If you're a software engineer, the most valuable skill you can develop right now isn't learning a new framework. It's learning to write requirements so precise and complete that an AI can build the entire application from them without asking a single clarifying question.
If you're a product owner or designer, the tools that will matter most in the next few years aren't Figma or Jira. They're whatever tools emerge for writing, managing, and versioning requirements documents that serve as build inputs for AI code generation.
The era of ephemeral apps is almost here. The engineers and organizations that adapt first -- shifting their investment from code to requirements, from maintenance to regeneration, from frameworks to specifications -- will have a significant advantage.
The requirements are the product. Everything else is a build artifact.
