
Agentic Coding and Decision Fatigue: The Cognitive Cost of Supervising AI

Tags: AI · Claude Code · Productivity · Software Engineering

About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.

Recently during heavy Claude Code usage, I started noticing an uncomfortable trend. At 8 AM I could run three agent sessions at once, spot a bad abstraction in a 200-line diff, and push back on architectural shortcuts without hesitation. By 3 PM the same work felt like wading through concrete. My prompts got sloppy. I started approving diffs I would have questioned six hours earlier. Twice I caught myself closing a session just to avoid making a decision about it. Once I even prompted the following: "I know you can do better than this. Be thorough and just get it done, bro." The work had not gotten harder. My interest had not faded. I wanted to understand what had changed between 8 AM and 3 PM inside my skull.

So I started reading cognitive psychology papers, which is not how I usually spend a Saturday. Israeli judges denying parole before lunch. Australian GPs over-prescribing antibiotics at 4 PM. Wall Street analysts copying each other's forecasts by end of day. The same pattern, same mechanism, different profession. The thing happening to me in my terminal at 3 PM has a name and a neurochemical explanation, and it applies to agentic coding with a vengeance. We did not get rid of cognitive work by delegating code to AI. We concentrated it.

The Weight That Builds by Afternoon

Let me walk through what a typical session actually looks like. Agent finishes a task. I read the diff. Is this correct? Accept or reject. Now: is this the right architecture, or did the agent pick a pattern I would not have chosen? Do I intervene or let it keep going? Was my prompt clear enough, or did I set it up to fail? Each of these requires real judgment, and they come one after another without pause.

Traditional software engineering spreads these decisions across hours of implementation work. I write a function, run it, debug it, refine it. Decisions come between stretches of typing, and my compile cycle sets the pace. Agentic coding compresses the execution to near-zero and leaves only the decisions. A task that would generate 15 architectural decisions over a four-hour implementation now generates those same 15 decisions in eight minutes. The decision density per hour increases by an order of magnitude.

By afternoon, I have made more consequential technical decisions than a pre-AI engineer would make in a week. That volume does something to you. It is not tiredness, exactly. Not boredom. I can still read code. I can still type. I just cannot evaluate the 47th diff of the day with the same acuity I brought to the first.

The Science of Decision Fatigue

Roy Baumeister and John Tierney gave it a name in their 2011 book Willpower: Rediscovering the Greatest Human Strength. The core idea: your ability to make good decisions degrades the more decisions you make. Baumeister had been poking at this since his 1998 ego depletion experiments, where he made people resist cookies and then watched their problem-solving performance fall off a cliff.

The prefrontal cortex runs the show when you are making deliberate choices. Planning, inhibiting impulses, holding context in working memory, weighing tradeoffs. Pignatiello, Martin, and Hickman (2020) cataloged what drains it: too many choices, too much self-regulation, too much environmental noise. An agentic coding session checks all three boxes before lunch.

But the real breakthrough came in 2022, and it changed how I think about my own afternoons. Wiehler, Blain, and Pessiglione stuck people in an MRI scanner and used magnetic resonance spectroscopy to watch what happened to their brain chemistry over a full workday. The people doing hard cognitive tasks had measurably higher glutamate concentrations in the lateral prefrontal cortex by 5 PM. Glutamate at those levels is neurotoxic. It poisons the very neurons you need for careful judgment.

That is not a metaphor. Not a lack of willpower. Not me being lazy at 3 PM. There is a literal toxic chemical accumulating in the part of my brain responsible for evaluating code diffs, and the only thing that clears it is rest.

| Factor | Details |
|---|---|
| Brain region affected | Lateral prefrontal cortex (lPFC) |
| Chemical mechanism | Glutamate accumulation at synapses |
| Effect | Neurotoxic impairment of executive function |
| Recovery | Rest, breaks, sleep |
| First demonstrated | Wiehler et al., Current Biology, 2022 |
| Subjective experience | Often imperceptible; feels like apathy, not exhaustion |

That last row matters most. When I am physically tired, I know it. My eyes burn. My back hurts. Decision fatigue sneaks in sideways. It feels like not caring. Like this particular diff is probably fine, no need to look too closely. Like the default option is good enough. I am not tired. I just do not have the neurochemical capacity to engage my prefrontal cortex on demand anymore. The glutamate has built up and my judgment neurons are running hot.

The Parole Board Study

Everyone who writes about decision fatigue brings up the parole judges, and for good reason. Danziger, Levav, and Avnaim-Pesso analyzed 1,112 parole decisions made by eight experienced Israeli judges across 50 days.

What they found: right after a food break, about 65% of parole requests got approved. Then the rate slid. Case after case, it dropped. By the end of a session it could hit zero. Then the judges ate lunch, came back, and the rate jumped right back to 65%. Same judges. Same types of cases. Different time of day.

[Figure: Favorable Ruling Rate by Position in Session — approval rate (%) plotted by case position, from "After Break" through "Before Break"]
Parole approval rates across three daily sessions, based on Danziger et al. 2011. The pattern resets after each break.

Levav's explanation: "When judges make repeated rulings, they show an increased tendency to rule in favor of the status quo." In parole hearings, the status quo is continued imprisonment. Denial takes no effort. No reasoning required, no justification to write up. A brain running low on prefrontal cortex capacity will always gravitate toward whichever option requires the least cognitive work.

This study has attracted legitimate criticism. Weinshall-Margel and Shapard (2011) argued that case ordering was not fully random, with unrepresented prisoners scheduled later in sessions. Glöckner (2016) demonstrated that a statistical artifact (favorable decisions take longer, pushing them earlier in sessions) could produce a similar pattern. And the original effect size (a Cohen's d of 1.96) was roughly eight times larger than effects typically observed in depletion research.

Academics have been fighting about causation for fifteen years now, and for my purposes the debate is irrelevant. Even if you throw out the causal claim entirely, the observed data still shows the same thing: favorable rulings drop within sessions and snap back after breaks. The weakest possible interpretation of that study still tells me everything I need to know about what happens to my code reviews between 9 AM and 4 PM.

Decision Fatigue Across Domains

The parole board study gets all the press, but the data extends across medicine, finance, consumer behavior, and now software engineering.

| Domain | Metric | Early Session | Late Session | Source |
|---|---|---|---|---|
| Parole hearings | Favorable rulings | ~65% | Near 0% | Danziger et al., PNAS, 2011 |
| Anesthesia | Adverse event probability | 1.0% (9 AM) | 4.2% (4 PM) | Healthcare review |
| GP prescribing | Antibiotic over-prescription | Baseline | +8.7% per 15 encounters | Australian BEACH data, 2024 |
| Statin prescribing | Appropriate prescription rate | Baseline | -21.9% per 15 encounters | BEACH data, 2024 |
| Financial analysis | Forecast accuracy | Independent analysis | Herd with consensus | Hirshleifer et al., JFE, 2019 |
| Car customization | Default acceptance | Active selection | Accept defaults (+EUR 1,500/car) | Levav et al. |
| Cancer screening | Screening orders | Higher rate | Lower rate | Primary care studies |

Same story every time. Fresh brains make active choices. Depleted brains pick whatever is easiest. Not random mistakes. A consistent, directional drift toward the default, the safe bet, the path that requires the least thinking. Deny parole, prescribe the antibiotic, accept the factory configuration, approve the diff without reading it.

The Medical Data

The medical numbers are worth lingering on because the sample size is enormous and the consequences are real. Across 262,456 encounters with 2,909 GPs, antibiotics got prescribed 8.7% more often for every additional 15 patients a doctor saw. Statins got prescribed 21.9% less. Osteoporosis meds dropped 25%.

Book your next doctor's appointment for 9 AM, not 4 PM. The 4 PM version of your physician will more readily reach for the antibiotic you do not need and skip the statin you do. Same doctor. Same medical school. Same knowledge. Just less willingness to do the cognitive work of deviating from the easiest prescription.

The Financial Analyst Data

Hirshleifer and colleagues (2019) tracked Wall Street analyst forecasts and found the same decay. As the day wore on and the forecast count climbed, analysts stopped doing original analysis. They herded with the consensus number. They reissued their own previous forecast. They rounded to nice even figures. Less work per forecast, worse forecasts overall.

I read that study and thought: that is me at 3 PM reviewing pull requests. Scroll past the diff, approve, move on. Copy the approval pattern from the last three reviews. Defer the complicated one to tomorrow. Same shortcuts, different job title.

Why Agentic Coding Is Uniquely Demanding

Traditional software engineering has a built-in throttle. I sit and think about a design for twenty minutes. Then I type for two hours. Run tests. Stare at a stack trace for thirty minutes. Most of the day is spent on activities that are cognitively cheap: typing, waiting, reading output. The actual decisions (what abstraction to use, how to handle errors, where to put the boundary) come at a human pace. Maybe one every fifteen minutes.

Agentic coding strips all that out. The agent types. The agent waits for builds. The agent reads the stack trace and fixes the bug. What is left for me is the part that cannot be delegated: judgment. Is this prompt right? Is this architecture sound? Do I accept this diff? Should I step in now or let it keep going? Am I shipping this?

The Decision Density Problem

Last Tuesday I ran exactly this experiment. I asked Claude Code to implement a new API endpoint with validation, error handling, database queries, and tests. In traditional development, this task takes roughly four hours. During those four hours, I make perhaps 15 meaningful decisions (function signatures, error handling strategies, query patterns, test structure) interspersed across 240 minutes of implementation work. That is one decision every 16 minutes.

Claude Code completes the same task in 12 minutes. Those same 15 decisions now arrive in the form of diffs and outputs I must evaluate. One decision every 48 seconds. The cognitive demand per minute increased by a factor of 20.

| Mode | Task Duration | Decisions | Decision Density | Context |
|---|---|---|---|---|
| Traditional development | 4 hours | ~15 | 1 per 16 min | Interspersed with implementation |
| Single agentic session | 12 min | ~15 | 1 per 48 sec | Back-to-back evaluation |
| Multiple concurrent sessions | 12 min x 3 | ~45 | 1 per 16 sec | Parallel evaluation streams |

When I run three concurrent sessions for independent workstreams (which is a normal Tuesday for me), that is roughly 45 decisions crammed into 12 minutes. Read the diff, understand the context, decide if it is right. Multiply by three terminals. Nothing in traditional software engineering comes close to that pace.
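The arithmetic behind those densities is simple enough to check. A quick sketch using the article's own estimates (task durations and decision counts are illustrative, not measurements):

```python
from dataclasses import dataclass

@dataclass
class WorkMode:
    name: str
    duration_min: float  # wall-clock minutes for the task
    decisions: int       # judgment calls the human must make

    @property
    def seconds_per_decision(self) -> float:
        return self.duration_min * 60 / self.decisions

# The same ~15-decision task under three workflows.
modes = [
    WorkMode("traditional (by hand)", 240, 15),
    WorkMode("single agent session", 12, 15),
    WorkMode("three concurrent sessions", 12, 45),
]

for m in modes:
    print(f"{m.name}: one decision every {m.seconds_per_decision:.0f} s")
# traditional: 960 s (16 min); single session: 48 s; three sessions: 16 s
```

The durations shrink by a factor of 20 and then 60 while the decision count stays fixed, which is the whole problem in two lines of division.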

Generative vs. Evaluative Decisions

Vohs and Baumeister's team ran a series of experiments in 2008 that landed squarely on this point. People who spent their energy actively choosing between options had less self-control afterward than people who just thought about the options without picking. And the kicker: choosing depleted them more than executing a decision somebody else had already made for them.

That maps onto agentic coding with uncomfortable precision. When I write code by hand, most of my time is implementation: executing decisions I already made. When I work with Claude Code, most of my time is evaluation: thumbs up or thumbs down on output somebody else produced. The Vohs data says evaluation burns more cognitive fuel than implementation. I moved my entire job into the most expensive gear and wondered why I was running out of gas by lunch.

The Speed Amplification Effect

Here is the part that makes agentic coding special. The research tracks decisions, not hours. The parole study measured rulings. The prescribing data counted patient encounters. The clock is a proxy. Decisions are the real currency being spent.

Agentic coding burns through that currency at a rate that no previous workflow matches. An engineer working by hand might accumulate 40 real decisions across an entire 8-hour day. I hit 40 by mid-morning. By afternoon the count can pass 150. The agent works fast, which means the decisions pile up fast, which means the fatigue curve bends hard.

The Supervisory Decision Loop

I started mapping where the fatigue actually comes from, and the same loop showed up in every session.

[Diagram: the supervisory loop — Define Intent → Craft Prompt → Agent Executes → Review Output → Accept? Yes: Next Task. Partially: Refine Prompt. No: Diagnose Issue, then fix the prompt, report a bug, or work around it.]
The agentic coding supervisory loop. Each cycle requires at least one accept/reject decision, and many cycles require multiple.

Every box in that diagram is a decision point. I started cataloging them after a particularly bad afternoon session, and the list got long fast.

| Decision Type | Frequency per Session | Cognitive Cost | Example |
|---|---|---|---|
| Prompt adequacy | 5-15 | Medium | "Is this prompt specific enough for the agent to succeed?" |
| Output correctness | 5-15 | High | "Does this diff implement what I asked for?" |
| Architecture approval | 2-5 | Very High | "Is this the right abstraction for this problem?" |
| Intervention timing | 3-8 | Medium | "Should I let the agent continue or redirect now?" |
| Error attribution | 2-5 | High | "Did the agent misunderstand, or did I specify poorly?" |
| Scope management | 3-6 | Medium | "The agent is adding features I did not ask for. Accept or revert?" |
| Quality threshold | 5-10 | Medium | "Is this good enough, or should I ask for refinement?" |
| Context switching | 3-8 | High | "Which session needs attention next?" |

On a busy morning with two or three agents running, I cycle through this loop 20-40 times before noon. Add up the decisions per cycle and a four-hour morning can produce 80-120 non-trivial judgment calls. That is a week's worth of pre-AI engineering decisions before lunch.

Decision Quality Degradation Over Time

The degradation follows a pattern I can now predict after tracking it for weeks. Not random mistakes. A steady drift toward whatever is easiest.

| Behavior | Morning (Fresh) | Afternoon (Fatigued) |
|---|---|---|
| Prompt specificity | Detailed, constrained prompts with clear acceptance criteria | Vague prompts: "fix the tests" or "make it work" |
| Diff review depth | Line-by-line analysis, checking edge cases | Scroll-and-approve, trusting the green checkmark |
| Architecture challenges | Pushback on suboptimal patterns, alternative proposals | "That approach is fine" |
| Scope control | Rejecting agent-added features outside spec | "It added logging, that is probably useful" |
| Error investigation | Root cause analysis, proper fix | "Just make the error go away" |
| Session management | Strategic task ordering, fresh context per session | Stacking unrelated tasks into exhausted sessions |
| Quality threshold | "This needs to be production-ready" | "This is close enough for now" |

I recognize every row in this table from my own behavior. The afternoon version of me is measurably less effective as a supervisor of AI agents. I approve code I would question. I accept architectures I would challenge. I write prompts that are less likely to produce good first-pass results, which creates more iterations, which creates more decisions, which accelerates the fatigue cycle.

And it compounds. A vague 3 PM prompt produces mediocre output. Mediocre output requires correction. Correction means more decisions. More decisions mean more fatigue. More fatigue means even vaguer prompts. I have watched myself spiral down this loop in real time, usually without noticing until I look at the code the next morning and wonder what I was thinking.

The Context-Switching Tax

Gloria Mark's research at the University of California, Irvine, established that it takes 23 minutes and 15 seconds to fully regain focus after an interruption. Knowledge workers are interrupted every 6-12 minutes on average. Sophie Leroy's concept of "attention residue" (2009) showed that switching tasks leaves cognitive residue from the previous task that persists for 30-60 minutes.

When I run multiple concurrent agent sessions, context-switching becomes the workflow itself. Session one is building an API. Session two is refactoring a data model. Session three is generating test fixtures. Three different mental models, three different codebases worth of context. I bounce between them every few minutes, and that 23-minute refocus penalty from Mark's research never fully clears. I am permanently half-focused on everything, fully focused on nothing.

| Concurrent Sessions | Context Switches/Hour | Attention Residue | Effective Decision Quality |
|---|---|---|---|
| 1 | 0 | None | 100% baseline |
| 2 | 4-8 | Moderate | ~80% |
| 3 | 8-15 | High | ~65% |
| 4+ | 15-30 | Severe | ~50% |

Those percentages are my rough estimates from tracking my own output quality. Your numbers will differ. The direction will not. More sessions means worse judgment per session. At some point the rework from bad afternoon decisions eats the throughput gains from parallelism. I found that point at around three sessions.

Strategies for Managing Decision Fatigue in Agentic Coding

I tried willpower first. Just pay closer attention in the afternoon. That lasted about two days. Willpower is a depletable resource too, which is the whole problem. The strategies that actually stick are structural ones that do not rely on me remembering to be disciplined at 3 PM.

Front-Load Architectural Decisions

The decisions that hurt most when wrong are the architectural ones. Which abstraction. Where the API boundary sits. How the data flows through persistence. I got one of these wrong on a Thursday afternoon last month and spent Friday and half of Saturday unwinding the damage. The same decision at 8 AM would have taken five minutes and been correct.

I now start every morning with architecture. Before any agent session begins, I spend 30-60 minutes writing planning notes in a text editor: the day's technical direction, key constraints, data model sketches, API boundaries. These notes then become the raw material for agent prompts. A planning file that says "the validation layer sits between the API handler and the database client, rejecting malformed input before it touches persistence" gives the agent enough architectural context to produce a clean implementation on the first pass. The agent inherits my best thinking from the morning. It does not need my best thinking at 3 PM.

This workflow has a secondary benefit I did not anticipate. Planning notes in a text file are trivially convertible to agent task descriptions. A morning planning session that produces ten bullet points in a markdown file becomes ten well-scoped agent prompts that I can fire off in sequence or in parallel. The planning artifact itself becomes the dispatch queue.
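That conversion can be mechanical. A minimal sketch, assuming planning notes are top-level markdown bullets (the helper name and prompt format are my own illustration, not a Claude Code feature):

```python
import re

def bullets_to_prompts(planning_md: str, preamble: str = "") -> list[str]:
    """Turn top-level markdown bullets from a planning file into one
    agent prompt per bullet, each prefixed with shared architectural context."""
    prompts = []
    for line in planning_md.splitlines():
        m = re.match(r"^[-*]\s+(.*\S)", line)
        if m:
            prompts.append(f"{preamble}\n\nTask: {m.group(1)}".strip())
    return prompts

notes = """\
# Tuesday plan
- Add request validation between the API handler and the DB client
- Write integration tests for the /orders endpoint
"""
queue = bullets_to_prompts(
    notes, preamble="Context: validation sits in front of persistence."
)
print(len(queue))  # two well-scoped prompts, ready to dispatch
```

Each bullet inherits the morning's architectural context automatically, so the 3 PM dispatch step is a paste, not a decision.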

Pre-Commit Decisions Through Configuration

A decision I encode in a CLAUDE.md file is a decision I never make again. Same for a linter rule, a test assertion, a CI/CD gate. Each one is a decision permanently removed from my afternoon.

| Configuration Type | Decisions Eliminated | Example |
|---|---|---|
| CLAUDE.md style rules | 5-10 per session | "Use semicolons, not em dashes" |
| Linter configuration | 10-20 per session | Code formatting, import ordering |
| Test suites | 5-15 per session | "Does it pass?" replaces "does it work?" |
| CI/CD gates | 3-5 per session | Deployment approval becomes binary |
| Architecture decision records | 2-5 per session | Pattern selection already decided |

Obama famously limited his wardrobe to gray and blue suits so he would not waste decisions on clothing. Same principle, applied to engineering. Kill the trivial decisions in config so the finite decision budget survives until the decisions that matter.
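A hypothetical CLAUDE.md fragment in this spirit. The individual rules are illustrative, not a recommended set; CLAUDE.md is free-form prose that the agent interprets, not a fixed schema:

```markdown
# CLAUDE.md — decisions made once, in the morning

## Style (do not ask)
- Prefer early returns over nested conditionals.
- Use the test framework already present in the repo.

## Architecture (already decided)
- Validation sits between the API handler and the database client.
- New endpoints follow the existing handler/service/repository split.

## Scope
- Do not add features, logging, or dependencies beyond the task as stated.
```

Every line in a file like this is a question the agent never needs to ask and a judgment call the afternoon brain never needs to make.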

Batch by Decision Type

Every time I switch from reviewing code to writing a prompt to evaluating architecture to debugging a failure, I pay a cognitive tax. Those are four different mental modes, and switching between them carries that attention residue Leroy described.

So I batch. All code review in one block. All new feature prompts in another. Debugging gets its own slot. Less flexible than bouncing between whatever needs attention, but my reviews are sharper when I do twenty in a row than when I interleave them with prompting and debugging.

Enforce Strategic Breaks with Pomodoro

Parole judges make better decisions after lunch. My prefrontal cortex recovers glutamate levels during breaks. Taking a break is not lazy. It is maintenance for the only piece of hardware in the loop that cannot be replaced.

I have been using the Pomodoro Technique (Francesco Cirillo, late 1980s) for agentic coding sessions and the fit is surprisingly good. The standard protocol is simple: 25 minutes of focused work, 5-minute break, repeat. After four cycles, take a longer break of 15-30 minutes. Cirillo designed it for exactly the kind of sustained focus work that agentic coding demands, and the 25-minute interval happens to match the natural rhythm of an agent task cycle almost perfectly.

I adapted the timing slightly for agentic coding. A single Pomodoro aligns well with one or two substantial agent tasks: enough time to prompt, review output, iterate once, and ship. The 5-minute break forces me to step back before starting the next task, which prevents the common failure mode of chaining tasks together in a single exhausted session. After four Pomodoros (roughly two hours of agentic work), the longer break is mandatory regardless of momentum.

| Interval | Duration | Activity |
|---|---|---|
| Pomodoro | 25 min | 1-2 focused agent tasks |
| Short break | 5 min | Step away from screen, no decisions |
| Long break (every 4th) | 15-30 min | Walk, food, full cognitive reset |

The critical rule: the break happens whether I feel fatigued or not. The Wiehler glutamate research shows that I cannot accurately assess my own decision quality under load. Perceived freshness and actual prefrontal cortex function diverge. The timer is more trustworthy than my self-assessment.
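The schedule is simple enough to encode so that a script or timer app enforces it instead of my self-assessment. A minimal sketch of the standard protocol (interval lengths are the conventional defaults):

```python
def pomodoro_schedule(cycles: int, work: int = 25,
                      short_break: int = 5, long_break: int = 20):
    """Yield (label, minutes) pairs: a work block, then a break.
    Every fourth break is a long one, per the standard protocol."""
    for i in range(1, cycles + 1):
        yield ("work", work)
        if i % 4 == 0:
            yield ("long break", long_break)
        else:
            yield ("short break", short_break)

sched = list(pomodoro_schedule(4))
# Four 25-minute work blocks, three short breaks, one long break:
# 100 minutes of supervision, 35 minutes of enforced recovery.
```

Wiring this to an actual alarm is left to taste; the point is that the break fires on the timer's schedule, not on perceived fatigue.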

Nutritional Support for Sustained Decision-Making

Once I understood the glutamate mechanism, the obvious next question was whether I could eat or supplement my way out of it. I spent a weekend reading the literature. Some of it holds up. A lot does not.

First, the bad news for anyone hoping to fix this with a candy bar. Baumeister's glucose model (2007) claimed self-control runs on blood sugar, but the Hagger et al. 2016 replication across 23 labs and 2,141 participants could not find the effect. A sugary soda will not restore your architectural judgment. Stable blood sugar matters (skip the 2 PM candy bar, eat actual food), but targeted glucose is not the answer.

The interventions with the strongest evidence base for sustained cognitive performance:

| Intervention | Dosage | Mechanism | Evidence Level |
|---|---|---|---|
| Caffeine + L-theanine | 100-200mg each, together | Adenosine blockade + glutamate modulation | Strong: well-replicated in multiple RCTs |
| Creatine monohydrate | 3-5g daily | Increases brain ATP reserves for metabolically demanding tasks | Moderate-to-strong: benefits clearest under fatigue/stress |
| Omega-3 (high DHA) | 2000mg+ EPA/DHA daily | Membrane fluidity, prefrontal cortex activation, anti-inflammatory | Strong for brain structure; moderate for acute performance |
| Magnesium L-threonate | 1.5-2g daily | Natural NMDA receptor blocker; protects against glutamate excitotoxicity | Moderate: multiple RCTs, but mostly industry-funded |

The caffeine-plus-L-theanine combination is the standout. A 2025 double-blind crossover study showed the combination significantly improved selective attention and target discrimination. Caffeine alone masks fatigue without addressing the underlying glutamate accumulation. L-theanine modulates glutamate signaling directly, making the combination attack the problem from both sides: subjective alertness and neurochemical protection.

Creatine is relevant because of the Wiehler finding. If the cost of prefrontal activation is metabolic, more available ATP should extend the window before fatigue degrades decisions. A 2024 meta-analysis confirmed benefits for memory and processing speed, with effects strongest when the brain is under metabolic stress. An afternoon of agentic coding qualifies.

My honest ranking after reading the research and experimenting for a month: sleep first, Pomodoro breaks second, exercise third, real food fourth, caffeine-plus-L-theanine fifth. No pill compensates for five hours of sleep and no breaks. But given proper rest and enforced work intervals, the supplements in that table provide a real edge in the exact cognitive functions that agentic coding burns through fastest.

Reduce Concurrent Sessions

Four concurrent sessions look great on a leverage report. In practice, they quadruple the decision density and impose a context-switching penalty on every single judgment call. The per-session quality drops faster than the throughput increases.

| Concurrent Sessions | Throughput (Relative) | Decision Quality | Net Effective Output |
|---|---|---|---|
| 1 | 1.0x | High | 1.0x |
| 2 | 1.8x | Good | 1.4x |
| 3 | 2.4x | Moderate | 1.6x |
| 4 | 2.8x | Low | 1.4x |

Two concurrent sessions is where I have settled. Three works if the tasks are simple and well-scoped. Four only makes sense for work that barely needs supervision at all: batch content generation, mechanical refactoring, formatting runs. Anything requiring actual architectural judgment gets one session and my full attention.
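The net-output column is just throughput times estimated decision quality. A few lines make the tradeoff explicit (all inputs are the article's rough estimates, not measurements):

```python
# Quality estimates per session count (from the context-switching table)
quality = {1: 1.00, 2: 0.80, 3: 0.65, 4: 0.50}
# Relative throughput estimates (sublinear: sessions idle while you review)
throughput = {1: 1.0, 2: 1.8, 3: 2.4, 4: 2.8}

net = {n: round(throughput[n] * quality[n], 2) for n in quality}
best = max(net, key=net.get)
print(net, "-> raw product peaks at", best, "sessions")
```

The raw product peaks at three sessions and falls off at four, which is consistent with treating three as the ceiling and settling at two once the rework risk from degraded judgment is priced in.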

Use the Morning for Judgment, the Afternoon for Execution

I started sorting my agent tasks by how much judgment they require, and the split was obvious. Greenfield architecture, complex refactoring, new API design: these are decision-heavy. Documentation generation, batch content processing, test coverage: these are execution-heavy. My brain does not care which category a task falls in at 9 AM. By 2 PM it very much does.

| Time Block | Task Type | Decision Intensity | Examples |
|---|---|---|---|
| 8-10 AM | Architecture and design | Very high | New subsystems, API boundaries, data models |
| 10 AM-12 PM | Complex implementation | High | Feature builds, integrations, refactoring |
| 1-3 PM | Moderate implementation | Medium | Test coverage, documentation, bug fixes |
| 3-5 PM | Mechanical operations | Low | Batch processing, formatting, deployments |

Hardest decisions get the freshest brain. Batch deploys and formatting runs get the 4 PM brain, which can still push buttons but should not be trusted with architecture.

Teaching the Agent to Compensate

Every strategy I listed above puts the burden on me. That feels wrong. If I am the one losing cognitive capacity over the day, why am I the one responsible for managing it? The agent is the one with stable, consistent judgment at any hour. Let the agent pick up the slack.

I implemented this in my global CLAUDE.md configuration. The agent checks how many leverage records have been logged for the current day. Each leverage record represents a completed task, and each completed task represents a block of supervisory decisions I have already made. The record count serves as a rough proxy for cumulative decision load.

| Today's Completed Tasks | Agent Behavior |
|---|---|
| 0-3 | Normal operation: present options, ask clarifying questions |
| 4-7 | Reduce questions, make reasonable defaults when intent is clear |
| 8-12 | Minimize decision points, choose conventional approaches unless told otherwise |
| 13+ | Maximum autonomy within stated constraints, present completed work rather than choices |

The mechanics are dead simple. At session start, the agent counts how many entries today's date has in my leverage factor CSV. Zero records at 8 AM means it asks "which testing framework do you prefer?" Twelve records at 3 PM means it picks whichever framework the existing codebase already uses and moves on without asking. Same agent, different level of hand-holding based on how fried I probably am.

One morning decision, encoded in a config file, eliminates an entire category of afternoon decisions. That is the pattern at its best. My 8 AM brain tells the agent how to handle my 3 PM brain, and neither of us has to think about it again.

Setting this up changed something about how the collaboration feels. When the tool adapts to my cognitive state across the day, it stops being a tool. A good copilot compensates for a fatigued pilot. A good AI agent should do the same. By 3 PM, my Claude Code sessions are more autonomous than my 8 AM sessions, and the code quality does not suffer for it. The agent fills the gap that my prefrontal cortex leaves open.

The Unexpected Upside: Executive Function as a Trainable Muscle

Everything I have written so far frames decision density as a cost, and it is one. But there is a benefit I did not anticipate and only noticed after months of sustained agentic work.

My executive function got stronger.

Not in the afternoon, and not while actively fatigued. But as a baseline capability, the kind of thinking I bring to a fresh problem on a fresh morning. Almost every project or task I come into contact with now, whether it is a software system, a work initiative, or a weekend project around the house, is instantly decomposed in my head into a set of phases and discrete tasks. I see the dependency graph. I see which parts require my judgment and which parts can be delegated to someone else, to an AI agent, or to a tool. The decomposition happens automatically, without effort, like a reflex I did not used to have.

This makes sense if you take the exercise analogy seriously. Lifting heavy weights tears muscle fibers and leaves you temporarily weaker. But the recovery process builds them back stronger. The prefrontal cortex works the same way. Klingberg et al. (2005) demonstrated that sustained working memory training produces measurable increases in prefrontal cortex activation and transfers to untrained tasks. Jaeggi et al. (2008) showed that deliberate cognitive training on demanding tasks improves fluid intelligence, the general-purpose reasoning ability that governs exactly the kind of planning and decomposition I am describing.

Agentic coding is, inadvertently, an executive function training program. Every session forces dozens of rapid-fire planning, evaluation, and delegation decisions. Every day pushes the prefrontal cortex hard enough to trigger adaptation. Over weeks and months, the baseline capacity increases. I am not just getting better at supervising AI. I am getting better at the underlying cognitive skill that supervision requires: breaking a complex situation into parts, identifying the decision points, and routing each part to the right resource.

The practical effect extends well past the terminal. I walk into a meeting about a cross-team initiative and my brain immediately maps it: these are the three workstreams, this is the critical path, these two pieces can run in parallel, this one needs a senior decision before anything else moves. I look at a home renovation and see a Gantt chart. This did not happen before. The muscle was not trained.

It is a strange duality. Agentic coding depletes executive function within a session and strengthens it across sessions. The acute cost is real. The chronic benefit is also real. The strategies in this article address the cost. But if you are wondering whether this style of work leaves a mark on your thinking beyond the code, it does, and the mark is useful.

Key Patterns

If you take one thing from this article: the afternoon slump in agentic coding is not laziness. It is glutamate accumulating in your lateral prefrontal cortex. The same chemistry that makes judges deny parole before lunch makes you approve sloppy diffs before dinner.

Agentic coding did not eliminate cognitive work. It eliminated the easy parts (typing, building, debugging) and left behind the hard part (judgment). Then it compressed the judgment into a fraction of the time, spending your daily decision budget before noon.

My current protocol, for whatever it is worth: all architecture before 10 AM, written into CLAUDE.md so I never revisit it. Pomodoro timer I physically cannot snooze. Caffeine stacked with L-theanine in the morning. And a line in my CLAUDE.md that tells the agent to stop asking me questions after my eighth completed task of the day.

The engineers who keep getting value from agentic coding month after month will be the ones who treat their decision budget like a finite resource, because it is one. The agent can work all night. You cannot. Plan accordingly.


Let's Build Something!

I help teams ship cloud infrastructure that actually works at scale. Whether you're modernizing a legacy platform, designing a multi-region architecture from scratch, or figuring out how AI fits into your engineering workflow, I've seen your problem before. Let me help.

Currently taking on select consulting engagements through Vantalect.