TL;DR
- Freelancer cut token waste by using Plan Mode to preview actions before execution
- Key strategies: batch requests, constrain outputs, trim CLAUDE.md to 500 essential words
- Reframe cost as investment - a $5 task saving 2 hours is worth it at $2.50/hour
- Best for: Anyone on usage-based pricing or limited Claude Code quotas
- Key lesson: Not every task needs Claude Code - use simpler tools for simple tasks
A freelancer who blew through her Claude Code quota in two weeks learned to cut token waste by 75% using Plan Mode, request batching, and strategic context management.
Rita heard the tokens going into the wood chipper.
Not literally. But every time she ran Claude Code on a complex task, she imagined her subscription quota depleting. Token by token. Dollar by dollar.
“I’d ask Claude to organize a folder. It would list the directory. Read a file. Realize it was wrong. Try another file. List again. Each step burning tokens. The inefficiency was painful.”
Agentic AI was powerful. It was also expensive. Rita learned to manage both.
The Cost Wake-Up
Rita’s first month with Claude Code was eye-opening.
She’d been used to ChatGPT Plus — fixed monthly cost, unlimited conversations. Claude Code worked differently. Agentic tasks consumed tokens. Tokens had limits.
“I blew through my quota in two weeks. And I wasn’t even doing complex stuff. Just file organization and research.”
The tasks that felt simple were actually complex underneath. Each step in Claude’s reasoning cost tokens. Long files cost more to read. Iterative tasks multiplied costs.
The Inefficiency Discovery
Rita started watching how Claude worked.
A simple request — “find all PDFs in my Downloads folder” — triggered a chain:
- List the Downloads directory (tokens)
- Filter for PDF files (tokens)
- For each PDF, read metadata (more tokens)
- Compile results (more tokens)
“A task I could do in Finder in ten seconds was burning through hundreds of thousands of tokens because Claude was doing it the ‘thorough’ way.”
Thoroughness was a feature. It was also expensive.
The Plan Mode Discovery
Rita learned about Plan Mode.
Instead of immediately executing, Claude would first propose a plan. Show what it intended to do. Ask for approval.
“I’d say: ‘Before doing anything, tell me your plan for organizing these files.’”
Claude would outline: First I’ll list the directory. Then I’ll categorize by file type. Then I’ll create folders. Then I’ll move files.
“I could catch inefficiency before it happened. ‘Actually, don’t read every file. Just sort by extension.’ The plan adjusted. Tokens saved.”
The Estimation Request
Rita started asking for cost estimates.
“Before you execute, estimate how complex this task is. Will it require reading many files? Multiple steps? Iteration?”
Claude would assess: “This task involves reading approximately 50 files averaging 10KB each. Estimated token usage: moderate. Do you want me to proceed?”
“Sometimes the estimate was ‘high’ and I’d simplify the request. Sometimes ‘low’ and I’d proceed confidently. The estimation changed my decision-making.”
The Batch Strategy
Multiple small tasks cost more than one thoughtful task.
Rita learned to batch requests. Instead of:
- “Rename this file”
- “Now move it to Documents”
- “Now update the metadata”
She’d combine:
- “Rename this file to X, move it to Documents, and add metadata for project Y.”
“One request, three actions. Versus three requests with three rounds of context building. The batching saved significantly.”
The Context Efficiency
Long contexts cost more tokens.
Rita learned to keep CLAUDE.md focused. Essential information only. Not every preference and historical note.
“I had a CLAUDE.md file that was 2,000 words. Every task started by reading all of it. I trimmed to 500 words of essential context. Same effectiveness, fraction of the overhead.”
She created separate context files for different task types. Marketing tasks got marketing context. Technical tasks got technical context. Only the relevant context loaded.
The File Reading Strategy
Reading files was expensive. Especially large ones.
“I used to ask Claude to ‘read this folder and summarize.’ That meant reading every file in full. Now I ask: ‘Scan file names and sizes in this folder. Based on names, which files are likely relevant to X?’”
The targeted approach meant Claude read five relevant files instead of fifty random ones.
“Precision in requests translated directly to efficiency in execution.”
The Output Size Control
Claude could be verbose. Verbosity cost tokens.
Rita added constraints: “Summarize in under 200 words.” “List the top 5 only.” “Give me bullet points, not paragraphs.”
“Open-ended requests got open-ended responses. Constrained requests got efficient responses. The constraints weren’t just for clarity — they were for economy.”
The Right Tool Selection
Not every task needed Claude Code.
Rita created a decision tree:
- Simple file operations → Use Finder/Explorer directly
- Quick questions → Use web Claude (fixed cost)
- Complex multi-step tasks → Use Claude Code (worth the tokens)
- Large file analysis → Use Claude Code (only option)
“I stopped using the expensive tool for cheap tasks. Claude Code was for things only Claude Code could do.”
The Time-Token Tradeoff
Sometimes inefficient was fine.
“If Claude spent extra tokens being thorough on an important analysis, that was worth it. If it burned tokens iterating on a trivial task, that was waste.”
Rita learned to judge: Is this task worth token investment? High-stakes decisions got generous budgets. Routine tasks got tight constraints.
The Monthly Rhythm
Rita established budget awareness.
Week 1: Fresh quota, tackle big projects Week 2: Continue projects, monitor usage Week 3: Lighter usage, save buffer Week 4: Reserve for unexpected needs
“I stopped being surprised by usage. I planned for it like any other budget.”
She tracked which task types consumed most tokens. Large file processing. Iterative research. Multi-step workflows. The tracking informed future planning.
The Value Calculation
Rita reframed cost as investment.
“Yes, that research task cost $5 in tokens. But it saved me two hours. My time is worth way more than $2.50/hour.”
The economics made sense when she calculated value, not just cost.
“The expensive tasks were usually the high-value tasks. The ones that saved hours or produced important outputs. Paying for value was fine. Paying for waste wasn’t.”
The Efficiency Mindset
The cost consciousness made Rita a better AI user.
“I think before I prompt now. What exactly do I need? What’s the most efficient path? What constraints keep this focused?”
The discipline improved results beyond cost savings. Focused requests got better outputs. Constraints forced clarity.
The Advice
For others watching their token budget:
“Use Plan Mode religiously. Ask for estimates before big tasks. Batch requests. Constrain outputs. And know when to use simpler tools.”
The tokens would always flow. The question was whether they flowed toward value or drained into waste.
“Treat tokens like money because they are money. Budget them. Invest them wisely. Don’t waste them on tasks that don’t deserve them.”