AI tooling
AI and LLM-powered assistants — Claude Code, Cursor, GitHub Copilot, Codex CLI, and the agent features built into modern IDEs — are now a standard part of the local development stack. Treat them as another layer alongside your shell, package managers, and editor: something you install, configure, and learn to use deliberately.
What this looks like in practice:
- Code in your editor — inline completions and chat in VS Code, Cursor, JetBrains, or Positron
- Code in your terminal — CLI agents like Claude Code or Codex that can read, edit, and run commands in your project
- Project-aware context — assistants read your repo, understand your environment, and follow conventions you commit to files like `CLAUDE.md` or `.cursorrules` (see the example below)
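For example, a minimal, hypothetical `CLAUDE.md` might look like this; the file name is real, but the contents are illustrative assumptions rather than a required format:

```markdown
# CLAUDE.md (illustrative example, not a required schema)

- Python 3.12; run `pytest` before proposing any change
- Preferred model: mid-tier (Sonnet-class) unless a task note says otherwise
- Never read or echo `.env` or anything listed in `.gitignore`
- Match the existing formatter config; do not reformat unrelated files
```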
A few habits that keep this useful instead of chaotic:
- Keep AI suggestions inside version control. If you can’t `git diff` it, you can’t review it.
- Pin which model and tool a project expects, the same way you pin a Python or Node version.
- Never paste secrets, student data, or private credentials into a hosted assistant. Use `.env` and a secret manager — see Environment Variables and Secrets, and the sketch after this list.
- Read the diff before accepting it. The assistant is a collaborator, not an authority.
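A minimal sketch of that habit in Python, assuming the `python-dotenv` package and a hypothetical `ANTHROPIC_API_KEY` entry in a local, git-ignored `.env` file:

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

# Load key=value pairs from a local .env file into the process environment.
# The .env file itself stays out of version control (add it to .gitignore).
load_dotenv()

api_key = os.environ.get("ANTHROPIC_API_KEY")
if api_key is None:
    raise RuntimeError("ANTHROPIC_API_KEY is not set; add it to your .env file")

# Pass the key to whatever client or tool needs it instead of hard-coding it
# or pasting it into a chat window.
```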
Selecting models and token usage
AI assistants vary widely in capability, speed, and cost. Matching the right model to the task saves money and time.
Model tiers:
- Fastest/cheapest (e.g., Haiku 4.5, GPT-4o mini): Good for quick edits, inline completions, simple code generation. Use for routine tasks where speed matters more than nuance.
- Mid-tier (e.g., Sonnet 4.6): General-purpose work: refactoring, debugging, writing tests, explaining code. A solid default for most development.
- Most capable (e.g., Opus 4.7): Complex tasks: multi-file refactors, designing systems, reviewing production incidents, teaching unfamiliar domains. Worth the cost when you need fewer iterations or better reasoning. (A selection sketch follows this list.)
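One rough way to encode that tiering in a project helper; the model IDs below are placeholders, not real API identifiers:

```python
# Map task categories to model tiers. The IDs are illustrative placeholders;
# substitute whatever identifiers your provider and SDK actually expose.
MODEL_TIERS = {
    "completion": "example-fast-model",         # inline edits, quick generation
    "refactor": "example-mid-model",            # tests, debugging, refactoring
    "design_review": "example-frontier-model",  # multi-file work, architecture
}

def pick_model(task: str) -> str:
    """Return the cheapest model trusted for this kind of task."""
    return MODEL_TIERS.get(task, "example-mid-model")  # mid-tier as the default
```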
Token usage basics:
- Tokens are chunks of text (roughly 4 chars). Every message to the assistant uses tokens: your input and the generated output both count.
- Long conversations accumulate tokens fast. For APIs, track usage in logs; for hosted tools, check your account dashboard.
- Context windows are large but finite: Haiku, Sonnet, and Opus all currently offer around 200K tokens, and other providers vary. Larger codebases or long chat histories consume context quickly.
- Cost scales with input + output tokens, and output tokens typically cost several times more per token than input. A model priced at $3/1M input and $15/1M output tokens is cheap for reading code but adds up fast when it generates long answers (see the cost estimate below).
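A quick back-of-the-envelope check using those example prices (assumed values, not tied to any specific provider):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  usd_per_m_input: float = 3.00,
                  usd_per_m_output: float = 15.00) -> float:
    """Rough per-request cost in USD; prices are per million tokens."""
    return (
        input_tokens / 1_000_000 * usd_per_m_input
        + output_tokens / 1_000_000 * usd_per_m_output
    )

# Reading ~50K tokens of code and getting a 2K-token answer:
print(f"${estimate_cost(50_000, 2_000):.3f}")    # $0.180
# Same prompt, but a long 20K-token generation: output now dominates the bill.
print(f"${estimate_cost(50_000, 20_000):.3f}")   # $0.450
```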
Practical strategies:
- Start with a faster model for exploration; upgrade to a capable one if you need better results or fewer roundtrips.
- Summarize long conversations or close old chats instead of keeping them open forever.
- For recurring tasks (linting, PR reviews), use the most efficient model that does the job reliably — don’t overshoot.
- In `CLAUDE.md` files, specify which model the project prefers, just like pinning a language version. This prevents surprise costs when teammates use different tools.
- If you’re building an agent or API integration, consider a smaller model for high-volume tasks (logs, webhooks, summaries) and a larger one for code review or design, as sketched below.
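A sketch of that split, assuming the `anthropic` Python SDK; the model names are placeholders you would swap for whatever your provider actually offers:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder identifiers: substitute real model names from your provider.
HIGH_VOLUME_MODEL = "example-small-model"     # logs, webhooks, summaries
HIGH_STAKES_MODEL = "example-frontier-model"  # code review, design discussions

def summarize_log(log_text: str) -> str:
    """High-volume, low-stakes: send to the cheaper model."""
    resp = client.messages.create(
        model=HIGH_VOLUME_MODEL,
        max_tokens=300,
        messages=[{"role": "user", "content": f"Summarize this log:\n{log_text}"}],
    )
    # Track spend per call: the response reports input/output token usage.
    print(resp.usage.input_tokens, resp.usage.output_tokens)
    return resp.content[0].text

def review_diff(diff_text: str) -> str:
    """Low-volume, high-stakes: pay for the more capable model."""
    resp = client.messages.create(
        model=HIGH_STAKES_MODEL,
        max_tokens=1500,
        messages=[{"role": "user", "content": f"Review this diff:\n{diff_text}"}],
    )
    return resp.content[0].text
```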