AI tooling
AI and LLM-powered assistants — Claude Code, Cursor, GitHub Copilot, Codex CLI, and the agent features built into modern IDEs — are now a standard part of the local development stack. Treat them as another layer alongside your shell, package managers, and editor: something you install, configure, and learn to use deliberately.
What this looks like in practice:
- Code in your editor — inline completions and chat in VS Code, Cursor, JetBrains, or Positron
- Code in your terminal — CLI agents like Claude Code or Codex that can read, edit, and run commands in your project
- Project-aware context — assistants read your repo, understand your environment, and follow conventions you commit to files like `CLAUDE.md` or `.cursorrules` (see the example below)
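For example, a minimal, hypothetical `CLAUDE.md` might look like this; the file name is real, but the contents are illustrative assumptions rather than a required format:

```markdown
# CLAUDE.md (illustrative example, not a required schema)

- Python 3.12; run `pytest` before proposing any change
- Preferred model: mid-tier (Sonnet-class) unless a task note says otherwise
- Never read or echo `.env` or anything listed in `.gitignore`
- Match the existing formatter config; do not reformat unrelated files
```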
A few habits that keep this useful instead of chaotic:
- Keep AI suggestions inside version control. If you can’t `git diff` it, you can’t review it.
- Pin which model and tool a project expects, the same way you pin a Python or Node version.
- Never paste secrets, student data, or private credentials into a hosted assistant. Use `.env` and a secret manager — see Environment Variables and Secrets, and the sketch after this list.
- Read the diff before accepting it. The assistant is a collaborator, not an authority.
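A minimal sketch of that habit in Python, assuming the `python-dotenv` package and a hypothetical `ANTHROPIC_API_KEY` entry in a local, git-ignored `.env` file:

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

# Load key=value pairs from a local .env file into the process environment.
# The .env file itself stays out of version control (add it to .gitignore).
load_dotenv()

api_key = os.environ.get("ANTHROPIC_API_KEY")
if api_key is None:
    raise RuntimeError("ANTHROPIC_API_KEY is not set; add it to your .env file")

# Pass the key to whatever client or tool needs it instead of hard-coding it
# or pasting it into a chat window.
```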
Selecting models and token usage
AI assistants vary widely in capability, speed, and cost. Matching the right model to the task saves money and time.
Model tiers:
- Fastest/cheapest (e.g., Haiku 4.5, GPT-4o mini): Good for quick edits, inline completions, simple code generation. Use for routine tasks where speed matters more than nuance.
- Mid-tier (e.g., Sonnet 4.6): General-purpose work: refactoring, debugging, writing tests, explaining code. A solid default for most development.
- Most capable (e.g., Opus 4.7): Complex tasks: multi-file refactors, designing systems, reviewing production incidents, teaching unfamiliar domains. Worth the cost when you need fewer iterations or better reasoning. (A selection sketch follows this list.)
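One rough way to encode that tiering in a project helper; the model IDs below are placeholders, not real API identifiers:

```python
# Map task categories to model tiers. The IDs are illustrative placeholders;
# substitute whatever identifiers your provider and SDK actually expose.
MODEL_TIERS = {
    "completion": "example-fast-model",         # inline edits, quick generation
    "refactor": "example-mid-model",            # tests, debugging, refactoring
    "design_review": "example-frontier-model",  # multi-file work, architecture
}

def pick_model(task: str) -> str:
    """Return the cheapest model trusted for this kind of task."""
    return MODEL_TIERS.get(task, "example-mid-model")  # mid-tier as the default
```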
Token usage basics:
- Tokens are chunks of text (roughly 4 chars). Every message to the assistant uses tokens: your input and the generated output both count.
- Long conversations accumulate tokens fast. For APIs, track usage in logs; for hosted tools, check your account dashboard.
- Context windows are large but finite: Haiku, Sonnet, and Opus all currently offer around 200K tokens, and other providers vary. Larger codebases or long chat histories consume context quickly.
- Cost scales with input + output tokens, and output tokens typically cost several times more per token than input. A model priced at $3/1M input and $15/1M output tokens is cheap for reading code but adds up fast when it generates long answers (see the cost estimate below).
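A quick back-of-the-envelope check using those example prices (assumed values, not tied to any specific provider):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  usd_per_m_input: float = 3.00,
                  usd_per_m_output: float = 15.00) -> float:
    """Rough per-request cost in USD; prices are per million tokens."""
    return (
        input_tokens / 1_000_000 * usd_per_m_input
        + output_tokens / 1_000_000 * usd_per_m_output
    )

# Reading ~50K tokens of code and getting a 2K-token answer:
print(f"${estimate_cost(50_000, 2_000):.3f}")    # $0.180
# Same prompt, but a long 20K-token generation: output now dominates the bill.
print(f"${estimate_cost(50_000, 20_000):.3f}")   # $0.450
```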
Practical strategies:
- Start with a faster model for exploration; upgrade to a capable one if you need better results or fewer roundtrips.
- Summarize long conversations or close old chats instead of keeping them open forever.
- For recurring tasks (linting, PR reviews), use the most efficient model that does the job reliably — don’t overshoot.
- In `CLAUDE.md` files, specify which model the project prefers, just like pinning a language version. This prevents surprise costs when teammates use different tools.
- If you’re building an agent or API integration, consider a smaller model for high-volume tasks (logs, webhooks, summaries) and a larger one for code review or design, as sketched below.
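A sketch of that split, assuming the `anthropic` Python SDK; the model names are placeholders you would swap for whatever your provider actually offers:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder identifiers: substitute real model names from your provider.
HIGH_VOLUME_MODEL = "example-small-model"     # logs, webhooks, summaries
HIGH_STAKES_MODEL = "example-frontier-model"  # code review, design discussions

def summarize_log(log_text: str) -> str:
    """High-volume, low-stakes: send to the cheaper model."""
    resp = client.messages.create(
        model=HIGH_VOLUME_MODEL,
        max_tokens=300,
        messages=[{"role": "user", "content": f"Summarize this log:\n{log_text}"}],
    )
    # Track spend per call: the response reports input/output token usage.
    print(resp.usage.input_tokens, resp.usage.output_tokens)
    return resp.content[0].text

def review_diff(diff_text: str) -> str:
    """Low-volume, high-stakes: pay for the more capable model."""
    resp = client.messages.create(
        model=HIGH_STAKES_MODEL,
        max_tokens=1500,
        messages=[{"role": "user", "content": f"Review this diff:\n{diff_text}"}],
    )
    return resp.content[0].text
```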