Mitigating Claude Code Reward Hacking Issues
If you have used Claude Code, you know that it has some tendencies that can be dangerous if they go unnoticed and the code isn't reviewed properly. As far as I understand, these are artifacts of reward hacking during RL training. They include, but are not limited to:
- using mock data to validate a test that doesn't work with real data
- wrapping code in try/catch blocks that silently swallow errors, so tests pass without the failure ever being reported
- removing features or code paths that cause errors in the code it wrote, instead of fixing them
- writing comments on code that don't describe what the code really does
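To make the silent error handling pattern concrete, here is a minimal Python sketch. The function names and the default port are hypothetical, chosen only for illustration; they are not taken from real Claude Code output.

```python
def load_port_swallowing(raw: str) -> int:
    """Anti-pattern: the failure is hidden behind a plausible default."""
    try:
        return int(raw)
    except ValueError:
        # The caller never learns that the input was invalid.
        return 8080


def load_port_strict(raw: str) -> int:
    """Invalid input raises ValueError at the call site instead of being masked."""
    return int(raw)


if __name__ == "__main__":
    # A loose check such as `port > 0` passes here even though the input is garbage.
    print(load_port_swallowing("not-a-port"))
    print(load_port_strict("8080"))
```

A test that only checks "a positive integer came back" passes for the first function on any input, which is how broken code ends up looking validated.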
These issues are annoying to correct every time you review code, and even worse when you don't review it at all. The following CLAUDE.md file is an ad-hoc add-on for Claude Code that mitigates most of these issues and also fits my workflow.
There are several important ideas in this CLAUDE.md:
- Prohibiting error-catching constructs
- Prohibiting any form of mock data
- Prohibiting code comments
- Prohibiting code modifications without a git commit
- Always using a `TODO.md` file that tracks the current work plan
- Always using complete type hints/total types where possible
- Always requesting help if there is a conflict with these constraints
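Rules like these are easier to trust when they can be checked mechanically. As a rough sketch, and only for Python sources, the ban on error-catching constructs could be verified with the standard `ast` module. The helper name `find_error_catching` is hypothetical and is not part of the CLAUDE.md file or of Claude Code.

```python
import ast


def find_error_catching(source: str) -> list[int]:
    """Return the line numbers of every try statement in Python source."""
    # ast.TryStar (try/except*) exists only on Python 3.11+; fall back to ast.Try.
    try_nodes = (ast.Try, getattr(ast, "TryStar", ast.Try))
    tree = ast.parse(source)
    return sorted(
        node.lineno for node in ast.walk(tree) if isinstance(node, try_nodes)
    )


if __name__ == "__main__":
    sample = (
        "def f(x):\n"
        "    try:\n"
        "        return int(x)\n"
        "    except ValueError:\n"
        "        return 0\n"
    )
    print(find_error_catching(sample))
```

This only detects `try` statements. Other ways of suppressing errors, such as `contextlib.suppress`, would need their own checks.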
The rest is mostly aesthetic (for example, no emojis) or project-specific. One thing I will expand on in future blog posts is that many ideas from functional programming would help a lot with the tractability of these systems.
I also quickly AI-generated, with this repo, a Go-based app that observes the repository in real time and shows the git commit messages. It's very minimal and meant for people who are less comfortable with the terminal (and its core idea could also be useful for building observability in agentic systems built on the Claude Code SDK). To try it: https://github.com/NessimBenA/ULTRATHINK-GO/
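The core idea, reading commit messages out of a repository so an agent's work can be followed step by step, can be sketched in a few lines. The following is a hypothetical Python illustration, not the code of the Go app linked above. The `Commit` type and both function names are assumptions made for this example, and it takes a single snapshot rather than watching in real time.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    short_hash: str
    subject: str


def parse_log(output: str) -> list[Commit]:
    """Parse lines of the form '<hash><TAB><subject>' into Commit records."""
    commits: list[Commit] = []
    for line in output.splitlines():
        if not line.strip():
            continue
        short_hash, _, subject = line.partition("\t")
        commits.append(Commit(short_hash, subject))
    return commits


def recent_commits(repo: str, limit: int = 10) -> list[Commit]:
    """Ask git for the newest commits in `repo`; raises CalledProcessError if git fails."""
    result = subprocess.run(
        ["git", "-C", repo, "log", f"--max-count={limit}", "--pretty=format:%h%x09%s"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_log(result.stdout)
```

Calling `recent_commits(".")` inside a repository returns the latest commits, newest first. Because the configuration below demands one commit per file change, such a list doubles as a fairly fine-grained activity log.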
The CLAUDE.md Configuration
You are Claude Code operating under strict constraints. In every action, tool use, and message, follow these rules:
- Data and placeholders:
  - Never use mock or fabricated data.
  - Do not use placeholders like `<API_KEY>` or dummy values. If real inputs, credentials, or environment details are missing, ask for them. If appropriate, show how to obtain them via CLI (e.g., `curl`, database clients).
- Style and comments:
  - Do not use emojis.
  - Do not comment your code.
- Error handling:
  - Do not include `try/except` (Python) or any error-catching constructs in code across languages.
- Tool usage (Claude Code specifics):
  - Use the integrated Terminal for all commands. Begin every command sequence with `pwd` to confirm the working directory.
  - Use the Editor to apply minimal diffs, preferring modifications to existing files. Create new files only when absolutely necessary; ask before creating new files if possible.
  - Do not create temporary files or scripts solely for testing or exploration. Run or inspect directly in the Terminal.
- Filesystem exploration and testing:
  - Explore using CLI tools only: `pwd`, `ls`, `tree`, `stat`, `du -sh`, `wc -l`, `head`, `tail`, `grep/rg`, `sed`, `awk`, `cat`, etc.
  - Test functionality via CLI invocations (e.g., running binaries, `python -m ...`, `pytest` if the project already has tests). Do not write new files just to test; modify existing tests only if necessary.
- Version control (Git) and commits:
  - Ensure work is inside a Git repository; if not, ask before initializing.
  - After each file modification, make an immediate, single-file commit:
    - Stage exactly the changed file: `git add <path-to-file>`
    - Commit with a concise, imperative message stating the intended change.
    - Example: `git commit -m "Update foo.py: add strict typing to parse_config()"`
  - Do not batch multiple files into one commit; commit each file change separately.
  - For renames or deletions, use `git mv` / `git rm` and commit with an intent message.
  - Verify commits via CLI when helpful (e.g., `git status`, `git log -1 --stat`).
  - If Git user.name/user.email are not configured, ask for real values and set them locally via CLI before committing.
- Python-specific rules:
  - Always activate the project's virtual environment before any Python work. Show the exact activation command (e.g., `source .venv/bin/activate` or `.\.venv\Scripts\activate`).
  - Use the strictest possible typing everywhere:
    - Provide complete type hints; avoid `Any`.
    - Prefer `TypedDict`, `dataclass`, `Literal`, `Final`, `Protocol`, and total types where applicable.
    - Make function signatures explicit and fully typed.
  - Run strict type checking via CLI (e.g., `mypy --strict`, or `pyright` with `typeCheckingMode` set to `"strict"` in its configuration) and address issues without adding broad exception handling.
- Planning and transparency:
  - Always maintain a to-do list file named `TODO.md` at the repository root detailing the concrete steps you will perform.
  - At the start of each task, create or update `TODO.md` and display its contents. Keep it updated as plans change.
  - Commit `TODO.md` updates immediately with a clear intent message.
- Command specificity:
  - Provide exact CLI commands for execution and verification. If execution depends on unavailable real inputs or environment details, pause and ask for them rather than using placeholders.
- Conflict resolution:
  - If a user request conflicts with these constraints, ask for clarification and default to these rules.
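To show what the Python typing rules in this configuration look like in practice, here is a small sketch that uses each of the constructs named there. All names (`RawConfig`, `Config`, `Sink`, `ListSink`, `announce`) are hypothetical. `parse_config` only borrows its name from the commit message example above, and its signature here is an assumption.

```python
from dataclasses import dataclass
from typing import Final, Literal, Protocol, TypedDict

LogLevel = Literal["debug", "info", "error"]
DEFAULT_LEVEL: Final[LogLevel] = "info"


class RawConfig(TypedDict):
    """Total TypedDict: both keys are required."""

    name: str
    level: LogLevel


@dataclass(frozen=True)
class Config:
    name: str
    level: LogLevel


class Sink(Protocol):
    def write(self, message: str) -> None: ...


class ListSink:
    """Satisfies Sink structurally, without inheriting from it."""

    def __init__(self) -> None:
        self.messages: list[str] = []

    def write(self, message: str) -> None:
        self.messages.append(message)


def parse_config(raw: RawConfig) -> Config:
    # A missing key is a type error under strict checking, not a runtime fallback.
    return Config(name=raw["name"], level=raw["level"])


def announce(config: Config, sink: Sink) -> None:
    sink.write(f"{config.name}:{config.level}")
```

With signatures this explicit, a strict type checker can reject a misspelled log level or a missing key before the code runs, which reduces the temptation to paper over such mistakes with exception handling.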