vitess/CLAUDE.md at main · vitessio/vitess

🤝 Our Partnership

We're building this together. You're not just executing tasks - you're helping design and implement the best possible solution. This means:

Challenge my suggestions when something feels wrong
Ask me to explain my reasoning
Propose alternative approaches
Take time to think through problems

Quality is non-negotiable. We'd rather spend an hour designing than 3 hours fixing a rushed implementation.

💭 Before We Code

Always discuss first:

What problem are we solving?
What's the ideal solution?
What tests would prove it works?
Are we making the codebase better?

Strict Task Adherence

Only do exactly what I ask for - nothing more, nothing less.

Do NOT proactively update documentation unless explicitly requested
Do NOT add explanatory comments unless asked
Do NOT make "improvements" or "clean up" code beyond the specific task
Do NOT add features, optimizations, or enhancements I didn't mention
If there is something you think should be done, suggest it, but don't do it until asked to

Red flags that indicate you're going beyond the task:

"Let me also..."
"While I'm at it..."
"I should also update..."
"Let me improve..."
"I'll also clean up..."

If the task is complete, STOP. Don't look for more work to do.

🧪 Test-Driven Development

TDD isn't optional - it's how we ensure quality:

The TDD Cycle

Red - Write a failing test that defines success
Green - Write minimal code to pass
Refactor - Make it clean and elegant

Example TDD Session

// Step 1: Write the test first
func TestConnectionBilateralCleanup(t *testing.T) {
    // Define what success looks like
    client, server := testutils.CreateConnectedTCPPair()
    
    // Test the behavior we want
    client.Close()
    
    // Both sides should be closed
    assert.Eventually(t, func() bool {
        return isConnectionClosed(server)
    })
}

// Step 2: See it fail (confirms we're testing the right thing)
// Step 3: Implement the feature
// Step 4: See it pass
// Step 5: Refactor for clarity

To make sure tests are easy to read, we use github.com/stretchr/testify/assert and github.com/stretchr/testify/require for assertions:

Use require (not assert) when the test cannot continue after a failure (e.g., require.NoError after setup that must succeed)
Use assert.Eventually instead of manual time.Sleep() and timeouts
Use t.Context() instead of context.Background() — it integrates with test cancellation
Use t.Cleanup() for test teardown
Use assert.ErrorContains / require.ErrorContains to check error messages
Use the _test.go suffix for mocks and test helpers that are only used by the current package's tests; if helpers or mocks need to be imported by other packages' tests or fuzz harnesses, put them in a normal reusable package such as testlib or testutil
CI timeouts must be generous (30s+) — GitHub Actions runners can be resource-starved with multi-second pauses; sub-second timeouts cause flakiness with no recourse but retry

🚨 Error Handling Excellence

Error handling is not an afterthought - it's core to reliable software.

Go Error Patterns

// YES - Clear error context with vterrors
func ProcessUser(id string) (*User, error) {
    if id == "" {
        return nil, vterrors.Errorf(vtrpcpb.Code_FAILED_PRECONDITION, "user ID cannot be empty")
    }

    user, err := db.GetUser(id)
    if err != nil {
        return nil, vterrors.Wrapf(err, "failed to get user %s", id)
    }

    return user, nil
}

// NO - Swallowing errors
func ProcessUser(id string) *User {
    user, _ := db.GetUser(id)  // What if this fails?
    return user
}

Error Handling Principles

Use vterrors - Prefer vterrors over fmt.Errorf or errors package, with an appropriate vtrpcpb.Code (e.g., vtrpcpb.Code_FAILED_PRECONDITION for unexpected input values, vtrpcpb.Code_INTERNAL for internal operation failures)
Wrap errors with context - Use vterrors.Wrapf(err, "context")
Validate early - Check inputs before doing work
Fail fast - Don't continue with invalid state
Log appropriately - Errors at boundaries, debug info internally
Return structured errors - Use error types for different handling
Never silently swallow errors - When recovering from an error (e.g., restarting replication), always log the original error before the recovery attempt so operators can trace what happened
Log with context - Include workflow name, recovery type, tablet alias, and other identifiers in log messages — a keyspace/tablet can have many concurrent workflows

Failure-Path Safety

Multi-step operations must not leave the system in a half-applied state:

If step 2 of 3 succeeds but step 3 fails, ensure step 2 is rolled back or the cleanup path handles it
Deferred cleanup must use bounded contexts — never context.Background() for operations that could hang indefinitely
When holding a mutex, always bound the operation with a reasonable timeout
Test failure paths, not just the happy path — a test that only proves "step X ran" without covering "what if step X+1 fails" is incomplete

Testing Error Paths

func TestProcessUser_InvalidID(t *testing.T) {
    _, err := ProcessUser("")
    assert.ErrorContains(t, err, "cannot be empty")
}

func TestProcessUser_DatabaseError(t *testing.T) {
    mockDB.EXPECT().GetUser("123").Return(nil, errors.New("db connection failed"))
    
    _, err := ProcessUser("123")
    assert.ErrorContains(t, err, "failed to get user")
}

📐 Design Principles

1. Simple is Better Than Clever

// YES - Clear and obvious
if user.NeedsMigration() {
    return migrate(user)
}

// NO - Clever but unclear
return user.NeedsMigration() && migrate(user) || user

2. Explicit is Better Than Implicit

Clear function names
Obvious parameter types
No hidden side effects

3. Conditions Must Be as Specific as the Intent

Guard clauses should match exactly the intended case, not just currently-known cases
A check like len(tables) == 0 when you mean "is virtual dual" will silently fire for future zero-table cases
A nil return from a helper (e.g., operatorKeyspace() returning nil for composite operators) should not be treated as "safe to skip" — handle it explicitly
When catching MySQL error codes, match the specific codes that apply to your call site, not a broad class

4. Zero-Value / Default Behavior Safety

New struct fields must not change behavior for existing callers who omit them
Prefer negative-polarity booleans (NoCrossKeyspaceJoins not AllowCrossKeyspaceJoins) so the zero-value preserves existing behavior
When a flag value of 0 or empty previously meant "disabled," don't change it to mean "unlimited" — preserve the existing semantic or make the change explicit
Validate mutually exclusive flags in PreRunE and add unit tests for invalid combinations

5. Performance with Clarity

Optimize hot paths, but keep code readable
Preallocate slices and maps when the size is known: make([]T, 0, len(source))
Avoid duplicate work — cache results of expensive calls like reflect.Value.MapKeys() instead of calling twice
Document why, not what

6. Fail Fast and Clearly

Validate inputs early
Return clear error messages
Help future debugging

7. Interfaces Define What You Need, Not What You Provide

When you need something from another component, define the interface in your package
Don't look at what someone else provides - define exactly what you require
This keeps interfaces small, focused, and prevents unnecessary coupling
Types and their methods live together. At the top of files, use a single type () with all type declarations inside.

8. Go-Specific Best Practices

Receiver naming - Use consistent, short receiver names (e.g., u *User, not user *User)
Package naming - Short, descriptive, lowercase without underscores
Interface naming - Single-method interfaces end in -er (Reader, Writer, Handler)
Context first - Always pass context.Context as the first parameter
Channels for coordination - Use channels to coordinate goroutines, not shared memory
No naked returns in non-trivial functions - For functions with named return values, avoid bare return and explicitly return all result values (very small helpers are the only exception). This does not prohibit plain return in func f() { ... } when used for early-exit/guard clauses.
Reduce nesting - Prefer early returns and guard clauses over deeply nested if conditions
Copyright header - New Go files must include the project copyright header with the current year
Always run gofumpt -w on changed Go files before committing - this is mandatory
Always run goimports -local "vitess.io/vitess" -w on changed Go files before committing
Use format verbs precisely - Use %s for strings and %d for integers, not %v for everything
Structured logging - New log messages should use structured logging with slog-style fields (e.g., log.Warn("message", slog.Any("error", err))) rather than printf-style logging with format strings
Reuse existing helpers - Before writing new parsing/validation code, check for existing utilities (e.g., sqlerror package for MySQL error codes, mysqlctl.ParseVersionString(), strings.Split(), topoproto.TabletAliasString() for formatting tablet aliases)

🏗️ Vitess-Specific Conventions

Generated Code

Never directly edit files with a Code generated ... DO NOT EDIT header - these are generated and will be overwritten
Run make codegen to regenerate after modifying source definitions

Protobufs

Never directly edit files under go/vt/proto/ - they are generated from proto/*.proto protobuf definitions
After modifying proto/*.proto files, run make proto to regenerate
Avoid storing timestamps or time durations as integers; use vttime.Time for timestamps and vttime.Duration (or google.protobuf.Duration, as appropriate) for durations instead
Avoid storing tablet aliases as a string: use topodata.TabletAlias

SQL Parser

Never directly edit these generated files in go/vt/sqlparser/: sql.go, ast_clone.go, ast_copy_on_rewrite.go, ast_equals.go, ast_format_fast.go, ast_path.go, ast_rewrite.go, ast_visit.go, cached_size.go
After modifying source files (e.g., sql.y, AST definitions), run make codegen to regenerate
Field order in AST structs matters — generated walkers visit fields in declaration order, so reordering fields changes semantic-analysis walk order and can break scope setup

Command-Line Flags

New flags must not use underscores (use hyphens instead)
When flags are added or modified, update the corresponding go/flags/endtoend/ files - column/whitespace alignment matters

TabletAlias Formatting

Format *topodatapb.TabletAlias using topoproto.TabletAliasString(alias) in logs and error messages so that tablet aliases are human-readable

MySQL Flavor Isolation

MySQL-version-specific behavior belongs in the corresponding flavor implementation (e.g., MariaDB handling in the MariaDB flavor file), not in generic code
Be aware that MariaDB and older MySQL versions may not support all system variables (e.g., super_read_only) — other Vitess call sites already warn-and-continue for ERUnknownSystemVariable

User-Visible Changes

Any user-visible behavioral change — even a correctness fix — needs explicit callout in release/deployment notes
Removing or renaming a public API function (e.g., in sqlparser) is a breaking change for downstream users — call it out explicitly or keep a thin compatibility wrapper
Changelog summaries are for key changes all users should know about — internal implementation details don't belong there
Keep PRs clean of unrelated diffs (e.g., stray package-lock.json changes, go.sum without go mod tidy)

EmergencyReparentShard (ERS)

ERS must prioritize certainty that we picked the most-advanced candidate
Changes should prioritize reducing points of failure - avoid new RPCs or work that may delay or make ERS more brittle

🔍 Debugging & Troubleshooting

When things don't work as expected, we debug systematically:

Debugging Strategy

Reproduce reliably - Create a minimal failing case
Isolate the problem - Binary search through the system
Understand the data flow - Trace inputs and outputs
Question assumptions - What did we assume was working?
Fix the root cause - Not just the symptoms

Debugging Tools & Techniques

// Use structured logging (slog-style) for new code
log.Info("Starting payment processing",
    slog.String("user_id", userID),
    slog.String("action", "process_payment"),
    slog.Float64("amount", amount),
)

// Add strategic debug points
func processPayment(amount float64) error {
    if amount <= 0 {
        return vterrors.Errorf(vtrpcpb.Code_INVALID_ARGUMENT, "invalid amount: %f", amount)
    }

    // More processing...
    log.Info("Payment validation passed")
    return nil
}

When Stuck

Write a test that reproduces the issue
Add logging to understand data flow
Use the debugger to step through code
Rubber duck explain the problem
Take a break and come back fresh

♻️ Refactoring Legacy Code

When improving existing code, we move carefully and systematically:

Refactoring Strategy

Understand first - Read and comprehend the existing code
Add tests - Create safety nets before changing anything
Small steps - Make tiny, verifiable improvements
Preserve behavior - Keep the same external interface
Measure improvement - Verify it's actually better

Safe Refactoring Process

// Step 1: Add characterization tests
func TestLegacyProcessor_ExistingBehavior(t *testing.T) {
    processor := &LegacyProcessor{}
    
    // Document current behavior, even if it seems wrong
    result := processor.Process("input")
    assert.Equal(t, "weird_legacy_output", result)
}

// Step 2: Refactor with tests passing
func (p *LegacyProcessor) Process(input string) string {
    // Improved implementation that maintains the same behavior
    return processWithNewLogic(input)
}

// Step 3: Now we can safely change the behavior
func TestProcessor_ImprovedBehavior(t *testing.T) {
    processor := &Processor{}
    
    result := processor.Process("input")
    assert.Equal(t, "expected_output", result)
}

🔄 Development Workflow

Starting a Feature

Discuss - "I'm thinking about implementing X. Here's my approach..."
Design - Sketch out the API and key components
Test - Write tests that define the behavior
Implement - Make the tests pass
Review - "Does this make sense? Any concerns?"

Making Changes

Small PRs - Easier to review and less risky
Incremental - Build features piece by piece
Always tested - No exceptions
Clear commits - Each commit should have a clear purpose

Git and PR Workflow

CRITICAL: Git commands are ONLY for reading state - NEVER for modifying it.

NEVER use git commands that modify the filesystem unless explicitly told to commit
You may read git state: git status, git log, git diff, git branch --show-current
You may NOT: git commit, git add, git reset, git checkout, git restore, git rebase, git push, etc.
ONLY commit when explicitly asked to commit
Always sign git commits with the git commit --signoff flag
When asked to commit, do it once and stop
Only I can modify git state unless you've been given explicit permission to commit

Once a PR is created, NEVER amend commits or rewrite history.

Always create new commits after PR is created
No git commit --amend after pushing to a PR branch
No git rebase that rewrites commits in the PR
No force pushes to PR branches
This keeps the PR history clean and reviewable

When asked to write a PR description:

Use gh CLI - Always use gh pr edit <number> to update PRs
Update both body and title - Use --body and --title flags
Be informal, humble, and short - Keep it conversational and to the point
Credit appropriately - If Claude Code wrote most of it, mention that

Example format:

## What's this?
[Brief explanation of the feature/fix]

## How it works
[Key implementation details]

## Usage
[Code examples if relevant]

---
_Most of this was written by Claude Code - I just provided direction._

📝 Code Review Mindset

When reviewing code (yours or mine), ask:

Is this the simplest solution?
Will this make sense in 6 months?
Are edge cases handled?
Is it well tested?
Does it improve the codebase?

🎯 Common Patterns

Feature Implementation

You: "Let's add feature X"
Me: "Sounds good! What's the API going to look like? What are the main use cases?"
[Discussion of design]
Me: "Let me write some tests to clarify the behavior we want"
[TDD implementation]
Me: "Here's what I've got. What do you think?"

Bug Fixing

You: "We have a bug where X happens"
Me: "Let's write a test that reproduces it first"
[Test that fails]
Me: "Great, now we know exactly what we're fixing"
[Fix implementation]

Performance Work

You: "This seems slow"
Me: "Let's benchmark it first to get a baseline"
[Benchmark results]
Me: "Now let's optimize without breaking functionality"
[Optimization with tests passing]

🚀 Shipping Quality

Before considering any work "done":

Tests pass and cover the feature
Code is clean and readable
- Golang code passes the gofumpt formatter
- Golang code passes the goimports -local "vitess.io/vitess" -w ... formatter
Edge cases are handled
Performance is acceptable
Documentation is updated if needed
We're both happy with it

Remember: We're crafting software, not just making it work. Every line of code is an opportunity to make the system better.

FilesExpand file tree

CLAUDE.md

Latest commit

History