Testing Guide

Echomine follows strict Test-Driven Development (TDD) practices. This guide covers how to write, organize, and run tests.

TDD Workflow (Mandatory)

All development must follow the RED-GREEN-REFACTOR cycle:

graph LR
    A[RED: Write Failing Test] --> B[Verify Test Fails]
    B --> C[GREEN: Minimal Code to Pass]
    C --> D[Verify Test Passes]
    D --> E[REFACTOR: Improve Code]
    E --> F{More Tests?}
    F -->|Yes| A
    F -->|No| G[Commit]

1. RED Phase

Write a test that describes the expected behavior. Run it to verify it fails.

def test_search_returns_matching_conversations():
    """Test that search finds conversations containing keywords."""
    adapter = OpenAIAdapter()
    query = SearchQuery(keywords=["python", "algorithm"])

    results = list(adapter.search(SAMPLE_EXPORT, query))

    assert len(results) > 0
    assert all("python" in r.conversation.title.lower() or
               "algorithm" in r.conversation.title.lower()
               for r in results)

Run and confirm failure:

pytest tests/unit/test_search.py::test_search_returns_matching_conversations -v
# Should FAIL because feature isn't implemented

2. GREEN Phase

Write the minimum code to make the test pass. Don't over-engineer.
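At this stage the implementation can be naive. Below is a sketch of a minimal search method that would satisfy the RED test above; SearchResult is an assumed name, while stream_conversations and the query/conversation attributes come from the other examples in this guide.

def search(self, export_path, query):
    """Minimal GREEN-phase sketch: match keywords against titles only."""
    for conversation in self.stream_conversations(export_path):
        title = conversation.title.lower()
        if any(keyword.lower() in title for keyword in query.keywords):
            yield SearchResult(conversation=conversation)

Ranking, date filtering, and message-body matching can wait for their own failing tests.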

3. REFACTOR Phase

Improve the code while keeping tests green. Run tests after each change.
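For example, the keyword check from the GREEN sketch could be extracted into a helper while the test stays green (same assumed names as above):

def _title_matches(self, conversation, keywords):
    """Return True if the conversation title contains any keyword."""
    title = conversation.title.lower()
    return any(keyword.lower() in title for keyword in keywords)

def search(self, export_path, query):
    for conversation in self.stream_conversations(export_path):
        if self._title_matches(conversation, query.keywords):
            yield SearchResult(conversation=conversation)

Re-run the relevant tests after each small change; if they go red, revert and take a smaller step.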

Test Organization

Test Pyramid

        /\
       /  \      Contract Tests (5%)
      /----\     - Protocol compliance
     /      \    - FR validation
    /--------\   Integration Tests (20%)
   /          \  - Component interaction
  /------------\ Unit Tests (70%)
 /              \- Fast, isolated, focused
/________________\

Directory Structure

tests/
├── unit/                    # Fast, isolated tests
│   ├── test_conversation.py # Model tests
│   ├── test_message.py
│   ├── test_search.py
│   └── adapters/
│       └── test_openai.py
├── integration/             # Component interaction
│   ├── test_cli_integration.py
│   └── test_search_pipeline.py
├── contract/                # Protocol/spec compliance
│   ├── test_fr_search.py    # Functional requirements
│   └── test_provider_protocol.py
├── performance/             # Benchmarks
│   └── test_search_performance.py
├── fixtures/                # Test data
│   ├── sample_export.json
│   └── conftest.py          # Shared fixtures
└── conftest.py              # Root fixtures

Running Tests

All Tests

# Run all tests with coverage
pytest --cov=echomine --cov-report=term-missing

# Run with verbose output
pytest -v

# Run with extra verbose (show print statements)
pytest -vvs

Specific Test Types

# Unit tests only (fast)
pytest tests/unit/ -v

# Integration tests
pytest tests/integration/ -v

# Contract tests
pytest tests/contract/ -v

# Performance benchmarks
pytest tests/performance/ --benchmark-only

Specific Tests

# Single file
pytest tests/unit/test_search.py -v

# Single test function
pytest tests/unit/test_search.py::test_search_with_keywords -v

# Tests matching pattern
pytest -k "search" -v

# Tests with specific marker
pytest -m "slow" -v
pytest -m "not slow" -v

Watch Mode

# Auto-run tests on file changes (requires pytest-watch)
pip install pytest-watch
ptw -- tests/unit/

Writing Tests

Test Naming

Use descriptive names that explain the expected behavior:

# Good: Describes behavior
def test_search_returns_empty_list_when_no_matches():
    pass

def test_search_ranks_results_by_relevance_score():
    pass

def test_stream_conversations_skips_malformed_entries():
    pass

# Bad: Vague names
def test_search():
    pass

def test_it_works():
    pass

Test Structure (Arrange-Act-Assert)

# UTC is available from Python 3.11; earlier versions can use timezone.utc
from datetime import UTC, datetime

def test_search_filters_by_date_range():
    # Arrange: Set up test data and dependencies
    adapter = OpenAIAdapter()
    query = SearchQuery(
        keywords=["python"],
        from_date=datetime(2024, 1, 1, tzinfo=UTC),
        to_date=datetime(2024, 6, 30, tzinfo=UTC),
    )

    # Act: Execute the code under test
    results = list(adapter.search(SAMPLE_EXPORT, query))

    # Assert: Verify expected behavior
    assert len(results) > 0
    for result in results:
        assert result.conversation.created_at >= query.from_date
        assert result.conversation.created_at <= query.to_date

Using Fixtures

# conftest.py
import pytest
from pathlib import Path

@pytest.fixture
def sample_export() -> Path:
    """Path to sample export file."""
    return Path(__file__).parent / "fixtures" / "sample_export.json"

@pytest.fixture
def adapter() -> OpenAIAdapter:
    """Fresh adapter instance."""
    return OpenAIAdapter()

# test_search.py
def test_search_with_keywords(adapter, sample_export):
    query = SearchQuery(keywords=["python"])
    results = list(adapter.search(sample_export, query))
    assert len(results) > 0

Parametrized Tests

import pytest

@pytest.mark.parametrize("keywords,expected_min_results", [
    (["python"], 3),
    (["algorithm"], 2),
    (["nonexistent"], 0),
])
def test_search_keyword_variations(adapter, sample_export, keywords, expected_min_results):
    query = SearchQuery(keywords=keywords)
    results = list(adapter.search(sample_export, query))
    assert len(results) >= expected_min_results

Testing Exceptions

import pytest
from pathlib import Path

def test_search_raises_on_nonexistent_file(adapter):
    query = SearchQuery(keywords=["test"])

    with pytest.raises(FileNotFoundError):
        list(adapter.search(Path("/nonexistent/file.json"), query))

def test_validation_error_on_invalid_query():
    with pytest.raises(ValidationError) as exc_info:
        SearchQuery(limit=-1)  # Invalid: must be positive

    assert "limit" in str(exc_info.value)

Mocking

import ijson
import pytest
from unittest.mock import Mock, patch

def test_progress_callback_invoked(adapter, sample_export):
    callback = Mock()
    query = SearchQuery(keywords=["python"])

    list(adapter.search(sample_export, query, progress_callback=callback))

    assert callback.call_count > 0

@patch("echomine.adapters.openai.adapter.ijson")
def test_handles_parse_error(mock_ijson, adapter, sample_export):
    mock_ijson.items.side_effect = ijson.JSONError("Parse error")

    with pytest.raises(ParseError):
        list(adapter.stream_conversations(sample_export))

Test Coverage

Running Coverage

# Terminal report
pytest --cov=echomine --cov-report=term-missing

# HTML report (detailed)
pytest --cov=echomine --cov-report=html
open htmlcov/index.html

# XML report (for CI)
pytest --cov=echomine --cov-report=xml

Coverage Requirements

  • Critical paths: >80% coverage required
  • Overall: >70% coverage target (see the enforcement example below)
  • New code: 100% coverage for new features
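To make a threshold fail the run automatically, pytest-cov's --cov-fail-under flag can be added locally or in CI (the 70 below mirrors the overall target; adjust to match project policy):

# Fail the run if total coverage drops below the target
pytest --cov=echomine --cov-report=term-missing --cov-fail-under=70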

Excluding from Coverage

# pragma: no cover - for unreachable code
def _debug_helper():  # pragma: no cover
    """Only used in development."""
    pass

Performance Testing

Using pytest-benchmark

def test_search_performance(benchmark, adapter, large_export):
    """Search 10K conversations completes in <5 seconds."""
    query = SearchQuery(keywords=["python"], limit=100)

    result = benchmark(lambda: list(adapter.search(large_export, query)))

    # Verify result correctness
    assert len(result) <= 100

    # Performance assertion
    assert benchmark.stats.stats.mean < 5.0  # seconds

def test_stream_memory_efficiency(benchmark, adapter, large_export):
    """Streaming uses O(1) memory."""
    def consume_stream():
        count = 0
        for _ in adapter.stream_conversations(large_export):
            count += 1
        return count

    result = benchmark(consume_stream)
    assert result > 0

Running Benchmarks

# Run benchmarks
pytest tests/performance/ --benchmark-only

# Compare with baseline
pytest tests/performance/ --benchmark-compare

# Save baseline
pytest tests/performance/ --benchmark-save=baseline

Test Markers

import sys

import pytest

# Mark slow tests
@pytest.mark.slow
def test_large_file_processing():
    pass

# Mark integration tests
@pytest.mark.integration
def test_cli_search_command():
    pass

# Skip conditionally
@pytest.mark.skipif(sys.platform == "win32", reason="Unix-only test")
def test_unix_permissions():
    pass

Run specific markers:

pytest -m "not slow"  # Skip slow tests
pytest -m "integration"  # Only integration

Best Practices

Do

  • Write tests FIRST (TDD)
  • Use descriptive test names
  • Test one behavior per test
  • Use fixtures for shared setup
  • Test edge cases and error conditions
  • Keep tests fast (<1s per unit test)

Don't

  • Write tests after implementation
  • Test implementation details (test behavior)
  • Use sleep() in tests (use mocks; see the sketch after this list)
  • Share state between tests
  • Ignore flaky tests (fix them)
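As an illustration of the sleep() rule, patch time.sleep so retry-style code is tested without real waiting. retry_with_backoff below is a hypothetical function used only for this sketch, not part of Echomine:

import time
from unittest.mock import patch

def retry_with_backoff(operation, attempts):
    """Hypothetical function under test: retries, sleeping between attempts."""
    for _ in range(attempts):
        if operation():
            return True
        time.sleep(1)
    return False

@patch("time.sleep", return_value=None)
def test_retry_does_not_really_wait(mock_sleep):
    assert retry_with_backoff(lambda: False, attempts=3) is False
    assert mock_sleep.call_count == 3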

Debugging Tests

# Show print output
pytest -vvs tests/unit/test_search.py

# Drop into debugger on failure
pytest --pdb tests/unit/test_search.py

# Run specific failing test
pytest tests/unit/test_search.py::test_search_with_keywords -vvs

Last Updated: 2025-11-30