# Development Guide

Welcome! This document is for contributors and maintainers of Context Builder. It covers how to set up a development environment, build, test, lint, benchmark, and release the project.

For user-facing documentation and examples, see README.md. For performance work, see BENCHMARKS.md. For release history, see CHANGELOG.md.

---

## Prerequisites

- Rust toolchain (stable) with support for the 2024 edition.
  - Install via rustup: https://rustup.rs
  - Keep your toolchain up to date: `rustup update`
- Git

Optional but recommended:

- An IDE with rust-analyzer
- Just or Make for local task automation (if you prefer)
- Node.js (only if you want to serve Criterion’s HTML reports locally; not required for development)

---

## Getting the code

```bash
git clone https://github.com/igorls/context-builder.git
cd context-builder
```

---

## Project layout

- Cargo.toml — crate metadata, dependencies, features
- README.md — user-facing documentation
- CHANGELOG.md — release notes
- DEVELOPMENT.md — this file
- BENCHMARKS.md — running and understanding benchmarks
- scripts/
  - generate_samples.rs — synthetic dataset generator for benchmarking
- benches/
  - context_bench.rs — Criterion benchmark suite
- src/
  - main.rs — binary entry point
  - lib.rs — core orchestration and `run()` implementation
  - cli.rs — clap parser and CLI arguments
  - file_utils.rs — directory traversal, filter/ignore collection, prompts
  - markdown.rs — core rendering logic, streaming, line numbering, binary/text sniffing
  - tree.rs — file tree structure building and printing
- samples/ — optional persistent datasets (ignored in VCS) for benchmarking

---

## Building and running

Build:

```bash
cargo build
```

Run the CLI:

```bash
cargo run -- --help
cargo run -- -d . -o out.md -f rs -f toml -i target --line-numbers
```

Notes:

- By default, parallel processing is enabled via the `parallel` feature (uses rayon).
- Logging uses env_logger; set `RUST_LOG` to control verbosity:
  - Linux/macOS: `RUST_LOG=info cargo run -- ...`
  - Windows PowerShell: `$env:RUST_LOG='info'; cargo run -- ...`

---

## Features

- parallel (enabled by default)
  - Enables parallel file processing in markdown generation via rayon.
  - Disable defaults (sequential run):
    - Build/run: `cargo run --no-default-features -- ...`
    - As a dependency in another crate: set `default-features = false` in Cargo.toml.
- samples-bin
  - Exposes the dataset generator as a cargo bin (development only).
  - Usage (same command on Linux/macOS and Windows PowerShell):
    - `cargo run --no-default-features --features samples-bin --bin generate_samples -- --help`

---

## Testing

Run all tests:

```bash
cargo test
```

Tips:

- Unit tests cover CLI parsing, file filtering/ignoring, markdown formatting (including line numbers and binary handling), and tree building.
- Avoid adding interactive prompts inside tests. The library is structured so that prompts can be injected/mocked (see the `Prompter` trait).
- For additional diagnostics during tests:
  - Linux/macOS: `RUST_LOG=info cargo test`
  - Windows PowerShell: `$env:RUST_LOG='info'; cargo test`

---

## Linting and formatting

Format:

```bash
cargo fmt --all
```

Clippy (lints):

```bash
cargo clippy --all-targets --all-features -- -D warnings
```

Please ensure code is formatted and clippy-clean before opening a PR.

---

## Benchmarks

We use Criterion for micro/meso benchmarks and dataset-driven performance checks.

- See BENCHMARKS.md for details, including dataset generation, silent runs, and HTML report navigation.
- Quick start:

```bash
cargo bench --bench context_bench
```

---

## Environment variables

- CB_SILENT
  - When set to `1` or `true` (case-insensitive), suppresses user-facing prints in the CLI.
  - The benchmark harness sets this to `1` by default to avoid skewing timings with console I/O.
  - Override locally:
    - Linux/macOS: `CB_SILENT=0 cargo bench --bench context_bench`
    - Windows PowerShell: `$env:CB_SILENT=0; cargo bench --bench context_bench`
- CB_BENCH_MEDIUM
  - When set to `1`, enables the heavier "medium" dataset scenarios during benches.
- CB_BENCH_DATASET_DIR
  - Allows pointing the benchmark harness at an external root containing datasets:
    - `//project`
  - If not set and no `./samples//project` is present, benches will synthesize datasets in a temp dir.
- RUST_LOG
  - Controls log verbosity (env_logger). Example:
    - Linux/macOS: `RUST_LOG=info cargo run -- ...`
    - Windows PowerShell: `$env:RUST_LOG='info'; cargo run -- ...`

---

## Coding guidelines

- Edition: 2024
- Error handling:
  - Use `io::Result` where appropriate; prefer returning errors over panicking.
  - It’s okay to use `unwrap()` and `expect()` in tests/benches and small setup helpers, but not in core library logic.
- Performance:
  - Prefer streaming reads/writes for large files (see `markdown.rs`).
  - Keep binary detection lightweight (the current sniff logic checks for NUL bytes and UTF-8 validity).
  - Keep language detection simple and deterministic; add new mappings in one place.
- Cross-platform:
  - Normalize path separators in tests where string comparisons are used.
- Logging:
  - Use `log::{info, warn, error}`; let `env_logger` control emission.
- CLI:
  - Add new flags in `cli.rs`. Ensure you update the tests covering parsing, and propagate options cleanly through `run_with_args`.

---

## Submitting changes

1) Fork and create a feature branch:

```bash
git checkout -b feat/my-improvement
```

2) Make changes, add tests, and keep the code formatted and clippy-clean:

```bash
cargo fmt --all
cargo clippy --all-targets --all-features -- -D warnings
cargo test
```

3) If you modified performance-sensitive code, run the benches (see BENCHMARKS.md).

4) Update CHANGELOG.md if the change is user-visible or noteworthy.
5) Open a PR with:
   - A concise title
   - A description of the changes and their rationale
   - Notes on performance impact (if any)
   - Any relevant screenshots or benchmark snippets

Suggested commit message convention: short, imperative subject, optionally with a scope (e.g., `feat(cli): add --no-parallel flag`).

---

## Releasing (maintainers)

1) Ensure the tree is green:
   - `cargo fmt --all`
   - `cargo clippy --all-targets --all-features -- -D warnings`
   - `cargo test`
   - Optionally: `cargo bench`

2) Update versions and docs:
   - Bump `version` in `Cargo.toml`.
   - Add a new entry to `CHANGELOG.md`.
   - Verify README.md and DEVELOPMENT.md are up to date.

3) Tag the release:

```bash
git commit -am "chore(release): vX.Y.Z"
git tag vX.Y.Z
git push && git push --tags
```

4) Publish to crates.io:

```bash
cargo publish --dry-run
cargo publish
```

5) Create a GitHub release, paste the changelog highlights, and link to benchmarks if relevant.

---

## Tips and pitfalls

- Prompts during runs
  - The library uses a `Prompter` trait for confirmation flows. Inject a test-friendly prompter to avoid interactive I/O in tests and benches.
- Output file overwrites
  - The CLI confirms overwrites by default. In tests/benches, use the injected prompter that auto-confirms.
- Large datasets
  - Prefer parallel builds for performance.
  - Consider dataset size and line-numbering effects when measuring.

---

## Questions?

Open an issue or start a discussion on GitHub. Thanks for contributing!
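
---

## Appendix: sketch of a test-friendly prompter

The "Tips and pitfalls" section above recommends injecting a prompter to keep interactive I/O out of tests and benches. As a rough illustration only (the actual `Prompter` trait lives in this crate and its method names and signatures may differ — the `confirm` method, `AutoConfirm` struct, and `maybe_overwrite` helper below are all assumptions made for the sketch), an auto-confirming implementation could look like:

```rust
/// Hypothetical sketch — the real `Prompter` trait in file_utils.rs
/// may have a different shape; this only illustrates the injection idea.
pub trait Prompter {
    /// Ask a yes/no question; return true to proceed.
    fn confirm(&self, message: &str) -> bool;
}

/// Test/bench-friendly prompter: always answers "yes",
/// so runs never block waiting on stdin.
pub struct AutoConfirm;

impl Prompter for AutoConfirm {
    fn confirm(&self, _message: &str) -> bool {
        true
    }
}

/// Hypothetical consumer that asks before overwriting an output file.
fn maybe_overwrite(prompter: &dyn Prompter, path: &str) -> bool {
    prompter.confirm(&format!("Overwrite {}?", path))
}

fn main() {
    // In tests and benches, inject AutoConfirm instead of a
    // stdin-backed prompter; the call returns immediately.
    assert!(maybe_overwrite(&AutoConfirm, "out.md"));
}
```

Taking the prompter as `&dyn Prompter` (or a generic bound) keeps the calling code identical whether it receives the interactive implementation or the auto-confirming one.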