---
name: binary-size-analysis
description: >
  This skill should be used when the user wants to analyze hermesvm binary size
  changes across a range of commits. Use when the user mentions "binary size",
  "size analysis", "size regression", "size increase", or asks to measure how
  commits affect the hermesvm library size.
tags:
  - hermes
  - binary-size
  - analysis
---

# hermesvm Binary Size Analysis

Analyze per-commit binary size changes of the `hermesvm` shared library across a git commit range. Produces a markdown report with per-commit sizes and summary tables of significant increases and decreases.

## When This Skill Applies

- Analyzing which commits caused hermesvm binary size growth
- Investigating size regressions in the hermesvm library
- Producing a size report for a range of commits

## Pre-Run Prompt

Before starting, inform the user how size is measured and ask about verification:

1. **Explain the measurement method**: Tell the user that sizes are measured by summing ALLOC section sizes (via `readelf -SW` on Linux or `otool -l` on macOS), not by measuring the file size with `stat`. This excludes alignment padding and non-shipping sections (symbol tables, string tables, code signatures), matching what actually ships on-device after `strip`.
2. **Ask about parent-commit verification**: This verification is enabled by default. It builds each BUILD commit's parent to confirm no skipped commit changed the binary size. It catches miscategorized SKIP commits but roughly doubles the build time. If the user declines, skip all verification substeps in Step 2.

## Required User Input

- **Commit range**: A base commit and an end commit (or HEAD). For example: `b48b6cc..HEAD` or `v0.290.0..v0.300.0`
- **CMake flags** (optional): Additional CMake configuration flags. Defaults are listed below.
- **Parent-commit verification** (optional): Enabled by default. Disable to halve build time at the risk of missing miscategorized SKIP commits.
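
As a minimal sketch, a range string such as `b48b6cc..HEAD` can be split into the base and end commits that the Step 1 script takes as separate arguments. `parse_commit_range` is a hypothetical helper name, not part of the skill, and the sketch assumes that only two-dot ranges are accepted.

```python
def parse_commit_range(spec, default_end="HEAD"):
    """Split 'base..end' into (base, end).

    Hypothetical helper: a bare 'base' or 'base..' is treated as
    base..HEAD, and three-dot ranges are rejected (an assumption).
    """
    base, _, end = spec.strip().partition("..")
    if not base:
        raise ValueError(f"missing base commit in range: {spec!r}")
    if end.startswith("."):
        raise ValueError(f"three-dot ranges are not supported: {spec!r}")
    return base, (end or default_end)
```

Splitting on the first `..` keeps tag names that contain single dots, such as `v0.290.0`, intact.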
## Default Build Configuration

```
cmake -B cmake-build-minsizerel -GNinja \
  -DHERMESVM_GCKIND=HADES \
  -DHERMESVM_HEAP_HV_MODE=HEAP_HV_PREFER32 \
  -DCMAKE_BUILD_TYPE=MinSizeRel \
  -DHERMESVM_ALLOW_JIT=2 \
  -DHERMESVM_ALLOW_CONTIGUOUS_HEAP=ON \
  -DHERMES_IS_MOBILE_BUILD=ON
```

If the user provides additional flags, append them to the cmake command. If the user provides a different value for a flag in the default configuration above, replace the default with the user-provided value (e.g., `-DHERMESVM_ALLOW_JIT=0`).

## Procedure

### Step 0: Verify Git Repository and Tools

This skill requires a git checkout of the static_h repo (commit hashes are git hashes). Before proceeding, verify that the working directory is a valid git repo:

```bash
git rev-parse --is-inside-work-tree
```

If this fails (e.g., the repo is a Sapling/Mercurial checkout or not a repo at all), stop and inform the user that this skill only works in a git checkout of the Hermes repository.

**Bloaty setup**: This skill uses [bloaty](https://github.com/google/bloaty) for detailed per-section diffs on large size increases. Download and build it:

```bash
git clone https://github.com/google/bloaty.git ~/bloaty
cd ~/bloaty && cmake -B build -GNinja && cmake --build build
```

If the download or build fails (e.g., no network access, missing dependencies), ask the user to provide the path to a pre-built `bloaty` binary.

### Step 1: Categorize Commits

Generate the list of commits in the range and categorize each as BUILD or SKIP.

A commit is SKIP if it **only** touches files outside the directories that contribute to the hermesvm binary. The relevant source directories are:

- `lib/` (VM, Parser, Sema, IR, BCGen, Support, Regex, etc.)
- `API/hermes/`
- `API/jsi/`
- `public/hermes/`
- `include/hermes/`
- `external/`
- `CMakeLists.txt` (exact match, not filtered by `.txt` suffix), `cmake/`, `lib/config/`

Files that never affect the binary: `test/`, `unittests/`, `.github/`, docs (`.md`, `.txt`, `.rst`), `tools/`, npm packages (`hermes-parser`, `prettier-plugin`, etc.), `utils/`, `benchmarks/`, `website/`, `blog/`.

Use this Python script to categorize:

```python
#!/usr/bin/env python3
import subprocess
import sys

RELEVANT_PREFIXES = [
    "lib/",
    "API/hermes/",
    "API/jsi/",
    "public/hermes/",
    "include/hermes/",
    "external/",
    "cmake/",
    "lib/config/",
]
# Files that match exactly (not as prefixes) and should always be BUILD.
RELEVANT_EXACT = ["CMakeLists.txt"]
ALWAYS_SKIP_SUFFIXES = [".md", ".txt", ".rst"]

base_commit = sys.argv[1]
end_commit = sys.argv[2] if len(sys.argv) > 2 else "HEAD"

# check=True makes an invalid range fail loudly instead of printing nothing.
result = subprocess.run(
    ["git", "log", "--reverse", "--format=%H %s", f"{base_commit}..{end_commit}"],
    capture_output=True, text=True, check=True,
)

for line in result.stdout.strip().split("\n"):
    if not line:
        continue
    # partition() tolerates commits with an empty subject line.
    commit, _, subject = line.partition(" ")
    files_result = subprocess.run(
        ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", commit],
        capture_output=True, text=True, check=True,
    )
    files = files_result.stdout.strip().split("\n") if files_result.stdout.strip() else []
    has_relevant = False
    for f in files:
        if f in RELEVANT_EXACT:
            has_relevant = True
            break
        # Docs are skipped even when they live under a relevant directory.
        if any(f.endswith(s) for s in ALWAYS_SKIP_SUFFIXES):
            continue
        if any(f.startswith(p) for p in RELEVANT_PREFIXES):
            has_relevant = True
            break
    category = "BUILD" if has_relevant else "SKIP"
    print(f"{category}\t{commit[:11]}\t0\t0\t{subject}")
```

Save the BUILD-only lines to a file for the build loop.

### Step 2: Build and Measure

**Critical**: Always run `cmake -B ...` before each `ninja hermesvm` to ensure new/removed source files are detected.
Without reconfiguration, incremental builds miss new files (especially JS polyfills in `lib/InternalJavaScript/`), causing deltas to be misattributed to later commits.

**Size measurement**: Use the sum of ALLOC section sizes, not `stat` on the library file. This is important for two reasons:

1. **Excludes alignment padding**: The linker aligns loadable segments to page boundaries (for example, 64KB on Linux and 16KB on macOS, depending on the architecture). A 1-byte code change can shift a segment boundary and cause up to one page of padding change in the file size, creating false deltas.
2. **Matches production**: ALLOC sections are exactly the sections that remain after `strip` and ship on-device. Non-ALLOC sections (`.symtab`, `.strtab`, `.shstrtab`, `.comment`, `.gnu.build.attributes`) are stripped in production.

Use the following Python helper to measure size on both platforms:

```python
import platform
import re
import subprocess


def get_section_size(library_path):
    """Sum section sizes that ship in production, excluding symbol/string
    tables and inter-segment alignment padding.

    - Linux (ELF): sums all sections with the ALLOC flag via readelf.
      This excludes .symtab, .strtab, .shstrtab, .comment, and
      .gnu.build.attributes which are stripped in production.
    - macOS (Mach-O): sums all individual section sizes via otool.
      Sections are listed under __TEXT, __DATA, __DATA_CONST, etc.
      The __LINKEDIT segment (which contains symbol tables, string tables,
      code signatures) has no sub-sections so it is naturally excluded.
    """
    if platform.system() == "Darwin":
        return _get_section_size_macho(library_path)
    else:
        return _get_section_size_elf(library_path)


def _get_section_size_elf(library_path):
    output = subprocess.check_output(
        ["readelf", "-SW", library_path], text=True
    )
    total = 0
    for line in output.split('\n'):
        # Match section lines: [Nr] Name Type Addr Off Size ES Flg ...
        m = re.match(
            r'\s*\[\s*\d+\]\s+\S+\s+\S+\s+[0-9a-f]+\s+[0-9a-f]+\s+'
            r'([0-9a-f]+)\s+[0-9a-f]+\s+(.*)',
            line
        )
        # Group 2 holds the flags followed by decimal columns, so an 'A'
        # can only come from the ALLOC flag.
        if m and 'A' in m.group(2):
            total += int(m.group(1), 16)
    return total


def _get_section_size_macho(library_path):
    output = subprocess.check_output(
        ["otool", "-l", library_path], text=True
    )
    total = 0
    in_section = False
    for line in output.split('\n'):
        # Each section entry in otool -l contains both "sectname" and
        # "segname" lines, followed by "size 0x...". Only reset on
        # "cmd " (a new load command), NOT on "segname"; otherwise
        # the segname line inside each section entry clears the flag
        # before the size line is reached.
        if 'sectname' in line:
            in_section = True
        elif 'cmd ' in line:
            in_section = False
        elif in_section:
            m = re.match(r'\s+size\s+(0x[0-9a-f]+)', line)
            if m:
                total += int(m.group(1), 16)
    return total
```

Initialize the markdown results file immediately, then iterate through BUILD commits only:

1. Checkout the base commit, configure, build, record baseline size.
2. For each BUILD commit in chronological order:
   a. `git checkout <commit> --force --quiet`
   b. **Parent-commit verification** (if enabled): Before building this commit, build its parent (`<commit>~1`) to verify the categorization of intervening SKIP commits:
      - `git checkout <commit>~1 --force --quiet`
      - `cmake -B <build_dir> <flags>`
      - `cd <build_dir> && ninja hermesvm`
      - Record the parent's ALLOC size
      - Compare to the previous BUILD commit's ALLOC size. If they differ, a SKIP commit between the previous BUILD and this one changed the binary. Log a **VERIFICATION FAILURE** with the two sizes and the commit range that needs re-examination. Stop the analysis and report the failure so Step 1's categorization can be fixed.
      - `git checkout <commit> --force --quiet` (return to the BUILD commit)
   c. `cmake -B <build_dir> <flags>`
   d. `cd <build_dir> && ninja hermesvm`
   e. Record ALLOC section size sum using `get_section_size()` on `<build_dir>/lib/libhermesvm.so` (Linux) or `libhermesvm.dylib` (macOS)
   f. Compute delta from previous BUILD commit's size
   g.
   **Write the row to the markdown file immediately** before proceeding
   h. **Bloaty diff** (if delta >= 3000 bytes): Run `bloaty` to get a per-section breakdown of the size increase. This helps identify whether the growth is in code (`.text`), data (`.rodata`), or elsewhere.
      - The previous BUILD commit's binary must be saved before building the current commit. Copy it at the start of each iteration (before step a): `cp <build_dir>/lib/libhermesvm.so /tmp/libhermesvm_prev.so`. Note: when verification is enabled, the parent binary from step (b) can be used instead, but using the previous BUILD binary is simpler and works in both modes.
      - Run: `bloaty <build_dir>/lib/libhermesvm.so -- /tmp/libhermesvm_prev.so`
      - Filter the output to keep only ALLOC sections (section names are the last token on each line): `.text`, `.rodata`, `.data`, `.data.rel.ro`, `.bss`, `.tbss`, `.gcc_except_table`, `.eh_frame`, `.eh_frame_hdr`, `.init_array`, `.fini_array`, `.plt`, `.got`, `.got.plt`, `.rela.dyn`, `.rela.plt`, `.dynsym`, `.dynstr`, `.gnu.hash`, `.gnu.version`, `.init`, `.fini`, and the `TOTAL` row.
      - Append the filtered bloaty output to the end of the markdown file under a `## Bloaty Diff` section, with a heading for each commit.
   i. If the build fails, record BUILD_FAIL and continue

Do **not** include SKIP commits in the output table.

### Step 3: Generate Summary

After all commits are processed, append to the markdown file:

- Overall summary (baseline, final, net change, percentage)
- **Significant Size Increases** table (delta >= +1000), sorted by delta descending
- **Significant Size Decreases** table (delta <= -1000), sorted by delta ascending (largest decrease first)

**Important**: When parsing the markdown table to extract deltas, handle pipe characters (`|`) in commit subjects carefully. Use regex anchored to the table structure rather than naive pipe splitting.
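
One way to anchor the regex is sketched below. It fixes the two leading columns and the three trailing numeric columns, so the subject in the middle may contain any number of `|` characters. The sketch assumes that sizes are written as plain integers and deltas as optionally signed integers. `ROW_RE` and `significant_changes` are hypothetical names, not part of the skill.

```python
import re

# Columns: | # | Commit | Subject | Size (bytes) | Delta | Total Delta |
# The end-of-line anchor plus three numeric columns pins down where the
# subject stops, even when the subject itself contains '|'.
ROW_RE = re.compile(
    r"^\|\s*(?P<num>\d+)\s*\|"
    r"\s*(?P<commit>[0-9a-f]{7,40})\s*\|"
    r"\s*(?P<subject>.*?)\s*\|"
    r"\s*(?P<size>\d+)\s*\|"
    r"\s*(?P<delta>[+-]?\d+)\s*\|"
    r"\s*(?P<total>[+-]?\d+)\s*\|\s*$"
)


def significant_changes(markdown_lines, threshold=1000):
    """Return (increases, decreases) as lists of (commit, subject, delta)."""
    rows = []
    for line in markdown_lines:
        m = ROW_RE.match(line)
        if m:  # header, separator, and BUILD_FAIL rows do not match
            rows.append((m["commit"], m["subject"], int(m["delta"])))
    # Largest increase first, then largest decrease first.
    increases = sorted((r for r in rows if r[2] >= threshold), key=lambda r: -r[2])
    decreases = sorted((r for r in rows if r[2] <= -threshold), key=lambda r: r[2])
    return increases, decreases
```

Rows that do not fit the pattern are skipped silently, which keeps BUILD_FAIL entries out of both summary tables.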
### Step 4: Restore Branch

After analysis, restore the original branch:

```bash
git checkout <original_branch> --force --quiet
```

## Output Format

The markdown file (`hermesvm_size_analysis.md` in the repo root) contains:

1. Build configuration header
2. Per-commit table: `| # | Commit | Subject | Size (bytes) | Delta | Total Delta |`
3. Summary section with significant changes tables
4. Bloaty diff section with per-section breakdowns for commits with delta >= 3000 bytes

## Notes

- The build directory is `cmake-build-minsizerel` inside the repo.
- Each cmake reconfiguration + incremental build takes ~20-30 seconds per commit. For 500+ BUILD commits, expect ~3-5 hours total (roughly double with parent-commit verification enabled).
- Run the build loop as a background task and monitor with periodic tail checks.
- Build failures near HEAD are common when commits depend on later fixes; record them and continue.
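
The runtime estimate above can be reproduced with a rough calculation. The sketch assumes every build costs the same 20-30 seconds and ignores bloaty runs and occasional full rebuilds. `estimate_hours` is a hypothetical helper, not part of the skill.

```python
def estimate_hours(build_commits, secs_per_build=(20, 30), verify_parents=True):
    """Return a rough (low, high) estimate of build-loop hours."""
    builds = build_commits + 1       # +1 for the baseline build
    if verify_parents:
        builds += build_commits      # one extra parent build per BUILD commit
    low, high = secs_per_build
    return builds * low / 3600, builds * high / 3600
```

For 500 BUILD commits without verification this gives roughly 2.8 to 4.2 hours, in line with the figure quoted in the notes.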