See https://cosmo.zip/pub/cosmos/bin/ for interesting binaries to analyze.
For the project chronicle, refer to META.md.
Main goals:
- Implement using Zig
- Be cross-platform
- Be performant
- Be idiomatic
- Be useful
- Provide a reusable library module.
- Provide a CLI program module that uses the library.
Provide a single, cross-platform API to inspect binary executables and libraries, independent of CLI concerns. The module should:
- Detect file format (ELF, Mach-O, PE, APE, possibly “unknown”).
- Extract structural info (architecture, endianness, bitness, sections, segments).
- Extract metadata (build ID, entry point, PIE/ASLR flags, RELRO, NX, stripping status).
- Extract dynamic info (imported libraries, exported symbols, dynamic relocations).
- Offer a stable, format-agnostic data model so callers don’t deal with ELF/Mach-O/PE oddities directly.
Organize the module into the following logical submodules:
- File Abstraction
  - Responsibilities:
    - Open file by path or existing handle.
    - Memory-map or buffered read with safe bounds checking.
    - Provide unified interfaces to read ranges, translate offsets, and get file size.
  - Goals:
    - Hide OS-specific file IO details.
    - Facilitate unit testing with in-memory buffers.
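
To illustrate the bounds-checked range reads described above, here is a minimal Zig sketch over an in-memory buffer. `ByteSource`, `size`, and `range` are hypothetical names chosen for this example, not an existing API; a memory-mapped file could expose the same interface.

```zig
const std = @import("std");

/// Hypothetical read-only view over a file's bytes (mapped or buffered).
const ByteSource = struct {
    bytes: []const u8,

    /// Total size of the underlying data in bytes.
    fn size(self: ByteSource) usize {
        return self.bytes.len;
    }

    /// Returns the slice [offset, offset + len), or null if it would run
    /// past the end. The checks are ordered so they cannot overflow.
    fn range(self: ByteSource, offset: usize, len: usize) ?[]const u8 {
        if (offset > self.bytes.len) return null;
        if (len > self.bytes.len - offset) return null;
        return self.bytes[offset .. offset + len];
    }
};

test "range rejects out-of-bounds reads" {
    const src = ByteSource{ .bytes = "\x7fELF\x02\x01" };
    try std.testing.expect(src.size() == 6);
    try std.testing.expect(src.range(0, 4) != null);
    try std.testing.expect(src.range(4, 3) == null);
}
```

Because the view is just a slice, unit tests can feed it literal byte strings without touching the file system.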
- Format Detection
  - Responsibilities:
    - Inspect magic bytes and minimal header fields to determine:
      - Binary format (ELF, Mach-O, PE).
      - Word size (32/64-bit).
      - Endianness (little/big).
    - Return a clear result type:
      - Recognized(format, hint-arch, hint-os) or
      - Unknown(reason).
  - Goals:
    - Fast, minimal reads (only what’s necessary).
    - Avoid fully parsing until required, to keep overhead low.
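
The magic-byte check can be sketched as follows. This is a simplified illustration, not the intended implementation: `Format`, `Detection`, and `detectFormat` are hypothetical names, the result omits the arch/OS hints, fat (universal) Mach-O files are not handled, and a real PE check would also follow the `e_lfanew` field to the `PE\0\0` signature. The `MZqFpD` prefix is the one APE binaries commonly start with; other variants may exist.

```zig
const std = @import("std");

const Format = enum { elf, macho, pe, ape };

const Detection = union(enum) {
    recognized: Format,
    unknown: []const u8, // human-readable reason
};

fn detectFormat(bytes: []const u8) Detection {
    if (bytes.len < 4) return .{ .unknown = "file shorter than 4 bytes" };
    const magic = bytes[0..4];

    if (std.mem.eql(u8, magic, "\x7fELF")) return .{ .recognized = .elf };

    // Thin Mach-O: 32-bit and 64-bit magic, each in both byte orders.
    const macho_magics = [_][]const u8{
        "\xfe\xed\xfa\xce", "\xce\xfa\xed\xfe",
        "\xfe\xed\xfa\xcf", "\xcf\xfa\xed\xfe",
    };
    for (macho_magics) |m| {
        if (std.mem.eql(u8, magic, m)) return .{ .recognized = .macho };
    }

    // APE also begins with "MZ", so test its longer prefix first.
    if (std.mem.startsWith(u8, bytes, "MZqFpD")) return .{ .recognized = .ape };
    if (std.mem.eql(u8, bytes[0..2], "MZ")) return .{ .recognized = .pe };

    return .{ .unknown = "unrecognized magic bytes" };
}

test "detectFormat on minimal inputs" {
    try std.testing.expect(detectFormat("\x7fELF\x02\x01\x01").recognized == .elf);
    try std.testing.expect(detectFormat("MZ\x90\x00").recognized == .pe);
    try std.testing.expect(std.meta.activeTag(detectFormat("ab")) == .unknown);
}
```

Only the first few bytes are examined, which keeps detection cheap enough to run on every input before any full parse.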
- Format-Specific Parsers
  - Submodules: ELF Parser, Mach-O Parser, PE Parser.
  - Responsibilities (each):
    - Validate headers (sanity checks on offsets, counts, sizes).
    - Parse:
      - File/optional headers.
      - Section headers.
      - Segment/program headers (or equivalents).
      - Symbol tables (static/dynamic).
      - Dynamic linking info (imports/exports, relocation tables).
      - Flags related to PIE/ASLR/NX/RELRO, etc. (translated into format-agnostic fields).
    - Record warnings for non-fatal irregularities.
  - Goals:
    - Robustness against malformed/broken files (never crash, always fail gracefully).
    - Clear mapping from native flags to generic capability fields.
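
One of the sanity checks on offsets, counts, and sizes might look like the sketch below. `tableFits` is a hypothetical helper; the point is that the arithmetic itself must be overflow-checked, since a malformed header can carry arbitrary values.

```zig
const std = @import("std");

/// Returns true if a table of `count` entries of `entry_size` bytes,
/// starting at `offset`, lies entirely within a file of `file_size` bytes.
/// Any arithmetic overflow is treated as "does not fit".
fn tableFits(file_size: u64, offset: u64, count: u64, entry_size: u64) bool {
    const total = std.math.mul(u64, count, entry_size) catch return false;
    const end = std.math.add(u64, offset, total) catch return false;
    return end <= file_size;
}

test "tableFits handles hostile header values" {
    try std.testing.expect(tableFits(1024, 64, 10, 64));
    try std.testing.expect(!tableFits(1024, 64, 16, 64));
    try std.testing.expect(!tableFits(1024, 64, std.math.maxInt(u64), 64));
}
```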
- Unified Binary Model
  - A central “binary description” data structure produced by parsers.
  - Contains:
    - General Info:
      - Format (ELF/Mach-O/PE/Unknown).
      - OS/ABI (Linux, BSD, macOS, Windows, Unknown).
      - Architecture (x86, x86_64, armv7, aarch64, etc.).
      - Bitness (32/64).
      - Endianness.
      - File type (executable, shared library, object, core, etc.).
      - Entry point address (if applicable).
    - Security/Runtime Features:
      - PIE enabled (yes/no/unknown).
      - ASLR support (derived from PIE + platform knowledge).
      - NX/DEP (yes/no/unknown).
      - RELRO type (none/partial/full/unknown).
      - Stripped status (yes/no/partially/unknown).
    - Sections:
      - List with name, type (code/data/debug/other), size, file offset, permissions.
    - Segments (or equivalent):
      - List with type (load, dynamic, etc.), virtual address, permissions, alignment.
    - Dynamic Linking:
      - Imported libraries (names, possibly version info).
      - Exported symbols (names, types, visibility, binding).
      - Imported symbols (names, libraries).
    - Debug Info Indicators:
      - Presence of debug sections / symbols (e.g., DWARF, PDB hints).
    - Anomalies/Warnings:
      - Non-critical issues (odd alignment, overlapping ranges, truncated sections).
  - Goals:
    - Single, format-neutral object that higher layers can render however they like.
    - Extensible without breaking existing consumers.
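
A partial sketch of how such a model could be declared in Zig is shown below. All type and field names are assumptions made for illustration, and only a subset of the listed fields is included.

```zig
const std = @import("std");

const Format = enum { elf, macho, pe, ape, unknown };
const Endianness = enum { little, big };
const Tristate = enum { yes, no, unknown };
const Relro = enum { none, partial, full, unknown };

const Permissions = struct {
    read: bool = false,
    write: bool = false,
    execute: bool = false,
};

const Section = struct {
    name: []const u8,
    size: u64,
    file_offset: u64,
    permissions: Permissions = .{},
};

/// Format-neutral description produced by a parser (hypothetical layout).
const BinaryDescription = struct {
    format: Format = .unknown,
    bitness: u8 = 0, // 32 or 64; 0 when unknown
    endianness: ?Endianness = null,
    entry_point: ?u64 = null,
    pie: Tristate = .unknown,
    nx: Tristate = .unknown,
    relro: Relro = .unknown,
    sections: []const Section = &[_]Section{},
    imported_libraries: []const []const u8 = &[_][]const u8{},
    warnings: []const []const u8 = &[_][]const u8{},
};

test "unset fields default to unknown or empty" {
    const desc = BinaryDescription{ .format = .elf, .bitness = 64 };
    try std.testing.expect(desc.pie == .unknown);
    try std.testing.expect(desc.entry_point == null);
    try std.testing.expect(desc.sections.len == 0);
}
```

Giving every field a default is one way to approach the extensibility goal: a field added later does not break code that constructs the struct with the older set of fields.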
- Feature Detection Layer
  - Responsibilities:
    - Infer higher-level properties from low-level fields:
      - Derive effective ASLR support from PIE and OS-ABI.
      - Determine “stripped” by checking for symbol/debug presence.
      - Summarize “security posture” into a small set of flags or scores.
  - Goals:
    - Centralize the logic so the CLI and any other consumers don’t re-implement it.
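
The two derivations named above could be centralized roughly like this. The policy encoded here is an assumption made for the sake of the example (ASLR follows the PIE flag whenever the OS is known; "partially stripped" means exactly one of symbols or debug info is present), and all names are hypothetical.

```zig
const std = @import("std");

const Tristate = enum { yes, no, unknown };
const Os = enum { linux, bsd, macos, windows, unknown };
const Stripped = enum { yes, no, partially };

/// Assumed policy: on a known OS, ASLR support follows the PIE flag;
/// without platform knowledge the answer stays unknown.
fn deriveAslr(pie: Tristate, os: Os) Tristate {
    if (os == .unknown) return .unknown;
    return pie;
}

/// Assumed policy: both present means not stripped, neither present means
/// stripped, and exactly one present means partially stripped.
fn deriveStripped(has_symbol_table: bool, has_debug_info: bool) Stripped {
    if (has_symbol_table and has_debug_info) return .no;
    if (!has_symbol_table and !has_debug_info) return .yes;
    return .partially;
}

test "derived properties" {
    try std.testing.expect(deriveAslr(.yes, .linux) == .yes);
    try std.testing.expect(deriveAslr(.yes, .unknown) == .unknown);
    try std.testing.expect(deriveStripped(true, false) == .partially);
}
```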
- Error and Warning Model
  - Clear types for:
    - Fatal errors (cannot parse, truncated file, unsupported variant).
    - Non-fatal warnings (suspicious field values but still interpretable).
  - All public APIs should:
    - Never panic.
    - Return either:
      - A result with parsed binary plus warnings, or
      - A descriptive error with type and message.
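
The split between fatal errors and non-fatal warnings maps naturally onto Zig error unions. The sketch below parses only the first bytes of an ELF identification block (`EI_CLASS` at index 4, `EI_VERSION` at index 6); `ParseError`, `Ident`, and `parseElfIdent` are hypothetical names, and the fixed-capacity warning list is a simplification that avoids an allocator.

```zig
const std = @import("std");

const ParseError = error{ Truncated, BadMagic, UnsupportedVariant };

const Ident = struct {
    bitness: u8,
    // Fixed-capacity warning list keeps this sketch allocation-free.
    warnings: [2][]const u8 = undefined,
    warning_count: usize = 0,

    fn warn(self: *Ident, message: []const u8) void {
        if (self.warning_count < self.warnings.len) {
            self.warnings[self.warning_count] = message;
            self.warning_count += 1;
        }
    }
};

fn parseElfIdent(bytes: []const u8) ParseError!Ident {
    // Fatal: nothing sensible can be returned.
    if (bytes.len < 16) return error.Truncated;
    if (!std.mem.eql(u8, bytes[0..4], "\x7fELF")) return error.BadMagic;
    const bitness: u8 = switch (bytes[4]) { // EI_CLASS
        1 => 32,
        2 => 64,
        else => return error.UnsupportedVariant,
    };

    // Non-fatal: odd but still interpretable, so record it and continue.
    var ident = Ident{ .bitness = bitness };
    if (bytes[6] != 1) ident.warn("EI_VERSION is not EV_CURRENT");
    return ident;
}

test "errors and warnings are reported separately" {
    try std.testing.expectError(error.Truncated, parseElfIdent("\x7fELF"));
    const ok = try parseElfIdent("\x7fELF\x02\x01\x01" ++ "\x00" ** 9);
    try std.testing.expect(ok.bitness == 64);
    try std.testing.expect(ok.warning_count == 0);
}
```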
- Public API Surface
  - Core operations (expressed in words, no code):
    - “Inspect file at path and return a full binary description.”
    - “Inspect from an in-memory buffer and return a binary description.”
    - “Detect format only without full parsing.”
  - Configuration options:
    - Parsing depth (minimal headers only vs. full parse).
    - Symbol loading options (skip symbols for speed, or full symbol parse).
    - Security-analysis toggle (enable/disable additional inference work).
Provide a single, static, user-facing command-line tool that leverages the library to inspect binaries uniformly across platforms.
The CLI should:
- Accept multiple input files.
- Offer different levels of detail (summary vs. verbose).
- Present a consistent, human-friendly output for all formats.
- Provide machine-readable output modes for scripting.
- Argument Parsing Layer
  - Responsibilities:
    - Parse:
      - File paths (one or more).
      - Output mode (human summary, verbose, machine-readable).
      - Filters (only show sections, only show imports, only show security flags).
      - Global options (color on/off, quiet, help, version).
    - Validate arguments and produce a clean configuration object.
  - Goals:
    - Provide predictable, POSIX-like flags (short and long).
    - Avoid surprises in exit codes and error messages.
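
Turning raw arguments into a validated configuration object might be sketched as below. The flag names (`-v`/`--verbose`, `--json`, `--no-color`) and the rule that flags precede paths are assumptions for this example only; the design does not fix them.

```zig
const std = @import("std");

const OutputMode = enum { summary, verbose, json };

const Config = struct {
    mode: OutputMode = .summary,
    color: bool = true,
    paths: []const []const u8 = &[_][]const u8{},
};

const ArgError = error{ UnknownFlag, NoInputFiles };

/// Parses arguments (program name already removed). In this sketch,
/// flags must come before the file paths; "--" ends flag parsing.
fn parseArgs(args: []const []const u8) ArgError!Config {
    var config = Config{};
    var i: usize = 0;
    while (i < args.len) : (i += 1) {
        const arg = args[i];
        if (std.mem.eql(u8, arg, "-v") or std.mem.eql(u8, arg, "--verbose")) {
            config.mode = .verbose;
        } else if (std.mem.eql(u8, arg, "--json")) {
            config.mode = .json;
        } else if (std.mem.eql(u8, arg, "--no-color")) {
            config.color = false;
        } else if (std.mem.eql(u8, arg, "--")) {
            i += 1;
            break;
        } else if (arg.len > 1 and arg[0] == '-') {
            return error.UnknownFlag;
        } else break; // first non-flag argument starts the path list
    }
    if (i >= args.len) return error.NoInputFiles;
    config.paths = args[i..];
    return config;
}

test "parseArgs produces a validated config" {
    const cfg = try parseArgs(&[_][]const u8{ "--json", "a.out", "libfoo.so" });
    try std.testing.expect(cfg.mode == .json);
    try std.testing.expect(cfg.paths.len == 2);
    try std.testing.expectError(error.UnknownFlag, parseArgs(&[_][]const u8{"--bogus"}));
}
```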
- Command Dispatch / Modes
  - Use a single binary with subcommands or flags that alter behavior.
  - Core modes:
    - Default (no explicit subcommand):
      - Per-file concise summary:
        - Format, arch, OS/ABI, bitness.
        - File type (exe/shared/object).
        - PIE/RELRO/NX/ASLR summary.
        - Stripped / debug info status.
        - Count of sections, imports, exports.
      - Minimal but useful for quick “what is this file?” checks.
    - “Details” / “Verbose” Mode:
      - Everything in summary plus:
        - Full section table.
        - Segment layout.
        - Full list of imported libraries.
        - High-level security posture (e.g., describing protections).
        - Any warnings or anomalies.
    - “Sections” Mode:
      - Only section list with:
        - Name, type, size, permissions, offset.
      - Optional filters: only code, only writable, name patterns.
    - “Deps” / “Imports” Mode:
      - Dynamic dependencies (libraries).
      - Imported symbols (optionally).
      - Basic resolution hints (where they might load from on typical OS defaults, if known).
    - “Symbols” Mode:
      - Exported symbols (by default).
      - Optional switch to show imported, or all.
      - Ability to filter (e.g., by name substring or type).
    - “Security” Mode:
      - Focused output on protections only:
        - PIE, ASLR, RELRO, NX/DEP, canary hints if derivable, stripped/debug status.
      - Possibly present a summarized “score” or rating.
    - “Format” Mode:
      - Just quickly output format + architecture + file type for scripting.
- Output Formatting Layer
  - Human-readable output:
    - Clean, consistent headings and ordering across formats.
    - Align columns where possible.
    - Highlight important security flags.
    - Color support (enabled by default on TTY, disabled otherwise, with overrides).
  - Machine-readable output:
    - Optional JSON or another structured format:
      - Direct serialization of the unified binary model.
      - Stable field names for scripting and integration.
  - Error reporting:
    - Clear messages including file path and reason.
    - Non-zero exit code if any file fails, with summary of failures.
- Integration with Library
  - For each input file:
    - Open using the library’s “inspect file” function.
    - Handle errors:
      - Print per-file error and continue with remaining files (unless configured to stop).
    - Map library’s unified model to:
      - The selected CLI output mode.
  - Respect user config:
    - Use parsing-depth settings depending on mode:
      - Summary/security modes may use medium depth.
      - Verbose/symbols modes request full parsing.
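
The per-file loop with "report and continue" error handling can be sketched as follows. `inspectStub` is a stand-in for the library's "inspect file" operation and is purely hypothetical, as are the error names and the choice of exit code 1.

```zig
const std = @import("std");

const InspectError = error{ FileNotFound, Truncated };

/// Stand-in for the library's "inspect file" operation (hypothetical).
/// Returns a fake section count instead of a full binary description.
fn inspectStub(path: []const u8) InspectError!usize {
    if (std.mem.eql(u8, path, "missing.bin")) return error.FileNotFound;
    return path.len;
}

/// Processes every path in input order, reports failures per file on
/// stderr, and returns the process exit code (non-zero if any file failed).
fn run(paths: []const []const u8) u8 {
    var failures: usize = 0;
    for (paths) |path| {
        const sections = inspectStub(path) catch |err| {
            std.debug.print("{s}: error: {s}\n", .{ path, @errorName(err) });
            failures += 1;
            continue; // keep going with the remaining files
        };
        std.debug.print("{s}: ok ({d} sections)\n", .{ path, sections });
    }
    return if (failures == 0) @as(u8, 0) else 1;
}

test "one failing file does not stop the rest" {
    try std.testing.expect(run(&[_][]const u8{ "a.out", "libfoo.so" }) == 0);
    try std.testing.expect(run(&[_][]const u8{ "missing.bin", "a.out" }) == 1);
}
```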
- Performance and UX Considerations
  - Handle many files efficiently:
    - Optionally process in parallel (if feasible with file IO).
    - Ensure deterministic output ordering (e.g., matching input order).
  - Sensible defaults:
    - Default to summary mode with colorized, human-friendly output.
  - Stable behavior:
    - Once options are established, avoid changing semantics between versions in breaking ways, especially for machine-readable mode.
- Extensibility
  - Designed so that adding support for:
    - New binary formats (e.g., WebAssembly, COFF variants) requires:
      - New parser submodule.
      - Extended detection logic.
      - Minimal to no changes in CLI logic.
  - Adding new CLI modes:
    - Build them atop the existing unified model, not format-specific logic.
- Testing Strategy (Conceptual)
  - Library:
    - Unit tests with artificial/minimal binaries for each format.
    - Fuzz tests against parsers to ensure robustness.
  - CLI:
    - Snapshot tests of text output for known binaries.
    - Tests of machine-readable JSON output for stability.
This design separates a robust, reusable inspection library from a flexible, user-friendly CLI, while ensuring that all binary formats are presented through a single, coherent abstraction.