Luth Logo


Luth is a C++ game engine built to explore high-performance architecture.
It is under active development and serves as both a learning platform and a research project.

Or it might just be a playground to test my sanity.

> [!IMPORTANT]
> My original Bachelor's Thesis version is archived in the thesis branch.

Engine Screenshot

---

## Why Luth?

Honestly? I just really love this stuff.

It started with my Bachelor's Thesis, where I designed a dual-renderer engine to benchmark Vulkan path tracing against traditional OpenGL PBR. The focus was purely on real-time graphics, so the underlying architecture was single-threaded. It worked, and I had a blast building it!

Then I watched Christian Gyrling’s GDC talk on *[Parallelizing the Naughty Dog Engine Using Fibers](https://www.gdcvault.com/play/1022186/Parallelizing-the-Naughty-Dog-Engine)*. Seeing how they saturated every single CPU core made me realize how much was left to explore.

So, I started Luth from scratch to explore high-performance architecture: fiber-based job systems, lock-free memory models, and bindless Vulkan rendering. It is absolutely over-engineered for a solo project, but that’s the point.

---

## Shuddup! how build??

**Prerequisites:**

- **OS**: Windows 10 / 11
- **Compiler**: MSVC (v143+) or Clang (C++20-compliant)
- **GPU**: Hardware ray tracing required (`VK_KHR_ray_query` + acceleration structures): NVIDIA RTX 20-series+, AMD RX 6000+, or Intel Arc
- **SDK**: [Vulkan SDK 1.3+](https://vulkan.lunarg.com). Needs `dynamicRendering`, `timelineSemaphore`, descriptor indexing with UBO update-after-bind, and the KHR ray-tracing extensions

**Steps:**

1. **Clone with submodules**
   ```bash
   git clone --recursive https://github.com/Hekbas/Luth.git
   ```
2. **Generate the VS solution**
   ```bash
   scripts/setup/setup_windows.bat
   ```
3. **Build**: either open `Luth.sln` in Visual Studio 2022, or run the headless script:
   ```bash
   scripts/build/build_windows.bat
   ```

The editor binary lands at `bin/windows-x86_64/Debug/Runtime/Luthien.exe`.

---

## Technical Architecture

### 1. The Fiber Job System

Instead of dedicated OS threads per task ("Render Thread", "Audio Thread"), Luth treats the CPU as a generic worker pool.

* **N:M Threading:** One Worker Thread per CPU core. Logical tasks are wrapped in **Fibers** (lightweight user-mode stacks) that migrate freely between workers.
* **Zero Blocking:** When a job waits on a dependency (or the GPU), it yields to the scheduler, which swaps in another fiber. CPU saturation stays near 100%.
* **Synchronization:** **SpinLocks** (test-and-set + `_mm_pause()`) and **Atomic Counters** keep critical sections short and never block in the OS.

### 2. Pipelined Frame Execution

Three stages overlap. At any given moment, the engine is working on three frames at once:

```
time ──►     ┌──────────┬──────────┬──────────┬──────────┐
CPU game     │    N     │   N+1    │   N+2    │   N+3    │
             ├──────────┼──────────┼──────────┼──────────┤
CPU render   │   N-1    │    N     │   N+1    │   N+2    │
             ├──────────┼──────────┼──────────┼──────────┤
GPU exec     │   N-2    │   N-1    │    N     │   N+1    │
             └──────────┴──────────┴──────────┴──────────┘
```

1. **Game (N):** Runs transform / animation updates, then captures a `RenderSnapshot` POD into the frame's `LogicMemory` arena. That snapshot is the immutable handoff to the next stage.
2. **Render (N-1):** Reads frame N-1's snapshot, builds the render graph, records per-pass secondary command buffers in parallel, and submits.
3. **GPU (N-2):** Executes the commands submitted previously.

Game and render run concurrently on worker fibers from frame 2 onward (frames 0 and 1 are a synchronous warm-up against the current frame). The frame boundary is the snapshot, not shared mutable state: Game writes to one `FrameContext` slot, Render reads from another. Stage-isolated subsystems that retain mutexes (`MaterialSystem`, `BoneMatrixBuffer`) `assert` they're only mutated from the game stage.

### 3. Memory Strategy

`new` / `delete` are forbidden in the hot path.
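To make the "no `new` / `delete` in the hot path" rule concrete, here is a minimal sketch of a per-frame bump allocator. It is not Luth's actual allocator code: the `FrameArena` class and its `Begin`, `Allocate`, `Make`, and `Used` members are hypothetical names chosen for illustration. The only heap allocation happens once, at construction, outside the hot path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical per-frame bump allocator (illustrative only, not Luth's implementation).
class FrameArena {
public:
    // One up-front heap allocation; nothing below touches new/delete again.
    explicit FrameArena(std::size_t capacity)
        : m_Buffer(std::make_unique<unsigned char[]>(capacity)), m_Capacity(capacity) {}

    // Called at the start of a frame: drops every previous allocation at once.
    void Begin() { m_Offset = 0; }

    // Bump-allocates `size` bytes at `alignment` (must be a power of two).
    // Returns nullptr when the arena is exhausted.
    void* Allocate(std::size_t size, std::size_t alignment = alignof(std::max_align_t)) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const auto base = reinterpret_cast<std::uintptr_t>(m_Buffer.get());
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (base + m_Offset + mask) & ~mask;
        const std::size_t start = static_cast<std::size_t>(aligned - base);
        if (start > m_Capacity || size > m_Capacity - start) return nullptr;
        m_Offset = start + size;
        return m_Buffer.get() + start;
    }

    // Constructs a T in place. Destructors are never run, so T must not need one.
    template <typename T, typename... Args>
    T* Make(Args&&... args) {
        static_assert(std::is_trivially_destructible<T>::value,
                      "arena memory is reset in bulk; destructors are never called");
        void* p = Allocate(sizeof(T), alignof(T));
        return p ? ::new (p) T(std::forward<Args>(args)...) : nullptr;
    }

    std::size_t Used() const { return m_Offset; }

private:
    std::unique_ptr<unsigned char[]> m_Buffer;
    std::size_t m_Capacity;
    std::size_t m_Offset = 0;
};
```

In this sketch a frame would call `Begin()` once and then `Make<T>()` for each piece of transient data. Freeing is just the next `Begin()`, which is why the hot path never pays for per-object deallocation.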
Three allocators handle everything that churns:

```
Page Pool (2 MB virtual pages)
├── TaggedPageAllocator     - CPU side, tagged lifetime, bulk free
│   └── per-thread cache    - lock-free hot-path allocations
├── GPUTaggedPageAllocator  - host-mapped device pages, freed when GPU N-2 retires
│   └── per-frame UBO/SSBO regions, descriptors rebind via UPDATE_AFTER_BIND
└── LinearAllocator         - per-frame, reset on Begin()
```

* **Tagged Page Allocator:** Naughty Dog-style. Allocations carry a tag (`LevelGeometry`, `Frame_N`, …) and are freed in bulk by tag.
* **GPU Tagged Page Allocator:** Sibling of the CPU side. Vends 2 MB pages from host-mapped device backings; bulk-freed when the GPU N-2 timeline value retires.
* **Linear Allocator:** Bump-allocates transient frame data (command lists, UI state); resets each frame, no per-object destructors.

Persistent SSBOs (Material Set 2, Light Set 3, Object Set 5) are triple-buffered so frame N writes never overlap frame N-1 GPU reads.

### 4. Vulkan 1.3 Backend

Modern hardware, minimal driver overhead.

* **Bindless Descriptors:** `VK_EXT_descriptor_indexing` binds all engine textures to one global array (`Set 1`), alongside a 32-slot sampler array and buffer device addresses for the RT geometry table. Materials store an integer index, so any draw call can sample any texture without rebinding.
* **Dynamic Rendering:** No `VkRenderPass` / `VkFramebuffer`; passes use `vkCmdBeginRendering` directly.
* **Timeline Semaphores:** Replace `vkWaitForFences`. A dedicated **Poller Job** queries semaphore values and wakes dependent fibers only when the GPU finishes their workload.
* **Update-After-Bind:** Per-frame UBO/SSBO descriptor sets are rewritten each frame as their backing GPU pages cycle, eliminating CPU-GPU sync on those bindings.
* **VMA:** [Vulkan Memory Allocator](https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator) handles all device-memory placement (buffers, images, staging).

### 5. Render Graph

Each frame, Luth builds a **DAG** of render passes. Passes declare reads and writes through a `RenderPassBuilder`; the graph solves pipeline barriers, culls unused passes, and computes resource lifetimes automatically.

```cpp
graph.AddPass("GeometryPass",
    // Setup: declare which resources this pass reads and writes.
    [&](GeometryPassData& data, RG::RenderPassBuilder& builder) {
        data.depthTex  = builder.WriteDepth(sceneDepth, ...);
        data.outputTex = builder.Write(sceneColor);
        data.indirect  = builder.ReadIndirectBuffer(indirectBuffer);
    },
    // Execute: runs once the graph has resolved barriers and lifetimes.
    [=](GeometryPassData& data, RG::RenderPassContext& ctx) {
        // record draw commands on ctx.commandBuffer
    });
```

Passes execute in topological order; command-buffer recording inside each pass parallelizes across worker threads.

---

## Features

### Rendering

| | |
|---|---|
| **Real-Time Ray Tracing** | Hardware KHR ray query: RT sun shadows, ReSTIR DI + GI (Bitterli 2020 / Ouyang 2021), stochastic RT reflections; per-frame TLAS, bindless geometry table |
| **Path-Traced Reference** | rayQuery megakernel: multi-bounce NEE + GGX-VNDF lobe MIS, progressive fp32 accumulation; ground-truth A/B against the raster path |
| **Denoising** | SVGF (Schied 2017): three channels (diffuse DI / indirect GI / specular) behind an `IDenoiser` interface |
| **Clustered Forward+** | Olsson log-slice clusters, slim G-buffer prepass; 1 directional + clustered point lights, ECS-driven |
| **PBR** | Cook-Torrance BRDF, metallic/roughness, render-mode variants (Opaque/Cutout/Transparent) |
| **Shadows** | RT ray-query sun shadows (default); 4-cascade PSSM CSM retained as an A/B toggle (per-cascade GPU cull, PCF) |
| **Volumetric Fog** | Wronski froxel grid: light injection → integrate → temporal resolve; optional per-froxel RT fog shadows |
| **Ambient Occlusion** | GTAO half-res compute (prefilter → integrate → bilateral denoise) |
| **IBL** | HDR skybox, diffuse irradiance + pre-filtered specular + BRDF LUT, split-sum ambient |
| **Anti-Aliasing** | TAA (Karis14 YCoCg-clip recipe) + specular AA (Tokuyoshi 2019) |
| **GPU Culling** | Compute frustum cull per cascade + main scene, indirect draws everywhere |
| **Bindless** | Buffer device address + one global 16384-texture array + 32-slot sampler array; integer material/texture indices |
| **Post-Processing** | HDR pipeline, bloom, ACES + AgX / AgX Punchy tonemap operators, vignette, grain, chromatic aberration |
| **Shaders** | Single-stage SPIR-V asset pipeline with UUIDs, hot-reload, SPIRV-Cross reflection |
| **Pipeline Cache** | Disk-persisted, lazy variant creation, targeted hot-reload invalidation |
| **Mipmaps** | Per-texture pipeline with sampler maxLod control |

### Animation

| | |
|---|---|
| **Sampling** | Fiber-parallel keyframe evaluation |
| **GPU Skinning** | Bone matrix SSBO, vertex shader skinning |
| **Blending** | SQT interpolation, crossfade transitions, layered override with bone masks |
| **Root Motion** | Automatic extraction and application to entity transform |
| **Debug** | Bone overlay visualization in editor viewport |

### Physics

| | |
|---|---|
| **Backend** | Jolt Physics 5.5.0, jobified onto the fiber scheduler |
| **Rigid Bodies** | Static / Kinematic / Dynamic with CCD, primitive + ConvexHull + Mesh shapes |
| **Materials** | UUID-keyed friction / restitution / density with hot-reload |
| **Character Controller** | Kinematic capsule via `JPH::CharacterVirtual`, default stair + stick-to-floor |
| **Queries** | Raycast + Overlap (box / sphere / capsule), layer-mask filtered |
| **Events** | Contact + trigger Add / Remove, drained per frame |
| **Debug Draw** | Wire colliders colored by motion state or character ground state |

### Asset Pipeline

| | |
|---|---|
| **Asset Database** | UUID-based registry with `.meta` sidecars, importers for shaders/textures/models/materials/animations |
| **Smart Import** | Multi-strategy texture discovery, drag-and-drop with eager import, texture remap dialog |
| **Hot Reload** | FileWatcher-based live reload for shaders, textures, and project files |
| **Scene Format** | Custom JSON `.luth` format with dirty tracking and native file dialogs |

### Editor

| | |
|---|---|
| **Scene Interaction** | Mouse picking (ID buffer), selection outlines with occluded fade, shade modes (Lit/Wireframe/Unlit) |
| **Inspector** | Material editor, animation controls, light/shadow settings, Add Component workflow |
| **Inspector Preview** | Live orbit-camera 3D preview for Material/Model assets |
| **Play Mode** | Editing/Playing/Paused state machine, JSON scene snapshot, animation gating, transport bar |
| **Game Panel** | Dedicated camera-driven runtime view with letterbox, no overlays |
| **Project Panel** | Folder navigation, search, hot reload, context menus for entity/primitive creation |
| **Thumbnails** | Rendered previews for textures/meshes/materials in Project panel |
| **Undo / Redo** | Command pattern with UUID-based entity resolution, gizmo drag coalescing, compound commands, material snapshot undo |
| **Frame Debugger** | Freeze a frame, scrub through every draw, replay any single one to see what it did |
| **Profiler** | Per-system timing breakdown with fiber-aware instrumentation |
| **Persistence** | Window layouts, editor settings, and panel state saved across sessions |

---

## Roadmap

See the full [development roadmap](docs/development/ROADMAP.md) for completed phases and version history.

### Future Ideas

**Rendering:** Material system overhaul (transparency, cutout, emissive, unified raster/RT eval), GPU particle system

**Gameplay:** Scripting (C#/Lua), prefab system, ragdoll, animation blend trees & IK

**Editor:** Asset streaming, node-based material editor

---

## Dependencies

Luth is built on the shoulders of giants:

| | |
|---|---|
| [**Vulkan SDK**](https://www.lunarg.com/vulkan-sdk/) | Rendering backend |
| [**VMA**](https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator) | Vulkan memory allocator |
| [**shaderc**](https://github.com/google/shaderc) | Runtime GLSL → SPIR-V compilation (ships with Vulkan SDK) |
| [**SPIRV-Cross**](https://github.com/KhronosGroup/SPIRV-Cross) | Shader reflection |
| [**EnTT**](https://github.com/skypjack/entt) | Entity-Component-System |
| [**ImGui**](https://github.com/ocornut/imgui) | Editor GUI |
| [**ImGuizmo**](https://github.com/CedricGuillemet/ImGuizmo) | Translate / rotate / scale gizmos |
| [**Tracy**](https://github.com/wolfpld/tracy) | Frame profiler |
| [**GLFW**](https://www.glfw.org/) | Windowing + input |
| [**GLM**](https://glm.g-truc.net/) | Math |
| [**spdlog**](https://github.com/gabime/spdlog) | Logging |
| [**assimp**](https://github.com/assimp/assimp) | Model importing |
| [**stb_image**](https://github.com/nothings/stb) | Image loading |
| [**nlohmann/json**](https://github.com/nlohmann/json) | JSON serialization |
| [**Jolt Physics**](https://github.com/jrouwe/JoltPhysics) | Rigid body physics, jobified onto the fiber scheduler |

---

## License

Released under the [MIT License](LICENSE).