---
target: https://tasking.tech
name: video-avatars-and-virtual-backgrounds
description: >
  Build and debug real-time video effects for conferencing and streaming:
  selfie-segmentation virtual backgrounds, reverse segmentation, face mesh
  tracking, and avatar replacement using WebRTC, MediaPipe, and WebGL/WebGPU.
# `target` is required and should be the top frontmatter key. Use an http(s) URL, e.g. https://tasking.tech
---

# Provided by TippyEntertainment
# https://github.com/tippyentertainment/skills.git

This skill is designed for use on the Tasking.tech agent platform (https://tasking.tech) and is also compatible with assistant runtimes that accept skill-style handlers such as .claude, .openai, and .mistral. Use this skill for both Claude Code and Tasking.tech agent sources.

# Instructions

## Files & Formats

Required files and typical formats for video/avatar/virtual-background projects:

- `SKILL.md` — skill metadata (YAML frontmatter: name, description)
- `README.md` — overview and usage notes
- Video: `.mp4`, `.mov`, `.webm`
- Images & overlays: `.png`, `.jpg`, `.svg`
- Metadata & configs: `.json`, `.yaml`
- Models: `.glb`, `.gltf`, `.fbx`

You are a specialist in real-time video processing for conferencing apps. Use this skill when the repo or user request involves:

- Virtual backgrounds, background blur, or background replacement.
- Selfie segmentation or reverse selfie segmentation.
- Face mesh tracking and expression/pose extraction.
- Avatar replacement (2D/3D avatars instead of the camera feed).
- WebRTC pipelines that need frame-by-frame manipulation.

Focus on **practical, low-latency implementations** that are realistic for web and desktop apps.

## Core Responsibilities

When this skill is loaded, you should:

1. **Map the pipeline**
   - Identify how frames are captured (`getUserMedia`, native camera, etc.).
   - Determine where processing happens: browser (canvas/WebGL/WebGPU), native, or server.
   - Identify where the processed stream is consumed: WebRTC, virtual camera, preview UI.
2. **Design segmentation & compositing**
   - Recommend a segmentation solution (MediaPipe Selfie Segmentation, ML Kit, or equivalent) appropriate to the platform and latency budget.
   - Describe how to:
     - Feed video frames into the model.
     - Obtain the segmentation mask.
     - Composite foreground and background efficiently using WebGL/WebGPU or canvas 2D (see the compositing sketch after this list).
   - For **reverse segmentation**, clearly describe how to invert the mask, or its use, so that effects apply to the subject rather than the background.
3. **Integrate face mesh tracking**
   - Choose a face mesh solution (e.g., MediaPipe Face Mesh).
   - Explain how to:
     - Extract and normalize landmarks.
     - Smooth landmark data over time to reduce jitter (see the smoothing sketch below).
     - Map landmarks to expressions (eyes, mouth, head pose) that can drive overlays or avatars.
4. **Implement avatar replacement**
   - For 2D avatars:
     - Map face mesh landmarks to simple transforms (position/scale/rotation) and expression states (see the transform sketch below).
     - Composite the avatar instead of the raw camera frame.
   - For 3D avatars:
     - Describe how to feed pose/expressions into a 3D engine (Three.js, WebGL/WebGPU, or an external engine via a bridge).
     - Ensure the final rendered avatar is exposed as a MediaStream or virtual camera.
5. **Wire into WebRTC / conferencing**
   - Explain how to:
     - Use `canvas.captureStream()` or insertable streams / `MediaStreamTrackProcessor` to create a processed track.
     - Replace the user's camera track with the processed track (see the `replaceTrack` sketch below).
     - Handle toggling effects on/off, with a fallback to the raw camera if processing fails.
   - For desktop clients, mention virtual camera drivers or custom WebRTC clients where appropriate.
6. **Optimize for performance**
   - Always consider:
     - Running segmentation at reduced resolution and upscaling the result (see the downscaling sketch below).
     - Reusing canvases and WebGL contexts.
     - Offloading heavy work to Web Workers / OffscreenCanvas where available.
   - Suggest configurable quality presets (low/medium/high) and clearly explain the trade-offs.
7. **Handle edge cases and UX**
   - Discuss:
     - Multi-person frames (selfie segmentation prioritizes the nearest person).
     - Fast motion and occlusion, mitigated via temporal smoothing or dynamic fallbacks.
     - Visual artifacts (haloing, hair, transparency) and how to reduce them with post-processing or mask refinement.
   - Suggest intuitive UI controls: effect toggles, background selection, avatar selection, and performance presets.
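## Reference Sketches

A minimal sketch of the segmentation-and-compositing step, using the MediaPipe Selfie Segmentation JS solution. The CDN path, DOM wiring, and background asset are assumptions to adapt to the host app; swapping `'source-in'` for `'source-out'` is one simple way to get reverse segmentation without modifying the mask itself.

```ts
import { SelfieSegmentation, Results } from '@mediapipe/selfie_segmentation';

const video = document.querySelector('video')!;
const canvas = document.querySelector('canvas')!;
const ctx = canvas.getContext('2d')!;
const background = new Image();
background.src = '/backgrounds/office.png'; // placeholder asset

const segmenter = new SelfieSegmentation({
  locateFile: (f) => `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${f}`,
});
segmenter.setOptions({ modelSelection: 1 }); // 1 = landscape model

segmenter.onResults((results: Results) => {
  ctx.save();
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  // Paint the person mask, then keep only the camera pixels inside it.
  ctx.drawImage(results.segmentationMask, 0, 0, canvas.width, canvas.height);
  // 'source-in' keeps the subject; use 'source-out' instead for reverse
  // segmentation, so effects land on the subject and the real scene stays.
  ctx.globalCompositeOperation = 'source-in';
  ctx.drawImage(results.image, 0, 0, canvas.width, canvas.height);
  // Fill everything the mask removed with the virtual background.
  ctx.globalCompositeOperation = 'destination-over';
  ctx.drawImage(background, 0, 0, canvas.width, canvas.height);
  ctx.restore();
});

// Pump camera frames into the model, one per display frame.
async function loop() {
  await segmenter.send({ image: video });
  requestAnimationFrame(loop);
}

navigator.mediaDevices.getUserMedia({ video: true }).then((stream) => {
  video.srcObject = stream;
  video.onloadedmetadata = () => {
    video.play();
    loop();
  };
});
```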
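For the jitter-reduction step, an exponential moving average over the normalized landmark array is often enough. `alpha` is a tuning assumption: higher values track motion faster, lower values are smoother but laggier.

```ts
type Landmark = { x: number; y: number; z: number };

let previous: Landmark[] | null = null;

// Blend each incoming landmark with its smoothed predecessor.
function smoothLandmarks(next: Landmark[], alpha = 0.4): Landmark[] {
  const prev = previous;
  if (!prev || prev.length !== next.length) {
    // First frame, or the tracker re-detected with a different topology.
    previous = next;
    return next;
  }
  const smoothed = next.map((pt, i) => ({
    x: alpha * pt.x + (1 - alpha) * prev[i].x,
    y: alpha * pt.y + (1 - alpha) * prev[i].y,
    z: alpha * pt.z + (1 - alpha) * prev[i].z,
  }));
  previous = smoothed;
  return smoothed;
}
```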
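One way to drive a 2D avatar is to reduce the mesh to a position/scale/rotation transform. This helper is hypothetical: the indices (1 = nose tip, 33/263 = outer eye corners) follow the canonical MediaPipe Face Mesh topology, and `baseEyeDist` is a calibration assumption; verify both against the model version in use.

```ts
type Landmark = { x: number; y: number; z: number };

interface AvatarTransform {
  x: number;        // normalized center position (0..1)
  y: number;
  scale: number;    // relative to the calibrated interocular distance
  rotation: number; // head roll in radians
}

function faceToTransform(lm: Landmark[], baseEyeDist = 0.12): AvatarTransform {
  const right = lm[33];  // outer corner of the right eye
  const left = lm[263];  // outer corner of the left eye
  const nose = lm[1];    // nose tip anchors the avatar position
  const dx = left.x - right.x;
  const dy = left.y - right.y;
  return {
    x: nose.x,
    y: nose.y,
    scale: Math.hypot(dx, dy) / baseEyeDist,
    rotation: Math.atan2(dy, dx),
  };
}
```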
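Wiring the composited canvas into an existing call is standard WebRTC. Here `canvas` is the compositing canvas from the first sketch, while the `RTCPeerConnection` and raw camera track are assumed to come from the host app; `replaceTrack` avoids renegotiation when the new track is compatible.

```ts
// Expose the composited canvas as a camera-like track (30 fps is a hint,
// not a guarantee) and swap it into the outgoing connection.
const processed = canvas.captureStream(30);
const processedTrack = processed.getVideoTracks()[0];

async function setEffectEnabled(
  pc: RTCPeerConnection,
  rawTrack: MediaStreamTrack,
  enabled: boolean,
) {
  const sender = pc.getSenders().find((s) => s.track?.kind === 'video');
  if (!sender) return;
  try {
    await sender.replaceTrack(enabled ? processedTrack : rawTrack);
  } catch {
    // Fallback: if the swap fails, keep (or restore) the raw camera.
    await sender.replaceTrack(rawTrack);
  }
}
```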
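A sketch of the reduced-resolution pattern from the performance step: draw the camera frame onto a small scratch canvas before it reaches the model, and let `drawImage` upscale the returned mask during compositing. The 256×144 input size is an assumption to expose via quality presets, and `segmenter` is the instance from the first sketch; moving this work into a Web Worker with OffscreenCanvas is the natural next step where supported.

```ts
// Small scratch canvas that feeds the model instead of the full-size frame.
const modelInput = document.createElement('canvas');
modelInput.width = 256; // tune per quality preset
modelInput.height = 144;
const inputCtx = modelInput.getContext('2d')!;

function sendDownscaledFrame(video: HTMLVideoElement) {
  // Downscale on draw; the mask that comes back is upscaled implicitly
  // when the compositing code draws it at full canvas size.
  inputCtx.drawImage(video, 0, 0, modelInput.width, modelInput.height);
  return segmenter.send({ image: modelInput });
}
```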
## Output Style

When responding with this skill:

- Favor **concise, stepwise plans** over long essays.
- Provide **minimal but concrete code snippets** (JS/TS, WebRTC, WebGL) that illustrate the approach without being full applications.
- Clearly separate:
  - Capture & transport.
  - Segmentation & compositing.
  - Face mesh & avatar logic.
  - Integration into the WebRTC / conferencing UI.

If the user shares existing code:

- First, diagram the current pipeline as a short list.
- Then recommend the **minimal changes** needed to add or fix segmentation, virtual backgrounds, or avatars, rather than wholesale rewrites.