---
name: dist-op-dev
description: Execution-oriented workflow for HyperParallel distributed operator development. Analyzes the operator, then implements or updates code and tests.
---

# HyperParallel Distributed Operator Development Workflow

> ✅ 【Unified Entry】When developing HyperParallel distributed operators, **just call this SKILL**, and I will automatically handle the entire process, including operator analysis, implementation, and testing.

## When to Use This Workflow

Use this workflow when developers need to add distributed operator support to the HyperParallel framework or optimize sharding strategy inference for existing operators.

## How to Use

Call this SKILL directly, providing the MindSpore mint interface name or PyTorch operator name, along with the source code paths:

```bash
# Develop distributed support for a MindSpore mint interface
/dist-op-dev I want to develop distributed support for MindSpore mint interface mint.matmul. MindSpore source code is at /root/workspace/mindspore, PyTorch source code is at /root/workspace/pytorch.

# Develop distributed support for a PyTorch operator
/dist-op-dev I want to develop distributed support for PyTorch operator torch.nn.functional.linear. MindSpore source code is at /root/workspace/mindspore, PyTorch source code is at /root/workspace/pytorch.
```

**Source code paths are required** — the dist-op-analysis SKILL needs them to locate interface definitions, Primitive mappings, and distributed strategy references.

---

## Execution Flow Overview

Distributed operator development follows a **six-step process**, from operator analysis to code push. The diagram below shows the five development steps; Step 6 (Git commit and PR creation) follows integration testing:

```text
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ 1. Operator     │ ──▶ │ 2. Python       │ ──▶ │ 3. YAML         │
│    Analysis     │     │    Implement    │     │    Registration │
│  Call SKILL     │     │ Inherit/Custom  │     │ Configure map   │
│ 🔴Output report │     │ infer_layout    │     │ Select suffix   │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                                        │
        ┌───────────────────────────────────────────────┘
        ▼
┌─────────────────┐     ┌─────────────────┐
│ 4. Unit Test    │ ──▶ │ 5. Integration  │
│    (UT)         │     │    Test (ST)    │
│ Verify inference│     │ 8-card verify   │
│ Cover DP/MP     │     │ Compare output  │
└─────────────────┘     └─────────────────┘
```

### Workflow Execution Checklist

When using this SKILL to develop distributed operators, create a TODOLIST, then execute the following workflows in order:

- [ ] **[Step 1](workflows/01-operator-analysis.md)**: Operator Analysis
  - Must: The operator analysis process must follow the procedure described in **workflows/01-operator-analysis.md**. Execute each step in order.
  - Goal: Get the operator interface definition, distributed implementation plan, and implementation reference
  - Input: MindSpore mint interface, PyTorch interface, MindSpore source code path, PyTorch source code path
  - Output: Analysis report file `.claude/skills/dist-op-dev/analysis-results/{OpName}-analysis.md` (🔴 required)
- [ ] **[Step 2](workflows/02-python-implementation.md)**: Python Implementation
  - Must: The Python implementation process must follow the procedure described in **workflows/02-python-implementation.md**. Execute each step in order.
  - Goal: Create the distributed operator implementation class; implement infer_layout and get_expand_impl (see the Python sketch after this checklist)
  - Input: Analysis report from Step 1
  - Output: `hyper_parallel/core/shard/ops/parallel_*.py` file
- [ ] **[Step 3](workflows/03-yaml-registration.md)**: YAML Registration
  - Must: The YAML registration process must follow the procedure described in **workflows/03-yaml-registration.md**. Execute each step in order.
  - Goal: Register the operator in the YAML config file and configure infer_layout_suffix (see the YAML sketch after this checklist)
  - Input: Analysis report from Step 1, Python implementation class info from Step 2
  - Output: `hyper_parallel/core/shard/ops/yaml/*.yaml` entry
- [ ] **[Step 4](workflows/04-unit-testing.md)**: Unit Testing (UT)
  - Must: The test generation process must follow the procedure described in **workflows/04-unit-testing.md**. Execute each step in order.
  - Goal: Verify the correctness of the infer_layout and get_expand_impl logic, covering both supported and unsupported scenarios (see the UT sketch after this checklist)
  - Input: Python implementation class from Step 2, analysis report from Step 1
  - Output: `tests/ut/core/shard/ops/test_parallel_*.py`
- [ ] **[Step 5](workflows/05-integration-testing.md)**: Integration Testing (ST)
  - Must: The test generation process must follow the procedure described in **workflows/05-integration-testing.md**. Execute each step in order.
  - Goal: Verify end-to-end distributed execution correctness in an 8-card environment (see the launch sketch after this checklist)
  - Input: YAML config from Step 3, Python implementation from Step 2, analysis report from Step 1
  - Output: `tests/mindspore/st/shard/ops/test_ops_*.py` + `*_shard_in_python.py`, or `tests/torch/shard/ops/test_parallel_op_*.py` + `parallel_op_*.py`
- [ ] **[Step 6](workflows/06-git-commit.md)**: Git Commit and PR Creation
  - Goal: Create a feature branch, then call autogit to complete the lint check, commit, push, and PR creation if needed (see the branch example after this checklist)
  - Input: All modified code, operator name
  - Output: Feature branch `feat/{OpName}-distributed-support`, commit pushed, PR created (if needed)
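To make Step 2 concrete, here is a minimal, self-contained sketch of an implementation class. `DistributedOp`, `infer_layout`, `get_expand_impl`, and `_allow_partial_inputs` are names taken from this document; everything else (the stand-in base class, the tuple-based layouts, and the `ParallelMatMul` class itself) is a hypothetical illustration, not the real HyperParallel API. Follow workflows/02-python-implementation.md and an existing `parallel_*.py` file for the actual signatures.

```python
# Hypothetical sketch only. The real base class lives under
# hyper_parallel/core/shard/ops/; all signatures and the tuple-based
# "layout" representation below are illustrative assumptions.


class DistributedOp:
    """Stand-in for the real HyperParallel base class."""

    # Per the Key Decision Points table: whether inputs may arrive in a
    # partial (unreduced) state. The value here is illustrative.
    _allow_partial_inputs = False

    def infer_layout(self, *input_layouts):
        raise NotImplementedError

    def get_expand_impl(self):
        raise NotImplementedError


class ParallelMatMul(DistributedOp):
    """Sketch of a distributed implementation class for matmul-like operators."""

    def infer_layout(self, *input_layouts):
        # Derive the output sharding from the input shardings. For a
        # (dp, mp) x (mp, 1) matmul, the contracting axis must be sharded
        # identically on both operands.
        left, right = input_layouts
        if left[-1] != right[0]:
            raise ValueError("contracting-axis shardings must match")
        return (left[0], right[-1])

    def get_expand_impl(self):
        # Return the expansion: local matmul plus the collective needed to
        # resolve the partial state over the contracting axis (e.g. an
        # all-reduce over mp). Real return type per the workflow document.
        ...
```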
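For Step 3, a registration entry might look like the sketch below. The file-name pattern and the `infer_layout_suffix` option come from this document; the key names and nesting are assumptions, so copy the schema of an existing entry in `hyper_parallel/core/shard/ops/yaml/` rather than this sketch.

```yaml
# Hypothetical entry for hyper_parallel/core/shard/ops/yaml/matmul_ops.yaml;
# key names are assumptions. Mirror an existing entry in the same directory.
matmul:
  impl_class: ParallelMatMul      # distributed implementation class from Step 2
  infer_layout_suffix: WithShape  # only if the operator broadcasts; otherwise no suffix
```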
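A Step 4 unit test might take the shape below: platform-agnostic, driving `infer_layout` directly for a supported MP sharding and an unsupported mismatched sharding. The import path and behavior assume the hypothetical `ParallelMatMul` sketch above.

```python
# Illustrative sketch for tests/ut/core/shard/ops/test_parallel_matmul.py;
# assumes the hypothetical tuple-based ParallelMatMul from the sketch above.
import pytest

from parallel_matmul import ParallelMatMul  # hypothetical import path


def test_infer_layout_mp_sharding():
    # Supported scenario: both operands shard the contracting axis by mp.
    op = ParallelMatMul()
    assert op.infer_layout(("dp", "mp"), ("mp", 1)) == ("dp", 1)


def test_infer_layout_rejects_mismatched_sharding():
    # Unsupported scenario: contracting-axis shardings disagree.
    op = ParallelMatMul()
    with pytest.raises(ValueError):
        op.infer_layout(("dp", "mp"), ("dp", 1))
```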
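Step 5 runs on 8 devices. The standard launchers can drive the ST files listed above; the file names below instantiate the `test_ops_*.py` and `parallel_op_*.py` patterns for matmul and, like the flags, are illustrative.

```bash
# MindSpore ST: launch 8 workers (file name and flags are illustrative)
msrun --worker_num=8 --local_worker_num=8 pytest tests/mindspore/st/shard/ops/test_ops_matmul.py

# PyTorch ST: launch 8 processes on a single node (file name is illustrative)
torchrun --nproc_per_node=8 tests/torch/shard/ops/parallel_op_matmul.py
```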
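Step 6 delegates the lint check, commit, push, and PR to the autogit SKILL; the branch-naming convention is the only part fixed here. Taking matmul as an example `{OpName}`:

```bash
# Branch naming per Step 6: feat/{OpName}-distributed-support
git checkout -b feat/MatMul-distributed-support
# Then call the autogit SKILL for lint check, commit, push, and PR creation.
```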
---

## Key Decision Points

| Decision Point | Criteria | Options | Impact |
|----------------|----------|---------|--------|
| **Operator Category** | Semantic matching | ElementWise/MatMul/Reduce/Reshape/Gather | Determines the base class and YAML file |
| **Implementation Method** | Whether custom logic is needed | Scenario 0/Scenario 1/Scenario 2 | Code volume and UT coverage |
| **Broadcast Support** | Whether the operator broadcasts | No suffix/WithShape | YAML config and test scenarios |
| **Partial Support** | Whether partial state is handled | _allow_partial_inputs=True/False | get_expand_impl implementation |

**Detailed decision reference:** see [Implementation Decisions](references/implementation-decisions.md)

---

## Quick Reference

### File Location Quick Reference

| Task | File Location | Key Notes |
|------|---------------|-----------|
| Python Implementation | `hyper_parallel/core/shard/ops/parallel_*.py` | Inherit `DistributedOp` or one of its subclasses |
| YAML Registration | `hyper_parallel/core/shard/ops/yaml/*.yaml` | Map the operator to its distributed implementation class |
| Unit Test (UT) | `tests/ut/core/shard/ops/` | Platform-agnostic; verifies `infer_layout` and `get_expand_impl` logic |
| Integration Test (ST) | `tests/mindspore/st/shard/ops/`, `tests/torch/shard/ops/` | Verifies distributed execution in an 8-card environment |

> **Detailed quick reference**: see [references/quick-reference.md](references/quick-reference.md)

### Platform Differences

| Item | MindSpore | PyTorch |
|------|-----------|---------|
| **Interface Name Style** | mint.matmul, mint.nn.functional.relu | torch.matmul, torch.nn.functional.linear |
| **YAML Files** | `element_wise_ops.yaml`, `matmul_ops.yaml`, etc. | `torch_*.yaml` |
| **UT Test Directory** | `tests/ut/core/shard/ops/` (shared) | `tests/ut/core/shard/ops/` (shared) |
| **ST Test Directories** | `tests/mindspore/st/shard/ops/` | `tests/torch/shard/ops/` |

**Important Note:** If a MindSpore operator and a PyTorch operator have the same semantics, they **can reuse the same distributed operator implementation class**.

---

## Related SKILLs

| SKILL | Purpose | When Called |
|-------|---------|-------------|
| **autogit** | Git workflow automation (commit, PR, status, etc.) | Workflow 6; completes code commit and PR creation |
| **dist-op-analysis** | Internal operator analysis (read-only) | Workflow 1; provides interface specs, distributed strategies, and HyperParallel implementation guidance |

---

## Reference Document Paths

- **Workflow detailed steps**: `workflows/` directory
- **Knowledge reference documents**: `references/` directory
  - [Quick Reference](references/quick-reference.md)
  - [Implementation Decisions](references/implementation-decisions.md)
  - [Code Standards](references/code-standards.md)
- **Template files**: `templates/operator-analysis-template.md`