# Semgrep for .NET

## Overview

Semgrep is an open-source, language-aware static analysis tool that matches source code patterns using a concise rule syntax. Unlike regex-based tools, Semgrep parses code into an abstract syntax tree (AST), so a pattern matches regardless of whitespace, line breaks, or (when metavariables are used) the specific identifier names. For C# projects, Semgrep detects security vulnerabilities (SQL injection, XSS, insecure deserialization), enforces coding standards, and finds anti-patterns without requiring compilation or access to the .NET SDK.

Semgrep rules are written in YAML and use metavariables (`$X`, `$TYPE`) to capture code elements. Rules can include `pattern`, `pattern-not`, `pattern-inside`, and `pattern-either` operators for precise matching. Semgrep also supports autofix for automated remediation.

## Installation and Basic Usage

```bash
# Install via pip
pip install semgrep

# Or via Homebrew
brew install semgrep

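# Run with auto-selected Registry rules, limited to C# files
# (--lang is only for ad-hoc -e/--pattern searches, not for --config)
semgrep --config=auto --include='*.cs' .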

# Run with a specific rule file
semgrep --config=./rules/csharp-security.yml .

# Run with Semgrep Registry rules
semgrep --config=p/csharp .
```

## Writing Custom Rules

### SQL Injection Detection

```yaml
# rules/sql-injection.yml
rules:
  - id: csharp-sql-injection-string-concat
    patterns:
      # Three-operand concatenation assigned to CommandText
      - pattern: |
          $CMD.CommandText = $PREFIX + $INPUT + $SUFFIX;
      # Skip cases where the middle operand is a string literal
      - pattern-not: |
          $CMD.CommandText = $PREFIX + "..." + $SUFFIX;
    message: >
      SQL query built with string concatenation using '$INPUT'.
      This is vulnerable to SQL injection. Use parameterized queries
      with SqlParameter instead.
    severity: ERROR
    languages: [csharp]
    metadata:
      cwe: "CWE-89: SQL Injection"
      owasp: "A03:2021 - Injection"
    # No `fix:` here: the correct parameterized query depends on the original
    # SQL text, so a generic template would overwrite the query with wrong SQL.
```

The rule matches code like:

```csharp
using System.Data.SqlClient;

public class UserRepository
{
    // Flagged by rule csharp-sql-injection-string-concat
    public void GetUser(SqlConnection conn, string userId)
    {
        var cmd = new SqlCommand();
        cmd.Connection = conn;
        cmd.CommandText = "SELECT * FROM Users WHERE Id = " + userId + ";";
        // ^ Semgrep flags: SQL query built with string concatenation
    }

    // Correct: parameterized query (not flagged)
    public void GetUserSafe(SqlConnection conn, string userId)
    {
        var cmd = new SqlCommand("SELECT * FROM Users WHERE Id = @id", conn);
        cmd.Parameters.AddWithValue("@id", userId);
    }
}
```

### Insecure Deserialization

```yaml
# rules/insecure-deserialization.yml
rules:
  - id: csharp-insecure-json-deserialization
    pattern: |
      JsonConvert.DeserializeObject<$TYPE>($INPUT)
    message: >
      Newtonsoft.Json deserialization of '$INPUT' into '$TYPE' without
      TypeNameHandling restriction. If TypeNameHandling is set to Auto
      or All in settings, this enables remote code execution.
      Use System.Text.Json or set TypeNameHandling.None explicitly.
    severity: WARNING
    languages: [csharp]
    metadata:
      cwe: "CWE-502: Deserialization of Untrusted Data"

  - id: csharp-binaryformatter-deserialization
    pattern: |
      new BinaryFormatter().Deserialize($STREAM)
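    message: >
      BinaryFormatter.Deserialize is inherently unsafe and enables
      remote code execution on untrusted input. Its Serialize and
      Deserialize methods have been obsolete since .NET 5 (SYSLIB0011),
      and the in-box implementation was removed in .NET 9.
      Use System.Text.Json, MessagePack, or protobuf-net instead.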
    severity: ERROR
    languages: [csharp]
    metadata:
      cwe: "CWE-502: Deserialization of Untrusted Data"
```

### Enforcing Async Best Practices

```yaml
# rules/async-best-practices.yml
rules:
  - id: csharp-async-void-method
    patterns:
      - pattern: |
          async void $METHOD(...)
          {
            ...
          }
      # Event handlers are the one accepted use of async void
      - pattern-not: |
          async void $METHOD(object $SENDER, $EVENTARGS_TYPE $E)
          {
            ...
          }
    message: >
      Async void method '$METHOD' detected. Async void methods cannot
      be awaited, and unhandled exceptions in them typically crash the
      process. Use async Task instead. Exception: event handlers with an
      (object sender, EventArgs e) signature are allowed.
    severity: WARNING
    languages: [csharp]
    # A `fix:` template cannot reuse `...`, so rewrite only the return type
    fix-regex:
      regex: 'async\s+void'
      replacement: 'async Task'
      count: 1

  - id: csharp-task-result-blocking
    patterns:
      - pattern-either:
          - pattern: $TASK.Result
          - pattern: $TASK.GetAwaiter().GetResult()
      - pattern-inside: |
          async $RETURN_TYPE $METHOD(...)
          {
            ...
          }
    message: >
      Blocking on '$TASK.Result' or 'GetAwaiter().GetResult()' inside
      an async method can cause deadlocks. Use 'await $TASK' instead.
    severity: ERROR
    languages: [csharp]
```
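
The autofix attached to the first rule above can be previewed from the command line before anything is rewritten. The sketch below wraps the CLI's `--autofix` and `--dryrun` flags in a small helper; the function name `preview_autofix` and its default arguments are illustrative and not part of Semgrep.

```bash
# Hypothetical helper: show what Semgrep's autofix would change
# without writing anything to disk.
# Usage: preview_autofix [rules-path] [target-path]
preview_autofix() {
    local rules="${1:-./rules/}"
    local target="${2:-.}"

    if ! command -v semgrep >/dev/null 2>&1; then
        echo "semgrep not found on PATH" >&2
        return 127
    fi

    # --autofix computes the rewrites; --dryrun keeps them from being applied.
    semgrep scan --config="$rules" --autofix --dryrun "$target"
}
```

Dropping `--dryrun` applies the fixes in place, so it is safest to do that on a clean working tree and review the resulting diff.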

### Detecting Missing Disposal

```yaml
# rules/missing-disposal.yml
rules:
  - id: csharp-httpclient-in-using
    pattern: |
      using var $CLIENT = new HttpClient();
    message: >
      Wrapping HttpClient in 'using' creates and disposes a client per
      call, which prevents connection reuse and can exhaust sockets under
      load. Use IHttpClientFactory or a long-lived shared HttpClient instead.
    severity: WARNING
    languages: [csharp]
    metadata:
      cwe: "CWE-404: Improper Resource Shutdown or Release"

  - id: csharp-undisposed-stream
    patterns:
      - pattern: |
          $STREAM = new $STREAM_TYPE(...);
      - metavariable-regex:
          metavariable: $STREAM_TYPE
          # Anchored so that names such as FileStreamOptions do not match
          regex: "^(FileStream|MemoryStream|StreamReader|StreamWriter|NetworkStream)$"
      - pattern-not-inside: |
          using (...)
          {
            ...
          }
      - pattern-not-inside: |
          using var $STREAM = ...;
    message: >
      Stream '$STREAM' of type '$STREAM_TYPE' is not wrapped in a
      using statement. This can lead to resource leaks.
    severity: WARNING
    languages: [csharp]
```

## Semgrep Pattern Operators

| Operator              | Purpose                                              | Example                                    |
|-----------------------|------------------------------------------------------|--------------------------------------------|
| `pattern`             | Match a code pattern                                 | `Console.WriteLine($MSG)`                  |
| `pattern-not`         | Exclude matches                                      | Exclude safe patterns from results         |
| `pattern-inside`      | Match only if inside a parent pattern                | Detect usage inside async methods          |
| `pattern-not-inside`  | Match only if NOT inside a parent pattern            | Detect missing `using` block               |
| `pattern-either`      | Match any of multiple patterns (OR)                  | `$TASK.Result` OR `.GetResult()`           |
| `patterns`            | Match all patterns (AND)                             | Combine multiple conditions                |
| `metavariable-regex`  | Filter metavariable values by regex                  | Stream type name matching                  |
| `metavariable-type`   | Filter by type (experimental for C#)                 | Only match string types                    |
| `focus-metavariable`  | Report diagnostic on specific metavariable location  | Highlight the problematic variable         |
| `fix`                 | Autofix template                                     | Replace `async void` with `async Task`     |

## CI/CD Integration

### GitHub Actions

```yaml
# .github/workflows/semgrep.yml
name: Semgrep
on: [push, pull_request]

jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
      - name: Run Semgrep
        run: |
          semgrep scan \
            --config=p/csharp \
            --config=./rules/ \
            --error \
            --json --output=semgrep-results.json \
            .
      - name: Upload Results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: semgrep-results
          path: semgrep-results.json
```
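
The JSON report written by the workflow above can also drive a severity-based gate. This is a minimal sketch, assuming `jq` is installed on the runner (the Semgrep container image may not ship it) and that the scan itself was run without `--error`; the helper names `count_findings` and `gate_on_errors` are hypothetical.

```bash
# Count findings of one severity in a Semgrep JSON report.
# Usage: count_findings <report.json> <ERROR|WARNING|INFO>
count_findings() {
    jq --arg sev "$2" \
        '[.results[] | select(.extra.severity == $sev)] | length' "$1"
}

# Print a short summary and return non-zero when the report
# contains at least one ERROR-severity finding.
gate_on_errors() {
    local report="${1:-semgrep-results.json}"
    local errors warnings
    errors=$(count_findings "$report" ERROR)
    warnings=$(count_findings "$report" WARNING)
    echo "Semgrep: ${errors} error(s), ${warnings} warning(s)"
    [ "$errors" -eq 0 ]
}
```

Calling `gate_on_errors semgrep-results.json` as the last command of a workflow step would fail the job only when security-critical findings are present, while warnings are merely reported.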

## Semgrep vs. Roslyn Analyzers

| Aspect               | Semgrep                               | Roslyn Analyzers              |
|----------------------|---------------------------------------|-------------------------------|
| Compilation required | No (pattern-based)                    | Yes (semantic model)          |
| Type information     | Limited                               | Full type resolution          |
| Cross-file analysis  | Single-file in the open-source engine | Full compilation scope        |
| Rule authoring       | YAML (minutes to write)               | C# code (hours to write)      |
| Performance          | Fast (no compilation)                 | Slower (full compilation)     |
| IDE integration      | VS Code extension                     | Visual Studio, Rider, VS Code |
| Autofix              | Template-based                        | Programmatic (Roslyn API)     |
| Security rules       | Extensive registry                    | Limited built-in              |

## Best Practices

1. **Start with the `p/csharp` community ruleset from the Semgrep Registry** before writing custom rules to cover common security vulnerabilities and anti-patterns without duplicating existing work.

2. **Use `pattern-not` and `pattern-not-inside` to reduce false positives** by explicitly excluding safe patterns (e.g., excluding parameterized queries from SQL injection rules, excluding event handlers from async-void rules).

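3. **Set `severity: ERROR` for security-critical rules and `severity: WARNING` for style/best-practice rules** so that CI pipelines can filter on severity. Note that `--error` on its own exits non-zero on any finding; combine it with `--severity=ERROR` to fail only on security findings while allowing warnings to pass.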

4. **Include `metadata.cwe` and `metadata.owasp` fields in security rules** to map findings to standardized vulnerability classifications, making it easier to prioritize remediation and report to compliance teams.

5. **Write autofix templates using `fix:` for mechanical transformations** like replacing `async void` with `async Task` or wrapping streams in `using` statements, enabling developers to apply fixes with a single command (`semgrep --autofix`, ideally after previewing the changes with `--dryrun`).

6. **Store custom rules in a `rules/` directory at the repository root and reference them with `--config=./rules/`** to version-control your organization's rules alongside the codebase and share them across repositories.

7. **Use `metavariable-regex` to narrow matches by type name or method name patterns** when full type resolution is not available; this prevents false positives on similarly-named but unrelated APIs.

8. **Run Semgrep in CI with `--json --output=results.json` and upload results as artifacts** so that findings can be reviewed after each run, tracked over time, and ingested by security dashboards.

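9. **Test every custom rule with at least one positive match and one negative match** by creating a test file that shares the rule file's base name (e.g., `sql-injection.cs` next to `sql-injection.yml`), placing a `// ruleid: your-rule-id` comment on the line above each expected match and a `// ok: your-rule-id` comment above each expected non-match, and running `semgrep --test`.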

10. **Combine Semgrep with Roslyn analyzers for defense in depth** -- use Semgrep for security patterns and quick cross-language checks, and Roslyn analyzers for type-aware rules that require semantic model access.
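
Practice 9 can be sketched as a short script that writes a test fixture for the `csharp-async-void-method` rule and runs the rule tests. It assumes the rule lives in `rules/async-best-practices.yml` as in the example earlier; the class and method names inside the fixture are made up for illustration.

```bash
# Create a fixture next to the rule file. Semgrep pairs a rule file with a
# target file that has the same base name (async-best-practices.yml/.cs).
mkdir -p rules

cat > rules/async-best-practices.cs <<'EOF'
using System;
using System.Threading.Tasks;

public class Handlers
{
    // ruleid: csharp-async-void-method
    public async void SaveAsync()
    {
        await Task.Delay(10);
    }

    // ok: csharp-async-void-method
    public async void OnClick(object sender, EventArgs e)
    {
        await Task.Delay(10);
    }

    // ok: csharp-async-void-method
    public async Task LoadAsync()
    {
        await Task.Delay(10);
    }
}
EOF

# Run the rule tests only when the CLI and the rule file are both present.
if command -v semgrep >/dev/null 2>&1 && [ -f rules/async-best-practices.yml ]; then
    semgrep --test rules/
fi
```

The fixture covers one expected match and two expected non-matches (the event-handler exemption and an ordinary `async Task` method), so a regression in either the `pattern` or the `pattern-not` clause shows up as a failed test.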