Slang Language Specification

1. Introduction

1.1. Scope

This document is a specification for the Slang programming language.

Slang is a language primarily designed for use in shader programming: performance-oriented GPU programming for real-time graphics.

1.2. Conformance

This document aspires to specify the Slang language, and the behaviors expected of conforming implementations, with a high level of rigor.

The open-source Slang compiler implementation may deviate from the language as specified here, in a few key ways:

The implementation is necessarily imperfect, and can have bugs.
The implementation may not fully support constructs specified here, or their capabilities may not be as complete as what is required by the specification.
The implementation may support certain constructs that are experimental, deprecated, or are otherwise intentionally unspecified (or even undocumented).

Where practical, this document will call out known deviations between the language as specified, and the current implementation in the open-source Slang compiler.

1.3. The Slang Standard Library

Many constructs that appear to users of Slang as built-in syntax are instead defined as part of the _standard library_ for Slang. This document may, of necessity, refer to constructs from the Slang standard library (e.g., the Texture2D type) that are not defined herein.

This document does not provide a normative definition of all of the types, functions, attributes, modifiers, capabilities, etc. that are defined in the Slang standard library. The normative reference for standard-library constructs not defined in this document is the Slang Standard Library Reference.

Need to turn that into a link.

In cases where a language construct is described in both this specification and the Standard Library Reference, the construct has all the capabilities and restrictions defined in either source. In cases where the two sources disagree, there is a correctness issue in one or the other.

1.4. Document Conventions

1.4.1. Terminology

The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional in this document are to be interpreted as described in RFC 2119.

Uses of the words "descriptive" and "informative" in this document are to be interpreted as synonymous with "non-normative."

1.4.2. Typographical Conventions

This document makes use of the following conventions to improve readability:

When a term is introduced and defined, it will be in bold and italics. For example: a duck is something that looks like a duck and quacks like a duck.
Code fragments in Slang or other programming languages use a fixed-width font. For example: a + b.

1.4.3. Meta-Values and Meta-Types

The specification defines a variety of concepts (e.g., grammar productions, types, and values) that are used to define the static and dynamic semantics of Slang code. In many cases, these concepts exist only to serve this specification, and cannot be directly named or manipulated in Slang code itself.

We refer to these abstract concepts as meta-values. All meta-values are themselves values. A meta-type is a type with instances that are all meta-values.

This specification may introduce a named meta-value in prose, or via grammar rules. When a meta-value is introduced or referenced, its name may be rendered using one of the following conventions:

A meta-value name may be rendered in italics and UpperCamelCase. E.g., IfStatement, UnitType
When a meta-value corresponds to a named declaration, keyword, attribute, or other entity that can be referenced in Slang code, its name may be rendered in a fixed-width font. E.g., the Unit type.
In prose, a meta-value name may be rendered as ordinary English, possibly with words that reference Slang language keywords in a fixed-width font. E.g., if statement, unit type.

In each case, the concept being referenced is the same; for example, the references UnitType, Unit, and unit type all refer to the same definition.

1.4.4. Meta-Variables

A meta-variable is a placeholder name that represents an unknown meta-value of some meta-type. A meta-variable is rendered in italics, and must either be in lowerCamelCase, or consist of a single letter (either upper- or lower-case). For example, text might introduce a meta-variable named e that represents an unknown expression.

The text that introduces a meta-variable may explicitly specify the type of that variable. When the name of a meta-variable matches the name of a type, then it is implicitly a variable of the corresponding type.

1.4.5. Patterns

When introducing a meta-value that is an instance of some nonterminal in a grammar, this specification may express the value as a meta-pattern: a sequence of terminals and meta-variables that match the corresponding nonterminal. For example, we might refer to "the block statement { stmts }", implicitly introducing the meta-variable stmts of type Statements.

The type of a meta-variable in a meta-pattern may be explicitly specified in prose, or it may be implicit from the position of the meta-variable and the grammar production being matched.

A pattern may also specify a type for a meta-variable by immediately suffixing the meta-variable with a colon and a type in UpperCamelCase. For example, a reference to "the block statement { stmts:Statements }" makes the type of stmts more explicit.

1.4.6. Characteristics of Meta-Values

A characteristic of a meta-value is a named property, attribute, or quality of that value.

1.4.7. Callouts

This document uses a few kinds of callouts, which start with a bold word indicating the kind of callout. For example, the following is a note:

Note: Notes are rendered like this.

The kinds of callouts used in this document are:

List the cases we end up using here

1.4.8. Traditional and Legacy Features

Some features of the Slang language are considered traditional or legacy features. The language supports these constructs, syntax, etc. in order to facilitate compatibility with existing code in other GPU languages, such as HLSL.

Sections that introduce traditional or legacy features begin with a callout indicating the status of those features.

Legacy features are those that undermine the consistency and simplicity of the language, and complicate its implementation, for little practical benefit. These features should be considered as candidates for removal in future versions of this specification. Legacy features are optional, unless otherwise indicated.

Traditional features are those that may not represent the long-term design trajectory of the language, but that offer significant practical benefit to users of the language. Traditional features are required, unless otherwise indicated.

1.5. Context-Free Grammars

This specification uses context-free grammars to define:

The lexical syntax of Slang
The abstract syntax of Slang
Representations for meta-values used in this specification

A context-free grammar consists of an alphabet of terminal symbols, and zero or more productions. A production consists of a left-hand side and a right-hand side.

The left-hand side of a production consists of a single nonterminal symbol. A production is for the nonterminal on its left-hand side.

The right-hand side of a production consists of an ordered sequence of zero or more nonterminal and terminal symbols.

A chain production is a production that has exactly one nonterminal symbol on its right-hand side.

Note: a chain production can have zero or more terminals on its right-hand side, in addition to the single nonterminal.

A trivial production is a chain production that has zero terminals on its right-hand side. A nonterminal is abstract if all of the productions for that nonterminal are trivial productions.

A nonterminal of a context-free grammar is a meta-type. An instance of a non-abstract nonterminal is the result of matching a production for that nonterminal, and consists of a sequence of matches for each of the terms on the right-hand side of that production.

An instance of an abstract nonterminal is an instance of one of the nonterminals on the right-hand side of one of its productions.

Note: If there exists a trivial production for nonterminal A that has a right-hand side consisting of nonterminal B, then B is a subtype of A.

1.5.1. Notation

This specification uses a notation inspired by Extended Backus-Naur Form to define context-free grammars. The notation uses the following conventions:

1.5.1.1. Productions

Need to write this up.

1.5.1.2. Terminals

Terminals in the lexical and abstract syntax are rendered in a fixed-width font: e.g., func.

Terminals used in grammar rules for meta-values are rendered in bold: e.g., extends.

1.5.1.3. Nonterminals

Nonterminals in grammar rules are named in upper camel case, and rendered in italics: e.g., CallExpression. This convention is consistent with the notational conventions for meta-types.

A nonterminal on the right-hand side of a production may introduce a name for that nonterminal in lower camel case, using the same syntax as for meta-variables in meta-patterns. E.g., base : TypeExpression.

1.5.1.4. Alternatives

A plain-text vertical bar is used to separate alternatives. E.g., IntegerLiteral | FloatingPointLiteral

1.5.1.5. Optionals

A plain-text question mark (?) as a suffix indicates that the given element is optional. E.g., Expression?

1.5.1.6. Sequences

A plain-text asterisk (*) as a suffix indicates that a given element may be repeated zero or more times. E.g., Modifier *

A plain-text plus sign (+) as a suffix indicates that a given element may be repeated one or more times. E.g., AccessorDeclaration+

As an additional convenience, when an asterisk or plus is applied to a grouping where the last item in that grouping is a terminal consisting of a single comma (,), that phrase represents a repetition of comma-separated elements.

1.5.1.7. Grouping

Plain-text parentheses are used for grouping. E.g., Expression (, Expression)

1.5.1.8. Exclusion

A plain-text minus (-) used as an infix operator matches input that matches its left operand but not its right operand.

1.5.1.9. Characteristics

A match for a production will have zero or more characteristics introduced by the right-hand side of that production.

If the right-hand side of a non-trivial production P includes

exactly one occurrence of nonterminal N, or
a nonterminal N with a label L

then a match of that production has a characteristic with a name and type derived as follows:

Let label be either L, if it is present, or the name of N otherwise
If N appears under a sequence (marked with * or +), then the type of the characteristic is List< N >, and the name of the characteristic is the plural of label
Otherwise, if N appears under a "?", then the type of the characteristic is Optional< N >, and the name of the characteristic is label
Otherwise, the type of the characteristic is N, and the name of the characteristic is label

As an example, given a grammar production like:

Boat := mainMast:Mast Mast* Sail* Hold?

Then a match for this production would be a boat with the following characteristics:

its main mast, of type Mast
its sails, a list of Sails
its hold, an optional Hold

The unlabeled term "Mast*" in the right-hand side of the production does not introduce a characteristic, because it is not the sole occurrence of the nonterminal Mast on the right-hand side.
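
The derivation rules above can be sketched in Python. This is only an illustration, not part of the specification: the function name `characteristics`, the encoding of a right-hand side as (label, nonterminal, marker) triples, and the naive pluralization (appending "s") are all assumptions made for this sketch. It also assumes the production is non-trivial.

```python
from collections import Counter

def characteristics(rhs):
    """Derive characteristics from a production's right-hand side.

    `rhs` is a list of (label, nonterminal, marker) triples, where `label`
    is None when absent and `marker` is "", "?", "*", or "+".
    (Hypothetical encoding, chosen for this sketch.)
    """
    counts = Counter(nonterminal for _, nonterminal, _ in rhs)
    result = {}
    for label, nonterminal, marker in rhs:
        # Only labeled terms, or sole occurrences of a nonterminal, qualify.
        if label is None and counts[nonterminal] != 1:
            continue
        name = label if label is not None else nonterminal
        if marker in ("*", "+"):
            # Naive pluralization: an assumption made for this sketch.
            result[name + "s"] = f"List<{nonterminal}>"
        elif marker == "?":
            result[name] = f"Optional<{nonterminal}>"
        else:
            result[name] = nonterminal
    return result

# Boat := mainMast:Mast Mast* Sail* Hold?
boat = [("mainMast", "Mast", ""), (None, "Mast", "*"),
        (None, "Sail", "*"), (None, "Hold", "?")]
print(characteristics(boat))
```

The unlabeled Mast* term is skipped because Mast occurs twice on the right-hand side, which matches the explanation given for the Boat example.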

2. Lexical Structure

This chapter describes how a source unit is decomposed into a sequence of [lexemes].

2.1. Source Units

A source unit comprises a sequence of zero or more [characters]. For the purposes of this document, a character is defined as a [Unicode] scalar value.

Note: A Unicode scalar value is a Unicode code point that is not a surrogate code point.

Implementations may accept source units stored as files on disk, buffers in memory, or any appropriate implementation-specified means.

2.2. Encoding

Implementations must support source units encoded using UTF-8, if they support any encoding of source units as byte sequences.

Implementations should default to the UTF-8 encoding for all text input/output.

Implementations may support additional implementation-specified encodings.

2.3. Phases

Lexical processing of a [source unit] proceeds as if the following steps are executed in order:

Line numbering (for subsequent diagnostic messages) is noted based on the locations of [line breaks]
[Escaped line breaks] are eliminated. No new [characters] are inserted to replace them. Any new [escaped line breaks] introduced by this step are not eliminated.
Each comment is replaced with a single space (U+0020)
The source unit is lexed into a sequence of [tokens] according to the lexical grammar in this chapter
The lexed sequence of tokens is _preprocessed_ to produce a new sequence of tokens

The final [token] sequence produced by this process is used as input to subsequent phases of compilation.
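
The second step above (eliminating escaped line breaks) can be illustrated with a short Python sketch. The function name is hypothetical, and the sketch assumes a line break is CR LF, CR, or LF, as this chapter defines it. A single left-to-right substitution pass is one way to get the stated behavior that escaped line breaks newly formed by a removal are not themselves eliminated.

```python
import re

# A backslash followed immediately by a line break (CR LF, CR, or LF).
ESCAPED_LINE_BREAK = re.compile(r"\\(?:\r\n|\r|\n)")

def eliminate_escaped_line_breaks(text):
    # A single left-to-right pass: matches never overlap, so an escaped line
    # break that only forms after a removal is left in place.
    return ESCAPED_LINE_BREAK.sub("", text)
```

For example, in the input consisting of a, two backslashes, two line feeds, and b, only the second backslash and the first line feed are removed; the backslash and line feed that remain are kept even though they now form an escaped line break.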

2.4. Lexemes

A lexeme is a contiguous sequence of characters in a single [source unit].

Lexing is the process by which an implementation decomposes a [source unit] into zero or more non-overlapping [lexemes].

Every [lexeme] is either a [token] or it is [trivia].

Lexeme :
  Token
  | Trivia
  ;

2.4.1. Trivia

Trivia are [lexemes] that do not appear in the abstract syntax; they are only part of the lexical grammar. The presence or absence of [trivia] in a sequence of [lexemes] has no semantic impact, except where specifically noted in this specification.

Trivia :
  Whitespace
  | Comment
  ;

Note: Trivia is either whitespace or a comment.

2.4.1.1. Whitespace

Whitespace consists of [horizontal whitespace] and [line breaks].

Whitespace :
  HorizontalSpace
  | LineBreak
  ;

Horizontal whitespace consists of any sequence of space (U+0020) and horizontal tab (U+0009).

HorizontalSpace : (' ' | '\t')+ ;

2.4.1.1.1. Line Breaks

A line break consists of a line feed (U+000A), carriage return (U+000D) or a carriage return followed by a line feed (U+000D, U+000A).

An escaped line break is a backslash (\, U+005C) followed immediately by a [line break].

A source unit is split into lines: non-empty sequences of [characters] separated by [line breaks].

Note: Line breaks are used as line separators rather than terminators; it is not necessary for a source unit to end with a line break.

2.4.1.2. Comments

A comment is either a [line comment] or a [block comment].

A line comment comprises two forward slashes (/, U+002F) followed by zero or more characters that do not contain a [line break]. A [line comment] extends up to, but does not include, a subsequent [line break] or the end of the source unit.

A block comment begins with a forward slash (/, U+002F) followed by an asterisk (*, U+002A). A [block comment] is terminated by the next instance of an asterisk followed by a forward slash (*/). A [block comment] contains all [characters] between where it begins and where it terminates, including any [line breaks].

Note: [Block comments] do not nest.

It is an error if a [block comment] that begins in a [source unit] is not terminated in that [source unit].
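
As a rough illustration of the comment rules above, the following Python sketch finds the extent of a comment starting at a given offset. The function name `scan_comment` and the choice of exception type are assumptions for this sketch; an implementation may report the error in any way it sees fit.

```python
def scan_comment(text, pos):
    """Return the index just past the comment starting at `pos`, or None
    if no comment starts there."""
    if text.startswith("//", pos):
        end = pos + 2
        # A line comment stops before the next line break or at end of input.
        while end < len(text) and text[end] not in "\r\n":
            end += 1
        return end
    if text.startswith("/*", pos):
        # The first "*/" after the opener terminates the comment: no nesting.
        close = text.find("*/", pos + 2)
        if close < 0:
            raise SyntaxError("unterminated block comment")
        return close + 2
    return None
```

Because block comments do not nest, scanning "/* a /* b */ c */" from offset 0 stops after the first "*/", leaving " c */" outside the comment.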

2.4.2. Tokens

Tokens are lexemes that are significant to the abstract syntax.

Token :
    Identifier
    | Literal
    | Operator
    | Punctuation
;

2.4.2.1. Identifiers

Identifier :
    IdentifierStart IdentifierContinue*
    ;

IdentifierStart :
    [A-Z]
    | [a-z]
    | '_'
    ;

IdentifierContinue :
    IdentifierStart
    | [0-9]
    ;

TODO: identifier.

The identifier consisting of a single underscore (_) is reserved by the language and must not be used by programs as a name in a declaration or binding.

Note: There are no other fixed keywords or reserved words recognized by the lexical grammar.
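
The identifier grammar above corresponds to a simple regular expression. The Python sketch below is illustrative only; the helper names `is_identifier` and `is_usable_name` are hypothetical.

```python
import re

# IdentifierStart IdentifierContinue*, restricted to ASCII as in the grammar.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_identifier(text):
    return IDENTIFIER.fullmatch(text) is not None

def is_usable_name(text):
    # The lone underscore is a valid identifier token but is reserved,
    # so it cannot be used as a name in a declaration or binding.
    return is_identifier(text) and text != "_"
```

Note that `is_identifier("if")` holds: the lexical grammar does not distinguish keywords from other identifiers.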

2.4.2.2. Literals

Literal :
    NumericLiteral
    | TextLiteral
    ;

TODO: literal, numeric literal.

2.4.2.2.1. Numeric Literals

A numeric literal is either an [integer literal] or a [floating-point literal].

NumericLiteral :
    IntegerLiteral
    | FloatingPointLiteral
    ;

A radix specifier is one of:

0x or 0X to specify a hexadecimal literal (radix 16)
0b or 0B to specify a binary literal (radix 2)

When no radix specifier is present, a numeric literal is a decimal literal (radix 10).

Note: Octal literals (radix 8) are not supported. A 0 prefix on an integer literal does not specify an octal literal as it does in C. Implementations might warn on integer literals with a 0 prefix in case users expect C behavior.

The grammar of the digits for a numeric literal depends on its radix, as follows:

The digits of a decimal literal may include 0 through 9
The digits of a hexadecimal literal may include 0 through 9, the letters A through F and a through f. The letters represent digit values 10 through 15.
The digits of a binary literal may include 0 and 1

Digits for all numeric literals may also include _, which are ignored and have no semantic impact.

A numeric literal suffix consists of any sequence of characters that would be a valid identifier, that does not start with e or E.

Note: A leading - (U+002D) before a numeric literal is not part of the literal, even if there is no whitespace separating the two.

2.4.2.3. Integer Literals

An integer literal consists of an optional radix specifier followed by digits and an optional numeric literal suffix.

The suffix on an integer literal may be used to indicate the desired type of the literal:

A u suffix indicates the UInt type
An l or ll suffix indicates the Int64 type
A ul or ull suffix indicates the UInt64 type
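
The following Python sketch shows one way to decompose an integer literal into its radix specifier, digits, and suffix. It is not a normative algorithm. Several choices are assumptions made for the sketch: the function name, treating every underscore that directly follows the digits as part of the digits rather than the suffix, rejecting suffixes other than the ones listed above, and returning None as the type when no suffix is present (this section does not state the type of an unsuffixed literal).

```python
# Suffix-to-type table taken from the list above.
SUFFIX_TYPES = {"u": "UInt", "l": "Int64", "ll": "Int64",
                "ul": "UInt64", "ull": "UInt64"}

def parse_integer_literal(text):
    """Return (value, type_name) for an integer literal's text."""
    radix, digit_chars, body = 10, "0123456789", text
    if text[:2] in ("0x", "0X"):
        radix, digit_chars, body = 16, "0123456789abcdefABCDEF", text[2:]
    elif text[:2] in ("0b", "0B"):
        radix, digit_chars, body = 2, "01", text[2:]
    # Greedily take digits and underscores; whatever remains is the suffix.
    end = 0
    while end < len(body) and (body[end] in digit_chars or body[end] == "_"):
        end += 1
    # Underscores are ignored and have no semantic impact.
    digits, suffix = body[:end].replace("_", ""), body[end:]
    if not digits:
        raise ValueError(f"no digits in {text!r}")
    if suffix and suffix not in SUFFIX_TYPES:
        raise ValueError(f"unrecognized suffix {suffix!r}")
    return int(digits, radix), SUFFIX_TYPES.get(suffix)
```

Under these assumptions, the literal 010 is read as decimal ten rather than octal eight, in line with the note on octal literals.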

2.4.2.4. Floating-Point Literals

A floating-point literal consists of either

An optional radix specifier, followed by digits, followed by a . (U+002E), followed by optional digits, an optional exponent, and an optional numeric literal suffix.
An optional radix specifier, followed by digits, an exponent, and an optional numeric literal suffix.
A . (U+002E) followed by digits, an optional exponent, and an optional numeric literal suffix.

A floating-point literal may only use decimal or hexadecimal radix.

The exponent of a floating-point literal consists of either e or E, followed by an optional sign of + or -, followed by decimal digits.

TODO: sign.

2.4.3. Text Literals

Need to document supported escape sequences.

2.4.3.1. String Literals

A string literal consists of a sequence of characters enclosed between two ", with the constraint that any " within the sequence must be escaped with \.

TODO: string literal

2.4.3.2. Character Literals

A character literal consists of a sequence of characters enclosed between two ', with the constraint that any ' within the sequence must be escaped with \.

The sequence of characters within a character literal must represent a single character.

2.5. Operators and Punctuation

The following table defines tokens that are used as operators and punctuation in the syntax. When a given sequence of characters could be interpreted as starting with more than one of the following tokens, the longest matching token is used. The name or names given to tokens by this table may be used in the rest of this document to refer to those tokens.

Punctuation :
  | `(`     // left parenthesis
  | `)`     // right parenthesis
  | `[`     // left square bracket, opening square bracket
  | `]`     // right square bracket, closing square bracket
  | `{`     // left curly brace, opening curly brace
  | `}`     // right curly brace, closing curly brace
  | `;`     // semicolon
  | `:`     // colon
  | `,`     // comma
  | `.`     // dot
  | `...`   // ellipsis
  | `::`    // double colon, scope operator
  | `->`    // arrow
    ;

Operator :
  | ```     // backtick
  | `!`     // exclamation mark
  | `@`     // at sign
  | `$`     // dollar sign
  | `%`     // percent sign
  | `~`     // tilde
  | `^`     // caret
  | `&`     // ampersand
  | `*`     // asterisk, multiplication operator
  | `-`     // minus sign, subtraction operator
  | `=`     // equals sign, assignment operator
  | `+`     // plus sign, addition operator
  | `|`     // pipe
  | `<`     // less-than sign, less-than operator
  | `>`     // greater-than sign, greater-than operator
  | `/`     // slash, division operator
  | `?`     // question mark
  | `==`    // double-equals, equal-to operator
  | `!=`    // not-equal operator
  | `%=`    // modulo-assign operator
  | `+=`    // add-assign operator
  | `^=`    // xor-assign operator
  | `&=`    // and-assign operator
  | `*=`    // multiply-assign operator
  | `-=`    // subtract-assign operator
  | `|=`    // or-assign operator
  | `<=`    // less-than-or-equal-to operator
  | `>=`    // greater-than-or-equal-to operator
  | `/=`    // divide-assign operator
  | `&&`    // and-and operator
  | `--`    // decrement operator
  | `++`    // increment operator
  | `||`    // or-or operator
  | `<<`    // left shift operator
  | `>>`    // right shift operator
    ;
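
The longest-match rule for these tokens can be sketched in Python as follows. The function name `lex_symbols` is hypothetical, and the sketch handles only input made up entirely of the operator and punctuation spellings in the tables above; a real lexer would also deal with comments, identifiers, literals, and whitespace.

```python
PUNCTUATION = ["(", ")", "[", "]", "{", "}", ";", ":", ",", ".", "...", "::", "->"]
OPERATORS = ["`", "!", "@", "$", "%", "~", "^", "&", "*", "-", "=", "+", "|",
             "<", ">", "/", "?", "==", "!=", "%=", "+=", "^=", "&=", "*=",
             "-=", "|=", "<=", ">=", "/=", "&&", "--", "++", "||", "<<", ">>"]

# Try longer spellings first so the longest match wins.
BY_LENGTH = sorted(PUNCTUATION + OPERATORS, key=len, reverse=True)

def lex_symbols(text):
    tokens, pos = [], 0
    while pos < len(text):
        for spelling in BY_LENGTH:
            if text.startswith(spelling, pos):
                tokens.append(spelling)
                pos += len(spelling)
                break
        else:
            raise ValueError(f"no operator or punctuation at offset {pos}")
    return tokens
```

With the tables as given, the input +++ is split into ++ followed by +, and two consecutive dots are lexed as two dot tokens because only the three-character ellipsis is listed.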

2.6. Associating Trivia With Tokens

We should define lexing in terms of producing a sequence of tokens-with-trivia instead of just lexemes.

Each token in a source unit may be preceded or followed by trivia:

TokenWithTrivia =
  LeadingTrivia Token TrailingTrivia

LeadingTrivia = Trivia*

TrailingTrivia = (Trivia - LineBreak)* LineBreak?

A compiler implementation must use the "maximal munch" rule when lexing, so that each TokenWithTrivia includes as many characters of TrailingTrivia as possible.

Note: Informally, the trailing trivia of a token is all the trivia up to the next line break, the next token, or the end of the source unit - whichever comes first. The leading trivia of a token starts immediately after the trailing trivia of the preceding token, or at the beginning of the file if there is no preceding token.

Lexing decomposes a source unit into a sequence of zero or more tokens with trivia, as well as zero or more pieces of trivia that do not belong to any token:

LexicalSourceUnit =
  TokenWithTrivia* Trivia*
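
As an informal illustration of the rules above, the following Python sketch groups an already-lexed sequence of lexemes into tokens with trivia. The (kind, text) tuple encoding, the kind names, and the function name `attach_trivia` are assumptions made for this sketch.

```python
def attach_trivia(lexemes):
    """Group (kind, text) lexemes into tokens with trivia.

    `kind` is "token", "linebreak", or another trivia kind such as
    "space" or "comment". Returns (tokens_with_trivia, leftover_trivia),
    where each entry of the first list is (leading, token, trailing).
    """
    result, leading, i = [], [], 0
    while i < len(lexemes):
        if lexemes[i][0] != "token":
            leading.append(lexemes[i])
            i += 1
            continue
        token = lexemes[i]
        i += 1
        trailing = []
        # (Trivia - LineBreak)*: trivia on the same line as the token.
        while i < len(lexemes) and lexemes[i][0] not in ("token", "linebreak"):
            trailing.append(lexemes[i])
            i += 1
        # LineBreak?: at most one line break ends the trailing trivia.
        if i < len(lexemes) and lexemes[i][0] == "linebreak":
            trailing.append(lexemes[i])
            i += 1
        result.append((leading, token, trailing))
        leading = []
    # Trivia after the last token's trailing trivia belongs to no token.
    return result, leading
```

Taking as much trailing trivia as the grammar allows before starting the next token's leading trivia is how this sketch applies the "maximal munch" rule.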

The terminals of the context-free grammar should be understood to be tokens with trivia. A reference to a terminal N in productions for the abstract syntax should be understood as matching the TokenWithTrivia rule, restricted to the case where the contained token is N.

We should probably make a dedicated EOF be part of the alphabet for input source units, so that the LexicalSourceUnit can be simpler to define. E.g., "The alphabet of the lexical grammar comprises all Unicode scalar values, as well as the unique meta-value EOF."

3. Preprocessor

As the last phase of lexical processing, the token sequence of a source unit is preprocessed to produce a new token stream.

The preprocessor supported by Slang is derived from the C/C++ preprocessor with a few changes and extensions.

We either need to pick a normative reference here for some existing preprocessor (and thus bind ourselves to eventually supporting the semantics of that normative reference), or we need to take the time to fully document what the semantics of our current preprocessor are.

Slang programs may use the following preprocessor directives, with the same semantics as their C/C++ equivalent:

#include
#define
#undef
#if, #ifdef, #ifndef
#else, #elif
#endif
#error
#warning
#line
#pragma

An implementation may use any implementation-specified means to resolve paths provided to #include directives.

An implementation that supports #pragma once may use any implementation-specified means to determine if two source units are identical.

3.1. Changes

The input to the Slang preprocessor is a token sequence produced by the rules in Chapter 2, and does not use the definition of "preprocessor tokens" as they are used by the C/C++ preprocessor.

Note: The key place where this distinction matters is in macros that perform token pasting. The input to the Slang preprocessor has to be a valid sequence of tokens before any token pasting occurs.

When tokens are pasted with the ## operator, the resulting concatenated text is decomposed into one or more new tokens. It is an error if the concatenated text does not form a valid sequence of tokens.

Note: The C/C++ preprocessor always yields a single token from any token pasting, whether or not that token is valid.

3.2. Extensions

At the very least we need to document support for the GLSL #version directive, if we intend to keep it.

4. Parsing

The intention of this chapter is to establish the overall rules for the recursive-descent parsing strategy needed for Slang’s grammar.

Slang attempts to thread the needle by supporting the familiar C-like syntax of existing shading languages, while also supporting out-of-order declarations. The parsing strategy is necessarily complicated as a result, but we hope to isolate most of the details to this chapter so that the grammar for each construct can be presented in its simplest possible fashion.

Parsing is the process of matching a sequence of tokens from the lexical grammar with a rule of the abstract syntax grammar.

4.1. Contexts

Parsing is always performed with respect to a context, which determines which identifiers are bound, and what they are bound to.

The initial context for parsing includes bindings for syntax rules corresponding to each declaration, statement, and expression keyword defined in this specification.

Note: For example, an if expression begins with the if keyword, so the initial context contains a binding for the identifier if to a syntax rule that parses an if expression.

The initial context also includes bindings for the declarations in the Slang standard library. Implementations may include additional bindings in the initial context.

4.2. Strategies

Parsing is always performed in one of two modes: unordered or ordered. Unless otherwise specified, each grammar rule matches its sub-rules in the same mode.

4.2.1. Ordered

Parsing in ordered mode interleaves parsing and semantic analysis. When the parser is in ordered mode and is about to parse a declaration, statement, or expression, and the lookahead is an Identifier, it first looks up that identifier in the context.

If the identifier is bound to a syntax rule s, then the parser attempts to match the corresponding production in the grammar given here.
Otherwise, the parser first matches a term, and then based on whether the result of checking that term is a type or an expression, it either considers only productions starting with a type expression or those starting with an expression.

If the parser is in ordered mode and is about to parse a declaration body, it switches to unordered mode before parsing the body, and switches back to ordered mode afterward.

4.2.2. Unordered

In unordered mode, semantic analysis is deferred.

When the parser is in unordered mode and is about to parse a declaration, and the lookahead is an Identifier, it first looks up that identifier in the context.

If the identifier is bound to a syntax rule s, then the parser attempts to match the corresponding production in the grammar given here.
Otherwise, it considers only productions that start with a type specifier.

Balanced tokens are those that match the following production:

BalancedTokens :
    BalancedToken*

BalancedToken :
    `{` BalancedTokens `}`
    | `(` BalancedTokens `)`
    | `[` BalancedTokens `]`
    | /* any token other than `{`, `}`, `(`, `)`, `[`, `]` */

A balanced rule in the grammar is one that meets one of the following criteria:

starts with a { and ends with a }
starts with a ( and ends with a )
starts with a [ and ends with a ]

Whenever the parser is in unordered mode and would attempt to match a balanced rule r, it instead matches the balanced token rule and saves the token sequence that was matched. Those tokens are then matched against r as part of semantic analysis, using the context that the checking rules specify.
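
Matching the balanced token rule amounts to skipping to the closing bracket that pairs with an opening bracket. A minimal Python sketch, assuming tokens are represented as plain strings and using the hypothetical name `skip_balanced`:

```python
CLOSER = {"{": "}", "(": ")", "[": "]"}

def skip_balanced(tokens, start):
    """Return the index just past the balanced group opening at `start`."""
    if tokens[start] not in CLOSER:
        raise ValueError("expected an opening bracket")
    stack = [CLOSER[tokens[start]]]
    pos = start + 1
    while stack:
        if pos >= len(tokens):
            raise SyntaxError("unbalanced tokens")
        tok = tokens[pos]
        if tok in CLOSER:
            stack.append(CLOSER[tok])   # a nested group opens
        elif tok in CLOSER.values():
            if tok != stack.pop():      # closer must match innermost opener
                raise SyntaxError(f"mismatched {tok!r}")
        pos += 1
    return pos
```

A parser could then save `tokens[start:skip_balanced(tokens, start)]` and match those tokens against the deferred rule later.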

4.3. Angle Brackets

4.3.1. Opening Angle Brackets

4.3.1.1. Ordered Mode

If the parser is in ordered mode with a lookahead of <, and it could match that token as part of either an infix expression or a specialize expression, then it considers the term that has been matched so far:

If the term has a generic type or overloaded type, then the specialize expression case is favored.
Otherwise the infix expression case is favored.

4.3.1.2. Unordered Mode

If the parser is in unordered mode with a lookahead of <, and it could match that token as part of either an infix expression or a specialize expression, then it first attempts to match the generic arguments rule. If that rule is matched successfully, then the new lookahead token is inspected:

If the lookahead token is an identifier, (, literal, or prefix operator, then roll back to before the < was matched and favor the infix expression rule.
Otherwise, use the generic arguments already matched as part of matching the specialize expression rule.

4.3.2. Closing Angle Brackets

If the parser is attempting to match the closing > in the generic parameters or generic arguments rules, and the lookahead is any token other than > that starts with a > character (>=, >>, or >>=), then it splits the leading > off and matches it, replacing the lookahead with a token comprising the remaining characters (=, >, or >=).
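
The token-splitting step can be sketched in a few lines of Python. The function name and the representation of tokens as strings are assumptions for this sketch.

```python
def split_closing_angle(lookahead):
    """Match one '>' from the lookahead; return the replacement lookahead,
    or None when the whole token was consumed."""
    if not lookahead.startswith(">"):
        raise SyntaxError("expected '>'")
    rest = lookahead[1:]
    return rest or None
```

Applied to a nested specialization such as Vector<Vector<Float,3>,2>> (a hypothetical input), the innermost close would consume one > from >> and leave a single > as the new lookahead.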

5. Types and Values

A scalar value is one of:

An integer value
A floating-point value
A string value
A code point value
A Boolean value

A composite value is one of:

A tuple value
An array-like value
A structure value
An enum value

A type is a set of values; the values in that set are instances of the type.

Note: A given value might be an instance of zero or more types. We avoid saying that a value has some type, except in cases where there is an "obviously right" type for such a value.

Value :=
  ImplicitlyTypedValue
  | UntypedValue

TypedValue :=
  ImplicitlyTypedValue
  | ExplicitlyTypedValue

ExplicitlyTypedValue := UntypedValue `:` Type

ImplicitlyTypedValue := /* TODO */

UntypedValue := /* TODO */

5.1. Level

Every value has a non-negative integer level.

Note: Typical values that a Slang program calculates and works with have level zero.

5.2. Types of Types

A type is itself a value.

The level of a type is one greater than the maximum of zero and the maximum level of the instances of that type.

Note: It is impossible for a type to be an instance of itself.

A type whose instances are all values with level zero is a proper type.

Note: Most of what a programmer thinks of as types are proper types. The level of a proper type will always be one.

A type whose instances are types with level one is a kind.

The kind Type is the kind of all proper types.

A type whose instances are types with level two is a sort.

The sort Kind is the sort of all kinds.
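
The level rule can be written out as a one-line computation. In the Python sketch below, a type is modeled only by the list of levels of its instances; that modeling and the function name are assumptions made for illustration.

```python
def level_of_type(instance_levels):
    # One greater than the maximum of zero and the instances' levels;
    # including 0 in the list also covers a type with no instances.
    return 1 + max([0, *instance_levels])
```

A proper type (instances at level zero) gets level one, a kind gets level two, and a sort gets level three. Under this rule, a type with no instances also gets level one.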

5.3. Scalars

All scalar values have level zero.

5.3.1. Unit

The unit type, named Unit, has a single instance: the unit value. The name void is an alias for the unit type.

5.3.2. Booleans

The type Bool has two instances: true and false.

5.3.3. Numeric Scalars

5.3.3.1. Integers

The integer types are:

Type	Kind
`Int8`	8-bit signed integer type
`Int16`	16-bit signed integer type
`Int32`	32-bit signed integer type
`Int64`	64-bit signed integer type
`UInt8`	8-bit unsigned integer type
`UInt16`	16-bit unsigned integer type
`UInt32`	32-bit unsigned integer type
`UInt64`	64-bit unsigned integer type

An integer type is either signed or unsigned. An integer type has a bit width.

Signed integer types use two’s complement representation. Arithmetic operations on values of integer type (both signed and unsigned) wrap on overflow/underflow.

The name Int is an alias for the type Int32. The name UInt is an alias for the type UInt32.
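
Wrapping arithmetic on two's complement integers can be modeled by reducing the exact mathematical result modulo 2 to the power of the bit width. The Python sketch below is illustrative; the function name `wrap` and its parameters are hypothetical.

```python
def wrap(value, bits, signed):
    """Reduce an exact integer result to a `bits`-wide integer type."""
    value &= (1 << bits) - 1          # keep the low `bits` bits (mod 2**bits)
    if signed and value >= 1 << (bits - 1):
        value -= 1 << bits            # reinterpret as two's complement
    return value
```

For instance, `wrap(127 + 1, 8, True)` models adding one to the largest Int8 value, and `wrap(255 + 1, 8, False)` models the same for UInt8.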

5.3.3.2. Floating-Point Numbers

The floating-point types are:

Type	Kind	IEEE 754 Format
`Float16`	16-bit floating-point type	`binary16`
`Float32`	32-bit floating-point type	`binary32`
`Float64`	64-bit floating-point type	`binary64`

All floating-point types are signed. A floating-point type has a bit width.

A floating-point type has a corresponding IEEE 754 format. The instances of a floating-point type are all the values of its IEEE 754 format.

The name Float is an alias for the Float32 type.
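
The relationship between bit width and IEEE 754 format can be checked with Python's `struct` module, whose `e`, `f`, and `d` codes are the binary16, binary32, and binary64 interchange formats. This is a sketch that assumes each floating-point type uses the interchange format matching its bit width; the `FORMATS` table and helper names are illustrative only.

```python
import struct

# Assumed mapping: type name -> (IEEE 754 format, struct format code).
FORMATS = {
    "Float16": ("binary16", "<e"),
    "Float32": ("binary32", "<f"),
    "Float64": ("binary64", "<d"),
}

def bit_width(type_name: str) -> int:
    _, code = FORMATS[type_name]
    return 8 * struct.calcsize(code)

def round_to(type_name: str, value: float) -> float:
    """Round an in-range Python float (binary64) to a value of the given type."""
    _, code = FORMATS[type_name]
    return struct.unpack(code, struct.pack(code, value))[0]

widths = {name: bit_width(name) for name in FORMATS}
tenth_as_float32 = round_to("Float32", 0.1)
```

Because the instances of a type are exactly the values of its format, `0.1` stored as `Float32` is a different value from `0.1` stored as `Float64`.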

5.3.4. Text

5.3.4.1. Unicode Scalar Values

Need to write this section.

5.3.4.2. Characters

Need to write this section.

5.3.4.3. Strings

Need to write this section.

5.4. Finite Sequences

The element count of a finite sequence is the number of elements in it.

The element type of a homogeneous sequence is the type of elements in it.

The level of a sequence is the level of its element type.

Sequence := `{` (Value `,`)* `}`

5.5. Array Types

ArrayType := elementType:Type `[` elementCount:Int `]`

An array is a finite homogeneous sequence. An array type T[N] is the type of N-element arrays with elements of type T.

The element count of an array type must be a non-negative Int.

5.6. Vectors

VectorType := `Vector` `<` elementType:Type `,` elementCount:Int `>`

A vector is a finite homogeneous sequence. The element type of a vector must be a scalar numeric type.

A vector type Vector<T,N> is the type of vectors with N elements of type T.

The element count of a vector type must be a non-negative Int.

5.7. Matrices

A matrix is a finite homogeneous sequence. The element type of a matrix must be a vector type.

The rows of a matrix are its elements. The row type of a matrix is its element type. The row count of a matrix is its element count.

The scalar type of a matrix is the element type of its element type.

The column count of a matrix is the element count of its row type. The column type of a matrix with scalar type T and row count R is Vector<T,R>.

A matrix type Matrix<T,R,C> is the type of matrices with R rows of type Vector<T,C>.

5.8. The `Never` Type

NeverType := `Never`

The type Never has no instances.

5.9. Declaration References

A declaration reference is a value that refers to some declaration.

DeclarationReference :=
  DirectDeclarationReference
  | MemberDeclarationReference
  | SpecializedDeclarationReference

5.9.1. Direct Declaration References

A declaration serves as a direct declaration reference to itself.

DirectDeclarationReference := Declaration

5.9.2. Member Declaration Reference

MemberDeclarationReference := base:DeclarationReference `::` member:Declaration

A member declaration reference refers to some member declaration of a base declaration reference. A member declaration reference must satisfy the following constraints:

The base must refer to a declaration with members (TODO: make this precise)
The member must be a direct member declaration of the base

5.9.3. Specialized Declaration Reference

SpecializedDeclarationReference := base:Value `<` (arguments:Value `,`)* `>`

A specialized declaration reference refers to a specialization of some generic declaration. A specialized declaration reference must satisfy the following constraints:

The base must refer to a generic declaration
The base must not be a specialized declaration reference
The number of arguments must match the number of parameters of base
Each of the arguments must be an instance of the type of the corresponding parameter of base
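
The four constraints can be sketched as a validation routine. Everything here is a simplified, hypothetical model: declarations carry only a name and a tuple of generic parameter types, the base is restricted to a direct reference or another specialization, each argument is a `(value, type_name)` pair, and "is an instance of" is approximated by comparing type names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Declaration:
    name: str
    generic_params: tuple = ()  # parameter types; empty means "not generic" in this model

@dataclass(frozen=True)
class Specialized:
    base: object      # the declaration reference being specialized
    arguments: tuple  # (value, type_name) pairs

def check_specialization(ref: Specialized) -> list:
    """Return the violated constraints; an empty list means well-formed."""
    if isinstance(ref.base, Specialized):
        return ["base must not be a specialized declaration reference"]
    params = ref.base.generic_params
    if not params:
        return ["base must refer to a generic declaration"]
    if len(ref.arguments) != len(params):
        return ["argument count must match parameter count"]
    return [
        f"{value} is not an instance of {param_type}"
        for (value, type_name), param_type in zip(ref.arguments, params)
        if type_name != param_type
    ]

vector = Declaration("Vector", ("Type", "Int"))
good = Specialized(vector, (("Float32", "Type"), (4, "Int")))
too_few = Specialized(vector, (("Float32", "Type"),))
nested = Specialized(good, ((4, "Int"),))
```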

5.9.4. Fully-Specialized Declaration References

A declaration reference r is unspecialized if r refers to a generic declaration and r is not a specialized declaration reference.

A declaration reference r is fully specialized if all of:

r is not unspecialized
if r has a base declaration reference, then its base is fully specialized

We need to do some work to define what a fully-qualified declaration reference is, since some parts of the semantics want it.

A direct declaration reference is fully qualified if it is to a module, a parameter (generic or value), or a local declaration.

A member declaration reference is fully qualified if its base is. A specialization is fully qualified if its base is.

5.10. Nominal Types

A nominal type is a fully-specialized declaration reference to a type declaration.

5.11. `struct` Types

A struct type is a fully-specialized declaration reference to a struct declaration.

A struct type is a proper type.

5.12. `class` Types

A class type is a fully-specialized declaration reference to a class declaration.

An instance of a class type is an object.

A class type is a proper type.

5.13. `enum` Types

An enum type is a fully-specialized declaration reference to an enum declaration.

An enum type is a proper type.

5.14. Type Aliases

A type alias is a fully-specialized declaration reference to a typealias declaration.

A type alias is a proper type.

5.15. Associated Types

An associated type is a fully-specialized declaration reference to an associatedtype declaration.

An associated type is a proper type.

5.16. Interfaces

An interface is a value that is either:

a fully-specialized declaration reference to an interface declaration
a conjunction of interfaces

The instances of an interface are the proper types that conform to it.

An interface has level two.

The sort Interface is the sort of all interfaces.

5.17. `any` Types

ExistentialAnyType := `any` Interface

An any type takes the form any I, where I is an interface.

The level of an any type is one.

An instance of any I is an existential value. An existential value consists of:

a type T
a witness that T conforms to I
a value of type T

The level of an existential value is zero.

5.18. Functions

A function is a value that can be called.

A function is called with zero or more values as arguments. The arguments to a function call must match the function’s parameters.

If a call to a function returns normally, it returns a value of the function’s result type.

A call to a function may have additional effects.

A function type takes the form:

FunctionType := `(` (Parameter `,`)* `)` Effect* `->` resultType:Type

Parameter := (Name `:`)? Type

( parameters ) effects -> resultType

where:

parameters is a comma-separated sequence of zero or more parameters
effects is zero or more effects
resultType is a type

Each of the parameters of a function type may either be a type or of the form name:type.

A fully-specialized declaration reference to a function declaration is a function.

The level of a function is the maximum of the levels of its parameter types and result type.

5.19. Generics

A generic is a value that can be specialized.

A generic is specialized with one or more values as arguments. The arguments to a specialization must match the generic’s parameters.

A dependent function type takes the form:

DependentFunctionType := `<` (Parameter `,`)* `>` `->` resultType:Type

< parameters > -> resultType

where:

parameters is a comma-separated sequence of zero or more parameters
resultType is a type

The parameters of a generic type may either be a type or of the form name:type.

The result type of a generic type may refer to the parameters.

An unspecialized declaration reference to a generic declaration is a generic.

The level of a generic is the maximum of the level of its result type and one greater than the maximum of the levels of its parameter types.

5.20. Witnesses

A proposition is a phrase that logically may be either true or false.

Propositions are types. A witness is an instance of a proposition. The existence of a witness of some proposition P demonstrates the truth of P.

The notation "a witness that P", where P is a proposition, is equivalent to "an instance of P".

The notation T extends S denotes a proposition that the type T is a subtype of the type S.

The notation T implements I denotes a proposition that the type T conforms to the interface I.

Proposition := /* TODO */

SubtypeProposition := subtype:Type `extends` supertype:Type

ImplementationProposition := Type `implements` Interface

A witness that t implements i is a witness table.

Given:

an interface i
a requirement r of i, represented as a direct member declaration of i
a type t
a witness table w, of type t implements i

then looking up the witness value for r should basically amount to a (static) member reference into w.

This would all be easier if Slang followed the approach of something like Haskell, where the This parameter is basically explicit.

Basically this amounts to saying that t implements i acts as a declaration reference to i, and that a declaration reference to an interface is not "fully specialized" (in the sense that members can be looked up) without that added parameter.

(t implements i)::r

So... I need to do that.

5.21. Module Values

A module is a value.

6. Semantic Checking

The intention of this chapter is to establish the formalisms and notations that will be used for expressing typing judgements and other semantic-checking rules in this specification.

The notation being used here is aligned with how papers in the PLT world on type theory, semantics, etc. present a language.

6.1. Contexts

A context is an ordered sequence of zero or more context entries. Context entries include bindings of the name of a variable (an [identifier]) to its type.

Context :
    ContextEntry*
    
ContextEntry :
    Binding
    
Binding :
    Identifier : Type // variable
    Identifier = TypedValue // alias

A context c binds an identifier n if c contains one or more bindings with n on the left-hand side.

We need to describe here the process by which an identifier resolves to a binding, including overloaded, etc. The discussion here will have to be a forward reference to the algorithms to be given later.

6.2. Expressions

Slang uses a variation on a bidirectional type-checking algorithm.

There are two main steps used when checking expressions. Each form of expression may define rules for resolving, checking, or both.

We need to define what checked expressions, resolved expressions, and (parsed) expressions are.

A parsed expression is what you get out of the relevant parsing rules for Expression.

A checked expression is a limited subset of forms that includes stuff like:

typed values
declaration references
calls

A typed value evaluates to itself, with no side effects.

A resolved expression is either a checked expression or one of a few cases that need further context:

overloaded declaration references
overloaded calls
initializer lists

We also need to drill into the representation of types and values...

6.2.1. Resolving

Resolving an expression in a context results in a resolved expression.

Note: Resolving is used in places where an expression needs to have some amount of validation performed, but an expected type for the expression is not available.

To resolve an expression e when the form of e does not define a resolving rule:

Extend the context with a fresh type variable T
Return the result of checking e against T

6.2.2. Checking

Checking a resolved expression against a type results in a checked expression.

Note: Checking is used in places where the type that an expression is expected to have is known.

To check an expression e against a type T when the form of e does not define a checking rule:

Let re be the result of resolving e
Return the result of coercing re to T

OLD TEXT: A checking judgement determines that, given a context c, an expression e, and a type t, e checks against t in context c.

6.3. Statements

Statements also use checking judgements. A checking judgement for a statement determines that statement s checks in context c.

6.4. Declarations

Declarations also use checking judgements. A checking judgement for a declaration determines that declaration d checks in context c.

Checking of a declaration may modify the context c to include additional bindings.

7. Memory

Values of storable types may be stored into and loaded from memory. A proper type T is storable if T conforms to IStorable.

7.1. Memory Locations

Memory logically consists of a set of memory locations, each of which can hold an 8-bit byte. Each memory location is part of one and only one address space.

Two sets of memory locations overlap if their intersection is non-empty.

7.2. Memory Accesses

A memory access is an operation performed by a strand on a set of memory locations in a single address space. A memory access has a memory access mode.

7.2.1. Memory Access Modes

A memory access mode is one of: read, write, or modify.

7.3. Memory Model

We need to find an existing formal memory model to reference and use.

8. Layout

We clearly need a chapter on the guarantees Slang makes about memory layout.

Types in Slang do not necessarily prescribe a single layout in memory. The discussion of each type will specify any guarantees about layout it provides; any details of layout not specified here may depend on the target platform, compiler options, and context in which a type is used.

9. Storage

9.1. Concrete Storage Locations

A concrete storage location with layout L for storable type T comprises a set of contiguous memory locations, in a single address space, where a value of type T with layout L can be stored.

9.1.1. Reference Types

A reference type is written m ref T where m is a memory access mode and T is a storable type.

How do we get layout involved here?

9.1.1.1. Type Conversion

An expression of reference type modify ref T can be implicitly coerced to type read ref S if T is a subtype of S.

An expression of reference type modify ref T can be implicitly coerced to type write ref S if S is a subtype of T.

An expression of reference type read ref T can be implicitly coerced to type T.

9.2. Abstract Storage Locations

An abstract storage location is a logical place where a value of some proper type T can reside.

A concrete storage location is also an abstract storage location.

Example: A local variable declaration like var x : Int; defines an abstract storage location, and a reference to x will have the abstract storage reference type read modify Int.

9.3. Abstract Storage Access

An abstract storage access is an operation performed by a strand on an abstract storage location. Each abstract storage access has an abstract storage access mode.

9.4. Access Mode

An abstract storage access mode is either get, set, or a memory access mode.

9.5. Abstract Storage References

An abstract storage reference type is written m T where T is a proper type and m is a set of one or more abstract storage access modes.

Some of the access types imply others, and it is inconvenient to have to write all explicitly in those cases. For example:

If you have read access, you can also perform a get (if the stored type is copyable).
If you have modify access, you can also perform a write.
If you have modify or write access, you can also perform a set (if the stored type is copyable).

(Having modify access doesn’t let you perform a read or get, because it implies the possibility of write-back which could in principle conflict with other accesses)
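
The implications listed above can be sketched as a closure over a set of access modes. The function name `implied_modes` and the `copyable` flag are hypothetical; the rules encoded are only the three listed in this section.

```python
def implied_modes(modes, copyable=True):
    """Return the given access modes plus every mode they imply."""
    result = set(modes)
    if "modify" in result:
        result.add("write")              # modify implies write
    if copyable:
        if "read" in result:
            result.add("get")            # read implies get for copyable types
        if result & {"modify", "write"}:
            result.add("set")            # modify or write implies set for copyable types
    return result

from_modify = implied_modes({"modify"})
from_read = implied_modes({"read"})
non_copyable = implied_modes({"modify"}, copyable=False)
```

Note that `from_modify` contains neither `read` nor `get`, matching the remark that modify access alone does not permit reading.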

9.5.1. Type Conversion

An expression with reference type read ref T can be implicitly coerced to an abstract storage reference type read T.

An expression with reference type write ref T can be implicitly coerced to an abstract storage reference type write T.

An expression with reference type modify ref T can be implicitly coerced to an abstract storage reference type read write modify T.

An expression with abstract storage reference type a T can be implicitly coerced to abstract storage reference type b T if b is a subset of a.

An expression with an abstract storage type m T can be implicitly coerced to an abstract storage type m S if all of the following are true:

S is a subtype of T unless m does not contain any of write, modify, and set
T is a subtype of S unless m does not contain any of read, modify, and get
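
The two conditions amount to a variance rule: modes that write require S to be a subtype of T, and modes that read require T to be a subtype of S. A minimal sketch follows, assuming a toy subtype relation in which `Derived` is a subtype of `Base` and subtyping is reflexive; the names `SUBTYPES`, `is_subtype`, and `can_coerce` are illustrative.

```python
SUBTYPES = {("Derived", "Base")}  # hypothetical: Derived is a subtype of Base

def is_subtype(a: str, b: str) -> bool:
    return a == b or (a, b) in SUBTYPES  # reflexivity is an assumption of this sketch

def can_coerce(modes, t: str, s: str) -> bool:
    """May a storage reference `m T` be coerced to `m S`?"""
    writes = bool(set(modes) & {"write", "modify", "set"})
    reads = bool(set(modes) & {"read", "modify", "get"})
    if writes and not is_subtype(s, t):
        return False
    if reads and not is_subtype(t, s):
        return False
    return True

read_widening = can_coerce({"read"}, "Derived", "Base")
write_narrowing = can_coerce({"write"}, "Base", "Derived")
modify_changes_type = can_coerce({"modify"}, "Derived", "Base")
```

A reference that includes modify must satisfy both conditions, so in this model it can only be coerced between types that are subtypes of each other.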

9.6. Containment

An abstract storage location may contain other abstract storage locations.

Two abstract storage locations overlap if they are the same location, or one transitively contains the other.

An abstract storage location of an array type T[N] contains N storage locations of type T, one for each element of the array.

An abstract storage location of a struct type S contains a distinct storage location for each of the fields of S.

9.6.1. Conflicts

An abstract storage access performed by a strand begins at some point b in the execution of that strand, and ends at some point e in the execution of that strand. The interval of an abstract storage access is the half-open interval [b, e).

The intervals [a0, a1) and [b0, b1) overlap if either:

a0 happens-before b0 and b0 happens-before a1
b0 happens-before a0 and a0 happens-before b1

Two abstract storage accesses A and B overlap if all of the following:

The abstract storage location of A overlaps the abstract storage location of B
The interval of A overlaps the interval of B

Two distinct abstract storage accesses A and B conflict if all of the following:

A and B overlap
At least one of the accesses is a write, modify, or set
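
The containment and conflict rules can be combined into a small checker. This sketch makes two simplifying assumptions that are not part of the specification: a storage location is a path tuple in which a prefix contains its extensions, and happens-before is modeled by `<` on integer points in a single timeline.

```python
from dataclasses import dataclass

WRITING_MODES = {"write", "modify", "set"}

@dataclass(frozen=True)
class Access:
    location: tuple  # hypothetical path, e.g. ("s", "x") for field x of s
    begin: int       # point b where the access begins
    end: int         # point e where the access ends
    mode: str        # "read", "write", "modify", "get", or "set"

def locations_overlap(a: tuple, b: tuple) -> bool:
    """Same location, or one transitively contains the other (prefix test)."""
    n = min(len(a), len(b))
    return a[:n] == b[:n]

def intervals_overlap(a: Access, b: Access) -> bool:
    """Literal reading of the interval rule, with happens-before as `<`."""
    return (a.begin < b.begin < a.end) or (b.begin < a.begin < b.end)

def conflict(a: Access, b: Access) -> bool:
    return (
        a is not b
        and locations_overlap(a.location, b.location)
        and intervals_overlap(a, b)
        and (a.mode in WRITING_MODES or b.mode in WRITING_MODES)
    )

inout_call = Access(("s",), 0, 10, "modify")   # s passed as an inout argument
field_read = Access(("s", "x"), 3, 4, "read")  # s.x read while the call is active
other_read = Access(("s",), 2, 5, "read")
write_y = Access(("s", "y"), 3, 4, "write")
```

Here `inout_call` and `field_read` conflict, while two overlapping reads, or accesses to the distinct fields `s.x` and `s.y`, do not.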

9.7. Expressions That Reference Storage

This text really belongs in the relevant sections about type-checking each of these expression forms.

Rather than evaluating to a value, an expression may evaluate to an abstract storage location, along with restrictions on the kinds of access allowed to that location. An expression that evaluates to an abstract storage location does not constitute an abstract storage access to that location.

Examples of expressions that evaluate to an abstract storage location include:

An identifier expression that names some variable v
A member-access expression x.m where x is an abstract storage location of some struct type S and m is a field of S
A member-access expression x.p where p is a property
A subscript expression x[i] that resolves to an invocation of a subscript declaration

Examples of expressions that perform an abstract storage access include:

An assignment dst = src performs a write to dst. It also performs a read from src if src is an abstract storage location.
A call f(... a ... ) where argument a is an abstract storage location performs an abstract storage access based on the direction of the corresponding parameter:
    An in parameter performs a read access that begins and ends before the strand enters the body of f
    An out parameter begins a write access before the call begins, and ends it after the call returns. The abstract storage location a must have been uninitialized before the call, and will be initialized after the call.
    An inout parameter begins a modify access before the call that ends after the call returns. The abstract storage location a must have been initialized before the call.
    TODO: ref etc.

10. Program Lifecycle

10.1. Compilation

A Slang compiler is a tool that is invoked on Slang source code to produce object code.

Object code produced by a Slang compiler from one toolchain may be incompatible with tools from other toolchains, other versions of the same toolchain, or across target platforms supported by the toolchain.

Note: This specification does not prescribe the language or format of object code that a particular toolchain implementation has to use. Object code might use a toolchain-specific IR, a portable intermediate language, or a format that is compatible with another existing toolchain (e.g., the object code format used by a C toolchain on the target).

A compiler implementation must support being invoked on an entire Slang source module, potentially comprising multiple source units, to produce object code for that entire module. A compiler implementation may support being invoked on source code at other granularities: e.g., on a subset of the source units of a module.

When compiling code that is part of a Slang module M, the output of a compiler implementation must not depend on any non-fragile information in the modules that M depends on.

10.2. Linking

A static library is a unit of code suitable for distribution or re-use. Static libraries created by a toolchain should be usable with later versions of that same toolchain.

A binary is a unit of loadable code that is either an executable or a dynamic library.

Note: We need to refer to executables and dynamic libraries somewhere in the text.

A Slang linker is a tool that is invoked on one or more inputs, consisting of units of object code and static libraries, to produce a static library or a binary.

A binary may contain loadable code for zero or more Slang modules.

Note: it is possible that a linked binary might contain both code compiled from Slang as well as code generated from other languages, or by other tools.

It is invalid for a binary to contain code for non-fragile symbols from a Slang module M while also having external dependencies on symbols from M. A linker may fail with an error rather than produce an invalid binary.

Note: In practice, a linked binary has to either contain all of a given Slang module, or none of it. The nuance is that fragile symbols, such as inlinable functions, from a module M might get copied into object code for a module N that depends on M, and thus end up in a binary for N.

10.3. Loading

A Slang runtime must support loading binaries into a runtime session, and resolving dependencies between them.

Loading of a binary containing definitions of non-fragile symbols from Slang module M must fail if any external symbol that M depends on has not already been loaded into the process.

Loading of a binary containing non-fragile symbols from module M may fail if any binary containing definitions of non-fragile symbols from M has already been loaded into the process.

It is valid to load a binary containing fragile symbols from module M even if other binaries containing one or more of those symbols have already been loaded, and even if the definitions of those fragile symbols differ between the binaries. References to fragile symbols may resolve at runtime to any definition of those symbols that was compiled into object code that made its way into the binaries that are loaded.

If Slang code is stored in a binary for a host, then that code may be loaded as part of initialization of a process for a host executable.

If a module of Slang code is loaded programmatically, a Slang runtime must return a mirror reflecting that module.

10.4. Runtime Reflection

A Slang runtime may support reflection of modules that have been loaded into a runtime session.

A mirror is a runtime value that reflects some entity in a Slang codebase. A mirror is created and managed by a runtime session, and reflects entities in the code that has been loaded into that runtime session. A mirror that reflects an X is also called an X mirror.

Example: A module mirror is a mirror that reflects a module.

A mirror may reflect:

A module
A declaration
A reference to a declaration
A type
A value

A mirror is not the same as the entity it reflects.

10.5. Bundling

This concept needs a proper name.

A bundle is a runtime entity composed from a sequence of distinct runtime entities where each element in the sequence is one of:

A module
An entry point (a fully specialized reference to an entry point declaration)
A witness to the conformance relationship of a type to an interface
A bundle

A Slang runtime must support creation of a bundle from a sequence of mirrors reflecting the runtime entities to be bundled.

The entry points in a bundle are the entry points in the input sequence.

The modules in a bundle are the set of modules in the input sequence, the modules referenced by the entities in the input sequence, and the transitive closure of modules imported by those.
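
The set of modules in a bundle is a reachability computation over the import graph. A minimal sketch, assuming a hypothetical `imports` dictionary that maps each module name to the modules it imports, and a `roots` list holding the modules named in, or referenced by, the input sequence:

```python
def bundle_modules(roots, imports):
    """Return roots plus the transitive closure of their imports."""
    seen = set()
    stack = list(roots)
    while stack:
        module = stack.pop()
        if module not in seen:
            seen.add(module)
            stack.extend(imports.get(module, ()))
    return seen

imports = {
    "app": ["lighting", "math"],
    "lighting": ["math"],
    "math": [],
    "unused": [],
}
modules = bundle_modules(["app"], imports)
```

In this example `unused` is not part of the bundle, because nothing in the input sequence reaches it.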

The global uniform shader parameters of a bundle are the global uniform shader parameters declared in modules in the bundle.

The entry point uniform shader parameters of a bundle are the entry-point uniform shader parameters declared by entry points in the bundle.

The shader parameters of a bundle are the global uniform shader parameters of the bundle and the entry-point uniform shader parameters of the bundle.

A kernel is a runtime entity created from a bundle, an entry point in that bundle, and a target device.

A kernel created from a bundle B is compatible with kernels created from a bundle BX where BX was created from an input sequence that started with B.

11. Execution

A runtime session is a context for loading and execution of Slang code. A runtime session exists within the context of a process running on some host machine. For some implementations, a runtime session may be the same as a process, but this is not required.

A runtime session has access to one or more distinct devices on which work may be performed. Different devices in a runtime session may implement different architectures, each having its own machine code format(s). For some implementations, the host may also be a device, but this is not required.

We need to define what it means to "execute" statements and to "evaluate" expressions...

When a strand invokes an invocable entity f, it executes the body of f (a statement). As part of executing a statement, a strand may execute other statements, and evaluate expressions. A strand evaluates an expression to yield a value, and also for any side effects that may occur.

11.1. Strands

A strand is a logical runtime entity that executes Slang code. The state of a strand consists of a stack of activation records, representing invocations that the strand has entered but not yet exited.

A strand may be realized in different ways by different devices, runtime systems, and compilation strategies.

Note: The term "strand" was chosen because a term like "thread" is both overloaded and charged. On a typical CPU architecture, a strand might map to a thread, or it might map to a single SIMD lane, depending on how code is compiled. On a typical GPU architecture, a strand might map to a thread, or it might map to a wave, thread block, or other granularity.

Note: An implementation is allowed to "migrate" a strand from one thread to another.

A strand is launched for some invocable declaration D, with an initial stack comprising a single activation record for D, a

11.2. Activation Records

An activation record represents the state of a single invocation of some invocable declaration by some strand. An activation record for invocable declaration D by strand S consists of:

A value for each parameter of D
The location L within the body of D that S’s execution has reached for that invocation
A value for each local variable of D that is in scope at L

11.3. Waves

A wave is a set of one or more strands that execute concurrently. At each point in its execution, a strand belongs to one and only one wave.

Waves are guaranteed to make forward progress independent of other waves. Strands that belong to the same wave are not guaranteed to make independent forward progress.

Note: A runtime system can execute the strands of a wave in "lock-step" fashion, if it chooses.

Strands that are launched to execute a kernel as part of a dispatch will only ever be in the same wave as strands that were launched as part of the same dispatch.

Every wave W has an integer wave size N > 0. Every strand that belongs to W has an integer lane ID i, with 0 <= i < N. Distinct strands in W have distinct lane IDs.

Note: A wave with wave size N can have at most N strands that belong to it, but it can also have fewer. This might occur because a partially-full wave was launched, or because strands in the wave terminated. If strands in a wave have terminated, it is possible that there will be gaps in the sequence of used lane IDs.

Two strands that are launched for the same kernel for the same node of the same pipeline configuration as part of the same dispatch will belong to waves with the same wave size.

Note: Wave sizes can differ across target platforms, pipeline stages, and even different optimization choices made by a compiler.

11.4. Uniformity

A collective operation is one that requires multiple strands to coordinate. A collective operation is only guaranteed to execute correctly when the required subset of strands all participate in the operation together.

11.4.1. Tangle

Any strand begins execution of an entry point as part of a tangle of strands. The tangle that a strand belongs to can change over the course of its execution. When a strand executes a collective operation, it participates with those strands that are part of the same tangle.

When a strand executes a conditional control-flow operation, it branches to a destination that can depend on a condition value. When the strands of a tangle execute a conditional control-flow operation, each strand in the tangle becomes part of a new tangle with exactly those strands that branch to the same destination.
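
The effect of a conditional branch on a tangle can be sketched as a partition of its strands by destination. The names below are hypothetical: strands are integers, and `destination_of` stands in for evaluating the branch condition on a strand.

```python
from collections import defaultdict

def split_tangle(tangle, destination_of):
    """Partition a tangle into new tangles, one per branch destination."""
    groups = defaultdict(set)
    for strand in tangle:
        groups[destination_of(strand)].add(strand)
    return dict(groups)

tangle = {0, 1, 2, 3}
new_tangles = split_tangle(tangle, lambda s: "then" if s % 2 == 0 else "else")
```

After the branch, a collective operation executed in the `then` block involves only strands 0 and 2.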

A control-flow region is a pair of nodes (E, M) in the control-flow graph of a function, such that either:

E dominates M
M is unreachable

A node N in the control-flow graph of a function is inside the region (E, M) if E dominates N and M does not dominate N.
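
The "inside" test can be made concrete with a standard iterative dominator computation. This is a sketch under stated assumptions: the control-flow graph is a dictionary from node to successor list, every node is reachable from the entry, and dominance is reflexive. The function names are illustrative.

```python
def dominators(cfg: dict, entry: str) -> dict:
    """Map each node to the set of nodes that dominate it."""
    nodes = set(cfg)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = set(nodes)
            for p in nodes:
                if n in cfg[p]:      # p is a predecessor of n
                    new &= dom[p]
            new |= {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def inside(dom: dict, region: tuple, n: str) -> bool:
    """N is inside (E, M) if E dominates N and M does not dominate N."""
    e, m = region
    return e in dom[n] and m not in dom[n]

# An if/else diamond: C branches to T or F, which both reach merge point M.
cfg = {"C": ["T", "F"], "T": ["M"], "F": ["M"], "M": []}
dom = dominators(cfg, "C")
inside_nodes = {n for n in cfg if inside(dom, ("C", "M"), n)}
```

For the diamond, the region (C, M) contains C, T, and F; the merge point M is excluded because it dominates itself.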

A structured region is a control-flow region that is significant to the Slang language semantics.

For each function body, there is a structured region (Entry, Exit) where Entry is the unique entry block of the function’s control-flow graph, and Exit is a unique block dominated by all points where control flow can exit the function body.

Each conditional control-flow operation has a merge point, and defines a structured region (C, M) where C is the control-flow operation and M is its merge point.

A strand breaks out of a control-flow region (E, M) if it enters the region (by branching to E) and then subsequently exits the region by branching to some node N, distinct from M, that is not inside the region.

Note: We define the case where a strand breaks out of a region, but not the case where a strand exits a region normally. The motivation for this choice is that a strand that goes into an infinite loop without exiting a region needs to be treated the same as a strand that exits that region normally.

When the strands in a tangle enter a structured control-flow region (E, M), all of the strands in the tangle that do not break out of that region form a new tangle at M.

11.4.2. Dynamic Uniformity

Uniformity must be defined relative to some granularity of grouping for strands: e.g., per-strand, per-wave, per-block, etc.

When the strands in a tangle evaluate some expression, we say that the result is dynamically per-G for some granularity of group G, if for any two strands in the tangle that are in the same G, those strands compute the same value for that expression.

We say that a tangle is executing code dynamically per-G uniform when for every G, either all of the strands in that G are in the tangle, or all of them are not in that tangle.
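
Both definitions can be expressed as simple predicates. In this sketch, strands are integers, `group_of` maps a strand to its group at granularity G, and `value_of` gives the value a strand computes; all of these names are assumptions made for illustration.

```python
def is_dynamically_per_group(tangle, group_of, value_of):
    """True if strands of the tangle in the same group compute the same value."""
    seen = {}
    for strand in tangle:
        if seen.setdefault(group_of(strand), value_of(strand)) != value_of(strand):
            return False
    return True

def is_tangle_per_group_uniform(tangle, all_strands, group_of):
    """True if each group is entirely inside or entirely outside the tangle."""
    tangle = set(tangle)
    for group in {group_of(s) for s in all_strands}:
        members = {s for s in all_strands if group_of(s) == group}
        if not (members <= tangle or members.isdisjoint(tangle)):
            return False
    return True

strands = range(8)
wave_of = lambda s: s // 4  # two hypothetical waves of four strands each

wave_index_is_per_wave = is_dynamically_per_group(strands, wave_of, wave_of)
strand_id_is_per_wave = is_dynamically_per_group(strands, wave_of, lambda s: s)
whole_wave_uniform = is_tangle_per_group_uniform({0, 1, 2, 3}, strands, wave_of)
half_wave_uniform = is_tangle_per_group_uniform({0, 1}, strands, wave_of)
```

A tangle holding exactly one whole wave is per-wave uniform, while a tangle holding only half of a wave is not.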

11.4.3. Static Uniformity

A value in a Slang program (such as the result of evaluating an expression) is statically per-G, for some granularity of group G, if it can be proved, using the rules in this specification, that the value will always be dynamically per-G at runtime.

We say that control-flow is statically per-G uniform at some point in a program if it can be proved, using the rules in this specification, that at runtime any strand at that point would be part of a tangle that is per-G uniform.

TODO: Okay, now we actually need to write just such rules...

12. GPUs

12.1. Dispatches

Work is issued to a GPU device in the form of commands. A dispatch is a command that causes a GPU device to execute a specific pipeline instance.

A dispatch determines:

The pipeline instance P to execute
A value for each of the dispatch parameters of P
A value for each of the uniform shader parameters of P

12.1.1. Pipeline Instance

A pipeline instance is an instance of some pipeline type, and consists of:

A set of zero or more stage instances
A bundle

The kernels of any programmable stage instances in a pipeline instance must be compatible with the bundle of the pipeline instance.

12.1.1.1. Pipeline Types

A pipeline type is a category of workloads that a GPU can support. Slang supports the following pipeline types:

rasterization pipelines
compute pipelines
ray-tracing pipelines

Each pipeline type PT defines:

Zero or more dispatch parameters, each having a name and a type
A set of stage types
A set of associated record types
Validation rules for pipeline instances of type PT

12.1.2. Stage Instances

A stage instance is an instance of some stage type ST, and consists of:

A kernel K for ST, if ST is programmable.
Values for each of the fixed-function parameters of ST that are not included in K

12.1.2.1. Stage Types

A stage type is a type of stage that may appear in pipeline instances.

Each stage type belongs to a single pipeline type.

A stage type is either programmable or it is fixed-function.

A stage type has zero or more fixed-function parameters, each having a name and a type.

A stage type is either required, optional, or instanceable.

12.1.2.2. Signatures

The signature of a programmable stage consists of its uniform signature and its varying signature.

The uniform signature of a programmable stage is a sequence of types.

A programmable stage has a set of boundaries. Each programmable stage has an input boundary and an output boundary.

The varying input signature of a programmable stage is its signature along its input boundary. The varying output signature of a programmable stage is its signature along its output boundary.

The varying signature of a programmable stage comprises the boundary signature of that stage along each of its boundaries.

To compute the signature of a stage S with a kernel based on a function f:

Initialize the uniform signature of S to an empty sequence
Initialize the boundary signature of S along each of its boundaries to an empty sequence
If f has an implicit this parameter then:
- Append the type of the this parameter to the uniform signature of S
For each explicit parameter p of f:
If p is uniform, then:
- Add the type of p to the uniform signature of S
Otherwise:
- TODO: handling of semantics!
- Add p to the varying signature of S
Add the result type R to the varying output signature of S
TODO: need to handle semantics on f as semantics on the result...

The default way to add a parameter p to the varying signature of some stage S is:

If the direction of p is in or inout, then add p to the varying input signature of S
If the direction of p is out or inout, then add p to the varying output signature of S
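
The steps above can be sketched as a small Python model. This is an illustration under stated assumptions, not a normative algorithm: the `Param` and `Signature` classes are hypothetical, types are represented as plain strings, only the type of each varying parameter is recorded, semantics are not modeled (they are still a TODO in the text), and only the in, out, and inout directions are handled.

```python
from dataclasses import dataclass, field

@dataclass
class Param:
    name: str
    type: str
    direction: str = "in"   # "in", "out", or "inout"
    uniform: bool = False

@dataclass
class Signature:
    uniform: list = field(default_factory=list)
    varying_input: list = field(default_factory=list)
    varying_output: list = field(default_factory=list)

def add_varying(sig, p):
    """Default rule for adding a parameter to the varying signature."""
    if p.direction in ("in", "inout"):
        sig.varying_input.append(p.type)
    if p.direction in ("out", "inout"):
        sig.varying_output.append(p.type)

def compute_signature(params, result_type, this_type=None):
    """Compute the signature of a stage whose kernel is based on a function."""
    sig = Signature()                      # all sequences start out empty
    if this_type is not None:              # implicit this parameter, if any
        sig.uniform.append(this_type)
    for p in params:                       # explicit parameters, in order
        if p.uniform:
            sig.uniform.append(p.type)
        else:
            add_varying(sig, p)            # semantics are not modeled here
    sig.varying_output.append(result_type) # the result type is added last
    return sig
```

Note that an inout parameter contributes to both the varying input signature and the varying output signature.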

12.1.3. Records

A record is an instance of an associated record type of a pipeline type. Data is passed between the stage instances of a pipeline instance in the form of records.

The varying input and output signature of a stage instance is defined with respect to the associated record types of the corresponding pipeline type.

A stage instance depends on the associated record types that are part of its signature. A pipeline instance depends on the associated record types that are part of its stage instances.

A programmable stage has a set of boundaries. Every programmable stage has an input boundary and an output boundary.

A record type signature is a sequence of pairs (T, s), where T is a type and s is a semantic. The varying signature of a programmable stage along one of its boundaries is a record type signature.

For each associated record type R that a pipeline instance P depends on, if there are stage instances A and B in P such that A outputs R and B inputs R, then the configuration of R for B must be a prefix of the configuration of R for A.
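
The prefix requirement can be illustrated with a short Python sketch. It assumes that the "configuration" of a record type for a stage instance is a record type signature, that is, a sequence of (type, semantic) pairs; the text does not state this explicitly. The function name and the semantic names in the usage note are hypothetical.

```python
def is_prefix(consumer_signature, producer_signature):
    """True if consumer_signature is a prefix of producer_signature.

    Each signature is a sequence of (type, semantic) pairs.
    """
    consumer = list(consumer_signature)
    producer = list(producer_signature)
    return producer[:len(consumer)] == consumer
```

For example, a producer signature `[("float4", "POSITION"), ("float2", "UV")]` is compatible with a consumer signature `[("float4", "POSITION")]`, but not with `[("float2", "UV")]`.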

12.1.3.1. Compute

A compute pipeline instance is a pipeline instance with a single stage instance of stage type compute. The compute stage type is programmable.

A dispatch of a compute pipeline instance is a compute dispatch. A compute dispatch invokes its kernel for each strand in a block of strands. The number and organization of strands in a block is determined by a block shape. A block shape comprises its rank N, a positive integer, and its extent, a positive integer, along each axis i, where 0 <= i < N. Typical GPU platforms support block shapes with ranks up to 3.

An entry point for the compute stage may use a [numthreads] attribute to specify the block shape to be used by pipeline configurations using that entry point.

The strands in a block are executed concurrently.

Each block of strands that executes a compute entry point is part of a grid of blocks. The number and organization of blocks in a grid is determined by a grid shape. A grid shape comprises its rank N, a positive integer, and its extent, a positive integer, along each axis i, where 0 <= i < N. Typical GPU platforms support grid shapes with ranks up to 3.

The blocks in a grid may be executed concurrently, but this is not guaranteed.
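
As a worked example of block and grid shapes, the Python sketch below counts strands. It assumes that the number of elements described by a shape is the product of its extents, which is the conventional reading but is not stated explicitly above; the function name and the example shapes are hypothetical.

```python
from math import prod

def element_count(shape):
    """Number of elements described by a block or grid shape, given its extents per axis."""
    if len(shape) < 1 or any(extent < 1 for extent in shape):
        raise ValueError("the rank and every extent must be positive integers")
    return prod(shape)

block_shape = (8, 8, 1)   # hypothetical block shape of rank 3
grid_shape = (4, 2)       # hypothetical grid shape of rank 2

strands_per_block = element_count(block_shape)
total_strands = strands_per_block * element_count(grid_shape)
```

With these shapes, each block holds 64 strands and the whole grid holds 512 strands across 8 blocks.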

12.1.3.2. Rasterization

A rasterization pipeline instance is a pipeline instance.

12.1.3.2.1. Index Fetch

12.1.3.2.2. Vertex Fetch

Vertex fetch is a fixed-function stage.

This stage takes as input a VertexIndex record, and produces as output an AssembledVertex record.

For each attribute of the AssembledVertex record, fetch data from the corresponding vertex stream, using either the instance ID or vertex ID (based on configuration).

12.1.3.2.3. Vertex Shader

The vertex stage type is programmable.

The signature of a vertex stage is:

One AssembledVertex record as input
One CoarseVertex record as output

This stage takes as input one AssembledVertex record, and produces as output one CoarseVertex record.

Given an entry point function for the vertex stage:

Iterate over its entry-point parameters
If the parameter is uniform, then add it to the uniform signature
If the parameter is in or inout, then add it to the AssembledVertex record signature
If the parameter is out or inout, then add it to the CoarseVertex record signature

A given dispatch may invoke the kernel for the vertex stage one or more times for each vertex ID and instance ID.

12.1.3.2.4. Primitive Assembly

A primitive consists of a primitive topology and N vertices, where N matches the requirements of that primitive topology.

This stage takes as input coarse vertex records from a preceding stage. It produces as output primitives of coarse vertex records.

12.1.3.2.5. Tessellation

12.1.3.2.5.1. Hull Shader

This stage takes as input a patch primitive of coarse vertex records. It produces as output a patch primitive of control point records and a patch record.

12.1.3.2.5.2. Domain Shader

This stage takes as input a patch primitive of control point records, and a patch record. It outputs a fine vertex record.

12.1.3.2.6. Geometry Shader

This stage takes as input a primitive of vertex records (either coarse or fine vertices, depending on what preceding stages are enabled). As output, a geometry shader may have one or more output streams, each having a corresponding type of record. For each output stream, a strand executing a geometry shader may output zero or more records of the corresponding type.

If one of the output streams of the geometry shader is connected to the rasterizer stage, then that output stream corresponds to raster vertex records.

12.1.3.2.7. Mesh Shading

12.1.3.2.8. Rasterizer

The rasterizer is a fixed-function stage.

This stage takes as input a primitive of raster vertex records, and determines which pixels in a render target that primitive covers. The stage outputs zero or more raster fragment records, one for each covered pixel.

12.1.3.2.9. Depth/Stencil Test

12.1.3.2.10. Fragment Shader

This stage takes as input a raster fragment record, and produces as output either:

One ShadedFragment record
One ShadedFragment record and N ShadedFragment.Sample records

12.1.3.2.11. Blend

This stage takes as input a ShadedFragment record (possibly including its ShadedFragment.Sample) and a Pixel record, and produces as output a Pixel record.

12.1.3.3. Ray Tracing

TODO: define the stages of the ray-tracing pipeline.

13. Source and Binary Stability

This section needs to define what it means for something to be _stable_ vs. fragile.

14. Modules

A module is a unit of encapsulation for declarations.

14.1. Primary Source Unit

One or more source units may be compiled together to form a module. Compilation of a module begins with a single source unit that is the primary source unit of the module. An implementation should use the primary source unit as a means to identify a module when resolving references from one module to another.

A primary source unit may start with a module declaration, which specifies the name of the module:

ModuleDeclaration :
    `module` Identifier `;`

If a primary source unit does not start with a module declaration, the module comprises only a single source unit, and the name of the module must be determined by some implementation-specified method. A primary source unit without a module declaration must not contain any include declarations.

Note: An implementation might derive a default module name from a file name/path, command-line arguments, build configuration files, etc.

Note: A module might be defined in a single source unit, but have code spread across multiple files. For example, #include directives might be used so that the content of a single source unit comes from multiple files.

A module declaration must always be the first declaration in its source unit. A module declaration must not appear in any source unit that is not a primary source unit.

14.2. Secondary Source Units

Any source units in a module other than the primary source unit are secondary source units.

14.2.1. Include Declarations

Secondary source units are included into a module via include declarations.

IncludeDeclaration :
    `include` path:PathSpecifier `;`

PathSpecifier :
    PathElement ( `.` PathElement )*

PathElement :
    Identifier
    | StringLiteral

An implementation must use the _path_ given in an include declaration to identify a source unit via implementation-specified means. It is an error if a source unit matching the given path cannot be identified.

Include declarations are distinct from preprocessor #include directives.

Each source unit in a module is preprocessed independently, and the preprocessor state at the point where an include declaration appears does not have any impact on how the source unit identified by that include declaration will be preprocessed.

A module comprises the transitive closure of source units referred to via include declarations. An implementation must use an implementation-specified means to determine if multiple include declarations refer to the same source unit or not.

Note: Circular, and even self-referential, include declarations are allowed.

14.2.2. Implementing Declarations

Secondary source units must start with an implementing declaration, naming the module that the secondary source unit is part of:

ImplementingDeclaration :
    `implementing` PathSpecifier `;`

Note: It is an error if the name on an implementing declaration in a secondary source unit does not match the name on the module declaration of the corresponding primary source unit.

Note: implementing declarations ensure that given a source unit, a tool can always identify the module that the source unit is part of (which can in turn be used to identify all of the source units in the module via its include declarations).

An implementing declaration must always be the first declaration in its source unit. An implementing declaration must not appear in any source unit that is not a secondary source unit.

14.3. Import Declarations

Modules may depend on one another via import declarations.

ImportDeclaration :
  `import` PathSpecifier `;`

An import declaration names a module, and makes the public declarations in the named module visible to the current source unit.

Note: An import declaration only applies to the scope of the current source unit, and does not import the chosen module so that it is visible to other source units of the current module.

Implementations must use an implementation-specified method to resolve the name given in an import declaration to a module. Implementations may resolve an import declaration to a previously compiled module. Implementations may resolve an import declaration to a source unit, and then attempt to compile a module using that source unit as its primary source unit. If an implementation fails to resolve the name given in an import declaration, it must diagnose an error. If compilation of a source unit named via an import declaration fails, the implementation must diagnose an error.

The current Slang implementation searches for a module by translating the specified module name into a file path by:

Replacing any dot (.) separators in a compound name with path separators (e.g., /)
Replacing any underscores (_) in the name with hyphens (-)
Appending the extension .slang

The implementation then looks for a file matching this path on any of its configured search paths. If such a file is found it is loaded as a module comprising a single source unit.
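
The translation described above can be sketched in Python as follows. This is an illustrative model, not the compiler's actual code: the function names, the use of `/` as the path separator, and the injectable `exists` predicate are assumptions. The text does not say whether path elements written as string literals are treated differently, so the sketch treats every element the same way.

```python
import os

def module_name_to_path(path_elements):
    """Translate the elements of a module name into a relative file path."""
    relative = "/".join(path_elements)       # dot separators become path separators
    relative = relative.replace("_", "-")    # underscores become hyphens
    return relative + ".slang"               # append the .slang extension

def find_module_file(path_elements, search_paths, exists=os.path.isfile):
    """Return the first matching file on the search paths, or None if there is none."""
    relative = module_name_to_path(path_elements)
    for directory in search_paths:
        candidate = directory.rstrip("/") + "/" + relative
        if exists(candidate):
            return candidate
    return None
```

For instance, under these assumptions `import my_lib.math_utils;` would be looked up as `my-lib/math-utils.slang` relative to each configured search path.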

15. Declarations

A declaration is a unit of syntax that typically affects what names are visible in a scope, and how those names are interpreted. In the abstract syntax, a source unit comprises a sequence of declarations.

SourceUnit
    ( ModuleDeclaration | ImplementingDeclaration ) ?
    IncludeDeclaration*
    ImportDeclaration*
    ( Declaration )*

A declaration may be preceded by zero or more modifiers.

Declaration
    Modifier Declaration
    | VariableDeclaration
    | FunctionDeclaration
    | TypeDeclaration

A binding is an association of an identifier with a specific entity, such as a declaration. A scope is a set of bindings.

Note: A scope can include more than one binding for the same identifier.

An environment is either the unique empty environment, or it comprises a scope that is the local scope of that environment, and a parent environment.

Note: No Slang code is ever parsed or checked in the empty environment. Many checking rules are expected to introduce new local scopes, which will use the current environment as the parent environment.

For each production in the abstract syntax, there is an input environment in effect at the start of that production, and an output environment in effect after that production. Each production determines the input environment used by its sub-terms. When the description of a production does not say otherwise:

The input environment of a production’s first sub-term is the input environment of that production.
The input environment of each sub-term other than the first is the output environment of the preceding sub-term.
The output environment of the entire production is the output environment of its last sub-term.

The output environment of a token is its input environment. The output environment of an empty production is its input environment.

When we say that a declaration introduces a binding, we mean that its output environment is the output environment of its last sub-term extended with that binding.
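
The default threading of environments through sub-terms can be modeled with a short Python sketch. The representation is an assumption made for illustration: `None` stands in for the empty environment, a scope is a tuple of (identifier, entity) pairs, and each sub-term is modeled as a function from its input environment to its output environment. None of these names come from this specification.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Environment:
    scope: tuple = ()                       # local scope: (identifier, entity) bindings
    parent: Optional["Environment"] = None  # None stands in for the empty environment

def extend(env, identifier, entity):
    """Output environment of a declaration that introduces a binding."""
    return Environment(env.scope + ((identifier, entity),), env.parent)

def new_local_scope(env):
    """Introduce a fresh local scope that uses env as its parent environment."""
    return Environment((), env)

def check_sequence(sub_terms, input_env):
    """Default rule: each sub-term's output environment is the next one's input."""
    env = input_env
    for term in sub_terms:
        env = term(env)
    return env                              # an empty production returns input_env

# A token leaves the environment unchanged.
token = lambda env: env
```

Because a scope is modeled as a sequence of pairs rather than a dictionary, it can hold more than one binding for the same identifier, as the note above allows.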

15.1. Shared Concepts

We need to define concepts that are used by traditional-style declarations, including type specifiers.

15.1.1. Bodies

We need to define that there are two kinds of bodies: declaration bodies and _statement bodies_.

15.1.2. Bases

15.1.3. Parameters

A parameters clause defines a sequence of parameter declarations.

ParametersClause
    `(` (ParameterDeclaration `,`)* `)`

ParameterDeclaration
    Identifier `:` Direction? TypeExpression DefaultValueClause?

DefaultValueClause
    `=` Expression

A parameter declaration must include a type expression; the type of the parameter is the result of evaluating that type expression.

A parameter declaration may include a default-value clause; if it does, the default value for that parameter is the result of evaluating the expression in that clause with an expected type that is the type of the parameter.

15.1.3.1. Directions

Every parameter has a direction, which determines how an argument provided at a call site is connected to that parameter.

Direction
    `in`
    | `out`
    | `take`
    | `borrow`
    | `inout` | `in` `out`
    | `ref`

If a parameter declaration includes a direction, then that determines the direction of the parameter; otherwise the direction of the parameter is in.

15.1.3.1.1. `in`

The in direction indicates typical pass-by-value (copy-in) semantics. The parameter in the callee binds to a copy of the argument value passed by the caller.

15.1.3.1.2. `out`

The out direction indicates copy-out semantics. The parameter in the callee binds to a fresh storage location that is uninitialized on entry. When the callee returns, the value in that location is moved to the storage location referenced by the argument.

15.1.3.1.3. `take`

The take direction indicates move-in semantics. The argument provided by the caller is moved into the parameter of the callee, ending the lifetime of the argument.

15.1.3.1.4. `borrow`

The borrow direction indicates immutable borrow semantics.

If the parameter has a noncopyable type, then the parameter in the callee binds to the storage location referenced by the argument.

If the parameter has a copyable type, then the parameter in the callee may, at the discretion of the implementation, bind to either the storage location referenced by the argument, or a copy of the argument value.

15.1.3.1.5. `inout`

The inout direction indicates exclusive mutable borrow semantics. The syntax in out is equivalent to inout.

If the parameter has a noncopyable type, then the parameter in the callee binds to the storage location referenced by the argument.

If the parameter has a copyable type, then the parameter in the callee may, at the discretion of the implementation, bind to either:

the storage location referenced by the argument
a fresh storage location, in which case that location is initialized with a copy of the argument value on entry, and the value at that location is written to the argument on return from the function.

15.1.3.1.6. `ref`

The ref direction indicates pass-by-reference semantics. The parameter in the callee binds to the storage location referenced by the argument.
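
To make the difference between copying and binding concrete, the following Python sketch models storage locations explicitly. It is a simplified illustration under stated assumptions: the `Location` class and the `call_*` helpers are hypothetical, `None` models an uninitialized location, only the copying variant of inout is shown, and the take and borrow directions are not modeled.

```python
import copy

class Location:
    """A storage location holding a value (None models 'uninitialized')."""
    def __init__(self, value=None):
        self.value = value

def call_in(callee, arg):
    callee(Location(copy.deepcopy(arg.value)))   # callee binds to a copy

def call_out(callee, arg):
    param = Location()                            # fresh, uninitialized on entry
    callee(param)
    arg.value = param.value                       # moved to the argument on return

def call_inout_by_copy(callee, arg):
    param = Location(copy.deepcopy(arg.value))    # copy in on entry
    callee(param)
    arg.value = param.value                       # write back on return

def call_ref(callee, arg):
    callee(arg)                                   # callee binds to the argument's location

def set_to_42(param):
    """Example callee that writes to its parameter."""
    param.value = 42
```

In this model, a write performed by the callee is visible to the caller for out, inout, and ref, but not for in.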

15.1.4. Accessors

15.1.5. Invocable Declarations

A declaration that is invocable can be invoked by a strand during execution.

15.2. Patterns

Pattern :=
    BindingPattern
    | TuplePattern
    | ExpressionPattern

TuplePattern := `(` (Pattern `,`)* `)`

ExpressionPattern := Expression

BindingPattern := `let` BindingPatternInner

BindingPatternInner :=
    Identifier
    | TuplePatternInner

TuplePatternInner := `(` (BindingPatternInner `,`)* `)`

15.3. Generics

15.4. Type Declarations

A type declaration introduces a binding to a type.

TypeDeclaration =
    AggregateTypeDeclaration
    | EnumDeclaration
    | TypeAliasDeclaration

15.4.1. Aggregate Types

AggregateTypeDeclaration
    (`struct` | `class` | `interface`) Identifier
    GenericParametersClause?
    BasesClause?
    GenericWhereClause?
    DeclarationBodyClause `;`?

DeclarationBodyClause
    `{` Declaration* `}`

An aggregate type declaration is either a struct declaration, a class declaration, or an interface declaration.

The declarations in the body clause of an aggregate type declaration are referred to as the direct members of that declaration.

Note: We need rules to define what the members that aren’t direct members are.

15.4.1.1. Instance and Static Members

Members of an aggregate type may be marked with a static modifier, in which case they are static members of the type. Members not marked with static are instance members of the type.

Static members are referenced through the type itself. Instance members are referenced through values that are instances of the type.

15.4.1.2. Fields

Variable declarations that are instance members of an aggregate type declaration are also referred to as the fields of the aggregate type declaration.

15.4.1.3. Methods

Function declarations that are instance members of an aggregate type declaration are also referred to as methods.

By default, the implicit this parameter of a method has a direction of in. A method of a struct type declaration may be modified with the [mutating] attribute, in which case the implicit this parameter has a direction of inout.

15.4.2. Bases

BasesClause
    `:` Base (`,` Base)*

Base
    TypeExpression

An aggregate type declaration has zero or more bases. If an aggregate type declaration has no bases clause, then it has zero bases; otherwise, the bases of the aggregate type declaration are the types identified by each Base in that clause.

The list of bases of a struct declaration must consist of:

at most one struct type
zero or more interface types

The list of bases of a class declaration must consist of:

at most one class type
zero or more interface types

The list of bases of an interface declaration must consist of:

zero or more interface types
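
The three rules above share one shape, which the following Python sketch makes explicit. The function name and the string encoding of declaration kinds are assumptions made for illustration; a real implementation would inspect resolved types rather than strings.

```python
def check_bases(decl_kind, base_kinds):
    """Validate the bases list of an aggregate type declaration.

    decl_kind is "struct", "class", or "interface"; base_kinds lists the
    kind of type that each base resolves to.
    """
    interfaces = base_kinds.count("interface")
    # A struct may have one struct base and a class one class base;
    # an interface may only have interface bases.
    same_kind = 0 if decl_kind == "interface" else base_kinds.count(decl_kind)
    others = len(base_kinds) - interfaces - same_kind
    return others == 0 and same_kind <= 1
```

For example, a struct whose bases are one struct and two interfaces passes the check, while a struct that names a class as a base does not.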

15.4.2.1. Traditional Syntax

15.4.2.1.1. Trailing Semicolon

For compatibility, the body clause of an aggregate type declaration may end with a semicolon (;). The body clause of an aggregate type declaration must end with a semicolon if there are any tokens between the closing } token and the next line break that follows it.

Note: Put more simply: a closing ; can be left off of an aggregate type declaration so long as there is nothing but trivia after it on the same line.

15.4.2.1.2. Aggregate Type Specifiers

An aggregate type declaration may be used as a type specifier in declarations using traditional syntax.

When an aggregate type declaration is used as a type specifier, the closing } of its body must be followed by another token on the same line.

15.4.3. `enum` Declarations

EnumDeclaration
    `enum` Identifier
    GenericParametersClause?
    BasesClause?
    GenericWhereClause?
    EnumBody `;`?

EnumBody
    TraditionalEnumBody
    | DeclarationBody

An enum declaration introduces a binding for a type whose instances fall into one or more cases.

An enum declaration that has a declaration body is a modern enum declaration. An enum declaration that has a traditional enum body is a traditional enum declaration.

An enum declaration may have a bases clause, in which case the list of bases must consist of:

at most one proper type; if present, this determines an underlying type for the enum declaration.
zero or more interface types

A traditional enum declaration always has an underlying type. If an underlying type is not specified in the bases list of a traditional enum declaration, then the underlying type of that declaration is Int. The underlying type of a traditional enum declaration must be a built-in integer type.

A modern enum declaration must not specify an underlying type.

15.4.3.1. Cases

CaseDeclaration :
    `case` Identifier InitialValueClause?
        
TraditionalEnumBody :
    `{` (TraditionalCaseDeclaration `,`)* `}`

TraditionalCaseDeclaration :
    Identifier InitialValueClause?

The possible values of an enum type are determined by the case declarations in its body. An enum type has one case for each case declaration in its body.

The case declarations in a modern enum declaration are declared using the case keyword. The case declarations in a traditional enum body are traditional case declarations.

Each case of a traditional enum declaration has an underlying value, which is a value of the underlying type of that enum declaration. When a traditional case declaration includes an initial-value expression, its underlying value is the result of evaluating that expression, with the underlying type of the enum declaration as the expected type. The underlying value of a traditional case declaration must be a compile-time-constant integer value. If a traditional case declaration does not have an initial-value expression, then its underlying value is one greater than the underlying value of the preceding case declaration, or zero if there is no preceding case declaration.
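
The rule for underlying values can be sketched in Python. This is a minimal model that assumes any initial-value expression has already been evaluated to an integer; the function name and the case names in the usage note are hypothetical.

```python
def assign_underlying_values(cases):
    """Compute the underlying value of each traditional case declaration.

    cases is a list of (name, explicit_value) pairs, where explicit_value is
    None when the case has no initial-value expression.
    """
    values = {}
    previous = None
    for name, explicit in cases:
        if explicit is not None:
            value = explicit          # value of the initial-value expression
        elif previous is None:
            value = 0                 # no preceding case declaration
        else:
            value = previous + 1      # one greater than the preceding case
        values[name] = value
        previous = value
    return values
```

For a body equivalent to `{ Red, Green = 5, Blue, Alpha }`, this model yields 0, 5, 6, and 7 respectively.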

Case declarations are implicitly static.

15.4.4. Type Aliases

A type alias declaration introduces a binding that resolves to some other type.

TypeAliasDeclaration :
    ModernTypeAliasDeclaration
    | TraditionalTypeAliasDeclaration

ModernTypeAliasDeclaration :
    `typealias` Identifier
    GenericParametersClause?
    GenericWhereClause?
    `=` TypeExpression `;`

TODO: type alias declaration

GIVEN identifier n, type t, context c
GIVEN t checks in c
THEN `typealias` n = t `;` checks in c

15.4.4.1. Traditional Syntax

A type alias may also be declared using a traditional type alias declaration.

TraditionalTypeAliasDeclaration :
    `typedef` TypeSpecifier Declarator `;`

TODO: traditional type alias declaration

To desugar a traditional type alias declaration typedef ts d ;, where ts is a type specifier and d is a declarator:

Unwrap d for ts to yield type expression t and optional name declarator nd
If nd is not present then:
- diagnose an error
Otherwise:
- Let n be the name of nd
- Return the modern type alias declaration typealias n = t ;

15.4.5. Associated Types

An associated type declaration introduces a type requirement to an interface declaration.

AssociatedTypeDeclaration
    `associatedtype` Identifier
    GenericParametersClause?
    BasesClause?
    GenericWhereClause? `;`

An associated type declaration may only appear as a member declaration of an interface declaration.

An associated type is an interface requirement, and different implementations of an interface may provide different types that satisfy the same associated type interface requirement.

15.5. Traditional Declarations

Declarator :
    NameDeclarator
    | ParenthesizedDeclarator
    | ArrayDeclarator
    | PointerDeclarator

NameDeclarator : Identifier

ParenthesizedDeclarator: `(` Declarator `)`

ArrayDeclarator: Declarator `[` (Expression `,`)* `]`

PointerDeclarator: `*` PointerQualifier* Declarator

Unwrapping a declarator for some type expression yields a type expression and an optional name declarator.

The unwrapping of a name declarator nd for a type expression t is (t, nd).

The unwrapping of a parenthesized declarator ( d ) for a type expression t is the unwrapping of d for t.

The unwrapping of an array declarator d [ args ] for a type expression t is the unwrapping of d for Array<t, args>.

The unwrapping of a pointer declarator * q d for a type expression t is the unwrapping of d for Ptr<(q t)>.
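
The four unwrapping rules can be sketched as a recursive Python function. The tuple encoding of declarators and the use of strings for type expressions are assumptions made for illustration and are not part of this specification.

```python
def unwrap(declarator, t):
    """Unwrap a declarator for type expression t, yielding (type expression, name).

    Declarators are encoded as tuples:
      ("name", identifier)
      ("paren", inner)
      ("array", inner, [argument, ...])
      ("pointer", [qualifier, ...], inner)
    """
    kind = declarator[0]
    if kind == "name":
        return (t, declarator[1])
    if kind == "paren":
        return unwrap(declarator[1], t)
    if kind == "array":
        _, inner, args = declarator
        return unwrap(inner, "Array<" + ", ".join([t] + list(args)) + ">")
    if kind == "pointer":
        _, qualifiers, inner = declarator
        qualified = " ".join(list(qualifiers) + [t])
        return unwrap(inner, "Ptr<(" + qualified + ")>")
    raise ValueError("unknown declarator kind: " + str(kind))
```

With this encoding, the C-style declarator `*a[4]` applied to `int` unwraps to an array of four pointers, `Array<Ptr<(int)>, 4>`, while `(*p)[4]` unwraps to a pointer to an array, `Ptr<(Array<int, 4>)>`.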

15.6. Variables

VariableDeclaration :
    (`let` | `var`) Identifier
        TypeAscriptionClause?
        InitialValueClause? `;`

TypeAscriptionClause :
    `:` TypeExpression

InitialValueClause :
    `=` Expression

A variable is an abstract storage location that can hold a value. Every variable has a type, and it can only hold values of that type. Every variable is either immutable or mutable.

A variable declaration introduces a binding for the given Identifier, to a variable. A variable declaration using the let keyword binds an immutable variable. A variable declaration using the var keyword binds a mutable variable.

Every variable declaration has a type, and the variable it binds is of the same type.

A variable declaration must have either a TypeAscriptionClause or an InitialValueClause; a variable declaration may have both.

If a variable declaration has a type ascription, then let T be the type that results from evaluating the type expression of that type ascription in the input environment of the declaration.

If a variable declaration has a TypeAscriptionClause and no InitialValueClause, then the type of that variable declaration is T.

If a variable declaration has both a TypeAscriptionClause and an InitialValueClause, then let E be the value that results from evaluating the expression of the initial value clause against the expected type T in the input environment of the declaration. The type of the variable declaration is the type of E, and its initial value is E.

If a variable declaration has an InitialValueClause but no TypeAscriptionClause, then let E be the value that results from evaluating the expression of the initial value clause in the input environment of the declaration. The type of the variable declaration is the type of E, and its initial value is E.

15.6.1. Traditional Variable Declarations

Variables may also be declared using traditional syntax, which is similar to C:

VariableDeclaration :
    LegacyVariableDeclaration

LegacyVariableDeclaration :
    TypeSpecifier InitDeclarator * `;`

InitDeclarator :
    Declarator InitialValueClause ?

To desugar a traditional variable declaration typeSpecifier initDeclarators ;:

Let variable result be an empty sequence of declarations
Let ts be a fresh name
Append typealias ts = typeSpecifier ; to result
For each i in initDeclarators:
- Let d be the declarator of i
- Unwrap d for ts to yield n and t
- Let iv be the initial value clause of i
- Let v be var n : t iv ;
- Append v to result
Return result

The above desugaring logic is ignoring modifiers. It also isn’t making sure to use a let declaration where possible (because t ends up being const).
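
The desugaring steps can be modeled in Python as shown below. This sketch shares the limitations noted above (modifiers are ignored and var is always used). In addition, it assumes a simplified unwrapping that handles only name and array declarators, represents declarations as strings, and uses a hypothetical `_ts<N>` scheme for fresh names.

```python
import itertools

_fresh = itertools.count()

def unwrap(declarator, t):
    """Simplified unwrapping: only name and array declarators are modeled here."""
    if declarator[0] == "name":
        return t, declarator[1]
    _, inner, args = declarator              # ("array", inner, [args...])
    return unwrap(inner, "Array<" + ", ".join([t, *args]) + ">")

def desugar_variable_declaration(type_specifier, init_declarators):
    """Desugar a traditional variable declaration into modern declarations.

    init_declarators is a list of (declarator, initial_value) pairs, where
    initial_value is None when there is no initial-value clause.
    """
    result = []
    ts = "_ts" + str(next(_fresh))           # fresh name (hypothetical scheme)
    result.append("typealias " + ts + " = " + type_specifier + ";")
    for declarator, initial_value in init_declarators:
        t, n = unwrap(declarator, ts)
        iv = "" if initial_value is None else " = " + initial_value
        result.append("var " + n + " : " + t + iv + ";")
    return result
```

Introducing the alias first means the type specifier is evaluated once, and every declarator in the declaration then refers to the same type.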

A legacy variable declaration introduces a binding for each InitDeclarator. Let S be the type that results from evaluating the TypeSpecifier in the input environment. The type T corresponding to each declarator is determined by evaluating the declarator in the input environment, against input type S.

Note: Slang does not support the equivalent of the auto type specifier in C++. Variables declared using let and var can be used to fill a similar role.

The variables introduced by a legacy variable declaration are immutable if the declaration has a const modifier; otherwise they are mutable.

15.6.2. Variables at Global Scope

A variable declaration at global scope may be either a global constant, a static global variable, or a global shader parameter.

15.6.2.1. Global Constants

A variable declared at global scope and marked with static and const is a global constant.

A global constant must have an initial-value clause, and the initial-value expression must be a compile-time constant expression.

Need a section to define what a "compile-time constant expression" is.

15.6.2.2. Static Global Variables

A variable declared at global scope and marked with static but not with const is a static global variable.

A static global variable provides storage for each strand executing an entry point. Writes to a static global variable from one strand do not affect the value seen by other strands.

Note: The semantics of a static global variable are similar to a "thread-local" variable in other programming models.

A static global variable may include an initial-value expression; if an initial-value expression is included it is guaranteed to be evaluated and assigned to the variable before any other expression that references the variable is evaluated. If a strand attempts to read from a static global variable during the evaluation of the initial-value expression for that variable, a runtime error is raised.

There is no guarantee that the initial-value expression for a static global variable is evaluated before entry point execution begins, or even that the initial-value expression is evaluated at all (in cases where the variable might not be referenced at runtime).

Note: The above rules mean that an implementation can perform dead code elimination on static global variables, and can choose between eager and lazy initialization of those variables at its discretion.

15.6.2.3. Global Shader Parameters

A variable declared at global scope and not marked with static is a global shader parameter.

Global shader parameters are used to pass arguments from application code into strands executing an entry point. The mechanisms for parameter passing are specific to each target platform.

A global shader parameter may include an initial-value expression, but such an expression does not affect the semantics of the compiled program.

Note: An implementation can choose to provide ways to query the initial-value expression of a global shader parameter, or to evaluate it to a value. Host applications can use such capabilities to establish a default value for global shader parameters.

TODO: We need to define uniform shader parameters somewhere.

15.6.3. Variables at Function Scope

Variables declared at function scope (in the body of a function, initializer, subscript accessor, etc.) may be function-scope constants, function-scope static variables, or local variables.

15.6.3.1. Function-Scope Constants

A variable declared at function scope and marked with both static and const is a function-scope constant. Semantically, a function-scope constant behaves like a global constant except that its name is only bound in the local scope.

15.6.3.2. Function-Scope Static Variables

A variable declared at function scope and marked with static (but not const) is a function-scope static variable. Semantically, a function-scope static variable behaves like a static global variable except that its name is only visible in the local scope.

The initial-value expression for a function-scope static variable may refer to non-static variables in the body of the function. In these cases initialization of the variable is guaranteed not to occur until at least the first time the function body is evaluated for a given invocation.

15.6.3.3. Local Variables

A variable declared at function scope and not marked with static (even if marked with const) is a local variable. A local variable has unique storage for each activation of a function.

Note: When a function is called recursively, each call produces a distinct activation with its own copies of local variables.
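
The three kinds of function-scope variable can be contrasted in one function. This is an illustrative sketch only; nextId, kStride, counter, and base are hypothetical names.

```slang
int nextId(int seed)
{
    // Function-scope constant: behaves like a global constant, but the
    // name is bound only inside this function.
    static const int kStride = 4;

    // Function-scope static variable: one copy per strand, retained
    // across calls made by that strand.
    static int counter = 0;

    // Local variable: fresh storage for every activation of nextId.
    int base = seed * kStride;

    counter++;
    return base + counter;
}
```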

15.7. Functions

A function declaration introduces a binding to a function.

FunctionDeclaration :
    `func` Identifier
    GenericParametersClause?
    ParametersClause
    GenericWhereClause?
    ResultTypeClause?
    FunctionBodyClause

A function declaration is invocable.

15.7.1. Result Type

ResultTypeClause :
    `->` TypeExpression

If a function declaration has a result type clause, then the result type of the function is the type that results from evaluating the type expression in that clause. If a function declaration does not have a result type clause, then the result type of the function is the unit type.

15.7.2. Body

FunctionBodyClause :
    BlockStatement
    | `;`

A function declaration may have a body. A function declaration with a body is a function definition.

If the function body clause is a block statement, then that statement is the body of the function definition. If the function body clause is ;, then that function declaration has no body.
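
The grammar above admits the following forms. This is a sketch under the assumption that parameters are written as `name: Type`, as in the generic example in section 15.12.4; the function names are hypothetical.

```slang
// Function declaration with no body: the body clause is `;`.
func lerpScalar(a: float, b: float, t: float) -> float;

// Function definition with a result type clause.
func square(x: float) -> float
{
    return x * x;
}

// No result type clause: the result type is the unit type.
func ignore(x: float)
{
}
```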

15.7.3. Traditional Function Declarations

A traditional function declaration is a function declaration that uses traditional C-like syntax.

TraditionalFunctionDeclaration :
    TypeSpecifier Declarator
    GenericParametersClause?
    TraditionalParametersClause
    GenericWhereClause?
    FunctionBodyClause

TraditionalParametersClause :
    `(` (TraditionalParameterDeclaration `,`) * `)`

TraditionalParameterDeclaration :
    Direction TypeSpecifier Declarator DefaultValueClause?

TODO: traditional function declaration

15.7.4. Entry Points

An entry point declaration is a function declaration that can be used as the starting point of execution for a thread.

TODO: reference entry point declaration.

15.8. Constructors

A constructor declaration introduces logic for initializing an instance of the enclosing type declaration. A constructor declaration is implicitly static.

ConstructorDeclaration :
    `init`
    GenericParametersClause?
    ParametersClause
    FunctionBodyClause

A constructor declaration is invocable. A constructor declaration has an implicit this parameter, with a direction of out.

Note: Slang does not provide any equivalent to C++ destructors, which run automatically when an instance goes out of scope.
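
A minimal sketch of a constructor follows. The grammar above spells the keyword `init`, while the examples in section 15.12.4 write `__init`; this sketch follows the latter spelling, which is an assumption about what the current implementation accepts. Circle and makeUnitCircle are hypothetical names.

```slang
struct Circle
{
    float radius;
    float area;

    // `this` is implicitly an `out` parameter, so the body is
    // responsible for initializing the fields.
    __init(float r)
    {
        radius = r;
        area = 3.14159265 * r * r;
    }
}

Circle makeUnitCircle()
{
    // The constructor is invoked by applying the type to arguments.
    return Circle(1.0);
}
```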

15.9. Properties

A property declaration introduces a binding that can have its behavior for read and write operations customized.

PropertyDeclaration :
    `property`
    Identifier
    GenericParametersClause?
    TypeAscriptionClause
    GenericWhereClause?
    PropertyBody

PropertyBody :
    `{` AccessorDecl* `}`

TODO: reference property declaration.

15.9.1. Accessors

AccessorDecl :
    GetAccessorDecl
    | SetAccessorDecl

GetAccessorDecl :
    `get` FunctionBodyClause

SetAccessorDecl :
    `set` ParametersClause? FunctionBodyClause

An accessor declaration is invocable.
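
The following sketch shows a property with both accessors, written to follow the grammar above. It assumes that a `set` accessor without a parameters clause receives an implicit parameter named newValue, which is behavior of the current implementation and is not stated in this section. Temperature, celsius, and warm are hypothetical names.

```slang
struct Temperature
{
    float kelvin;

    // Hypothetical property exposing the stored value in degrees Celsius.
    property celsius : float
    {
        get { return kelvin - 273.15; }
        // Assumption: `newValue` is the implicit parameter of `set`.
        set { kelvin = newValue + 273.15; }
    }
}

void warm(inout Temperature t)
{
    // Reading the property invokes `get`; writing it invokes `set`.
    t.celsius = t.celsius + 1.0;
}
```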

15.10. Subscripts

A subscript declaration introduces a way for the enclosing type to be used as the base expression of a subscript expression.

SubscriptDeclaration :
    `subscript`
    GenericParametersClause?
    ParametersClause
    ResultTypeClause
    GenericWhereClause?
    PropertyBody

A subscript declaration is invocable.

Note: Unlike a function declaration, a subscript declaration cannot elide the result type clause.
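
A subscript declaration might look like the sketch below. It uses the `__subscript` spelling from the examples in section 15.12.4 and assumes an implicit newValue parameter in the `set` accessor; Grid and bumpCorner are hypothetical names.

```slang
struct Grid
{
    float cells[16];

    // Hypothetical two-argument subscript over a 4x4 grid stored in
    // row-major order. The result type clause is required.
    __subscript(int row, int col) -> float
    {
        get { return cells[row * 4 + col]; }
        set { cells[row * 4 + col] = newValue; }
    }
}

float bumpCorner(inout Grid g)
{
    g[0, 0] = g[0, 0] + 1.0; // invokes `get`, then `set`
    return g[3, 3];          // invokes `get`
}
```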

15.11. Extensions

An extension declaration is introduced with the extension keyword:

extension MyVector
{
    float getLength() { return sqrt(x*x + y*y); }
    static int getDimensionality() { return 2; }
}

An extension declaration adds behavior to an existing type. In the example above, the MyVector type is extended with an instance method getLength(), and a static method getDimensionality().

An extension declaration names the type being extended after the extension keyword. The body of an extension declaration may include type declarations, functions, initializers, and subscripts.

Note: The body of an extension cannot include variable declarations. An extension cannot introduce members that would change the in-memory layout of the type being extended.

The members of an extension are accessed through the type that is being extended. For example, for the above extension of MyVector, the introduced methods are accessed as follows:

MyVector v = ...;

float f = v.getLength();
int n = MyVector.getDimensionality();

An extension declaration need not be placed in the same module as the type being extended; it is possible to extend a type from third-party or standard-library code. The members of an extension are only visible inside of modules that import the module declaring the extension; extension members are not automatically visible wherever the type being extended is visible.

An extension declaration may include an inheritance clause:

extension MyVector : IPrintable
{
    ...
}

The inheritance clause of an extension declaration may only include interfaces. When an extension declaration lists an interface in its inheritance clause, it asserts that the extension introduces a new conformance, such that the type being extended now conforms to the given interface. The extension must ensure that the type being extended satisfies all the requirements of the interface. Interface requirements may be satisfied by the members of the extension, members of the original type, or members introduced through other extensions visible at the point where the conformance was declared.
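
To make the conformance rule concrete, here is a self-contained sketch in which an extension both declares a conformance and supplies the member that satisfies it. The interface requirement getPrintWidth and the helper functions are hypothetical and are declared here only for illustration.

```slang
// Hypothetical interface with a single requirement.
interface IPrintable
{
    int getPrintWidth();
}

struct MyVector
{
    float x;
    float y;
}

// The extension introduces the conformance of MyVector to IPrintable
// and provides the member that satisfies the requirement.
extension MyVector : IPrintable
{
    int getPrintWidth() { return 2; }
}

int widthOf<T : IPrintable>(T value)
{
    return value.getPrintWidth();
}

int test()
{
    MyVector v = { 1.0, 2.0 };
    // Allowed only because the conformance above is visible here.
    return widthOf(v);
}
```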

It is an error for overlapping conformances (that is, of the same type to the same interface) to be visible at the same point. This includes cases where two extensions declare the same conformance, as well as those where the original type and an extension both declare the same conformance. The conflicting conformances may come from the same module or different modules.

In order to avoid problems with conflicting conformances, when a module M introduces a conformance of type T to interface I, one of the following should be true:

The type T is declared in module M
The interface I is declared in module M

Any conformance that does not follow these rules (that is, where both T and I are imported into module M) is called a retroactive conformance, and there is no way to guarantee that another module N will not introduce the same conformance. The runtime behavior of programs that include overlapping retroactive conformances is currently undefined.

Currently, extension declarations can only apply to structure types.

15.12. Generics

Many kinds of declarations can be made generic: structure types, interfaces, extensions, functions, initializers, and subscripts.

A generic declaration introduces a generic parameter list enclosed in angle brackets <>:

T myFunction<T>(T left, T right, bool condition)
{
    return condition ? left : right;
}

15.12.1. Generic Parameters

A generic parameter list can include one or more parameters separated by commas. The allowed forms for generic parameters are:

A single identifier like T is used to declare a generic type parameter with no constraints.
A clause like T : IFoo is used to introduce a generic type parameter T where the parameter is constrained so that it must conform to the IFoo interface.
A clause like let N : int is used to introduce a generic value parameter N, which takes on values of type int.

Note: The syntax for generic value parameters is provisional and subject to possible change in the future.

Generic parameters may declare a default value with =:

T anotherFunction<T = float, let N : int = 4>(vector<T,N> v);

For generic type parameters, the default value is a type to use if no argument is specified. For generic value parameters, the default value is a value of the same type to use if no argument is specified.

15.12.2. Explicit Specialization

A generic is specialized by applying it to generic arguments listed inside angle brackets <>:

anotherFunction<int, 3>

Specialization produces a reference to the declaration with all generic parameters bound to concrete arguments.

When specializing a generic, generic type parameters must be matched with type arguments that conform to the constraints on the parameter, if any. Generic value parameters must be matched with value arguments of the appropriate type, and that are specialization-time constants.

An explicitly specialized function, type, etc. may be used wherever a non-generic function, type, etc. is expected:

int i = anotherFunction<int,3>( int3(99) );

15.12.3. Implicit Specialization

If a generic function/type/etc. is used where a non-generic function/type/etc. is expected, the compiler attempts implicit specialization. Implicit specialization infers generic arguments from the context at the use site, as well as any default values specified for generic parameters.

For example, if a programmer writes:

int i = anotherFunction( int3(99) );

The compiler will infer the generic arguments <int, 3> from the way that anotherFunction was applied to a value of type int3.

Note: Inference for generic arguments currently only takes the types of value arguments into account. The expected result type does not currently affect inference.

15.12.4. Syntax Details

The following examples show how generic declarations of different kinds are written:

T genericFunction<T>(T value);
func genericFunction<T>(value: T) -> T;

__init<T>(T value);

__subscript<T>(T value) -> X { ... }

struct GenericType<T>
{
    T field;
}

interface IGenericInterface<T> : IBase<T>
{
}

Note: Currently there is no user-exposed syntax for writing a generic extension.

15.13. Traditional Buffer Declarations

A traditional buffer declaration is a shorthand for declaring a global-scope shader parameter.

TraditionalBufferDeclaration :
    (`cbuffer` | `tbuffer`) Identifier DeclarationBody `;`?

Given a traditional buffer declaration, an implementation shall behave as if the declaration were desugared into:

A struct declaration with some unique name S, and with the declaration body of the buffer declaration.
A global shader parameter declaration with some unique name P, where the type of the parameter is ConstantBuffer<S> in the case of a cbuffer declaration, or TextureBuffer<S> in the case of a tbuffer declaration.

For each member declared in S, a traditional buffer declaration introduces a binding that refers to that member accessed through P.
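
The desugaring can be pictured as follows. This is only a sketch: S_PerFrame and P_PerFrame stand in for the unique names S and P, which user code cannot actually refer to, and PerFrame, viewProj, time, and transform are hypothetical.

```slang
// Traditional buffer declaration.
cbuffer PerFrame
{
    float4x4 viewProj;
    float time;
}

// Roughly equivalent desugared form, shown side by side for comparison.
struct S_PerFrame
{
    float4x4 viewProj;
    float time;
}
ConstantBuffer<S_PerFrame> P_PerFrame;

float4 transform(float4 p)
{
    // `viewProj` is bound by the cbuffer declaration, as if the member
    // were accessed through its (unnameable) parameter P.
    return mul(viewProj, p);
}
```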

16. Statements

A statement is an entity in the abstract syntax that describes actions to be taken by a thread. Statements are executed for their effect, rather than evaluated to produce a value.

Statement :
  ExpressionStatement
  | DeclarationStatement
  | BlockStatement
  | EmptyStatement
  | IfStatement
  | SwitchStatement
  | CaseStmt
  | ForStatement
  | WhileStatement
  | DoWhileStatement
  | BreakStatement
  | ContinueStatement
  | ReturnStatement
  | DiscardStatement
  | LabeledStatement

16.1. Expression Statement

An expression statement evaluates an expression, and then ignores the resulting value.

ExpressionStatement :
  Expression `;`

GIVEN context c, expression e
GIVEN e synthesizes type t in context c
THEN e `;` checks in context c

An implementation may diagnose a warning when execution of an expression statement cannot have side effects.

16.2. Declaration Statement

A declaration statement introduces a declaration into the current scope.

DeclarationStatement :
  Declaration

GIVEN context c, declaration d
GIVEN declaration d checks in context c
THEN statement d checks in context c

Only the following types of declarations may be used in a declaration statement:

VariableDeclaration

16.3. Block Statement

A block statement executes each of its constituent statements in order.

BlockStatement :
  `{` Statement* `}`

GIVEN context c
GIVEN statements s0,s1,...
GIVEN context d, a fresh sub-context of c
GIVEN s0,s1,... check in d
THEN `{` s0,s1,... `}` checks in c

Note: Declarations in a block statement are visible to later statements in the same block, but not to earlier statements in the block, or to code outside the block.
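
A short sketch of the scoping rule in the note above (the names are hypothetical):

```slang
int scopes()
{
    int total = 0;
    {
        // `inner` is visible from here to the end of this block.
        int inner = 2;
        total += inner;
    }
    // `inner` is not visible here; referring to it would be an error.
    return total;
}
```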

16.4. Empty Statement

Executing an empty statement has no effect.

EmptyStatement :
  `;`

GIVEN context c
THEN `;` checks in c

16.5. Conditional Statements

16.5.1. If Statement

An if statement executes a sub-statement conditionally.

IfStatement :
  `if` `(` IfCondition `)`
  Statement
  ElseClause?

IfCondition :
  Expression
  | LetDeclaration

ElseClause :
  `else` Statement

GIVEN context c, expression e, statement t
GIVEN e checks against `Bool` in c
GIVEN t checks in c
THEN `if(` e `)` t checks in c

GIVEN context c, expression e, statements t and f
GIVEN e checks against `Bool` in c
GIVEN t checks in c
GIVEN f checks in c
THEN `if(` e `)` t `else` f checks in c

If the condition of an if statement is an expression, then it is evaluated against an expected type of Bool to yield a value C. If C is true, then the ThenClause is executed. If C is false and there is an ElseClause, then it is executed.

If the condition of an if statement is a let declaration, then that declaration must have an initial-value expression. That initial-value expression is evaluated against an expected type of Optional<T>, where T is a fresh type variable, to yield a value D. If D is Some(C), then the ThenClause is executed, in an environment where the name of the let declaration is bound to C. If D is null and there is an ElseClause, then it is executed.
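
Both forms of condition are illustrated in the sketch below. The function safeRecip and its parameters are hypothetical.

```slang
float safeRecip(float x, Optional<float> fallback)
{
    // Expression condition: checked against Bool.
    if (x != 0.0)
        return 1.0 / x;

    // Let-declaration condition: the first branch runs only when
    // `fallback` holds a value, and `f` is bound to that value in it.
    if (let f = fallback)
        return f;
    else
        return 0.0;
}
```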

16.5.2. Switch Statement

A switch statement conditionally executes up to one of its alternatives, based on the value of an expression.

SwitchStatement :
  `switch` `(` Expression `)`
  `{` SwitchAlternative+ `}`

SwitchAlternative :
  SwitchAlternativeLabel+ Statement+

SwitchAlternativeLabel :
  CaseClause
  | DefaultClause

CaseClause :
  `case` Expression `:`

DefaultClause :
  `default` `:`

GIVEN context c, expression e, switch alternatives a0,a1,...
GIVEN e synthesizes type t in c
GIVEN a0,a1,... check against t in c
THEN `switch(` e `) {` a0,a1,... `}` checks in c

Note: A switch statement is checked by first checking the expression, and then checking each of the alternatives against the type of that expression.

GIVEN context c, type t,
GIVEN switch alternative labels l0,l1,...
GIVEN statements s0,s1,...
GIVEN l0,l1,... check against t in c
GIVEN s0,s1,... check in c
THEN switch alternative l0,l1,... s0,s1,... checks in c

GIVEN context c, type t, expression e
GIVEN e checks against t in c
THEN `case` e `:` checks against t in c

GIVEN context c, type t
THEN `default:` checks against t in c

Note: A case clause is valid if its expression checks against the type of the control expression of the switch statement.

A switch statement may have at most one default clause.

If the type of the controlling expression of a switch statement is a built-in integer type, then:

The expression of each case clause must be a compile-time constant expression.
The constant value of the expression for each case clause must not be equal to that of any other case clause in the same switch.

Each alternative of a switch statement must exit the switch statement via a break or other control transfer statement. "Fall-through" from one switch case clause to another is not allowed.

Note: Semantically, a switch statement is equivalent to an "if cascade" that compares the value of the conditional expression against each case clause.
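
The rules above can be seen in a small sketch (classify is a hypothetical function). Each alternative ends with break, and two labels share one alternative rather than falling through.

```slang
int classify(int code)
{
    int result = 0;
    switch (code)
    {
    case 0:
        result = 10;
        break;
    case 1:
    case 2: // two labels on a single alternative
        result = 20;
        break;
    default:
        result = -1;
        break;
    }
    return result;
}
```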

16.6. Loop Statements

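A loop statement executes a body statement repeatedly: zero or more times for a for statement or while statement, and one or more times for a do-while statement.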

16.6.1. For Statement

ForStatement :
  `for` `(`
  initial:Statement? `;`
  conditional:Expression? `;`
  sideEffect:Expression? `)`
  body:Statement

GIVEN init checks in c
GIVEN cond checks against `Bool` in c
GIVEN iter synthesizes type t in c
GIVEN body checks in c
THEN `for(init;cond;iter) body` checks in c

The checking judgements above aren’t complete because they don’t handle the case where _cond_ is absent, in which case it should be treated like it was true.

16.6.2. While Statement

WhileStatement :
  `while` `(`
  conditional:Expression `)`
  body:Statement

GIVEN cond checks against `Bool` in c
GIVEN body checks in c
THEN `while(cond) body` checks in c

16.6.3. Do-While Statement

A do-while statement uses the following form:

DoWhileStatement :
  `do` body:Statement
  `while` `(` conditional:Expression `)` `;`

GIVEN body checks in c
GIVEN cond checks against `Bool` in c
THEN `do body while(cond);` checks in c

These simplified prose checking rules are leaving out all the subtleties of sub-contexts, etc.
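
For orientation, the three loop forms are shown together in the sketch below; sumTo is a hypothetical function.

```slang
int sumTo(int n)
{
    int total = 0;

    // for: initial statement, Bool condition, side-effect expression.
    for (int i = 1; i <= n; i++)
        total += i;

    // while: the condition is tested before each iteration.
    int k = n;
    while (k > 0)
    {
        total -= k;
        k--;
    }

    // do-while: the body runs once before the condition is first tested.
    do
    {
        total++;
    } while (total < 0);

    return total;
}
```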

16.7. Control Transfer Statements

16.7.1. `break` Statement

A break statement is used to transfer control out of the context of some enclosing statement. A break statement may optionally provide a label, to identify the enclosing statement.

BreakStatement :
  `break` label:Identifier? `;`

A break statement without a label transfers control to after the end of the closest lexically enclosing switch statement or loop statement.

A break statement with a label transfers control to after the end of the lexically enclosing switch statement or loop statement labeled with a matching label.

GIVEN context c
GIVEN lookup of BREAK_LABEL in c yields label l
THEN `break;` checks in context c

GIVEN context c, label l
GIVEN c contains an entry of the form BREAK_LABEL = l
THEN `break l;` checks in context c

Issue: The checking rules for break-able statements should add a suitable item to the context used for checking their body statements, which will be matched by the rules above.

We also need to define the context-containment rule that is being used to look up the break item in the context.
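
The difference between the labeled and unlabeled forms is sketched below. The label syntax (`outer:` before the loop) is an assumption based on the LabeledStatement production, which is not defined in this section; findFirstPair is a hypothetical function.

```slang
int findFirstPair(int target)
{
    int found = -1;
    outer:
    for (int i = 0; i < 8; i++)
    {
        for (int j = 0; j < 8; j++)
        {
            if (i * j == target)
            {
                found = i * 8 + j;
                break outer; // leaves both loops
            }
            if (j > i)
                break;       // leaves only the inner loop
        }
    }
    return found;
}
```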

16.7.2. Continue Statement

ContinueStatement :
  `continue` label:Identifier? `;`

A continue statement transfers control to the start of the next iteration of a loop statement. In a for statement with a side effect expression, the side effect expression is evaluated when continue is used.

GIVEN context c
GIVEN lookup of CONTINUE_LABEL in c yields label l
THEN `continue;` checks in context c

GIVEN context c, label l
GIVEN c contains an entry of the form CONTINUE_LABEL = l
THEN `continue l;` checks in context c
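
A brief sketch of continue inside a for statement (sumOdd is a hypothetical function):

```slang
int sumOdd(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
    {
        if ((i & 1) == 0)
            continue; // skips the rest of the body; `i++` is still evaluated
        total += i;
    }
    return total;
}
```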

16.7.3. Return Statement

A return statement transfers control out of the current function.

ReturnStatement :
  `return` Expression? `;`

GIVEN context c, expression e
GIVEN lookup of RESULT_TYPE in c yields type t
GIVEN e checks against t in c
THEN `return e;` checks in c

GIVEN context c
GIVEN lookup of RESULT_TYPE in c yields type t
GIVEN `Unit` is a subtype of t
THEN `return;` checks in c

Note: A return statement without an expression is equivalent to a return of a value of type Unit.

16.7.4. Discard Statement

DiscardStatement :
  `discard` `;`

A discard statement may only be used in the context of a fragment shader, in which case it causes the current invocation to terminate and the graphics system to discard the corresponding fragment so that it does not get combined with the framebuffer pixel at its coordinates.

Operations with side effects that were executed by the invocation before a discard will still be performed and their results will become visible according to the rules of the platform.
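
An alpha-test fragment shader is a typical use of discard. The sketch below is illustrative only: gMask, gSampler, and fragmentMain are hypothetical names, and the entry-point attribute and semantics are constructs not defined in this section.

```slang
Texture2D gMask;
SamplerState gSampler;

[shader("fragment")]
float4 fragmentMain(float2 uv : TEXCOORD0) : SV_Target
{
    float alpha = gMask.Sample(gSampler, uv).a;
    if (alpha < 0.5)
        discard; // the fragment is not combined with the framebuffer
    return float4(1.0, 1.0, 1.0, alpha);
}
```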

GIVEN context c
GIVEN capability `fragment` is available in c
THEN `discard;` checks in c

The intent of the above rule is that checking a discard statement will add the fragment capability to the context, so that it can be checked against the capabilities of the surrounding function/context. However, the way the other statement rules handle the context, they do not allow for anything in the context to flow upward/outward in that fashion. It may be simplest to have the process of collecting the capabilities required by a function body be a different judgement than the type checking rules.

17. Expressions

Expressions are terms that can be evaluated to produce values. This section provides a list of the kinds of expressions that may be used in a Slang program.

We need a place to define a term and note how it can be either an expression or a type expression.

A specialize expression might occur in either the expression or type expression grammar.

In general, the order of evaluation of a Slang expression proceeds from left to right. Where specific expressions do not follow this order of evaluation, it will be noted.

Some expressions can yield l-values, which allows them to be used on the left-hand-side of assignment, or as arguments for out or in out parameters.

17.1. Literal Expressions

Literal expressions are never l-values.

17.1.1. Integer Literal Expressions

An integer literal expression consists of a single IntegerLiteral token.

\DerivationRule{
  \begin{trgather}
      \CheckConforms{\ContextVarA,\alpha}{\alpha}{\code{IFromIntegerLiteral}}{\ContextVarB}
  \end{trgather}  
}{
  \SynthExpr{\ContextVarA}{i}{\alpha}{\ContextVarB}
}

To check an unsuffixed integer literal lit against type T:

Validate that T conforms to IFromIntegerLiteral, yielding conformance witness w
Let f be a declaration reference to IFromIntegerLiteral.init looked up through w
Return the checked expression f ( lit ) : T

We need a description of how suffixed integer literals have their type derived from their suffix.

17.1.2. Floating-Point Literal Expressions

A floating-point literal expression consists of a single FloatingPointLiteral token.

\DerivationRule{
  \begin{trgather}
      \CheckConforms{\ContextVarA,\alpha}{\alpha}{\code{IFromFloatingPointLiteral}}{\ContextVarB}
  \end{trgather}  
}{
  \SynthExpr{\ContextVarA}{f}{\alpha}{\ContextVarB}
}

Note: An unsuffixed floating-point literal synthesizes a type that is a fresh type variable $\alpha$, constrained to conform to \code{IFromFloatingPointLiteral}.

We need a description of how suffixed floating-point literals have their type derived from their suffix.

17.1.3. Boolean Literal Expressions

Note: Boolean literal expressions use the keywords true and false.

BooleanLiteralExpression
      `true` | `false`

The expression true resolves to the typed Boolean value true : Bool.

The expression false resolves to the typed Boolean value false : Bool.

17.1.4. String Literal Expressions

Note: A string literal expression consists of one or more string literal tokens in a row.

StringLiteralExpression
      StringLiteral+

\DerivationRule{
}{
  \SynthExpr{\ContextVarA}{\overline{s}}{\code{String}}{\ContextVarA}
}\\

17.2. Identifier Expressions

IdentifierExpression
      Identifier
      | \code{operator} Operator

\DerivationRule{
      \SynthLookup{\ContextVarA}{name}{type}
}{
  \SynthExpr{\ContextVarA}{name}{type}{\ContextVarA}
}\vspace{1em}

\DerivationRule{
      \CheckLookup{\ContextVarA}{name}{type}{\ContextVarB}
}{
  \CheckExpr{\ContextVarA}{name}{type}{\ContextVarB}
}\\

This presentation delegates the actual semantics of identifier expressions to the \textsc{Lookup} judgement, which needs to be explained in detail.

%\begin{verbatim}
%When evaluated, this expression looks up someName in the environment of the
%expression and yields the value of a declaration with a matching name.
%
%An identifier expression is an l-value if the declaration it refers to is mutable.
%
% ### Overloading ### {overload}
%
%It is possible for an identifier expression to be _overloaded_, such that it
%refers to one or more candidate declarations with the same name.
%If the expression appears in a context where the correct declaration to use can be
%disambiguated, then that declaration is used as the result of the name
%expression; otherwise use of an overloaded name is an error at the use site.
%
% ### Implicit Lookup ### {lookup.implicit}
%
%It is possible for a name expression to refer to nested declarations in two ways:
%
%* In the body of a method, a reference to someName may resolve to
%this.someName, using the implicit this parameter of the method
%
%* When a global-scope cbuffer or tbuffer declaration is used, someName may
%refer to a field declared inside the cbuffer or tbuffer
%\end{verbatim}

17.3. Member Expression

MemberExpression
      Expression `.` Identifier

The semantics of member lookup are similar in complexity to identifier lookup (and indeed the two share a lot of the same machinery). In addition to all the complications of ordinary name lookup (including overloading), member expressions also need to deal with:

Implicit dereference of pointer-like types.
Swizzles (vector or matrix).
Static vs. instance members.

In both synthesis and checking modes, the base expression should first synthesize a type, and then lookup of the member should be based on that type. \end{Incomplete}

%\begin{verbatim}
%When base is a structure type, this expression looks up the field or other
%member named by m.
%Just as for an identifier expression, the result of a member expression may be
%overloaded, and might be disambiguated based on how it is used.
%
%A member expression is an l-value if the base expression is an l-value and the
%member it refers to is mutable.
%
% ### Implicit Dereference ### {dereference.implicit}
%
%If the base expression of a member reference is a _pointer-like type_ such as
%ConstantBuffer<T>, then a member reference expression will implicitly
%dereference the base expression to refer to the pointed-to value (e.g., in the
%case of ConstantBuffer<T> this is the buffer contents of type T).
%
% ### Vector Swizzles ### {swizzle.vector}
%
%When the base expression of a member expression is of a vector type vector<T,N>
%then a member expression is a _vector swizzle expression_.
%The member name must conform to these constraints:
%
%* The member name must comprise between one and four ASCII characters
%* The characters must come either from the set (x, y, z, w) or (r,
%g, b, a), corresponding to element indices of (0, 1, 2, 3)
%* The element index corresponding to each character must be less than N
%
%If the member name of a swizzle consists of a single character, then the
%expression has type T and is equivalent to a subscript expression with the
%corresponding element index.
%
%If the member name of a swizzle consists of M characters, then the result is a
%vector<T,M> built from the elements of the base vector with the corresponding
%indices.
%
%A vector swizzle expression is an l-value if the base expression was an l-value
%and the list of indices corresponding to the characters of the member name
%contains no duplicates.
%
% ### Matrix Swizzles ### {swizzle.matrix}
%
%> Note: The Slang implementation currently doesn’t support matrix swizzles.
%
% ### Static Member Expressions ### {member.static}
%
%When the base expression of a member expression is a type instead of a value, the
%result is a _static member expression_.
%A static member expression can refer to a static field or static method of a
%structure type.
%A static member expression can also refer to a case of an enumeration type.
%
%A static member expression (but not a member expression in general) may use the
%token :: instead of . to separate the base and member name:
%
%hlsl
%// These are equivalent
%Color.Red
%Color::Red
%
%\end{verbatim}

17.4. This Expression

ThisExpression
      \code{this}

\DerivationRule{
}{
  \SynthExpr{\ContextVarA}{\code{this}}{\code{This}}{\ContextVarB}
}

Note: In contexts where a this expression is valid, it refers to the implicit instance of the closest enclosing type declaration. The type of a this expression is always \code{This}.

This section needs to deal with the rules for when this is mutable vs. immutable.

17.5. Parenthesized Expression

Note: An expression wrapped in parentheses (()) is a parenthesized expression and evaluates to the same value as the wrapped expression.

ParenthesizedExpression := 
      `(` Expression `)`

If expression e resolves to er then the parenthesized expression ( e ) resolves to er.

17.6. Call Expression

CallExpression
      Expression `(` (Argument `,`)* `)`

Argument
      Expression
      | NamedArgument

NamedArgument
      Identifier `:` Expression

\DerivationRule{
      \CheckCall{\ContextVarA}{f}{\overline{args}}{type}{\ContextVarB}
}{
  \CheckExpr{\ContextVarA}{f `(` \overline{args} `)`}{type}{\ContextVarB}
}\vspace{1em}

\DerivationRule{
      \SynthCall{\ContextVarA}{f}{\overline{args}}{type}{\ContextVarB}
}{
  \SynthExpr{\ContextVarA}{f `(` \overline{args} `)`}{type}{\ContextVarB}
}

These rules just kick the can down the road and say that synthesis/checking for call expressions bottlenecks through the \textsc{Call} judgements.

%\begin{verbatim}
%A _call expression_ consists of a base expression and a list of argument
%expressions, separated by commas and enclosed in ():
%
%hlsl
%myFunction( 1.0f, 20 )
%
%
%When the base expression (e.g., myFunction) is overloaded, a call expression can
%disambiguate the overloaded expression based on the number and type of arguments
%present.
%
%The base expression of a call may be a member reference expression:
%
%hlsl
%myObject.myFunc( 1.0f )
%
%
%In this case the base expression of the member reference (e.g., myObject in this
%case) is used as the argument for the implicit this parameter of the callee.
%
% ### Mutability ### {#expr.call.mutable}
%
%If a [mutating] instance is being called, the argument for the implicit this
%parameter must be an l-value.
%
%The argument expressions corresponding to any out or in out parameters of the
%callee must be l-values.
%\end{verbatim}

17.7. Subscript Expression

SubscriptExpression
  Expression \code{[} (Argument `,`)* \code{]}

To a first approximation, a subscript expression like \code{base[a0, a1]} is equivalent to something like \code{base.subscript(a0, a1)}. That is, we look up the subscript members of the \code{base} expression, and then check a call to the result of lookup (which might be overloaded).

Unlike simple function calls, a subscript expression can result in an l-value, based on what accessors the subscript declaration that is selected by overload resolution has.

%A subscript expression invokes one of the subscript declarations in the type of
%the base expression. Which subscript declaration is invoked is resolved based on
%the number and types of the arguments.
%
%A subscript expression is an l-value if the base expression is an l-value and if
%the subscript declaration it refers to has a setter or by-reference accessor.
%
%Subscripts may be formed on the built-in vector, matrix, and array types.

17.8. Initializer List Expression

InitializerListExpression
  `{` (Argument `,`)* `}`

If the sequence of arguments args resolves to resolvedArgs, then the initializer list expression { args } resolves to the resolved initializer list expression { resolvedArgs }.

Note: An initializer-list expression can only appear in contexts where it will be coerced to an expected type.

17.9. Cast Expression

CastExpression
  `(` TypeExpression `)` Expression

Note: A cast expression attempts to coerce an expression to a desired type.

To resolve a cast expression ( t ) e:

Let checkType be the result of checking t as a type
Let checkedExpr be the result of checking e against checkType
Return checkedExpr

The above rule treats a cast expression as something closer to a type ascription expression, where it expects the underlying expression to be of the desired type, or something implicitly convertible to it.

In contrast, we want a cast expression to be able to invoke _explicit_ conversions as well, which are currently not something the formalism encodes.

17.9.1. Legacy: Compatibility Feature

As a compatibility feature for older code, Slang supports using a cast where the base expression is an integer literal zero and the target type is a user-defined structure type:

MyStruct s = (MyStruct) 0;

The semantics of such a cast are equivalent to initialization from an empty initializer list:

MyStruct s = {};

17.10. Assignment Expression

AssignmentExpression
      \SynVar[destination]{Expression} `=` \SynVar[source]{Expression}

\DerivationRule{
      \begin{trgather}
      \SynthExpr{\ContextVarA}{dst}{`out` type}{\ContextVarB} \\
      \CheckExpr{\ContextVarB}{src}{type}{\ContextVarB}
      \end{trgather}
}{
  \SynthExpr{\ContextVarA}{dst `=` src}{`out` type}{\ContextVarB}
}

\DerivationRule{
      \begin{trgather}
      \CheckExpr{\ContextVarA}{dst}{`out` type}{\ContextVarB} \\
      \CheckExpr{\ContextVarB}{src}{type}{\ContextVarB}
      \end{trgather}
}{
  \CheckExpr{\ContextVarA}{dst `=` src}{type}{\ContextVarB}
}

Note: Assignment expressions support both synthesis and checking judgements.

In each case, the destination expression is validated first, and then the source expression.

The above rules pretend that we can write `out` before a type to indicate that we mean an l-value of that type. We will need to expand the formalism to include _qualified_ types.

17.11. Operator Expressions

17.11.1. Prefix Operator Expressions

This section defines the terms: prefix operator.

PrefixOperatorExpression
      PrefixOperator Expression

  PrefixOperator
      `+` // identity
      | `-` // arithmetic negation
      | `~` // bitwise Boolean negation
      | `!` // Boolean negation
      | `++` // increment in place
      | `--` // decrement in place

\DerivationRule{
      \begin{trgather}
      \SynthCall{\ContextVarA}{\code{operator}op}{expr}{type}{\ContextVarB}
      \end{trgather}
}{
  \SynthExpr{\ContextVarA}{op\ expr}{type}{\ContextVarB}
}

\DerivationRule{
      \begin{trgather}
          \CheckCall{\ContextVarA}{\code{operator}op}{expr}{type}{\ContextVarB}
      \end{trgather}
}{
  \CheckExpr{\ContextVarA}{op\ expr}{type}{\ContextVarB}
}

Note: A prefix operator expression is semantically equivalent to a call expression to a function matching the operator, except that lookup for the function name only considers function declarations marked with the `prefix` modifier.

The notation here needs a way to express the restrictions on lookup that are used for prefix/postfix operator names.

17.12. Postfix Operator Expressions

PostfixOperatorExpression
      Expression PostfixOperator

  PostfixOperator
      `++` // increment in place
      | `--` // decrement in place

\DerivationRule{
      \begin{trgather}
      \SynthCall{\ContextVarA}{\code{operator}op}{expr}{type}{\ContextVarB}
      \end{trgather}
}{
  \SynthExpr{\ContextVarA}{expr op}{type}{\ContextVarB}
}

\DerivationRule{
      \begin{trgather}
          \CheckCall{\ContextVarA}{\code{operator}op}{expr}{type}{\ContextVarB}
      \end{trgather}
}{
  \CheckExpr{\ContextVarA}{expr op}{type}{\ContextVarB}
}

Note: Postfix operator expressions have similar rules to prefix operator expressions, except that in this case the lookup of the operator name will only consider declarations marked with the `postfix` modifier.

17.12.1. Infix Operator Expressions

The syntax here should introduce the term: infix expression.

  InfixOperatorExpression
        Expression InfixOperator Expression

    InfixOperator
        | `*` // multiplication
        | `/` // division
        | `%` // remainder of division
        | `+` // addition
        | `-` // subtraction
        | `<<` // left shift
        | `>>` // right shift
        | `<` // less than
        | `>` // greater than
        | `<=` // less than or equal to
        | `>=` // greater than or equal to
        | `==` // equal to
        | `!=` // not equal to
        | `&` // bitwise and
        | `^` // bitwise exclusive or
        | `|` // bitwise or
        | `&&` // logical and
        | `||` // logical or
        | `+=` // compound add/assign
        | `-=` // compound subtract/assign
        | `*=` // compound multiply/assign
        | `/=` // compound divide/assign
        | `%=` // compound remainder/assign
        | `<<=` // compound left shift/assign
        | `>>=` // compound right shift/assign
        | `&=` // compound bitwise and/assign
        | `|=` // compound bitwise or/assign
        | `^=` // compound bitwise xor/assign
%        | `=`      // assignment
%        | `,`    // sequence

%TODO: need to get the precedence groups from this table into the grammar rules.
%
%| Operator | Kind | Description |
%|-----------|-------------|-------------|
%| * | Multiplicative | multiplication |
%| / | Multiplicative | division |
%| % | Multiplicative | remainder of division |
%| + | Additive | addition |
%| - | Additive | subtraction |
%| << | Shift | left shift |
%| >> | Shift | right shift |
%| < | Relational | less than |
%| > | Relational | greater than |
%| <= | Relational | less than or equal to |
%| >= | Relational | greater than or equal to |
%| == | Equality | equal to |
%| != | Equality | not equal to |
%| & | BitAnd | bitwise and |
%| ^ | BitXor | bitwise exclusive or |
%| \| | BitOr | bitwise or |
%| && | And | logical and |
%| \|\| | Or | logical or |
%| += | Assignment | compound add/assign |
%| -= | Assignment | compound subtract/assign |
%| *= | Assignment | compound multiply/assign |
%| /= | Assignment | compound divide/assign |
%| %= | Assignment | compound remainder/assign |
%| <<= | Assignment | compound left shift/assign |
%| >>= | Assignment | compound right shift/assign |
%| &= | Assignment | compound bitwise and/assign |
%| \|= | Assignment | compound bitwise or/assign |
%| ^= | Assignment | compound bitwise xor/assign |
%| = | Assignment | assignment |
%| , | Sequencing | sequence |

\DerivationRule{
      \begin{trgather}
      \SynthCall{\ContextVarA}{\code{operator}op}{left right}{type}{\ContextVarB}
      \end{trgather}
}{
  \SynthExpr{\ContextVarA}{left op right}{type}{\ContextVarB}
}

\DerivationRule{
      \begin{trgather}
          \CheckCall{\ContextVarA}{\code{operator}op}{left right}{type}{\ContextVarB}
      \end{trgather}
}{
  \CheckExpr{\ContextVarA}{left op right}{type}{\ContextVarB}
}

With the exception of the assignment operator (=), an infix operator expression like left + right is equivalent to a call expression to a function of the matching name operator+(left, right).
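As an informal illustration of how operator expressions reduce to calls, the following Python sketch maps an operator and its fixity to a call on the matching `operator` name, restricting candidates by fixity in the way the notes on prefix and postfix operators describe. The `OperatorDecl` record, the `DECLS` table, and the string tags used for fixity are all hypothetical; they are not part of the specification or of the Slang compiler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorDecl:
    """Hypothetical record for one operator function declaration."""
    name: str     # e.g. "operator++"
    fixity: str   # "prefix", "postfix", or "infix" (stand-in for the modifier)
    arity: int


# A tiny, made-up set of declarations that lookup could find.
DECLS = [
    OperatorDecl("operator++", "prefix", 1),
    OperatorDecl("operator++", "postfix", 1),
    OperatorDecl("operator-", "prefix", 1),
    OperatorDecl("operator-", "infix", 2),
    OperatorDecl("operator+", "infix", 2),
]


def operator_candidates(op, fixity):
    """Look up `operator<op>`, keeping only declarations with the requested fixity."""
    if fixity == "infix" and op == "=":
        raise ValueError("assignment has its own rules and is not an operator call")
    name = "operator" + op
    return [d for d in DECLS if d.name == name and d.fixity == fixity]


def desugar(op, fixity, operands):
    """Rewrite an operator expression as a call on the (possibly overloaded) candidates."""
    candidates = operator_candidates(op, fixity)
    if not candidates:
        raise LookupError("no %s declaration named operator%s" % (fixity, op))
    return ("call", candidates, list(operands))
```

For example, `desugar("-", "prefix", ["x"])` only sees the prefix declaration of `operator-`, even though an infix declaration with the same name exists.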

17.12.2. Conditional Expression

ConditionalExpression
      \SynVar[condition]{Expression} `?` \SynVar[then]{Expression} `:` \SynVar[else]{Expression}

To check the conditional expression cond ? t : e against expected type T:

Let checkedCond be the result of checking cond against Bool
Let checkedThen be the result of checking t against T
Let checkedElse be the result of checking e against T
Return the checked expression checkedCond ? checkedThen : checkedElse

To check the conditional expression ce:

Extend the context with a fresh type variable T
Return the result of checking ce against T
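The two procedures above can be sketched as a toy bidirectional checker in Python. This is a simplified model, not the formalism of this specification: expressions are tuples, types are plain strings, there are no coercions, and the fresh type variable is assumed to be solved by the first literal checked against it. The names `TypeVar`, `check`, and `synth_conditional` are hypothetical.

```python
class TypeVar:
    """A fresh type variable; solved the first time a literal is checked against it."""

    def __init__(self):
        self.solution = None


def resolve(ty):
    """Return the solution of a type variable, or the type itself."""
    return ty.solution if isinstance(ty, TypeVar) else ty


def check(expr, expected):
    """Check `expr` against `expected` (a type name or a TypeVar)."""
    kind = expr[0]
    if kind == "lit":                       # ("lit", type_name, value)
        _, ty, _ = expr
        if isinstance(expected, TypeVar) and expected.solution is None:
            expected.solution = ty          # first use solves the variable
        if resolve(expected) != ty:
            raise TypeError(f"expected {resolve(expected)}, got {ty}")
        return expr
    if kind == "cond":                      # ("cond", cond, then, else)
        _, cond, then, other = expr
        checked_cond = check(cond, "Bool")
        checked_then = check(then, expected)
        checked_else = check(other, expected)
        return ("cond", checked_cond, checked_then, checked_else)
    raise ValueError(f"unknown expression kind: {kind}")


def synth_conditional(expr):
    """Check a conditional with no expected type by introducing a fresh type variable."""
    t = TypeVar()
    checked = check(expr, t)
    return checked, resolve(t)


good = ("cond", ("lit", "Bool", True), ("lit", "Int", 1), ("lit", "Int", 2))
checked, ty = synth_conditional(good)
```

Here both branches are checked against the same variable, so a conditional whose branches have different types is rejected in this model rather than unified through a coercion.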

18. Attributes

This chapter needs to list the built-in attributes that impact the language semantics, and also to specify the rules for how user-defined attributes are defined and used.

19. Visibility Control

This chapter needs to explain the rules for:

What declarations in a module are visible to modules that import it?
What are the modifiers that can be used to control visibility (public, internal, private), and what are their semantics?
What is the inferred visibility of a declaration without an explicit modifier?
What are the rules about what modifiers are legal in what contexts?
E.g., a public field should not be allowed in an internal type.
E.g., a public function should not be allowed to have a non-public type used in its signature.
How is visibility controlled for interface conformances?

20. Type Linearization

This chapter needs to explain how, given a type, an implementation should derive the _linearization_ of that type into an ordered list of _facets_. Each facet corresponds to some DeclBody, and thus has a set of declarations in it that may bind certain names.

The linearization of a type should include:

A facet for the type itself, which will include the declarations explicitly written inside the body of that type’s DeclBody.
One facet for each transitive base of the type, including both concrete types it inherits from and interfaces it conforms to.
One facet for each extension that is both visible in the context where linearization is being performed _and_ applicable to the type.

Note that base facets are included even for bases that are introduced via an extension. Similarly, extension facets are included for extensions that might apply to the type through one of its bases.

The set of facets for a type will always be a superset of those for each of its bases.

The order of facets is significant, because it affects the relative priority of the bindings in those facets for lookup. In the simplest terms, facets earlier in the list will be prioritized over those later in the list. Thus, derived types should always appear before their bases. In general, facets for extensions should appear after the facet for the type declaration that the extension applies to.

(It is possible that a simple _sequence_ of facets will not actually be sufficient, and a DAG representation might be needed, if forcing a total ordering on facets would risk unintuitive lookup results.)

There is good precedent in the PL world for this kind of linearization, which is sometimes referred to as the _method resolution order_ (MRO). A particular solution, known as the _C3 linearization algorithm_, is often recommended as the one to use, and it is what the Slang compiler currently implements.
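For readers unfamiliar with C3, the following is a minimal Python sketch of the textbook algorithm applied to a hierarchy of named types. It is not the Slang compiler's implementation: it only handles direct bases (extension facets are omitted), and the `bases` dictionary and the type names are made up for illustration.

```python
def c3_merge(sequences):
    """Merge linearizations, always taking a head that appears in no other tail."""
    result = []
    sequences = [list(s) for s in sequences if s]
    while sequences:
        for seq in sequences:
            head = seq[0]
            if not any(head in other[1:] for other in sequences):
                break
        else:
            # Every head is blocked by some tail: no consistent order exists.
            raise TypeError("inconsistent hierarchy: no valid linearization")
        result.append(head)
        # Remove the chosen head everywhere and drop exhausted sequences.
        sequences = [[x for x in s if x != head] for s in sequences]
        sequences = [s for s in sequences if s]
    return result


def linearize(name, bases):
    """Return the C3 linearization of `name`, most-derived first."""
    direct = bases.get(name, [])
    parent_orders = [linearize(b, bases) for b in direct]
    return [name] + c3_merge(parent_orders + [list(direct)])


# Hypothetical diamond: D derives from B and C, which both derive from A.
bases = {"D": ["B", "C"], "B": ["A"], "C": ["A"], "A": []}
order = linearize("D", bases)  # ["D", "B", "C", "A"]
```

The result places every derived type before its bases and preserves the declared order of direct bases, which matches the prioritization described above for lookup.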

21. Interfaces

This chapter needs to document the rules for what it means for a type to _conform_ to an interface, and how the compiler should check for conformance.

The biggest challenge in specifying the rules for conformance is that the Slang compiler does not require an _exact_ match between the requirement in an interface and the concrete declaration that satisfies it:

A field can satisfy a property requirement.
A non-`throws` function can satisfy a `throws` requirement.
A non-[mutating] function can satisfy a [mutating] requirement.
A function with default arguments can satisfy a requirement without those arguments.
A generic function may satisfy a non-generic requirement.
A generic declaration with fewer constraints may satisfy a generic requirement with more constraints.

In the limit, we can say that a type satisfies a requirement if a _synthesized_ declaration that exactly matches the requirement signature would type-check successfully. The body of a synthesized get accessor for a property `p` would consist of something like `return this.p;`. Similarly, the body of a synthesized func for a function `f` would consist of something like `return this.f(a0, a1, ...);` where the values `a0`, etc., are the declared parameters of the synthesized function.

The main place where the above approach runs into wrinkles is around synthesis for generic requirements, where subtle semantic choices need to be made.

Ideally, checking of conformances is something that can be done after checking of all explicit function bodies (statements/expressions) in a module. Other checking rules can _assume_ that a type conforms to an interface if it is declared to, knowing that the error will be caught later if it doesn’t.

The important case where conformances can’t be checked late is if we support inference (not just synthesis) of declarations to satisfy requirements. Basically, if we want to support inference of associated types, then the only way to infer such a type is by looking for the declarations that satisfy other requirements.

(Conveniently, checking of requirements also doesn’t require looking at function bodies: only signatures.)

This chapter is probably also the right place to define the rules for canonicalization of interface conjunctions.

22. Generics

This chapter needs to define the key semantic-checking operations that relate to generics, including:

Inference of _implied constraints_ on a generic declaration, based on the signature of that declaration.
Validation and canonicalization of constraints on generic declarations.
Validation of explicit specialization of a generic (e.g., G<A,B>), which includes validating that the constraints on the generic are satisfied by the parameters.
Inference of generic arguments in contexts where a generic function is called on value arguments (without first specializing it).

23. Lookup

This chapter needs to document the rules for how names are looked up in two cases:

When looking up a standalone Identifier as an expression or type expression, in which case lookup proceeds in the current context.
When looking up a name in some type, whether explicitly as part of a MemberExpression, or implicitly as part of lookup through this inside a type body.

The first kind of lookup is easily expressed as a recursion over the structure of contexts, but will often bottleneck through lookup in the context of some type (since code is so often nested under a type). This kind of lookup also needs to deal with import declarations and the visibility rules for source units in the same module as one another.

The second kind of lookup requires access to a linearization of the type. It potentially needs to deal with performing substitutions on the type/signature of a declaration that is found, so that it is adjusted for the `This` type it has been looked up through.

Both kinds of lookup need to deal with the possibility that a name will be _overloaded_. At this level, lookup should probably return the full set of all _candidates_ when overloading occurs, but retain enough structure in the lookup results that overload resolution can evaluate which should be prioritized over the others (e.g., by preferring members from a more-derived type over those from a base type).

The lookup rules here should be able to handle lookup of init and subscript declarations via a synthesized name that cannot conflict with other identifiers (in case, e.g., a user has a struct with an ordinary field named `subscript`).

Lookup will also need to interact with visibility, by stripping out non-public bindings from imported modules.

Note that this chapter does not need to deal with extension declarations, as they are handled as part of type linearization.

One important kind of subtlety to lookup is that upon finding a _first_ in-scope declaration of a name, we can inspect the category of that declaration to determine if overloading is even possible. That is, if the first declaration found is in a category of declaration that doesn’t support overloading (basically anything other than a func, init, or subscript), then that result can be used as-is as the result of lookup.
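The early-exit behavior described above can be sketched in Python as a walk over an ordered list of facets. Everything here is illustrative: the facet representation, the category strings, and `lookup_in_type` are hypothetical, and the sketch assumes that once an overloadable declaration has been found, later declarations in non-overloadable categories are simply ignored (the text above does not settle that case).

```python
OVERLOADABLE = {"func", "init", "subscript"}


def lookup_in_type(facets, name):
    """Look up `name` through `facets`, an ordered list of (facet_name, bindings).

    `bindings` maps a name to a list of (category, declaration) pairs.
    Each result records the facet index so that overload resolution can
    later prefer declarations from earlier (more-derived) facets.
    """
    results = []
    for depth, (_facet_name, bindings) in enumerate(facets):
        for category, decl in bindings.get(name, []):
            if category not in OVERLOADABLE:
                if not results:
                    # First declaration found cannot be overloaded: use it as-is.
                    return [(depth, category, decl)]
                continue  # assumption: ignore it once overload candidates exist
            results.append((depth, category, decl))
    return results


# Hypothetical linearization of a type `Derived` with a single base `Base`.
facets = [
    ("Derived", {"f": [("func", "Derived.f")], "x": [("var", "Derived.x")]}),
    ("Base", {"f": [("func", "Base.f")], "x": [("var", "Base.x")]}),
]
```

With this data, looking up `x` stops at `Derived.x`, while looking up `f` returns both functions together with their facet indices.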

24. Overload Resolution

This chapter needs to describe how to disambiguate an overloaded lookup result. Overload resolution comes up primarily in two places:

When the typing rules attempt to coerce an overloaded expression to some type.
When attempting to call an overloaded expression as a function.

(Hypothetically it can also arise for a generic specialization F<A,B,...> as well, so the rules need to be ready for that case.)

The first of the two cases is the easier one by far, with an outline of its semantics being:

_Filter_ the candidates to only those that can be coerced successfully to the desired type.
Select the best candidate among the remaining ones, based on priority.
The relative "distance" of declarations needs to factor in (more derived vs. more base, imported vs. local, etc.)
Additionally, the relative cost of the type conversions applied (if any) may be relevant.

The second case (overloaded calls) is superficially similar, in that we can filter the candidates to the applicable ones and then pick the best. The definition of "best" for an overloaded call is more subtle and needs care to not mess up developer expectations for things like simple infix expressions (since even infix `operator+` is handled as an overloaded call, semantically). A few basic principles that guide intuition:

For every candidate we should know the conversion/cost associated with each argument position, as well as the conversion/cost associated with the function result position (if we are in a checking rather than synthesis context).
One overload candidate is strictly better than another if it is better at at least one position (one argument, or the function result), and not worse at any position.
When choosing between candidates that are equally good, a few factors play in:
  The relative "distance" of declarations (as determined by lookup).
  Which candidates required defaulting of arguments (and how many).
  Candidates that required implicit specialization of a generic vs. those that didn’t.
  For two candidates that required implicit specialization of a generic, which of them is "more general" than the other.

Generally, a function $f$ is "more general" than another function $g$ if for every set of arguments that $g$ is applicable to, $f$ is also applicable, but not vice versa.
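The "strictly better" principle for overloaded calls amounts to a position-wise comparison of costs. The Python sketch below assumes each applicable candidate has already been reduced to a tuple of numeric costs (lower is better), one per argument position and optionally one for the result position. The cost values and candidate names are invented for illustration, and the tie-breaking factors (declaration distance, defaulted arguments, generic specialization) are deliberately left out.

```python
def strictly_better(a, b):
    """True if cost vector `a` is no worse than `b` everywhere and better somewhere."""
    if len(a) != len(b):
        raise ValueError("cost vectors must cover the same positions")
    no_worse = all(x <= y for x, y in zip(a, b))
    better_somewhere = any(x < y for x, y in zip(a, b))
    return no_worse and better_somewhere


def best_candidates(costs):
    """Return the names of candidates that no other candidate strictly beats."""
    return [
        name
        for name, cost in costs.items()
        if not any(
            strictly_better(other, cost)
            for other_name, other in costs.items()
            if other_name != name
        )
    ]


# Hypothetical costs for a call f(1, 2) with three applicable overloads.
clear = {"f(int,int)": (0, 0), "f(int,float)": (0, 1), "f(float,float)": (1, 1)}
# Two overloads that each win at one position: neither is strictly better.
ambiguous = {"g(int,float)": (0, 1), "g(float,int)": (1, 0)}
```

When `best_candidates` returns more than one name, as for `ambiguous`, the remaining factors listed above would have to break the tie, or the call would be diagnosed as ambiguous.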

Also: it doesn’t exactly belong in this chapter, but something needs to document the rules for when two function declarations are considered to have the same signature (or at least conflicting signatures), so that errors can be diagnosed when trying to redefine a function.

25. Subtyping

This chapter needs to define what the subtyping rules are, as something distinct from the conversion/coercion rules (which need their own chapter).

At the most basic, we want to have a guarantee that if `T` is a subtype of `S`, then it is safe and meaningful to read an `S` from a memory location that holds a `T`. Beyond that, we need to decide whether a subtype (in the formal semantics) must have the same size and stride as the supertype, or if it is allowed to be larger.

The key reason to have the subtyping rules kept distinct from the conversion/coercion rules is that many built-in types (as well as the conceptual/intermediate types that are needed for semantic checking) are covariant type constructors, where the covariance must be defined in terms of true subtyping and not just convertibility. E.g., a `borrow T` can be converted to a `borrow S` only if `T` is truly a subtype of `S`, and not just convertible to it.

Note that the "is-a-subtype-of" relationship between two types is distinct from the "conforms-to" relationship between a proper type and an interface type. The "conforms-to" relationship might need to be documented in this chapter as well, just because it shares a lot of structurally similar machinery.

(In fact, it may be best to have a judgement that is parameterized over the type relationship that is being checked: conformance, subtyping, implicit coercion, explicit conversion.)

26. Type Conversions

Broadly this chapter needs to do two things:

Define the semantic rules for implicit type coercion and explicit type conversion.
Specify the implicit coercions that are defined for the basic Slang types, and their costs.

The latter part (the builtin implicit coercions and their costs) can be a simple table, once the rules are properly defined.

From a big-picture perspective, the implicit type coercion rules when coercing something of type From to type To are something like:

If From and To are identical types, then coercion succeeds, with no cost.
If From is a subtype of To, then coercion succeeds, with a cost that relates to how "far" apart the types are in an inheritance hierarchy.
If neither of the above applies, look up init declarations on type To, filtered to those that are marked as suitable for implicit coercion, and then perform overload resolution to pick the best overload. The cost is the cost that was marked on the chosen init declaration.

The overload resolution rules rely on checking implicit type coercions, and the implicit type coercion rules can end up checking an overloaded init call. The specification must explain how to avoid the apparent possibility of infinite recursion. The simplest way to avoid such problems is to specify that the overload resolution for the init call above uses a modified form where only subtyping, and not implicit coercion, is allowed for arguments.

Explicit type conversion follows the same overall flow as the implicit case, except it does not filter the candidates down to only those marked as usable for implicit coercion.
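The overall flow for implicit coercion and explicit conversion can be sketched as follows. This is a rough Python model under several assumptions that the text above leaves open: subtyping is a single-inheritance chain stored in `PARENT`, the cost of a subtype coercion is taken to be the number of steps in that chain, and the best init is the one whose parameter is closest to the argument type by subtyping alone (which also avoids the recursion discussed above). All type names, costs, and the `InitDecl` record are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical inheritance chain: Derived -> Base.
PARENT = {"Derived": "Base"}


@dataclass(frozen=True)
class InitDecl:
    """Hypothetical single-parameter init declaration on type `target`."""
    target: str
    param: str
    cost: int
    implicit: bool   # marked as usable for implicit coercion?


INITS = [
    InitDecl("float", "int", 10, True),    # int -> float may happen implicitly
    InitDecl("int", "float", 50, False),   # float -> int requires an explicit conversion
]


def subtype_distance(frm, to):
    """Steps from `frm` up to `to` (0 if identical), or None if not a subtype."""
    distance = 0
    current = frm
    while current is not None:
        if current == to:
            return distance
        current = PARENT.get(current)
        distance += 1
    return None


def conversion_cost(frm, to, explicit=False):
    """Return the cost of converting `frm` to `to`, or None if no conversion applies."""
    distance = subtype_distance(frm, to)
    if distance is not None:
        return distance              # identical types cost 0, subtypes cost their distance
    applicable = []
    for init in INITS:
        if init.target != to:
            continue
        if not explicit and not init.implicit:
            continue                 # implicit coercion only considers marked inits
        arg_distance = subtype_distance(frm, init.param)  # subtyping only, no coercion
        if arg_distance is not None:
            applicable.append((arg_distance, init))
    if not applicable:
        return None
    best = min(applicable, key=lambda pair: pair[0])[1]
    return best.cost
```

In this model `conversion_cost("float", "int")` finds nothing, while passing `explicit=True` admits the unmarked init and yields its cost.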

27. Automatic Differentiation

This chapter needs to specify:

The built-in types and attributes related to automatic differentiation.
The rules applied when checking the body of a [Differentiable] function.
The semantics of the fwd_diff and bwd_diff operators, including how to determine the signature of the function they return.
The requirements of the IDifferentiable interface, and also the rules used to automatically synthesize conformance for it.

Slang Language Specification

Living Standard, 3 March 2025

Abstract

1. Introduction

1.1. Scope

1.2. Conformance

1.3. The Slang Standard Library

1.4. Document Conventions

1.4.1. Terminology

1.4.2. Typographical Conventions

1.4.3. Meta-Values and Meta-Types

1.4.4. Meta-Variables

1.4.5. Patterns

1.4.6. Characteristics of Meta-Values

1.4.7. Callouts

1.4.8. Traditional and Legacy Features

1.5. Context-Free Grammars

1.5.1. Notation

1.5.1.1. Productions

1.5.1.2. Terminals

1.5.1.3. Nonterminals

1.5.1.4. Terminals

1.5.1.5. Optionals

1.5.1.6. Sequences

1.5.1.7. Grouping

1.5.1.8. Exclusion

1.5.1.9. Characteristics

2. Lexical Structure

2.1. Source Units

2.2. Encoding

2.3. Phases

2.4. Lexemes

2.4.1. Trivia

2.4.1.1. Whitespace

2.4.1.1.1. Line Breaks

2.4.1.2. Comments

2.4.2. Tokens

2.4.2.1. Identifiers

2.4.2.2. Literals

2.4.2.2.1. Numeric Literals

2.4.2.3. Integer Literals

2.4.2.4. Floating-Point Literals

2.4.3. Text Literals

2.4.3.1. String Literals

2.4.3.2. Character Literals

2.5. Operators and Punctuation

2.6. Associating Trivia With Tokens

3. Preprocessor

3.1. Changes

3.2. Extensions

4. Parsing

4.1. Contexts

4.2. Strategies

4.2.1. Ordered

4.2.2. Unordered

4.3. Angle Brackets

4.3.1. Opening Angle Brackets

4.3.1.1. Ordered Mode

4.3.1.2. Unordered Mode

4.3.2. Closing Angle Brackets

5. Types and Values

5.1. Level

5.2. Types of Types

5.3. Scalars

5.3.1. Unit

5.3.2. Booleans

5.3.3. Numeric Scalars

5.3.3.1. Integers

5.3.3.2. Floating-Point Numbers

5.3.4. Text

5.3.4.1. Unicode Scalar Values

5.3.4.2. Characters

5.3.4.3. Strings

5.4. Finite Sequences

5.5. Array Types

5.6. Vectors

5.7. Matrices

5.8. The Never Type

5.9. Declaration References

5.9.1. Direct Declaration References

5.8. The `Never` Type

5.11. `struct` Types

5.12. `class` Types

5.13. `enum` Types

5.17. `any` Types