# SSOQL - Super Simple Object Query Language

What if there was SQL for interacting with Javascript objects organized into a table? Introducing SSOQL, a simple custom made language for querying javascript objects that's inspired by SQL's syntax. Designed to have AI interact with a tabular dataset without having it directly generating code. 

What SSOQL is not:
- Not a high performance or optimized library meant for databases
- Not a database at all - there's no inserting data
- Not intended to replace full-featured query languages for complex applications
- Not meant to query objects not organized into a tabular area.

[![TypeScript](https://img.shields.io/badge/TypeScript-Ready-blue)](https://www.typescriptlang.org/)

## Implementation

SSOQL is implemented in TypeScript with a simple parser and execution engine that supports the full language specification described below. The implementation includes:

- A parser that processes USE, QUERY, and operation statements
- Support for variables with $ prefix
- SELECT operations with WHERE clause filtering
- Aggregate operations like SUM, COUNT, DIVIDE, PERCENT_OF, and MOST_FREQUENT
- Multiple query blocks with individual results

## Documentation

After importing the library, you can create queries by using `ssoql.createQuery(query)` which returns a `SSOQLQuery` object. This does not run the query yet. You can see the object paths the query needs via `ssoqlQuery.expectedObjects()`, which returns the paths of required objects. You can run the query and get the results by calling `ssoqlQuery.execute(data)` with your data object. Each keyword in the query language runs a specific operation on the current context, determined by USE and SELECT clauses.

```typescript
import ssoql from 'ssoql';

// Create a query
const query = ssoql.createQuery(`
  USE data.users
  
  QUERY activeUsers
  COUNT SELECT users WHERE (active = "true")
  RETURN
`);

// See what objects the query needs
const requiredObjects = query.expectedObjects();
// Returns: ['data.users']

// Execute the query with data
const results = query.execute({
  data: {
    users: [
      { id: 1, name: "Alice", active: "true" },
      { id: 2, name: "Bob", active: "false" }
    ]
  }
});
// Returns: { activeUsers: 1 }
```

## Examples for 12sAI
1. In Week 1 versus the Broncos, did the Seahawks give up more yards per play when Jarren Reed was on the field or when Dre'Mont Jones was on the field?
```sql
USE y2024.w1.D.[plays, yards, dPlayers]

QUERY reedYPP
SELECT yards WHERE (dPlayers CONTAINS 90)
$reedYards SUM
$reedPlays COUNT SELECT plays WHERE (dPlayers CONTAINS 90)
DIVIDE $reedYards $reedPLays
RETURN

QUERY montYPP
SELECT yards WHERE (dPlayers CONTAINS 55)
$montYards SUM
$montPlays COUNT SELECT plays WHERE (dPlayers CONTAINS 55)
DIVIDE $montYards $montPLays
RETURN


```
Expected Result: {reedYPP:3.35,montYPP: 4.56}

2. In Week 1 versus the Broncos, what percentage of the Broncos run plays 
from under center featured a condensed formation?

```sql
USE y2024.w1.D.[plays, runPass, underCenterOrGun, offensiveFormation]

QUERY runs
PERCENT_OF SELECT plays WHERE (runPass = "R" & offensiveFormation CONTAINS "Condensed"), SELECT plays WHERE (runPass = "R")

RETURN
```
Expected Result: {runs:75}

3. Did the Seahawks have a higher undercenter pass percentage against the Broncos or the Patriots?
```sql
USE y2024.[w1, w2].O.[plays, runPass, underCenterOrGun]

QUERY pass
PERCENT_OF SELECT plays WHERE (runPass="R" & underCenterOrGun="G"), SELECT plays WHERE (runPass="R")

RETURN
```
Expected Result: {pass:{w1:75,w2:88.8}}

4. In Week 1 versus the Broncos, what was the Broncos favorite coverage? What was the percentage of that coverage?
```sql
USE y2024.w1.O.[plays, coverage]

QUERY favCoverage
$mostOccuringCoverage MOST_FREQUENT SELECT coverage
RETURN //Returns the contents of the variable - the contents of that select are still the context

QUERY perCoverage
PERCENT_OF SELECT plays WHERE (coverage=$mostOccuringCoverage), SELECT plays // Variables are always global
RETURN
```
Expected Result: {favCoverage:"Cover 1", perCoverage: 46}

5. Against the Patriots, did the Seahwaks average more yards on Zone runs or gap runs?
```sql
USE y2024.w2.O.[plays, runType, yards]

QUERY zoneAvg
SELECT yards WHERE (runType = "Zone")
$zoneYards SUM
$zonePlays COUNT SELECT plays WHERE (runType = "Zone")
DIVIDE $zoneYards $zonePlays
RETURN

QUERY gapAvg
SELECT yards WHERE (runType = "Gap")
$gapYards SUM
$gapPlays COUNT SELECT plays WHERE (runType = "Gap")
DIVIDE $gapYards $gapPlays
RETURN
```
Expected Result: {zoneAvg:2.83, gapAvg:2.00}

6. How many yards did the Seahawks average on play action in Week 2?
```sql
USE y2024.w2.O.[plays, playType, yards]

QUERY paAvg
SELECT yards WHERE (playType = "Y")
$paYards SUM
$paPlays COUNT SELECT plays WHERE (playType = "PlayAction")
DIVIDE $paYards $paPlays
RETURN
```
Expected Result: {paAvg:2}

7. In what percentage of passing plays did the Broncos blitz the Seahawks?
```sql
USE y2024.w1.D.[plays, runPass, blitz]

QUERY blitzPercent
PERCENT_OF SELECT plays WHERE (runPass = "P" & blitz = "Y"), SELECT plays WHERE (runPass = "P")
RETURN
```
Expected Result: {blitzPercent: 50}

8. On Julian Love's interception against the Broncos, what coverage and front were the Seahawks in?
```sql
USE y2024.w1.O.[coverage, front, playOutcume, actuator]

QUERY int
SELECT [coverage, front] WHERE (actuator = 20 & playOutcome="Interception")
RETURN
```
Expected Result: {int:{coverage: "Quarters", front:"4-0"}}


## Language Design

### Core Structure
SSOQL follows a simple structure with named query blocks that contain operations on data. Queries can return strings, booleans, integers or arrays of those values.

### Assumed Data
Lots of data is assumed to reduce complexity. For example, if a query uses a key name for something far down in USE context with multiple contexts, it will return the query ran on all of the possible assumed fields. For example,
```
USE [y2024, y2025].[w1, w2].O.plays

QUERY numPlays
COUNT SELECT plays
RETURN
```
Expected Result: {y2024: {w1: 51, w2: 62}, y2025: {w1:42, w2:74}}

Since we use plays, we assume to run the query on all possible source objects and returns an object based on those possible source objects.

### Keywords and Operations

#### USE
A keyword to pull in data from the javascript object. Specifies which data sources and fields to use in the query. Use can be... used... anywhere outside of a QUERY statement.

Syntax:
```
USE x.x.[key1, key2, ...]
```

Examples:
```
USE y2024.w1.D.[plays, yards, dPlayers]
USE y2024.[w1, w2].O.[plays, runPass, underCenterOrGun]
```

When using data, it will pull in ALL the available data possible. For the second USE in the example, it will pullin both week 1 and week 1 plays, runPass, and underCenterOrGun

#### QUERY
Defines a named block of operations that process data. Multiple query blocks can exist in a single SSOQL string. 

Syntax:
```
QUERY queryName
  operations...
RETURN
```

#### SELECT
Retrieves data from the current context based on specified criteria.

Syntax:
```
SELECT [EACH] field(s) [WHERE (condition)]
```

Examples:
```
SELECT yards WHERE (dPlayers CONTAINS 90)
SELECT [coverage, front] WHERE (actuator = 20 & playOutcome="Interception")
```

#### WHERE
Filters data based on conditions. Used with SELECT to specify which data to retrieve.

Syntax for conditions:
- Equality: `field = value`
- Inequality: `field != value`
- Greater than: `field > value`
- Less than: `field < value`
- Greater than or equal to: `field >= value`
- Less than or equal to: `field <= value`
- Contains: `field CONTAINS value`
- Not contains: `field NOT_CONTAINS value`
- Logical AND: `condition1 & condition2`
- Logical OR: `condition1 | condition2`
- Logical NOT: `!(condition)`

#### EACH 
For selecting something inside an array of objects. 

Syntax:
```
SELECT EACH name
```

Example for the data products:[{name:"Phone"}, {name:"Computer"}]
```
USE products
SELECT EACH name
```
Expected output: ["Phone", "Computer"]

#### Variables
Variables store intermediate results for later use. Variable names are prefixed with $.

Syntax:
```
$variableName operation
```

Examples:
```
$reedYards SUM
$mostOccuringCoverage MOST_FREQUENT SELECT coverage
```

Variables are global across all query blocks in a statement.

#### Operations

##### Data Aggregation
- `SUM`: Calculates the sum of values
- `COUNT`: Counts the number of items
- `DIVIDE`: Divides one value by another
- `MULTIPLY`: Multiplies two values together
- `SUBTRACT`: Subtracts the second value from the first
- `AVERAGE`: Calculates the average (mean) of values
- `MEDIAN`: Finds the median value
- `MIN`: Finds the minimum value
- `MAX`: Finds the maximum value
- `PERCENT_OF`: Calculates what percentage the first set is of the second set
- `MOST_FREQUENT`: Finds the most frequently occurring value
- `LEAST_FREQUENT`: Finds the least frequently occurring value
- `UNIQUE`: Returns a list of unique values
- `STANDARD_DEVIATION`: Calculates the standard deviation of values
- `VARIANCE`: Calculates the variance of values
- `RANGE`: Calculates the difference between the maximum and minimum values

Examples:
```
$reedYards SUM
$reedPlays COUNT SELECT plays WHERE (dPlayers CONTAINS 90)
DIVIDE $reedYards $reedPlays
PERCENT_OF SELECT plays WHERE (runPass = "P" & blitz = "Y"), SELECT plays WHERE (runPass = "P")
```

#### RETURN
Marks the end of a query block and returns the current context or the value of the last operation.

### Comments
Comments can be added with double slashes. They provide documentation but don't affect query execution.

Example:
```
RETURN //Returns the contents of the variable - the contents of that select are still the context
```

### Data Types
- Strings: Represented with double quotes (e.g., `"R"`, `"PlayAction"`)
- Numbers: Represented without quotes (e.g., `90`, `55`)
- Booleans: Represented as `true` or `false` (case-sensitive)
- Arrays: Fields can contain arrays, which can be checked with the CONTAINS or NOT_CONTAINS operators
- Null: Represented as `null` (case-sensitive)

### Execution Model
Each query is executed in sequence. The query's result is determined by the last operation before the RETURN statement. When multiple query blocks exist, the results are combined into a single object with keys matching the query names.

## AI Use
AI helped me to generate tests and the starting structure for this project. It generated the starting code for src/ that I redesigned, refactored, and reviewed.