{"doc_id": "GSQLTutorial", "doc_type": "markdown", "content": "\n\n# Introduction \n\nThis GSQL tutorial introduces new users to TigerGraph\u2019s graph query language. GSQL supports schema design, data loading, and querying, offering Turing-completeness for Agentic AI applications such as real-time task dependency management and hybrid data retrieval for GraphRAG.\n\nGSQL V3 syntax aligns with the [2024 ISO GQL](https://www.iso.org/standard/76120.html) standard, incorporating ASCII art and OpenCypher pattern matching.\n\nTo follow this tutorial, install the TigerGraph Docker image (configured with 8 CPUs and 20 GB of RAM or at minimum 4 CPUs and 16 GB of RAM) or set up a Linux instance with Bash access. Download our free [Community Edition](https://dl.tigergraph.com/) to get started.\n\n---\n# Table of Contents\n\n- [Sample Graph](#sample-graph-for-tutorial)\n- [Set Up Environment](#set-up-environment)\n- [Set Up Schema (model)](#set-up-schema)\n- [Load Data](#load-data)\n- [1-Block Query Examples](#1-block-query-examples)\n - [SELECT BLOCK](#select-block)\n - [SELECT BLOCK WITH VARIABLES](#select-block-with-variables)\t\n- [Stored Procedure Query Examples](#stored-procedure-query-examples)\n - [Two Flavors of SELECT](#two-flavors-of-select)\n - [Node Pattern](#node-pattern)\n - [Edge Pattern](#edge-pattern)\n - [Path Pattern](#path-pattern)\n - [Pattern Summary](#pattern-summary)\n- [Advanced Topics](#advanced-topics)\n - [Accumulators](#accumulators)\n - [Accumulator Operators](#accumulator-operators)\n - [Global vs Vertex Attached Accumulator](#global-vs-vertex-attached-accumulator)\n - [ACCUM vs POST-ACCUM](#accum-vs-post-accum)\n - [Edge Attached Accumulator](#edge-attached-accumulator)\n - [Vertex Set Variables And Accumulators As Composition Tools](#vertex-set-variables-and-accumulators-as-composition-tools)\n - [Using Vertex Set Variables](#using-vertex-set-variables)\n - [Using Accumulators](#using-accumulators)\n - [Flow Control](#flow-control)\n - [IF Statement](#if-statement)\n - [WHILE Statement](#while-statement)\n - [FOREACH Statement](#foreach-statement)\n - [CONTINUE and BREAK Statement](#continue-and-break-statement)\n - [CASE WHEN Statement](#case-when-statement)\n - [DML](#dml)\n - [Update Attribute](#update-attribute)\n - [Insert Edge](#insert-edge)\n - [Delete Element](#delete-element)\n - [Vertex Set Operators](#vertex-set-operators)\n - [Union](#union)\n - [Intersect](#intersect)\n - [Minus](#minus)\n - [Vector Search](#vector-search)\n - [OpenCypher Query](#opencypher-query)\n - [Virtual Edge](#virtual-edge)\n - [REST API For GSQL](#rest-api-for-gsql)\n - [Query Tuning And Debug](#query-tuning-and-debug)\n - [Batch Processing to Avoid OOM](#batch-processing-to-avoid-oom)\n - [Debug Using PRINT Statement](#debug-using-print-statement)\n - [Debug Using LOG Statement](#debug-using-log-statement)\n - [Explore Catalog](#explore-catalog)\n - [Experimental Features](#experimental-features)\n - [Table](#table)\n - [Init Table Statement](#init-table-statement)\n - [Order Table Statement](#order-table-statement)\n - [Filter Table Statement](#filter-table-statement)\n - [Project Table Statement](#project-table-statement)\n - [Join Statement](#join-statement)\n - [Union Statement](#union-statement)\n - [Union All Statement](#union-all-statement)\n - [Unwind Statement](#unwind-statement) \n - [Support](#support)\n - [Contact](#contact)\n - [References](#references)\n\n---\n# Sample Graph For Tutorial\nThis graph is a simplifed version of a real-world financial transaction 
graph. There are 5 _Account_ vertices, with 8 _transfer_ edges between Accounts. An account may be associated with a _City_ and a _Phone_.\nThe use case is to analyze which other accounts are connected to 'blocked' accounts.\n\n\n\n---\n# Set Up Environment \n\nIf you have your own machine (including Windows and Mac laptops), the easiest way to run TigerGraph is to install it as a Docker image. Download [Community Edition Docker Image](https://dl.tigergraph.com/). Follow the [Docker setup instructions](https://github.com/tigergraph/ecosys/blob/master/demos/guru_scripts/docker/README.md) to set up the environment on your machine. \n\n**Note**: TigerGraph does not currently support the ARM architecture and relies on Rosetta to emulate x86 instructions. For production environments, we recommend using an x86-based system.\nFor optimal performance, configure your Docker environment with **8 CPUs and 20+ GB** of memory. If your laptop has limited resources, the minimum recommended configuration is **4 CPUs and 16 GB** of memory.\n\nAfter installing TigerGraph, the `gadmin` command-line tool is automatically included, enabling you to easily start or stop services directly from your bash terminal.\n```python\n docker load -i ./tigergraph-4.2.0-alpha-community-docker-image.tar.gz # change the .gz file name to match the image file you downloaded\n docker images #find image id\n docker run -d -p 14240:14240 --name mySandbox imageId #start a container, name it \u201cmySandbox\u201d using the image id shown by the previous command\n docker exec -it mySandbox /bin/bash #start a shell on this container. \n gadmin start all #start all tigergraph component services\n gadmin status #should see all services are up. If not, try gadmin start all again\n```\n\nFor the impatient, load the sample data from the tutorial/gsql folder and run your first query. \n```python\n cd tutorial/gsql/ \n gsql 00_schema.gsql #set up the sample schema in the catalog\n gsql 01_load.gsql #load sample data \n gsql #launch the gsql shell\n GSQL> use graph financialGraph #enter sample graph\n GSQL> ls #see the catalog content\n GSQL> select a from (a:Account) #query Account vertex\n GSQL> select s, e, t from (s:Account)-[e:transfer]->(t:Account) limit 2 #query edge\n GSQL> select count(*) from (s:Account) #query Account node count\n GSQL> select s, t, sum(e.amount) as transfer_amt from (s:Account)-[e:transfer]->(t:Account) # query s->t transfer amount\n GSQL> @01_load.gsql #use @filename at the GSQL prompt to run a file containing a gsql script\n GSQL> exit #quit the gsql shell \n```\nAs shown above, there are two ways to run a GSQL script: \n\n- Copy the GSQL script into a file, say test.gsql, and run it from the Bash command line with `gsql test.gsql`\n- Enter the GSQL prompt and run a one-line GSQL script, or a multi-line script enclosed by `Begin` and `End`\n```\n GSQL> Begin\n GSQL> line1 xxxx\n GSQL> line2 xxxx\n GSQL> ....\n GSQL> End\n```\nYou can also access the GraphStudio visual IDE directly through your browser:\n```python\n http://localhost:14240/\n```\n\nIf you are interested in using the GraphStudio GUI IDE (which is different from the GSQL shell used below), here is a step-by-step guide to [GraphStudio](https://github.com/tigergraph/ecosys/blob/master/tutorials/GraphStudio.md).\n\n\nThe following commands are useful for operations.\n\n```python\n#To stop the server, you can use\n gadmin stop all\n#Check `gadmin status` to verify if the gsql service is running, then use the following command to reset (clear) the database.\n gsql 'drop all'\n```\n\n**Note that** our fully managed service -- [TigerGraph Savanna](https://savanna.tgcloud.io/) -- is entirely GUI-based and does not provide access to a bash shell. To execute the GSQL examples in this tutorial, simply copy the query into the Savanna GSQL editor and click Run.\n\nAdditionally, all GSQL examples referenced in this tutorial can be found in your TigerGraph tutorials/gsql folder.\n\n---\n[Go back to top](#top)\n\n# Set Up Schema \nA graph schema describes the vertex types, edge types, and properties found in your graph. TigerGraph is a schema-first database, meaning that the schema is declared before loading data. This not only optimizes data storage and query performance, but it also provides built-in checks to make sure your data conforms to the expected schema.\n\nCopy [00_schema.gsql](./gsql/00_schema.gsql) to your container. \nNext, run the following in your container's bash command line. \n```\ngsql 00_schema.gsql\n```\nAs seen below, the declarative DDL creates vertex and edge types. A vertex type requires a `PRIMARY KEY`. An edge type requires `FROM` and `TO` vertex types as its key.\nMultiple edges of the same type can share endpoints. In such cases, a `DISCRIMINATOR` attribute is needed to differentiate edges sharing the same pair of endpoints. If an edge type has the `REVERSE_EDGE` option, then that type is paired with a companion type so that every edge has a twin edge, sharing the same properties, except it runs in the opposite direction. You can put the following in a file and invoke it at the GSQL prompt with `GSQL>@file.gsql`.\n\n```python\n//install gds functions\nimport package gds\ninstall function gds.**\n\n//create vertex types\nCREATE VERTEX Account ( name STRING PRIMARY KEY, isBlocked BOOL)\nCREATE VERTEX City ( name STRING PRIMARY KEY)\nCREATE VERTEX Phone (number STRING PRIMARY KEY, isBlocked BOOL)\n\n//create edge types\nCREATE DIRECTED EDGE transfer (FROM Account, TO Account, DISCRIMINATOR(date DATETIME), amount UINT) WITH REVERSE_EDGE=\"transfer_reverse\"\nCREATE UNDIRECTED EDGE hasPhone (FROM Account, TO Phone)\nCREATE DIRECTED EDGE isLocatedIn (FROM Account, TO City)\n\n//create graph; * means include all graph element types in the graph.\nCREATE GRAPH financialGraph (*)\n```\n\n[Go back to top](#top)\n\n---\n# Load Data \n\nNow that you have a graph schema, you can load data using one of the following methods. \n\n- Load sample data from our publicly accessible S3 bucket:\n \n  Copy [01_load.gsql](./gsql/01_load.gsql) to your container. \n  Next, run the following in your container's bash command line. 
\n ```\n gsql 01_load.gsql\n ```\n or in GSQL editor of TigerGraph Savanna, copy the content of [01_load.gsql](./gsql/01_load.gsql), and paste it into the GSQL editor to run.\n \n- Load from local file in your container\n - Copy the following data files to your container.\n - [account.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/account.csv)\n - [phone.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/phone.csv)\n - [city.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/city.csv)\n - [hasPhone.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/hasPhone.csv)\n - [locate.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/locate.csv)\n - [transfer.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/transfer.csv)\n\n - Copy [25_load2.gsql](./gsql/25_load2.gsql) to your container. Modify the script with your local file path. Next, run the following in your container's bash command line. \n ```\n gsql 25_load2.gsql\n ``` \n \n You can either use the Query Editor in the Savanna (cloud) by copying the content of [25_load2.gsql](./gsql/25_load2.gsql), and pasting it into the query editor to run it.\n\n Alternatively, navigate to the gsql folder and open a GSQL Shell in Bash by typing GSQL. Then, use @filename to execute a GSQL script file within the GSQL Shell.\n\n ```\n GSQL>@25_load2.gsql\n ```\n\n The declarative loading script is self-explanatory. You define the filename alias for each data source, and use the LOAD statement to map the data source to the target schema elements-- vertex types, edge types, and vector attributes.\n Copy the following in a gsql file-- load.gsql, And run it in bash shell `gsql load.gsql`.\n \n ```python\n USE GRAPH financialGraph\n\n DROP JOB load_local_file\n\n //load from local file\n CREATE LOADING JOB load_local_file {\n // define the location of the source files; each file path is assigned a filename variable. \n DEFINE FILENAME account=\"/home/tigergraph/data/account.csv\";\n DEFINE FILENAME phone=\"/home/tigergraph/data/phone.csv\";\n DEFINE FILENAME city=\"/home/tigergraph/data/city.csv\";\n DEFINE FILENAME hasPhone=\"/home/tigergraph/data/hasPhone.csv\";\n DEFINE FILENAME locatedIn=\"/home/tigergraph/data/locate.csv\";\n DEFINE FILENAME transferdata=\"/home/tigergraph/data/transfer.csv\";\n //define the mapping from the source file to the target graph element type. The mapping is specified by VALUES clause. 
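\n // $\"colName\" refers to a source-file column by its header name; gsql_trim() and gsql_to_bool() are built-in token functions that trim whitespace and convert the text to a BOOL.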
\n LOAD account TO VERTEX Account VALUES ($\"name\", gsql_to_bool(gsql_trim($\"isBlocked\"))) USING header=\"true\", separator=\",\";\n LOAD phone TO VERTEX Phone VALUES ($\"number\", gsql_to_bool(gsql_trim($\"isBlocked\"))) USING header=\"true\", separator=\",\";\n LOAD city TO VERTEX City VALUES ($\"name\") USING header=\"true\", separator=\",\";\n LOAD hasPhone TO Edge hasPhone VALUES ($\"accnt\", gsql_trim($\"phone\")) USING header=\"true\", separator=\",\";\n LOAD locatedIn TO Edge isLocatedIn VALUES ($\"accnt\", gsql_trim($\"city\")) USING header=\"true\", separator=\",\";\n LOAD transferdata TO Edge transfer VALUES ($\"src\", $\"tgt\", $\"date\", $\"amount\") USING header=\"true\", separator=\",\";\n }\n\n run loading job load_local_file\n ```\n\n- Load from Iceberg table, or alternative Spark data sources, through [Tigergraph Spark Connector](https://docs.tigergraph.com/tigergraph-server/current/data-loading/load-from-spark-dataframe)\n- Please follow the Jupyter Notebook PySpark demo: [26_load_iceberg.ipynb](./gsql/26_load_iceberg.ipynb)\n \n[Go back to top](#top)\n\n---\n# 1-Block Query Examples \n\n**NOTE** This 1-Block feature is available since 4.2.0 version. Prior version does not support this feature. \n\n## SELECT BLOCK\n1-Block SELECT is a feature that offers an exploratory (interactive style) approach to querying data in a style similar to SQL or Cypher. This syntax enables users to write a single, concise select-from-where-accum statement on one line to retrieve data based on specified conditions. It also supports operations such as filtering, aggregation, sorting, and pagination, making it an excellent tool for ad hoc data inspection.\n\n### Basic Syntax\nThe basic syntax structure of `1-Block SELECT` is as follows:\n\n```python\nSELECT FROM WHERE \n```\nSimilar to SQL, graph queries use SELECT-FROM-WHERE clauses. The key difference is that graph queries generalize the FROM clause to allow patterns. A matched pattern is a table, each row in the table is a pattern instance with the binding variables (alias) specified in the FROM clause as columns. A pattern can represent a vertex, an edge, or a path. These patterns are expressed using ASCII art:\n\n- `()` represents a node. You can put alias (a variable bound to the node) in the front, and `:VertexName` in the suffix. E.g., `(a:Account)`. \n- `-[]->` represents an edge. You can put alias (a variable bound to the edge) in the front, and `:EdgeName` in the suffix.
E.g., `-[e:transfer]->`. \n- `()-[]->()-[]->()...` A path is formed by alternating nodes and edges.
E.g., `(a:Account)-[e:transfer]->(b:Account)-[e2:transfer]->(c:Account)`.\n\nThis pattern-based approach enables more \"declarative flavor\", and more flexible and expressive querying of graph data. \n\nYou can directly type *one liner* of the above syntax in GSQL shell to explore your data. The query will not be stored in Catalog as a procedure. \nOr, you can break the one line to multiple lines and enclose them with `BEGIN` and `END` as illustrated below. \n\n```python\nGSQL> BEGIN\nGSQL> SELECT \nGSQL> FROM \nGSQL> WHERE \nGSQL> END\n```\n\n### Examples\n\n#### SELECT with filters\n\n##### Using WHERE clause\n\n```python\nGSQL > use graph financialGraph\nGSQL > SELECT s FROM (s:Account) LIMIT 10\nGSQL > SELECT s FROM (s:Account {name: \"Scott\"})\nGSQL > SELECT s FROM (s:Account WHERE s.isBlocked)\nGSQL > SELECT s FROM (s:Account) WHERE s.name IN (\"Scott\", \"Steven\")\nGSQL > SELECT s, e, t FROM (s:Account) -[e:transfer]-> (t:Account) WHERE s <> t\n```\n\n##### Using HAVING\n\n```python\nGSQL > use graph financialGraph\nGSQL > SELECT s FROM (s:Account) -[e:transfer]-> (t:Account) having s.isBlocked\nGSQL > SELECT s FROM (s:Account) -[e:transfer]-> (t:Account) having s.isBlocked AND s.name = \"Steven\"\n```\n#### Aggregation SELECT\n\n```python\nGSQL > use graph financialGraph\nGSQL > SELECT COUNT(s) FROM (s:_)\nGSQL > SELECT COUNT(*) FROM (s:Account:City)\nGSQL > SELECT COUNT(DISTINCT t) FROM (s:Account)-[e]->(t)\nGSQL > SELECT COUNT(e), STDEV(e.amount), AVG(e.amount) FROM (s:Account)-[e:transfer|isLocatedIn]->(t)\nGSQL > SELECT a, sum(e.amount) as amount1 , sum(e2.amount) as amount2 FROM (a:Account)-[e:transfer]->(b:Account)-[e2:transfer]->(c:Account) GROUP BY a;\n```\n\n\n\n#### SELECT with Sorting and Limiting\n\n```python\nGSQL > use graph financialGraph\nGSQL > SELECT s FROM (s:Account) ORDER BY s.name LIMIT 3 OFFSET 1\nGSQL > SELECT s.name, e.amount as amt, t FROM (s:Account) -[e:transfer]-> (t:Account) ORDER BY amt, s.name LIMIT 1\nGSQL > SELECT DISTINCT type(s) FROM (s:Account:City) ORDER BY type(s)\n```\n\n#### Using some expression\n\n```python\n# Using mathematical expressions\nGSQL > use graph financialGraph\nGSQL > SELECT s, e.amount*0.01 AS amt FROM (s:Account {name: \"Scott\"})- [e:transfer]-> (t)\n\n# Using CASE expression\nGSQL > BEGIN\nGSQL > SELECT s, CASE WHEN e.amount*0.01 > 80 THEN true ELSE false END AS status \nGSQL > FROM (s:Account {name: \"Scott\"})- [e:transfer]-> (t)\nGSQL > END\n```\n\n[Go back to top](#top)\n\n---\n## SELECT BLOCK WITH VARIABLES\nWe can also pass variables to 1-Block SELECT with the prefix `LET...IN` construct. The variables defined between `LET` and `IN` can be primitive types or accumulators, enable flexible computation and aggregation.\n\n### Basic Syntax\n```python \nLET \n ;\nIN \n SELECT \n```\n\nThe ```LET...IN``` prefix construct in GSQL is used to define and work with variables and accumulators. It allows for the creation of temporary variables or accumulators in the LET block, which can be referenced later in the SELECT block, enabling more powerful and flexible queries. Specifically,\n\n- ```LET```: This keyword starts the block where you define variables and accumulators.\n- ``````: Inside the `LET` block, you can define variables with primitive types such as `STRING`, `INT`, `UINT`, `BOOL`, `FLOAT`, `DATETIME` and `DOUBLE`. However, container types like `SET`, `LIST`, and `MAP` are not supported at the moment. 
Inside the `LET` block, you can also define accumulators.\n- ```IN SELECT ```: The `SELECT` query block follows the `IN` keyword can use the variables and accumulators defined in the `LET` block.\n\n### Examples\n\n#### Primitive type variables\nSince `LET ... IN...SELECT` typically spans multiple-lines, we need to use `BEGIN...END` to support it in GSQL shell. Below, we define some primitive type varibles with assigned value, and use them in the `SELECT` query block as bind variables. \n\n```python\nGSQL > use graph financialGraph \nGSQL > BEGIN \nGSQL > LET\nGSQL > DOUBLE a = 500.0; \nGSQL > STRING b = \"Jenny\"; \nGSQL > BOOL c = false; \nGSQL > IN \nGSQL > SELECT s, e.amount AS amt, t\nGSQL > FROM (s:Account) - [e:transfer]-> (t:Account) \nGSQL > WHERE s.isBlocked = c AND t.name <> b \nGSQL > HAVING amt > a; \nGSQL > END\n```\n\nYou can also use this syntax in one line. E.g., \n\n```python\nGSQL > LET STRING n = \"Jenny\"; IN SELECT s, count(t) as cnt FROM (s:Account {name:n}) - [:transfer*0..2]-> (t:Account);\n```\n#### Accumulator type variables\n\nIn GSQL, **accumulators** are special state variables used to store and update values during query execution. They are commonly utilized for aggregating sums, counts, sets, and other values for a matched pattern. Accumulators can be categorized into **local accumulators** (specific to individual vertices) and **global accumulators** (aggregating values across all selected nodes).\nSee [Accumulators](#accumulators) for examples first.\n\n----------\n\n**Restrictions on 1-Block SELECT Clause with Accumulators**\n\nWhen accumulators are used in a 1-block query, the `SELECT` clause is subject to the following restrictions:\n\n**Only One Node Alias Allowed:**\n\n**If accumulators are used**, the `SELECT` clause must return only **one** node alias.\n\n- \u2705 Source node alias **Allowed:** `SELECT s FROM (s:Account)-[e:transfer]-(t:Account)`\n\n- \u2705 Target node alias **Allowed:** `SELECT t FROM (s:Account)-[e:transfer]-(t:Account)`\n\n- \u274c Relationship alias **Not Allowed:** `SELECT e FROM (s:Account)-[e:transfer]-(t:Account)`\n\n- \u274c Multiple node alias **Not Allowed:** `SELECT s, t FROM (s:Account)-[e:transfer]-(t:Account)`\n \n\n**Functions, Expressions Not Allowed:**\n\n**When using accumulators**, functions and expressions cannot be used in the `SELECT` clause. \n\n- \u274c **Functions not allowed:** `SELECT count(s) FROM (s:Account)-[e:transfer]-(t:Account)`\n\n- \u274c **Expressions not allowed:** `SELECT (s.isBlocked OR t.isBlocked) AS flag FROM (s:Account)-[e:transfer]-(t:Account)`\n\n**Local Accumulator Example**\n\n**Definition:**\n\nA **local accumulator** (prefixed with `@`) is associated with a specific vertex and is stored as part of its attributes. 
When a node is retrieved in a `SELECT` statement, its corresponding local accumulator values are displayed alongside its properties.\n\n**Query:**\n\n```python\nGSQL> use graph financialGraph\nGSQL> BEGIN\nGSQL> LET \nGSQL> SetAccum @transferNames; \nGSQL> IN \nGSQL> SELECT s FROM (s:Account where s.isBlocked) -[:transfer*1..3]- (t:Account) \nGSQL> ACCUM s.@transferNames += t.name;\nGSQL> END\n```\n**Output:**\n```json\n{\n \"version\": {\n \"edition\": \"enterprise\",\n \"api\": \"v2\",\n \"schema\": 0\n },\n \"error\": false,\n \"message\": \"\",\n \"results\": [\n {\n \"Result_Vertex_Set\": [\n {\n \"v_id\": \"Steven\",\n \"v_type\": \"Account\",\n \"attributes\": {\n \"name\": \"Steven\",\n \"isBlocked\": true,\n \"@transferNames\": [\n \"Jenny\",\n \"Paul\",\n \"Scott\",\n \"Steven\",\n \"Ed\"\n ]\n }\n }\n ]\n }\n ]\n}\n```\n\n**Explanation:**\n\n- The **local accumulator** `@transferNames` collects the names of accounts connected to `\"Steven\"` via `transfer` edges (up to 3 hops).\n- In the output, `@transferNames` is stored as runtime attribute, alongside wiht other static attributes `name` and `isBlocked`.\n\n----------\n\n**Global Accumulator Example**\n\n**Definition:**\n\nA **global accumulator** (prefixed with `@@`) aggregates values across all selected nodes and is **printed separately** from node attributes in the query result. All global accumulators declared in the `LET` block will be included in the output.\n\n**Query:**\n```python\nGSQL> use graph financialGraph\nGSQL> BEGIN \nGSQL> LET \nGSQL> double ratio = 0.01; \nGSQL> SumAccum @totalAmt; // local accumulator for total amount \nGSQL> SumAccum @@cnt; // global accumulator for count\nGSQL> IN \nGSQL> SELECT s FROM (s:Account {name:\"Ed\"}) - [e:transfer]-> (t:Account) \nGSQL> ACCUM s.@totalAmt += ratio * e.amount, // Accumulate total amount for s \nGSQL> @@cnt += 1; // Accumulate count of transfers\nGSQL> END\n```\n**Output:**\n```json\n{\n \"version\": {\n \"edition\": \"enterprise\",\n \"api\": \"v2\",\n \"schema\": 0\n },\n \"error\": false,\n \"message\": \"\",\n \"results\": [\n {\n \"Result_Vertex_Set\": [\n {\n \"v_id\": \"Ed\",\n \"v_type\": \"Account\",\n \"attributes\": {\n \"name\": \"Ed\",\n \"isBlocked\": false,\n \"@totalAmt\": 15\n }\n }\n ]\n },\n {\n \"@@cnt\": 1\n }\n ]\n}\n```\n**Explanation:**\n\n- The **local accumulator** `@totalAmt` accumulates the weighted sum of transfer amounts and is **stored as an attribute of \"Ed\"**.\n- The **global accumulator** `@@cnt` counts the total number of transfers and is **printed separately** in the result.\n- This ensures **per-node values are displayed within node attributes, while global aggregates appear in the final output**.\n\n\n\n[Go back to top](#top)\n\n---\n\n# Stored Procedure Query Examples \n\nIn this section, we explain how to write stored procedures. A stored procedure is a named query consisting of a sequence of GSQL query blocks or statements. It is stored in the graph database catalog, installed once using a code generation technique for optimal performance, and can be invoked repeatedly using the 'run query' command or a system-generated REST endpoint URL.\n\nTo create a stored procedure, you can use the following syntax. \n\n```python\nCREATE OR REPLACE QUERY queryName (/*params*/) SYNTAX v3 {\n v1 = Query_block_1; //v1 is a vertex set variable, storing a set of selected vertices\n v2 = Query_block_2; //v2 is a vertex set variable, storing a set of selected vertices\n .\n .\n . 
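\n PRINT v1; //typically, the query ends with one or more PRINT statements that output vertex set variables or accumulators in JSON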
\n}\n```\n\nThe query block can be a `Node Pattern`, an `Edge Pattern`, or a `Path Pattern`. We will illustrate each pattern with examples.\n\n## Two Flavors of SELECT\n\nIn GSQL, each query block (SELECT-FROM-WHERE) can be used to generate a vertex set or a table. \n\n- SELECT A Vertex Set Style: if a query block generates a vertex set, we can store the vertex set in a variable, and use the vertex set variable to drive subsequent query blocks composition via pattern matching or set operation. Syntax\n ```python\n V= SELECT s\n FROM pattern\n [WHERE condition];\n ``` \n- SELECT INTO A Table Style: if a query block generates a table, we can output the table. Syntax\n ```python\n SELECT exp1, exp2.. INTO T\n FROM pattern\n [WHERE condition];\n ```\n\nRegardless which style you are choosing, the FROM clause will always specify a pattern. The pattern follows ISO GQL standard syntax, which is a well-designed ASCII art syntax-- `()` represents nodes, and `-[]->` represents edges. \n\nWe show both styles for each pattern class. \n\n---\n## Node Pattern\n### SELECT A Vertex Set Style \nCopy [02_q1a.gsql](./gsql/02_q1a.gsql) to your container. \n\n```python\n#enter the graph\nUSE GRAPH financialGraph\n\n//create a query\nCREATE OR REPLACE QUERY q1a () SYNTAX v3 {\n\n // select from a node pattern-- symbolized by (),\n //\":Account\" is the label of the vertex type Account, \"a\" is a binding variable to the matched node. \n // v is a vertex set variable, holding the selected vertex set\n v = SELECT a\n FROM (a:Account);\n\n // output vertex set variable v in JSON format\n PRINT v;\n\n //we can use vertex set variable in the subsequent query block's node pattern.\n //v is placed in the node pattern vertex label position. The result is re-assigned to v. \n v = SELECT a\n FROM (a:v)\n WHERE a.name == \"Scott\";\n\n // output vertex set variable v in JSON format\n PRINT v;\n\n}\n\n# Two methods to run the query. The compiled method gives the best performance. \n\n# Method 1: Run immediately with our interpret engine\ninterpret query q1a()\n\n# Method 2: Compile and install the query as a stored procedure\ninstall query q1a\n\n# run the compiled query\nrun query q1a()\n```\n### SELECT INTO A Table Style\nIf you're familiar with SQL, treat the matched node as a table -- table(a) or table(a.attr1, a.attr2...). You can group by and aggregate on its columns, just like in SQL. Use `SELECT expr1, expr2..` as usual, with the extension `SELECT a` as selecting the graph element a.\n\nCopy [03_q1b.gsql](./gsql/03_q1b.gsql) to your container. \n\n```python\n#enter the graph\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE QUERY q1b () SYNTAX v3 {\n //think the FROM clause as a table (a.attr1, a.attr2,...)\n // you can group by a or its attributes, and do aggregation.\n // \":Account\" is the label of the vertex type, and \"a\" is the\n // vertex type alias, and () symbolize a vertex pattern in ASCII art.\n SELECT a.isBlocked, count(*) INTO T\n FROM (a:Account)\n GROUP BY a.isBlocked;\n\n PRINT T;\n}\n\n# Method 1: Run immediately with our interpret engine\ninterpret query q1b()\n\n# Method 2: Compile and install the query as a stored procedure\ninstall query q1b\n\n# run the compiled query\nrun query q1b()\n```\n\n[Go back to top](#top)\n\n---\n## Edge Pattern \n### SELECT A Vertex Set Style \nCopy [04_q2a.gsql](./gsql/04_q2a.gsql) to your container. 
\n\n```python\n\nUSE GRAPH financialGraph\n\n# create a query\nCREATE OR REPLACE QUERY q2a (string acctName) SYNTAX v3 {\n\n //Declare a local sum accumulator to add values. Each vertex has its own accumulator of the declared type\n //The vertex instance is selected based on the FROM clause pattern.\n SumAccum @totalTransfer = 0;\n\n // match an edge pattern-- symbolized by ()-[]->(), where () is node, -[]-> is a directed edge\n // \"v\" is a vertex set variable holding the selected vertex set.\n // {name: acctName} is a JSON style filter. It's equivalent to \"a.name == acctName\"\n // \":transfer\" is the label of the edge type \"transfer\". \"e\" is the alias of the matched edge.\n v = SELECT b\n FROM (a:Account {name: acctName})-[e:transfer]->(b:Account)\n //for each matched edge, accumulate e.amount into the local accumulator of b.\n ACCUM b.@totalTransfer += e.amount;\n\n //output each v and their static attribute and runtime accumulators' state\n PRINT v;\n\n}\n\n\n# Two methods to run the query. The compiled method gives the best performance. \n\n# Method 1: Run immediately with our interpret engine\ninterpret query q2a(\"Scott\")\n\n# Method 2: Compile and install the query as a stored procedure\ninstall query q2a\n\n# run the compiled query\nrun query q2a(\"Scott\")\n```\n\n### SELECT INTO A Table Style\nIf you're familiar with SQL, treat the matched edge as a table -- table(a, e, b) or table(a.attr1, a.attr2..., e.attr1, e.attr2...,b.attr1, b.attr2...). You can group by and aggregate on its columns, just like in SQL. Use `SELECT expr1, expr2..` as usual, with the extension \"SELECT a\", \"SELECT e\", \"SELECT b\" as selecting the graph element.\n\nCopy [05_q2b.gsql](./gsql/05_q2b.gsql) to your container. \n\n```python\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE QUERY q2b () SYNTAX v3 {\n\n //think the FROM clause is a matched table with columns (a, e, b)\n //you can use SQL syntax to group by the source and target account, and sum the total transfer amount\n SELECT a, b, sum(e.amount) INTO T\n FROM (a:Account)-[e:transfer]->(b:Account)\n GROUP BY a, b;\n\n //output the table in JSON format\n PRINT T;\n\n}\n\n# Two methods to run the query. The compiled method gives the best performance.\n\n# Method 1: Run immediately with our interpret engine\ninterpret query q2b()\n\n# Method 2: Compile and install the query as a stored procedure\ninstall query q2b\n\n# run the compiled query\nrun query q2b()\n```\n\n[Go back to top](#top)\n\n---\n\n## Path Pattern \n\n### SELECT A Vertex Set Style: Fixed Length vs. Variable Length Path Pattern\nCopy [06_q3a.gsql](./gsql/06_q3a.gsql) to your container. \n\n```python\nUSE GRAPH financialGraph\n\n// create a query\nCREATE OR REPLACE QUERY q3a (datetime low, datetime high, string acctName) SYNTAX v3 {\n\n // a path pattern in ascii art ()-[]->()-[]->(), where alternating node() and edge -[]->.\n // You can also use WHERE clause inside a vertex () or edge-[]->. \n R = SELECT b\n FROM (a:Account WHERE a.name== acctName)-[e:transfer]->()-[e2:transfer]->(b:Account)\n WHERE e.date >= low AND e.date <= high and e.amount >500 and e2.amount>500;\n\n PRINT R;\n\n // below we use variable length path.\n // *1.. means 1 to more steps of the edge type \"transfer\"\n // select the reachable end point and bind it to vertex alias \"b\"\n R = SELECT b\n FROM (a:Account WHERE a.name == acctName)-[:transfer*1..]->(b:Account);\n\n PRINT R;\n\n}\n\n# Two methods to run the query. 
The compiled method gives the best performance.\n\n# Method 1: Run immediately with our interpret engine\ninterpret query q3a(\"2024-01-01\", \"2024-12-31\", \"Scott\")\n\n# Method 2: Compile and install the query as a stored procedure\ninstall query q3a\n\n# run the compiled query\nrun query q3a(\"2024-01-01\", \"2024-12-31\", \"Scott\")\n```\n\n### SELECT INTO A Table Style: Group By On A Path Table\n\nIf you're familiar with SQL, treat the matched path as a table -- table(a, e, b, e2, c) or unfold their attributes into table(a.attr1, a.attr2..., e.attr1, e.attr2...,b.attr1, b.attr2...). You can group by and aggregate on its columns, just like in SQL. Use `SELECT expr1, expr2..` as usual, with the extension \"SELECT a\", \"SELECT e\", \"SELECT b\" etc. as selecting the graph element.\n\nCopy [07_q3b.gsql](./gsql/07_q3b.gsql) to your container. \n\n```python\nUSE GRAPH financialGraph\n\n// create a query\nCREATE OR REPLACE QUERY q3b (datetime low, datetime high, string acctName) SYNTAX v3 {\n\n // a path pattern in ascii art () -[]->()-[]->()\n // think the FROM clause is a matched table with columns (a, e, b, e2, c)\n // you can use SQL syntax to group by on the matched table\n // Below query find 2-hop reachable account c from a, and group by the path a, b, c\n // find out how much each hop's total transfer amount.\n SELECT a, b, c, sum(DISTINCT e.amount) AS hop_1_sum, sum(DISTINCT e2.amount) AS hop_2_sum INTO T1\n FROM (a:Account)-[e:transfer]->(b)-[e2:transfer]->(c:Account)\n WHERE e.date >= low AND e.date <= high\n GROUP BY a, b, c;\n\n PRINT T1;\n\n /* below we use variable length path.\n *1.. means 1 to more steps of the edge type \"transfer\"\n select the reachable end point and bind it to vertex alias \"b\"\n note: \n 1. the path has \"shortest path\" semantics. If you have a path that is longer than the shortest,\n we only count the shortest. E.g., scott to scott shortest path length is 4. Any path greater than 4 will\n not be matched.\n 2. we can not put an alias to bind the edge in the variable length part -[:transfer*1..]->, but \n we can bind the end points (a) and (b) in the variable length path, and group by on them.\n */\n SELECT a, b, count(*) AS path_cnt INTO T2\n FROM (a:Account {name: acctName})-[:transfer*1..]->(b:Account)\n GROUP BY a, b;\n\n PRINT T2;\n\n}\n\n# Two methods to run the query. 
The compiled method gives the best performance.\n\n# Method 1: Run immediately with our interpret engine\ninterpret query q3b(\"2024-01-01\", \"2024-12-31\", \"Scott\")\n\n# Method 2: Compile and install the query as a stored procedure\ninstall query q3b\n\nrun query q3b(\"2024-01-01\", \"2024-12-31\", \"Scott\")\n```\n\n[Go back to top](#top)\n\n---\n\n## Pattern Summary\n\n## Table of Edge Patterns (following ISO GQL Standard Syntax)\n| Orientation | Example | Edge Pattern | \n|------------|---------------|----------------------------|\n| Pointing left | <-[e:transfer]- | <-[alias:type1\\|type2\\|..]- | \n| Pointing right | -[e:transfer]-> | -[alias:type1\\|type2\\|..]-> | \n| Undirected | \\~[e:hasPhone]\\~ | \\~[alias:type1\\|type2\\|..]\\~ | \n| Left or undirected | <\\~[e:transfer\\|hasPhone]\\~ | <\\~[alias:type1\\|type2\\|..]\\~ |\n| Right or undirected | \\~[e:transfer\\|hasPhone]\\~> | \\~[alias:type1\\|type2\\|..]\\~> | \n| Left or right | <-[e:transfer]-> | <-[alias:type1\\|type2\\|..]-> |\n| Left, undirected, or right | -[e:transfer\\|hasPhone]- | -[alias:type1\\|type2\\|..]- | \n\n## Variable Length Pattern Quantifier\nWe support two ways to specify repetitions of a pattern. \n\n### GQL Style:\n| Quantifier | Example | Description |\n|------------|---------|--------------|\n| {m,n} | -[:transfer]->{1,2} | between m and n repetitions |\n| {m,} | -[:transfer]->{1,} | m or more repetitions |\n| * | -[:transfer]->* | equivalent to {0,} |\n| + | -[:transfer]->+ | equivalent to {1,} |\n\n\n### GSQL Style: \n| Quantifier | Example | Description |\n|------------|---------|--------------|\n| *m..n | -[:transfer*1..2]-> | between m and n repetitions |\n| *m.. | -[:transfer*1..]-> | m or more repetitions |\n| *m | -[:transfer*2]-> | exactly m repetitions (equivalent to *m..m) |\n\n[Go back to top](#top)\n\n---\n# Advanced Topics\n## Accumulators\nGSQL is a Turing-complete graph database query language. One of its key advantages over other graph query languages is its support for accumulators, which can be either global (prefixed with `@@`) or vertex local (prefixed with `@`). \nAccumulators are containers that store a data value, accept inputs, and aggregate these inputs into the stored data value using a binary operation `+=`.\nAn accumulator is used as a state variable in GSQL. Its state is mutable throughout the life cycle of a query.\n\n### Accumulator Operators\nAn accumulator in GSQL supports two operators: assignment (=) and accumulation (+=).\n\n- `=` operator: The assignment operator can be used to reset the state (current value) of an accumulator.\n\n- `+=` operator: The accumulation operator can be used to add new values to the accumulator\u2019s state. 
Depending on the type of accumulator, different accumulation semantics are applied.\n\n```python\nUSE GRAPH financialGraph\n\n// \"distributed\" key word means this query can be run both on a single node or a cluster of nodes \nCREATE OR REPLACE DISTRIBUTED QUERY q4 (/* parameters */) SYNTAX v3 {\n\n SumAccum @@sum_accum = 0;\n MinAccum @@min_accum = 0;\n MaxAccum @@max_accum = 0;\n AvgAccum @@avg_accum;\n OrAccum @@or_accum = FALSE;\n AndAccum @@and_accum = TRUE;\n ListAccum @@list_accum;\n\n // @@sum_accum will be 3 when printed\n @@sum_accum +=1;\n @@sum_accum +=2;\n PRINT @@sum_accum;\n\n // @@min_accum will be 1 when printed\n @@min_accum +=1;\n @@min_accum +=2;\n PRINT @@min_accum;\n\n // @@max_accum will be 2 when printed\n @@max_accum +=1;\n @@max_accum +=2;\n PRINT @@max_accum;\n\n @@avg_accum +=1;\n @@avg_accum +=2;\n PRINT @@avg_accum;\n\n // @@or_accum will be TRUE when printed\n @@or_accum += TRUE;\n @@or_accum += FALSE;\n PRINT @@or_accum;\n\n // @@and_accum will be FALSE when printed\n @@and_accum += TRUE;\n @@and_accum += FALSE;\n PRINT @@and_accum;\n\n // @@list_accum will be [1,2,3,4] when printed\n @@list_accum += 1;\n @@list_accum += 2;\n @@list_accum += [3,4];\n PRINT @@list_accum;\n\n}\n\n//install the query\ninstall query q4\n\n//run the query\nrun query q4()\n``` \nIn the above example, six different accumulator variables (those with prefix @@) are declared, each with a unique type. Below we explain their semantics and usage.\n\n- `SumAccum` allows user to keep adding INT values\n\n- `MinAccum` keeps the smallest INT number it has seen. As the @@min_accum statements show, we accumulated 1 and 2 to the MinAccum accumulator, and end up with the value 0, as neither of 1 nor 2 is smaller than the initial state value 0.\n\n- `MaxAccum` is the opposite of MinAccum. It returns the MAX INT value it has seen. The max_accum statements accumulate 1 and 2 into it, and end up with the value 2.\n\n- `AvgAccum` keeps the average value it has seen. It returns the AVG INT value it has seen. The avg_accum statements accumulate 1 and 2 into it, and end up with the value 1.5.\n\n- `OrAccum` keeps OR-ing the internal boolean state variable with new boolean variables that accumulate to it. The initial default value is assigned FALSE. We accumulate TRUE and FALSE into it, and end up with the TRUE value.\n\n- `AndAccum` is symmetric to OrAccum. Instead of using OR, it uses the AND accumulation semantics. We accumulate TRUE and FALSE into it, and end up with the FALSE value.\n\n- `ListAccum` keeps appending new integer(s) into its internal list variable. We append 1, 2, and [3,4] to the accumulator, and end up with [1,2,3,4].\n\n[Go back to top](#top)\n\n---\n### Global vs Vertex Attached Accumulator\nAt this point, we have seen that accumulators are special typed variables in GSQL. We are ready to explain their global and local scopes.\n\nGlobal accumulators belong to the entire query. They can be updated anywhere within the query, whether inside or outside a query block. Local accumulators belong to each vertex. The term \"local\" indicates that they are local to the vertex element. These accumulators can only be updated when their owning vertex is accessible within a SELECT-FROM-WHERE-ACCUM query block. To differentiate them, we use specific prefixes in their identifiers when declaring them.\n\n- `@@` is used for declaring global accumulator variables. It is always used stand-alone. E.g @@cnt +=1\n\n- `@` is used for declaring local accumulator variables. 
It must be used with a vertex alias specified in the FROM clause in a query block. E.g. v.@cnt += 1 where v is a vertex alias specified in a FROM clause of a SELECT-FROM-WHERE query block.\n\n```python\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE QUERY q5 (/* parameters */) SYNTAX V3 {\n\n SumAccum @cnt = 0; //local accumulator\n SumAccum @@hasPhoneCnt = 0; //global accumulator\n\n // ~[]~ is an undirected edge.\n S = SELECT a\n FROM (a:Account) ~ [e:hasPhone] ~ (p:Phone)\n WHERE a.isBlocked == FALSE\n ACCUM a.@cnt +=1,\n p.@cnt +=1,\n @@hasPhoneCnt +=1;\n\n PRINT S;\n PRINT @@hasPhoneCnt;\n\n}\n\ninterpret query q5()\n\n```\n\nIn the above example:\n\n- `@cnt` is a local accumulator. Once declared, each vertex alias x specified in a FROM clause can access it in the form x.@cnt. The local accumulator state is mutable by any query block.\n\n- `@@hasPhoneCnt` is a global accumulator.\n\nThe ACCUM clause will execute its statements for each pattern matched in the FROM clause and evaluated as TRUE by the WHERE clause.\n\n**Detailed Explanation:**\n- The `FROM` clause identifies the edge patterns that match Account -[hasPhone]- Phone.\n\n- The `WHERE` clause filters the edge patterns based on the Account.isBlocked attribute.\n\n- The `ACCUM` clause will execute once for each matched pattern instance that passes the WHERE clause.\n\nFor each matching pattern that satisfies the WHERE clause, the following will occur:\n\n- `a.@cnt += 1`\n- `p.@cnt += 1`\n- `@@hasPhoneCnt += 1`\n\nThe accumulator will accumulate based on the accumulator type.\n\n[Go back to top](#top)\n\n---\n### ACCUM vs POST-ACCUM\n\n#### ACCUM\nRunning example. \n```python\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE QUERY q6 (/* parameters */) SYNTAX V3 {\n\n SumAccum @cnt = 0; //local accumulator\n SumAccum @@hasPhoneCnt = 0; //global accumulator\n\n // ~[]~ is an undirected edge.\n S = SELECT a\n FROM (a:Account) ~ [e:hasPhone] ~ (p:Phone)\n WHERE a.isBlocked == FALSE\n ACCUM a.@cnt +=1,\n p.@cnt +=1,\n @@hasPhoneCnt +=1;\n\n PRINT S;\n PRINT @@hasPhoneCnt;\n\n}\n\ninterpret query q6()\n```\n- `FROM-WHERE` Produces a Binding Table\n \nWe can think of the FROM and WHERE clauses specify a binding table, where the FROM clause specifies the pattern, and the WHERE clause does a post-filter of the matched pattern instances-- the result is a table, each row in the table is a pattern instance with the binding variables specified in the FROM clause as columns. In the above query a2 example, the FROM clause produces a result table (a, e, p) where \u201ca\u201d is the Account variable, \u201ce\u201d is the \u201chasPhone\u201d variable, and \u201cp\u201d is the Phone variable.\n\n- `ACCUM` Process each row independently in the Binding Table\n\nThe `ACCUM` clause executes its statements once for each row in the `FROM-WHERE` binding table. The execution is done in a map-reduce fashion.\n\n**Map-Reduce Interpretation:** The ACCUM clause uses snapshot semantics, executing in two phases:\n\n- **Map Phase:** Each row in the binding table is processed in parallel, applying each statement in the `ACCUM` clause, starting with the same accumulator snapshot as inputs. 
The snapshot of accumulator values is taken before the start of the ACCUM clause.\n\n- **Reduce Phase:** At the end of the `ACCUM` clause, these Map Phase effect are aggregated into their respective accumulators, creating a new snapshot of accumulator values.\n\nThe new snapshot of accumulate states is available for access after the ACCUM clause.\n\n---\n#### POST-ACCUM\n\nThe optional `POST-ACCUM` clause enables accumulation and other computations across the set of vertices produced by the `FROM-WHERE` binding table. `POST-ACCUM` can be used without `ACCUM`. If it is preceded by an `ACCUM` clause, then its statement can access the new snapshot value of accumulators computed by the `ACCUM` clause.\n\nRunning example. \n\n```python\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE QUERY q7 () SYNTAX V3 {\n\n SumAccum @cnt = 0; //local accumulator\n SumAccum @@testCnt1 = 0; //global accumulator\n SumAccum @@testCnt2 = 0; //global accumulator\n\n S = SELECT a\n FROM (a:Account) ~ [e:hasPhone] ~ (p:Phone)\n WHERE a.isBlocked == TRUE\n //a.@cnt snapshot value is 0\n ACCUM a.@cnt +=1, //add 1 to a.@cnt\n @@testCnt1+= a.@cnt //access a.@cnt snapshot value 0\n POST-ACCUM (a) //loop vertex \u201ca\u201d set.\n @@testCnt2 += a.@cnt; //access a.@cnt new snapshot value 1\n\n\n PRINT @@testCnt1;\n PRINT @@testCnt2;\n PRINT S;\n\n}\n\nINTERPRET QUERY q7()\n```\n\n- `POST-ACCUM` Loops A Vertex Set Selected From the Binding Table\n \nThe `POST-ACCUM` clause is designed to do some computation based on a selected vertex set from the binding table. It executes its statements(s) once for each distinct value of a referenced vertex column from the binding table. You can have multiple `POST-ACCUM` clauses. But each `POST-ACCUM` clause can only refer to one vertex alias defined in the `FROM` clause. In query q8, `POST-ACCUM (a)` means we project the vertex \u201ca\u201d column from the binding table, remove the duplicates, and loop through the resulting vertex set.\n\nAnother characteristic of the `POST-ACCUM` clause is that its statements can access the aggregated accumulator value computed in the `ACCUM` clause.\n\nIn query q8, the `POST-ACCUM` statement will loop over the vertex set \u201ca\u201d, and its statement `@@testCnt2+=a.@cnt` will read the updated snapshot value of `a.@cnt`, which is 1.\n\n```python\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE DISTRIBUTED QUERY q8 () SYNTAX V3 {\n\n SumAccum @@edgeCnt = 0;\n MaxAccum @maxAmount = 0;\n MinAccum @minAmount = 100000;\n\n MaxAccum @@maxSenderAmount = 0;\n MinAccum @@minReceiverAmount = 100000;\n SumAccum @@bCnt = 0;\n SumAccum @@aCnt = 0;\n\n S = SELECT b\n FROM (a:Account) - [e:transfer] -> (b:Account)\n WHERE NOT a.isBlocked\n ACCUM a.@maxAmount += e.amount, //sender max amount\n b.@minAmount += e.amount, //receiver min amount\n @@edgeCnt +=1\n POST-ACCUM (a) @@maxSenderAmount += a.@maxAmount\n POST-ACCUM (b) @@minReceiverAmount += b.@minAmount\n POST-ACCUM (a) @@aCnt +=1\n POST-ACCUM (b) @@bCnt +=1 ;\n\n PRINT @@maxSenderAmount, @@minReceiverAmount;\n PRINT @@edgeCnt, @@aCnt, @@bCnt;\n\n}\n\nINSTALL QUERY q8\n\n\nRUN QUERY q8()\n```\n\nWhen you reference a vertex alias in a `POST-ACCUM` statement, you bind that vertex alias to the `POST-ACCUM` clause implicitly. You can also explicitly bind a vertex alias with a `POST-ACCUM` clause by putting the vertex alias in parentheses immediately after the keyword `POST-ACCUM`. 
Each `POST-ACCUM` clause must be bound with one and only one vertex alias.\n\nIn query q8(), we have multiple `POST-ACCUM` clauses, each looping over one selected vertex set.\n\n- `POST-ACCUM (a) @@maxSenderAmount += a.@maxAmount`: In this statement, we loop through the vertex set \"a\", accessing the aggregate result value `a.@maxAmount` from the `ACCUM` clause. We can write the same statement by removing \u201c(a)\u201d: `POST-ACCUM @@maxSenderAmount += a.@maxAmount`. The compiler will infer that the `POST-ACCUM` is looping over \u201ca\u201d.\n\n- `POST-ACCUM (b) @@minReceiverAmount += b.@minAmount`: In this statement, we loop through the vertex set \u201cb\u201d, accessing the aggregate result value `b.@minAmount`.\n\n- `POST-ACCUM (a) @@aCnt +=1`: In this statement, we loop through the vertex set \u201ca\u201d; for each distinct \u201ca\u201d, we increment `@@aCnt`.\n\n- `POST-ACCUM (b) @@bCnt +=1`: In this statement, we loop through the vertex set \u201cb\u201d; for each distinct \u201cb\u201d, we increment `@@bCnt`.\n\nNote that you can only access one vertex alias in a `POST-ACCUM`. The example below is not allowed, as it references two vertex aliases (a, b) in `a.@maxAmount` and `b.@maxAmount`, respectively. \n\n```python\n### Example of Incorrect Code \u274c\nPOST-ACCUM @@maxSenderAmount += a.@maxAmount + b.@maxAmount;\n```\n[Go back to top](#top)\n\n---\n### Edge Attached Accumulator\n\nSimilar to attaching accumulators to a vertex, you can attach primitive accumulators to an edge instance. \n\nExample. \n\n```python\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE QUERY q9 (string acctName) SYNTAX v3 {\n\n  OrAccum EDGE @visited;\n\n  v = SELECT b\n      FROM (a:Account {name: acctName})-[e:transfer]->(b:Account)\n      ACCUM e.@visited += TRUE;\n\n  v = SELECT b\n      FROM (a:Account)-[e:transfer]->(b:Account)\n      WHERE NOT e.@visited;\n\n  //output each v and their static attribute and runtime accumulators' state\n  PRINT v;\n\n}\n\n//edge-attached accumulators are only supported on a single node, or in single-node mode in a distributed environment\ninstall query -single q9\nrun query q9(\"Jenny\")\n```\n[Go back to top](#top)\n\n---\n\n## Vertex Set Variables And Accumulators As Composition Tools\n\n**Query Composition** means that one query block's computation result can be used as input to another query block. \n\nUsers can use two methods to achieve query composition. \n\n### Using Vertex Set Variables\n\nA GSQL query consists of a sequence of query blocks. Each query block produces a vertex set variable. In top-down syntax order, a subsequent query block's `FROM` clause pattern can refer to a prior query block's vertex set variable, thus achieving query block composition. \n\nAt a high level, within the query body braces, you can define a sequence of connected or unconnected query blocks to make up the query body. Below is the skeleton of a query body.\n\n```python\nCREATE OR REPLACE DISTRIBUTED QUERY q (/* parameters */) SYNTAX V3 {\n // Query body\n\n V1= Query_Block_1;\n\n\n V2= Query_Block_2;\n\n\n V3= Query_Block_3;\n\n .\n .\n .\n\n V_n = Query_Block_n;\n\n PRINT V_i;\n}\n```\n\nA typical GSQL query follows a top-down sequence of query blocks. Each query block generates a vertex set, which can be used by subsequent query blocks to drive pattern matching. For example, \nquery q10 below achieves query composition via the tgtAccnts vertex set variable: the first SELECT query block computes this variable, and the second SELECT query block uses the variable in its `FROM` clause. 
\n\n```python\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE DISTRIBUTED QUERY q10() SYNTAX V3 {\n\n  SumAccum<INT> @cnt = 0;\n\n  //for each blocked account, find its 1-hop neighbors that are not blocked.\n  tgtAccnts = SELECT y\n              FROM (x:Account)- [e:transfer] -> (y:Account)\n              WHERE x.isBlocked == TRUE AND y.isBlocked == FALSE\n              ACCUM y.@cnt +=1;\n\n  // the tgtAccnts vertex set drives the query block below. It's placed in the vertex label position.\n  tgtPhones = SELECT z\n              FROM (x:tgtAccnts) ~ [e:hasPhone] ~ (z:Phone)\n              WHERE z.isBlocked\n              ACCUM z.@cnt +=1;\n\n  PRINT tgtPhones;\n}\n\nINSTALL QUERY q10\n\nRUN QUERY q10()\n```\n[Go back to top](#top)\n\n---\n### Using Accumulators\n \nRecall that vertex-attached accumulators can be accessed in a query block. Across query blocks, if the same vertex is accessed, its vertex-attached accumulator (a.k.a. local accumulator) can be treated as a runtime attribute of the vertex:\neach query block accesses the latest value of each vertex's local accumulator, thus achieving composition. \n\nA global accumulator maintains a global state; it can be accessed within a query block, or at the same level as a query block. \nFor example, in query q11 below, the first query block accumulates 1 into each `y`'s `@cnt` accumulator and increments the global accumulator `@@cnt`. In the second query block's `WHERE` clause, we use the `@cnt` and `@@cnt` accumulators, thus achieving composition. \n\n```python\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE DISTRIBUTED QUERY q11() SYNTAX V3 {\n\n  SumAccum<INT> @cnt = 0;\n  SumAccum<INT> @@cnt = 0;\n\n  //for each blocked account, find its 1-hop neighbors that are not blocked.\n  tgtAccnts = SELECT y\n              FROM (x:Account)- [e:transfer] -> (y:Account)\n              WHERE x.isBlocked == TRUE AND y.isBlocked == FALSE\n              ACCUM y.@cnt +=1, @@cnt +=1;\n\n  // the tgtAccnts vertex set drives the query block below\n  tgtPhones = SELECT z\n              FROM (x:tgtAccnts)- [e:hasPhone] - (z:Phone)\n              WHERE z.isBlocked AND x.@cnt >1 AND @@cnt>0\n              ACCUM z.@cnt +=1;\n\n  PRINT tgtPhones;\n}\n\n\nINSTALL QUERY q11\n\n\nRUN QUERY q11()\n```\n\n[Go back to top](#top)\n\n---\n## Flow Control\n\n### IF Statement\nThe `IF` statement provides conditional branching: execute a block of statements only if a given condition is true. The `IF` statement allows for zero or more `ELSE-IF` clauses, followed by an optional `ELSE` clause. It is always closed by the `END` keyword.\n\nThe `IF` statement can appear within a query block `ACCUM` or `POST-ACCUM` clause, or at top-statement level-- the same level as the `SELECT` query block.\n\n**Syntax** \n```python\nIF condition1 THEN statement(s)\n  ELSE IF condition2 THEN statement(s)\n  ...\n  ELSE statement(s)\nEND\n```\n**Example**\n```python\nUSE GRAPH financialGraph\nCREATE OR REPLACE QUERY IfElseTest () SYNTAX V3 {\n\n  SumAccum<INT> @@isBlocked;\n  SumAccum<INT> @@unBlocked;\n  SumAccum<INT> @@others;\n\n  S1 = SELECT a\n       FROM (a:Account)\n       ACCUM\n         IF a.isBlocked THEN @@isBlocked += 1\n         ELSE IF NOT a.isBlocked THEN @@unBlocked += 1\n         ELSE @@others += 1\n         END;\n\n  PRINT @@isBlocked, @@unBlocked, @@others;\n\n  STRING drink = \"Juice\";\n  SumAccum<INT> @@calories = 0;\n\n  //if-else. Top-statement level. 
Each statement\n //needs to end by a semicolon, including the \u201cEND\u201d.\n\n IF drink == \"Juice\" THEN @@calories += 50;\n ELSE IF drink == \"Soda\" THEN @@calories += 120;\n ELSE @@calories = 0; // Optional else-clause\n END;\n // Since drink = \"Juice\", 50 will be added to calories\n\n PRINT @@calories;\n}\n\nINSTALL QUERY IfElseTest\n\nRUN QUERY IfElseTest()\n```\n[Go back to top](#top)\n\n---\n\n### WHILE Statement\nThe `WHILE` statement provides unbounded iteration over a block of statements. `WHILE` statements can be used at query block level or top-statement level.\n\nThe `WHILE` statement iterates over its body until the condition evaluates to false or until the iteration limit is met. A condition is any expression that evaluates to a boolean. The condition is evaluated before each iteration. `CONTINUE` statements can be used to change the control flow within the while block. `BREAK` statements can be used to exit the while loop.\n\nA `WHILE` statement may have an optional `LIMIT` clause. The `LIMIT` clauses have a constant positive integer value or integer variable to constrain the maximum number of loop iterations.\n\nThe `WHILE` statement can appear within a query block `ACCUM` or `POST-ACCUM` clause, or at top-statement level-- the same level as the `SELECT` query block.\n\n**Syntax** \n```python\nWHILE condition (LIMIT maxIter)? DO\n statement(s)\nEND\n```\n**Example**\n```python\nUSE GRAPH financialGraph\nCREATE OR REPLACE QUERY WhileTest (VERTEX seed) SYNTAX V3 {\n //mark if a node has been seen\n OrAccum @visited;\n //empty vertex set var\n reachable_vertices = {};\n //declare a visited_vertices var, annotated its type\n //as ANY type so that it can take any vertex\n visited_vertices (ANY) = {seed};\n\n // Loop terminates when all neighbors are visited\n WHILE visited_vertices.size() !=0 DO\n //s is all neighbors of visited_vertices\n //which have not been visited\n visited_vertices = SELECT s\n FROM (:visited_vertices)-[:transfer]->(s)\n WHERE s.@visited == FALSE\n POST-ACCUM\n s.@visited = TRUE;\n\n reachable_vertices = reachable_vertices UNION visited_vertices;\n END;\n\n PRINT reachable_vertices;\n\n //reset vertex set variables\n reachable_vertices = {};\n visited_vertices (ANY) = {seed};\n\n\n //clear the visited flag\n S1 = SELECT s\n FROM (s:Account)\n ACCUM s.@visited = FALSE;\n\n // Loop terminates when condition met or reach 2 iterations\n WHILE visited_vertices.size() !=0 LIMIT 2 DO\n visited_vertices = SELECT s\n FROM (:visited_vertices)-[:transfer]-> (s)\n WHERE s.@visited == FALSE\n POST-ACCUM\n s.@visited = TRUE;\n\n reachable_vertices = reachable_vertices UNION visited_vertices;\n END;\n\n PRINT reachable_vertices;\n}\n\n\nINSTALL QUERY WhileTest\n\nRUN QUERY WhileTest(\"Scott\")\n```\n\n[Go back to top](#top)\n\n---\n### FOREACH Statement\nThe `FOREACH` statement provides bounded iteration over a block of statements.\n\nThe `FOREACH` statement can appear within a query block `ACCUM` or `POST-ACCUM` clause, or at top-statement level-- the same level as the `SELECT` query block.\n\n**Syntax** \n```python\nFOREACH loop_var IN rangeExpr DO\n statements\nEND\n\n//loop_var and rangExpr can be the following forms\nname IN setBagExpr\n(key, value) pair IN setBagExpr // because it\u2019s a Map\nname IN RANGE [ expr, expr ]\nname IN RANGE [ expr, expr ].STEP ( expr )\n```\n**Example**\n```python\nUSE GRAPH financialGraph\nCREATE OR REPLACE QUERY ForeachTest ( ) SYNTAX V3 {\n\n ListAccum @@listVar = [1, 2, 3];\n SetAccum @@setVar = (1, 2, 3);\n BagAccum 
@@bagVar = (1, 2, 3);\n\n SetAccum @@set1;\n SetAccum @@set2;\n SetAccum @@set3;\n\n #FOREACH item in collection accumlators variables\n S = SELECT tgt\n FROM (s:Account) -[e:transfer]-> (tgt)\n ACCUM\n @@listVar += e.amount,\n @@setVar += e.amount,\n @@bagVar += e.amount;\n\n PRINT @@listVar, @@setVar, @@bagVar;\n\n //loop element in a list\n FOREACH i IN @@listVar DO\n @@set1 += i;\n END;\n\n //loop element in a set\n FOREACH i IN @@setVar DO\n @@set2 += i;\n END;\n\n //loop element in a bag\n FOREACH i IN @@bagVar DO\n @@set3 += i;\n END;\n\n PRINT @@set1, @@set2, @@set3;\n\n //show step of loop var\n ListAccum @@st;\n FOREACH k IN RANGE[-1,4].STEP(2) DO\n @@st += k;\n END;\n\n PRINT @@st;\n\n ListAccum @@t;\n\n //nested loop:\n // outer loop iterates 0, 1, 2.\n // inner loop iterates 0 to i\n FOREACH i IN RANGE[0, 2] DO\n @@t += i;\n S = SELECT s\n FROM (s:Account)\n WHERE s.name ==\"Scott\"\n ACCUM\n FOREACH j IN RANGE[0, i] DO\n @@t += j\n END;\n END;\n PRINT @@t;\n\n MapAccum @@mapVar, @@mapVarResult;\n S = SELECT s\n FROM (s:Account)\n WHERE s.name ==\"Scott\" OR s.name == \"Jennie\"\n ACCUM @@mapVar += (s.name -> s.isBlocked);\n\n //loop (k,v) pairs of a map\n FOREACH (keyI, valueJ) IN @@mapVar DO\n @@mapVarResult += (keyI -> valueJ);\n END;\n\n PRINT @@mapVar, @@mapVarResult;\n\n}\n\nINSTALL QUERY ForeachTest\n\nRUN QUERY ForeachTest()\n```\n[Go back to top](#top)\n\n---\n### CONTINUE and BREAK Statement\nThe `CONTINUE` and `BREAK` statements can only be used within a block of a `WHILE` or `FOREACH` statement. The `CONTINUE` statement branches control flow to the end of the loop, skipping any remaining statements in the current iteration, and proceeding to the next iteration. That is, everything in the loop block after the `CONTINUE` statement will be skipped, and then the loop will continue as normal. The `BREAK` statement branches control flow out of the loop, i.e., it will exit the loop and stop iteration.\n\n**Example** \n```python\nUSE GRAPH financialGraph\nCREATE OR REPLACE QUERY ContinueAndBreakTest ( ) {\n\n //output: 1, 3\n INT i = 0;\n WHILE (i < 3) DO\n i = i + 1;\n IF (i == 2) THEN\n CONTINUE; //go directly to WHILE condition\n END;\n PRINT i;\n END;\n\n //output: 1\n i = 0;\n WHILE (i < 3) DO\n i = i + 1;\n IF (i == 2) THEN\n Break; //jump out of the WHILE loop\n END;\n PRINT i;\n END;\n\n}\n\nINSTALL QUERY ContinueAndBreakTest\n\nRUN QUERY ContinueAndBreakTest()\n```\n[Go back to top](#top)\n\n---\n### CASE WHEN Statement\n\nOne `CASE` statement contains one or more `WHEN-THEN` clauses, each `WHEN` presenting one expression. The `CASE` statement may also have one `ELSE` clause whose statements are executed if none of the preceding conditions are true.\n\nThe `CASE` statement can be used in two different syntaxes: One equivalent to an `IF-ELSE` statement, and the other equivalent to a switch statement.\n\nThe `IF-ELSE` version evaluates the boolean condition within each `WHEN` clause and executes the first block of statements whose condition is true. The optional concluding `ELSE` clause is executed only if all `WHEN` clause conditions are false.\n\nThe switch version evaluates the expression following the keyword `WHEN` and compares its value to the expression immediately following the keyword `CASE`. These expressions do not need to be boolean; the `CASE` statement compares pairs of expressions to see if their values are equal. 
The first `WHEN-THEN` clause to have an expression value equal to the `CASE` expression value is executed; the remaining clauses are skipped. The optional `ELSE` clause is executed only if no `WHEN` clause expression has a value matching the `CASE` value.\n\nThe `CASE` statement can appear within a query block `ACCUM` or `POST-ACCUM` clause, or at top-statement level-- the same level as the `SELECT` query block.\n\n**Syntax** \n```python\n//if-else semantics\nCASE\n WHEN condition1 THEN statement(s)\n WHEN condition2 THEN statement(s)\n ...\n ELSE statement(s)\nEND\n\n//or switch semantics\nCASE expr\n WHEN constant1 THEN statement(s)\n WHEN constant2 THEN statement(s)\n ...\n ELSE statement(s)\nEND\n```\n**Example**\n```python\nUSE GRAPH financialGraph\nCREATE OR REPLACE QUERY CaseWhenTest () SYNTAX V3 {\n\n SumAccum<INT> @@isBlocked;\n SumAccum<INT> @@unBlocked;\n SumAccum<INT> @@others;\n\n SumAccum<INT> @@isBlocked2;\n SumAccum<INT> @@unBlocked2;\n SumAccum<INT> @@others2;\n\n\n //case-when in a query block\n S1 = SELECT a\n FROM (a:Account)\n ACCUM\n //if-else semantic: within query block, statement\n //does not need a semicolon to end.\n CASE\n WHEN a.isBlocked THEN @@isBlocked += 1\n WHEN NOT a.isBlocked THEN @@unBlocked += 1\n ELSE @@others += 1\n END;\n\n\n PRINT @@isBlocked, @@unBlocked, @@others;\n\n S2 = SELECT a\n FROM (a:Account)\n ACCUM\n //switch semantic: within query block, statement\n //does not need a semicolon to end.\n CASE a.isBlocked\n WHEN TRUE THEN @@isBlocked2 += 1\n WHEN FALSE THEN @@unBlocked2 += 1\n ELSE @@others2 += 1\n END;\n\n PRINT @@isBlocked2, @@unBlocked2, @@others2;\n\n STRING drink = \"Juice\";\n SumAccum<INT> @@calories = 0;\n\n //if-else version. Top-statement level. Each statement\n //needs to end by a semicolon, including the \u201cEND\u201d.\n CASE\n WHEN drink == \"Juice\" THEN @@calories += 50;\n WHEN drink == \"Soda\" THEN @@calories += 120;\n ELSE @@calories = 0; // Optional else-clause\n END;\n // Since drink = \"Juice\", 50 will be added to calories\n\n //switch version. Top-statement level. Each statement\n //needs to end by a semicolon, including the \u201cEND\u201d.\n CASE drink\n WHEN \"Juice\" THEN @@calories += 50;\n WHEN \"Soda\" THEN @@calories += 120;\n ELSE @@calories = 0; // Optional else-clause\n END;\n\n PRINT @@calories;\n}\n\nINSTALL QUERY CaseWhenTest\n\nRUN QUERY CaseWhenTest()\n```\n\n[Go back to top](#top)\n\n---\n## DML\n\n### Update Attribute \nYou can update a graph element's attributes in the ACCUM and POST-ACCUM clauses by directly assigning a new value to the attribute. \n\n**Example**\n```python\nuse graph financialGraph\n/*\n* Update graph element attribute by direct assignment.\n* Since GSQL stored procedures have snapshot semantics, 
the update will\n* only be seen after the query is fully executed.\n*\n*/\nCREATE OR REPLACE QUERY updateAttribute () SYNTAX v3 {\n\n v1 = SELECT a\n FROM (a:Account)-[e:transfer]->(b:Account)\n WHERE a.name = \"Scott\";\n\n PRINT v1;\n\n v2 = SELECT a\n FROM (a:Account)-[e:transfer]->(b:Account)\n WHERE a.name = \"Scott\"\n ACCUM e.amount = e.amount+1 //increment amount for each edge\n POST-ACCUM (a)\n //change isBlocked from FALSE to TRUE of \"Scott\" node\n CASE WHEN NOT a.isBlocked THEN a.isBlocked = TRUE END;\n}\n\n#compile and install the query as a stored procedure\ninstall query updateAttribute\n\n#run the query\nrun query updateAttribute()\n\n//check \"Scott\" isBlocked attribute value has been changed to \"TRUE\"\nselect a from (a:Account) where a.name = \"Scott\"\n\n//check \"Scott\" transfer edges' amount value has been incremented\nselect e from (a:Account)-[e:transfer]->(t) where a.name = \"Scott\"\n```\n\n### Insert Edge\nYou can use the `INSERT` statement in the ACCUM clause to insert edges. \n\n**Example**\n\n```python\nuse graph financialGraph\n\n/*\n* Insert an edge with an INSERT statement in ACCUM.\n* Since GSQL stored procedures have snapshot semantics, the update will\n* only be seen after the query is fully executed.\n*\n*/\n\nCREATE OR REPLACE QUERY insertEdge() SYNTAX v3 {\n\n DATETIME date = now();\n v1 = SELECT a\n FROM (a:Account)-[e:transfer]->()-[e2:transfer]->(t)\n WHERE a.name = \"Scott\"\n ACCUM\n INSERT INTO transfer VALUES (a.name, t.name, date, 10);\n\n}\n\n#compile and install the query as a stored procedure\ninstall query insertEdge\n\n#run the query\nrun query insertEdge()\n\n//see a new edge between \"Scott\" and \"Paul\" is inserted\nselect e from (a:Account)-[e:transfer]->(t) where a.name=\"Scott\"\n```\n\n[Go back to top](#top)\n\n### Delete Element\nYou can use the `DELETE` statement to delete graph elements.\n\n**Example**\n\n```python\nuse graph financialGraph\n\n/*\n* Delete graph elements with the DELETE statement.\n* Since GSQL stored procedures have snapshot semantics, the update will\n* only be seen after the query is fully executed.\n*\n*/\nCREATE OR REPLACE QUERY deleteElement() SYNTAX v3 {\n\n DELETE a FROM (a:Account)\n WHERE a.name = \"Scott\";\n\n DELETE e FROM (a:Account)-[e:transfer]->(t)\n WHERE a.name = \"Jenny\";\n}\n\n#compile and install the query as a stored procedure\ninstall query deleteElement\n\n#run the query\nrun query deleteElement()\n```\n\nAfter the above query is run, you can query the latest graph. \n\n```python\nselect s from (s:Account) where s.name = \"Scott\"\nselect s, t, e from (s:Account)-[e:transfer]-(t) where s.name = \"Jenny\"\n```\n\nYou can also use DELETE() to delete graph elements. \n\n```python\nuse graph financialGraph\n\n/*\n* Delete graph elements with DELETE().\n* Since GSQL stored procedures have snapshot semantics, the update will\n* only be seen after the query is fully executed.\n*\n*/\nCREATE OR REPLACE QUERY deleteElement2() SYNTAX v3 {\n\n v = SELECT a\n FROM (a:Account)\n WHERE a.name = \"Paul\"\n ACCUM DELETE(a); //delete a vertex\n\n v = SELECT a\n FROM (a:Account)-[e:transfer]-(t)\n WHERE a.name = \"Ed\"\n ACCUM DELETE(e); //delete matched edges\n}\n\n\n#compile and install the query as a stored procedure\ninstall query deleteElement2\n\n#run the query\nrun query deleteElement2()\n```\n\n[Go back to top](#top)\n\n---\n\n## Vertex Set Operators\n\n### Union\nThe `UNION` operator in GSQL is used to combine two or more sets into a single result set. It removes duplicate elements from the input sets. 
The input sets can be vertex sets of the same type or of different types.\n\n**Example**\n```python\nUSE GRAPH financialGraph\nCREATE OR REPLACE QUERY unionTest () SYNTAX V3 {\n S1 = SELECT s\n FROM (s:Phone)\n WHERE s.number == \"111\" OR s.number == \"222\";\n\n //show what's in S1\n PRINT S1[S1.number];\n\n S2 = SELECT s\n FROM (s:Phone)\n WHERE s.number == \"222\";\n\n //show what's in S2\n PRINT S2[S2.number];\n\n S3 = S1 UNION S2;\n\n //show what's in S3\n PRINT S3[S3.number];\n\n S4 = SELECT c\n FROM (c:City);\n\n S5 = S3 UNION S4;\n\n //show what's in S5\n PRINT S5[S5.number];\n\n}\n```\n[Go back to top](#top)\n\n---\n### Intersect\nThe `INTERSECT` operator in GSQL is used to return the common vertices between two vertex sets. It only returns the vertices that are present in both vertex sets.\n\n**Example**\n```python\nUSE GRAPH financialGraph\nCREATE OR REPLACE QUERY intersectTest () SYNTAX V3{\n S1 = SELECT s\n FROM (s:Phone)\n WHERE s.number == \"111\" OR s.number == \"222\";\n\n //show what's in S1\n PRINT S1[S1.number];\n\n S2 = SELECT s\n FROM (s:Phone)\n WHERE s.number == \"222\";\n\n //show what's in S2\n PRINT S2[S2.number];\n\n S3 = S1 INTERSECT S2;\n\n //show what's in S3\n PRINT S3[S3.number];\n\n}\n```\n[Go back to top](#top)\n\n---\n### Minus\nThe `MINUS` operator in GSQL is used to return the difference between two vertex sets. It essentially subtracts one vertex set from the other, returning only the vertices that are present in the first vertex set but not in the second.\n\n**Example**\n```python\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE QUERY minusTest () SYNTAX V3 {\n S1 = SELECT s\n FROM (s:Phone)\n WHERE s.number == \"111\" OR s.number == \"222\";\n\n //show what's in S1\n PRINT S1[S1.number];\n\n S2 = SELECT s\n FROM (s:Phone)\n WHERE s.number == \"222\";\n\n //show what's in S2\n PRINT S2[S2.number];\n\n S3 = S1 MINUS S2;\n\n //show what's in S3\n PRINT S3[S3.number];\n\n}\n```\n[Go back to top](#top)\n\n---\n## Vector Search\nTigerGraph has extended the vertex type to support vectors, enabling users to query both structured data (nodes and edges) and unstructured data (embedding attributes) in GSQL.\nFor more details on vector support, refer to [Vector Search](https://github.com/tigergraph/ecosys/blob/master/tutorials/VectorSearch.md).\n\n---\n## OpenCypher Query\n\nTigerGraph also supports OpenCypher. For more details on querying with OpenCypher, please refer to the [OpenCypher Tutorial](https://github.com/tigergraph/ecosys/blob/master/tutorials/Cypher.md).\n\n
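To give a flavor of what this looks like, the sketch below follows the same pattern as the examples in the linked tutorial: an OpenCypher query is created, installed as a stored procedure, and then run by name. The query name `cypherTransferSum` is illustrative only.\n\n```python\nUSE GRAPH financialGraph\n\n//illustrative query name; the pattern follows the OpenCypher tutorial examples\nCREATE OR REPLACE OPENCYPHER QUERY cypherTransferSum(string accntName) {\n  //match accounts that received a transfer from the given account,\n  //and sum the transferred amount per receiver\n  MATCH (a:Account {name: $accntName})-[e:transfer]->(b:Account)\n  RETURN b, sum(e.amount) AS totalTransfer\n}\n\ninstall query cypherTransferSum\nrun query cypherTransferSum(\"Scott\")\n```\n\n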
---\n## REST API For GSQL\n\nTigerGraph provides seamless interaction with the GSQL server through a comprehensive suite of [GSQL REST APIs](https://docs.tigergraph.com/gsql-ref/4.1/api/gsql-endpoints#_run_query). Below, we demonstrate how to invoke an installed stored procedure via a REST call, passing parameters using a JSON object.\n\n### Parameter JSON object\nTo pass query parameters by name with a JSON object, map the parameter names to their values in a JSON object enclosed in parentheses. Parameters that are not named in the JSON object will keep their default values for the execution of the query.\n\nFor example, if we have the following query:\n\n```python\nUSE GRAPH financialGraph\n\nCREATE QUERY greet_person(INT age = 3, STRING name = \"John\",\n DATETIME birthday = to_datetime(\"2019-02-19 19:19:19\"))\n{\n PRINT age, name, birthday;\n}\n\nINSTALL QUERY greet_person\nRUN QUERY greet_person( {\"name\": \"Emma\", \"age\": 21} )\n\n//During installation, you will see the generated REST endpoint for this query. You can call the query via the REST API.\n//Supplying the parameters with a JSON object will look like the following. The parameter birthday is not named in the parameter JSON object and therefore takes the default value\ncurl -u \"tigergraph:tigergraph\" -H 'Content-Type: application/json' -X POST 'http://127.0.0.1:14240/gsql/v1/queries/greet_person?graph=financialGraph' -d '{\"diagnose\":false,\"denseMode\":false,\"allvertex\":false,\"asyncRun\":false,\"parameters\":{\"name\":\"Emma\",\"age\":21}}' | jq .\n```\n\nThe above example uses \"username:password\" as the authentication method. There are also token-based authentication methods; please refer to [Enable REST Authentication](https://docs.tigergraph.com/tigergraph-server/4.1/user-access/enabling-user-authentication#_enable_restpp_authentication) \n\nAnother example-- find the shortest path between two vertices, and output one such path. \n\n```python\n\n/*\nThis algorithm finds and returns only the first full path between two vertices.\n\n Parameters:\n v_source: source vertex\n target_v: target vertex\n depth: maximum path length\n print_results: print JSON output\n */\nuse graph financialGraph\n\nCREATE OR REPLACE DISTRIBUTED QUERY first_shortest_path(VERTEX v_source, VERTEX target_v, INT depth =8, BOOL print_results = TRUE ) SYNTAX v3 {\n\n OrAccum @end_point, @visited, @@hit= FALSE;\n ListAccum<VERTEX> @path_list; // the first list of vertices out of many paths\n ListAccum<VERTEX> @@first_full_path;\n\n // 1. mark the target node as true\n endset = {target_v};\n endset = SELECT s\n FROM (s:endset)\n POST-ACCUM s.@end_point = true;\n\n // 2. start from the source node, save it to the path_list, and find all nodes connected to it\n Source = {v_source};\n Source = SELECT s\n FROM (s:Source)\n POST-ACCUM s.@path_list = s, s.@visited = true;\n\n WHILE Source.size() > 0 AND NOT @@hit LIMIT depth DO\n Source = SELECT t\n FROM (s:Source) -[e]-> (t)\n WHERE t.@visited == false\n ACCUM\n t.@path_list = s.@path_list\n POST-ACCUM s.@path_list.clear()\n POST-ACCUM t.@path_list += t,\n t.@visited = true,\n IF t.@end_point ==TRUE THEN\n @@first_full_path += t.@path_list,\n @@hit += TRUE\n END;\n END;\n\n // 3. return the final result\n IF print_results THEN\n PRINT @@first_full_path as path;\n END;\n}\n\ninstall query first_shortest_path\nRUN QUERY first_shortest_path( {\"v_source\": {\"id\": \"Scott\", \"type\": \"Account\"}, \"target_v\": {\"id\": \"Steven\", \"type\": \"Account\"}, \"depth\": 8, \"print_results\": true})\n\n//we can also use a JSON payload to pass in parameters via a REST API call. We need to specify the v_source.type and the target_v.type in the JSON payload. \n curl -u \"tigergraph:tigergraph\" -H 'Content-Type: application/json' -X POST 'http://127.0.0.1:14240/gsql/v1/queries/first_shortest_path?graph=financialGraph' -d '{\n \"diagnose\":false,\n \"denseMode\":false,\n \"allvertex\":false,\n \"asyncRun\":false,\n \"parameters\":{\n \"v_source\": \"Scott\",\n \"v_source.type\": \"Account\",\n \"target_v\": \"Steven\",\n \"target_v.type\": \"Account\",\n \"depth\": 8,\n \"print_results\": true\n }\n }' | jq .\n```\n\n## Virtual Edge\nIn a graph schema, vertex and edge types define the data model at design time. However, at query time, users often perform repetitive multi-step traversals between connected vertices, which can be cumbersome and inefficient. To address this, we are introducing the Virtual Edge feature\u2014 lightweight, in-memory edges dynamically created at query runtime, and discarded upon query completion. 
Virtual Edges simplify traversal and enable predicate application across non-adjacent vertices, significantly improving query efficiency and flexibility.\n\n\nFor example, in the above graph we create an in-memory \u201cshortcut\u201d edge \u201cFOF\u201d, so that we can bypass the interim node and find the second-hop neighbor in one hop. \n### Syntax\n```Python\nCREATE DIRECTED|UNDIRECTED VIRTUAL EDGE Virtual_Edge_Type_Name \"(\"\n FROM Vertex_Type_Name (\"|\" Vertex_Type_Name)* \",\"\n TO Vertex_Type_Name (\"|\" Vertex_Type_Name)*\n [\",\" attribute_name type [DEFAULT default_value]]* \")\"\n```\n\n### Example\nCurrently, to use a virtual edge, you must:\n\n- Use the \"DISTRIBUTED\" keyword in the query definition; such a query can run on both a single-node machine and a multi-node cluster. \n- Use Syntax v2 of GSQL.\n- Note that virtual edges can also be used on a TigerGraph Cloud Savanna read-only workspace.\n\n```Python\n#enter the graph\nUSE GRAPH financialGraph\n\n# create a query\nCREATE OR REPLACE DISTRIBUTED QUERY VirtualEdgeQuery () SYNTAX v2 {\n\n \n // First we create a virtual edge type\n CREATE DIRECTED VIRTUAL EDGE VirtualE1(FROM City, TO Phone, ts datetime);\n\n\n //Insert a virtual edge for each connected Phone and City pair. \n v = SELECT c\n FROM Phone:b - (hasPhone)- Account -(isLocatedIn>)-City:c\n ACCUM INSERT INTO VirtualE1 VALUES(c, b, to_datetime(\"2025-02-13\"));\n\n\n ListAccum<STRING> @@result;\n // traverse the virtual edges (shortcuts) computed by the prior query block,\n // store them in a ListAccum, and output them at the end.\n v = SELECT p\n FROM City:c -(VirtualE1>)- Phone:p\n ACCUM @@result += c.name + \"->\" + to_string(p.number);\n\n //output all virtual edges\n PRINT @@result;\n\n}\n\ninstall query VirtualEdgeQuery\nrun query VirtualEdgeQuery()\n\n```\n\nTo see more about how to use this feature, refer to [Virtual Edge](https://docs.tigergraph.com/gsql-ref/4.1/querying/data-types#_virtual_edge)\n\n[Go back to top](#top)\n\n---\n## Query Tuning And Debug\n### Batch Processing to Avoid OOM\nSometimes, you start with a set of vertices, referred to as the Seed set. Each vertex in the Seed set will traverse the graph, performing the same operation. If this process consumes too much memory, a divide-and-conquer approach can help prevent out-of-memory errors.\n\nIn the example below, we partition the Seed set into 1000 batches. To select the vertices for each batch, we use the condition `getvid(s) % 1000 == batch_num`. This groups vertices based on their remainder when divided by 1000.\n\n```python\nCREATE OR REPLACE QUERY BatchCount (INT batch_num) SYNTAX v3 {\n SumAccum<INT> @@count;\n batch1 = SELECT s\n FROM (s:Account)\n WHERE getvid(s) % 1000 == batch_num; //only select vertices whose internal id mod 1000 equals batch_num\n\n // 1000 is how many batches you will have. You can adjust the batch number to balance performance and memory usage\n tmp = SELECT a1\n FROM (a1:batch1)-[:transfer]->(b1:Account)-[:transfer]->(a2:Account)-[:transfer]->(b2:batch1)\n WHERE a1.name != a2.name AND b1.name != b2.name\n ACCUM @@count +=1;\n \n PRINT @@count;\n}\n```\nYou can use a shell script to invoke the above query with each batch id. \n\n```bash\n#!/bin/bash\n\n# Loop from 0 to 999\nfor i in {0..999}\ndo\n # Execute the curl command with the current batch_num\n curl -X GET -H \"GSQL-TIMEOUT: 500000\" \"http://127.0.0.1:9000/query/financialGraph/BatchCount?batch_num=$i\"\ndone\n```\n[Go back to top](#top)\n\n---\n### Debug Using PRINT Statement\n\nWe have shown many examples in this document using PRINT. 
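For example, the short query below (a minimal sketch against the sample financialGraph; the query name `PrintDebugExample` is just for illustration) shows three PRINT forms used throughout this tutorial: printing a whole vertex set, printing a selected attribute of a vertex set, and printing a global accumulator under an alias.\n\n```python\nUSE GRAPH financialGraph\n\n//illustrative query for debugging with PRINT\nCREATE OR REPLACE QUERY PrintDebugExample() SYNTAX V3 {\n  SumAccum<INT> @@edgeCnt;\n\n  S = SELECT s\n      FROM (s:Account)-[e:transfer]->(t:Account)\n      WHERE NOT s.isBlocked\n      ACCUM @@edgeCnt += 1;\n\n  PRINT S;                          //print the full vertex set in JSON format\n  PRINT S[S.name];                  //print only the name attribute of each matched vertex\n  PRINT @@edgeCnt AS transferEdges; //print a global accumulator under an alias\n}\n\nINSTALL QUERY PrintDebugExample\n\nRUN QUERY PrintDebugExample()\n```\n\n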
You can refer to the official document on [PRINT](https://docs.tigergraph.com/gsql-ref/4.2/querying/output-statements-and-file-objects#_print_statement_api_v2).\n\n[Go back to top](#top)\n\n---\n\n\n### Debug Using LOG Statement\nThe LOG statement is another method for outputting debug data. It functions as a command that writes information to a log file. The statement first evaluates the boolean condition, and if it is true, it outputs the remaining expressions to the log file. The syntax is as follows:\n\n```python\nlogStmt := LOG \"(\" condition \",\" expression* \")\"\n```\n\nE.g., \n```python\nLOG(true, \"hello world\", 1+2); //it will print \"hello world\" and \"3\" to the compute engine log.\n```\nThe `LOG` statement can appear within any query block or as a standalone statement in the query body. Here is an example:\n\n```python\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE QUERY logTest (VERTEX seed) SYNTAX V3 {\n //mark if a node has been seen\n OrAccum @visited;\n //empty vertex set var\n reachable_vertices = {};\n //declare a visited_vertices var, annotated its type\n //as ANY type so that it can take any vertex\n visited_vertices = {seed};\n\n // Loop terminates when all neighbors are visited\n WHILE visited_vertices.size() !=0 DO\n //s is all neighbors of visited_vertices\n //which have not been visited\n visited_vertices = SELECT s\n FROM (:visited_vertices)-[:transfer]->(s)\n WHERE s.@visited == FALSE\n POST-ACCUM\n s.@visited = TRUE, log(true, \"s.@visited\", s, s.@visited);//log statement in the POST-ACCUM clause\n\n log (true, \"while loop\", visited_vertices.size()); //a standalone log statement \n\n reachable_vertices = reachable_vertices UNION visited_vertices;\n END;\n\n\n PRINT reachable_vertices;\n}\n\n\nINSTALL QUERY logTest\n\nRUN QUERY logTest(\"Scott\")\n```\n\nAfter you add log statement, you can check the log. Under bash command line, find the log location.\n\n```python\n gadmin log gpe #find the location of the compute engine log\n```\n\nYou may see \n\n```python\nGPE : /home/tigergraph/tigergraph/log/gpe/GPE_1#1.out\nGPE : /home/tigergraph/tigergraph/log/gpe/log.INFO\n```\nNext, you can use vim or any bash command line editor to open the log to search the log.INFO file. E.g., search \"while loop\" in our example. \n\n```python\nvim /home/tigergraph/tigergraph/log/gpe/log.INFO\n```\n\n[Go back to top](#top)\n\n---\n\n## Explore Catalog\n\n### Global Scope vs. Graph Scope\nA database uses Data Definition Language (DDL) to create and modify schema objects. Schema objects include vertex types, edge types, graph types, etc. These schema objects reside in the metadata store, known as the Catalog.\nEach schema object is visible within a scope. In a graph database, there are two scopes:\n\n- ***Global Scope***: This is the default scope for schema objects. By default, all objects created using the \"CREATE\" DDL statement belong to the global scope.\n- ***Graph Scope***: Each graph has its own scope. A schema change job can be used to add schema objects (vertex or edge types) to a specific graph\u2019s scope.\n\n\n\nAs illustrated in above figure, we can use `CREATE` statements to create the `Account`, `City`, and `Phone` vertex schema object, and the `isLocatedIn`, `hasPhone`, `Transfer`, and `Transfer_Reverse` edge schema object, and the `financialGraph` graph schema object. They are all visible in the Global scope.\n\nTo enter the global scope, type the `use global` command in the GSQL shell. 
Next, use the `ls` command to list all the schema objects under this scope.\n\n```python\n> USE GLOBAL\n> ls\n```\n\nThe figure above shows that the `financialGraph` is composed of global scope schema objects such as `Account`, `City`, `Phone`, `isLocatedIn`, and `hasPhone`. The `privateGraph` also uses the global schema object `Phone`. Thus, both the `financialGraph` and the `privateGraph` share the `Phone` schema object. Additionally, the `privateGraph` has its own private schema objects\u2014 the `Loan` and `Person` vertex objects.\n\nTo enter a graph scope, type the `USE GRAPH graphName` command in the GSQL shell. Then, use the `ls` command to list all the schema objects under this graph scope.\n\n```python\n> USE GRAPH financialGraph\n> ls\n```\n\nTo see how to make schema changes at the global or graph level, please refer to [Modify a Graph Schema](https://docs.tigergraph.com/gsql-ref/4.1/ddl-and-loading/modifying-a-graph-schema)\n\n---\n### SHOW - View Parts of the Catalog\nThe `SHOW` command can be used to show specific catalog objects, instead of manually filtering through the entire scope when using the `ls` command. You can either type the exact identifier or use regular expressions / Linux globbing to search.\n\nThe syntax is \n\n```python\nSHOW GRAPH | VERTEX | EDGE | QUERY | PACKAGE [exact_name | glob_pattern | -r regular_expression]\n```\n\n`SHOW GRAPH graphName` lists vertices and edges without giving their properties. `SHOW VERTEX vertexName` and `SHOW EDGE edgeName` list the properties of the desired vertex or edge respectively. `SHOW PACKAGE packageName` lists the packages that are available, such as the packaged template queries. \n\nThis feature supports the `?` and `*` Linux globbing operators, and also regular expression matching. Usage of the feature is limited to the scope of the graph the user is currently in -- if you are using a particular graph, you will not be able to see vertices that are not included in that graph.\n\nBelow are some examples to inspect objects in the catalog. \n\n```python\n//show what's in financialGraph scope\nuse graph financialGraph\nls\n\n//list what's in global scope\nuse GLOBAL\nls\n\n//show vertex types\nSHOW VERTEX Acc* //shows all vertex types that start with the letters \"Acc\"\nSHOW VERTEX Ac?*t //shows the vertex types that start with \"Ac\" and end with \"t\"\nSHOW VERTEX ????? //shows all vertex types whose names are 5 letters long\n\n\n//show query c1 content\nUSE GRAPH financialGraph\nLS\nSHOW QUERY c1\n\n```\n---\n# Experimental Features\nWe also provide a relational TABLE construct and related table operators such as join. These experimental features are available starting from 4.1.2. They work in *single machine* and *compiled mode* only.\n\n## Table \nIn GSQL, TABLE is used to define intermediate or temporary tables that store query results during execution. These tables are not persistent and exist only within the scope of a query. They help structure and organize data before producing the final result.\n\n### SELECT INTO TABLE statement\nThe `SELECT INTO TABLE` statement in GSQL is used to retrieve data and store it into a new table. This allows you to perform queries and store results for further operations or transformations.\n\n#### Syntax\n\n```python\nSELECT column1 AS alias1, column2 AS alias2, ... INTO newTable\nFROM pattern \n[WHERE condition] \n[HAVING condition] \n[ORDER BY column1 [ASC|DESC], column2 [ASC|DESC], ...] 
\n[LIMIT number] \n[OFFSET number]\n;\n```\n- `column1, column2, ...`: Specifies the columns to retrieve.\n- `AS alias1, AS alias2...`: specifies column alias of the selected column. \n- `newTable`: The name of the new table where the query results will be stored.\n- `pattern`: Defines the data pattern, which can be a node, relationship, or a linear path.\n- `WHERE condition`: Optional. Used to filter rows that satisfies a specific condition.\n- `HAVING condition`: Optional. Used to filter the aggregated results.\n- `ORDER BY`: Optional. Specifies how the results should be sorted (either `ASC` for ascending or `DESC` for descending).\n- `LIMIT`: Optional. Limits the number of rows returned.\n- `OFFSET`: Optional. Specifies the number of rows to skip.\n\n#### Example\n\n```python\nuse graph financialGraph\n\nCREATE OR REPLACE QUERY selectExample2() SYNTAX v3 {\n SELECT s.name AS acct, SUM(e.amount) AS totalAmt INTO T1\n FROM (s:Account)- [e:transfer]-> (t:Account)\n WHERE not s.isBlocked\n HAVING totalAmt > 1000\n ORDER BY totalAmt DESC\n LIMIT 5 OFFSET 0\n ;\n\n PRINT T1;\n}\n\ninstall query selectExample2\nrun query selectExample2()\n```\n#### Explanation\n\n- In this example, we are retrieving data from the `Account` nodes `s` and `t` connected by the `transfer` relationship.\n- The query aggregates the `amount` of the transfer and stores the result into a new table `T1`.\n- The `HAVING` clause filters the results to only include those with a `totalAmt` greater than 1000, and the results are ordered by `totalAmt` in descending order, limiting the output to the top 5 entries.\n- Finally, the content of table `T1` is printed.\n \n[Go back to top](#top)\n\n---\n## Init Table Statement\nThe `INIT` statement in GSQL is used to initialize a table row containing constant values. This allows you to create a table with predefined constant values, which can be used for further operations or queries.\n\n#### Syntax\n\n```python\nINIT tableName value1 AS column1, value2 AS column2, ...;\n```\n- `tableName`: The name of the table to be initialized.\n- `value1, value2, ...`: The constant values to be assigned to the table.\n- `column1, column2, ...`: The names of the columns in the table that correspond to the values.\n\n---\n\n#### Example\n\n```python\nuse graph financialGraph\n\nCREATE OR REPLACE QUERY initExample(int intVal = 10) syntax v3{\n // Initialize a table with different constant values.\n INIT T1 1 as col_1, true as col_2, [0.1, -1.1] as col_3;\n PRINT T1;\n\n // Initialize a table with variables and function calls\n DATETIME date = now();\n INIT T2 date as col_1, intVal as col_2, SPLIT(\"a,b,c\", \",\") as col_3;\n PRINT T2;\n\n}\n\ninstall query initExample\nrun query initExample()\n```\n\n#### Explanation\n\n- **Table Initialization with Constant Values**: The first `INIT` statement creates table `T1` with three columns (`col_1`, `col_2`, `col_3`). The values `1`, `true`, and `[0.1, -1.1]` are assigned to `col_1`, `col_2`, and `col_3`\n- **Table Initialization with Variables**: The second `INIT` statement initializes table `T2`. It uses the current date and time (`now()`) for `col_1`, the input variable `intVal` for `col_2`, and splits a string `\"a,b,c\"` into an array and assigns it to `col_3`.\n\n[Go back to top](#top)\n\n---\n## Order Table Statement\nThe `ORDER` statement in GSQL is used to sort tables based on one or more columns, with the optional `LIMIT` and `OFFSET` clauses. 
This allows efficient ordering and retrieval of a subset of data.\n\n#### Syntax\n\n```python\nORDER tableName BY column1 [ASC|DESC], column2 [ASC|DESC] ... LIMIT number OFFSET number;\n```\n- `tableName`: The table to be ordered.\n- `column1, column2, ...`: Columns to sort by.\n- `ASC | DESC`: Sorting order (ascending by default).\n- `LIMIT number`: Restricts the number of rows in the result.\n- `OFFSET number`: Skips a number of rows before returning results.\n\n---\n\n#### Example\n\n```python\nuse graph financialGraph\n\nCREATE OR REPLACE QUERY orderExample(INT page=1) syntax v3{\n\n SELECT s.name as acct, max(e.amount) as maxTransferAmt INTO T1\n FROM (s:Account)- [e:transfer]-> (t:Account)\n ;\n\n ORDER T1 BY maxTransferAmt DESC, acct LIMIT 3 OFFSET 1 * page;\n\n PRINT T1;\n}\n\ninstall query orderExample\nrun query orderExample()\n```\n\n#### Explanation\n - Selects account names and their maximum transfer amounts.\n - Sorts by `maxTransferAmt` (descending) and `acct` (ascending).\n - Returns 3 rows per page, skipping the first `page` rows for pagination.\n\n[Go back to top](#top)\n\n---\n## Filter Table Statement\nThe `FILTER` statement applies a boolean condition to the specified table, keeping only the rows for which the condition is true. Filters are applied sequentially, so each subsequent filter operates on the results of the previous one.\n\n#### Syntax\n\n```python\nFILTER tableName ON condition;\n```\n\n#### Example\n\n```python\nuse graph financialGraph\n\nCREATE OR REPLACE QUERY filterExample() SYNTAX v3 {\n SELECT s.name as srcAccount, e.amount as amt, t.name as tgtAccount INTO T\n FROM (s:Account) - [e:transfer]-> (t)\n ;\n\n FILTER T ON srcAccount == \"Scott\" OR amt > 10000;\n\n PRINT T;\n\n FILTER T ON srcAccount != \"Scott\";\n\n PRINT T;\n}\n\ninstall query filterExample\nrun query filterExample()\n```\n\n#### Explanation\n\n- The first `FILTER` statement retains rows where `srcAccount` is \"Scott\" or `amt` is greater than `10000`.\n- The second `FILTER` statement removes rows where `srcAccount` is \"Scott\".\n- The `PRINT` statements display the intermediate and final results after filtering.\n\n[Go back to top](#top)\n\n---\n## Project Table Statement\n\nThe `PROJECT` statement reshapes a table by creating new derived columns based on selected expressions. These columns can result from arithmetic operations, string concatenation, or logical conditions. The `PROJECT` statement is particularly useful for preparing data for further analysis without altering the original table.\n\n#### Syntax\n\n **1. Transforming Table Data**\n ```python\nPROJECT tableName ON\n columnExpression AS newColumnName,\n ...\nINTO newTableName;\n```\n\n **2. 
Extracting Vertex Sets**\n \n Converts a table column containing vertex objects into a vertex set.\n \n ```python\n PROJECT tableName ON VERTEX COLUMN vertexColumnName INTO VSET;\n ```\n\n\n#### Example 1: Transforming Table Data\n\n```python\nuse graph financialGraph\n\nCREATE OR REPLACE QUERY projectExample() syntax v3{\n SELECT s.name as srcAccount, p.number as phoneNumber, sum(e.amount) as amt INTO T1\n FROM (s:Account {name: \"Scott\"}) - [e:transfer]-> (t),\n (s) - [:hasPhone]- (p);\n\n PRINT T1;\n\n PROJECT T1 ON\n T1.srcAccount + \":\" + T1.phoneNumber as acct,\n T1.amt * 2 as doubleAmt,\n T1.amt % 7 as mod7Amt,\n T1.amt > 10000 as flag\n INTO T2;\n\n PRINT T2;\n}\n\ninstall query projectExample\nrun query projectExample()\n```\n\n#### Explanation\nThe `PROJECT` statement transforms the data by adding new calculated columns:\n\n - `acct`: Concatenates the account name (`srcAccount`) with the phone number (`phoneNumber`) into a single string.\n - `doubleAmt`: Doubles the value of `amt`.\n - `mod7Amt`: Computes the remainder when `amt` is divided by 7.\n - `flag`: Creates a boolean flag indicating whether `amt` is greater than `10,000`.\n\nThe `PROJECT` statement does not modify the original table (`T1`) but instead creates a new table (`T2`) with the transformed data. This ensures that the original data remains unchanged for future use.\n\n#### Example 2: Extracting Vertex Sets from a Table\n\n```python\nuse graph financialGraph\n\nCREATE OR REPLACE QUERY projectExample2() syntax v3{\n SELECT tgt as tgtAcct, phone as tgtPhone INTO T1\n FROM (s:Account {name: \"Scott\"}) - [e:transfer]-> (tgt:Account) - [:hasPhone] - (phone);\n\n PRINT T1;\n\n PROJECT T1 ON VERTEX COLUMN\n tgtAcct INTO vSet1,\n tgtPhone INTO vSet2\n ;\n\n VS_1 = SELECT s FROM (s:vSet1);\n VS_2 = SELECT s FROM (s:vSet2);\n\n PRINT VS_1, VS_2;\n}\n\ninstall query projectExample2\nrun query projectExample2()\n```\n**Explanation**\n\nWe first create an intermediate table (`T1`), which contains `tgtAcct` (target account) and `tgtPhone` (phone number linked to the account).\nNext, we extract vertex sets using the `PROJECT tableName ON VERTEX COLUMN` syntax.\n\n - `PROJECT T1 ON VERTEX COLUMN tgtAcct INTO vSet1`: Extracts `tgtAcct` vertices into `vSet1`.\n - `PROJECT T1 ON VERTEX COLUMN tgtPhone INTO vSet2`: Extracts `tgtPhone` vertices into `vSet2`.\n\n[Go back to top](#top)\n\n---\n## Join Statement\nIn GSQL queries, the `JOIN` operation is commonly used to combine data from multiple tables (or nodes and relationships). Depending on the specific requirements, different types of `JOIN` operations are used. Common `JOIN` types include `INNER JOIN`, `CROSS JOIN`, `SEMIJOIN`, and `LEFT JOIN`.\n\n#### INNER JOIN\n\nThe `INNER JOIN` statement combines rows from both tables where the join condition is true. 
Only rows that have matching values in both tables are returned.\n\n**Syntax:**\n```python\nJOIN table1_alias WITH table2_alias\n ON condition\nPROJECT \n\texpression as columnName1, \n\texpression as columnName2,\n\t...\nINTO newTableName;\n```\n\n**Example**\n```python\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE QUERY innerJoinExample(STRING accountName = \"Scott\") syntax v3{\n SELECT s.name as srcAccount, sum(e.amount) as amt INTO T1\n FROM (s:Account {name: accountName}) - [e:transfer]-> (t);\n\n SELECT s.name, t.number as phoneNumber INTO T2\n FROM (s:Account) - [:hasPhone]- (t:Phone);\n\n JOIN T1 t1 WITH T2 t2\n ON t1.srcAccount == t2.name\n PROJECT\n t1.srcAccount + \":\" + t2.phoneNumber as acct,\n t1.amt as totalAmt\n INTO T3;\n\n PRINT T3;\n}\n\ninstall query innerJoinExample\nrun query innerJoinExample()\n```\n**Explanation:**\n\n- **`INNER JOIN`** combines data from `T1` and `T2` based on matching `srcAccount` and `name`. Only rows with a match in both tables are returned, which results in a joined set of data containing the account name and the total transfer amount.\n\n---\n#### CROSS JOIN\n\nThe `CROSS JOIN` statement combines each row from the first table with all rows from the second table, producing the Cartesian product. This type of join does not require a condition and can potentially result in a large number of rows. If you want to eliminate duplicate rows from the result, you can use the `DISTINCT` keyword to return only unique combinations.\n\n**Syntax:**\n```python\nJOIN table1_alias WITH table2_alias\nPROJECT \n\texpression as columnName1, \n\texpression as columnName2,\n\t...\nINTO newTableName;\n```\n\n**Example**\n```python\nuse graph financialGraph\n\nCREATE OR REPLACE QUERY crossJoinExample(STRING accountName = \"Scott\") syntax v3{\n SELECT s.name as srcAccount, sum(e.amount) as amt INTO T1\n FROM (s:Account {name: accountName}) - [e:transfer]-> (t);\n\n SELECT s.name, t.number as phoneNumber INTO T2\n FROM (s:Account) - [:hasPhone]- (t:Phone);\n\n JOIN T1 t1 WITH T2 t2\n PROJECT distinct\n t1.srcAccount + \":\" + t2.phoneNumber as acct,\n t1.amt as totalAmt\n INTO T3;\n\n PRINT T3;\n}\n\ninstall query crossJoinExample\nrun query crossJoinExample()\n```\n**Explanation**\n\n- **`CROSS JOIN`** produces a Cartesian product between `T1` and `T2`. In this example, every `srcAccount` will be paired with every phone number from `T2`, resulting in all combinations of accounts and phone numbers.\n- **`DISTINCT`** is used to remove any duplicate combinations from the result. Without `DISTINCT`, you might get repeated rows if there are multiple matching rows in `T2` for each row in `T1`.\n---\n\n#### SEMIJOIN\n\nThe `SEMIJOIN` statement filters rows from the first table based on whether they have a matching row in the second table. It returns rows from the first table where the join condition is true, but **only columns from the left (first) table can be accessed**. 
The right table's columns are not included in the result.\n\n**Syntax:**\n```python\nSEMIJOIN table1_alias WITH table2_alias\n ON condition\n PROJECT \n\texpression as columnName1,\n\texpression as columnName2,\n\t...\nINTO newTableName;\n```\n\n**Example Usage:**\n```python\nuse graph financialGraph\n\nCREATE OR REPLACE QUERY semiJoinExample(STRING accountName = \"Scott\") syntax v3{\n SELECT s.name as srcAccount, sum(e.amount) as amt INTO T1\n FROM (s:Account) - [e:transfer]-> (t);\n\n SELECT s.name, t.number as phoneNumber INTO T2\n FROM (s:Account {name: accountName}) - [:hasPhone]- (t:Phone);\n\n SEMIJOIN T1 t1 WITH T2 t2\n ON t1.srcAccount == t2.name\n PROJECT\n t1.srcAccount as acct,\n t1.amt as totalAmt\n INTO T3;\n\n PRINT T3;\n}\n\ninstall query semiJoinExample\nrun query semiJoinExample()\n```\n\n**Explanation**\n\n- **`SEMIJOIN`** returns rows from `T1` where there is a matching row in `T2`, but only the columns from `T1` are included in the result. Even though there is a match between the two tables on `srcAccount` and `name`, **the result only includes columns from the left table (`T1`)**. This is useful when you want to check for the existence of matching rows without including data from the second table.\n\n---\n\n#### LEFT JOIN\n\nThe `LEFT JOIN` statement combines rows from both tables, but ensures that all rows from the left table (first table) are included, even if there is no matching row in the right table (second table). If no match exists, the right table's columns will have `NULL` values.\n\n**Syntax:**\n\n```python\nLEFT JOIN table1_alias WITH table2_alias\n ON condition\nPROJECT \n\texpression as columnName1, \n\texpression as columnName2,\n\t...\nINTO newTableName;\n```\n\n**Example Usage:**\n\n```python\nuse graph financialGraph\n\nCREATE OR REPLACE QUERY leftJoinExample(STRING accountName = \"Scott\") syntax v3{\n SELECT s.name as srcAccount, sum(e.amount) as amt INTO T1\n FROM (s:Account) - [e:transfer]-> (t);\n\n SELECT s.name, t.number as phoneNumber INTO T2\n FROM (s:Account) - [:hasPhone]- (t:Phone) ;\n\n LEFT JOIN T1 t1 WITH T2 t2\n ON t1.srcAccount == t2.name\n PROJECT\n t1.srcAccount as acct,\n t2.phoneNumber as phoneNum,\n t1.amt as totalAmt\n INTO T3;\n\n PRINT T3;\n}\n\ninstall query leftJoinExample\nrun query leftJoinExample()\n```\n\n**Explanation**\n\n- **`LEFT JOIN`** returns all rows from `T1` (the left table), even if there is no matching row in `T2` (the right table). If no match is found, the columns from `T2` will be filled with `NULL`. In this example, even accounts without a phone number will appear in the result, with `phoneNumber` as `NULL`.\n\n[Go back to top](#top)\n\n---\n## Union Statement\nThe `UNION` statement in GSQL combines the results of two compatible tables into a new table, ensuring **no duplicate rows** in the output by default. This operation is useful when merging data sets from different sources or when performing set operations on table results.\n\n#### Syntax\n\n```python\nUNION table1 WITH table2 [WITH table3 ...] INTO newTable;\n```\n- `table1`, `table2`, ...: Input tables that need to be combined.\n- `newTable`: The resulting table that stores the merged data.\n- The input tables **must have the same schema** (i.e., the same number of columns with matching data types).\n- Important Note: After UNION, the original tables (table1, table2, etc.) 
are destroyed and cannot be used in subsequent query operations.\n\n#### Example \n```python\nuse graph financialGraph\n\nCREATE OR REPLACE QUERY unionExample(STRING accountName = \"Scott\") syntax v3{\n // Select accounts that transferred money\n SELECT s as acct INTO T1\n FROM (s:Account {name: accountName}) - [e:transfer]-> (t);\n\n // Select accounts by name\n SELECT s as acct INTO T2\n FROM (s:Account {name: accountName})\n ;\n\n // Combine both results into table T3\n UNION T1 WITH T2 INTO T3;\n\n PRINT T3;\n}\n\ninstall query unionExample\nrun query unionExample()\n```\n\n#### Explanation\n\n**Selecting Data Into Temporary Tables**\n\n - `T1`: Selects accounts that have transferred money.\n - `T2`: Selects accounts that match the given name.\n\n**Performing the UNION Operation**\n `UNION T1 WITH T2 INTO T3;`\uff1aMerges results from `T1` and `T2`, removing duplicates.\n\n**Printing the Final Result**\n `PRINT T3;` outputs the combined dataset.\n \n[Go back to top](#top)\n\n---\n\n## Union All Statement\nThe `UNION ALL` statement functions similarly to `UNION`, but **does not remove duplicate rows**. This operation is useful when preserving all records from input tables, even if they are identical.\n\n#### Syntax\n\n```python\nUNION ALL table1 WITH table2 [WITH table3 ...] INTO newTable;\n```\n\n#### Example Usage:\n\n```python\nuse graph financialGraph\n\nCREATE OR REPLACE QUERY unionAllExample(STRING accountName = \"Scott\") syntax v3{\n SELECT s as acct INTO T1\n FROM (s:Account {name: accountName}) - [e:transfer]-> (t);\n\n SELECT s as acct INTO T2\n FROM (s:Account {name: accountName})\n ;\n\n // Combine both results into table T3, keeping all duplicate rows\n UNION ALL T1 WITH T2 INTO T3;\n\n PRINT T3;\n}\n\ninstall query unionAllExample\nrun query unionAllExample()\n```\n\n#### Explanation:\n\nUsing `UNION ALL` can improve performance when duplicate elimination is unnecessary, as it avoids the extra computation required to filter out duplicates.\n\n[Go back to top](#top)\n\n---\n## Unwind Statement\nThe `UNWIND` statement in GSQL is used to expand a list into multiple rows, allowing iteration over list elements and their combination with other tables. This is particularly useful when applying transformations or computations based on a set of values.\n\nThere are two main forms of `UNWIND`:\n\n1. **Expanding a fixed list** to initialize a new table.\n2. **Expanding a list column per row** in an existing table.\n\n#### Syntax\n\n**Expanding a Fixed List (UNWIND INIT)**\n\n```python\nUNWIND [value1, value2, ...] 
AS columnName INTO newTable;\n```\n\n- `[value1, value2, ...]`: A list of values to be expanded.\n- `columnName`: The alias for each value in the generated table.\n- `newTable`: The resulting table that stores the expanded rows.\n\n**Expanding a List Per Row (UNWIND A TABLE)**\n\n```python\nUNWIND tableName ON list AS columnName INTO newTable;\n```\n\n- `tableName`: The input table whose rows will be expanded.\n- `list`: A list to be expanded for each row of `tableName`.\n- `columnName`: The alias for each value in the expanded list.\n- `newTable`: The resulting table that stores expanded rows while preserving columns from `tableName`.\n\n#### Expanding a fixed list to init a new table Example:\n\n```python\nuse graph financialGraph\n\nCREATE OR REPLACE QUERY unwindExample() syntax v3{\n\n // Creates table `T1`, where each value from the list `[0.9, 1.0, 1.1]` is inserted as a separate row under the column `ratio`.\n UNWIND [0.9, 1.0, 1.1] AS ratio INTO T1;\n PRINT T1;\n\n SELECT s.name as acct, sum(e.amount) as totalAmt INTO T2\n FROM (s:Account)- [e:transfer]-> (t:Account)\n WHERE s.isBlocked\n ;\n\n // 1. Joins `T1` (containing ratios) with `T2` (containing account transfer sums).\n // 2. Computes a new column \"resAmt = totalAmt * ratio\" to adjust transfer amounts based on different ratios.\n // 3. Stores the result in new table T3.\n JOIN T1 t1 WITH T2 t2\n PROJECT t2.acct as acct, t1.ratio as ratio, t2.totalAmt * t1.ratio as resAmt\n INTO T3;\n\n PRINT T3;\n}\n\ninstall query unwindExample\nrun query unwindExample()\n```\n#### Explanation\n\n- The list `[0.9, 1.0, 1.1]` is expanded **independently** into a new table (`T1`).\n- Later, `T1` is **joined** with `T2` to apply multipliers to `totalAmt`.\n\n\n#### Expanding a list for each row in an existing table Example:\n\n```python\nuse graph financialGraph\n\nSET opencypher_mode = true #unwind needs to set openCypher mode\nCREATE OR REPLACE QUERY unwindExample2() syntax v3{\n\n SELECT s.name as acct, [0.9, 1.0, 1.1] as ratioList, sum(e.amount) as totalAmt INTO T1\n FROM (s:Account)- [e:transfer]-> (t:Account)\n WHERE s.isBlocked\n ;\n\n UNWIND T1 ON ratioList AS ratio INTO T2;\n\n PRINT T2;\n}\n\ninstall query unwindExample2\nrun query unwindExample2()\n```\n#### Explanation:\n\n- Instead of creating a separate table first (`T1`), the list column `ratioList` is **expanded per row of `T1`** directly into `T2`.\n- The columns from `T1` (like `acct` and `totalAmt`) are preserved in `T2`, with additional rows for each `ratio`.\n\n[Go back to top](#top)\n\n---\n# Support \nIf you like the tutorial and want to explore more, join the GSQL developer community at \n\nhttps://community.tigergraph.com/\n\n[Go back to top](#top)\n\n---\n\n# Contact\nTo contact us for commercial support and purchase, please email us at [info@tigergraph.com](mailto:info@tigergraph.com)\n\n[Go back to top](#top)\n\n# References\nThe following academic papers have more technical depth for interested readers. 
\n\n[1] [Aggregation Support for Modern Graph Analytics in TigerGraph](https://dl.acm.org/doi/pdf/10.1145/3318464.3386144), in [SIGMOD 2020 proceedings](https://sigmod2020.org/).\n\n[2] [Graph Pattern Matching in GQL and SQL/PGQ](https://arxiv.org/pdf/2112.06217), in [SIGMOD 2022 proceedings](https://sigmod2022.org/).\n\n[3] [The LDBC Social Network Benchmark: Business Intelligence Workload](https://www.vldb.org/pvldb/vol16/p877-szarnyas.pdf), in [VLDB 2022 proceedings](https://vldb.org/2022/).\n\n[4] [PG-Schema: Schemas for Property Graphs](https://arxiv.org/pdf/2211.10962), in [SIGMOD 2023 proceedings](https://2023.sigmod.org/).\n\n[5] [Chasing Parallelism in Aggregating Graph Queries](https://drops.dagstuhl.de/storage/01oasics/oasics-vol119-tannens-festschrift/OASIcs.Tannen.5/OASIcs.Tannen.5.pdf), in [Tannen's Festschrift 2024](https://dblp.org/db/conf/birthday/tannen2024.html#Deutsch24).\n\n[6] [TigerVector: Supporting Vector Search in Graph Databases for Advanced RAGs](https://arxiv.org/pdf/2501.11216), in [SIGMOD 2025 proceedings](https://2025.sigmod.org/).\n\n[7] [Product Manual](https://docs.tigergraph.com/gsql-ref/current/intro/). \n\n[Go back to top](#top)\n\n"}
{"doc_id": "CypherTutorial", "doc_type": "markdown", "content": "\n# Introduction \n\nThis OpenCypher tutorial provides a hands-on introduction to new users. The software program is the TigerGraph comprehensive environment for designing graph schemas, loading and managing data to build a graph, and querying the graph to perform data analysis\n\nOpenCypher syntax emphasizes ASCII art in its syntax.\n\nA more exhaustive description of functionality and behavior of OpenCypher is available from the [OpenCypher Language Reference](https://opencypher.org/).\n\nTo follow this tutorial, install the TigerGraph Docker image (configured with 8 CPUs and 20 GB of RAM or at minimum 4 CPUs and 16 GB of RAM) or set up a Linux instance with Bash access. Download our free [Community Edition](https://dl.tigergraph.com/) to get started.\n\n\n# Table of Contents\n\n- [Sample Graph](#sample-graph-for-tutorial)\n- [Setup Environment](#setup-Environment)\n- [Setup Schema (model)](#setup-schema)\n- [Load Data](#load-data)\n- [Cypher Syntax Overview](#cypher-syntax-overview)\n- [Query Examples](#query-examples)\n - [Node Pattern](#node-pattern)\n - [Edge Pattern](#edge-pattern)\n - [Path Pattern](#path-pattern)\n - [Optional Match](#optional-match)\n - [With Clause](#with-clause)\n - [Sorting and Limiting Results](#sorting-and-limiting-results)\n - [Working With List](#working-with-list)\n - [Combining MATCH Pattern Results](#combining-match-pattern-results)\n - [Conditional Logic](#conditional-logic)\n - [Aggregate Functions](#aggregate-functions)\n - [Other Expression Functions](#other-expression-functions)\n - [CRUD Statements](#crud-statements)\n - [Support](#support)\n - [Contact](#contact)\n\n---\n# Sample Graph For Tutorial\nThis graph is a simplifed version of a real-world financial transaction graph. There are 5 _Account_ vertices, with 8 _transfer_ edges between Accounts. An account may be associated with a _City_ and a _Phone_.\nThe use case is to analyze which other accounts are connected to 'blocked' accounts.\n\n\n\n# Setup Environment \n\nIf you have your own machine (including Windows and Mac laptops), the easiest way to run TigerGraph is to install it as a Docker image. Download [Community Edition Docker Image](https://dl.tigergraph.com/). Follow the [Docker setup instructions](https://github.com/tigergraph/ecosys/blob/master/demos/guru_scripts/docker/README.md) to set up the environment on your machine.\n\n**Note**: TigerGraph does not currently support the ARM architecture and relies on Rosetta to emulate x86 instructions. For production environments, we recommend using an x86-based system.\nFor optimal performance, configure your Docker environment with **8 CPUs and 20+ GB** of memory. If your laptop has limited resources, the minimum recommended configuration is **4 CPUs and 16 GB** of memory.\n\nAfter installing TigerGraph, the `gadmin` command-line tool is automatically included, enabling you to easily start or stop services directly from your bash terminal.\n```python\n docker load -i ./tigergraph-4.2.0-alpha-community-docker-image.tar.gz # the xxx.gz file name are what you have downloaded. Change the gz file name depending on what you have downloaded\n docker images #find image id\n docker run -d -p 14240:14240 --name mySandbox imageId #start a container, name it \u201cmySandbox\u201d using the image id you see from previous command\n docker exec -it mySandbox /bin/bash #start a shell on this container. 
\n gadmin start all #start all tigergraph component services\n gadmin status #should see all services are up.\n```\n\nFor the impatient, load the sample data from the tutorial/gsql folder and run your first query.\n```python\n cd tutorial/gsql/ \n gsql 00_schema.gsql #setup sample schema in catalog\n gsql 01_load.gsql #load sample data \n gsql #launch gsql shell\n GSQL> use graph financialGraph #enter sample graph\n GSQL> ls #see the catalog content\n GSQL> select a from (a:Account) #query Account vertex\n GSQL> select s, e, t from (s:Account)-[e:transfer]->(t:Account) limit 2 #query edge\n GSQL> select count(*) from (s:Account) #query Account node count\n GSQL> select s, t, sum(e.amount) as transfer_amt from (s:Account)-[e:transfer]->(t:Account) # query s->t transfer ammount\n GSQL> exit #quit the gsql shell \n```\n\nYou can also access the GraphStudio visual IDE directly through your browser:\n```python\n http://localhost:14240/\n```\n\nA login page will automatically open. Use the default credentials: user is `tigergraph`, password is `tigergraph`. \nOnce logged in, click the GraphStudio icon. Assuming you've set up the tutorial schema and loaded the data, navigate by selecting `Global View`, then choose `financialGraph` from the pop up menu. Click Explore Graph to start interacting with your data visually.\n\nTo further explore the features of GraphStudio, you can view these concise introductory [videos](https://www.youtube.com/watch?v=29PCZEhyx8M&list=PLq4l3NnrSRp7RfZqrtsievDjpSV8lHhe-), and [product manual](https://docs.tigergraph.com/gui/4.2/intro/). \n\nThe following command is good for operation.\n\n```python\n#To stop the server, you can use\n gadmin stop all\n#Check `gadmin status` to verify if the gsql service is running, then use the following command to reset (clear) the database.\n gsql 'drop all'\n```\n\n**Note that**, our fully managed service -- [TigerGraph Savanna](https://savanna.tgcloud.io/) is entirely GUI-based and does not provide access to a bash shell. To execute the GSQL examples in this tutorial, simply copy the query into the Savanna GSQL editor and click Run.\n\nAdditionally, all Cypher examples referenced in this tutorial can be found in your TigerGraph tutorials/cypher folder.\n\n[Go back to top](#top)\n\n---\n# Setup Schema \nWe use an artificial financial schema and dataset as a running example to demonstrate the usability of graph searches. The figure above provides a visualization of all the graph data in the database.\n\nCopy [00_schema.gsql](./gsql/00_schema.gsql) to your container. \nNext, run the following in your container's bash command line. \n```\ngsql 00_schema.gsql\n```\nAs seen below, the declarative DDL create vertex and edge types. Vertex type requires a `PRIMARY KEY`. Edge types requires a `FROM` and `TO` vertex types as the key. We allow edges of the same type share endpoints. In such case, a `DISCRIMINATOR` attribute is needed to differentiate edges sharing the same endpoints. `REVERSE_EDGE` specifies a twin edge type excep the direction is reversed. 
\n\n```python\n//install gds functions\nimport package gds\ninstall function gds.**\n\n//create vertex types\nCREATE VERTEX Account ( name STRING PRIMARY KEY, isBlocked BOOL)\nCREATE VERTEX City ( name STRING PRIMARY KEY)\nCREATE VERTEX Phone (number STRING PRIMARY KEY, isBlocked BOOL)\n\n//create edge types\nCREATE DIRECTED EDGE transfer (FROM Account, TO Account, DISCRIMINATOR(date DATETIME), amount UINT) WITH REVERSE_EDGE=\"transfer_reverse\"\nCREATE UNDIRECTED EDGE hasPhone (FROM Account, TO Phone)\nCREATE DIRECTED EDGE isLocatedIn (FROM Account, TO City)\n\n//create graph; * means include all graph element types in the graph.\nCREATE GRAPH financialGraph (*)\n```\n\n[Go back to top](#top)\n\n---\n\n# Load Data \n\nYou can choose one of the following methods. \n\n- Load sample data from our publicly accessible s3 bucket \n \n Copy [01_load.gsql](./gsql/01_load.gsql) to your container. \n Next, run the following in your container's bash command line. \n ```\n gsql 01_load.gsql\n ```\n or in GSQL Shell editor, copy the content of [01_load.gsql](./gsql/01_load.gsql), and paste it into the GSQL shell editor to run.\n \n- Load from local file in your container\n - Copy the following data files to your container.\n - [account.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/account.csv)\n - [phone.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/phone.csv)\n - [city.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/city.csv)\n - [hasPhone.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/hasPhone.csv)\n - [locate.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/locate.csv)\n - [transfer.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/transfer.csv)\n\n - Copy [25_load.gsql](./gsql/25_load.gsql) to your container. Modify the script with your local file path. Next, run the following in your container's bash command line. \n ```\n gsql 25_load2.gsql\n ``` \n or in GSQL Shell editor, copy the content of [25_load.gsql](./script/25_load.gsql), and paste in GSQL shell editor to run.\n\n The declarative loading script is self-explanatory. You define the filename alias for each data source, and use the the LOAD statement to map the data source to the target schema elements-- vertex types, edge types, and vector attributes.\n ```python\n USE GRAPH financialGraph\n\n DROP JOB load_local_file\n\n //load from local file\n CREATE LOADING JOB load_local_file {\n // define the location of the source files; each file path is assigned a filename variable. \n DEFINE FILENAME account=\"/home/tigergraph/data/account.csv\";\n DEFINE FILENAME phone=\"/home/tigergraph/data/phone.csv\";\n DEFINE FILENAME city=\"/home/tigergraph/data/city.csv\";\n DEFINE FILENAME hasPhone=\"/home/tigergraph/data/hasPhone.csv\";\n DEFINE FILENAME locatedIn=\"/home/tigergraph/data/locate.csv\";\n DEFINE FILENAME transferdata=\"/home/tigergraph/data/transfer.csv\";\n //define the mapping from the source file to the target graph element type. The mapping is specified by VALUES clause. 
\n LOAD account TO VERTEX Account VALUES ($\"name\", gsql_to_bool(gsql_trim($\"isBlocked\"))) USING header=\"true\", separator=\",\";\n LOAD phone TO VERTEX Phone VALUES ($\"number\", gsql_to_bool(gsql_trim($\"isBlocked\"))) USING header=\"true\", separator=\",\";\n LOAD city TO VERTEX City VALUES ($\"name\") USING header=\"true\", separator=\",\";\n LOAD hasPhone TO Edge hasPhone VALUES ($\"accnt\", gsql_trim($\"phone\")) USING header=\"true\", separator=\",\";\n LOAD locatedIn TO Edge isLocatedIn VALUES ($\"accnt\", gsql_trim($\"city\")) USING header=\"true\", separator=\",\";\n LOAD transferdata TO Edge transfer VALUES ($\"src\", $\"tgt\", $\"date\", $\"amount\") USING header=\"true\", separator=\",\";\n }\n\n run loading job load_local_file\n ```\n \n[Go back to top](#top)\n\n---\n\n# Cypher Syntax Overview\n\nOpenCypher is a declarative query language designed for interacting with graph databases. It enables the retrieval and manipulation of nodes, relationships, and their properties.\n\nThe core syntax of openCypher follows the MATCH-WHERE-RETURN pattern.\n\n- `MATCH` is used to specify graph patterns in an intuitive ASCII-art style, such as `()-[]->()-[]->()`. Here, `()` represents nodes, and `-[]->` represents relationships. By alternating nodes and relationships, users can define linear paths or complex patterns within the graph schema.\n- The results of a `MATCH` operation are stored in an implicit working table, where the columns correspond to the aliases of graph elements (nodes or relationships) in the declared pattern. These columns can then be referenced in subsequent clauses, including MATCH, OPTIONAL MATCH, WITH, or RETURN. Each subsequent clause can transform the invisible working table by projecting aways columns, adding new columns, and rows. \n\nIn the next section, we will explore Cypher syntax in detail through practical examples.\n\n# Query Examples \n\nIn OpenCypher, the main statement is a pattern match statement in the form of MATCH-WHERE-RETURN. Each MATCH statement will create or update an invisible working table. The working table consists all the alias (vertex/edge) and columns specified in the current and previous MATCH statements. Other statement will also work on the working table to drive the final result.\n\nWe will use examples to illustrate Cypher syntax. In TigerGraph, each Cypher query is installed as a stored procedure using a code generation technique for optimal performance, enabling repeated execution by its query name.\n\n---\n\n## Node Pattern\n### MATCH A Vertex Set \nCopy [c1.cypher](./cypher/c1.cypher) to your container. \n\n```python\n#enter the graph\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE OPENCYPHER QUERY c1() {\n // MATCH a node pattern-- symbolized by (),\n //\":Account\" is the label of the vertex type Account, \"a\" is a binding variable to the matched node. \n // return will print out all the bound Account vertices in JSON format.\n MATCH (a:Account)\n RETURN a\n}\n\n# To run the query, we need to install it first.\n# Compile and install the query as a stored procedure\ninstall query c1\n\n# run the compiled query\nrun query c1()\n```\nThe result is shown in [c1.out](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/cypher/c1.out) under `/home/tigergraph/tutorial/cypher/c1.out`\n\n[Go back to top](#top)\n\n### MATCH A Vertex Set With Filter\nCopy [c2.cypher](./cypher/c2.cypher) to your container. 
\n\n```python\n#enter the graph\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE OPENCYPHER QUERY c2() {\n // MATCH a node pattern-- symbolized by (),\n //\":Account\" is the label of the vertex type Account, \"a\" is a binding variable to the matched node. \n // WHERE clause specify a boolean condition to filter the matched Accounts. \n // return will print out all the bound Account vertices in JSOn format.\n MATCH (a:Account)\n WHERE a.name = \"Scott\"\n RETURN a\n}\n\n# To run the query, we need to install it first.\n# Compile and install the query as a stored procedure\ninstall query c2\n\n# run the compiled query\nrun query c2()\n```\nThe result is shown in [c2.out](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/cypher/c2.out) under `/home/tigergraph/tutorial/cypher/c2.out`\n\n[Go back to top](#top)\n\n---\n\n## Edge Pattern \n### MATCH 1-hop Edge Pattern\nCopy [c3.cypher](./cypher/c3.cypher) to your container. \n\n```python\nUSE GRAPH financialGraph\n\n# create a query\nCREATE OR REPLACE OPENCYPHER QUERY c3(string accntName) {\n\n // match an edge pattern-- symbolized by ()-[]->(), where () is node, -[]-> is a directed edge\n // In cypher, we use $param to denote the binding literal\n // {name: $acctName} is a JSON style filter. It's equivalent to \"a.name = $acctName\".\n // \":transfer\" is the label of the edge type \"transfer\". \"e\" is the alias of the matched edge.\n MATCH (a:Account {name: $accntName})-[e:transfer]->(b:Account)\n RETURN b, sum(e.amount) AS totalTransfer\n\n}\n\n# compile and install the query as a stored procedure\ninstall query c3\n\n# run the compiled query\nrun query c3(\"Scott\")\n```\nThe result is shown in [c3.out](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/cypher/c3.out) under `/home/tigergraph/tutorial/cypher/c3.out`\n\nCopy [c4.cypher](./cypher/c4.cypher) to your container. \n\n```python\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE OPENCYPHER QUERY c4() {\n\n //think the MATCH clause is a matched table with columns (a, e, b)\n //you can use SQL syntax to group by the source and target account, and sum the total transfer amount\n MATCH (a:Account)-[e:transfer]->(b:Account)\n RETURN a, b, sum(e.amount) AS transfer_total\n\n}\n\n#compile and install the query as a stored procedure\ninstall query c4\n\n#run the query\nrun query c4()\n```\nThe result is shown in [c4.out](https://github.com/tigergraph/ecosys/blob/master/tutorials/cypher/c4.out) under `/home/tigergraph/tutorial/cypher/c4.out` \n\n[Go back to top](#top)\n\n---\n\n## Path Pattern \n\n### Fixed Length Path Pattern\nCopy [c5.cypher](./cypher/c5.cypher) to your container. \n\n```python\nUSE GRAPH financialGraph\n\n// create a query\nCREATE OR REPLACE OPENCYPHER QUERY c5(datetime low, datetime high, string accntName) {\n\n // a path pattern in ascii art () -[]->()-[]->()\n MATCH (a:Account {name: $accntName})-[e:transfer]->()-[e2:transfer]->(b:Account)\n WHERE e.date >= $low AND e.date <= $high and e.amount >500 and e2.amount>500\n RETURN b.isBlocked, b.name \n \n}\n\n#compile and install the query as a stored procedure\ninstall query c5\n\n#run the query\nrun query c5(\"2024-01-01\", \"2024-12-31\", \"Scott\")\n```\n[Go back to top](#top)\n\n---\n\n### Variable Length Path Pattern\nCopy [c6.cypher](./cypher/c6.cypher) to your container. 
\n\n```python\nUSE GRAPH financialGraph\n\n// create a query\nCREATE OR REPLACE OPENCYPHER QUERY c6 (string accntName) {\n\n // a path pattern in ascii art () -[]->()-[]->()\n MATCH (a:Account {name: $accntName})-[:transfer*1..]->(b:Account)\n RETURN a, b \n\n}\n\n#compile and install the query as a stored procedure\ninstall query c6\n\n#run the query\nrun query c6(\"Scott\")\n```\n\nThe result is shown in [c6.out](https://github.com/tigergraph/ecosys/blob/master/tutorials/cypher/c6.out) under `/home/tigergraph/tutorial/cypher/c6.out` \n\nCopy [c7.cypher](./cypher/c7.cypher) to your container. \n\n```python\nUSE GRAPH financialGraph\n\n// create a query\nCREATE OR REPLACE OPENCYPHER QUERY c7(datetime low, datetime high, string accntName) {\n\n // below we use variable length path.\n // *1.. means 1 to more steps of the edge type \"transfer\"\n // select the reachable end point and bind it to vertex alias \"b\"\n // note:\n // 1. the path has \"shortest path\" semantics. If you have a path that is longer than the shortest,\n // we only count the shortest. E.g., scott to scott shortest path length is 4. Any path greater than 4 will\n // not be matched.\n // 2. we can not put an alias to bind the edge in the the variable length part -[:transfer*1..]->, but\n // we can bind the end points (a) and (b) in the variable length path, and group by on them.\n MATCH (a:Account {name: $accntName})-[:transfer*1..]->(b:Account)\n RETURN a, b, count(*) AS path_cnt \n}\n\ninstall query c7\n\nrun query c7(\"2024-01-01\", \"2024-12-31\", \"Scott\")\n```\n\nThe result is shown in [c7.out](https://github.com/tigergraph/ecosys/blob/master/tutorials/cypher/c7.out) under `/home/tigergraph/tutorial/cypher/c7.out` \n\n[Go back to top](#top)\n\n### Sum Distinct On 1-hop Within A Path\nPath pattern has multiple hops. To sum each hop's edge attributes, we need `DISTINCT` keyword. \nCopy [c8.cypher](./cypher/c8.cypher) to your container. \n\n```python\nUSE GRAPH financialGraph\n\n// create a query\nCREATE OR REPLACE OPENCYPHER QUERY c8 (datetime low, datetime high) {\n\n // a path pattern in ascii art () -[]->()-[]->()\n // think the FROM clause is a matched table with columns (a, e, b, e2, c)\n // you can use SQL syntax to group by on the matched table\n // Below query find 2-hop reachable account c from a, and group by the path a, b, c\n // find out how much each hop's total transfer amount.\n MATCH (a:Account)-[e:transfer]->(b)-[e2:transfer]->(c:Account)\n WHERE e.date >= $low AND e.date <= $high\n RETURN a, b, c, sum(DISTINCT e.amount) AS hop_1_sum, sum(DISTINCT e2.amount) AS hop_2_sum \n \n}\n\n#compile and install the query as a stored procedure\ninstall query c8\n\n#run the query\nrun query c8(\"2024-01-01\", \"2024-12-31\")\n```\n\nThe result is shown in [c8.out](https://github.com/tigergraph/ecosys/blob/master/tutorials/cypher/c8.out) under `/home/tigergraph/tutorial/cypher/c8.out` \n\n[Go back to top](#top)\n\n---\n\n## Optional Match\n`OPTIONAL MATCH` matches patterns against your graph, just like MATCH does. The difference is that if no matches are found, OPTIONAL MATCH will use a null for missing parts of the pattern.\n\nIn query c21, we first match `Account` whose name is $accntName. Next, we find if the matched `Account` satisfies the `OPTIONAL MATCH` clause. If not, we pad `null` on the `MATCH` clause produced match table row. If yes, we pad the `OPTIONAL MATCH` table to the `MATCH` clause matched row. \n\nCopy [c21.cypher](./cypher/c21.cypher) to your container. 
\n\n```python\nuse graph financialGraph\n\nCREATE OR REPLACE OPENCYPHER QUERY c21(String accntName){\n MATCH (srcAccount:Account {name: $accntName})\n OPTIONAL MATCH (srcAccount)- [e:transfer]-> (tgtAccount:Account)\n WHERE srcAccount.isBlocked\n RETURN srcAccount, tgtAccount\n}\n\ninstall query c21\nrun query c21(\"Jenny\")\n```\n\nThe result is shown in [c21.out](https://github.com/tigergraph/ecosys/blob/master/tutorials/cypher/c21.out) under `/home/tigergraph/tutorial/cypher/c21.out` \n\n---\n\n## With Clause\n\nThe WITH clause in Cypher is used to chain parts of a query, pass intermediate results to the next part, or perform transformations like aggregation. It acts as a way to manage query scope and handle intermediate data without exposing everything to the final result.\n\n### Key Uses:\n- **Filter intermediate results**: Apply conditions on data before proceeding.\n- **Aggregation**: Perform calculations and pass the results further.\n- **Variable scope management**: Avoid cluttering query scope by controlling what gets passed forward.\n\n### Filter intermediate result\n\nIn the example below, the `WITH a` passes the filtered `Account` (names starting with \"J\") to the next part of the query.\nThe `RETURN a.name` outputs the names.\n\nCopy [c9.cypher](./cypher/c9.cypher) to your container. \n\n```python\nUSE GRAPH financialGraph\n\n// create a query\nCREATE OR REPLACE OPENCYPHER QUERY c9() {\n\n MATCH (a:Account)\n WHERE a.name STARTS WITH \"J\"\n WITH a\n RETURN a.name\n}\n\ninstall query c9\n\nrun query c9()\n```\n\nThe result is shown in [c9.out](https://github.com/tigergraph/ecosys/blob/master/tutorials/cypher/c9.out) under `/home/tigergraph/tutorial/cypher/c9.out` \n\n[Go back to top](#top)\n\n---\n\n### Aggregation\nIn c10 query below, the `WITH a.isBlocked AS Blocked, COUNT(a) AS blocked_count` groups data by `isBlocked` and calculates the count of Account.\n`RETURN Blocked, blocked_count` outputs the aggregated results.\n\nCopy [c10.cypher](./cypher/c10.cypher) to your container. \n\n```python\nUSE GRAPH financialGraph\n\n// create a query\nCREATE OR REPLACE OPENCYPHER QUERY c10() {\n\n MATCH (a:Account)\n WITH a.isBlocked AS Blocked, COUNT(a) AS blocked_count\n RETURN Blocked, blocked_count\n\n}\n\ninstall query c10\n\nrun query c10()\n```\n\nThe result is shown in [c10.out](https://github.com/tigergraph/ecosys/blob/master/tutorials/cypher/c10.out) under `/home/tigergraph/tutorial/cypher/c10.out` \n\n[Go back to top](#top)\n\n---\n\n### Variable scope management\nIn query c11 below, the ` WITH a.name AS name` narrows the scope to only the name property.\n`WHERE name STARTS WITH \"J\"` filters names starting with 'J'. `RETURN name` outputs the filtered names.\n\nCopy [c11.cypher](./cypher/c11.cypher) to your container. \n\n```python\nUSE GRAPH financialGraph\n\n// create a query\nCREATE OR REPLACE OPENCYPHER QUERY c11() {\n\n MATCH (a:Account)\n WITH a.name AS name\n WHERE name STARTS WITH \"J\"\n RETURN name\n}\n\ninstall query c11\n\nrun query c11()\n```\n\nThe result is shown in [c11.out](https://github.com/tigergraph/ecosys/blob/master/tutorials/cypher/c11.out) under `/home/tigergraph/tutorial/cypher/c11.out` \n\n[Go back to top](#top)\n\n---\n\n## Sorting and Limiting Results\n`ORDER BY` is a sub-clause following `RETURN` or `WITH`, and it specifies that the output should be sorted and how. `SKIP` defines from which record to start including the records in the output. 
`LIMIT` constrains the number of records in the output.\n\nIn query c12 below, the sorting (`ORDER BY`), skipping (`SKIP`), and limiting (`LIMIT`) operations occur after the WITH clause, which means they are applied to the intermediate results.\n\nThe query first aggregates data by counting `tgt2` for each `srcAccountName` using `WITH`. Next, `ORDER BY` sorts the results by `tgt2Cnt` (descending) and `srcAccountName` (descending). `SKIP 1` skips the first record from the sorted intermediate result set. `LIMIT 3` restricts the output to the next 3 records.\n\nCopy [c12.cypher](./cypher/c12.cypher) to your container. \n\n```python\nUSE GRAPH financialGraph \n\nCREATE OR REPLACE OPENCYPHER QUERY c12(){ \n MATCH (src)-[e:transfer]-> (tgt1) \n MATCH (tgt1)-[e:transfer]-> (tgt2) \n WITH src.name AS srcAccountName, COUNT(tgt2) AS tgt2Cnt \n ORDER BY tgt2Cnt DESC, srcAccountName DESC \n SKIP 1 \n LIMIT 3 \n RETURN srcAccountName, tgt2Cnt \n} \n\ninstall query c12\n\nrun query c12()\n```\n\nThe result is shown in [c12.out](https://github.com/tigergraph/ecosys/blob/master/tutorials/cypher/c12.out) under `/home/tigergraph/tutorial/cypher/c12.out` \n\n[Go back to top](#top)\n\nIn query c13 below, the sorting (`ORDER BY`), skipping (`SKIP`), and limiting (`LIMIT`) operations are applied after the `RETURN` clause, meaning they act on the final result set. The final result is the same as c12, but the order of operations is different.\n\nCopy [c13.cypher](./cypher/c13.cypher) to your container. \n\n```python\nUSE GRAPH financialGraph \nCREATE OR REPLACE OPENCYPHER QUERY c13(){ \n MATCH (src)-[e:transfer]-> (tgt1) \n MATCH (tgt1)-[e:transfer]-> (tgt2) \n WITH src.name AS srcAccountName, COUNT(tgt2) AS tgt2Cnt \n RETURN srcAccountName, tgt2Cnt \n ORDER BY tgt2Cnt DESC, srcAccountName DESC \n SKIP 1 \n LIMIT 3 \n} \n\ninstall query c13\n\nrun query c13()\n```\n\nThe result is shown in [c13.out](https://github.com/tigergraph/ecosys/blob/master/tutorials/cypher/c13.out) under `/home/tigergraph/tutorial/cypher/c13.out` \n\n[Go back to top](#top)\n\n---\n\n## Working With List\n\n### UNWIND Clause\nIn Cypher, the `UNWIND` clause is used to transform a list into individual rows. It's helpful when you have a list of values and want to treat each value as a separate row in your query.\n\nIn query c14 below, the `UNWIND` is used to expand existing rows with each element of the list `[1, 2, 3]`. \nIn each expanded row, the variable `x` will hold each element of the list, allowing you to perform further operations on it in the subsequent parts of the query.\n\nCopy [c14.cypher](./cypher/c14.cypher) to your container. \n\n```python\nUSE GRAPH financialGraph\nCREATE OR REPLACE OPENCYPHER QUERY c14(){\n MATCH (src)-[e:transfer]-> (tgt1)\n WHERE src.name in [\"Jenny\", \"Paul\"]\n UNWIND [1, 2, 3] AS x //the \"Jenny\" row will be expanded to [Jenny, 1], [Jenny,2], [Jenny, 3]. Same fashion applies to the \"Paul\" row.\n WITH src AS srcAccount, e.amount * x AS res\n RETURN srcAccount, res\n}\n\ninstall query c14\n\nrun query c14()\n```\n\nThe result is shown in [c14.out](https://github.com/tigergraph/ecosys/blob/master/tutorials/cypher/c14.out) under `/home/tigergraph/tutorial/cypher/c14.out` \n\n[Go back to top](#top)\n\n---\n\n### COLLECT Function\nIn Cypher, the `collect()` function is used to aggregate values into a list. 
It is often used in conjunction with `RETURN` or `WITH` to group and organize data into collections.\n\nIn query c15() below, `MATCH (src)-[e:transfer]->(tgt)` finds all 1-hop transfers starting from \"Jenny\" or \"Paul\". `COLLECT(e.amount)` gathers all the `e.amount` values into a single list, grouped by `srcAccount`. `RETURN` outputs the `amounts` list per `srcAccount`.\n\nCopy [c15.cypher](./cypher/c15.cypher) to your container.\n\n```python\nUSE GRAPH financialGraph\nCREATE OR REPLACE OPENCYPHER QUERY c15(){\n  MATCH (src)-[e:transfer]->(tgt)\n  WHERE src.name in [\"Jenny\", \"Paul\"]\n  WITH src AS srcAccount, COLLECT(e.amount) AS amounts\n  RETURN srcAccount, amounts\n}\n\ninstall query c15\n\nrun query c15()\n```\nThe result is shown in [c15.out](https://github.com/tigergraph/ecosys/blob/master/tutorials/cypher/c15.out) under `/home/tigergraph/tutorial/cypher/c15.out`\n\n[Go back to top](#top)\n\n---\n\n### Using WITH and COLLECT()\nYou can use `UNWIND` to decompose a list column produced by `COLLECT()`.\n\nIn query c16() below, `MATCH (src)-[e:transfer]->(tgt)` finds all 1-hop transfers starting from \"Jenny\" or \"Paul\". `COLLECT(e.amount)` gathers all the `e.amount` values into a single list, grouped by `srcAccount`. `UNWIND` then expands each list back into one row per element. `WITH` doubles each amount, and `RETURN` outputs the expanded result.\n\nCopy [c16.cypher](./cypher/c16.cypher) to your container.\n\n```python\nUSE GRAPH financialGraph\nCREATE OR REPLACE OPENCYPHER QUERY c16(){\n  MATCH (src)-[e:transfer]->(tgt)\n  WHERE src.name in [\"Jenny\", \"Paul\"]\n  WITH src AS srcAccount, COLLECT(e.amount) AS amounts //COLLECT builds an amounts list for each srcAccount\n  UNWIND amounts as amount //for each source account row, expand the row into one row per element of the amounts list\n  WITH srcAccount, amount*2 AS doubleAmount\n  RETURN srcAccount, doubleAmount\n}\n\ninstall query c16\nrun query c16()\n```\n\nThe result is shown in [c16.out](https://github.com/tigergraph/ecosys/blob/master/tutorials/cypher/c16.out) under `/home/tigergraph/tutorial/cypher/c16.out`\n\n[Go back to top](#top)\n\n---\n\n## Combining MATCH Pattern Results\n\nEach `MATCH` clause produces a match table. You can use `UNION` and `UNION ALL` to combine schema-compatible match tables.\n\n### UNION\nIn query c17() below, `MATCH (s:Account {name: \"Paul\"})` finds \"Paul\". `MATCH (s:Account) WHERE s.isBlocked` finds all blocked accounts. `UNION` combines the two result sets with duplicates removed.\n\nCopy [c17.cypher](./cypher/c17.cypher) to your container.\n\n```python\nUSE GRAPH financialGraph\nCREATE OR REPLACE OPENCYPHER QUERY c17(){\n  MATCH (s:Account {name: \"Paul\"})\n  RETURN s AS srcAccount\n  UNION\n  MATCH (s:Account)\n  WHERE s.isBlocked\n  RETURN s AS srcAccount\n}\n\ninstall query c17\nrun query c17()\n```\n\nThe result is shown in [c17.out](https://github.com/tigergraph/ecosys/blob/master/tutorials/cypher/c17.out) under `/home/tigergraph/tutorial/cypher/c17.out`\n\n[Go back to top](#top)\n\n---\n\n### UNION ALL\nIn query c18() below, `MATCH (s:Account {name: \"Steven\"})` finds \"Steven\". `MATCH (s:Account) WHERE s.isBlocked` finds all blocked accounts (here, only \"Steven\"). 
`UNION ALL` combines the two result sets, keeping duplicates.\n\nCopy [c18.cypher](./cypher/c18.cypher) to your container.\n\n```python\nUSE GRAPH financialGraph\nCREATE OR REPLACE OPENCYPHER QUERY c18(){\n  MATCH (s:Account {name: \"Steven\"})\n  RETURN s AS srcAccount\n  UNION ALL\n  MATCH (s:Account)\n  WHERE s.isBlocked\n  RETURN s AS srcAccount\n}\n\ninstall query c18\nrun query c18()\n```\n\nThe result is shown in [c18.out](https://github.com/tigergraph/ecosys/blob/master/tutorials/cypher/c18.out) under `/home/tigergraph/tutorial/cypher/c18.out`\n\n[Go back to top](#top)\n\n---\n\n## Conditional Logic\n### CASE Expression\nThe CASE expression in OpenCypher allows you to implement conditional logic within a query, enabling dynamic result customization based on specific conditions.\n\n***Syntax***\n```python\nCASE\n  WHEN <condition> THEN <result>\n  WHEN <condition> THEN <result>\n  ...\n  ELSE <default>\nEND\n```\n\nIn query c19() below, `CASE WHEN` produces 0 for a blocked account and 1 for a non-blocked account.\n\nCopy [c19.cypher](./cypher/c19.cypher) to your container.\n\n```python\nUSE GRAPH financialGraph\nCREATE OR REPLACE OPENCYPHER QUERY c19(){\n  MATCH (s:Account {name: \"Steven\"})-[:transfer]->(t)\n  WITH\n    s.name AS srcAccount,\n    t.name AS tgtAccount,\n    CASE\n      WHEN s.isBlocked = true THEN 0\n      ELSE 1\n    END AS tgt\n  RETURN srcAccount, SUM(tgt) as tgtCnt\n}\n\ninstall query c19\nrun query c19()\n```\n\nThe result is shown in [c19.out](https://github.com/tigergraph/ecosys/blob/master/tutorials/cypher/c19.out) under `/home/tigergraph/tutorial/cypher/c19.out`\n\n[Go back to top](#top)\n\n---\n\n## Aggregate Functions\nAggregation functions in OpenCypher allow you to perform calculations over a set of values, summarizing or transforming the data into a single result. These functions are typically used in combination with the `WITH` or `RETURN` clauses to compute aggregate values based on certain criteria. In `WITH` and `RETURN`, the non-aggregate expressions are used to form the groups of the matched rows.\n\n### Common Aggregation Functions:\n- ***COUNT()***: Counts the number of items in a given set, e.g. COUNT(*), COUNT(1), COUNT(DISTINCT columnName).\n- ***SUM()***: Computes the sum of numeric values. It is often used to calculate totals, such as the total amount transferred.\n- ***AVG()***: Calculates the average of numeric values.\n- ***MIN()***: Finds the smallest value in a set. Often used to determine the minimum amount or value.\n- ***MAX()***: Finds the largest value in a set. This is useful for identifying the highest value.\n- ***COLLECT()***: Aggregates values into a list. Can be used to collect nodes or relationships into a list for further processing.\n- ***STDEV()***: Computes the sample standard deviation of values.\n- ***STDEVP()***: Computes the population standard deviation of values.\n\nIn query c20 below, we group by `src.name` and aggregate over the other matched attributes in the matched table. 
\n\nCopy [c20.cypher](./cypher/c20.cypher) to your container.\n\n```python\nUSE GRAPH financialGraph\nCREATE OR REPLACE OPENCYPHER QUERY c20(){\n MATCH (src)-[e:transfer]->(tgt)\n WITH src.name AS srcAccount, \n COUNT(DISTINCT tgt) AS transferCount, \n SUM(e.amount) AS totalAmount,\n STDEV(e.amount) AS stdevAmmount\n RETURN srcAccount, transferCount, totalAmount, stdevAmmount\n}\n\nINSTALL query c20\n\nrun query c20()\n```\n\nThe result is shown in [c20.out](https://github.com/tigergraph/ecosys/blob/master/tutorials/cypher/c20.out) under `/home/tigergraph/tutorial/cypher/c20.out`\n\n[Go back to top](#top)\n\n---\n\n## Other Expression Functions\nThere are many expression functions openCypher supports. Please refer to [openCypher functions](https://docs.tigergraph.com/gsql-ref/4.1/opencypher-in-gsql/opencypher-in-gsql) \n\n[Go back to top](#top)\n\n---\n\n## CRUD Statements\n\nOpenCypher offers comprehensive support for performing Data Modification (Create, Update, Delete) operations on graph data. It provides an intuitive syntax to handle node and relationship manipulation, including their attributes.\n\n### Insert Data\n\nThe `CREATE` statement in OpenCypher is used to add new nodes or relationships to the graph. If the specified node or relationship doesn't exist, it will be created. If it does exist, it will be replaced with the new data.\n\n#### Insert Node\n\nThe following query creates a new `Account` node with properties `name` and `isBlocked`:\n\n```python\nCREATE OR REPLACE OPENCYPHER QUERY insertVertex(STRING name, BOOL isBlocked){\n CREATE (p:Account {name: $name, isBlocked: $isBlocked})\n}\n\n# This will create an `Account` node with `name=\"Abby\"` and `isBlocked=true`.\ninterpret query insertVertex(\"Abby\", true)\n```\n\n#### Insert Relationship\n\nThe following query creates a `transfer` edge between two `Account` nodes with properties `date` and `amount`\n\n```python\nCREATE OR REPLACE OPENCYPHER QUERY insertEdge(VERTEX s, VERTEX t, DATETIME dt, UINT amt){\n CREATE (s) -[:transfer {date: $dt, amount: $amt}]-> (t)\n}\n\n# Create two `transfer` relationships from \"Abby\" to \"Ed\"\ninterpret query insertEdge(\"Abby\", \"Ed\", \"2025-01-01\", 100)\ninterpret query insertEdge(\"Abby\", \"Ed\", \"2025-01-09\", 200)\n```\n\nYou can use the `SELECT` statement to check if the insertion was successful.\n\n```python\nGSQL > select e from (s:Account {name: \"Abby\"}) -[e:transfer]-> (t:Account {name: \"Ed\"})\n{\n \"version\": {\n \"edition\": \"enterprise\",\n \"api\": \"v2\",\n \"schema\": 0\n },\n \"error\": false,\n \"message\": \"\",\n \"results\": [\n {\n \"Result_Table\": [\n {\n \"e\": {\n \"e_type\": \"transfer\",\n \"from_id\": \"Abby\",\n \"from_type\": \"Account\",\n \"to_id\": \"Ed\",\n \"to_type\": \"Account\",\n \"directed\": true,\n \"discriminator\": \"2025-01-01 00:00:00\",\n \"attributes\": {\n \"date\": \"2025-01-01 00:00:00\",\n \"amount\": 100\n }\n }\n },\n {\n \"e\": {\n \"e_type\": \"transfer\",\n \"from_id\": \"Abby\",\n \"from_type\": \"Account\",\n \"to_id\": \"Ed\",\n \"to_type\": \"Account\",\n \"directed\": true,\n \"discriminator\": \"2025-01-09 00:00:00\",\n \"attributes\": {\n \"date\": \"2025-01-09 00:00:00\",\n \"amount\": 200\n }\n }\n }\n ]\n }\n ]\n}\n```\n\n---\n\n### Delete Data\n\nThe `DELETE` statement in OpenCypher is used to **remove nodes and relationships** from the graph. 
When deleting a node, all its associated relationships will also be deleted.\n\n#### Delete a single node\n\nWhen you delete a node, if it has relationships, all of its relationships will also be deleted.\n\n```python\nCREATE OR REPLACE OPENCYPHER QUERY deleteOneVertex(STRING name=\"Abby\"){\n MATCH (s:Account {name: $name})\n DELETE s\n}\n\n# delete \"Abby\"\ninterpret query deleteOneVertex(\"Abby\")\n```\n\nYou can use the `SELECT` statement to check if the deletion was successful.\n\n```python\nGSQL > select s from (s:Account) where s.name=\"Abby\"\n{\n \"version\": {\n \"edition\": \"enterprise\",\n \"api\": \"v2\",\n \"schema\": 0\n },\n \"error\": false,\n \"message\": \"\",\n \"results\": [\n {\n \"Result_Vertex_Set\": []\n }\n ]\n}\n```\n\n#### Delete all nodes of the specified type\n\nYou can delete all nodes of a particular label type.\n\n**Single Label Type**\n\n```python\n### single type\nCREATE OR REPLACE OPENCYPHER QUERY deleteAllVertexWithType01(){\n MATCH (s:Account)\n DELETE s\n}\n\n# Delete all nodes with the label `Account`\ninterpret query deleteAllVertexWithType01()\n```\n\n**Multiple Label Types**\n\n```python\n### multiple types\nCREATE OR REPLACE OPENCYPHER QUERY deleteVertexWithType02(){\n MATCH (s:Account:Phone)\n DELETE s\n}\n\n# Delete all nodes with the label `Account` or `Phone`\ninterpret query deleteVertexWithType02()\n```\n\n#### Delete all nodes\n\nThis query deletes all nodes in the graph.\n\n```python\nCREATE OR REPLACE OPENCYPHER QUERY deleteAllVertex(){\n MATCH (s)\n DELETE s\n}\n\ninterpret query deleteAllVertex()\n```\n\nYou can use the `SELECT` statement to check if the deletion was successful.\n\n```python\nGSQL > select count(*) from (s)\n{\n \"version\": {\n \"edition\": \"enterprise\",\n \"api\": \"v2\",\n \"schema\": 0\n },\n \"error\": false,\n \"message\": \"\",\n \"results\": [\n {\n \"Result_Table\": {\n \"count_lparen_1_rparen_\": 0\n }\n }\n ]\n}\n```\n\n\n#### Delete relationships\n\nYou can delete relationships based on specific conditions.\n\n**Delete `transfer` Relationships with a Date Filter**\n\nThis query deletes all `transfer` relationships where the date is earlier than the specified filter date.\n\n```python\nCREATE OR REPLACE OPENCYPHER QUERY deleteEdge(STRING name=\"Abby\", DATETIME filterDate=\"2024-02-01\"){\n MATCH (s:Account {name: $name}) -[e:transfer] -> (t:Account)\n WHERE e.date < $filterDate\n DELETE e\n}\n\ninterpret query deleteEdge()\n```\n\n**Delete all outgoing edges of a specific account**\n\n```python\n//default parameter is \"Abby\"\nCREATE OR REPLACE OPENCYPHER QUERY deleteAllEdge(STRING name=\"Abby\"){\n MATCH (s:Account {name: $name}) -[e] -> ()\n DELETE e\n}\n\n# Delete all outgoing relationships from the node with the name \"Abby\"\ninterpret query deleteAllEdge()\n```\n\n---\n\n### Update Data\n\nUpdating data in OpenCypher allows you to modify node and relationship attributes. The primary mechanism for updating attributes is the `SET` clause, which is used to assign or change the properties of nodes or relationships.\n\n#### Update vertex attributes\n\nYou can update the attributes of a node. 
In this example, the `isBlocked` attribute of the `Account` node is set to `false` for a given account name.\n\n```python\nCREATE OR REPLACE OPENCYPHER QUERY updateAccountAttr(STRING name=\"Abby\"){\n MATCH (s:Account {name: $name})\n SET s.isBlocked = false\n}\n\n# Update the `isBlocked` attribute of the `Account` node with name \"Abby\" to false\ninterpret query updateAccountAttr()\n```\n\n#### Update edge attributes\n\nYou can also update the attributes of a relationship. In this example, the `amount` attribute of a `transfer` relationship is updated for a specified account, as long as the target account is not blocked.\n\n```python\nCREATE OR REPLACE OPENCYPHER QUERY updateTransferAmt(STRING startAcct=\"Jenny\", UINT newAmt=100){\n MATCH (s:Account {name: $startAcct})- [e:transfer]-> (t)\n WHERE NOT t.isBlocked\n SET e.amount = $newAmt\n}\n\ninterpret query updateTransferAmt(_, 300)\n```\n\nYou can use the `SELECT` statement to check if the update was successful.\n\n```python\nGSQL > select e from (s:Account {name: \"Jenny\"}) - [e:transfer]-> (t)\n{\n \"version\": {\n \"edition\": \"enterprise\",\n \"api\": \"v2\",\n \"schema\": 0\n },\n \"error\": false,\n \"message\": \"\",\n \"results\": [\n {\n \"Result_Table\": [\n {\n \"e\": {\n \"e_type\": \"transfer\",\n \"from_id\": \"Jenny\",\n \"from_type\": \"Account\",\n \"to_id\": \"Scott\",\n \"to_type\": \"Account\",\n \"directed\": true,\n \"discriminator\": \"2024-04-04 00:00:00\",\n \"attributes\": {\n \"date\": \"2024-04-04 00:00:00\",\n \"amount\": 300\n }\n }\n }\n ]\n }\n ]\n}\n```\n\n\n[Go back to top](#top)\n\n---\n# Support \nIf you like the tutorial and want to explore more, join the GSQL developer community at \n\nhttps://community.tigergraph.com/\n\nOr, study our product document at\n\nhttps://docs.tigergraph.com/gsql-ref/current/intro/\n\n[Go back to top](#top)\n\n---\n\n# Contact\nTo contact us for commercial support and purchase, please email us at [info@tigergraph.com](mailto:info@tigergraph.com)\n\n[Go back to top](#top)\n\n\n"}
{"doc_id": "VectorTurorial", "doc_type": "markdown", "content": "# Native Vector Support in TigerGraph\nTigerGraph offers native vector support, making it easier to perform vector searches on graph patterns. This feature combines the strengths of graph and vector databases, enabling powerful data analysis and seamless query integration. We believe agentic AI and GraphRAG will benefit from this powerful combination!\n\nTo follow this tutorial, install the TigerGraph Docker image (configured with 8 CPUs and 20 GB of RAM or at minimum 4 CPUs and 16 GB of RAM) or set up a Linux instance with Bash access. Download our free [Community Edition](https://dl.tigergraph.com/) to get started.\n\n---\n# Table of Contents\n\n- [Sample Graph](#sample-graph-for-tutorial)\n- [Setup Environment](#setup-environment)\n- [Setup Schema (model)](#setup-schema)\n- [Load Data](#load-data)\n- [Install GDS Functions](#install-gds-functions)\n- [Vector Search Functions](#vector-search-functions)\n - [Vector Search Architecture](#vector-search-architecture)\n - [vectorSearch Function](#vectorsearch-function)\n - [Vector Built-in Functions](#vector-built-in-functions) \n- [Query Examples](#query-examples)\n - [Vector Search](#vector-search)\n - [Range Vector Search](#range-vector-search)\n - [Filtered Vector Search](#filtered-vector-search)\n - [Vector Search on Graph Patterns](#vector-search-on-graph-patterns)\n - [Vector Similarity Join on Graph Patterns](#vector-similarity-join-on-graph-patterns)\n - [Vector Search Driven Pattern Match](#vector-search-driven-pattern-match)\n- [Essential Operations and Tools](#Essential-operations-and-tools)\n - [Global and Local Schema Change](#global-and-local-schema-change)\n - [Vector Data Loading](#vector-data-loading)\n - [Python Integration](#python-integration)\n- [Vector Update](#vector-update)\n- [Support](#support)\n- [Reference](#reference)\n- [Contact](#contact)\n\n---\n# Sample Graph For Tutorial\nThis graph is a simplifed version of a real-world financial transaction graph. There are 5 _Account_ vertices, with 8 _transfer_ edges between Accounts. An account may be associated with a _City_ and a _Phone_.\nThe use case is to analyze which other accounts are connected to 'blocked' accounts.\n\n\n\n---\n \n# Setup Environment \n\nIf you have your own machine (including Windows and Mac laptops), the easiest way to run TigerGraph is to install it as a Docker image. Download [Community Edition Docker Image](https://dl.tigergraph.com/). Follow the [Docker setup instructions](https://github.com/tigergraph/ecosys/blob/master/demos/guru_scripts/docker/README.md) to set up the environment on your machine.\n\n**Note**: TigerGraph does not currently support the ARM architecture and relies on Rosetta to emulate x86 instructions. For production environments, we recommend using an x86-based system.\nFor optimal performance, configure your Docker environment with **8 CPUs and 20+ GB** of memory. If your laptop has limited resources, the minimum recommended configuration is **4 CPUs and 16 GB** of memory.\n\nAfter installing TigerGraph, the `gadmin` command-line tool is automatically included, enabling you to easily start or stop services directly from your bash terminal.\n\n```python\n docker load -i ./tigergraph-4.2.0-alpha-community-docker-image.tar.gz # the xxx.gz file name are what you have downloaded. 
Change the gz file name depending on what you have downloaded\n docker images #find image id\n docker run -d -p 14240:14240 --name mySandbox imageId #start a container, name it \u201cmySandbox\u201d using the image id you see from previous command\n docker exec -it mySandbox /bin/bash #start a shell on this container. \n gadmin start all #start all tigergraph component services\n gadmin status #should see all services are up.\n```\n\nFor the impatient, load the sample data from the tutorial/gsql folder and run your first query.\n\n```python\n cd tutorial/gsql/ \n gsql 00_schema.gsql #setup sample schema in catalog\n gsql 01_load.gsql #load sample data \n gsql #launch gsql shell\n GSQL> use graph financialGraph #enter sample graph\n GSQL> ls #see the catalog content\n GSQL> select a from (a:Account) #query Account vertex\n GSQL> select s, e, t from (s:Account)-[e:transfer]->(t:Account) limit 2 #query edge\n GSQL> select count(*) from (s:Account) #query Account node count\n GSQL> select s, t, sum(e.amount) as transfer_amt from (s:Account)-[e:transfer]->(t:Account) # query s->t transfer ammount\n GSQL> exit #quit the gsql shell \n```\n\nYou can also access the GraphStudio visual IDE directly through your browser:\n```python\n http://localhost:14240/\n```\nA login page will automatically open. Use the default credentials: user is `tigergraph`, password is `tigergraph`. \nOnce logged in, click the GraphStudio icon. Assuming you've set up the tutorial schema and loaded the data, navigate by selecting `Global View`, then choose `financialGraph` from the pop up menu. Click Explore Graph to start interacting with your data visually.\n\nTo further explore the features of GraphStudio, you can view these concise introductory [videos](https://www.youtube.com/watch?v=29PCZEhyx8M&list=PLq4l3NnrSRp7RfZqrtsievDjpSV8lHhe-), and [product manual](https://docs.tigergraph.com/gui/4.2/intro/). \n\nThe following command is good for operation.\n\n```python\n#To stop the server, you can use\n gadmin stop all\n#Check `gadmin status` to verify if the gsql service is running, then use the following command to reset (clear) the database.\n gsql 'drop all'\n```\n\n**Note that**, our fully managed service -- [TigerGraph Savanna](https://savanna.tgcloud.io/) is entirely GUI-based and does not provide access to a bash shell. To execute the GSQL examples in this tutorial, simply copy the query into the Savanna GSQL editor and click Run.\n\nAdditionally, all GSQL examples referenced in this tutorial can be found in your TigerGraph tutorials/vector folder.\n\n[Go back to top](#top)\n\n# Setup Schema \nWe use an artificial financial schema and dataset as a running example to demonstrate the usability of hybrid vector and graph searches. The figure above provides a visualization of all the graph data in the database.\n\nTo augment the graph dataset with vector data, for each Account and Phone node, we generated a 3-dimensional random vector data. By default, the cosine metric is used to measure the distance between vectors.\n\nLocate [00_ddl.gsql](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/00_ddl.gsql) under `/home/tigergraph/tutorial/vector` or copy it to your container. \nNext, run the following in your container's bash command line. \n```\ngsql /home/tigergraph/tutorial/vector/00_ddl.gsql\n```\n\nAs seen below, `Account` and `Phone` vertex types are extended with `emb1` vector attribute, which is 3-dimensional vector. By default, the `emb1` will use `consine` metric. 
An ANN search index will be automatically built and maintained as vector data is loaded and updated. \n\n```python\n//install gds functions\nimport package gds\ninstall function gds.**\n\n//create vertex types\nCREATE VERTEX Account ( name STRING PRIMARY KEY, isBlocked BOOL)\nCREATE VERTEX City ( name STRING PRIMARY KEY)\nCREATE VERTEX Phone (number STRING PRIMARY KEY, isBlocked BOOL)\n\n//create edge types\nCREATE DIRECTED EDGE transfer (FROM Account, TO Account, DISCRIMINATOR(date DATETIME), amount UINT) WITH REVERSE_EDGE=\"transfer_reverse\"\nCREATE UNDIRECTED EDGE hasPhone (FROM Account, TO Phone)\nCREATE DIRECTED EDGE isLocatedIn (FROM Account, TO City)\n\n//create vectors\nCREATE GLOBAL SCHEMA_CHANGE JOB fin_add_vector {\n //add an embedding attribute \"emb1\" to vertex type \"Account\"\n ALTER VERTEX Account ADD VECTOR ATTRIBUTE emb1(dimension=3);\n ALTER VERTEX Phone ADD VECTOR ATTRIBUTE emb1(dimension=3);\n}\nrun global schema_change job fin_add_vector\n\n//create graph; * means include all graph element types in the graph.\nCREATE GRAPH financialGraph (*)\n```\n\n[Go back to top](#top)\n\n# Load Data \n\nYou can choose one of the following methods. \n\n- Load sample data from our publicly accessible s3 bucket \n \n Locate [01_load.gsql](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/01_load.gsql) under `/home/tigergraph/tutorial/vector` or copy it to your container. \n Next, run the following in your container's bash command line. Wait 2 mintues as it's pulling data from s3. \n\n ```\n gsql /home/tigergraph/tutorial/vector/01_load.gsql\n ```\n or in GSQL Shell editor, copy the content of [01_load.gsql](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/01_load.gsql), and paste it into the GSQL shell editor to run.\n \n- Load from local file in your container\n - Locate the following data files under `/home/tigergraph/tutorial/data` or copy them to your container:\n - [account.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/account.csv)\n - [phone.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/phone.csv)\n - [city.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/city.csv)\n - [hasPhone.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/hasPhone.csv)\n - [locate.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/locate.csv)\n - [transfer.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/transfer.csv)\n - [account_emb.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/account_emb.csv)\n - [phone_emb.csv](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/data/phone_emb.csv)\n\n - Locate [02_load2.gsql](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/02_load2.gsql) under `/home/tigergraph/tutorial/vector` or copy it to your container. Modify the script with your local file path if necessary. Next, run the following in your container's bash command line. \n ```\n gsql /home/tigergraph/tutorial/vector/02_load2.gsql\n ``` \n or in GSQL Shell editor, copy the content of [02_load2.gsql](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/02_load2.gsql), and paste in GSQL shell editor to run.\n \n The declarative loading script is self-explanatory. 
You define the filename alias for each data source, and use the the `LOAD` statement to map the data source to the target schema elements-- vertex types, edge types, and vector attributes. \n\n ```python\n USE GRAPH financialGraph\n\n DROP JOB load_local_file\n\n //define a loading job to load from local file\n CREATE LOADING JOB load_local_file {\n // define the location of the source files; each file path is assigned a filename variable. \n DEFINE FILENAME account=\"/home/tigergraph/tutorial/data/account.csv\";\n DEFINE FILENAME phone=\"/home/tigergraph/tutorial/data/phone.csv\";\n DEFINE FILENAME city=\"/home/tigergraph/tutorial/data/city.csv\";\n DEFINE FILENAME hasPhone=\"/home/tigergraph/tutorial/data/hasPhone.csv\";\n DEFINE FILENAME locatedIn=\"/home/tigergraph/tutorial/data/locate.csv\";\n DEFINE FILENAME transferdata=\"/home/tigergraph/tutorial/data/transfer.csv\";\n DEFINE FILENAME accountEmb=\"/home/tigergraph/tutorial/data/account_emb.csv\";\n DEFINE FILENAME phoneEmb=\"/home/tigergraph/tutorial/data/phone_emb.csv\";\n //define the mapping from the source file to the target graph element type. The mapping is specified by VALUES clause. \n LOAD account TO VERTEX Account VALUES ($\"name\", gsql_to_bool(gsql_trim($\"isBlocked\"))) USING header=\"true\", separator=\",\";\n LOAD phone TO VERTEX Phone VALUES ($\"number\", gsql_to_bool(gsql_trim($\"isBlocked\"))) USING header=\"true\", separator=\",\";\n LOAD city TO VERTEX City VALUES ($\"name\") USING header=\"true\", separator=\",\";\n LOAD hasPhone TO Edge hasPhone VALUES ($\"accnt\", gsql_trim($\"phone\")) USING header=\"true\", separator=\",\";\n LOAD locatedIn TO Edge isLocatedIn VALUES ($\"accnt\", gsql_trim($\"city\")) USING header=\"true\", separator=\",\";\n LOAD transferdata TO Edge transfer VALUES ($\"src\", $\"tgt\", $\"date\", $\"amount\") USING header=\"true\", separator=\",\";\n LOAD accountEmb TO VECTOR ATTRIBUTE emb1 ON VERTEX Account VALUES ($0, SPLIT($1, \",\")) USING SEPARATOR=\"|\", header=\"true\";\n LOAD phoneEmb TO VECTOR ATTRIBUTE emb1 ON VERTEX Phone VALUES ($0, SPLIT($1, \",\")) USING SEPARATOR=\"|\", header=\"true\";\n }\n\n run loading job load_local_file\n ```\n \n[Go back to top](#top)\n\n# Install GDS functions\nGDS functions to be used in the queries need to be installed in advance\n\n```python\ngsql \nGSQL> import package gds\nGSQL> install function gds.**\nGSQL> show package gds.vector\n```\n[Go back to top](#top)\n# Vector Search Functions\n## Vector Search Architecture\nTigerGraph supports both ANN vector search and exact vector search.\n\n### Approximate Nearest Neighbors (ANN)\nANN is a technique for identifying points that are approximately closest to a query point in high-dimensional spaces. It offers significant improvements in speed and scalability compared to exact methods, with only a slight trade-off in accuracy.\n\nTigerGraph enhances vertex capabilities by introducing support for vector attributes. When vector data is loaded as an attribute, the engine automatically indexes it to facilitate ANN searches. This indexing process leverages TigerGraph\u2019s Massively Parallel Processing (MPP) architecture, enabling efficient parallel processing across multiple compute cores or machines. By default, the HNSW algorithm is used, with future releases planned to support additional indexing methods.\n\nTigerGraph provides a user-friendly vectorSearch function for performing ANN searches within a GSQL query. 
This built-in function integrates seamlessly with other GSQL query blocks and accumulators, supporting both basic and advanced use cases. These include pure vector searches, filtered vector searches, and searches based on graph patterns.\n\n### Exact Vector Search \nTo support exact searches, TigerGraph includes a set of built-in vector functions. These functions allow users to perform operations on vector attributes, enabling advanced capabilities such as exact top-k vector searches, similarity joins on graph patterns, and fusions of structured and unstructured data.\n\n[Go back to top](#top)\n\n## vectorSearch Function\n### Syntax\n```\n//result is a vertex set variable, storing the top-k most similar vertices. \nresult = vectorSearch(VectorAttributes, QueryVector, K, optionalParam)\n```\n### Function name and return type\nIn GSQL, top-k ANN (approximate nearest neighbor) vector search is provided by the function `vectorSearch()`, which returns the top k vectors most similar to an input `QueryVector`.\nThe result is assigned to a vertex set variable, which can be used by subsequent GSQL query blocks. E.g., `result` will hold the top-k most similar vertices, ranked by the distance between their embeddings and the query embedding.\n\n### Parameters\n|Parameter|Description|\n|-------|--------|\n|`VectorAttributes`|The set of vector attributes to search; each item should be in the format **VertexType.VectorName**. E.g., `{Account.emb1, Phone.emb1}`.|\n|`QueryVector`|The query embedding constant against which the top K most similar vectors are searched.|\n|`K`|The top-k cutoff: the K most similar vectors are returned.|\n|`optionalParam`|A map of optional parameters, including a vertex candidate set, `ef` (the exploration factor of the HNSW algorithm), and a global MapAccum that stores the top-k (vertex, distance score) pairs. E.g., `{candidate_set: vset1, ef: 20, distance_map: @@distmap}`.|\n\n[Go back to top](#top)\n## Vector Built-in Functions\nTo support vector computation, GSQL provides a list of built-in vector functions. 
You can see the function signatures by typing the following command in GSQL shell.\n\n\n```python\nGSQL> show package gds.vector\n````\nYou will see\n```\nPackages \"gds.vector\":\n - Object:\n - Functions:\n - gds.vector.cosine_distance(list list1, list list2) RETURNS (float) (installed)\n - gds.vector.dimension_count(list list1) RETURNS (int) (installed)\n - gds.vector.distance(list list1, list list2, string metric) RETURNS (float) (installed)\n - gds.vector.elements_sum(list list1) RETURNS (float) (installed)\n - gds.vector.ip_distance(list list1, list list2) RETURNS (float) (installed)\n - gds.vector.kth_element(list list1, int kth_index) RETURNS (float) (installed)\n - gds.vector.l2_distance(list list1, list list2) RETURNS (float) (installed)\n - gds.vector.norm(list list1, string metric) RETURNS (float) (installed)\n```\n\n| Function | Parameter | Return Type | Description |\n|------------|---------|--------------|--------------|\n|gds.vector.distance |`list list1, list list2, string metric` |float|Calculates the distance between two vectors represented as lists of double values, based on a specified distance metric: \"cosine\", \"l2\", \"ip\".\n|gds.vector.cosine_distance |`list list1, list list2` |float|Calculates the cosine distance between two vectors represented as lists of doubles.\n|gds.vector.ip_distance |`list list1, list list2` |float|Calculates the inner product (dot product) between two vectors represented as lists of double values.\n|gds.vector.l2_distance |`list list1, list list2` |float|Calculates the Euclidean distance between two vectors represented as lists of double values.\n|gds.vector.norm |`list list1, string metric` |float|Computes the norm (magnitude) of a vector based on a specified metric.\n|gds.vector.dimension_count |`list list1` |int|Returns the number of dimensions (elements) in a given vector, represented as a list of double values.\n|gds.vector.elements_sum |`list list1` |float|Calculates the sum of all elements in a vector, represented as a list of double values.\n|gds.vector.kth_element |`list list1, int index` |float|Retrieves the k-th element from a vector, represented as a list of double values.\n\nYou can also see these built-in function implementations, which is GSQL code. For example, if we want to see the `distance` function implementation, we can do\n```python\nGSQL>show function gds.vector.distance\n```\n[Go back to top](#top)\n\n# Query Examples\n## Vector Search\n### Top-k vector search on a given vertex type's vector attribute. 
\n\nLocate [03_q1.gsql](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/03_q1.gsql) under `/home/tigergraph/tutorial/vector` or copy it to your container.\nNext, run the following in your container's bash command line.\n```\ngsql /home/tigergraph/tutorial/vector/03_q1.gsql\n```\n\n```python\n#enter the graph\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE QUERY q1 (LIST query_vector) SYNTAX v3 {\n MapAccum @@distances;\n\n //find top-5 similar embeddings from Account's embedding attribute emb1, store the distance in @@distance\n v = vectorSearch({Account.emb1}, query_vector, 5, { distance_map: @@distances});\n\n print v WITH VECTOR; //show the embeddings\n print @@distances; //show the distance map\n}\n\n#compile and install the query as a stored procedure\ninstall query q1\n\n#run the query\nrun query q1([-0.017733968794345856, -0.01019224338233471, -0.016571875661611557])\n```\nThe result is shown in [q1.out](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/q1.out) under `/home/tigergraph/tutorial/vector/q1.out` \n\nYou can also use POST method to call REST api to invoke the installed query. By default, the query will be located at URL \"restpp/query/{graphName}/{queryName}\". \nOn the payload, you specify the parameter using \"key:value\" by escaping the quotes of the parameter name.\n\n```python\ncurl -u \"tigergraph:tigergraph\" -H 'Content-Type: application/json' -X POST \"http://127.0.0.1:14240/gsql/v1/queries/q1?graph=financialGraph\" -d '{\n \"parameters\":{\"query_vector\":[-0.017733968794345856, -0.01019224338233471, -0.016571875661611557]}}' | jq\n```\n\n### Top-k vector search on a set of vertex types' vector attributes. \n\nLocate [04_q1a.gsql](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/04_q1a.gsql) under `/home/tigergraph/tutorial/vector` or copy it to your container.\nNext, run the following in your container's bash command line.\n```\ngsql /home/tigergraph/tutorial/vector/04_q1a.gsql\n```\n\n```python\n#enter the graph\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE QUERY q1a (LIST query_vector) SYNTAX v3 {\n MapAccum @@distances;\n //specify vector search on Account and Phone's emb1 attribute. \n v = vectorSearch({Account.emb1, Phone.emb1}, query_vector, 8, { distance_map: @@distances});\n\n print v WITH VECTOR;\n print @@distances;\n}\n\n#compile and install the query as a stored procedure\ninstall query q1a\n\n#run the query\nrun query q1a ([-0.017733968794345856, -0.01019224338233471, -0.016571875661611557])\n```\nThe result is shown in [q1a.out](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/q1a.out) under `/home/tigergraph/tutorial/vector/q1a.out` \n### Top-k vector search using a vertex embedding as the query vector\n\nLocate [05_q1b.gsql](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/05_q1b.gsql) under `/home/tigergraph/tutorial/vector` or copy it to your container.\nNext, run the following in your container's bash command line.\n```\ngsql /home/tigergraph/tutorial/vector/05_q1b.gsql\n```\n```python\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE QUERY q1b () SYNTAX v3 {\n //this global accumulator will be storing the query vector. 
\n  //You can retrieve an embedding attribute and accumulate it into a ListAccum\n  ListAccum<float> @@query_vector;\n  MapAccum<VERTEX, FLOAT> @@distances;\n\n  //find Scott's embedding, store it in @@query_vector\n  s = SELECT a\n      FROM (a:Account)\n      WHERE a.name == \"Scott\"\n      POST-ACCUM @@query_vector += a.emb1;\n\n  //find top-5 similar to Scott's embedding from Account's embedding attribute emb1, store the distances in @@distances\n  v = vectorSearch({Account.emb1}, @@query_vector, 5, { distance_map: @@distances});\n\n  print v WITH VECTOR; //show the embeddings\n  print @@distances; //show the distance map\n}\n\n#compile and install the query as a stored procedure\ninstall query q1b\n\n#run the query\nrun query q1b()\n```\nThe result is shown in [q1b.out](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/q1b.out) under `/home/tigergraph/tutorial/vector/q1b.out` \n\n### Top-k vector search from a vertex set parameter\n\nLocate [06_q1c.gsql](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/06_q1c.gsql) under `/home/tigergraph/tutorial/vector` or copy it to your container.\nNext, run the following in your container's bash command line.\n\n```\ngsql /home/tigergraph/tutorial/vector/06_q1c.gsql\n```\n\n```python\n#enter the graph\nUSE GRAPH financialGraph\n\n# create a query\nCREATE OR REPLACE QUERY q1c (VERTEX<Account> name, SET<VERTEX<Account>> slist, LIST<float> query_vector) SYNTAX v3 {\n  // Define a vertex set from the vertex parameter\n  v = {name};\n\n  // output vertex set variable v in JSON format with embedding\n  print v WITH VECTOR;\n\n  // Define a vertex set from the vertex set parameter\n  v = {slist};\n\n  // Get the most similar vector from the list\n  // The result is re-assigned to v. \n  v = vectorSearch({Account.emb1}, query_vector, 1, {candidate_set: v});\n\n  // output vertex set variable v in JSON format with embedding\n  print v WITH VECTOR;\n}\n\n#compile and install the query as a stored procedure\ninstall query q1c\n\n#run the query\nrun query q1c(\"Scott\", [\"Steven\", \"Jenny\"], [-0.017733968794345856, -0.01019224338233471, -0.016571875661611557])\n```\nThe result is shown in [q1c.out](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/q1c.out) under `/home/tigergraph/tutorial/vector/q1c.out` \n\n[Go back to top](#top)\n\n## Range Vector Search\nDo a range vector search with a given query embedding and a distance threshold. 
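\nThe threshold is a distance, not a similarity. Assuming the usual convention that cosine distance equals 1 minus cosine similarity, the threshold of 0.394 used in the example below (q2) keeps accounts whose similarity to the query vector is above roughly 0.606. The hypothetical helper sketched here (it is not one of the numbered tutorial scripts) simply lists each Account's cosine distance to the query vector, which can help you pick a sensible threshold before running the range search that follows.\n\n```python\nUSE GRAPH financialGraph\n\n//hypothetical helper query; it only assumes gds.vector.cosine_distance and the\n//SELECT ... INTO table form shown elsewhere in this tutorial\nCREATE OR REPLACE QUERY inspect_distances (LIST<float> query_vector) SYNTAX v3 {\n  //list every Account together with its cosine distance to the query vector\n  SELECT a, gds.vector.cosine_distance(a.emb1, query_vector) AS dist INTO T\n  FROM (a:Account);\n\n  PRINT T;\n}\n\ninstall query inspect_distances\n\nrun query inspect_distances([-0.017733968794345856, -0.01019224338233471, -0.016571875661611557])\n```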
\n\nLocate [07_q2.gsql](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/07_q2.gsql) under `/home/tigergraph/tutorial/vector` or copy it to your container.\nNext, run the following in your container's bash command line.\n```\ngsql /home/tigergraph/tutorial/vector/07_q2.gsql\n```\n\n```python\n#enter the graph\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE QUERY q2 (LIST query_vector, double threshold) SYNTAX v3 {\n //find Account whose emb1 distance to a query_vector is less than a threshold\n v = SELECT a\n FROM (a:Account)\n WHERE gds.vector.distance(a.emb1, query_vector, \"COSINE\") < threshold;\n\n print v WITH VECTOR;\n}\n\n#compile and install the query as a stored procedure\ninstall query q2\n\n#run the query\nrun query q2([-0.017733968794345856, -0.01019224338233471, -0.016571875661611557], 0.394)\n```\nThe result is shown in [q2.out](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/q2.out) under `/home/tigergraph/tutorial/vector/q2.out` \n\n[Go back to top](#top)\n## Filtered Vector Search\nDo a GSQL query block to select a vertex candidate set, then do vector top-k search on the candidate set. \n\nLocate [08_q3.gsql](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/08_q3.gsql) under `/home/tigergraph/tutorial/vector` or copy it to your container.\nNext, run the following in your container's bash command line.\n```\ngsql /home/tigergraph/tutorial/vector/08_q3.gsql\n```\n\n```python\n#enter the graph\nUSE GRAPH financialGraph\n\nCREATE OR REPLACE QUERY q3 (LIST query_vector, int k) SYNTAX v3 {\n MapAccum @@distances;\n //select candidate for vector search\n c = SELECT a\n FROM (a:Account)\n WHERE a.name in (\"Scott\", \"Paul\", \"Steven\");\n //do top-k vector search within the vertex set \"c\", store the top-k distances to the distance_map\n v = vectorSearch({Account.emb1}, query_vector, k, {candidate_set: c, distance_map: @@distances});\n\n print v WITH VECTOR;\n print @@distances;\n\n}\n\n#compile and install the query as a stored procedure\ninstall query q3\n\n#run the query\nrun query q3([-0.017733968794345856, -0.01019224338233471, -0.016571875661611557], 2)\n```\n\nThe result is shown in [q3.out](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/q3.out) under `/home/tigergraph/tutorial/vector/q3.out` \n\nYou can also use POST method to call REST api to invoke the installed query. By default, the query will be located at URL \"restpp/query/{graphName}/{queryName}\". \nOn the payload, you specify the parameter using \"key:value\" by escaping the quotes of the parameter name.\n```python\ncurl -X POST \"http://127.0.0.1:14240/restpp/query/financialGraph/q3\" -d '{\"query_vector\":[-0.017733968794345856, -0.01019224338233471, -0.016571875661611557], \"k\": 2}' | jq\n```\n\n[Go back to top](#top)\n## Vector Search on Graph Patterns\n\n### Approximate Nearest Neighbor (ANN) vector search on a graph pattern\nDo a pattern match first to find candidate vertex set. Then, do a vector search. 

[Go back to top](#top)
## Vector Search on Graph Patterns

### Approximate Nearest Neighbor (ANN) vector search on a graph pattern
Do a pattern match first to find a candidate vertex set. Then do a vector search on that candidate set.

Locate [09_q4.gsql](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/09_q4.gsql) under `/home/tigergraph/tutorial/vector` or copy it to your container.
Next, run the following in your container's bash command line.
```
gsql /home/tigergraph/tutorial/vector/09_q4.gsql
```

```python
#enter the graph
USE GRAPH financialGraph

# create a query
CREATE OR REPLACE QUERY q4 (datetime low, datetime high, LIST<float> query_vector) SYNTAX v3 {

  MapAccum<VERTEX, DOUBLE> @@distances1;
  MapAccum<VERTEX, DOUBLE> @@distances2;

  // a path pattern in ascii art ()-[]->()-[]->()
  c1 = SELECT b
       FROM (a:Account {name: "Scott"})-[e:transfer]->()-[e2:transfer]->(b:Account)
       WHERE e.date >= low AND e.date <= high and e.amount > 500 and e2.amount > 500;

  //ANN search. Do a top-k search on the vertex set "c1".
  v = vectorSearch({Account.emb1}, query_vector, 2, {candidate_set: c1, distance_map: @@distances1});

  PRINT v WITH VECTOR;
  PRINT @@distances1;

  // below we use a variable-length path.
  // *1.. means 1 or more hops of the edge type "transfer"
  // select the reachable end points and bind them to vertex alias "b"
  c2 = SELECT b
       FROM (a:Account {name: "Scott"})-[:transfer*1..]->(b:Account)
       WHERE a.name != b.name;
  //ANN search. Do a top-k search on the vertex set "c2"
  v = vectorSearch({Account.emb1}, query_vector, 2, {candidate_set: c2, distance_map: @@distances2});

  PRINT v WITH VECTOR;
  PRINT @@distances2;

}

#compile and install the query as a stored procedure
install query q4

#run the query
run query q4("2024-01-01", "2024-12-31", [-0.017733968794345856, -0.01019224338233471, -0.016571875661611557])
```
The result is shown in [q4.out](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/q4.out) under `/home/tigergraph/tutorial/vector/q4.out`.

### Exact vector search on a graph pattern

Use `ORDER BY ... ASC` or `ORDER BY ... DESC` to do an exact top-k vector search. This method is expensive because it computes the distance for every matched vertex.

Locate [10_q4a.gsql](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/10_q4a.gsql) under `/home/tigergraph/tutorial/vector` or copy it to your container.
Next, run the following in your container's bash command line.
```
gsql /home/tigergraph/tutorial/vector/10_q4a.gsql
```

```python
USE GRAPH financialGraph

# create a query
CREATE OR REPLACE QUERY q4a (LIST<float> query_vector) SYNTAX v3 {

  MapAccum<VERTEX, DOUBLE> @@distances1;
  MapAccum<VERTEX, DOUBLE> @@distances2;

  // do an exact top-k search on "b" using the ORDER BY clause with the ASC keyword
  c1 = SELECT b
       FROM (a:Account)-[e:transfer]->(b:Account)
       ORDER BY gds.vector.cosine_distance(b.emb1, query_vector) ASC
       LIMIT 3;

  PRINT c1 WITH VECTOR;

  // an approximate top-k search on the Account vertex set
  v = vectorSearch({Account.emb1}, query_vector, 3, {distance_map: @@distances1});

  PRINT v WITH VECTOR;
  PRINT @@distances1;

  // below we use a variable-length path.
  // *1.. means 1 or more hops of the edge type "transfer"
  // select the reachable end points and bind them to vertex alias "b"
  // do an exact top-k search on "b" in reverse (farthest-first) order using the ORDER BY clause with the DESC keyword
  c2 = SELECT b
       FROM (a:Account {name: "Scott"})-[:transfer*1..]->(b:Account)
       WHERE a.name != b.name
       ORDER BY gds.vector.cosine_distance(b.emb1, query_vector) DESC
       LIMIT 3;

  PRINT c2 WITH VECTOR;

  // an approximate top-k search on the Account vertex set
  v = vectorSearch({Account.emb1}, query_vector, 5, {distance_map: @@distances2});

  PRINT v WITH VECTOR;
  PRINT @@distances2;

}

#compile and install the query as a stored procedure
install query q4a

#run the query
run query q4a([-0.017733968794345856, -0.01019224338233471, -0.016571875661611557])
```
The result is shown in [q4a.out](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/q4a.out) under `/home/tigergraph/tutorial/vector/q4a.out`.

[Go back to top](#top)
## Vector Similarity Join on Graph Patterns
### Top-K similarity join on graph patterns
Find the most similar pairs from a graph pattern. This exhaustively compares every pair of vertices bound to the specified vertex aliases in the given graph pattern.

Locate [11_q5.gsql](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/11_q5.gsql) under `/home/tigergraph/tutorial/vector` or copy it to your container.
Next, run the following in your container's bash command line.
```
gsql /home/tigergraph/tutorial/vector/11_q5.gsql
```

```python
#enter the graph
USE GRAPH financialGraph

# create a query
CREATE OR REPLACE QUERY q5() SYNTAX v3 {

  //Define a custom tuple to store the vertex pairs and their distance
  TYPEDEF TUPLE<VERTEX s, VERTEX t, DOUBLE distance> pair;

  //Declare a global heap accumulator to store the top 2 most similar pairs
  HeapAccum<pair>(2, distance ASC) @@result;

  // a path pattern in ascii art ()-[]->()-[]->()
  // for each (a,b) pair, we calculate their "COSINE" distance and store it in the heap.
  // only the top-2 pairs (smallest distance) are kept in the heap
  v = SELECT b
      FROM (a:Account)-[e:transfer]->()-[e2:transfer]->(b:Account)
      ACCUM @@result += pair(a, b, gds.vector.distance(a.emb1, b.emb1, "COSINE"));

  PRINT @@result;
}

#compile and install the query as a stored procedure
install query q5

#run the query
run query q5()
```
The result is shown in [q5.out](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/q5.out) under `/home/tigergraph/tutorial/vector/q5.out`.
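
For intuition, the sketch below shows what the "COSINE" metric computes for a pair of embeddings, assuming the conventional definition cosine distance = 1 - cosine similarity; the smaller the distance, the more similar the pair, which is why the heap above orders by `distance ASC`. This is plain Python for illustration only, not TigerGraph's internal implementation.

```python
# Illustrative cosine distance between two embedding vectors,
# assuming cosine_distance = 1 - cosine_similarity.
import math

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

emb_a = [-0.0177, -0.0102, -0.0166]  # hypothetical 3-dimensional embeddings
emb_b = [-0.0190, -0.0095, -0.0170]
print(cosine_distance(emb_a, emb_b))  # smaller value = more similar pair
```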

### Range similarity join on graph patterns
Find similar pairs whose distance is less than a threshold from a graph pattern. This exhaustively compares every pair of vertices bound to the specified vertex aliases in the given graph pattern.

Locate [12_q5a.gsql](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/12_q5a.gsql) under `/home/tigergraph/tutorial/vector` or copy it to your container.
Next, run the following in your container's bash command line.
```
gsql /home/tigergraph/tutorial/vector/12_q5a.gsql
```
```python
#enter the graph
USE GRAPH financialGraph

# create a query
CREATE OR REPLACE QUERY q5a() SYNTAX v3 {

  //find close pairs that have a distance less than 0.8
  SELECT a, b, gds.vector.distance(b.emb1, a.emb1, "COSINE") AS dist INTO T
  FROM (a:Account)-[e:transfer]->()-[e2:transfer]->(b:Account)
  WHERE gds.vector.distance(a.emb1, b.emb1, "COSINE") < 0.8;

  PRINT T;
}

#compile and install the query as a stored procedure
install query q5a

#run the query
run query q5a()
```
The result is shown in [q5a.out](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/q5a.out) under `/home/tigergraph/tutorial/vector/q5a.out`.
[Go back to top](#top)

## Vector Search Driven Pattern Match
Do a vector search first; its result then drives the next pattern match.

Locate [13_q6.gsql](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/13_q6.gsql) under `/home/tigergraph/tutorial/vector` or copy it to your container.
Next, run the following in your container's bash command line.
```
gsql /home/tigergraph/tutorial/vector/13_q6.gsql
```

```python
#enter the graph
USE GRAPH financialGraph

CREATE OR REPLACE QUERY q6 (LIST<float> query_vector) SYNTAX v3 {
  //find the top-3 vectors from Account.emb1 that are closest to query_vector
  R = vectorSearch({Account.emb1}, query_vector, 3);

  PRINT R;

  //query composition via the vector search result R
  V = SELECT b
      FROM (a:R)-[e:transfer]->()-[e2:transfer]->(b:Account);

  print V;
}

#compile and install the query as a stored procedure
install query q6

#run the query
run query q6([-0.017733968794345856, -0.01019224338233471, -0.016571875661611557])
```
The result is shown in [q6.out](https://raw.githubusercontent.com/tigergraph/ecosys/master/tutorials/vector/q6.out) under `/home/tigergraph/tutorial/vector/q6.out`.
[Go back to top](#top)
# Essential Operations and Tools

## Global and Local Schema Change

### Global Vertex and Edge
A global vertex or edge type is created in the global scope and can be shared by multiple graphs; it can only be modified from the global scope.

#### Add a Vector To Global Vertex

```python
# enter global
USE GLOBAL

# create a global schema change job to modify the global vertex
CREATE GLOBAL SCHEMA_CHANGE JOB add_emb2 {
  ALTER VERTEX Account ADD VECTOR ATTRIBUTE emb2(DIMENSION=3, METRIC="L2");
}

# run the global schema_change job
run global schema_change job add_emb2
```

#### Remove a Vector From Global Vertex

```python
# enter global
USE GLOBAL

# create a global schema change job to modify the global vertex
CREATE GLOBAL SCHEMA_CHANGE JOB drop_emb2 {
  ALTER VERTEX Account DROP VECTOR ATTRIBUTE emb2;
}

# run the global schema_change job
run global schema_change job drop_emb2
```

### Local Graph and Local Vertex
A local graph contains its own vertex and edge types as well as its data, which are invisible to other local graphs.

#### Create a Local Graph
```python
# enter global
USE GLOBAL

# create an empty local graph
CREATE GRAPH localGraph()
```

#### Create Local Vertex and Edge
```python
#enter local graph
USE GRAPH localGraph

# create a local schema change job to create local vertex types with or without vector attributes
CREATE SCHEMA_CHANGE JOB add_local_vertex FOR GRAPH localGraph {
  ADD VERTEX Account (name STRING PRIMARY KEY, isBlocked BOOL);
  ADD VERTEX Phone (number STRING PRIMARY KEY, isBlocked BOOL);
  ADD DIRECTED EDGE transfer (FROM Account, TO Account, DISCRIMINATOR(date DATETIME), amount UINT) WITH REVERSE_EDGE="transfer_reverse";
}
run schema_change job add_local_vertex
```

#### Add a Vector To Local Vertex
```python
#enter local graph
USE GRAPH localGraph

# create a local schema change job to modify the local vertex
CREATE SCHEMA_CHANGE JOB add_local_emb1 FOR GRAPH localGraph {
  ALTER VERTEX Account ADD VECTOR ATTRIBUTE emb1(DIMENSION=3, METRIC="COSINE");
  ALTER VERTEX Account ADD VECTOR ATTRIBUTE emb2(DIMENSION=10, METRIC="L2");
  ALTER VERTEX Phone ADD VECTOR ATTRIBUTE emb1(DIMENSION=3);
}

# run the local schema_change job
run schema_change job add_local_emb1
```

#### Remove a Vector From Local Vertex

```python
#enter local graph
USE GRAPH localGraph

# create a local schema change job to modify the local vertex
CREATE SCHEMA_CHANGE JOB drop_local_emb1 FOR GRAPH localGraph {
  ALTER VERTEX Account DROP VECTOR ATTRIBUTE emb1;
}

# run the local schema_change job
run schema_change job drop_local_emb1
```

#### Remove Local Vertex and Edge
```python
#enter local graph
USE GRAPH localGraph

# create a local schema change job to drop local vertex types with or without vector attributes
CREATE SCHEMA_CHANGE JOB drop_local_vertex FOR GRAPH localGraph {
  DROP VERTEX Account, Phone;
  DROP EDGE transfer;
}
RUN SCHEMA_CHANGE JOB drop_local_vertex
```

#### Remove a Local Graph
Dropping a local graph also drops all of its vertex and edge types, as well as its data.
```python
# enter global
USE GLOBAL

# drop the whole local graph
DROP GRAPH localGraph CASCADE
```

For more details, please visit [https://docs.tigergraph.com/gsql-ref/4.1/ddl-and-loading/](https://docs.tigergraph.com/gsql-ref/4.1/ddl-and-loading/).

## Vector Data Loading

### File Loading
#### Identify Data Format
It is crucial to choose a proper data format for embedding loading. Identify the possible values of the primary key, the text or binary contents, and the embedding values, so that you can define appropriate headers, a field separator, and an end-of-line character that let the loading job parse the data correctly.
* Field Separator - If the content contains commas, it is recommended to use `|` instead.
* Newline Character - If the content contains newline characters, escape them or define another end-of-line character.
* Header Line - Headers make the fields human-friendly; otherwise the fields are referenced by their positions.

Below is a typical data format for embedding values:
```python
id|name|isBlocked|embedding
1|Scott|n|-0.017733968794345856, -0.01019224338233471, -0.016571875661611557
```

#### Create Loading Job
```python
# enter graph
USE GRAPH financialGraph

#create a loading job for the vertex and its vector attribute
CREATE LOADING JOB load_local_file FOR GRAPH financialGraph {
  // define the location of the source files; each file path is assigned a filename variable.
  DEFINE FILENAME file1="/home/tigergraph/data/account_emb.csv";

  //define the mapping from the source file to the target graph element type. The mapping is specified by the VALUES clause.
  LOAD file1 TO VERTEX Account VALUES ($"name", gsql_to_bool(gsql_trim($"isBlocked"))) USING header="true", separator="|";
  LOAD file1 TO VECTOR ATTRIBUTE emb1 ON VERTEX Account VALUES ($1, SPLIT($3, ",")) USING SEPARATOR="|", HEADER="true";
}
```

#### Run Loading Job Locally
If the source file location has been defined directly in the loading job, use the following command:
```python
USE GRAPH financialGraph
run loading job load_local_file
```

You can also provide a file path in the command to override the file path defined inside the loading job:
```python
USE GRAPH financialGraph
run loading job load_local_file using file1="/home/tigergraph/data/account_emb_no_header.csv", header="false"
```

#### Run Loading Job Remotely
TigerGraph also supports running a loading job remotely via the DDL endpoint `POST /restpp/ddl/{graph_name}?tag={loading_job_name}&filename={file_variable_name}`.

For example:
```python
curl -X POST --data-binary @./account_emb.csv "http://localhost:14240/restpp/ddl/financialGraph?tag=load_local_file&filename=file1&sep=|"
```

### RESTPP Loading
You can follow the official documentation on RESTPP loading at https://docs.tigergraph.com/tigergraph-server/4.1/api/upsert-rest.
Below is a simple example.
```python
curl -X POST "http://localhost:14240/restpp/graph/financialGraph" -d '
{
  "vertices": {
    "Account": {
      "Scott": {
        "name": {
          "value": "Curry"
        },
        "isBlocked": {
          "value": false
        },
        "emb1": {
          "value": [-0.017733968794345856, -0.01019224338233471, -0.016571875661611557]
        }
      }
    }
  }
}
'
```

### Other Data Sources
TigerGraph supports various other ways to load data, including loading from cloud storage and the Parquet file format.

Please refer to [https://docs.tigergraph.com/tigergraph-server/4.1/data-loading/](https://docs.tigergraph.com/tigergraph-server/4.1/data-loading/) for more details.

## Python Integration
TigerGraph's Python integration is done via pyTigerGraph, mainly using the following functions:

|Function |Description
|-------|--------
|`TigerGraphConnection()` |Construct a connection to the TigerGraph database
|`gsql()` |Run a GSQL command, the same as in a GSQL console
|`runLoadingJobWithFile()` |Load data into the TigerGraph database from a text file
|`runLoadingJobWithDataFrame()` |Load data into the TigerGraph database from a pandas.DataFrame
|`runLoadingJobWithData()` |Load data into the TigerGraph database from a string variable
|`runInstalledQuery()` |Run an installed query via the RESTPP endpoint

For more details, please refer to the [pyTigerGraph Doc](https://docs.tigergraph.com/pytigergraph/1.8/intro/).

### Manage TigerGraph Connections
The example below connects to a TigerGraph server at localhost on port 14240.

#### Connect to a TigerGraph server
Construct a TigerGraph connection.

```python
# Establish a connection to the TigerGraph database
import pyTigerGraph as tg
conn = tg.TigerGraphConnection(
    host="http://127.0.0.1",
    restppPort="14240",
    graphname="financialGraph",
    username="tigergraph",
    password="tigergraph"
)
```

#### Parameter
|Parameter |Description
|-------|--------
|`host` |URL or IP address of the TigerGraph server.
|`restppPort` |REST port of the TigerGraph server.
|`graphname` |Graph name to be used for the schema.
|`username` |User name to connect to the TigerGraph server.
|`password` |Password to connect to the TigerGraph server.

#### Return
A TigerGraph connection created from the passed parameters.

#### Raises
* **TigerGraphException**: In case of an invalid URL scheme.

### Create Schema
Schema creation in Python is done by running a GSQL command via the pyTigerGraph `gsql()` function.

```python
# Create a vector attribute with 3 dimensions in the TigerGraph database
# Ensure you are connected to the TigerGraph server before any operations.
result = conn.gsql("""
  USE GLOBAL
  CREATE VERTEX Account(
      name STRING PRIMARY KEY,
      isBlocked BOOL
  )
  CREATE GLOBAL SCHEMA_CHANGE JOB fin_add_vector {
    ALTER VERTEX Account ADD VECTOR ATTRIBUTE emb1(dimension=3);
  }
  RUN GLOBAL SCHEMA_CHANGE JOB fin_add_vector
  CREATE GRAPH financialGraph(*)
""")
print(result)
```

### Load Data
Once a schema is created in the TigerGraph database, a corresponding loading job needs to be created to define the data format and its mapping to the schema. Since embedding data is usually comma-separated, it is recommended to use `|` as the field separator for both the data file and the loading job. For example:
```
1|Scott|n|-0.017733968794345856, -0.01019224338233471, -0.016571875661611557
```

#### Create Loading Job

```python
# Create a loading job for the vector schema in the TigerGraph database
# Ensure you are connected to the TigerGraph server before any operations.
result = conn.gsql("""
  CREATE LOADING JOB load_emb {
    DEFINE FILENAME file1;
    LOAD file1 TO VERTEX Account VALUES ($1, $2) USING SEPARATOR="|";
    LOAD file1 TO VECTOR ATTRIBUTE emb1 ON VERTEX Account VALUES ($1, SPLIT($3, ",")) USING SEPARATOR="|", HEADER="false";
  }
""")
print(result)
```

In case the vector data contains square brackets, the loading job should be revised to strip the extra brackets accordingly.

Data:
```python
1|Scott|n|[-0.017733968794345856, -0.01019224338233471, -0.016571875661611557]
```

Loading job:
```python
LOAD file1 TO VECTOR ATTRIBUTE emb1 ON VERTEX Account VALUES ($1, SPLIT(gsql_replace(gsql_replace($3,"[",""),"]",""),",")) USING SEPARATOR="|";
```

For more details about loading jobs, please refer to [https://docs.tigergraph.com/gsql-ref/4.1/ddl-and-loading/loading-jobs/](https://docs.tigergraph.com/gsql-ref/4.1/ddl-and-loading/loading-jobs/).

#### Load From DataFrame
```python
# Generate and load data from a pandas.DataFrame
# Ensure you are connected to the TigerGraph server before any operations.
import pandas as pd
# Assumes LangChain's OpenAI embeddings (pip install langchain-openai); swap in your own embedding provider.
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

text_data = {
    "sentences": [
        "Scott",
        "Jenny"
    ]
}

df = pd.DataFrame(text_data)
df['embedding'] = df['sentences'].apply(lambda t: embeddings.embed_query(t))
df['embedding'] = df['embedding'].apply(lambda x: ",".join(str(y) for y in x))
df['sentences'] = df['sentences'].apply(lambda x: x.replace("\n", "\\n"))

cols=["sentences", "embedding"]
result = conn.runLoadingJobWithDataFrame(df, "file1", "load_emb", "|", columns=cols)
print(result)
```

#### Load From Data File
```python
datafile = "openai_embedding.csv"
result = conn.runLoadingJobWithFile(datafile, "file1", "load_emb", "|")
print(result)
```

### Run a Query

A query accessing vector data needs to be created and installed before it can be called from the GSQL console or via a RESTPP endpoint.

#### GSQL Console
```python
# Run a query to get the top 3 vectors similar to the query vector
# Ensure you are connected to the TigerGraph server before any operations.
# Assumes LangChain's OpenAI embeddings; swap in your own embedding provider.
from langchain_openai import OpenAIEmbeddings

query = "Scott"
embeddings = OpenAIEmbeddings()
query_embedding = embeddings.embed_query(query)

result = conn.gsql(f"""
run query q1({query_embedding})
""")
print(result)
```

#### RESTPP endpoint
```python
# Run a RESTPP call to get the top 3 vectors similar to the query vector
# A LIST parameter is passed as repeated key=value pairs, e.g. query_vector=v1&query_vector=v2
# Ensure you are connected to the TigerGraph server before any operations.
# Assumes LangChain's OpenAI embeddings; swap in your own embedding provider.
from langchain_openai import OpenAIEmbeddings

query = "Scott"
embeddings = OpenAIEmbeddings()
query_embedding = embeddings.embed_query(query)
result = conn.runInstalledQuery(
    "q1",
    "query_vector="+"&query_vector=".join(str(y) for y in query_embedding),
    timeout=864000
)
print(result)
```

[Go back to top](#top)

# Vector Update

## Delayed Update Visibility

Vector attributes are fully editable, allowing users to create, read, update, and delete them like any other vertex attribute. However, since they are indexed using HNSW for fast ANN search, updates may not be immediately visible until the index is rebuilt. To track the rebuild status, we provide a REST endpoint for real-time status checks.

```python
/vector/status/{graph_name}/{vertex_type}/{vector_name}
```

**Example**

Check the status of vector attribute `embAttr1` on vertex type `v1` of graph `g1`.
```python
curl -X GET "http://localhost:14240/restpp/vector/status/g1/v1/embAttr1"

#sample output
{"version":{"edition":"enterprise","api":"v2","schema":0},"error":false,"message":"fetched status success","results":{"NeedRebuildServers":["GPE_1#1"]},"code":"REST-0000"}
```

You can also check by vertex type.

```python
curl -X GET "http://localhost:14240/restpp/vector/status/g1/v1"

#sample output
{"version":{"edition":"enterprise","api":"v2","schema":0},"error":false,"message":"fetched status success","results":{"NeedRebuildServers":["GPE_1#1"]},"code":"REST-0000"}
```

You can add the `verbose` flag to list every vector instance that still needs to be rebuilt.

```python
curl -X GET "http://localhost:14240/restpp/vector/status/g1/v1/embAttr1?verbose=true"

#sample output
{"version":{"edition":"enterprise","api":"v2","schema":0},"error":false,"message":"fetched status success","results":{"NeedRebuildInstances({SEGID}_{VECTOR_ID})":{"GPE_1#1":[1_1]}},"code":"REST-0000"}
```

## Frequent Updates Consume More Memory and Disk

Based on our experiments, a large volume of updates may lead to high memory and disk consumption. We observed that the index file size closely matches memory usage, while the main file size grows as expected, accumulating all update records.

# Support
If you like the tutorial and want to explore more, join the GSQL developer community at

[https://community.tigergraph.com/](https://community.tigergraph.com/)

[Go back to top](#top)

# References
[TigerVector: Supporting Vector Search in Graph Databases for Advanced RAGs](https://arxiv.org/abs/2501.11216), to appear in the [SIGMOD 2025 proceedings](https://2025.sigmod.org/).

For a GSQL quick start, please refer to the [GSQL Tutorial](https://github.com/tigergraph/ecosys/blob/master/tutorials/README.md).

[Go back to top](#top)

# Contact
To contact us for commercial support and purchase, please email us at [info@tigergraph.com](mailto:info@tigergraph.com).

[Go back to top](#top)