--- name: btdp-it-masterdata-retrieval description: |- Expert in retrieving IT masterdata and data lineage from L'Oréal's BTDP infrastructure. **Use this skill when user asks to search, find, or query for any BTDP/L'Oréal resource including: GCP projects, BigQuery datasets, BigQuery tables, Google Groups, applications, repositories, domains, IT organizations, people/users, GCP services, GCP SKUs, or APIs.** Also use for data lineage queries (upstream/downstream dependencies, parents/children). **Trigger keywords:** "search for project", "find the project", "find dataset", "search table", "what is the project", "what is the dataset", "what is the table", "what is the group", "what is the application", "find group", "search application", "show me project", "get project", "list projects", "lineage", "masterdata", "master data", "BTDP", "SDDS" **DO NOT use filesystem commands** (find, grep, ls) to search for BTDP resources. Always use this skill's RAG indices, MCP tools, or BigQuery SQL queries instead. --- # Data Retrieval Specialist Expert in retrieving IT masterdata from L'Oréal's Beauty Tech Data Platform (BTDP) using various retrieval methods including RAG indices, MCP tools, and SQL queries. ## Quick Decision Tree ``` What type of data? ├── **Lineage** → ./get_lineage.sh [--parents|--children] ├── People → RAG: oa_pass_identities_name_v1 ├── Code/Repos → RAG: smo_repository_v1 ├── Applications → RAG: application_service_name_v1 OR SQL on application_service_v1 ├── GCP Projects → RAG: smo_project_v1 or RAG: projects_v3 ├── GCP Services → RAG: gcp_services_v1 ├── GCP SKU → RAG: gcp_skus_v1 ├── Datasets → RAG: smo_dataset_v1 ├── Tables → RAG: smo_table_v1 ├── Domains → RAG: domains_v1 ├── IT Orgs → RAG: it_organizations_v3 ├── Groups (Google Cloud Identity) → RAG: groups_v1 OR SQL on groups_v1 └── APIs → Use loreal-api-search skill ``` If you have a master data, you are not sure the kind of master data it is (Applications, GCP Service, IT Organization, Domains, People), you should try in the same time the main data type: - Applications - GCP Services - IT Orgs - Domains - People You take the most relevant and clearly mention to the user your assumption. ## Core Retrieval Methods ### 0. Data Lineage (BTDP Data Health Check API) **When to use**: For table lineage - upstream dependencies (parents) or downstream impact (children) **Script**: `./get_lineage.sh [--parents|--children]` **Flags**: - `--parents` - Get upstream lineage (source tables) - `--children` - Get downstream lineage (dependent tables) - No flag (default) - Get both parents and children **Authentication**: Uses `gcloud auth print-access-token` automatically **Response**: JSON with nodes and connections graph from Neo4j **Examples**: ```bash # Get both upstream and downstream lineage (default) ./get_lineage.sh itg-btdppublished-gbl-ww-pd.btdp_ds_c1_0a1_gcpassets_eu_pd.tables_v2 # Get only upstream lineage (parents) ./get_lineage.sh itg-btdppublished-gbl-ww-pd.btdp_ds_c1_0a1_gcpassets_eu_pd.tables_v2 --parents # Get only downstream lineage (children) ./get_lineage.sh itg-btdppublished-gbl-ww-pd.btdp_ds_c1_0a1_gcpassets_eu_pd.tables_v2 --children ``` **Workflow**: ``` User asks: "What is the lineage for tables_v2" or "Show parents of tables_v2" 1. Run ./get_lineage.sh with appropriate flag 2. Parse JSON response (nodes + connections) 3. Present lineage graph to user ``` ### 1. RAG Index (Fastest, Semantic Search) **When to use**: For semantic search over indexed masterdata (groups, projects, domains, applications, etc.) **How to use**: Use the **rag-manager skill** for all RAG operations: - List available RAG indices: Use `rag-manager` skill - Search a RAG index: Use `rag-manager` skill - Create a new RAG index: Use `rag-manager` skill - Delete a RAG index: Use `rag-manager` skill **Available RAG Indices for Masterdata**: - `groups_v1` - Google Cloud Identity groups (12,808 records) - `projects_v3` - GCP projects (5,577 records) - `application_service_name_v1` - Application services (19,936 records) - `business_application_v1` - Business applications (18,137 records) - `domains_v1` - Data domains (19 records) - `it_organizations_v3` - IT organizations (15 records) - `oa_pass_identities_name_v1` - User identities (309,529 records) - `smo_dataset_v1` - SMO datasets (30 records) - `smo_project_v1` - SMO projects (12 records) - `smo_table_v1` - SMO tables (48 records) **Example workflow**: ``` User asks: "Find the group for exploradvanced" 1. Trigger rag-manager skill 2. Search groups_v1 index with query "exploradvanced" 3. Return results ``` ### 2. MCP Tools (API-based retrieval) **When to use**: For structured API-based data retrieval - **APIs**: Use the `loreal-api-search` skill to search L'Oréal API Portal - **Confluence Pages**: Use `mcp__atlassian__confluence_search` for Confluence content ### 3. SQL Queries (Direct BigQuery access) **When to use**: For complex filtering, aggregations, or when RAG index is not available **General SQL Pattern**: 1. **Find the table** using metadata table: ```bash bq --project_id oa-data-btdpexploration-np query --nouse_legacy_sql --max_rows 10 \ 'SELECT table_name AS table_ref FROM `itg-btdppublished-gbl-ww-pd.btdp_ds_c1_0a1_gcpassets_eu_pd.tables_v2` WHERE project_id = "itg-btdppublished-gbl-ww-pd" AND table_id = "{table_to_search}";' ``` 2. **Get the schema**: ```bash bq show --schema --format=prettyjson {table_ref_with_colon} ``` **Important:** When fetching the schema, replace the first dot with a colon. Example: `itg-btdppublished-gbl-ww-pd.btdp_ds_c1_0a1_gcpassets_eu_pd.tables_v2` becomes `itg-btdppublished-gbl-ww-pd:btdp_ds_c1_0a1_gcpassets_eu_pd.tables_v2` 3. **Execute the query**: ```bash bq --project_id oa-data-btdpexploration-np query --nouse_legacy_sql --max_rows 10 '{SQL_QUERY}'; ``` **Important:** Use `--max_rows` cleverly to ensure not too many results are loaded. For large results, use an intermediate file with stdout redirection. ## Environment Context ### Environments There are 4 environments in the platform: - **dv**: Dev/integration project. Developers have full permissions. - **qa**: Quality assurance/test project. For non-regression tests. Dev team has full access. - **np**: Pre-production/UAT/staging. Protected like production, with production data regularly loaded. - **pd**: Production environment. Protected with limited viewer access for dev team. ### SDDS Project The SDDS (Shared Domain Data Sets) project: `itg-btdppublished-gbl-ww-pd` The exploration project: `oa-data-btdpexploration-np` - Can be referred to as "btdp exploration", "btdp explo" - When user says "show me my ... exploration", search resources in this project ### SDDS Naming Convention - **SDDS Project**: `itg-btdppublished-gbl-ww-pd` - **Dataset Naming**: `btdp_ds_c[123]__