= fetchData();
```
## Tracking Usages and Dependencies
Like other symbols, Assignments support [usages](/api-reference/core/Assignment#usages) and [dependencies](/api-reference/core/Assignment#dependencies).
```python
assignment = file.code_block.get_local_var_assignment("userId")
# Get all usages of the assignment
usages = assignment.usages
# Get all dependencies of the assignment
dependencies = assignment.dependencies
```
See [Dependencies and Usages](/building-with-codegen/dependencies-and-usages)
for more details.
---
title: "Local Variables"
sidebarTitle: "Local Variables"
icon: "cube"
iconType: "solid"
---
This document explains how to work with local variables in Codegen.
## Overview
Through the [CodeBlock](../api-reference/core/CodeBlock) class, Codegen exposes APIs for analyzing and manipulating local variables within code blocks.
- [local_var_assignments](../api-reference/core/CodeBlock#local-var-assignments): find all [Assignments](../api-reference/core/Assignment) in this scope
- [get_local_var_assignment(...)](../api-reference/core/CodeBlock#get-local-var-assignment): get a specific [Assignment](../api-reference/core/Assignment) by name
- [rename_local_variable(...)](../api-reference/core/CodeBlock#rename-local-variable): rename variables safely across the current scope
## Basic Usage
Every code block (function body, loop body, etc.) provides access to its local variables:
```python
# Get all local variables in a function
function = codebase.get_function("process_data")
local_vars = function.code_block.local_var_assignments
for var in local_vars:
    print(var.name)

# Find a specific variable
config_var = function.code_block.get_local_var_assignment("config")
config_var.rename("settings")  # Updates all references safely

# Rename a variable used in this scope (but not necessarily declared here)
function.code_block.rename_local_variable("foo", "bar")
```
## Fuzzy Matching
Codegen supports fuzzy matching when searching for local variables. This allows you to find variables whose names contain a substring, rather than requiring exact matches:
```python
# Get all local variables containing "config"
function = codebase.get_function("process_data")
# Exact match - only finds variables named exactly "config"
exact_matches = function.code_block.get_local_var_assignments("config")
# Returns: config = {...}
# Fuzzy match - finds any variable containing "config"
fuzzy_matches = function.code_block.get_local_var_assignments("config", fuzzy_match=True)
# Returns: config = {...}, app_config = {...}, config_settings = {...}
# Fuzzy matching also works for variable usages
usages = function.code_block.get_variable_usages("config", fuzzy_match=True)
# And for renaming variables
function.code_block.rename_variable_usages("config", "settings", fuzzy_match=True)
# Renames: config -> settings, app_config -> app_settings, config_settings -> settings_settings
```
Be careful with fuzzy matching when renaming variables, as it will replace the
matched substring in all variable names. This might lead to unintended renames
like `config_settings` becoming `settings_settings`.
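One way to stay safe, sketched below, is to collect the fuzzy matches first and rename only the ones you actually intend to change (the `endswith` filter here is illustrative, not a built-in API):
```python
# Rename only variables ending in "_config", leaving names like "config_settings" untouched
for var in function.code_block.get_local_var_assignments("config", fuzzy_match=True):
    if var.name.endswith("_config"):
        var.rename(var.name.replace("_config", "_settings"))
```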
---
title: "Comments and Docstrings"
sidebarTitle: "Comments & Docstrings"
icon: "comment"
iconType: "solid"
---
Codegen enables reading, modifying, and manipulating comments and docstrings while preserving proper formatting.
This guide describes proper usage of the following classes:
- [Comment](/api-reference/core/Comment) - Represents a single comment.
- [CommentGroup](/api-reference/core/CommentGroup) - Represents a group of comments.
## Accessing Comments
Comments can be accessed through any symbol or directly from code blocks. Each comment is represented by a `Comment` object that provides access to both the raw source and parsed text:
```python
# Find all comments in a file
file = codebase.get_file("my_file.py")
for comment in file.code_block.comments:
    print(comment.text)

# Access comments associated with a symbol
symbol = file.get_symbol("my_function")
if symbol.comment:
    print(symbol.comment.text)    # Comment text without delimiters
    print(symbol.comment.source)  # Full comment including delimiters

# Access inline comments
if symbol.inline_comment:
    print(symbol.inline_comment.text)

# Accessing all comments in a function
for comment in symbol.code_block.comments:
    print(comment.text)
```
### Editing Comments
Comments can be modified using the `edit_text()` method, which handles formatting and delimiters automatically:
```python
# Edit a regular comment
symbol.comment.edit_text("Updated comment text")
# Edit an inline comment
symbol.set_inline_comment("New inline comment")
```
### Comment Groups
Multiple consecutive comments are automatically grouped into a `CommentGroup`, which can be edited as a single unit:
```python
# Original comments:
# First line
# Second line
# Third line
comment_group = symbol.comment
print(comment_group.text) # "First line\nSecond line\nThird line"
# Edit the entire group at once
comment_group.edit_text("New first line\nNew second line")
```
## Working with Docstrings
Docstrings are special comments that document functions, classes, and modules. Codegen provides similar APIs for working with docstrings:
```python
function = file.get_symbol("my_function")
if function.docstring:
    print(function.docstring.text)    # Docstring content
    print(function.docstring.source)  # Full docstring with delimiters
```
### Adding Docstrings
You can add docstrings to any symbol that supports them:
```python
# Add a single-line docstring
function.set_docstring("A brief description")
# Add a multi-line docstring
function.set_docstring("""
A longer description that
spans multiple lines.

Args:
    param1: Description of first parameter
""")
```
### Language-Specific Formatting
Codegen automatically handles language-specific docstring formatting:
```python
# Python: Uses triple quotes
def my_function():
    """Docstring is formatted with triple quotes."""
    pass
```
```typescript
// TypeScript: Uses JSDoc style
function myFunction() {
  /** Docstring is formatted as JSDoc */
}
```
### Editing Docstrings
Like comments, docstrings can be modified while preserving formatting:
```python
# Edit a docstring
function.docstring.edit_text("Updated documentation")
# Edit a multi-line docstring
function.docstring.edit_text("""
Updated multi-line documentation
that preserves indentation and formatting.
""")
```
## Comment Operations
Codegen provides utilities for working with comments at scale. For example, you can update or remove specific types of comments across your codebase:
```python
# Example: Remove eslint disable comments for a specific rule
for file in codebase.files:
    for comment in file.code_block.comments:
        if "eslint-disable" in comment.source:
            # Check if comment disables specific rule
            if "@typescript-eslint/no-explicit-any" in comment.text:
                comment.remove()
```
When editing multi-line comments or docstrings, Codegen automatically handles
indentation and maintains the existing comment style.
## Special APIs and AI Integration
### Google Style Docstrings
Codegen supports Google-style docstrings and can handle their specific formatting, using the [CommentGroup.to_google_docstring(...)](/api-reference/core/CommentGroup#to-google-docstring) method.
```python
# Edit while preserving Google style
symbol_a = file.get_symbol("SymbolA")
func_b = symbol_a.get_method("funcB")
func_b.docstring.to_google_docstring(func_b)
```
### Using AI for Documentation
Codegen integrates with LLMs to help generate and improve documentation. You can use the [Codebase.ai(...)](/api-reference/core/Codebase#ai) method to:
- Generate comprehensive docstrings
- Update existing documentation
- Convert between documentation styles
- Add parameter descriptions
```python
# Generate a docstring using AI
function = codebase.get_function("my_function")
new_docstring = codebase.ai(
    "Generate a comprehensive docstring in Google style",
    target=function,
    context={
        # provide additional context to the LLM
        'usages': function.usages,
        'dependencies': function.dependencies
    }
)
function.set_docstring(new_docstring)
```
Learn more about AI documentation capabilities in our [Documentation
Guide](/tutorials/creating-documentation) and [LLM Integration
Guide](/building-with-codegen/calling-out-to-llms).
### Documentation Coverage
You can analyze and improve documentation coverage across your codebase:
```python
# Count documented vs undocumented functions
total = 0
documented = 0
for function in codebase.functions:
    total += 1
    if function.docstring:
        documented += 1
coverage = (documented / total * 100) if total > 0 else 0
print(f"Documentation coverage: {coverage:.1f}%")
```
Check out the [Documentation Guide](/tutorials/creating-documentation) for
more advanced coverage analysis and bulk documentation generation.
---
title: "External Modules"
sidebarTitle: "External Modules"
icon: "box-archive"
iconType: "solid"
---
Codegen provides a way to handle imports from external packages and modules through the [ExternalModule](/api-reference/core/ExternalModule) class.
```python
# Python examples
import datetime
from requests import get
```
```typescript
// TypeScript/JavaScript examples
import React from 'react'
import { useState, useEffect } from 'react'
import type { ReactNode } from 'react'
import axios from 'axios'
```
## What are External Modules?
When writing code, you often import from packages that aren't part of your project - like `datetime` and `requests` in Python, or `react` and `axios` in TypeScript. In Codegen, these are represented as [ExternalModule](/api-reference/core/ExternalModule) instances.
```python
for imp in codebase.imports:
    if isinstance(imp.resolved_symbol, ExternalModule):
        print(f"Importing from external package: {imp.resolved_symbol.source}")
```
External modules are read-only - you can analyze them but can't modify their
implementation. This makes sense since they live in your project's
dependencies!
## Working with External Modules
The most common use case is handling external modules differently from your project's code:
### Identifying Function Calls as External Modules
For [FunctionCall](/api-reference/core/FunctionCall) instances, you can check if the function definition is an [ExternalModule](/api-reference/core/ExternalModule) via the [FunctionCall.function_definition](/api-reference/core/FunctionCall#function-definition) property:
```python
for fcall in file.function_calls:
    definition = fcall.function_definition
    if isinstance(definition, ExternalModule):
        # Skip external functions
        print(f'External function: {definition.name}')
    else:
        # Process local functions...
        print(f'Local function: {definition.name}')
```
### Import Resolution
Similarly, when working with imports, you can determine if they resolve to external modules by checking the [Import.resolved_symbol](/api-reference/core/Import#resolved-symbol) property:
```python
for imp in file.imports:
    resolved = imp.resolved_symbol
    if isinstance(resolved, ExternalModule):
        print(f"Import from external package: from {imp.module} import {imp.name}")
```
Use `isinstance(symbol, ExternalModule)` to reliably identify external
modules. This works better than checking names or paths since it handles all
edge cases.
## Properties and Methods
External modules provide several useful properties:
```python
# Get the module name
module_name = external_module.name # e.g. "datetime" or "useState"
# Check if it's from node_modules (TypeScript/JavaScript)
if external_module.filepath == "":
    print("This is an external package from node_modules")
```
## Common Patterns
Here are some typical ways you might work with external modules:
### Skip External Processing
When modifying function calls or imports, skip external modules since they can't be changed:
```python
# Example from a codemod that adds type hints
def add_type_hints(function):
    if isinstance(function.definition, ExternalModule):
        return  # Can't add type hints to external modules like React.FC

    # Add type hints to local functions...
```
### Analyze Dependencies
Track which external packages your code uses:
```python
# Find all external package dependencies
external_deps = set()
for imp in codebase.imports:
    if isinstance(imp.resolved_symbol, ExternalModule):
        external_deps.add(imp.resolved_symbol.source)
# Will find things like 'react', 'lodash', 'datetime', etc.
```
When working with imports, always handle external modules as a special case.
This ensures your codemods work correctly with both local and external code.
---
title: "Working with Type Annotations"
sidebarTitle: "Type Annotations"
icon: "code"
iconType: "solid"
---
This guide covers the core APIs and patterns for working with type annotations in Codegen.
## Type Resolution
Codegen builds a complete dependency graph of your codebase, connecting functions, classes, imports, and their relationships. This enables powerful type resolution capabilities:
```python
from codegen import Codebase
# Initialize codebase with dependency graph
codebase = Codebase("./")
# Get a function with a type annotation
function = codebase.get_file("path/to/file.py").get_function("my_func")
# Resolve its return type to actual symbols
return_type = function.return_type
resolved_symbols = return_type.resolved_types # Returns the actual Symbol objects
# For generic types, you can resolve parameters
if hasattr(return_type, "parameters"):
    for param in return_type.parameters:
        resolved_param = param.resolved_types  # Get the actual type parameter symbols
# For assignments, resolve their type
assignment = codebase.get_file("path/to/file.py").get_assignment("my_var")
resolved_type = assignment.type.resolved_types
```
Type resolution follows imports and handles complex cases like type aliases, forward references, and generic type parameters.
## Core Interfaces
Type annotations in Codegen are built on two key interfaces:
- [Typeable](/api-reference/core/Typeable) - The base interface for any node that can have a type annotation (parameters, variables, functions, etc). Provides `.type` and `.is_typed`.
- [Type](/api-reference/core/Type) - The base class for all type annotations. Provides type resolution and dependency tracking.
Any node that inherits from `Typeable` will have a `.type` property that returns a `Type` object, which can be used to inspect and modify type annotations.
Learn more about [inheritable behaviors](/building-with-codegen/inheritable-behaviors) like `Typeable`.
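For instance, here is a minimal sketch of how the two interfaces compose (assuming a parsed `codebase` with a function named `my_func`):
```python
# Any Typeable node (parameter, assignment, function, ...) exposes the same interface
function = codebase.get_function("my_func")
for param in function.parameters:
    if param.is_typed:            # Typeable: does this node have an annotation?
        print(param.type.source)  # Type: the annotation itself
```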
## Core Type APIs
Type annotations can be accessed and modified through several key APIs:
### Function Types
The main APIs for function types are [Function.return_type](/api-reference/python/PyFunction#return-type) and [Function.set_return_type](/api-reference/python/PyFunction#set-return-type):
```python
# Get return type
return_type = function.return_type # -> TypeAnnotation
print(return_type.source) # "List[str]"
print(return_type.is_typed) # True/False
# Set return type
function.set_return_type("List[str]")
function.set_return_type(None) # Removes type annotation
```
### Parameter Types
Parameters use [Parameter.type](/api-reference/core/Parameter#type) and [Parameter.set_type_annotation](/api-reference/core/Parameter#set-type-annotation):
```python
for param in function.parameters:
    # Get parameter type
    param_type = param.type  # -> TypeAnnotation
    print(param_type.source)  # "int"
    print(param_type.is_typed)  # True/False

    # Set parameter type
    param.set_type("int")
    param.set_type(None)  # Removes type annotation
```
### Variable Types
Variables and attributes use [Assignment.type](/api-reference/core/Assignment#type) and [Assignment.set_type_annotation](/api-reference/core/Assignment#set-type-annotation). This applies to:
- Global variables
- Local variables
- Class attributes (via [Class.attributes](/api-reference/core/Class#attributes))
```python
# For global/local assignments
assignment = file.get_assignment("my_var")
var_type = assignment.type # -> TypeAnnotation
print(var_type.source) # "str"
# Set variable type
assignment.set_type("str")
assignment.set_type(None) # Removes type annotation
# For class attributes
class_def = file.get_class("MyClass")
for attr in class_def.attributes:
    # Each attribute has an assignment property
    attr_type = attr.assignment.type  # -> TypeAnnotation
    print(f"{attr.name}: {attr_type.source}")  # e.g. "x: int"

    # Set attribute type
    attr.assignment.set_type("int")
# You can also access attributes directly by index
first_attr = class_def.attributes[0]
first_attr.assignment.set_type("str")
```
## Working with Complex Types
### Union Types
Union types ([UnionType](/api-reference/core/UnionType)) can be manipulated as collections:
```python
# Get union type
union_type = function.return_type # -> A | B
print(union_type.symbols) # ["A", "B"]
# Add/remove options
union_type.append("float")
union_type.remove("None")
# Check contents
if "str" in union_type.options:
print("String is a possible type")
```
Learn more about [working with collections here](/building-with-codegen/collections)
### Generic Types
Generic types ([GenericType](/api-reference/core/GenericType)) expose their parameters as a collection of [Parameters](/api-reference/core/Parameter):
```python
# Get generic type
generic_type = function.return_type # -> GenericType
print(generic_type.base) # "List"
print(generic_type.parameters) # ["str"]
# Modify parameters
generic_type.parameters.append("int")
generic_type.parameters[0] = "float"
# Create new generic
function.set_return_type("List[str]")
```
Learn more about [working with collections here](/building-with-codegen/collections)
### Type Resolution
Type resolution uses [Type.resolved_value](/api-reference/core/Type#resolved-value) to get the actual symbols that a type refers to:
```python
# Get the actual symbols for a type
type_annotation = function.return_type # -> Type
resolved_types = type_annotation.resolved_value # Returns an Expression, likely a Symbol or collection of Symbols
# For generic types, resolve each parameter
if hasattr(type_annotation, "parameters"):
    for param in type_annotation.parameters:
        param_types = param.resolved_value  # Get symbols for each parameter

# For union types, resolve each option
if hasattr(type_annotation, "options"):
    for option in type_annotation.options:
        option_types = option.resolved_value  # Get symbols for each union option
```
---
title: "Moving Symbols"
sidebarTitle: "Moving Symbols"
icon: "arrows-up-down-left-right"
iconType: "solid"
---
Codegen provides fast, configurable and safe APIs for moving symbols (functions, classes, variables) between files while automatically handling imports and dependencies.
The key API is [Symbol.move_to_file(...)](/api-reference/core/Symbol#move-to-file).
## Basic Symbol Movement
Simply call [Symbol.move_to_file(...)](/api-reference/core/Symbol#move-to-file) to move a symbol to a new file.
```python
# Manipulation code:
file1 = codebase.get_file("file1.py")
file2 = codebase.get_file("file2.py")
helper_func = file1.get_symbol("helper")
# Ensure the destination file exists
if not file2.exists():
    file2 = codebase.create_file('file2.py')
# Move the symbol
helper_func.move_to_file(file2)
```
By default, this will move any dependencies, including imports, to the new
file.
## Moving Strategies
The [Symbol.move_to_file(...)](/api-reference/core/Symbol#move-to-file) method accepts a `strategy` parameter, which can be used to control how imports are updated.
Your options are:
- `"update_all_imports"`: Updates all import statements across the codebase (default)
- `"add_back_edge"`: Adds import and re-export in the original file
`"add_back_edge"` is useful when moving a symbol that is depended on by other symbols in the original file, and will result in smaller diffs.
`"add_back_edge"` will result in circular dependencies if the symbol has
non-import dependencies in it's original file.
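As a quick sketch (file and symbol names here are illustrative), the strategy is passed directly to `move_to_file`:
```python
# Move `helper`, leaving a re-export behind in the original file
helper_func = codebase.get_file("file1.py").get_symbol("helper")
dest_file = codebase.create_file("utils/helpers.py")
helper_func.move_to_file(
    dest_file,
    include_dependencies=True,   # also move dependencies (the default behavior)
    strategy="add_back_edge",    # original file keeps an import/re-export
)
```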
## Moving Symbols in Bulk
Make sure to call [Codebase.commit(...)](/api-reference/core/Codebase#commit) _after_ moving symbols in bulk for performant symbol movement.
```python
# Move all functions with a specific prefix
shared_file = codebase.get_file("shared.py")  # destination file (illustrative)
for file in codebase.files:
    for function in file.functions:
        if function.name.startswith("pylsp_"):
            function.move_to_file(
                shared_file,
                include_dependencies=True,
                strategy="update_all_imports"
            )

# Commit the changes once, at the end
codebase.commit()
```
---
title: "Collections"
sidebarTitle: "Collections"
icon: "layer-group"
iconType: "solid"
---
Codegen enables traversing and manipulating collections through the [List](/api-reference/core/List) and [Dict](/api-reference/core/Dict) classes.
These APIs work consistently across Python and TypeScript while preserving formatting and structure.
## Core Concepts
The [List](/api-reference/core/List) and [Dict](/api-reference/core/Dict) classes provide a consistent interface for working with ordered sequences of elements. Key features include:
- Standard sequence operations (indexing, length, iteration)
- Automatic formatting preservation
- Safe modification operations
- Language-agnostic behavior
- Comment and whitespace preservation
Collections handle:
- Proper indentation
- Delimiters (commas, newlines)
- Multi-line formatting
- Leading/trailing whitespace
- Nested structures
## List Operations
Lists in both Python and TypeScript can be manipulated using the same APIs:
```python
# Basic operations
items_list = file.get_symbol("items").value # Get list value
first = items_list[0] # Access elements
length = len(items_list) # Get length
items_list[0] = "new" # Modify element
items_list.append("d") # Add to end
items_list.insert(1, "x") # Insert at position
del items_list[1] # Remove element
# Iteration
for item in items_list:
    print(item.source)
# Bulk operations
items_list.clear() # Remove all elements
```
### Single vs Multi-line Lists
Collections automatically preserve formatting:
```python
# Source code:
items = [a, b, c]
config = [
    "debug",
    "verbose",
    "trace",
]
# Manipulation code:
items_list = file.get_symbol("items").value
items_list.append("d") # Adds new element
config_list = file.get_symbol("config").value
config_list.append("info") # Adds with formatting
# Result:
items = [a, b, c, d]
config = [
    "debug",
    "verbose",
    "trace",
    "info",
]
```
## Dictionary Operations
Dictionaries provide a similar consistent interface:
```python
# Basic operations
settings = file.get_symbol("settings").value # Get dict value
value = settings["key"] # Get value
settings["key"] = "value" # Set value
del settings["key"] # Remove key
has_key = "key" in settings # Check existence
# Iteration
for key in settings:
    print(f"{key}: {settings[key]}")
# Bulk operations
settings.clear() # Remove all entries
```
---
title: "Traversing the Call Graph"
sidebarTitle: "Call Graph"
icon: "sitemap"
iconType: "solid"
---
Codegen provides powerful capabilities for analyzing and visualizing function call relationships in your codebase. This guide will show you how to traverse the call graph and create visual representations of function call paths.
## Understanding Call Graph Traversal
At the heart of call graph traversal is the [.function_calls](/api-reference/core/Function#function-calls) property, which returns information about all function calls made within a function:
```python
# Source code:
def example_function():
    result = helper_function()
    process_data()
    return result

# Manipulation code: get all calls made by example_function
example_function = codebase.get_function("example_function")
successors = example_function.function_calls
for successor in successors:
    print(f"Call: {successor.source}")  # The actual function call
    print(f"Called: {successor.function_definition.name}")  # The function being called
```
## Building a Call Graph
Here's how to build a directed graph of function calls using NetworkX:
```python
import networkx as nx
from codegen.sdk.core.interfaces.callable import FunctionCallDefinition
from codegen.sdk.core.function import Function
from codegen.sdk.core.external_module import ExternalModule
def create_call_graph(start_func, end_func, max_depth=5):
    G = nx.DiGraph()

    def traverse_calls(parent_func, current_depth):
        if current_depth > max_depth:
            return

        # Determine source node
        if isinstance(parent_func, Function):
            src_call = src_func = parent_func
        else:
            src_func = parent_func.function_definition
            src_call = parent_func

        # Skip external modules
        if isinstance(src_func, ExternalModule):
            return

        # Traverse all function calls
        for call in src_func.function_calls:
            func = call.function_definition

            # Skip recursive calls
            if func.name == src_func.name:
                continue

            # Add nodes and edges
            G.add_node(call)
            G.add_edge(src_call, call)

            # Check if we reached the target
            if func == end_func:
                G.add_edge(call, end_func)
                return

            # Continue traversal
            traverse_calls(call, current_depth + 1)

    # Initialize graph
    G.add_node(start_func, color="blue")  # Start node
    G.add_node(end_func, color="red")  # End node

    # Start traversal
    traverse_calls(start_func, 1)
    return G

# Usage example
start = codebase.get_function("create_skill")
end = codebase.get_function("auto_define_skill_description")
graph = create_call_graph(start, end)
```
## Filtering and Visualization
You can filter the graph to show only relevant paths and visualize the results:
```python
# Find all paths between start and end
all_paths = nx.all_simple_paths(graph, source=start, target=end)
# Create subgraph of only the nodes in these paths
nodes_in_paths = set()
for path in all_paths:
    nodes_in_paths.update(path)
filtered_graph = graph.subgraph(nodes_in_paths)
# Visualize the graph
codebase.visualize(filtered_graph)
```
## Advanced Usage
### Example: Finding Dead Code
You can use call graph analysis to find unused functions:
```python
def find_dead_code(codebase):
    dead_functions = []
    for function in codebase.functions:
        if not function.call_sites:
            # No other functions call this one
            dead_functions.append(function)
    return dead_functions
```
### Example: Analyzing Call Chains
Find the longest call chain in your codebase:
```python
import networkx as nx

def get_max_call_chain(function):
    G = nx.DiGraph()

    def build_graph(func, depth=0):
        if depth > 10:  # Prevent infinite recursion
            return
        for call in func.function_calls:
            called_func = call.function_definition
            G.add_edge(func, called_func)
            build_graph(called_func, depth + 1)

    build_graph(function)
    return nx.dag_longest_path(G)
```
The `.function_calls` property is optimized for performance and uses Codegen's internal graph structure to quickly traverse relationships. It's much faster than parsing the code repeatedly.
When traversing call graphs, be mindful of:
- Recursive calls that could create infinite loops (see the visited-set sketch below)
- External module calls that might not be resolvable
- Dynamic/runtime function calls that can't be statically analyzed
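A simple guard against the first two pitfalls is to track visited definitions. Here is a minimal sketch using only the APIs shown above:
```python
from codegen.sdk.core.external_module import ExternalModule

def walk_calls(func, visited=None):
    # Track visited definitions to avoid cycles from recursive calls
    visited = visited if visited is not None else set()
    if func in visited:
        return
    visited.add(func)
    for call in func.function_calls:
        definition = call.function_definition
        # Skip unresolvable targets and external modules
        if definition is None or isinstance(definition, ExternalModule):
            continue
        print(f"{func.name} -> {definition.name}")
        walk_calls(definition, visited)
```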
---
title: "React and JSX"
sidebarTitle: "React and JSX"
icon: "react"
iconType: "brands"
---
GraphSitter exposes several React and JSX-specific APIs for working with modern React codebases.
Key APIs include:
- [Function.is_jsx](/api-reference/typescript/TSFunction#is-jsx) - Check if a function contains JSX elements
- [Class.jsx_elements](/api-reference/typescript/TSClass#jsx-elements) - Get all JSX elements in a class
- [Function.jsx_elements](/api-reference/typescript/TSFunction#jsx-elements) - Get all JSX elements in a function
- [JSXElement](/api-reference/typescript/JSXElement) - Manipulate JSX elements
- [JSXProp](/api-reference/typescript/JSXProp) - Manipulate JSX props
See [React Modernization](/tutorials/react-modernization) for tutorials and
applications of the concepts described here
## Detecting React Components with `is_jsx`
Codegen exposes an `is_jsx` property on both classes and functions, which can be used to check if a symbol is a React component.
```python
# Check if a function is a React component
function = file.get_function("MyComponent")
is_component = function.is_jsx # True for React components
# Check if a class is a React component
class_def = file.get_class("MyClassComponent")
is_component = class_def.is_jsx # True for React class components
```
## Working with JSX Elements
Given a React component, you can access its JSX elements using the [jsx_elements](/api-reference/typescript/TSFunction#jsx-elements) property.
You can manipulate these elements by using the [JSXElement](/api-reference/typescript/JSXElement) and [JSXProp](/api-reference/typescript/JSXProp) APIs.
```python
# Get all JSX elements in a component
for element in component.jsx_elements:
    # Access element name
    if element.name == "Button":
        # Wrap element in a div
        element.wrap("<div>", "</div>")

    # Get specific prop
    specific_prop = element.get_prop("className")

    # Iterate over all props
    for prop in element.props:
        if prop.name == "className":
            # Set prop value
            prop.set_value('"my-classname"')

    # Modify element
    element.set_name("NewComponent")
    element.add_prop("newProp", "{value}")

    # Get child JSX elements
    child_elements = element.jsx_elements

    # Wrap element in a JSX expression (preserves whitespace)
    element.wrap("{(", ")}")
```
## Common React Operations
See [React Modernization](/tutorials/react-modernization) for more
### Refactoring Components into Separate Files
Split React components into individual files:
```python
# Find (named) React components
react_components = [
    func for func in codebase.functions
    if func.is_jsx and func.name is not None
]

# Keep only components that are not the default export
non_default_components = [
    comp for comp in react_components
    if not comp.export or not comp.export.is_default_export()
]

# Move these non-default components to new files
for component in non_default_components:
    # Create new file alongside the original
    new_file_path = '/'.join(component.filepath.split('/')[:-1]) + f"/{component.name}.tsx"
    new_file = codebase.create_file(new_file_path)

    # Move component and update imports
    component.move_to_file(new_file, strategy="add_back_edge")
```
See [Moving Symbols](/building-with-codegen/moving-symbols) for more details
on moving symbols between files.
### Updating Component Names and Props
Replace components throughout the codebase with prop updates:
```python
# Find target component
new_component = codebase.get_symbol("NewComponent")

for function in codebase.functions:
    if function.is_jsx:
        file = function.file
        # Update JSX elements
        for element in function.jsx_elements:
            if element.name == "OldComponent":
                # Update name
                element.set_name("NewComponent")

                # Edit props
                needs_clsx = False
                for prop in element.props:
                    if prop.name == "className":
                        prop.set_value('clsx("new-classname")')
                        needs_clsx = not file.has_import("clsx")
                    elif prop.name == "onClick":
                        prop.set_name('handleClick')

                # Add clsx import if needed
                if needs_clsx:
                    file.add_import_from_import_source("import clsx from 'clsx'")

                # Add component import if needed
                if not file.has_import("NewComponent"):
                    file.add_import(new_component)
```
---
title: "Codebase Visualization"
sidebarTitle: "Visualization"
icon: "share-nodes"
iconType: "solid"
---
Codegen provides the ability to create interactive graph visualizations via the [codebase.visualize(...)](/api-reference/core/Codebase#visualize) method.
These visualizations have a number of applications, including:
- Understanding codebase structure
- Monitoring critical code paths
- Analyzing dependencies
- Understanding inheritance hierarchies
This guide provides a basic overview of graph creation and customization. For example, you can render the call graph of the [modal/client.py](https://github.com/modal-labs/modal-client/blob/v0.72.49/modal/client.py) module this way.
Codegen visualizations are powered by [NetworkX](https://networkx.org/) and
rendered using [d3](https://d3js.org/what-is-d3).
## Basic Usage
The [Codebase.visualize](/api-reference/core/Codebase#visualize) method operates on a NetworkX [DiGraph](https://networkx.org/documentation/stable/reference/classes/graph.DiGraph.html).
```python
import networkx as nx
# Basic visualization
G = nx.grid_2d_graph(5, 5)
# Or start with an empty graph
# G = nx.DiGraph()
codebase.visualize(G)
```
It is up to the developer to add nodes and edges to the graph.
### Adding Nodes and Edges
When adding nodes to your graph, you can either add the symbol directly or just its name:
```python
import networkx as nx
G = nx.DiGraph()
function = codebase.get_function("my_function")
# Add the function object directly - enables source code preview
G.add_node(function)  # Will show function's source code on click

# Add just the name - no extra features
G.add_node(function.name)  # Will only show the name
```
Adding symbols to the graph directly (as opposed to adding by name) enables
automatic type information, code preview on hover, and more.
## Common Visualization Types
### Call Graphs
Visualize how functions call each other and trace execution paths:
```python
import networkx as nx
from codegen.sdk.core.function import Function

def create_call_graph(entry_point: Function):
    graph = nx.DiGraph()

    def add_calls(func):
        for call in func.call_sites:
            called_func = call.resolved_symbol
            if called_func:
                # Add function objects for rich previews
                graph.add_node(func)
                graph.add_node(called_func)
                graph.add_edge(func, called_func)
                add_calls(called_func)

    add_calls(entry_point)
    return graph

# Visualize API endpoint call graph
endpoint = codebase.get_function("handle_request")
call_graph = create_call_graph(endpoint)
codebase.visualize(call_graph, root=endpoint)
```
Learn more about [traversing the call graph
here](/building-with-codegen/traversing-the-call-graph).
### React Component Trees
Visualize the hierarchy of React components:
```python
import networkx as nx

def create_component_tree(root_component: Class):
    graph = nx.DiGraph()

    def add_children(component):
        for usage in component.usages:
            if isinstance(usage.parent, Class) and "Component" in usage.parent.bases:
                graph.add_edge(component.name, usage.parent.name)
                add_children(usage.parent)

    add_children(root_component)
    return graph

# Visualize component hierarchy
app = codebase.get_class("App")
component_tree = create_component_tree(app)
codebase.visualize(component_tree, root=app)
```
### Inheritance Graphs
Visualize class inheritance relationships:
```python
import networkx as nx
G = nx.DiGraph()
base = codebase.get_class("BaseModel")

def add_subclasses(cls):
    for subclass in cls.subclasses:
        G.add_edge(cls, subclass)
        add_subclasses(subclass)

add_subclasses(base)
codebase.visualize(G, root=base)
```
### Module Dependencies
Visualize dependencies between modules:
```python
import networkx as nx

def create_module_graph(start_file: File):
    graph = nx.DiGraph()
    visited = set()  # guard against circular imports

    def add_imports(file):
        if file in visited:
            return
        visited.add(file)
        for imp in file.imports:
            if imp.resolved_symbol and imp.resolved_symbol.file:
                graph.add_edge(file, imp.resolved_symbol.file)
                add_imports(imp.resolved_symbol.file)

    add_imports(start_file)
    return graph

# Visualize module dependencies
main = codebase.get_file("main.py")
module_graph = create_module_graph(main)
codebase.visualize(module_graph, root=main)
```
### Function Modularity
Visualize function groupings by modularity:
```python
import networkx as nx

def create_modularity_graph(functions: list[Function]):
    graph = nx.Graph()

    # Group functions by shared dependencies
    for func in functions:
        for dep in func.dependencies:
            if isinstance(dep, Function):
                weight = len(set(func.dependencies) & set(dep.dependencies))
                if weight > 0:
                    graph.add_edge(func.name, dep.name, weight=weight)

    return graph

# Visualize function modularity
funcs = codebase.functions
modularity_graph = create_modularity_graph(funcs)
codebase.visualize(modularity_graph)
```
## Customizing Visualizations
You can customize your visualizations using NetworkX's attributes while still preserving the smart node features:
```python
import networkx as nx

def create_custom_graph(codebase):
    graph = nx.DiGraph()

    # Add nodes with custom attributes while preserving source preview
    for func in codebase.functions:
        graph.add_node(
            func,
            color='red' if func.is_public else 'blue',
            shape='box' if func.is_async else 'oval'
        )

    # Add edges between actual function objects
    for func in codebase.functions:
        for call in func.call_sites:
            if call.resolved_symbol:
                graph.add_edge(
                    func, call.resolved_symbol,
                    style='dashed' if call.is_conditional else 'solid',
                    weight=call.count
                )

    return graph
```
## Best Practices
1. **Use Symbol Objects for Rich Features**
```python
# Better: Add symbol objects for rich previews
# This will include source code previews, syntax highlighting, type information, etc.
for func in api_funcs:
    graph.add_node(func)

# Basic: Just names, no extra features
for func in api_funcs:
    graph.add_node(func.name)
```
2. **Focus on Relevant Subgraphs**
```python
# Better: Visualize specific subsystem
api_funcs = [f for f in codebase.functions if "api" in f.filepath]
api_graph = create_call_graph(api_funcs)
codebase.visualize(api_graph)
# Avoid: Visualizing entire codebase
full_graph = create_call_graph(codebase.functions) # Too complex
```
3. **Use Meaningful Layouts**
```python
# Group related nodes together
graph.add_node(controller_class, cluster="api")
graph.add_node(service_class, cluster="db")
```
4. **Add Visual Hints**
```python
# Color code by type while preserving rich previews
for node in codebase.functions:
    if "Controller" in node.name:
        graph.add_node(node, color="red")
    elif "Service" in node.name:
        graph.add_node(node, color="blue")
```
## Limitations
- Large graphs may become difficult to read
- Complex relationships might need multiple views
- Some graph layouts may take time to compute
- Preview features only work when adding symbol objects directly
---
title: "Flagging Symbols"
description: "Learn how to use symbol flags for debugging, tracking changes, and marking code for review"
icon: "flag"
iconType: "solid"
---
Symbol flags are a powerful feature in Codegen that allow you to mark and track specific code elements during development, debugging, or code review processes. Flags can be used to visually highlight code in the editor and can also integrate with various messaging systems.
## Basic Usage
The simplest way to flag a symbol is to call the `flag()` method on any symbol:
```python
# Flag a function
function.flag(message="This function needs optimization")
# Flag a class
my_class.flag(message="Consider breaking this into smaller classes")
# Flag a variable
variable.flag(message="Type hints needed here")
```
When you flag a symbol, two things happen:
1. A visual flag emoji (🚩) is added as an inline comment
2. A `CodeFlag` object is created to track the flag in the system
## Language-Specific Behavior
The flag system adapts automatically to the programming language being used:
```python
# Python
# Results in: def my_function():  # 🚩 Review needed
python_function.flag(message="Review needed")

# TypeScript
# Results in: function myFunction() {  // 🚩 Review needed
typescript_function.flag(message="Review needed")
```
## Example: Code Analysis
Here's an example of using flags during code analysis:
```python
def analyze_codebase(codebase):
    for function in codebase.functions:
        # Check documentation
        if not function.docstring:
            function.flag(
                message="Missing docstring",
            )

        # Check error handling
        if function.is_async and not function.has_try_catch:
            function.flag(
                message="Async function missing error handling",
            )
```
This feature is particularly useful when building and iterating on codemods, as it helps you keep track of the symbols you intend to modify.
---
title: "Calling Out to LLMs"
sidebarTitle: "LLM Integration"
icon: "brain"
iconType: "solid"
---
Codegen natively integrates with large language models (LLMs) via the [codebase.ai(...)](../api-reference/core/Codebase#ai) method, which lets you use LLMs to help generate, modify, and analyze code.
## Configuration
Before using AI capabilities, you need to provide an OpenAI API key via [codebase.set_ai_key(...)](../api-reference/core/Codebase#set-ai-key):
```python
# Set your OpenAI API key
codebase.set_ai_key("your-openai-api-key")
```
## Calling Codebase.ai(...)
The [Codebase.ai(...)](../api-reference/core/Codebase#ai) method takes three key arguments:
```python
result = codebase.ai(
    prompt="Your instruction to the AI",
    target=symbol_to_modify,  # Optional: The code being operated on
    context=additional_info  # Optional: Extra context from static analysis
)
```
- **prompt**: Clear instruction for what you want the AI to do
- **target**: The symbol (function, class, etc.) being operated on - its source code will be provided to the AI
- **context**: Additional information you want to provide to the AI, which you can gather using GraphSitter's analysis tools
Codegen does not automatically provide any context to the LLM by default. It
does not "understand" your codebase, only the context you provide.
The context parameter can include:
- A single symbol (its source code will be provided)
- A list of related symbols
- A dictionary mapping descriptions to symbols/values
- Nested combinations of the above (see the sketch below)
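For example, here is a hedged sketch of a nested context combining symbols, lists, and descriptions (the symbol names are illustrative):
```python
function = codebase.get_function("process_data")
helper = codebase.get_function("validate_input")

result = codebase.ai(
    "Refactor this function for clarity",
    target=function,
    context={
        "helper that validates the input": helper,  # a single symbol
        "callers": list(function.call_sites),       # a list of related symbols
        "notes": {                                  # nested combinations
            "style": "keep the public signature stable",
        },
    },
)
```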
### How Context Works
The AI doesn't automatically know about your codebase. Instead, you can provide relevant context by:
1. Using GraphSitter's static analysis to gather information:
```python
function = codebase.get_function("process_data")
context = {
    "call_sites": function.call_sites,  # Where the function is called
    "dependencies": function.dependencies,  # What the function depends on
    "parent": function.parent,  # Class/module containing the function
    "docstring": function.docstring,  # Existing documentation
}
```
2. Passing this information to the AI:
```python
result = codebase.ai(
    "Improve this function's implementation",
    target=function,
    context=context  # AI will see the gathered information
)
```
## Common Use Cases
### Code Generation
Generate new code or refactor existing code:
```python
# Break up a large function
function = codebase.get_function("large_function")
new_code = codebase.ai(
    "Break this function into smaller, more focused functions",
    target=function
)
function.edit(new_code)

# Generate a test
my_function = codebase.get_function("my_function")
test_code = codebase.ai(
    f"Write a test for the function {my_function.name}",
    target=my_function
)
my_function.insert_after(test_code)
```
### Documentation
Generate and format documentation:
```python
# Generate docstrings for a class
class_def = codebase.get_class("MyClass")
for method in class_def.methods:
    docstring = codebase.ai(
        "Generate a docstring describing this method",
        target=method,
        context={
            "class": class_def,
            "style": "Google docstring format"
        }
    )
    method.set_docstring(docstring)
```
### Code Analysis and Improvement
Use AI to analyze and improve code:
```python
# Improve function names
for function in codebase.functions:
    if codebase.ai(
        "Does this function name clearly describe its purpose? Answer yes/no",
        target=function
    ).lower() == "no":
        new_name = codebase.ai(
            "Suggest a better name for this function",
            target=function,
            context={"call_sites": function.call_sites}
        )
        function.rename(new_name)
```
### Contextual Modifications
Make changes with full context awareness:
```python
# Refactor a class method
method = codebase.get_class("MyClass").get_method("target_method")
new_impl = codebase.ai(
    "Refactor this method to be more efficient",
    target=method,
    context={
        "parent_class": method.parent,
        "call_sites": method.call_sites,
        "dependencies": method.dependencies
    }
)
method.edit(new_impl)
```
## Best Practices
1. **Provide Relevant Context**
```python
# Good: Providing specific, relevant context
summary = codebase.ai(
    "Generate a summary of this method's purpose",
    target=method,
    context={
        "class": method.parent,  # Class containing the method
        "usages": list(method.usages),  # How the method is used
        "dependencies": method.dependencies,  # What the method depends on
        "style": "concise"
    }
)

# Bad: Missing context that could help the AI
summary = codebase.ai(
    "Generate a summary",
    target=method  # AI only sees the method's code
)
```
2. **Gather Comprehensive Context**
```python
# Gather relevant information before AI call
def get_method_context(method):
    return {
        "class": method.parent,
        "call_sites": list(method.call_sites),
        "dependencies": list(method.dependencies),
        "related_methods": [m for m in method.parent.methods
                            if m.name != method.name]
    }

# Use gathered context in AI call
new_impl = codebase.ai(
    "Refactor this method to be more efficient",
    target=method,
    context=get_method_context(method)
)
```
3. **Handle AI Limits**
```python
# Set custom AI request limits for large operations
codebase.set_session_options(max_ai_requests=200)
```
4. **Review Generated Code**
```python
# Generate and review before applying
new_code = codebase.ai(
    "Optimize this function",
    target=function
)
print("Review generated code:")
print(new_code)
if input("Apply changes? (y/n): ").lower() == 'y':
    function.edit(new_code)
```
## Limitations and Safety
- The AI doesn't automatically know about your codebase - you must provide relevant context
- AI-generated code should always be reviewed
- Default limit of 150 AI requests per codemod execution
- Use [set_session_options(...)](../api-reference/core/Codebase#set-session-options) to adjust limits:
```python
codebase.set_session_options(max_ai_requests=200)
```
You can also use `codebase.set_session_options` to increase the execution time and the number of operations allowed in a session. This is useful for handling larger tasks or more complex operations that require additional resources. Adjust the `max_seconds` and `max_transactions` parameters to suit your needs:
```python
codebase.set_session_options(max_seconds=300, max_transactions=500)
```
---
title: "Semantic Code Search"
sidebarTitle: "Semantic Code Search"
icon: "magnifying-glass"
iconType: "solid"
---
Codegen provides semantic code search capabilities using embeddings. This allows you to search codebases using natural language queries and find semantically related code, even when the exact terms aren't present.
This is under active development. Interested in an application? [Reach out to the team!](/introduction/about.tsx)
## Basic Usage
Here's how to create and use a semantic code search index:
```python
from codegen import Codebase
from codegen.extensions.index.file_index import FileIndex

# Parse a codebase
codebase = Codebase.from_repo('fastapi/fastapi', language='python')

# Create index
index = FileIndex(codebase)
index.create()  # computes per-file embeddings

# Save index to .pkl
index.save('index.pkl')

# Load index into memory
index.load('index.pkl')

# Update index after changes
codebase.files[0].edit('# Replacing file content')
codebase.commit()
index.update()  # re-computes 1 embedding
```
## Searching Code
Once you have an index, you can perform semantic searches:
```python
# Search with natural language
results = index.similarity_search(
    "How does FastAPI handle dependency injection?",
    k=5  # number of results
)

# Print results
for file, score in results:
    print(f"\nScore: {score:.3f} | File: {file.filepath}")
    print(f"Preview: {file.content[:200]}...")
```
The `FileIndex` returns tuples of ([File](/api-reference/core/SourceFile), `score`).
The search uses cosine similarity between embeddings to find the most semantically related files, regardless of exact keyword matches.
## Available Indices
Codegen provides two types of semantic indices:
### FileIndex
The `FileIndex` operates at the file level:
- Indexes entire files, splitting large files into chunks
- Best for finding relevant files or modules
- Simpler and faster to create/update
```python
from codegen.extensions.index.file_index import FileIndex
index = FileIndex(codebase)
index.create()
```
### SymbolIndex (Experimental)
The `SymbolIndex` operates at the symbol level:
- Indexes individual functions, classes, and methods
- Better for finding specific code elements
- More granular search results
```python
from codegen.extensions.index.symbol_index import SymbolIndex
index = SymbolIndex(codebase)
index.create()
```
## How It Works
The semantic indices:
1. Process code at either file or symbol level
2. Split large content into chunks that fit within token limits
3. Use OpenAI's text-embedding-3-small model to create embeddings
4. Store embeddings efficiently for similarity search
5. Support incremental updates when code changes
When searching:
1. Your query is converted to an embedding
2. Cosine similarity is computed with all stored embeddings
3. The most similar items are returned with their scores
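Conceptually, the ranking step works like this minimal numpy sketch (illustrative only; the real indices also handle chunking and storage):
```python
import numpy as np

def top_k(query_emb: np.ndarray, stored_embs: np.ndarray, k: int = 5):
    # Cosine similarity reduces to a dot product of L2-normalized vectors
    q = query_emb / np.linalg.norm(query_emb)
    m = stored_embs / np.linalg.norm(stored_embs, axis=1, keepdims=True)
    scores = m @ q
    best = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in best]
```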
Creating embeddings requires an OpenAI API key with access to the embeddings endpoint.
## Example Searches
Here are some example semantic searches:
```python
# Find authentication-related code
results = index.similarity_search(
    "How is user authentication implemented?",
    k=3
)

# Find error handling patterns
results = index.similarity_search(
    "Show me examples of error handling and custom exceptions",
    k=3
)

# Find configuration management
results = index.similarity_search(
    "Where is the application configuration and settings handled?",
    k=3
)
```
The semantic search can understand concepts and return relevant results even when the exact terms aren't present in the code.
---
title: "Reducing Conditions"
sidebarTitle: "Reducing Conditions"
icon: "code-branch"
iconType: "solid"
---
Codegen provides powerful APIs for reducing conditional logic to constant values. This is particularly useful for removing feature flags, cleaning up dead code paths, and simplifying conditional logic.
## Overview
The `reduce_condition()` method is available on various conditional constructs:
- [If/else statements](/api-reference/core/IfBlockStatement#reduce-condition)
- [Ternary expressions](/api-reference/core/TernaryExpression#reduce-condition)
- [Binary expressions](/api-reference/core/BinaryExpression#reduce-condition)
- [Function calls](/api-reference/core/FunctionCall#reduce-condition)
When you reduce a condition to `True` or `False`, Codegen automatically:
1. Evaluates which code path(s) to keep
2. Removes unnecessary branches
3. Preserves proper indentation and formatting
### Motivating Example
For example, consider the following code:
```python
flag = get_feature_flag('MY_FEATURE')
if flag:
    print('MY_FEATURE: ON')
else:
    print('MY_FEATURE: OFF')
```
`.reduce_condition` allows you to deterministically reduce this code to the following:
```python
print('MY_FEATURE: ON')
```
This is useful when a feature flag is fully "rolled out".
## Implementations
### [IfBlockStatements](/api-reference/core/IfBlockStatement#reduce-condition)
You can reduce if/else statements to either their "true" or "false" branch.
For example, in the code snippet above:
```python
# Grab if statement
if_block = file.code_block.statements[1]
# Reduce to True branch
if_block.reduce_condition(True)
```
This will remove the `else` branch and keep the `print` statement, like so:
```python
flag = get_feature_flag('MY_FEATURE')
print('MY_FEATURE: ON')
```
### Handling Elif Chains
Codegen intelligently handles elif chains when reducing conditions:
```python
# Original code
if condition_a:
    print("A")
elif condition_b:
    print("B")
else:
    print("C")

# Reduce first condition to False
if_block.reduce_condition(False)

# Result:
if condition_b:
    print("B")
else:
    print("C")

# Reduce elif condition to True
elif_block.reduce_condition(True)

# Result:
print("B")
```
## Ternary Expressions
Ternary expressions (conditional expressions) can also be reduced:
```python
# Original code
result = 'valueA' if condition else 'valueB'
# Reduce to True
ternary_expr.reduce_condition(True)
# Result:
result = 'valueA'
# Reduce to False
ternary_expr.reduce_condition(False)
# Result:
result = 'valueB'
```
### Nested Ternaries
Codegen handles nested ternary expressions correctly:
```python
# Original code
result = 'A' if a else 'B' if b else 'C'
# Reduce outer condition to False
outer_ternary.reduce_condition(False)
# Result:
result = 'B' if b else 'C'
# Then reduce inner condition to True
inner_ternary.reduce_condition(True)
# Result:
result = 'B'
```
## Binary Operations
Binary operations (and/or) can be reduced to simplify logic:
```python
# Original code
result = (x or y) and b
# Reduce x to True
x_assign.reduce_condition(True)
# Result:
result = b
# Reduce y to False
y_assign.reduce_condition(False)
# Result:
result = x and b
```
## Function Calls
[Function calls](/api-reference/core/FunctionCall#reduce-condition) can also be reduced, which is particularly useful when dealing with hooks or utility functions that return booleans:
```typescript
// Original code
const isEnabled = useFeatureFlag("my_feature");
return isEnabled ? <NewFeature /> : <OldFeature />;

// After reducing useFeatureFlag to True
return <NewFeature />;
```
### Feature Flag Hooks
A common use case is reducing feature flag hooks to constants. Consider the following code:
```typescript
// Original code
function MyComponent() {
  const showNewUI = useFeatureFlag("new_ui_enabled");
  if (showNewUI) {
    return <NewUI />;
  }
  return <OldUI />;
}
```
We can reduce the `useFeatureFlag` hook to a constant value like so, with [FunctionCall.reduce_condition](/api-reference/core/FunctionCall#reduce-condition):
```python
hook = codebase.get_function("useFeatureFlag")
for usage in hook.usages():
    if isinstance(usage.match, FunctionCall):
        fcall = usage.match
        if fcall.args[0].value.content == 'new_ui_enabled':
            # This will automatically reduce any conditions using the flag
            fcall.reduce_condition(True)
```
This produces the following code:
```typescript
function MyComponent() {
  return <NewUI />;
}
```
### Comprehensive Example
Here's a complete example of removing a feature flag from both configuration and usage:
```python
feature_flag_name = "new_ui_enabled"
target_value = True

# 1. Remove from config
config_file = codebase.get_file("src/featureFlags/config.ts")
feature_flag_config = config_file.get_symbol("FEATURE_FLAG_CONFIG").value
feature_flag_config.pop(feature_flag_name)

# 2. Find and reduce all usages
hook = codebase.get_function("useFeatureFlag")
for usage in hook.usages():
    fcall = usage.match
    if isinstance(fcall, FunctionCall):
        # Check if this usage is for our target flag
        first_arg = fcall.args[0].value
        if isinstance(first_arg, String) and first_arg.content == feature_flag_name:
            print(f'Reducing in: {fcall.parent_symbol.name}')
            # This will automatically reduce:
            # - Ternary expressions using the flag
            # - If statements checking the flag
            # - Binary operations with the flag
            fcall.reduce_condition(target_value)

# Commit changes to disk
codebase.commit()
```
This example:
1. Removes the feature flag from configuration
2. Finds all usages of the feature flag hook
3. Reduces each usage to a constant value
4. Automatically handles all conditional constructs using the flag
When reducing a function call, Codegen automatically handles all dependent conditions. This includes:
- [If/else statements](/api-reference/core/IfBlockStatement#reduce-condition)
- [Ternary expressions](/api-reference/core/TernaryExpression#reduce-condition)
- [Binary operations](/api-reference/core/BinaryExpression#reduce-condition)
## TypeScript and JSX Support
Condition reduction works with TypeScript and JSX, including conditional rendering:
```typescript
// Original JSX
const MyComponent: React.FC = () => {
  let isVisible = true;
  return (
    <>
      {isVisible && <div>Visible</div>}
      {!isVisible && <div>Hidden</div>}
    </>
  );
};

// After reducing isVisible to True
const MyComponent: React.FC = () => {
  return (
    <div>Visible</div>
  );
};
```
Condition reduction is particularly useful for cleaning up feature flags in
React components, where conditional rendering is common.
---
title: "Learn by Example"
sidebarTitle: "At a Glance"
icon: "graduation-cap"
iconType: "solid"
---
Explore our tutorials to learn how to use Codegen for various code transformation tasks.
## Featured Tutorials
- Create an intelligent code agent with Langchain and powerful, Codegen-powered tools.
- Generate interactive visualizations of your codebase's structure, dependencies, and relationships.
- Create high-quality training data for LLM pre-training, similar to word2vec or node2vec.
- Remove unused imports, functions, and variables with confidence.
## API Migrations
Update API calls, handle breaking changes, and manage bulk updates across your codebase.
- Update SQLAlchemy code to use the new 2.0-style query interface and patterns.
- Convert Flask applications to FastAPI, updating routes and dependencies.
- Migrate Python 2 code to Python 3, updating syntax and modernizing APIs.
## Code Organization
Restructure files, enforce naming conventions, and improve project layout.
- Split large files, extract shared logic, and manage dependencies.
- Organize and optimize TypeScript module exports.
- Convert between default and named exports in TypeScript/JavaScript.
## Testing & Types
- Convert unittest test suites to pytest's modern testing style.
- Add TypeScript types, infer types from usage, and improve type safety.
## Documentation & AI
- Generate JSDoc comments, README files, and API documentation.
- Generate system prompts, create hierarchical documentation, and optimize for AI assistance.
Each tutorial includes practical examples, code snippets, and best practices.
Follow them in order or jump to the ones most relevant to your needs.
---
title: "Building Code Agents"
sidebarTitle: "Code Agent"
icon: "robot"
iconType: "solid"
---
This guide demonstrates how to build an intelligent code agent that can analyze and manipulate codebases.
The agent has access to powerful code viewing and manipulation tools powered by Codegen, including:
- `ViewFileTool`: View contents and metadata of files
- `SemanticEditTool`: Make intelligent edits to files
- `RevealSymbolTool`: Analyze symbol dependencies and usages
- `MoveSymbolTool`: Move symbols between files with import handling
- `ReplacementEditTool`: Make regex-based replacements in files
- `ListDirectoryTool`: List directory contents
- `SearchTool`: Search for files and symbols
- `CreateFileTool`: Create new files
- `DeleteFileTool`: Delete files
- `RenameFileTool`: Rename files
- `EditFileTool`: Edit files
View the full code for the default tools and agent implementation in our [examples repository](https://github.com/codegen-sh/codegen-sdk/tree/develop/src/codegen/extensions/langchain/tools)
# Basic Usage
The following example shows how to create and run a `CodeAgent`:
```python
from codegen import CodeAgent, Codebase
# Grab a repo from Github
codebase = Codebase.from_repo('fastapi/fastapi')
# Create a code agent with read/write codebase access
agent = CodeAgent(codebase)
# Run the agent with a prompt
agent.run("Tell me about this repo")
```
Your `ANTHROPIC_API_KEY` must be set in your env.
The default implementation uses `anthropic/claude-3-5-sonnet-latest` as the model, but this can be changed through the `model_provider` and `model_name` arguments.
```python
agent = CodeAgent(
codebase=codebase,
model_provider="openai",
model_name="gpt-4o",
)
```
If using a non-default model provider, make sure to set the appropriate API key (e.g., `OPENAI_API_KEY` for OpenAI models) in your env.
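For example, you can add a trivial guard before constructing the agent (this check is not required by the SDK; it is just a sanity check):
```python
import os

# Fail fast if the key for the chosen provider is missing
assert "OPENAI_API_KEY" in os.environ, "Set OPENAI_API_KEY before creating the agent"
```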
# Available Tools
The agent comes with a comprehensive set of tools for code analysis and manipulation. Here are some key tools:
```python
from codegen.extensions.langchain.tools import (
CreateFileTool,
DeleteFileTool,
EditFileTool,
ListDirectoryTool,
MoveSymbolTool,
RenameFileTool,
ReplacementEditTool,
RevealSymbolTool,
SearchTool,
SemanticEditTool,
ViewFileTool,
)
```
View the full set of [tools on Github](https://github.com/codegen-sh/codegen-sdk/blob/develop/src/codegen/extensions/langchain/tools.py)
Each tool can also be instantiated directly and passed to the agent. For example, a minimal sketch (reusing the `fastapi/fastapi` codebase from above) that restricts the agent to read-only exploration tools:
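```python
from codegen import CodeAgent, Codebase
from codegen.extensions.langchain.tools import SearchTool, ViewFileTool

codebase = Codebase.from_repo('fastapi/fastapi')

# Restrict the agent to read-only exploration tools
agent = CodeAgent(codebase, tools=[ViewFileTool(codebase), SearchTool(codebase)])
agent.run("What does the routing module do?")
```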
# Extensions
## GitHub Integration
The agent includes tools for GitHub operations like PR management. Set up GitHub access with:
```bash
CODEGEN_SECRETS__GITHUB_TOKEN="..."
```
Import the GitHub tools:
```python
from codegen.extensions.langchain.tools import (
GithubCreatePRTool,
GithubViewPRTool,
GithubCreatePRCommentTool,
GithubCreatePRReviewCommentTool
)
```
These tools enable:
- Creating pull requests
- Viewing PR contents and diffs
- Adding general PR comments
- Adding inline review comments
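For example, a minimal sketch (the repository and prompt are illustrative) that equips the agent with PR-viewing and commenting abilities:
```python
from codegen import CodeAgent, Codebase
from codegen.extensions.langchain.tools import (
    GithubCreatePRCommentTool,
    GithubViewPRTool,
)

codebase = Codebase.from_repo('fastapi/fastapi')
pr_tools = [GithubViewPRTool(codebase), GithubCreatePRCommentTool(codebase)]

agent = CodeAgent(codebase, tools=pr_tools)
agent.run("View the most recent open PR and leave a short summary comment.")
```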
View all Github tools on [Github](https://github.com/codegen-sh/codegen-sdk/blob/develop/src/codegen/extensions/langchain/tools.py)
## Linear Integration
The agent can interact with Linear for issue tracking and project management. To use Linear tools, set the following environment variables:
```bash
LINEAR_ACCESS_TOKEN="..."
LINEAR_TEAM_ID="..."
LINEAR_SIGNING_SECRET="..."
```
Import and use the Linear tools:
```python
from codegen.extensions.langchain.tools import (
LinearGetIssueTool,
LinearGetIssueCommentsTool,
LinearCommentOnIssueTool,
LinearSearchIssuesTool,
LinearCreateIssueTool,
LinearGetTeamsTool
)
```
These tools allow the agent to:
- Create and search issues
- Get issue details and comments
- Add comments to issues
- View team information
View all Linear tools on [Github](https://github.com/codegen-sh/codegen-sdk/blob/develop/src/codegen/extensions/langchain/tools.py)
## Adding Custom Tools
You can extend the agent with custom tools:
```python
from langchain.tools import BaseTool
from pydantic import BaseModel, Field
from codegen import CodeAgent
class CustomToolInput(BaseModel):
"""Input schema for custom tool."""
param: str = Field(..., description="Parameter description")
class CustomCodeTool(BaseTool):
"""A custom tool for the code agent."""
name = "custom_tool"
description = "Description of what the tool does"
args_schema = CustomToolInput
def _run(self, param: str) -> str:
# Tool implementation
return f"Processed {param}"
# Add the custom tool to the agent's tool list
tools = [CustomCodeTool()]
agent = CodeAgent(codebase, tools=tools, model_name="claude-3-5-sonnet-latest")
```
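You can then run the agent as usual; it can invoke the custom tool by name when the prompt calls for it:
```python
agent.run("Use custom_tool to process the string 'hello'")
```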
---
title: "Building a RAG-powered Slack Bot"
sidebarTitle: "Slack Bot"
icon: "slack"
iconType: "solid"
---
This tutorial demonstrates how to build a Slack bot that can answer code questions using simple RAG (Retrieval Augmented Generation) over a codebase. The bot uses semantic search to find relevant code snippets and generates detailed answers using OpenAI's APIs.
View the full code and setup instructions in our [examples repository](https://github.com/codegen-sh/codegen-sdk/tree/develop/codegen-examples/examples/slack_chatbot)
While this example uses the Codegen codebase, you can adapt it to any repository by changing the repository URL
## Overview
The process involves three main steps:
1. Initializing and indexing the codebase
2. Finding relevant code snippets for a query
3. Generating answers using RAG
Let's walk through each step using Codegen.
## Step 1: Initializing the Codebase
First, we initialize the codebase and create a vector index for semantic search:
```python
from codegen import Codebase
from codegen.extensions import VectorIndex
def initialize_codebase():
"""Initialize and index the codebase."""
# Initialize codebase with smart caching
codebase = Codebase.from_repo(
"codegen-sh/codegen-sdk",
language="python",
tmp_dir="/root"
)
# Initialize vector index
index = VectorIndex(codebase)
# Try to load existing index or create new one
index_path = "/root/E.pkl"
try:
index.load(index_path)
except FileNotFoundError:
# Create new index if none exists
index.create()
index.save(index_path)
return codebase, index
```
The vector index is persisted to disk, so subsequent queries will be much faster.
See [semantic code search](/building-with-codegen/semantic-code-search) to learn more about VectorIndex.
## Step 2: Finding Relevant Code
Next, we use the vector index to find code snippets relevant to a query:
```python
def find_relevant_code(index: VectorIndex, query: str) -> list[tuple[str, float]]:
"""Find code snippets relevant to the query."""
# Get top 10 most relevant files
results = index.similarity_search(query, k=10)
# Clean up chunk references from index
cleaned_results = []
for filepath, score in results:
if "#chunk" in filepath:
filepath = filepath.split("#chunk")[0]
cleaned_results.append((filepath, score))
return cleaned_results
```
VectorIndex automatically chunks large files for better search results. We clean up the chunk references to show clean file paths.
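For example, with an illustrative query (scores will vary by index):
```python
results = find_relevant_code(index, "How does VectorIndex handle large files?")
for filepath, score in results[:3]:
    print(f"{filepath}: {score:.2f}")
```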
## Step 3: Generating Answers
Finally, we use GPT-4o to generate answers based on the relevant code:
```python
from openai import OpenAI
def generate_answer(query: str, context: str) -> str:
"""Generate an answer using RAG."""
prompt = f"""You are a code expert. Given the following code context and question,
provide a clear and accurate answer.
Note: Keep it short and sweet - 2 paragraphs max.
Question: {query}
Relevant code:
{context}
Answer:"""
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "You are a code expert. Answer questions about the given repo based on RAG'd results."},
{"role": "user", "content": prompt},
],
temperature=0,
)
return response.choices[0].message.content
```
## Putting It All Together
Here's how the components work together to answer questions:
```python
def answer_question(query: str) -> tuple[str, list[tuple[str, float]]]:
"""Answer a question about the codebase using RAG."""
# Initialize or load codebase and index
codebase, index = initialize_codebase()
# Find relevant files
results = find_relevant_code(index, query)
# Collect context from relevant files
context = ""
for filepath, score in results:
file = codebase.get_file(filepath)
context += f"File: {filepath}\n```\n{file.content}\n```\n\n"
# Generate answer
answer = generate_answer(query, context)
return answer, results
```
This will:
1. Load or create the vector index
2. Find relevant code snippets
3. Generate a detailed answer
4. Return both the answer and file references
## Example Usage
Here's what the output looks like:
```python
answer, files = answer_question("How does VectorIndex handle large files?")
print("Answer:", answer)
print("\nRelevant files:")
for filepath, score in files:
print(f"β’ {filepath} (score: {score:.2f})")
```
Output:
```
Answer:
VectorIndex handles large files by automatically chunking them into smaller pieces
using tiktoken. Each chunk is embedded separately and can be searched independently,
allowing for more precise semantic search results.
Relevant files:
β’ src/codegen/extensions/vector_index.py (score: 0.92)
β’ src/codegen/extensions/tools/semantic_search.py (score: 0.85)
```
## Extensions
While this example demonstrates a simple RAG-based bot, you can extend it to build a more powerful code agent that can:
- Do more sophisticated code retrieval
- Make code changes using Codegen's edit APIs
- Gather further context from Slack channels
- ... etc.
Check out our [Code Agent tutorial](/tutorials/build-code-agent) to learn how to build an intelligent agent with access to Codegen's full suite of tools
## Learn More
- Learn how to use VectorIndex for semantic code search and embeddings.
- Create a more powerful agent with multi-step reasoning and code manipulation.
- Learn about OpenAI's text embeddings and how they work.
- Understand RAG patterns and best practices for better results.
---
title: "Building an AI-Powered GitHub PR Review Bot"
sidebarTitle: "GitHub PR Review Bot"
icon: "github"
iconType: "solid"
---
This tutorial demonstrates how to build an intelligent GitHub PR review bot that automatically reviews pull requests when triggered by labels. The bot uses Codegen's GitHub integration and AI capabilities to provide comprehensive code reviews with actionable feedback.
The bot is triggered by adding a "Codegen" label to PRs, making it easy to integrate into your existing workflow
## Overview
The process involves three main components:
1. Setting up a Modal web endpoint for GitHub webhooks
2. Handling PR label events
3. Running an AI-powered code review agent
Let's walk through each component using Codegen.
## Step 1: Setting Up the Modal App
First, we set up a Modal application to handle GitHub webhooks:
```python
import modal
from codegen.extensions.events.app import CodegenApp
from fastapi import Request
# Set up the base image with required dependencies
base_image = (
modal.Image.debian_slim(python_version="3.12")
.apt_install("git")
.pip_install(
"codegen>=0.18",
"openai>=1.1.0",
"fastapi[standard]",
"slack_sdk",
)
)
# Initialize the Codegen app with GitHub integration
app = CodegenApp(name="github", image=base_image)
@app.function(secrets=[modal.Secret.from_dotenv()])
@modal.web_endpoint(method="POST")
def entrypoint(event: dict, request: Request):
return app.github.handle(event, request)
```
The Modal app provides a webhook endpoint that GitHub can call when PR events occur.
Make sure to configure your GitHub repository's webhook settings to point to your Modal endpoint.
## Step 2: Handling PR Events
Next, we set up event handlers for PR label events:
```python
from codegen.extensions.github.types.events.pull_request import (
PullRequestLabeledEvent,
PullRequestUnlabeledEvent
)
@app.github.event("pull_request:labeled")
def handle_labeled(event: PullRequestLabeledEvent):
"""Handle PR labeled events."""
if event.label.name == "Codegen":
# Optional: Notify a Slack channel
app.slack.client.chat_postMessage(
channel="YOUR_CHANNEL_ID",
text=f"PR #{event.number} labeled with Codegen, starting review",
)
# Start the review process
pr_review_agent(event)
@app.github.event("pull_request:unlabeled")
def handle_unlabeled(event: PullRequestUnlabeledEvent):
"""Handle PR unlabeled events."""
if event.label.name == "Codegen":
# Clean up bot comments when label is removed
remove_bot_comments(event)
```
The bot only triggers on PRs labeled with "Codegen", giving you control over which PRs get reviewed.
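The `remove_bot_comments` helper is not shown in the example code. A minimal sketch, assuming PyGithub for the API calls and that the bot's comments can be identified by author login (both are assumptions, not part of the original example):
```python
import os

from github import Github  # PyGithub; an assumption for this sketch

BOT_LOGIN = "codegen-bot"  # hypothetical login; use your bot's actual username

def remove_bot_comments(event) -> None:
    """Delete the comments this bot previously left on the PR."""
    gh = Github(os.environ["GITHUB_TOKEN"])
    repo = gh.get_repo(f"{event.organization.login}/{event.repository.name}")
    issue = repo.get_issue(event.number)
    for comment in issue.get_comments():
        if comment.user.login == BOT_LOGIN:
            comment.delete()
```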
## Step 3: Implementing the Review Agent
Finally, we implement the AI-powered review agent:
```python
import os

from codegen import Codebase, CodeAgent
from codegen.configs.models.secrets import SecretsConfig  # import path may vary across codegen versions
from codegen.extensions.langchain.tools import (
    GithubViewPRTool,
    GithubCreatePRCommentTool,
    GithubCreatePRReviewCommentTool,
)
def pr_review_agent(event: PullRequestLabeledEvent) -> None:
"""Run the PR review agent."""
# Initialize codebase for the repository
repo_str = f"{event.organization.login}/{event.repository.name}"
codebase = Codebase.from_repo(
repo_str,
language='python',
secrets=SecretsConfig(github_token=os.environ["GITHUB_TOKEN"])
)
# Create a temporary comment to show the bot is working
review_message = "CodegenBot is starting to review the PR please wait..."
comment = codebase._op.create_pr_comment(event.number, review_message)
# Set up PR review tools
pr_tools = [
GithubViewPRTool(codebase),
GithubCreatePRCommentTool(codebase),
GithubCreatePRReviewCommentTool(codebase),
]
# Create and run the review agent
agent = CodeAgent(codebase=codebase, tools=pr_tools)
prompt = f"""
Review this pull request like a senior engineer:
{event.pull_request.url}
Be explicit about the changes, produce a short summary, and point out possible improvements.
Focus on facts and technical details, using code snippets where helpful.
"""
result = agent.run(prompt)
# Clean up the temporary comment
comment.delete()
```
## Setting Up the Environment
Before running the bot, you'll need to:
1. Create a `.env` file with your credentials:
```env
GITHUB_TOKEN=your_github_token
GITHUB_API_KEY=your_github_token
ANTHROPIC_API_KEY=your_anthropic_key
SLACK_BOT_TOKEN=your_slack_token # Optional
```
2. Deploy the Modal app:
```bash
uv sync # Install dependencies
uv run modal deploy app.py
```
3. Configure GitHub webhook:
- Go to your repository settings
- Add webhook pointing to your Modal endpoint
- Select "Pull request" events
- Add a webhook secret (optional but recommended)
## Example Usage
1. Create or update a pull request in your repository
2. Add the "Codegen" label to trigger a review
3. The bot will:
- Post a temporary "starting review" comment
- Analyze the PR changes
- Post detailed review comments
- Remove the temporary comment when done
To remove the bot's comments:
1. Remove the "Codegen" label
2. The bot will automatically clean up its comments
## Extensions
While this example demonstrates a basic PR review bot, you can extend it to:
- Customize the review criteria
- Add more sophisticated analysis tools
- Integrate with other services
- Add automatic fix suggestions
- ... etc.
Check out our [Code Agent tutorial](/tutorials/build-code-agent) to learn more about building sophisticated AI agents with Codegen
---
title: "Deep Code Research with AI"
sidebarTitle: "Code Research Agent"
icon: "magnifying-glass"
iconType: "solid"
---
This guide demonstrates how to build an intelligent code research tool that can analyze and explain codebases using Codegen and LangChain. The tool combines semantic code search, dependency analysis, and natural language understanding to help developers quickly understand new codebases.
View the full code on [GitHub](https://github.com/codegen-sh/codegen-sdk/tree/develop/codegen-examples/examples/deep_code_research)
This example works with any public GitHub repository - just provide the repo name in the format `owner/repo`.
## Overview
The process involves three main components:
1. A CLI interface for interacting with the research agent
2. A set of code analysis tools powered by Codegen
3. An LLM-powered agent that combines the tools to answer questions
Let's walk through building each component.
## Step 1: Setting Up the Research Tools
First, let's import the necessary components and set up our research tools:
```python
from codegen import Codebase
from codegen.extensions.langchain.agent import create_agent_with_tools
from codegen.extensions.langchain.tools import (
ListDirectoryTool,
RevealSymbolTool,
SearchTool,
SemanticSearchTool,
ViewFileTool,
)
from langchain_core.messages import SystemMessage
```
We'll create a function to initialize our codebase with a nice progress indicator:
```python
from typing import Optional

from rich.console import Console

console = Console()

def initialize_codebase(repo_name: str) -> Optional[Codebase]:
"""Initialize a codebase with a spinner showing progress."""
with console.status("") as status:
try:
status.update(f"[bold blue]Cloning {repo_name}...[/bold blue]")
codebase = Codebase.from_repo(repo_name)
status.update("[bold green]β Repository cloned successfully![/bold green]")
return codebase
except Exception as e:
console.print(f"[bold red]Error initializing codebase:[/bold red] {e}")
return None
```
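For example (any public repository works):
```python
codebase = initialize_codebase("fastapi/fastapi")
if codebase:
    print("Repository ready for research!")
```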
Then we'll set up our research tools:
```python
# Create research tools
tools = [
ViewFileTool(codebase), # View file contents
ListDirectoryTool(codebase), # Explore directory structure
SearchTool(codebase), # Text-based search
SemanticSearchTool(codebase), # Natural language search
RevealSymbolTool(codebase), # Analyze symbol relationships
]
```
Each tool provides specific capabilities:
- `ViewFileTool`: Read and understand file contents
- `ListDirectoryTool`: Explore the codebase structure
- `SearchTool`: Find specific code patterns
- `SemanticSearchTool`: Search using natural language
- `RevealSymbolTool`: Analyze dependencies and usages
## Step 2: Creating the Research Agent
Next, we'll create an agent that can use these tools intelligently. We'll give it a detailed prompt about its role:
```python
RESEARCH_AGENT_PROMPT = """You are a code research expert. Your goal is to help users understand codebases by:
1. Finding relevant code through semantic and text search
2. Analyzing symbol relationships and dependencies
3. Exploring directory structures
4. Reading and explaining code
Always explain your findings in detail and provide context about how different parts of the code relate to each other.
When analyzing code, consider:
- The purpose and functionality of each component
- How different parts interact
- Key patterns and design decisions
- Potential areas for improvement
Break down complex concepts into understandable pieces and use examples when helpful."""
# Initialize the agent
agent = create_agent_with_tools(
codebase=codebase,
tools=tools,
chat_history=[SystemMessage(content=RESEARCH_AGENT_PROMPT)],
verbose=True
)
```
## Step 3: Building the CLI Interface
Finally, we'll create a user-friendly CLI interface using rich-click:
```python
import rich_click as click
from rich.console import Console
from rich.markdown import Markdown
from rich.prompt import Prompt
@click.group()
def cli():
"""π Codegen Code Research CLI"""
pass
@cli.command()
@click.argument("repo_name", required=False)
@click.option("--query", "-q", default=None, help="Initial research query.")
def research(repo_name: Optional[str] = None, query: Optional[str] = None):
"""Start a code research session."""
# Initialize codebase
codebase = initialize_codebase(repo_name)
# Create and run the agent
agent = create_research_agent(codebase)
# Main research loop
while True:
if not query:
query = Prompt.ask("[bold cyan]Research query[/bold cyan]")
result = agent.invoke(
{"input": query},
config={"configurable": {"thread_id": 1}}
)
console.print(Markdown(result["messages"][-1].content))
query = None # Clear for next iteration
```
## Using the Research Tool
You can use the tool in several ways:
1. Interactive mode (will prompt for repo):
```bash
python run.py research
```
2. Specify a repository:
```bash
python run.py research "fastapi/fastapi"
```
3. Start with an initial query:
```bash
python run.py research "fastapi/fastapi" -q "Explain the main components"
```
Example research queries:
- "Explain the main components and their relationships"
- "Find all usages of the FastAPI class"
- "Show me the dependency graph for the routing module"
- "What design patterns are used in this codebase?"
The agent maintains conversation history, so you can ask follow-up questions
and build on previous findings.
## Advanced Usage
### Custom Research Tools
You can extend the agent with custom tools for specific analysis needs:
```python
from langchain.tools import BaseTool
from pydantic import BaseModel, Field
class CustomAnalysisTool(BaseTool):
    """Custom tool for specialized code analysis."""
    name: str = "custom_analysis"
    description: str = "Performs specialized code analysis"

    def _run(self, query: str) -> str:
        # Custom analysis logic goes here; return a string summary
        return f"Analysis results for: {query}"
# Add to tools list
tools.append(CustomAnalysisTool())
```
### Customizing the Agent
You can modify the agent's behavior by adjusting its prompt:
```python
CUSTOM_PROMPT = """You are a specialized code reviewer focused on:
1. Security best practices
2. Performance optimization
3. Code maintainability
...
"""
agent = create_agent_with_tools(
codebase=codebase,
tools=tools,
chat_history=[SystemMessage(content=CUSTOM_PROMPT)],
)
```
---
title: "Codebase Analytics"
sidebarTitle: "Analytics"
icon: "calculator"
iconType: "solid"
---
This tutorial explains how codebase metrics are efficiently calculated using the `codegen` library in the Codebase Analytics Dashboard. The metrics include indices of codebase maintainability and complexity.
View the full code and setup instructions in our [codebase-analytics repository](https://github.com/codegen-sh/codebase-analytics).
## Complexity Metrics
Complexity metrics help quantify how easy or difficult a codebase is to understand and maintain. These metrics are calculated by analyzing various aspects of the code structure, including control flow, code volume, and inheritance patterns. The following metrics provide different perspectives on code complexity.
### Cyclomatic Complexity
Cyclomatic Complexity measures the number of linearly independent paths through the codebase, making it a valuable indicator of how difficult code will be to test and maintain.
**Calculation Method**:
- Base complexity of 1
- +1 for each:
- if statement
- elif statement
- for loop
- while loop
- +1 for each boolean operator (and, or) in conditions
- +1 for each except block in try/except statements
The `calculate_cyclomatic_complexity()` function traverses the Codegen codebase object, applying the rules above to the statement objects within each function to compute the overall cyclomatic complexity of the codebase.
```python
def calculate_cyclomatic_complexity(function):
    def analyze_statement(statement):
        complexity = 0
        if isinstance(statement, IfBlockStatement):
            complexity += 1
            if hasattr(statement, "elif_statements"):
                complexity += len(statement.elif_statements)
        elif isinstance(statement, (ForLoopStatement, WhileStatement)):
            complexity += 1
        return complexity

    # Base complexity of 1, plus a contribution from each statement in the body
    return 1 + sum(analyze_statement(stmt) for stmt in function.code_block.statements)
```
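A quick sketch of applying this across a parsed codebase (the loop and print format are illustrative):
```python
for function in codebase.functions:
    print(f"{function.name}: {calculate_cyclomatic_complexity(function)}")
```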
### Halstead Volume
Halstead Volume is a software metric that measures the complexity of a codebase by counting its operators and operands. The volume is the total number of operators and operands (`N = N1 + N2`) multiplied by the base-2 logarithm of the number of *unique* operators and operands (`n = n1 + n2`).
**Halstead Volume**: `V = (N1 + N2) * log2(n1 + n2)`
The calculation leverages codegen's expression types - including BinaryExpression, UnaryExpression, and ComparisonExpression - to extract operators and operands efficiently; the volume itself is computed in the `calculate_halstead_volume()` function.
```python
def calculate_halstead_volume(operators, operands):
n1 = len(set(operators))
n2 = len(set(operands))
N1 = len(operators)
N2 = len(operands)
N = N1 + N2
n = n1 + n2
if n > 0:
volume = N * math.log2(n)
return volume, N1, N2, n1, n2
return 0, N1, N2, n1, n2
```
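For instance, a toy call with hand-picked values to keep the arithmetic easy:
```python
operators = ["+", "+", "*", "="]  # N1 = 4 total, n1 = 3 unique
operands = ["a", "b", "a", "c"]   # N2 = 4 total, n2 = 3 unique
volume, N1, N2, n1, n2 = calculate_halstead_volume(operators, operands)
print(round(volume, 1))  # 8 * log2(6) ~= 20.7
```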
### Depth of Inheritance (DOI)
Depth of Inheritance measures the length of inheritance chain for each class. It is calculated by counting the length of the superclasses list for each class in the codebase. The implementation is handled through a simple calculation using codegen's class information in the `calculate_doi()` function.
```python
def calculate_doi(cls):
return len(cls.superclasses)
```
## Maintainability Index
Maintainability Index is a software metric which measures how maintainable a codebase is, i.e. how easy the code is to support and change. The index is calculated as a factored formula of SLOC (Source Lines Of Code), Cyclomatic Complexity, and Halstead Volume.
**Maintainability Index**: `M = 171 - 5.2 * ln(HV) - 0.23 * CC - 16.2 * ln(SLOC)`
This formula is then normalized to a scale of 0-100, where 100 is the maximum maintainability.
The implementation is handled through the `calculate_maintainability_index()` function. The codegen codebase object is used to efficiently extract the Cyclomatic Complexity and Halstead Volume for each function and class in the codebase, which are then used to calculate the maintainability index.
```python
def calculate_maintainability_index(
halstead_volume: float, cyclomatic_complexity: float, loc: int
) -> int:
"""Calculate the normalized maintainability index for a given function."""
if loc <= 0:
return 100
try:
raw_mi = (
171
- 5.2 * math.log(max(1, halstead_volume))
- 0.23 * cyclomatic_complexity
- 16.2 * math.log(max(1, loc))
)
normalized_mi = max(0, min(100, raw_mi * 100 / 171))
return int(normalized_mi)
except (ValueError, TypeError):
return 0
```
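Plugging in the toy Halstead volume from above (the complexity and line count are illustrative):
```python
mi = calculate_maintainability_index(
    halstead_volume=20.7, cyclomatic_complexity=3, loc=40
)
print(mi)  # ~55 on the 0-100 scale
```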
## Line Metrics
Line metrics provide insights into the size, complexity, and maintainability of a codebase. These measurements help determine the scale of a project, identify areas that may need refactoring, and track the growth of the codebase over time.
### Lines of Code
Lines of Code refers to the total number of lines in the source code, including blank lines and comments. This is accomplished with a simple count of all lines in the source file.
### Logical Lines of Code (LLOC)
LLOC is the number of lines that contain actual functional statements. It excludes comments, blank lines, and other lines which do not contribute to the utility of the codebase. A high LLOC relative to total lines of code suggests dense, potentially complex code that may benefit from being broken into smaller, better-documented functions or modules.
### Source Lines of Code (SLOC)
SLOC refers to the number of lines containing actual code, excluding blank lines. This includes programming language keywords and comments. While a higher SLOC indicates a larger codebase, it should be evaluated alongside other metrics like cyclomatic complexity and maintainability index to assess if the size is justified by the functionality provided.
### Comment Density
Comment density is calculated by dividing the lines of code which contain comments by the total lines of code in the codebase. The formula is:
```python
"comment_density": (total_comments / total_loc * 100)
```
It measures the proportion of comments in the codebase and is a good indicator of how much code is properly documented. Accordingly, it can show how maintainable and easy to understand the codebase is.
## General Codebase Statistics
The number of files is determined by traversing codegen's FileNode objects in the parsed codebase. The number of functions is calculated by counting FunctionDef nodes across all parsed files. The number of classes is obtained by summing ClassDef nodes throughout the codebase.
```python
num_files = len(codebase.files(extensions="*"))
num_functions = len(codebase.functions)
num_classes = len(codebase.classes)
```
The commit activity is calculated by using the git history of the repository. The number of commits is counted for each month in the last 12 months.
## Using the Analysis Tool (Modal Server)
The tool is implemented as a FastAPI application wrapped in a Modal deployment. To analyze a repository:
1. Send a POST request to `/analyze_repo` with the repository URL
2. The tool will:
- Clone the repository
- Parse the codebase using codegen
- Calculate all metrics
- Return a comprehensive JSON response with all metrics
This is the only endpoint in the FastAPI server, as it takes care of the entire analysis process. To run the FastAPI server locally, install all dependencies and run the server with `modal serve modal_main.py`.
The server can be connected to the frontend dashboard. This web component is implemented as a Next.js application with appropriate comments and visualizations for the raw server data. To run the frontend locally, install all dependencies and run the server with `npm run dev`. This can be connected to the FastAPI server by setting the URL in the request to the `/analyze_repo` endpoint.
---
title: "Mining Training Data for LLMs"
sidebarTitle: "Mining Data"
description: "Learn how to generate training data for large language models using Codegen"
icon: "network-wired"
iconType: "solid"
---
This guide demonstrates how to use Codegen to generate high-quality training data for large language models (LLMs) by extracting function implementations along with their dependencies and usages. This approach is similar to [word2vec](https://www.tensorflow.org/text/tutorials/word2vec) or [node2vec](https://snap.stanford.edu/node2vec/) - given the context of a function, learn to predict the function's implementation.
View the full code in our [examples repository](https://github.com/codegen-sh/codegen-sdk/tree/develop/codegen-examples/examples/generate_training_data)
This example works with both Python and TypeScript repositories without modification.
## Overview
The process involves three main steps:
1. Finding all functions in the codebase
2. Extracting their implementations, dependencies, and usages
3. Generating structured training data
Let's walk through each step using Codegen.
## Step 1: Finding Functions and Their Context
First, we will do a "graph expansion" for each function - grab the function's source, as well as the full source of all usages of the function and all dependencies.
See [dependencies and usages](/building-with-codegen/dependencies-and-usages) to learn more about navigating the code graph
First, let's import the types we need from Codegen:
```python
import codegen
from codegen import Codebase
from codegen.sdk.core.external_module import ExternalModule
from codegen.sdk.core.import_resolution import Import
from codegen.sdk.core.symbol import Symbol
```
Here's how we get the full context for each function:
```python
def get_function_context(function) -> dict:
"""Get the implementation, dependencies, and usages of a function."""
context = {
"implementation": {"source": function.source, "filepath": function.filepath},
"dependencies": [],
"usages": [],
}
# Add dependencies
for dep in function.dependencies:
# Hop through imports to find the root symbol source
if isinstance(dep, Import):
dep = hop_through_imports(dep)
context["dependencies"].append({"source": dep.source, "filepath": dep.filepath})
# Add usages
for usage in function.usages:
context["usages"].append({
"source": usage.usage_symbol.source,
"filepath": usage.usage_symbol.filepath,
})
return context
```
Notice how we use `hop_through_imports` to resolve dependencies. When working with imports, symbols can be re-exported multiple times. For example, a helper function might be imported and re-exported through several files before being used. We need to follow this chain to find the actual implementation:
```python
def hop_through_imports(imp: Import) -> Symbol | ExternalModule:
"""Finds the root symbol for an import."""
if isinstance(imp.imported_symbol, Import):
return hop_through_imports(imp.imported_symbol)
return imp.imported_symbol
```
This creates a structured representation of each function's context:
```json
{
"implementation": {
"source": "def process_data(input: str) -> dict: ...",
"filepath": "src/data_processor.py"
},
"dependencies": [
{
"source": "def validate_input(data: str) -> bool: ...",
"filepath": "src/validators.py"
}
],
"usages": [
{
"source": "result = process_data(user_input)",
"filepath": "src/api.py"
}
]
}
```
## Step 2: Processing the Codebase
Next, we process all functions in the codebase to generate our training data:
```python
def run(codebase: Codebase):
"""Generate training data using a node2vec-like approach for code embeddings."""
# Track all function contexts
training_data = {
"functions": [],
"metadata": {
"total_functions": len(codebase.functions),
"total_processed": 0,
"avg_dependencies": 0,
"avg_usages": 0,
},
}
# Process each function in the codebase
for function in codebase.functions:
# Skip if function is too small
if len(function.source.split("\n")) < 2:
continue
# Get function context
context = get_function_context(function)
# Only keep functions with enough context
if len(context["dependencies"]) + len(context["usages"]) > 0:
training_data["functions"].append(context)
# Update metadata
training_data["metadata"]["total_processed"] = len(training_data["functions"])
if training_data["functions"]:
training_data["metadata"]["avg_dependencies"] = sum(
len(f["dependencies"]) for f in training_data["functions"]
) / len(training_data["functions"])
training_data["metadata"]["avg_usages"] = sum(
len(f["usages"]) for f in training_data["functions"]
) / len(training_data["functions"])
return training_data
```
## Step 3: Running the Generator
Finally, we can run our training data generator on any codebase.
See [parsing codebases](/building-with-codegen/parsing-codebases) to learn more
```python
import json

if __name__ == "__main__":
print("Initializing codebase...")
codebase = Codebase.from_repo("fastapi/fastapi")
print("Generating training data...")
training_data = run(codebase)
print("Saving training data...")
with open("training_data.json", "w") as f:
json.dump(training_data, f, indent=2)
print("Training data saved to training_data.json")
```
This will:
1. Load the target codebase
2. Process all functions
3. Save the structured training data to a JSON file
You can use any Git repository as your source codebase by passing the repo URL
to [Codebase.from_repo(...)](/api-reference/core/Codebase#from-repo).
## Using the Training Data
The generated data can be used to train LLMs in several ways:
1. **Masked Function Prediction**: Hide a function's implementation and predict it from dependencies and usages
2. **Code Embeddings**: Generate embeddings that capture semantic relationships between functions
3. **Dependency Prediction**: Learn to predict which functions are likely to be dependencies
4. **Usage Pattern Learning**: Train models to understand common usage patterns
For example, to create a masked prediction task:
```python
def create_training_example(function_data):
"""Create a masked prediction example from function data."""
return {
"context": {
"dependencies": function_data["dependencies"],
"usages": function_data["usages"]
},
"target": function_data["implementation"]
}
# Create training examples
examples = [create_training_example(f) for f in training_data["functions"]]
```
---
title: "Codebase Visualization"
sidebarTitle: "Visualization"
description: "This guide will show you how to create codebase visualizations using [codegen](/introduction/overview)."
icon: "share-nodes"
iconType: "solid"
---
## Overview
To demonstrate the visualization capabilities of codegen, we will generate three different visualizations of PostHog's open source [repository](https://github.com/PostHog/posthog).
- [Call Trace Visualization](#call-trace-visualization)
- [Function Dependency Graph](#function-dependency-graph)
- [Blast Radius Visualization](#blast-radius-visualization)
## Call Trace Visualization
Visualizing the call trace of a function is a great way to understand the flow of a function and for debugging. In this tutorial we will create a call trace visualization of the `patch` method of the `SharingConfigurationViewSet` class. View the source code [here](https://github.com/PostHog/posthog/blob/c2986d9ac7502aa107a4afbe31b3633848be6582/posthog/api/sharing.py#L163).
### Basic Setup
First, we'll set up our codebase, graph and configure some basic parameters:
```python
import networkx as nx

from codegen import Codebase
# Symbol types used by the traversal below; import paths are indicative and may vary by codegen version
from codegen.sdk.core.class_definition import Class
from codegen.sdk.core.detached_symbols.function_call import FunctionCall
from codegen.sdk.core.external_module import ExternalModule
from codegen.sdk.core.function import Function
# Initialize codebase
codebase = Codebase("path/to/posthog/")
# Create a directed graph for representing call relationships
G = nx.DiGraph()
# Configuration flags
IGNORE_EXTERNAL_MODULE_CALLS = True # Skip calls to external modules
IGNORE_CLASS_CALLS = False # Include class definition calls
MAX_DEPTH = 10
COLOR_PALETTE = {
"StartFunction": "#9cdcfe", # Light blue - Start Function
"PyFunction": "#a277ff", # Soft purple/periwinkle - PyFunction
"PyClass": "#ffca85", # Warm peach/orange - PyClass
"ExternalModule": "#f694ff" # Bright magenta/pink - ExternalModule
}
```
### Building the Visualization
We'll create a function that will recursively traverse the call trace of a function and add nodes and edges to the graph:
```python
def create_downstream_call_trace(src_func: Function, depth: int = 0):
"""Creates call graph by recursively traversing function calls
Args:
src_func (Function): Starting function for call graph
depth (int): Current recursion depth
"""
# Prevent infinite recursion
if MAX_DEPTH <= depth:
return
# External modules are not functions
if isinstance(src_func, ExternalModule):
return
# Process each function call
for call in src_func.function_calls:
# Skip self-recursive calls
if call.name == src_func.name:
continue
# Get called function definition
func = call.function_definition
if not func:
continue
# Apply configured filters
if isinstance(func, ExternalModule) and IGNORE_EXTERNAL_MODULE_CALLS:
continue
if isinstance(func, Class) and IGNORE_CLASS_CALLS:
continue
# Generate display name (include class for methods)
if isinstance(func, Class) or isinstance(func, ExternalModule):
func_name = func.name
elif isinstance(func, Function):
func_name = f"{func.parent_class.name}.{func.name}" if func.is_method else func.name
# Add node and edge with metadata
G.add_node(func, name=func_name,
color=COLOR_PALETTE.get(func.__class__.__name__))
G.add_edge(src_func, func, **generate_edge_meta(call))
# Recurse for regular functions
if isinstance(func, Function):
create_downstream_call_trace(func, depth + 1)
```
### Adding Edge Metadata
We can enrich our edges with metadata about the function calls:
```python
def generate_edge_meta(call: FunctionCall) -> dict:
"""Generate metadata for call graph edges
Args:
call (FunctionCall): Function call information
Returns:
dict: Edge metadata including name and location
"""
return {
"name": call.name,
"file_path": call.filepath,
"start_point": call.start_point,
"end_point": call.end_point,
"symbol_name": "FunctionCall"
}
```
### Visualizing the Graph
Finally, we can visualize our call graph starting from a specific function:
```python
# Get target function to analyze
target_class = codebase.get_class('SharingConfigurationViewSet')
target_method = target_class.get_method('patch')
# Add root node
G.add_node(target_method,
name=f"{target_class.name}.{target_method.name}",
color=COLOR_PALETTE["StartFunction"])
# Build the call graph
create_downstream_call_trace(target_method)
# Render the visualization
codebase.visualize(G)
```
### Take a look
View on [codegen.sh](https://www.codegen.sh/codemod/6a34b45d-c8ad-422e-95a8-46d4dc3ce2b0/public/diff)
### Common Use Cases
The call graph visualization is particularly useful for:
- Understanding complex codebases
- Planning refactoring efforts
- Identifying tightly coupled components
- Analyzing critical paths
- Documenting system architecture
## Function Dependency Graph
Understanding symbol dependencies is crucial for maintaining and refactoring code. This tutorial will show you how to create visual dependency graphs using Codegen and NetworkX. We will be creating a dependency graph of the `get_query_runner` function. View the source code [here](https://github.com/PostHog/posthog/blob/c2986d9ac7502aa107a4afbe31b3633848be6582/posthog/hogql_queries/query_runner.py#L152).
### Basic Setup
We'll use the same basic setup as the [Call Trace Visualization](/tutorials/codebase-visualization#call-trace-visualization) tutorial.
### Building the Dependency Graph
The core function for building our dependency graph:
```python
def create_dependencies_visualization(symbol: Symbol, depth: int = 0):
"""Creates visualization of symbol dependencies
Args:
symbol (Symbol): Starting symbol to analyze
depth (int): Current recursion depth
"""
# Prevent excessive recursion
if depth >= MAX_DEPTH:
return
# Process each dependency
for dep in symbol.dependencies:
dep_symbol = None
# Handle different dependency types
if isinstance(dep, Symbol):
# Direct symbol reference
dep_symbol = dep
elif isinstance(dep, Import):
# Import statement - get resolved symbol
dep_symbol = dep.resolved_symbol if dep.resolved_symbol else None
if dep_symbol:
# Add node with appropriate styling
G.add_node(dep_symbol,
color=COLOR_PALETTE.get(dep_symbol.__class__.__name__,
"#f694ff"))
# Add dependency relationship
G.add_edge(symbol, dep_symbol)
# Recurse unless it's a class (avoid complexity)
if not isinstance(dep_symbol, PyClass):
create_dependencies_visualization(dep_symbol, depth + 1)
```
### Visualizing the Graph
Finally, we can visualize our dependency graph starting from a specific symbol:
```python
# Get target symbol
target_func = codebase.get_function("get_query_runner")
# Add root node
G.add_node(target_func, color=COLOR_PALETTE["StartFunction"])
# Generate dependency graph
create_dependencies_visualization(target_func)
# Render visualization
codebase.visualize(G)
```
### Take a look
View on [codegen.sh](https://www.codegen.sh/codemod/39a36f0c-9d35-4666-9db7-12ae7c28fc17/public/diff)
## Blast Radius Visualization
Understanding the impact of code changes is crucial for safe refactoring. A blast radius visualization shows how changes to one function might affect other parts of the codebase by tracing usage relationships. In this tutorial we will create a blast radius visualization of the `export_asset` function. View the source code [here](https://github.com/PostHog/posthog/blob/c2986d9ac7502aa107a4afbe31b3633848be6582/posthog/tasks/exporter.py#L57).
### Basic Setup
We'll use the same basic setup as the [Call Trace Visualization](/tutorials/codebase-visualization#call-trace-visualization) tutorial.
### Helper Functions
We'll create some utility functions to help build our visualization:
```python
# List of HTTP methods to highlight
HTTP_METHODS = ["get", "put", "patch", "post", "head", "delete"]
def generate_edge_meta(usage: Usage) -> dict:
"""Generate metadata for graph edges
Args:
usage (Usage): Usage relationship information
Returns:
dict: Edge metadata including name and location
"""
return {
"name": usage.match.source,
"file_path": usage.match.filepath,
"start_point": usage.match.start_point,
"end_point": usage.match.end_point,
"symbol_name": usage.match.__class__.__name__
}
def is_http_method(symbol: PySymbol) -> bool:
"""Check if a symbol is an HTTP endpoint method
Args:
symbol (PySymbol): Symbol to check
Returns:
bool: True if symbol is an HTTP method
"""
if isinstance(symbol, PyFunction) and symbol.is_method:
return symbol.name in HTTP_METHODS
return False
```
### Building the Blast Radius Visualization
The main function for creating our blast radius visualization:
```python
def create_blast_radius_visualization(symbol: PySymbol, depth: int = 0):
"""Create visualization of symbol usage relationships
Args:
symbol (PySymbol): Starting symbol to analyze
depth (int): Current recursion depth
"""
# Prevent excessive recursion
if depth >= MAX_DEPTH:
return
# Process each usage of the symbol
for usage in symbol.usages:
usage_symbol = usage.usage_symbol
# Determine node color based on type
if is_http_method(usage_symbol):
color = COLOR_PALETTE.get("HTTP_METHOD")
else:
color = COLOR_PALETTE.get(usage_symbol.__class__.__name__, "#f694ff")
# Add node and edge to graph
G.add_node(usage_symbol, color=color)
G.add_edge(symbol, usage_symbol, **generate_edge_meta(usage))
# Recursively process usage symbol
create_blast_radius_visualization(usage_symbol, depth + 1)
```
### Visualizing the Graph
Finally, we can create our blast radius visualization:
```python
# Get target function to analyze
target_func = codebase.get_function('export_asset')
# Add root node
G.add_node(target_func, color=COLOR_PALETTE.get("StartFunction"))
# Build the visualization
create_blast_radius_visualization(target_func)
# Render graph to show impact flow
# Note: a -> b means changes to a will impact b
codebase.visualize(G)
```
### Take a look
View on [codegen.sh](https://www.codegen.sh/codemod/d255db6c-9a86-4197-9b78-16c506858a3b/public/diff)
## What's Next?
- Learn how to use Codegen to create modular codebases.
- Learn how to use Codegen to delete dead code.
- Learn how to use Codegen to increase type coverage.
- Explore the complete API documentation for all Codegen classes and methods.
---
title: "Migrating APIs"
sidebarTitle: "API Migrations"
icon: "webhook"
iconType: "solid"
---
API migrations are a common task in large codebases. Whether you're updating a deprecated function, changing parameter names, or modifying return types, Codegen makes it easy to update all call sites consistently.
## Common Migration Scenarios
### Renaming Parameters
When updating parameter names across an API, you need to update both the function definition and all call sites:
```python
# Find the API function to update
api_function = codebase.get_function("process_data")
# Update the parameter name
old_param = api_function.get_parameter("input")
old_param.rename("data")
# All call sites are automatically updated:
# process_data(input="test") -> process_data(data="test")
```
See [dependencies and usages](/building-with-codegen/dependencies-and-usages) for more on updating parameter names and types.
### Adding Required Parameters
When adding a new required parameter to an API:
```python
# Find all call sites before modifying the function
call_sites = list(api_function.call_sites)
# Add the new parameter
api_function.add_parameter("timeout: int")
# Update all existing call sites to include the new parameter
for call in call_sites:
call.add_argument("timeout=30") # Add with a default value
```
See [function calls and callsites](/building-with-codegen/function-calls-and-callsites) for more on handling call sites.
### Changing Parameter Types
When updating parameter types:
```python
# Update the parameter type
param = api_function.get_parameter("user_id")
param.type = "UUID" # Change from string to UUID
# Find all call sites that need type conversion
for call in api_function.call_sites:
arg = call.get_arg_by_parameter_name("user_id")
if arg:
# Convert string to UUID
arg.edit(f"UUID({arg.value})")
```
See [working with type annotations](/building-with-codegen/type-annotations) for more on changing parameter types.
### Deprecating Functions
When deprecating an old API in favor of a new one:
```python
old_api = codebase.get_function("old_process_data")
new_api = codebase.get_function("new_process_data")
# Add deprecation warning
old_api.add_decorator('@deprecated("Use new_process_data instead")')
# Update all call sites to use the new API
for call in old_api.call_sites:
# Map old arguments to new parameter names
args = [
f"data={call.get_arg_by_parameter_name('input').value}",
f"timeout={call.get_arg_by_parameter_name('wait').value}"
]
# Replace the old call with the new API
call.replace(f"new_process_data({', '.join(args)})")
```
## Bulk Updates to Method Chains
When updating chained method calls, like database queries or builder patterns:
```python
# Find all query chains ending with .execute()
for execute_call in codebase.function_calls:
if execute_call.name != "execute":
continue
# Get the full chain
chain = execute_call.call_chain
# Example: Add .timeout() before .execute()
if "timeout" not in {call.name for call in chain}:
execute_call.insert_before("timeout(30)")
```
## Handling Breaking Changes
When making breaking changes to an API, it's important to:
1. Identify all affected call sites
2. Make changes consistently
3. Update related documentation
4. Consider backward compatibility
Here's a comprehensive example:
```python
def migrate_api_v1_to_v2(codebase):
old_api = codebase.get_function("create_user_v1")
# Document all existing call patterns
call_patterns = {}
for call in old_api.call_sites:
args = [arg.source for arg in call.args]
pattern = ", ".join(args)
call_patterns[pattern] = call_patterns.get(pattern, 0) + 1
print("Found call patterns:")
for pattern, count in call_patterns.items():
print(f" {pattern}: {count} occurrences")
# Create new API version
new_api = old_api.copy()
new_api.rename("create_user_v2")
# Update parameter types
new_api.get_parameter("email").type = "EmailStr"
new_api.get_parameter("role").type = "UserRole"
# Add new required parameters
new_api.add_parameter("tenant_id: UUID")
# Update all call sites
for call in old_api.call_sites:
# Get current arguments
email_arg = call.get_arg_by_parameter_name("email")
role_arg = call.get_arg_by_parameter_name("role")
# Build new argument list with type conversions
new_args = [
f"email=EmailStr({email_arg.value})",
f"role=UserRole({role_arg.value})",
"tenant_id=get_current_tenant_id()"
]
# Replace old call with new version
call.replace(f"create_user_v2({', '.join(new_args)})")
# Add deprecation notice to old version
old_api.add_decorator('@deprecated("Use create_user_v2 instead")')
# Run the migration
migrate_api_v1_to_v2(codebase)
```
## Best Practices
1. **Analyze First**: Before making changes, analyze all call sites to understand usage patterns
```python
# Document current usage
for call in api.call_sites:
print(f"Called from: {call.parent_function.name}")
print(f"With args: {[arg.source for arg in call.args]}")
```
2. **Make Atomic Changes**: Update one aspect at a time
```python
# First update parameter names
param.rename("new_name")
# Then update types
param.type = "new_type"
# Finally update call sites
for call in api.call_sites:
# ... update calls
```
3. **Maintain Backwards Compatibility**:
```python
# Add new parameter with default
api.add_parameter("new_param: str = None")
# Later make it required
api.get_parameter("new_param").remove_default()
```
4. **Document Changes**:
```python
# Add clear deprecation messages
old_api.add_decorator('''@deprecated(
"Use new_api() instead. Migration guide: docs/migrations/v2.md"
)''')
```
Remember to test thoroughly after making bulk changes to APIs. While Codegen ensures syntactic correctness, you'll want to verify the semantic correctness of the changes.
---
title: "Organizing Your Codebase"
sidebarTitle: "Organization"
icon: "folder-tree"
iconType: "solid"
---
Codegen SDK provides a powerful set of tools for deterministically moving code safely and efficiently. This guide will walk you through the basics of moving code with Codegen SDK.
Common use cases include splitting large files into single-function modules, organizing functions into modules by naming convention, and breaking import cycles by extracting shared code, as the three snippets below show:
```python
print(f"π Processing file: {filepath}")
file = codebase.get_file(filepath)
# Get the directory path for creating new files
dir_path = file.directory.path if file.directory else ""
# Iterate through all functions in the file
for function in file.functions:
# Create new filename based on function name
new_filepath = f"{dir_path}/{function.name}.py"
print(f"π Creating new file: {new_filepath}")
# Create the new file
new_file = codebase.create_file(new_filepath)
# Move the function to the new file, including dependencies
print(f"β‘οΈ Moving function: {function.name}")
function.move_to_file(new_file, include_dependencies=True)
```
```python
# Dictionary to track modules and their functions
module_map = {
    "utils": lambda f: f.name.startswith("util_") or f.name.startswith("helper_"),
    "api": lambda f: f.name.startswith("api_") or f.name.startswith("endpoint_"),
    "data": lambda f: f.name.startswith("data_") or f.name.startswith("db_"),
    "core": lambda f: True  # Default module for other functions
}

print("Starting code organization...")

# Create module directories if they don't exist
for module in module_map.keys():
    if not codebase.has_directory(module):
        print(f"Creating module directory: {module}")
        codebase.create_directory(module, exist_ok=True)

# Process each file in the codebase
for file in codebase.files:
    print(f"\nProcessing file: {file.filepath}")
    # Skip if file is already in a module directory
    if any(file.filepath.startswith(module) for module in module_map.keys()):
        continue
    # Process each function in the file
    for function in file.functions:
        # Determine which module this function belongs to
        target_module = next(
            (module for module, condition in module_map.items() if condition(function)),
            "core"
        )
        # Create the new file path
        new_filepath = f"{target_module}/{function.name}.py"
        print(f"  Moving {function.name} to {target_module} module")
        # Create new file and move function
        if not codebase.has_file(new_filepath):
            new_file = codebase.create_file(new_filepath)
            function.move_to_file(new_file, include_dependencies=True)

print("\nCode organization complete!")
```
```python
# Create a graph to detect cycles
import networkx as nx

# Build dependency graph
G = nx.DiGraph()

# Add edges for imports between files
for file in codebase.files:
    for imp in file.imports:
        if imp.from_file:
            G.add_edge(file.filepath, imp.from_file.filepath)

# Find cycles in the graph
cycles = list(nx.simple_cycles(G))
if not cycles:
    print("No import cycles found!")
    exit()

print(f"Found {len(cycles)} import cycles")

# Process each cycle
for cycle in cycles:
    print(f"\nProcessing cycle: {' -> '.join(cycle)}")
    # Get the first two files in the cycle
    file1 = codebase.get_file(cycle[0])
    file2 = codebase.get_file(cycle[1])
    # Find functions in file1 that are used by file2
    for function in file1.functions:
        if any(usage.file == file2 for usage in function.usages):
            # Create new file for the shared function
            new_filepath = f"shared/{function.name}.py"
            print(f"  Moving {function.name} to {new_filepath}")
            if not codebase.has_directory("shared"):
                codebase.create_directory("shared")
            new_file = codebase.create_file(new_filepath)
            function.move_to_file(new_file, include_dependencies=True)

print("\nImport cycles resolved!")
```
Most operations in Codegen will automatically handle updating
[dependencies](/building-with-codegen/dependencies-and-usages) and
[imports](/building-with-codegen/imports). See [Moving
Symbols](/building-with-codegen/moving-symbols) to learn more.
## Basic Symbol Movement
To move a symbol from one file to another, you can use the [move_to_file](/api-reference/core/Function#move-to-file) method.
```python python
# Get the symbol
symbol_to_move = source_file.get_symbol("my_function")
# Pick a destination file
dst_file = codebase.get_file("path/to/dst/location.py")
# Move the symbol, move all of its dependencies with it (remove from old file), and add an import of symbol into old file
symbol_to_move.move_to_file(dst_file, include_dependencies=True, strategy="add_back_edge")
```
```python typescript
# Get the symbol
symbol_to_move = source_file.get_symbol("myFunction")
# Pick a destination file
dst_file = codebase.get_file("path/to/dst/location.ts")
# Move the symbol, move all of its dependencies with it (remove from old file), and add an import of symbol into old file
symbol_to_move.move_to_file(dst_file, include_dependencies=True, strategy="add_back_edge")
```
This will move `my_function` to `path/to/dst/location.py`, safely updating all references to it in the process.
## Updating Imports
After moving a symbol, you may need to update imports throughout your codebase. Codegen offers two strategies for this:
1. **Update All Imports**: This strategy updates all imports across the codebase to reflect the new location of the symbol.
```python python
symbol_to_move = codebase.get_symbol("symbol_to_move")
dst_file = codebase.create_file("new_file.py")
symbol_to_move.move_to_file(dst_file, strategy="update_all_imports")
```
```python typescript
symbol_to_move = codebase.get_symbol("symbolToMove")
dst_file = codebase.create_file("new_file.ts")
symbol_to_move.move_to_file(dst_file, strategy="update_all_imports")
```
Updating all imports can result in very large PRs
2. **Add Back Edge**: This strategy adds an import in the original file that re-imports (and exports) the moved symbol, maintaining backwards compatibility. This will result in fewer total modifications, as existing imports will not need to be updated.
```python python
symbol_to_move = codebase.get_symbol("symbol_to_move")
dst_file = codebase.create_file("new_file.py")
symbol_to_move.move_to_file(dst_file, strategy="add_back_edge")
```
```python typescript
symbol_to_move = codebase.get_symbol("symbolToMove")
dst_file = codebase.create_file("new_file.ts")
symbol_to_move.move_to_file(dst_file, strategy="add_back_edge")
```
## Handling Dependencies
By default, Codegen will move all of a symbol's dependencies along with it. This ensures that your codebase remains consistent and functional.
```python python
my_symbol = codebase.get_symbol("my_symbol")
dst_file = codebase.create_file("new_file.py")
my_symbol.move_to_file(dst_file, include_dependencies=True)
```
```python typescript
my_symbol = codebase.get_symbol("mySymbol")
dst_file = codebase.create_file("new_file.ts")
my_symbol.move_to_file(dst_file, include_dependencies=True)
```
If you set `include_dependencies=False`, only the symbol itself will be moved, and any dependencies will remain in the original file.
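For example, a minimal sketch of a dependency-free move (symbol and file names illustrative):
```python
# Move only the symbol itself; its dependencies stay in the original file
my_symbol = codebase.get_symbol("my_symbol")
dst_file = codebase.create_file("new_file.py")
my_symbol.move_to_file(dst_file, include_dependencies=False)
```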
## Moving Multiple Symbols
If you need to move multiple symbols, you can do so in a loop:
```python
source_file = codebase.get_file("path/to/source_file.py")
dest_file = codebase.get_file("path/to/destination_file.py")
# Create a list of symbols to move
symbols_to_move = [source_file.get_function("my_function"), source_file.get_class("MyClass")]
# Move each symbol to the destination file
for symbol in symbols_to_move:
symbol.move_to_file(dest_file, include_dependencies=True, strategy="update_all_imports")
```
## Best Practices
1. **Commit After Major Changes**: If you're making multiple significant changes, use `codebase.commit()` between them to ensure the codebase graph is up-to-date.
2. **Re-fetch References**: After a commit, re-fetch any file or symbol references you're working with, as they may have become stale.
3. **Handle Errors**: Be prepared to handle cases where symbols or files might not exist, or where moves might fail due to naming conflicts.
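As a rough sketch of the commit-and-refetch pattern described above (file and symbol names are illustrative):
```python
symbol = codebase.get_symbol("my_function")
dst = codebase.create_file("new_home.py")
symbol.move_to_file(dst, include_dependencies=True, strategy="update_all_imports")

# Sync the codebase graph before making further edits
codebase.commit()

# References may be stale after a commit -- re-fetch them
dst = codebase.get_file("new_home.py")
moved = dst.get_symbol("my_function")
```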
By following these guidelines, you can effectively move symbols around your codebase while maintaining its integrity and functionality.
---
title: "Converting Promise Chains to Async/Await"
sidebarTitle: "Promise to Async/Await"
icon: "code-merge"
iconType: "solid"
---
Modern JavaScript/TypeScript codebases often need to migrate from Promise-based code to the more readable async/await syntax. Codegen provides powerful tools to automate this conversion while preserving business logic and handling complex scenarios.
You can find the complete example code in our [examples repository](https://github.com/codegen-sh/codegen-sdk/blob/develop/codegen-examples/examples/promises_to_async_await/promises_to_async_await.ipynb).
## Finding Promise Chains
Codegen offers multiple ways to locate Promise chains in your codebase:
- In files
- In functions
- Part of a function call chain
### Promise Chains in a File
Find all Promise chains in a file:
```python
ts_file = codebase.get_file("api_client.ts")
promise_chains = ts_file.promise_chains
print(f"Found {len(promise_chains)} Promise chains")
```
### Promise Chains in a Function
Find Promise chains within a specific function:
```python
ts_func = codebase.get_function("getUserData")
chains = ts_func.promise_chains
for chain in chains:
print(f"Found chain starting with: {chain.name}")
```
### Promise Chain starting from a Function Call
Find Promise chains starting from a specific function call:
```python
# Assuming the function call is part of a promise chain
fetch_call = codebase.get_function("fetchUserData").function_calls[2]
chain = fetch_call.promise_chain
```
## Converting Promise Chains
### In-Place Conversion
Convert Promise chains directly in your codebase:
```python
# Find and convert all Promise chains in a file
for chain in typescript_file.promise_chains:
chain.convert_to_async_await()
```
### Handle Business Logic Without In-Place Edit
Generate the transformed code without an in-place edit by returning the new code as a string. This is useful when you want to add additional business logic to the overall conversion.
```python
async_await_code = chain.convert_to_async_await(inplace_edit=False)
print("Converted code:", async_await_code)
promise_statement = chain.parent_statement
promise_statement.edit(
f"""
{async_await_code}
// handle additional business logic here
"""
)
```
## Supported Promise Chain Patterns
- Basic `promise.then()` statements of any length
- Catch `promise.then().catch()` statements of any length
- Finally `promise.then().catch().finally()` statements of any length
- Destructure `promise.then((var1, var2))` statements -> `let [var1, var2] = await statement;`
- Implicit returns -> `return promise.then(() => console.log("hello"))`
- Top level variable assignments -> `let assigned_var = promise.then()`
- Ambiguous/conditional return blocks
A list of all the covered cases can be found in the [example notebook](https://github.com/codegen-sh/codegen-sdk/tree/codegen-examples/examples/promises_to_async_await/promise_to_async_await.ipynb).
## Examples
### 1. Basic Promise Chains
```typescript
// Before
function getValue(): Promise<number> {
return Promise.resolve(10)
.then(value => value * 2);
}
```
***Applying the conversion...***
```python
promise_chain = codebase.get_function("getValue").promise_chains[0]
promise_chain.convert_to_async_await()
codebase.commit()
```
```typescript
// After
async function getValue(): Promise<number> {
let value = await Promise.resolve(10);
return value * 2;
}
```
### 2. Error Handling with Catch/Finally
```typescript
// Before
function processData(): Promise<any> {
return fetchData()
.then(data => processData(data))
.catch(error => {
console.error("Error:", error);
throw error;
})
.finally(() => {
cleanup();
});
}
```
***Applying the conversion...***
```python
promise_chain = codebase.get_function("processData").promise_chains[0]
promise_chain.convert_to_async_await()
codebase.commit()
```
```typescript
// After
async function processData(): Promise<any> {
try {
let data = await fetchData();
return processData(data);
} catch (error) {
console.error("Error:", error);
throw error;
} finally {
cleanup();
}
}
```
### 3. Promise.all with Destructuring
```typescript
// Before
function getAllUserInfo(userId: number) {
return Promise.all([
fetchUserData(userId),
fetchUserPosts(userId)
]).then(([user, posts]) => {
return { user, posts };
});
}
```
***Applying the conversion...***
```python
promise_chain = codebase.get_function("getAllUserInfo").promise_chains[0]
promise_chain.convert_to_async_await()
codebase.commit()
```
```typescript
// After
async function getAllUserInfo(userId: number) {
const [user, posts] = await Promise.all([
fetchUserData(userId),
fetchUserPosts(userId)
]);
return { user, posts };
}
```
### 4. Handling Ambiguous Returns Using Anonymous Functions
For `then` blocks that have more than one return statement, Codegen will add an anonymous function to handle the ambiguous return to guarantee a deterministic conversion.
```typescript
// Before
function create(opts: any): Promise<any> {
let qResponse = request(opts);
qResponse = qResponse.then(function success(response) {
if (response.statusCode < 200 || response.statusCode >= 300) {
throw new Error(JSON.stringify(response));
}
if (typeof response.body === "string") {
return JSON.parse(response.body);
}
return response.body;
});
return qResponse;
}
```
***Applying the conversion...***
```python
promise_chain = codebase.get_function("create").promise_chains[0]
promise_chain.convert_to_async_await()
codebase.commit()
```
```typescript
// After
async function create(opts): Promise<any> {
let qResponse = request(opts);
let response = await qResponse;
qResponse = (async (response) => {
if (response.statusCode < 200 || response.statusCode >= 300) {
throw new Error(JSON.stringify(response));
}
if (typeof response.body === "string") {
return JSON.parse(response.body);
}
return response.body;
})(response);
return qResponse;
}
```
## Handling Top-Level Assignment Variables
When converting Promise chains that involve top-level variable assignments, you can specify a custom variable name or keep the default, which is the original assignment's variable name.
```python
# Convert with custom variable names for clarity
chain.convert_to_async_await(
assignment_variable_name="operationResult",
)
```
## Next Steps
Converting Promise chains to async/await improves code readability and maintainability. Codegen's tools make this migration process automated and reliable, handling complex cases while preserving business logic.
Here are some next steps to ensure a successful migration:
1. Run `npx prettier --write .` after the migration to fix indentation and linting
2. **Incremental Migration**: Convert one module at a time
3. **Handle Additional Business Logic**: Use `.promise_statement.edit()` to modify the entire chain and handle external business logic
4. If a specific conversion case is not covered, open an issue on the [Codegen](https://github.com/codegen-sh/codegen-sdk) repository or write your own transformation logic using the codegen-sdk
---
title: "Improving Code Modularity"
sidebarTitle: "Modularity"
icon: "diagram-project"
iconType: "solid"
---
Codegen SDK provides powerful tools for analyzing and improving code modularity. This guide will help you identify and fix common modularity issues like circular dependencies, tight coupling, and poorly organized imports.
Common use cases include:
- Breaking up circular dependencies
- Organizing imports and exports
- Identifying highly coupled modules
- Extracting shared code into common modules
- Analyzing module boundaries
## Analyzing Import Relationships
First, let's see how to analyze import relationships in your codebase:
```python
import networkx as nx
from collections import defaultdict
# Create a graph of file dependencies
def create_dependency_graph():
G = nx.DiGraph()
for file in codebase.files:
# Add node for this file
G.add_node(file.filepath)
# Add edges for each import
for imp in file.imports:
if imp.from_file: # Skip external imports
G.add_edge(file.filepath, imp.from_file.filepath)
return G
# Create and analyze the graph
graph = create_dependency_graph()
# Find circular dependencies
cycles = list(nx.simple_cycles(graph))
if cycles:
print("π Found circular dependencies:")
for cycle in cycles:
print(f" β’ {' -> '.join(cycle)}")
# Calculate modularity metrics
print("\nπ Modularity Metrics:")
print(f" β’ Number of files: {len(graph.nodes)}")
print(f" β’ Number of imports: {len(graph.edges)}")
print(f" β’ Average imports per file: {len(graph.edges)/len(graph.nodes):.1f}")
```
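The same graph can also surface the most depended-upon files. A small sketch using networkx's degree view (edges point from importer to imported file, so in-degree counts importers):
```python
# Files most frequently imported by others (highest in-degree)
most_imported = sorted(graph.nodes, key=lambda n: graph.in_degree(n), reverse=True)
for filepath in most_imported[:5]:
    print(f"  • {filepath}: imported by {graph.in_degree(filepath)} files")
```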
## Breaking Circular Dependencies
When you find circular dependencies, here's how to break them:
```python
def break_circular_dependency(cycle):
# Get the first two files in the cycle
file1 = codebase.get_file(cycle[0])
file2 = codebase.get_file(cycle[1])
# Create a shared module for common code
shared_dir = "shared"
if not codebase.has_directory(shared_dir):
codebase.create_directory(shared_dir)
# Find symbols used by both files
shared_symbols = []
for symbol in file1.symbols:
if any(usage.file == file2 for usage in symbol.usages):
shared_symbols.append(symbol)
# Move shared symbols to a new file
if shared_symbols:
shared_file = codebase.create_file(f"{shared_dir}/shared_types.py")
for symbol in shared_symbols:
symbol.move_to_file(shared_file, strategy="update_all_imports")
# Break each cycle found
for cycle in cycles:
break_circular_dependency(cycle)
```
## Organizing Imports
Clean up and organize imports across your codebase:
```python
def organize_file_imports(file):
# Group imports by type
std_lib_imports = []
third_party_imports = []
local_imports = []
for imp in file.imports:
if imp.is_standard_library:
std_lib_imports.append(imp)
elif imp.is_third_party:
third_party_imports.append(imp)
else:
local_imports.append(imp)
# Sort each group
for group in [std_lib_imports, third_party_imports, local_imports]:
group.sort(key=lambda x: x.module_name)
# Remove all existing imports
for imp in file.imports:
imp.remove()
# Add imports back in organized groups
if std_lib_imports:
for imp in std_lib_imports:
file.add_import(imp.source)
file.insert_after_imports("") # Add newline
if third_party_imports:
for imp in third_party_imports:
file.add_import(imp.source)
file.insert_after_imports("") # Add newline
if local_imports:
for imp in local_imports:
file.add_import(imp.source)
# Organize imports in all files
for file in codebase.files:
organize_file_imports(file)
```
## Identifying Highly Coupled Modules
Find modules that might need to be split up:
```python
from collections import defaultdict
def analyze_module_coupling():
coupling_scores = defaultdict(int)
for file in codebase.files:
# Count unique files imported from
imported_files = {imp.from_file for imp in file.imports if imp.from_file}
coupling_scores[file.filepath] = len(imported_files)
# Count files that import this file
importing_files = {usage.file for symbol in file.symbols
for usage in symbol.usages if usage.file != file}
coupling_scores[file.filepath] += len(importing_files)
# Sort by coupling score
sorted_files = sorted(coupling_scores.items(),
key=lambda x: x[1],
reverse=True)
print("\nπ Module Coupling Analysis:")
print("\nMost coupled files:")
for filepath, score in sorted_files[:5]:
print(f" β’ {filepath}: {score} connections")
analyze_module_coupling()
```
## Extracting Shared Code
When you find highly coupled modules, extract shared code:
```python
def extract_shared_code(file, min_usages=3):
# Find symbols used by multiple files
for symbol in file.symbols:
# Get unique files using this symbol
using_files = {usage.file for usage in symbol.usages
if usage.file != file}
if len(using_files) >= min_usages:
# Create appropriate shared module
module_name = determine_shared_module(symbol)
if not codebase.has_file(f"shared/{module_name}.py"):
shared_file = codebase.create_file(f"shared/{module_name}.py")
else:
shared_file = codebase.get_file(f"shared/{module_name}.py")
# Move symbol to shared module
symbol.move_to_file(shared_file, strategy="update_all_imports")
def determine_shared_module(symbol):
# Logic to determine appropriate shared module name
if symbol.is_type:
return "types"
elif symbol.is_constant:
return "constants"
elif symbol.is_utility:
return "utils"
else:
return "common"
```
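You can then apply the extraction across the codebase, for example (the usage threshold is illustrative):
```python
# Extract widely-shared symbols from every file, then sync the graph
for file in codebase.files:
    extract_shared_code(file, min_usages=3)
codebase.commit()
```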
---
title: "Managing Feature Flags"
sidebarTitle: "Feature Flags"
icon: "flag"
iconType: "solid"
---
Codegen has been used in production for multi-million line codebases to automatically delete "dead" (rolled-out) feature flags. This guide will walk you through analyzing feature flag usage and safely removing rolled out flags.
Every codebase does feature flags differently. This guide shows common techniques and syntax but likely requires adaptation to codebase-specific circumstances.
## Analyzing Feature Flag Usage
Before removing a feature flag, it's important to analyze its usage across the codebase. Codegen provides tools to help identify where and how feature flags are used.
### For Python Codebases
For Python codebases using a `FeatureFlag` class pattern like so:
```python
class FeatureFlag:
FEATURE_1 = False
FEATURE_2 = True
```
You can use [Class.get_attribute(...)](/api-reference/core/Class#get-attribute) and [Attribute.usages](/api-reference/core/Attribute#usages) to analyze the coverage of your flags, like so:
```python
feature_flag_usage = {}
feature_flag_class = codebase.get_class('FeatureFlag')
if feature_flag_class:
# Initialize usage count for all attributes
for attr in feature_flag_class.attributes:
feature_flag_usage[attr.name] = 0
# Get all usages of the FeatureFlag class
for usage in feature_flag_class.usages:
usage_source = usage.usage_symbol.source if hasattr(usage, 'usage_symbol') else str(usage)
for flag_name in feature_flag_usage.keys():
if f"FeatureFlag.{flag_name}" in usage_source:
feature_flag_usage[flag_name] += 1
sorted_flags = sorted(feature_flag_usage.items(), key=lambda x: x[1], reverse=True)
print("Feature Flag Usage Table:")
print("-------------------------")
print(f"{'Feature Flag':<30} | {'Usage Count':<12}")
print("-" * 45)
for flag, count in sorted_flags:
print(f"{flag:<30} | {count:<12}")
print(f"\nTotal feature flags: {len(sorted_flags)}")
else:
print("β FeatureFlag enum not found in the codebase")
```
This will output a table showing all feature flags and their usage counts, helping identify which flags are candidates for removal.
Learn more about [Attributes](/building-with-codegen/class-api#class-attributes) and [tracking usages](/building-with-codegen/dependencies-and-usages) here.
## Removing Rolled Out Flags
Once you've identified a flag that's ready to be removed, Codegen can help safely delete it and its associated code paths.
This primarily leverages Codegen's API for [reducing conditions](/building-with-codegen/reducing-conditions).
### Python Example
For Python codebases, here's how to remove a feature flag and its usages:
```python
flag_name = "FEATURE_TO_REMOVE"
# Get the feature flag variable
feature_flag_file = codebase.get_file("app/utils/feature_flags.py")
flag_class = feature_flag_file.get_class("FeatureFlag")
# Check if the flag exists
flag_var = flag_class.get_attribute(flag_name)
if not flag_var:
    raise ValueError(f"No such flag: {flag_name}")
# Remove all usages of the feature flag
for usage in flag_var.usages:
if isinstance(usage.parent, IfBlockStatement):
# For if statements, reduce the condition to True
usage.parent.reduce_condition(True)
elif isinstance(usage.parent, WithStatement):
# For with statements, keep the code block
usage.parent.code_block.unwrap()
else:
# For other cases, remove the usage
usage.remove()
# Remove the flag definition
flag_var.remove()
# Commit changes
codebase.commit()
```
### React/TypeScript Example
For React applications using a hooks-based feature flag system:
```python
feature_flag_name = "NEW_UI_ENABLED"
target_value = True # The value to reduce the flag to
print(f'Removing feature flag: {feature_flag_name}')
# 1. Remove from configuration
config_file = codebase.get_file("src/featureFlags/config.ts")
feature_flag_config = config_file.get_symbol("FEATURE_FLAG_CONFIG").value
if feature_flag_name in feature_flag_config.keys():
feature_flag_config.pop(feature_flag_name)
    print('✅ Removed from feature flag config')
# 2. Find and reduce all hook usages
hook = codebase.get_function("useFeatureFlag")
for usage in hook.usages:
fcall = usage.match
if isinstance(fcall, FunctionCall):
# Check if this usage is for our target flag
first_arg = fcall.args[0].value
if isinstance(first_arg, String) and first_arg.content == feature_flag_name:
print(f'Reducing in: {fcall.parent_symbol.name}')
            # This automatically handles:
            # - Ternary expressions: flag ? <NewUI /> : <OldUI />
            # - If statements: if (flag) { ... }
            # - Conditional rendering: {flag && <NewFeature />}
fcall.reduce_condition(target_value)
# 3. Commit changes
codebase.commit()
```
This will:
1. Remove the feature flag from the configuration
2. Find all usages of the `useFeatureFlag` hook for this flag
3. Automatically reduce any conditional logic using the flag
4. Handle common React patterns like ternaries and conditional rendering
## Related Resources
- [Reducing Conditions](/building-with-codegen/reducing-conditions) - Details on condition reduction APIs
- [Dead Code Removal](/tutorials/deleting-dead-code) - Remove unused code after flag deletion
---
title: "Deleting Dead Code"
sidebarTitle: "Dead Code"
icon: "trash"
iconType: "solid"
---
Dead code refers to code that is not being used or referenced anywhere in your codebase.
However, it's important to note that some code might appear unused but should not be deleted, including:
- Test files and test functions
- Functions with decorators (which may be called indirectly)
- Public API endpoints
- Event handlers or callback functions
- Code used through reflection or dynamic imports
This guide will show you how to safely identify and remove genuinely unused code while preserving important functionality.
## Overview
To simply identify code without any external usages, you can check for the absence of [Symbol.usages](/api-reference/core/Symbol#usages).
See [Dependencies and Usages](/building-with-codegen/dependencies-and-usages) for more information on how to use these properties.
```python
# Iterate through all functions in the codebase
for function in codebase.functions:
# Remove functions with no usages
if not function.usages:
function.remove()
# Commit
codebase.commit()
```
This will remove all code that is not explicitly referenced elsewhere, including tests, endpoints, etc. This is almost certainly not what you want. We recommend further filtering.
## Filtering for Special Cases
To filter out special cases that are not explicitly referenced but are nonetheless worth keeping, you can use the following pattern:
```python
for function in codebase.functions:
# Skip test files
if "test" in function.file.filepath:
continue
# Skip decorated functions
if function.decorators:
continue
# Skip public routes, e.g. next.js endpoints
# (Typescript only)
if 'routes' in function.file.filepath and function.is_jsx:
continue
# ... etc.
# Check if the function has no usages and no call sites
if not function.usages and not function.call_sites:
# Print a message indicating the removal of the function
print(f"Removing unused function: {function.name}")
# Remove the function from the file
function.remove()
# Commit
codebase.commit()
```
## Cleaning Up Unused Variables
To remove unused variables, you can check for their usages within their scope:
```python typescript
for func in codebase.functions:
    # Iterate through local variable assignments in the function
    for var_assignment in func.code_block.local_var_assignments:
        # Check if the local variable assignment has no usages
        if not var_assignment.local_usages:
            # Remove the local variable assignment
            var_assignment.remove()
# Commit
codebase.commit()
```
## Cleaning Up After Removal
After removing dead code, you may need to clean up any remaining artifacts:
```python
import re

for file in codebase.files:
# Check if the file is empty
if not file.content.strip():
# Print a message indicating the removal of the empty file
print(f"Removing empty file: {file.filepath}")
# Remove the empty file
file.remove()
# commit is NECESSARY to remove the files from the codebase
codebase.commit()
# Remove redundant newlines
for file in codebase.files:
# Replace three or more consecutive newlines with two newlines
file.edit(re.sub(r"\n{3,}", "\n\n", file.content))
```
---
title: "Increasing Type Coverage"
sidebarTitle: "Type Coverage"
icon: "shield-check"
iconType: "solid"
---
This guide demonstrates how to analyze and manipulate type annotations with Codegen SDK.
Common use cases include:
- Adding a type to a union or generic type
- Checking if a generic type has a given subtype
- Resolving a type annotation
Adding type hints can improve developer experience and [significantly speed up](https://github.com/microsoft/Typescript/wiki/Performance#using-type-annotations) programs like the TypeScript compiler and `mypy`.
See [Type Annotations](/building-with-codegen/type-annotations) for a general overview of type manipulation.
## APIs for inspecting types
Codegen programs typically access type annotations through the following APIs:
- [Parameter.type](/api-reference/core/Parameter#type)
- [Function.return_type](/api-reference/python/PyFunction#return-type)
- [Assignment.type](/api-reference/core/Assignment#type)
Each of these has an associated setter.
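As a rough sketch of reading and writing annotations through these APIs (the function name is illustrative, and the setter calls mirror the examples later in this guide):
```python
function = codebase.get_function("process_data")

# Read annotations
for param in function.parameters:
    print(param.name, param.type.source if param.is_typed else "<untyped>")

# Write annotations
for param in function.parameters:
    if not param.is_typed:
        param.set_type_annotation("Any")
if not function.return_type:
    function.set_return_type("None")
```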
## Finding the extent of your type coverage
To get an indication of your progress on type coverage, analyze the percentage of typed elements across your codebase:
```python
# Initialize counters for parameters
total_parameters = 0
typed_parameters = 0
# Initialize counters for return types
total_functions = 0
typed_returns = 0
# Initialize counters for class attributes
total_attributes = 0
typed_attributes = 0
# Count parameter and return type coverage
for function in codebase.functions:
# Count parameters
total_parameters += len(function.parameters)
typed_parameters += sum(1 for param in function.parameters if param.is_typed)
# Count return types
total_functions += 1
if function.return_type and function.return_type.is_typed:
typed_returns += 1
# Count class attribute coverage
for cls in codebase.classes:
for attr in cls.attributes:
total_attributes += 1
if attr.is_typed:
typed_attributes += 1
# Calculate percentages
param_percentage = (typed_parameters / total_parameters * 100) if total_parameters > 0 else 0
return_percentage = (typed_returns / total_functions * 100) if total_functions > 0 else 0
attr_percentage = (typed_attributes / total_attributes * 100) if total_attributes > 0 else 0
# Print results
print("\nType Coverage Analysis")
print("---------------------")
print(f"Parameters: {param_percentage:.1f}% ({typed_parameters}/{total_parameters} typed)")
print(f"Return types: {return_percentage:.1f}% ({typed_returns}/{total_functions} typed)")
print(f"Class attributes: {attr_percentage:.1f}% ({typed_attributes}/{total_attributes} typed)")
```
This analysis gives you a breakdown of type coverage across three key areas:
1. Function parameters - Arguments passed to functions
2. Return types - Function return type annotations
3. Class attributes - Type hints on class variables
Focus first on adding types to the most frequently used functions and classes, as these will have the biggest impact on type checking and IDE support.
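A quick way to find those hotspots is to rank symbols by usage count, for example (a sketch built on `Symbol.usages`):
```python
# Rank functions by how often they are used elsewhere in the codebase
most_used = sorted(codebase.functions, key=lambda f: len(f.usages), reverse=True)
for function in most_used[:10]:
    typed = function.return_type is not None and function.return_type.is_typed
    print(f"{function.name}: {len(function.usages)} usages, return typed: {typed}")
```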
## Adding simple return type annotations
To add a return type, use `function.set_return_type`. The script below will add a `-> None` return type to all functions that contain no return statements:
```python For Python
for file in codebase.files:
# Check if 'app' is in the file's filepath
if "app" in file.filepath:
# Iterate through all functions in the file
for function in file.functions:
# Check if the function has no return statements
if len(function.return_statements) == 0:
# Set the return type to None
function.set_return_type("None")
```
```python For Typescript
for file in codebase.files:
# Check if 'app' is in the file's filepath
if "app" in file.filepath:
# Iterate through all functions in the file
for function in file.functions:
# Check if the function has no return statements
if len(function.return_statements) == 0:
                # Set the return type to void
                function.set_return_type("void")
```
## Coming Soon: Advanced Type Inference
Codegen is building out an API for direct interface with `tsc` and `mypy` for precise type inference. Interested in piloting this API? Let us know!
---
title: "Managing TypeScript Exports"
sidebarTitle: "Export Management"
description: "Safely and systematically manage exports in your TypeScript codebase"
icon: "ship"
iconType: "solid"
---
Codegen provides powerful tools for managing and reorganizing exports in TypeScript codebases. This tutorial builds on the concepts covered in [exports](/building-with-codegen/exports) to show you how to automate common export management tasks and ensure your module boundaries stay clean and maintainable.
## Common Export Management Tasks
### Collecting and Processing Exports
When reorganizing exports, the first step is identifying which exports need to be processed:
```python
processed_imports = set()
for file in codebase.files:
# Only process files under /src/shared
if '/src/shared' not in file.filepath:
continue
# Gather all reexports that are not external exports
all_reexports = []
for export_stmt in file.export_statements:
for export in export_stmt.exports:
if export.is_reexport() and not export.is_external_export:
all_reexports.append(export)
# Skip if there are none
if not all_reexports:
continue
```
### Moving Exports to Public Files
When centralizing exports in public-facing files:
```python
# Replace "src/" with "src/shared/"
resolved_public_file = export.resolved_symbol.filepath.replace("src/", "src/shared/")
# Get relative path from the "public" file back to the original file
relative_path = codebase.get_relative_path(
from_file=resolved_public_file,
to_file=export.resolved_symbol.filepath
)
# Ensure the "public" file exists
if not codebase.has_file(resolved_public_file):
target_file = codebase.create_file(resolved_public_file, sync=True)
else:
target_file = codebase.get_file(resolved_public_file)
# If target file already has a wildcard export for this relative path, skip
if target_file.has_export_statement_for_path(relative_path, "WILDCARD"):
has_wildcard = True
continue
```
### Managing Different Export Types
Codegen can handle all types of exports automatically:
```python
# A) Wildcard export, e.g. `export * from "..."`
if export.is_wildcard_export():
target_file.insert_before(f'export * from "{relative_path}"')
```
```python
# B) Type export, e.g. `export type { Foo, Bar } from "..."`
elif export.is_type_export():
# Does this file already have a type export statement for the path?
statement = file.get_export_statement_for_path(relative_path, "TYPE")
if statement:
# Insert into existing statement
if export.is_aliased():
statement.insert(0, f"{export.resolved_symbol.name} as {export.name}")
else:
statement.insert(0, f"{export.name}")
else:
# Insert a new type export statement
if export.is_aliased():
target_file.insert_before(
f'export type {{ {export.resolved_symbol.name} as {export.name} }} '
f'from "{relative_path}"'
)
else:
target_file.insert_before(
f'export type {{ {export.name} }} from "{relative_path}"'
)
```
```python
# C) Normal export, e.g. `export { Foo, Bar } from "..."`
else:
statement = file.get_export_statement_for_path(relative_path, "EXPORT")
if statement:
# Insert into existing statement
if export.is_aliased():
statement.insert(0, f"{export.resolved_symbol.name} as {export.name}")
else:
statement.insert(0, f"{export.name}")
else:
# Insert a brand-new normal export statement
if export.is_aliased():
target_file.insert_before(
f'export {{ {export.resolved_symbol.name} as {export.name} }} '
f'from "{relative_path}"'
)
else:
target_file.insert_before(
f'export {{ {export.name} }} from "{relative_path}"'
)
```
## Updating Import References
After moving exports, you need to update all import references:
```python
# Now update all import usages that refer to this export
for usage in export.symbol_usages():
if isinstance(usage, TSImport) and usage not in processed_imports:
processed_imports.add(usage)
# Translate the resolved_public_file to the usage file's TS config import path
new_path = usage.file.ts_config.translate_import_path(resolved_public_file)
if has_wildcard and export.name != export.resolved_symbol.name:
name = f"{export.resolved_symbol.name} as {export.name}"
else:
name = usage.name
if usage.is_type_import():
new_import = f'import type {{ {name} }} from "{new_path}"'
else:
new_import = f'import {{ {name} }} from "{new_path}"'
usage.file.insert_before(new_import)
usage.remove()
# Remove the old export from the original file
export.remove()
# If the file ends up with no exports, remove it entirely
if not file.export_statements and len(file.symbols) == 0:
file.remove()
```
## Best Practices
1. **Check for Wildcards First**: Always check for existing wildcard exports before adding new ones:
```python
if target_file.has_export_statement_for_path(relative_path, "WILDCARD"):
has_wildcard = True
continue
```
2. **Handle Path Translations**: Use TypeScript config for path translations:
```python
new_path = usage.file.ts_config.translate_import_path(resolved_public_file)
```
3. **Clean Up Empty Files**: Remove files that no longer contain exports or symbols:
```python
if not file.export_statements and len(file.symbols) == 0:
file.remove()
```
## Next Steps
After reorganizing your exports:
1. Run your test suite to verify everything still works
2. Review the generated import statements
3. Check for any empty files that should be removed
4. Verify that all export types (wildcard, type, named) are working as expected
Remember that managing exports is an iterative process. You may need to run the codemod multiple times as your codebase evolves.
### Related tutorials
- [Moving symbols](/building-with-codegen/moving-symbols)
- [Exports](/building-with-codegen/exports)
- [Dependencies and usages](/building-with-codegen/dependencies-and-usages)
## Complete Codemod
Here's the complete codemod that you can copy and use directly:
```python
processed_imports = set()
for file in codebase.files:
# Only process files under /src/shared
if '/src/shared' not in file.filepath:
continue
# Gather all reexports that are not external exports
all_reexports = []
for export_stmt in file.export_statements:
for export in export_stmt.exports:
if export.is_reexport() and not export.is_external_export:
all_reexports.append(export)
# Skip if there are none
if not all_reexports:
continue
for export in all_reexports:
has_wildcard = False
# Replace "src/" with "src/shared/"
resolved_public_file = export.resolved_symbol.filepath.replace("src/", "src/shared/")
# Get relative path from the "public" file back to the original file
relative_path = codebase.get_relative_path(
from_file=resolved_public_file,
to_file=export.resolved_symbol.filepath
)
# Ensure the "public" file exists
if not codebase.has_file(resolved_public_file):
target_file = codebase.create_file(resolved_public_file, sync=True)
else:
target_file = codebase.get_file(resolved_public_file)
# If target file already has a wildcard export for this relative path, skip
if target_file.has_export_statement_for_path(relative_path, "WILDCARD"):
has_wildcard = True
continue
# Compare "public" path to the local file's export.filepath
if codebase._remove_extension(resolved_public_file) != codebase._remove_extension(export.filepath):
# A) Wildcard export, e.g. `export * from "..."`
if export.is_wildcard_export():
target_file.insert_before(f'export * from "{relative_path}"')
# B) Type export, e.g. `export type { Foo, Bar } from "..."`
elif export.is_type_export():
# Does this file already have a type export statement for the path?
statement = file.get_export_statement_for_path(relative_path, "TYPE")
if statement:
# Insert into existing statement
if export.is_aliased():
statement.insert(0, f"{export.resolved_symbol.name} as {export.name}")
else:
statement.insert(0, f"{export.name}")
else:
# Insert a new type export statement
if export.is_aliased():
target_file.insert_before(
f'export type {{ {export.resolved_symbol.name} as {export.name} }} '
f'from "{relative_path}"'
)
else:
target_file.insert_before(
f'export type {{ {export.name} }} from "{relative_path}"'
)
# C) Normal export, e.g. `export { Foo, Bar } from "..."`
else:
statement = file.get_export_statement_for_path(relative_path, "EXPORT")
if statement:
# Insert into existing statement
if export.is_aliased():
statement.insert(0, f"{export.resolved_symbol.name} as {export.name}")
else:
statement.insert(0, f"{export.name}")
else:
# Insert a brand-new normal export statement
if export.is_aliased():
target_file.insert_before(
f'export {{ {export.resolved_symbol.name} as {export.name} }} '
f'from "{relative_path}"'
)
else:
target_file.insert_before(
f'export {{ {export.name} }} from "{relative_path}"'
)
# Now update all import usages that refer to this export
for usage in export.symbol_usages():
if isinstance(usage, TSImport) and usage not in processed_imports:
processed_imports.add(usage)
# Translate the resolved_public_file to the usage file's TS config import path
new_path = usage.file.ts_config.translate_import_path(resolved_public_file)
if has_wildcard and export.name != export.resolved_symbol.name:
name = f"{export.resolved_symbol.name} as {export.name}"
else:
name = usage.name
if usage.is_type_import():
new_import = f'import type {{ {name} }} from "{new_path}"'
else:
new_import = f'import {{ {name} }} from "{new_path}"'
usage.file.insert_before(new_import)
usage.remove()
# Remove the old export from the original file
export.remove()
# If the file ends up with no exports, remove it entirely
if not file.export_statements and len(file.symbols) == 0:
file.remove()
```
---
title: "Converting Default Exports"
sidebarTitle: "Default Export Conversion"
description: "Convert default exports to named exports in your TypeScript codebase"
icon: "arrow-right-arrow-left"
iconType: "solid"
---
Codegen provides tools to help you migrate away from default exports to named exports in your TypeScript codebase. This tutorial builds on the concepts covered in [exports](/building-with-codegen/exports) to show you how to automate this conversion process.
## Overview
Default exports can make code harder to maintain and refactor. Converting them to named exports provides several benefits:
- Better IDE support for imports and refactoring
- More explicit and consistent import statements
- Easier to track symbol usage across the codebase
## Converting Default Exports
Here's how to convert default exports to named exports:
```python
for file in codebase.files:
    # Only process files under /shared (otherwise non_shared_path == filepath)
    if '/shared/' not in file.filepath:
        continue
    target_file = file

    # Get corresponding non-shared file
    non_shared_path = target_file.filepath.replace('/shared/', '/')
    if not codebase.has_file(non_shared_path):
        print(f"⚠️ No matching non-shared file for: {target_file.filepath}")
        continue
    non_shared_file = codebase.get_file(non_shared_path)
    print(f"🔍 Processing {target_file.filepath}")

    # Process individual exports
    for export in target_file.exports:
        # Handle default exports
        if export.is_reexport() and export.is_default_export():
            print(f"  🔄 Converting default export '{export.name}'")
            default_export = next((e for e in non_shared_file.default_exports), None)
            if default_export:
                default_export.make_non_default()

    print(f"✨ Fixed exports in {target_file.filepath}")
```
## Understanding the Process
Let's break down how this works:
```python
# Process individual exports
for export in target_file.exports:
    # Handle default exports
    if export.is_reexport() and export.is_default_export():
        print(f"  🔄 Converting default export '{export.name}'")
```
The code identifies default exports by checking:
1. If it's a re-export (`is_reexport()`)
2. If it's a default export (`is_default_export()`)
```python
default_export = next((e for e in non_shared_file.default_exports), None)
if default_export:
default_export.make_non_default()
```
For each default export:
1. Find the corresponding export in the non-shared file
2. Convert it to a named export using `make_non_default()`
```python
# Get corresponding non-shared file
non_shared_path = target_file.filepath.replace('/shared/', '/')
if not codebase.has_file(non_shared_path):
print(f"β οΈ No matching non-shared file for: {filepath}")
continue
non_shared_file = codebase.get_file(non_shared_path)
```
The code:
1. Maps shared files to their non-shared counterparts
2. Verifies the non-shared file exists
3. Loads the non-shared file for processing
## Best Practices
1. **Check for Missing Files**: Always verify the counterpart file exists before processing:
```python
if not codebase.has_file(non_shared_path):
    print(f"⚠️ No matching non-shared file for: {target_file.filepath}")
    continue
```
2. **Log Progress**: Add logging to track the conversion process:
```python
print(f"π Processing {target_file.filepath}")
print(f" π Converting default export '{export.name}'")
```
3. **Handle Missing Exports**: Check that default exports exist before converting:
```python
default_export = next((e for e in non_shared_file.default_exports), None)
if default_export:
default_export.make_non_default()
```
## Next Steps
After converting default exports:
1. Run your test suite to verify everything still works
2. Update any import statements that were using default imports
3. Review the changes to ensure all exports were converted correctly
4. Consider adding ESLint rules to prevent new default exports
Remember to test thoroughly after converting default exports, as this change affects how other files import the converted modules.
### Related tutorials
- [Managing typescript exports](/tutorials/managing-typescript-exports)
- [Exports](/building-with-codegen/exports)
- [Dependencies and usages](/building-with-codegen/dependencies-and-usages)
## Complete Codemod
Here's the complete codemod that you can copy and use directly:
```python
for file in codebase.files:
    # Only process files under /shared (otherwise non_shared_path == filepath)
    if '/shared/' not in file.filepath:
        continue
    target_file = file

    # Get corresponding non-shared file
    non_shared_path = target_file.filepath.replace('/shared/', '/')
    if not codebase.has_file(non_shared_path):
        print(f"⚠️ No matching non-shared file for: {target_file.filepath}")
        continue
    non_shared_file = codebase.get_file(non_shared_path)
    print(f"🔍 Processing {target_file.filepath}")

    # Process individual exports
    for export in target_file.exports:
        # Handle default exports
        if export.is_reexport() and export.is_default_export():
            print(f"  🔄 Converting default export '{export.name}'")
            default_export = next((e for e in non_shared_file.default_exports), None)
            if default_export:
                default_export.make_non_default()

    print(f"✨ Fixed exports in {target_file.filepath}")
```
---
title: "Creating Documentation"
sidebarTitle: "Documentation"
icon: "book"
iconType: "solid"
---
This guide demonstrates how to determine docs coverage and create documentation for your codebase.
This primarily leverages two APIs:
- [codebase.ai(...)](/api-reference/core/Codebase#ai) for generating docstrings
- [function.set_docstring(...)](/api-reference/core/HasBlock#set-docstring) for modifying them
## Determining Documentation Coverage
In order to determine the extent of your documentation coverage, you can iterate through all symbols of interest and count the number of docstrings:
```python python
# Initialize counters
total_functions = 0
functions_with_docs = 0
total_classes = 0
classes_with_docs = 0
# Check functions
for function in codebase.functions:
total_functions += 1
if function.docstring:
functions_with_docs += 1
# Check classes
for cls in codebase.classes:
total_classes += 1
if cls.docstring:
classes_with_docs += 1
# Calculate percentages
func_coverage = (functions_with_docs / total_functions * 100) if total_functions > 0 else 0
class_coverage = (classes_with_docs / total_classes * 100) if total_classes > 0 else 0
# Print results with emojis
print("\n📊 Documentation Coverage Report:")
print(f"\n📝 Functions:")
print(f"  • Total: {total_functions}")
print(f"  • Documented: {functions_with_docs}")
print(f"  • Coverage: {func_coverage:.1f}%")
print(f"\n📚 Classes:")
print(f"  • Total: {total_classes}")
print(f"  • Documented: {classes_with_docs}")
print(f"  • Coverage: {class_coverage:.1f}%")
print(f"\n🎯 Overall Coverage: {((functions_with_docs + classes_with_docs) / (total_functions + total_classes) * 100):.1f}%")
```
Which provides the following output:
```
📊 Documentation Coverage Report:

📝 Functions:
  • Total: 1384
  • Documented: 331
  • Coverage: 23.9%

📚 Classes:
  • Total: 453
  • Documented: 91
  • Coverage: 20.1%

🎯 Overall Coverage: 23.0%
```
## Identifying Areas of Low Documentation Coverage
To identify areas of low documentation coverage, you can iterate through all directories and count the number of functions with docstrings.
Learn more about [Directories here](/building-with-codegen/files-and-directories).
```python python
# Track directory stats
dir_stats = {}
# Analyze each directory
for directory in codebase.directories:
# Skip test, sql and alembic directories
if any(x in directory.path.lower() for x in ['test', 'sql', 'alembic']):
continue
# Get undecorated functions
funcs = [f for f in directory.functions if not f.is_decorated]
total = len(funcs)
# Only analyze dirs with >10 functions
if total > 10:
documented = sum(1 for f in funcs if f.docstring)
coverage = (documented / total * 100)
dir_stats[directory.path] = {
'total': total,
'documented': documented,
'coverage': coverage
}
# Find lowest coverage directory
if dir_stats:
lowest_dir = min(dir_stats.items(), key=lambda x: x[1]['coverage'])
path, stats = lowest_dir
print(f"π Lowest coverage directory: '{path}'")
print(f" β’ Total functions: {stats['total']}")
print(f" β’ Documented: {stats['documented']}")
print(f" β’ Coverage: {stats['coverage']:.1f}%")
# Print all directory stats for comparison
print("\nπ All directory coverage rates:")
for path, stats in sorted(dir_stats.items(), key=lambda x: x[1]['coverage']):
print(f" '{path}': {stats['coverage']:.1f}% ({stats['documented']}/{stats['total']} functions)")
```
Which provides the following output:
```
📉 Lowest coverage directory: 'codegen-backend/app/utils/github_utils/branch'
  • Total functions: 12
  • Documented: 0
  • Coverage: 0.0%

📊 All directory coverage rates:
'codegen-backend/app/utils/github_utils/branch': 0.0% (0/12 functions)
'codegen-backend/app/utils/slack': 14.3% (2/14 functions)
'codegen-backend/app/modal_app/github': 18.2% (2/11 functions)
'codegen-backend/app/modal_app/slack': 18.2% (2/11 functions)
'codegen-backend/app/utils/github_utils/webhook': 21.4% (6/28 functions)
'codegen-backend/app/modal_app/cron': 23.1% (3/13 functions)
'codegen-backend/app/utils/github_utils': 23.5% (39/166 functions)
'codegen-backend/app/codemod': 25.0% (7/28 functions)
```
## Leveraging AI for Generating Documentation
For non-trivial codebases, it can be challenging to achieve full documentation coverage.
The most efficient way to add informative docstrings is to use [codebase.ai](/api-reference/core/Codebase#ai) to generate them, then apply them with the [set_docstring](/api-reference/core/HasBlock#set-docstring) method.
Learn more about using AI in our [guides](/building-with-codegen/calling-out-to-llms).
```python python
# Import datetime for timestamp
from datetime import datetime
# Get current timestamp
timestamp = datetime.now().strftime("%B %d, %Y")
print("π Generating and Updating Function Documentation")
# Process all functions in the codebase
for function in codebase.functions:
current_docstring = function.docstring()
if current_docstring:
# Update existing docstring to be more descriptive
new_docstring = codebase.ai(
f"Update the docstring for {function.name} to be more descriptive and comprehensive.",
target=function
)
new_docstring += f"\n\nUpdated on: {timestamp}"
else:
# Generate new docstring for function
new_docstring = codebase.ai(
f"Generate a comprehensive docstring for {function.name} including parameters, return type, and description.",
target=function
)
new_docstring += f"\n\nCreated on: {timestamp}"
# Set the new or updated docstring
function.set_docstring(new_docstring)
```
## Adding Explicit Parameter Names and Types
Alternatively, you can also rely on deterministic string formatting to edit docstrings.
To add "Google-style" parameter names and types to a function docstring, you can use the following code snippet:
```python python
# Iterate through all functions in the codebase
for function in codebase.functions:
# Skip if function already has a docstring
if function.docstring:
continue
# Build parameter documentation
param_docs = []
for param in function.parameters:
param_type = param.type.source if param.is_typed else "Any"
param_docs.append(f" {param.name} ({param_type}): Description of {param.name}")
# Get return type if present
return_type = function.return_type.source if function.return_type else "None"
# Create Google-style docstring
docstring = f'''"""
Description of {function.name}.
Args:
{chr(10).join(param_docs)}
Returns:
{return_type}: Description of return value
"""'''
# Set the new docstring
function.set_docstring(docstring)
```
---
title: "React Modernization"
sidebarTitle: "React Modernization"
icon: "react"
iconType: "brands"
description: "Modernize your React codebase with Codegen"
---
Codegen SDK provides powerful APIs for modernizing React codebases. This guide will walk you through common React modernization patterns.
Common use cases include:
- Upgrading to modern APIs, including React 18+
- Automatically memoizing components
- Converting to modern hooks
- Standardizing prop types
- Organizing components into individual files
and much more.
## Converting Class Components to Functions
Here's how to convert React class components to functional components:
```python
# Find all React class components
for class_def in codebase.classes:
# Skip if not a React component
if not class_def.is_jsx or "Component" not in [base.name for base in class_def.bases]:
continue
print(f"Converting {class_def.name} to functional component")
# Extract state from constructor
constructor = class_def.get_method("constructor")
state_properties = []
if constructor:
for statement in constructor.code_block.statements:
if "this.state" in statement.source:
# Extract state properties
state_properties = [prop.strip() for prop in
statement.source.split("{")[1].split("}")[0].split(",")]
# Create useState hooks for each state property
state_hooks = []
for prop in state_properties:
hook_name = f"[{prop}, set{prop[0].upper()}{prop[1:]}]"
state_hooks.append(f"const {hook_name} = useState(null);")
# Convert lifecycle methods to effects
effects = []
if class_def.get_method("componentDidMount"):
effects.append("""
useEffect(() => {
// TODO: Move componentDidMount logic here
}, []);
""")
if class_def.get_method("componentDidUpdate"):
effects.append("""
useEffect(() => {
// TODO: Move componentDidUpdate logic here
});
""")
# Get the render method
render_method = class_def.get_method("render")
    # Create the functional component (props parameter name is illustrative)
    func_component = f"""
const {class_def.name} = (props) => {{
{chr(10).join(state_hooks)}
{chr(10).join(effects)}
{render_method.code_block.source}
}}
"""
# Replace the class with the functional component
class_def.edit(func_component)
# Add required imports
file = class_def.file
if not any("useState" in imp.source for imp in file.imports):
file.add_import("import { useState, useEffect } from 'react';")
```
## Migrating to Modern Hooks
Convert legacy patterns to modern React hooks:
```python
# Find components using legacy patterns
for function in codebase.functions:
if not function.is_jsx:
continue
# Look for common legacy patterns
for call in function.function_calls:
# Convert withRouter to useNavigate
if call.name == "withRouter":
# Add useNavigate import
function.file.add_import(
"import { useNavigate } from 'react-router-dom';"
)
# Add navigate hook
function.insert_before_first_return("const navigate = useNavigate();")
# Replace history.push calls
for history_call in function.function_calls:
if "history.push" in history_call.source:
history_call.edit(
history_call.source.replace("history.push", "navigate")
)
# Convert lifecycle methods in hooks
elif call.name == "componentDidMount":
call.parent.edit("""
useEffect(() => {
// Your componentDidMount logic here
}, []);
""")
```
## Standardizing Props
### Inferring Props from Usage
Add proper prop types and TypeScript interfaces based on how props are used:
```python
# Add TypeScript interfaces for props
for function in codebase.functions:
if not function.is_jsx:
continue
# Get props parameter
props_param = function.parameters[0] if function.parameters else None
if not props_param:
continue
# Collect used props
used_props = set()
for prop_access in function.function_calls:
if f"{props_param.name}." in prop_access.source:
prop_name = prop_access.source.split(".")[1]
used_props.add(prop_name)
# Create interface
if used_props:
interface_def = f"""
interface {function.name}Props {{
{chr(10).join(f' {prop}: any;' for prop in used_props)}
}}
"""
function.insert_before(interface_def)
# Update function signature
function.edit(function.source.replace(
f"({props_param.name})",
f"({props_param.name}: {function.name}Props)"
))
```
### Extracting Inline Props
Convert inline prop type definitions to separate type declarations:
```python
# Iterate over all files in the codebase
for file in codebase.files:
# Iterate over all functions in the file
for function in file.functions:
# Check if the function is a React functional component
if function.is_jsx: # Assuming is_jsx indicates a function component
# Check if the function has inline props definition
if len(function.parameters) == 1 and isinstance(function.parameters[0].type, Dict):
# Extract the inline prop type
inline_props: TSObjectType = function.parameters[0].type.source
# Create a new type definition for the props
props_type_name = f"{function.name}Props"
props_type_definition = f"type {props_type_name} = {inline_props};"
# Set the new type for the parameter
function.parameters[0].set_type_annotation(props_type_name)
# Add the new type definition to the file
function.insert_before('\n' + props_type_definition + '\n')
```
This will convert components from:
```typescript
function UserCard({ name, age }: { name: string; age: number }) {
  return (
    <div>{name} ({age})</div>
  );
}
```
To:
```typescript
type UserCardProps = { name: string; age: number };
function UserCard({ name, age }: UserCardProps) {
  return (
    <div>{name} ({age})</div>
  );
}
```
Extracting prop types makes them reusable and easier to maintain. It also
improves code readability by separating type definitions from component logic.
## Updating Fragment Syntax
Modernize React Fragment syntax:
```python
for function in codebase.functions:
if not function.is_jsx:
continue
    # Replace <React.Fragment> with the shorthand <> syntax
    for element in function.jsx_elements:
        if element.name == "React.Fragment":
            element.edit(element.source.replace(
                "<React.Fragment>",
                "<>"
            ).replace(
                "</React.Fragment>",
                "</>"
            ))
```
## Organizing Components into Individual Files
A common modernization task is splitting files with multiple components into a more maintainable structure where each component has its own file. This is especially useful when modernizing legacy React codebases that might have grown organically.
```python
# Initialize a dictionary to store files and their corresponding JSX components
files_with_jsx_components = {}
# Iterate through all files in the codebase
for file in codebase.files:
# Check if the file is in the components directory
if 'components' not in file.filepath:
continue
# Count the number of JSX components in the file
jsx_count = sum(1 for function in file.functions if function.is_jsx)
# Only proceed if there are multiple JSX components
if jsx_count > 1:
# Identify non-default exported components
non_default_components = [
func for func in file.functions
if func.is_jsx and not func.is_exported
]
default_components = [
func for func in file.functions
if func.is_jsx and func.is_exported and func.export.is_default_export()
]
        # Log the file path and its components
        print(f"📁 {file.filepath}:")
        for component in default_components:
            print(f"  🟢 {component.name} (default)")
        for component in non_default_components:
            print(f"  🔵 {component.name}")

        # Create a new directory path based on the original file's directory
        new_dir_path = "/".join(file.filepath.split("/")[:-1]) + "/" + file.name.split(".")[0]
        codebase.create_directory(new_dir_path, exist_ok=True)

        # Move each non-default exported component to its own file
        for component in non_default_components:
            # Create a new file path for the component
            new_file_path = f"{new_dir_path}/{component.name}.tsx"
            new_file = codebase.create_file(new_file_path)
            # Log the movement of the component
            print(f"  ➡️ Moved to: {new_file_path}")
            # Move the component to the new file
            component.move_to_file(new_file, strategy="add_back_edge")
```
This script will:
1. Find files containing multiple React components
2. Create a new directory structure based on the original file
3. Move each non-default exported component to its own file
4. Preserve imports and dependencies automatically
5. Keep default exports in their original location
For example, given this structure:
```
components/
Forms.tsx # Contains Button, Input, Form (default)
```
It will create:
```
components/
Forms.tsx # Contains Form (default)
forms/
Button.tsx
Input.tsx
```
The `strategy="add_back_edge"` parameter ensures that any components that were
previously co-located can still import each other without circular
dependencies. Learn more about [moving
code](/building-with-codegen/moving-symbols) here.
---
title: "Migrating from unittest to pytest"
sidebarTitle: "Unittest to Pytest"
description: "Learn how to migrate unittest test suites to pytest using Codegen"
icon: "vial"
iconType: "solid"
---
Migrating from [unittest](https://docs.python.org/3/library/unittest.html) to [pytest](https://docs.pytest.org/) involves converting test classes and assertions to pytest's more modern and concise style. This guide will walk you through using Codegen to automate this migration.
You can find the complete example code in our [examples repository](https://github.com/codegen-sh/codegen-sdk/tree/develop/codegen-examples/examples/unittest_to_pytest).
## Overview
The migration process involves four main steps:
1. Converting test class inheritance and setup/teardown methods
2. Updating assertions to pytest style
3. Converting test discovery patterns
4. Modernizing fixture usage
Let's walk through each step using Codegen.
## Step 1: Convert Test Classes and Setup Methods
The first step is to convert unittest's class-based tests to pytest's function-based style. This includes:
- Removing `unittest.TestCase` inheritance
- Converting `setUp` and `tearDown` methods to fixtures
- Updating class-level setup methods
```python
# From:
class TestUsers(unittest.TestCase):
def setUp(self):
self.db = setup_test_db()
def tearDown(self):
self.db.cleanup()
def test_create_user(self):
user = self.db.create_user("test")
self.assertEqual(user.name, "test")
# To:
import pytest
@pytest.fixture
def db():
db = setup_test_db()
yield db
db.cleanup()
def test_create_user(db):
user = db.create_user("test")
assert user.name == "test"
```
## Step 2: Update Assertions
Next, we'll convert unittest's assertion methods to pytest's plain assert statements:
```python
# From:
def test_user_validation(self):
self.assertTrue(is_valid_email("user@example.com"))
self.assertFalse(is_valid_email("invalid"))
self.assertEqual(get_user_count(), 0)
self.assertIn("admin", get_roles())
self.assertRaises(ValueError, parse_user_id, "invalid")
# To:
def test_user_validation():
assert is_valid_email("user@example.com")
assert not is_valid_email("invalid")
assert get_user_count() == 0
assert "admin" in get_roles()
with pytest.raises(ValueError):
parse_user_id("invalid")
```
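The examples repository contains the full transformation logic. As a hedged sketch, the simplest one- and two-argument assertions could be rewritten with the `file.function_calls` and `call.edit` APIs used elsewhere in these guides (assuming each argument exposes its text via `.source`):
```python
# Sketch: map simple unittest assertions to pytest-style templates.
# Deliberately incomplete; the real migration handles many more methods.
ASSERTION_TEMPLATES = {
    "assertTrue": "assert {0}",
    "assertFalse": "assert not {0}",
    "assertEqual": "assert {0} == {1}",
    "assertIn": "assert {0} in {1}",
}

def convert_simple_assertions(file):
    for call in file.function_calls:
        template = ASSERTION_TEMPLATES.get(call.name)
        if template:
            args = [arg.source for arg in call.args]
            call.edit(template.format(*args))
```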
## Step 3: Update Test Discovery
pytest uses a different test discovery pattern than unittest. We'll update the test file names and patterns:
```python
# From:
if __name__ == '__main__':
unittest.main()
# To:
# Remove the unittest.main() block entirely
# Rename test files to test_*.py or *_test.py
```
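A minimal sketch of automating the removal with Codegen, reusing the `file.content` and `file.edit` APIs shown in the other migration guides (renaming files is left as a shell step):
```python
def remove_unittest_main(file):
    """Strip the `unittest.main()` entry-point block (simplified textual sketch)."""
    content = file.content
    for block in (
        "if __name__ == '__main__':\n    unittest.main()",
        'if __name__ == "__main__":\n    unittest.main()',
    ):
        content = content.replace(block, "")
    if content != file.content:
        file.edit(content)
```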
## Step 4: Modernize Fixture Usage
Finally, we'll update how test dependencies are managed using pytest's powerful fixture system:
```python
# From:
class TestDatabase(unittest.TestCase):
@classmethod
def setUpClass(cls):
cls.db_conn = create_test_db()
def setUp(self):
self.transaction = self.db_conn.begin()
def tearDown(self):
self.transaction.rollback()
# To:
@pytest.fixture(scope="session")
def db_conn():
return create_test_db()
@pytest.fixture
def transaction(db_conn):
transaction = db_conn.begin()
yield transaction
transaction.rollback()
```
## Common Patterns
Here are some common patterns you'll encounter when migrating to pytest:
1. **Parameterized Tests**
```python
# From:
def test_validation(self):
test_cases = [("valid@email.com", True), ("invalid", False)]
for email, expected in test_cases:
with self.subTest(email=email):
self.assertEqual(is_valid_email(email), expected)
# To:
@pytest.mark.parametrize("email,expected", [
("valid@email.com", True),
("invalid", False)
])
def test_validation(email, expected):
assert is_valid_email(email) == expected
```
2. **Exception Testing**
```python
# From:
def test_exceptions(self):
self.assertRaises(ValueError, process_data, None)
with self.assertRaises(TypeError):
process_data(123)
# To:
def test_exceptions():
with pytest.raises(ValueError):
process_data(None)
with pytest.raises(TypeError):
process_data(123)
```
3. **Temporary Resources**
```python
# From:
def setUp(self):
self.temp_dir = tempfile.mkdtemp()
def tearDown(self):
shutil.rmtree(self.temp_dir)
# To:
@pytest.fixture
def temp_dir():
    path = tempfile.mkdtemp()
    yield path
    shutil.rmtree(path)
```
## Tips and Notes
1. pytest fixtures are more flexible than unittest's setup/teardown methods:
- They can be shared across test files
- They support different scopes (function, class, module, session)
- They can be parameterized
2. pytest's assertion introspection provides better error messages by default:
```python
# pytest shows a detailed comparison
assert result == expected
```
3. You can gradually migrate to pytest:
- pytest can run unittest-style tests
- Convert one test file at a time
- Start with assertion style updates before moving to fixtures
4. Consider using pytest's built-in fixtures (a short sketch follows this list):
- `tmp_path` for temporary directories
- `capsys` for capturing stdout/stderr
- `monkeypatch` for modifying objects
- `caplog` for capturing log messages
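As a quick illustration of those built-ins (a standalone sketch, not part of the migration example):
```python
import logging

def test_builtin_fixtures(tmp_path, capsys, monkeypatch, caplog):
    # tmp_path: a unique pathlib.Path temporary directory per test
    data_file = tmp_path / "data.txt"
    data_file.write_text("hello")
    assert data_file.read_text() == "hello"

    # capsys: captures stdout/stderr written during the test
    print("processed")
    assert "processed" in capsys.readouterr().out

    # monkeypatch: temporarily patch attributes or environment variables
    monkeypatch.setenv("APP_ENV", "test")

    # caplog: captures log records emitted during the test
    logging.getLogger(__name__).warning("careful")
    assert "careful" in caplog.text
```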
---
title: "Migrating from SQLAlchemy 1.4 to 2.0"
sidebarTitle: "SQLAlchemy 1.4 to 2.0"
description: "Learn how to migrate SQLAlchemy 1.4 codebases to 2.0 using Codegen"
icon: "layer-group"
iconType: "solid"
---
Migrating from [SQLAlchemy](https://www.sqlalchemy.org/) 1.4 to 2.0 involves several API changes to support the new 2.0-style query interface. This guide will walk you through using Codegen to automate this migration, handling query syntax, session usage, and ORM patterns.
You can find the complete example code in our [examples repository](https://github.com/codegen-sh/codegen-sdk/tree/develop/codegen-examples/examples/sqlalchemy_1.4_to_2.0).
## Overview
The migration process involves three main steps:
1. Converting legacy Query objects to select() statements
2. Updating session execution patterns
3. Modernizing ORM relationship declarations
Let's walk through each step using Codegen.
## Step 1: Convert Query to Select
First, we need to convert legacy Query-style operations to the new select() syntax:
```python
def convert_query_to_select(file):
"""Convert Query-style operations to select() statements"""
for call in file.function_calls:
if call.name == "query":
# Convert query(Model) to select(Model)
call.set_name("select")
# Update method chains
if call.parent and call.parent.is_method_chain:
chain = call.parent
if "filter" in chain.source:
# Convert .filter() to .where()
                    chain.edit(chain.source.replace(".filter(", ".where("))
if "filter_by" in chain.source:
# Convert .filter_by(name='x') to .where(Model.name == 'x')
model = call.args[0].value
conditions = chain.source.split("filter_by(")[1].split(")")[0]
new_conditions = []
for cond in conditions.split(","):
if "=" in cond:
key, value = cond.split("=")
new_conditions.append(f"{model}.{key.strip()} == {value.strip()}")
chain.edit(f".where({' & '.join(new_conditions)})")
```
This transforms code from:
```python
# Legacy Query style
session.query(User).filter_by(name='john').filter(User.age >= 18).all()
```
to:
```python
# New select() style
session.execute(
select(User).where(User.name == 'john').where(User.age >= 18)
).scalars().all()
```
SQLAlchemy 2.0 standardizes on select() statements for all queries, providing
better type checking and a more consistent API.
## Step 2: Update Session Execution
Next, we update how queries are executed with the Session:
```python
def update_session_execution(file):
"""Update session execution patterns for 2.0 style"""
for call in file.function_calls:
if call.name == "query":
# Find the full query chain
chain = call
while chain.parent and chain.parent.is_method_chain:
chain = chain.parent
# Wrap in session.execute() if needed
if not chain.parent or "execute" not in chain.parent.source:
chain.edit(f"execute(select{chain.source[5:]})")
# Add .scalars() for single-entity queries
if len(call.args) == 1:
chain.edit(f"{chain.source}.scalars()")
```
This converts patterns like:
```python
# Old style
users = session.query(User).all()
first_user = session.query(User).first()
```
to:
```python
# New style
users = session.execute(select(User)).scalars().all()
first_user = session.execute(select(User)).scalars().first()
```
The new execution pattern is more explicit about what's being returned, making
it easier to understand and maintain type safety.
## Step 3: Update ORM Relationships
Finally, we update relationship declarations to use the new style.
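The original walkthrough leaves this step's code blank, so here is a hedged sketch of the target pattern only: SQLAlchemy 2.0 favors typed `Mapped[...]` attributes with `mapped_column()` over plain `Column` assignments. Automating the rewrite follows the same find-and-`edit()` pattern as Steps 1 and 2.
```python
from sqlalchemy import ForeignKey
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship

class Base(DeclarativeBase):
    pass

# 1.4 style (for comparison):
#   id = Column(Integer, primary_key=True)
#   addresses = relationship("Address", backref="user")

class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    # 2.0 style: typed relationship with explicit back_populates
    addresses: Mapped[list["Address"]] = relationship(back_populates="user")

class Address(Base):
    __tablename__ = "addresses"
    id: Mapped[int] = mapped_column(primary_key=True)
    user_id: Mapped[int] = mapped_column(ForeignKey("users.id"))
    user: Mapped["User"] = relationship(back_populates="addresses")
```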
---
title: "Fixing Import Loops"
description: "Learn how to identify and fix problematic import loops using Codegen."
icon: "arrows-rotate"
iconType: "solid"
---
Import loops occur when two or more Python modules depend on each other, creating a circular dependency. While some import cycles can be harmless, others can lead to runtime errors and make code harder to maintain.
In this tutorial, we'll explore how to identify and fix problematic import cycles using Codegen.
You can find the complete example code in our [examples repository](https://github.com/codegen-sh/codegen-sdk/tree/develop/codegen-examples/examples/removing_import_loops_in_pytorch).
## Overview
The steps to identify and fix import loops are as follows:
1. Detect import loops
2. Visualize them
3. Identify problematic cycles with mixed static/dynamic imports
4. Fix these cycles using Codegen
## Step 1: Detect Import Loops
- Create a graph
- Loop through imports in the codebase and add edges between the import files
- Find strongly connected components using Networkx (the import loops)
```python
import networkx as nx

G = nx.MultiDiGraph()
# Add all edges to the graph
for imp in codebase.imports:
if imp.from_file and imp.to_file:
edge_color = "red" if imp.is_dynamic else "black"
edge_label = "dynamic" if imp.is_dynamic else "static"
# Store the import statement and its metadata
G.add_edge(
imp.to_file.filepath,
imp.from_file.filepath,
color=edge_color,
label=edge_label,
is_dynamic=imp.is_dynamic,
import_statement=imp, # Store the whole import object
key=id(imp.import_statement),
)
# Find strongly connected components
cycles = [scc for scc in nx.strongly_connected_components(G) if len(scc) > 1]
print(f"π Found {len(cycles)} import cycles:")
for i, cycle in enumerate(cycles, 1):
print(f"\nCycle #{i}:")
print(f"Size: {len(cycle)} files")
# Create subgraph for this cycle to count edges
cycle_subgraph = G.subgraph(cycle)
# Count total edges
total_edges = cycle_subgraph.number_of_edges()
print(f"Total number of imports in cycle: {total_edges}")
# Count dynamic and static imports separately
dynamic_imports = sum(1 for u, v, data in cycle_subgraph.edges(data=True) if data.get("color") == "red")
static_imports = sum(1 for u, v, data in cycle_subgraph.edges(data=True) if data.get("color") == "black")
print(f"Number of dynamic imports: {dynamic_imports}")
print(f"Number of static imports: {static_imports}")
```
## Understanding Import Cycles
Not all import cycles are problematic! Here's an example of a cycle that looks like it should cause an error but doesn't, because one of the imports is dynamic.
```python
# top-level import in APoT_tensor.py
from quantizer import objectA
```
```python
# dynamic import in quantizer.py
def some_func():
# dynamic import (evaluated when some_func() is called)
    from APoT_tensor import objectB
```
A dynamic import is an import defined inside a function, method, or other executable body of code, which delays the import until that code is actually executed.
You can use [Import.is_dynamic](/api-reference/core/Import#is-dynamic) to check whether an import is dynamic, letting you focus on the imports that are handled more intentionally.
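For example, a minimal sketch that reports every dynamic import (assuming `from_file` points at the module being imported from, as in Step 4 below, and that imports expose their text via `.source`):
```python
# Report every dynamic import and the module it pulls from
for imp in codebase.imports:
    if imp.is_dynamic and imp.from_file:
        print(f"Dynamic import from {imp.from_file.filepath}: {imp.source}")
```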
## Step 2: Visualize Import Loops
- Create a new subgraph to visualize one cycle
- Color and label the edges based on their type (dynamic/static)
- Visualize the cycle graph using [codebase.visualize(graph)](/api-reference/core/Codebase#visualize)
Learn more about codebase visualization [here](/building-with-codegen/codebase-visualization)
```python
cycle = cycles[0]
def create_single_loop_graph(cycle):
cycle_graph = nx.MultiDiGraph() # Changed to MultiDiGraph to support multiple edges
cycle = list(cycle)
for i in range(len(cycle)):
for j in range(len(cycle)):
# Get all edges between these nodes from original graph
edge_data_dict = G.get_edge_data(cycle[i], cycle[j])
if edge_data_dict:
# For each edge between these nodes
for edge_key, edge_data in edge_data_dict.items():
# Add edge with all its attributes to cycle graph
cycle_graph.add_edge(cycle[i], cycle[j], **edge_data)
return cycle_graph
cycle_graph = create_single_loop_graph(cycle)
codebase.visualize(cycle_graph)
```
## Step 3: Identify problematic cycles with mixed static & dynamic imports
The import loops we're really concerned about are those that mix static and dynamic imports between the same files.
Here's an example of a problematic cycle that we want to fix:
```python
# In flex_decoding.py
from .flex_attention import (
compute_forward_block_mn,
compute_forward_inner,
# ... more static imports
)
# Also in flex_decoding.py
def create_flex_decoding_kernel(*args, **kwargs):
from .flex_attention import set_head_dim_values # dynamic import
```
Notice that there is both a top-level and a dynamic import pulling from the *same* module. This can cause issues if not handled carefully.
Let's find these problematic cycles:
```python
def find_problematic_import_loops(G, sccs):
"""Find cycles where files have both static and dynamic imports between them."""
problematic_cycles = []
for i, scc in enumerate(sccs):
        if i == 2:  # skip the cycle at index 2; it's incredibly long (and also invalid)
continue
mixed_import_files = {} # (from_file, to_file) -> {dynamic: count, static: count}
# Check all file pairs in the cycle
for from_file in scc:
for to_file in scc:
if G.has_edge(from_file, to_file):
# Get all edges between these files
edges = G.get_edge_data(from_file, to_file)
# Count imports by type
dynamic_count = sum(1 for e in edges.values() if e["color"] == "red")
static_count = sum(1 for e in edges.values() if e["color"] == "black")
# If we have both types between same files, this is problematic
if dynamic_count > 0 and static_count > 0:
mixed_import_files[(from_file, to_file)] = {"dynamic": dynamic_count, "static": static_count, "edges": edges}
if mixed_import_files:
problematic_cycles.append({"files": scc, "mixed_imports": mixed_import_files, "index": i})
# Print findings
print(f"Found {len(problematic_cycles)} cycles with mixed imports:")
for i, cycle in enumerate(problematic_cycles):
print(f"\nβ οΈ Problematic Cycle #{i + 1}:")
print(f"\nβ οΈ Index #{cycle['index']}:")
print(f"Size: {len(cycle['files'])} files")
for (from_file, to_file), data in cycle["mixed_imports"].items():
print("\nπ Mixed imports detected:")
print(f" From: {from_file}")
print(f" To: {to_file}")
print(f" Dynamic imports: {data['dynamic']}")
print(f" Static imports: {data['static']}")
return problematic_cycles
problematic_cycles = find_problematic_import_loops(G, cycles)
```
## Step 4: Fix the loop by moving the shared symbols to a separate `utils.py` file
One common way to break this kind of cycle is to move the shared symbols into a separate `utils.py` file. We can do this using the [symbol.move_to_file](/api-reference/core/Symbol#move-to-file) method:
Learn more about moving symbols [here](/building-with-codegen/moving-symbols)
```python
# Create new utils file
utils_file = codebase.create_file("torch/_inductor/kernel/flex_utils.py")
# Get the two files involved in the import cycle
decoding_file = codebase.get_file("torch/_inductor/kernel/flex_decoding.py")
attention_file = codebase.get_file("torch/_inductor/kernel/flex_attention.py")
attention_file_path = "torch/_inductor/kernel/flex_attention.py"
decoding_file_path = "torch/_inductor/kernel/flex_decoding.py"
# Track symbols to move
symbols_to_move = set()
# Find imports from flex_attention in flex_decoding
for imp in decoding_file.imports:
if imp.from_file and imp.from_file.filepath == attention_file_path:
# Get the actual symbol from flex_attention
if imp.imported_symbol:
symbols_to_move.add(imp.imported_symbol)
# Move identified symbols to utils file
for symbol in symbols_to_move:
symbol.move_to_file(utils_file)
print(f"π Moved {len(symbols_to_move)} symbols to flex_utils.py")
for symbol in symbols_to_move:
print(symbol.name)
# Commit changes
codebase.commit()
```
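To verify the fix, you can rebuild the import graph and re-run the cycle detection from Step 1. A condensed sketch:
```python
import networkx as nx

# Rebuild the graph after the move and confirm the problematic cycle is gone
G_after = nx.MultiDiGraph()
for imp in codebase.imports:
    if imp.from_file and imp.to_file:
        G_after.add_edge(imp.to_file.filepath, imp.from_file.filepath)

remaining = [scc for scc in nx.strongly_connected_components(G_after) if len(scc) > 1]
print(f"Cycles remaining: {len(remaining)}")
```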
## Conclusions & Next Steps
Import loops can be tricky to identify and fix, but Codegen provides powerful tools to help manage them:
- Use `codebase.imports` to analyze import relationships across your project
- Visualize import cycles to better understand dependencies
- Distinguish between static and dynamic imports using `Import.is_dynamic`
- Move shared symbols to break cycles using `symbol.move_to_file`
Here are some next steps you can take:
1. **Analyze Your Codebase**: Run similar analysis on your own codebase to identify potential import cycles
2. **Create Import Guidelines**: Establish best practices for your team around when to use static vs dynamic imports
3. **Automate Fixes**: Create scripts to automatically detect and fix problematic import patterns
4. **Monitor Changes**: Set up CI checks to prevent new problematic import cycles from being introduced
For more examples of codebase analysis and refactoring, check out our other [tutorials](/tutorials/at-a-glance).
---
title: "Migrating from Python 2 to Python 3"
sidebarTitle: "Python 2 to 3"
description: "Learn how to migrate Python 2 codebases to Python 3 using Codegen"
icon: "snake"
iconType: "solid"
---
Migrating from Python 2 to Python 3 involves several syntax and API changes. This guide will walk you through using Codegen to automate this migration, handling print statements, string handling, iterators, and more.
You can find the complete example code in our [examples repository](https://github.com/codegen-sh/codegen-sdk/tree/develop/codegen-examples/examples/python2_to_python3).
## Overview
The migration process involves five main steps:
1. Converting print statements to function calls
2. Updating Unicode to str
3. Converting raw_input to input
4. Updating exception handling syntax
5. Modernizing iterator methods
Let's walk through each step using Codegen.
## Step 1: Convert Print Statements
First, we need to convert Python 2's print statements to Python 3's print function calls:
```python
def convert_print_statements(file):
"""Convert Python 2 print statements to Python 3 function calls"""
lines = file.content.split('\n')
new_content = []
for line in lines:
stripped = line.strip()
if stripped.startswith('print '):
indent = line[:len(line) - len(line.lstrip())]
args = stripped[6:].strip()
new_content.append(f"{indent}print({args})")
else:
new_content.append(line)
if new_content != lines:
file.edit('\n'.join(new_content))
```
This transforms code from:
```python
print "Hello, world!"
print x, y, z
```
to:
```python
print("Hello, world!")
print(x, y, z)
```
In Python 3, `print` is a function rather than a statement, requiring
parentheses around its arguments.
## Step 2: Update Unicode to str
Next, we update Unicode-related code to use Python 3's unified string type:
```python
def update_unicode_to_str(file):
"""Convert Unicode-related code to str for Python 3"""
    # Update imports from 'unicode' to 'str'
    for imp in file.imports:
        if imp.name == 'unicode':
            imp.set_name("str")
        elif imp.name == 'unicode_literals':
            # Drop `from __future__ import unicode_literals`; it's the default in Python 3
            imp.remove()
# Update function calls from Unicode to str
for func_call in file.function_calls:
if func_call.name == "unicode":
func_call.set_name("str")
# Check function arguments for Unicode references
for arg in func_call.args:
if arg.value == "unicode":
arg.set_value("str")
    # Find and update Unicode string literals (u"..." and u'...')
    for prefix in ('u"', "u'"):
        for string_literal in file.find(prefix):
            if string_literal.source.startswith(prefix):
                new_string = string_literal.source[1:]  # Remove the 'u' prefix
                string_literal.edit(new_string)
```
This converts code from:
```python
from __future__ import unicode_literals
text = unicode("Hello")
prefix = u"prefix"
```
to:
```python
text = str("Hello")
prefix = "prefix"
```
Python 3 unifies string types, making the `unicode` type and `u` prefix
unnecessary.
## Step 3: Convert raw_input to input
Python 3 renames `raw_input()` to `input()`:
```python
def convert_raw_input(file):
"""Convert raw_input() calls to input()"""
for call in file.function_calls:
if call.name == "raw_input":
call.edit(f"input{call.source[len('raw_input'):]}")
```
This updates code from:
```python
name = raw_input("Enter your name: ")
```
to:
```python
name = input("Enter your name: ")
```
Python 3's `input()` behaves like Python 2's `raw_input()`: it always returns a string rather than evaluating the input as an expression.
## Step 4: Update Exception Handling
Python 3 changes the syntax for exception handling:
```python
def update_exception_syntax(file):
"""Update Python 2 exception handling to Python 3 syntax"""
for editable in file.find("except "):
if editable.source.lstrip().startswith("except") and ", " in editable.source and " as " not in editable.source:
parts = editable.source.split(",", 1)
new_source = f"{parts[0]} as{parts[1]}"
editable.edit(new_source)
```
This converts code from:
```python
try:
process_data()
except ValueError, e:
print(e)
```
to:
```python
try:
process_data()
except ValueError as e:
print(e)
```
Python 3 uses `as` instead of a comma to name the exception variable.
## Step 5: Update Iterator Methods
Finally, we update iterator methods to use Python 3's naming:
```python
def update_iterators(file):
"""Update iterator methods from Python 2 to Python 3"""
for cls in file.classes:
next_method = cls.get_method("next")
if next_method:
# Create new __next__ method with same content
new_method_source = next_method.source.replace("def next", "def __next__")
cls.add_source(new_method_source)
next_method.remove()
```
This transforms iterator classes from:
```python
class MyIterator:
def next(self):
return self.value
```
to:
```python
class MyIterator:
def __next__(self):
return self.value
```
Python 3 renames the `next()` method to `__next__()` for consistency with
other special methods.
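Keep in mind that a class iterated directly in a `for` loop also needs an `__iter__` method. For reference, a minimal complete Python 3 iterator:
```python
class Countdown:
    """Counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

print(list(Countdown(3)))  # [3, 2, 1]
```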
## Running the Migration
You can run the complete migration using our example script:
```bash
git clone https://github.com/codegen-sh/codegen-sdk.git
cd codegen-sdk/codegen-examples/examples/python2_to_python3
python run.py
```
The script will:
1. Process all Python [files](/api-reference/python/PyFile) in your codebase
2. Apply the transformations in the correct order
3. Maintain your code's functionality while updating to Python 3 syntax
## Next Steps
After migration, you might want to:
- Add type hints to your code
- Use f-strings for string formatting
- Update dependencies to Python 3 versions
- Run the test suite to verify functionality
Check out these related tutorials:
- [Increase Type Coverage](/tutorials/increase-type-coverage)
- [Organizing Your Codebase](/tutorials/organize-your-codebase)
- [Creating Documentation](/tutorials/creating-documentation)
## Learn More
- [Python 3 Documentation](https://docs.python.org/3/)
- [What's New in Python 3](https://docs.python.org/3/whatsnew/3.0.html)
- [Codegen API Reference](/api-reference)
- [Dependencies and Usages](/building-with-codegen/dependencies-and-usages)
---
title: "Migrating from Flask to FastAPI"
sidebarTitle: "Flask to FastAPI"
icon: "bolt"
iconType: "solid"
---
Migrating from [Flask](https://flask.palletsprojects.com/) to [FastAPI](https://fastapi.tiangolo.com/) involves several key changes to your codebase. This guide will walk you through using Codegen to automate this migration, handling imports, route decorators, static files, and template rendering.
You can find the complete example code in our [examples repository](https://github.com/codegen-sh/codegen-sdk/tree/develop/codegen-examples/examples/flask_to_fastapi_migration).
## Overview
The migration process involves four main steps:
1. Updating imports and initialization
2. Converting route decorators
3. Setting up static file handling
4. Updating template handling
Let's walk through each step using Codegen.
## I: Update Imports and Initialization
First, we need to update Flask imports to their FastAPI equivalents and modify the app initialization:
Learn more about [imports here](/building-with-codegen/imports).
```python
from codegen import Codebase
# Parse the codebase
codebase = Codebase("./")
# Update imports and initialization
for file in codebase.files:
# Update Flask to FastAPI imports
for imp in file.imports:
if imp.name == "Flask":
imp.set_name("FastAPI")
elif imp.module == "flask":
imp.set_module("fastapi")
# Update app initialization
for call in file.function_calls:
if call.name == "Flask":
call.set_name("FastAPI")
# Remove __name__ argument (not needed in FastAPI)
if len(call.args) > 0 and call.args[0].value == "__name__":
call.args[0].remove()
```
This transforms code from:
```python
from flask import Flask
app = Flask(__name__)
```
to:
```python
from fastapi import FastAPI
app = FastAPI()
```
FastAPI doesn't require the `__name__` argument that Flask uses for template
resolution. Codegen automatically removes it during migration.
## II: Convert Route Decorators
Next, we update Flask's route decorators to FastAPI's operation decorators:
```python
for function in file.functions:
for decorator in function.decorators:
if "@app.route" in decorator.source:
route = decorator.source.split('"')[1]
method = "get" # Default to GET
if "methods=" in decorator.source:
methods = decorator.source.split("methods=")[1].split("]")[0]
if "post" in methods.lower():
method = "post"
elif "put" in methods.lower():
method = "put"
elif "delete" in methods.lower():
method = "delete"
decorator.edit(f'@app.{method}("{route}")')
```
This converts decorators from Flask style:
```python
@app.route("/users", methods=["POST"])
def create_user():
pass
```
to FastAPI style:
```python
@app.post("/users")
def create_user():
pass
```
FastAPI provides specific decorators for each HTTP method, making the API more
explicit and enabling better type checking and OpenAPI documentation.
## III: Setup Static Files
FastAPI handles static files differently than Flask. We need to add the StaticFiles mounting:
```python
# Add StaticFiles import
file.add_import("from fastapi.staticfiles import StaticFiles")
# Mount static directory
file.add_symbol_from_source(
'app.mount("/static", StaticFiles(directory="static"), name="static")'
)
```
This sets up static file serving equivalent to Flask's automatic static file handling.
FastAPI requires explicit mounting of static directories, which provides more
flexibility in how you serve static files.
## IV: Update Template Handling
Finally, we update the template rendering to use FastAPI's Jinja2Templates:
```python
for func_call in file.function_calls:
if func_call.name == "render_template":
# Convert to FastAPI's template response
func_call.set_name("Jinja2Templates(directory='templates').TemplateResponse")
if len(func_call.args) > 1:
            # Convert template variables to a context dict (keys must be quoted strings)
            context_arg = ", ".join(
                f'"{arg.name}": {arg.value}' for arg in func_call.args[1:]
            )
            func_call.set_kwarg("context", f"{{{context_arg}}}")
# Add required request parameter
func_call.set_kwarg("request", "request")
```
This transforms template rendering from Flask style:
```python
@app.get("/users")
def list_users():
return render_template("users.html", users=users)
```
to FastAPI style:
```python
@app.get("/users")
def list_users(request: Request):
return Jinja2Templates(directory="templates").TemplateResponse(
"users.html",
context={"users": users},
request=request
)
```
FastAPI requires the `request` object to be passed to templates. Codegen
automatically adds this parameter during migration.
## Running the Migration
You can run the complete migration using our example script:
```bash
git clone https://github.com/codegen-sh/codegen-sdk.git
cd codegen-sdk/codegen-examples/examples/flask_to_fastapi_migration
python run.py
```
The script will:
1. Process all Python [files](/api-reference/python/PyFile) in your codebase
2. Apply the transformations in the correct order
3. Maintain your code's functionality while updating to FastAPI patterns
## Next Steps
After migration, you might want to:
- Add type hints to your route parameters
- Set up dependency injection
- Add request/response models
- Configure CORS and middleware
Check out these related tutorials:
- [Increase Type Coverage](/tutorials/increase-type-coverage)
- [Managing TypeScript Exports](/tutorials/managing-typescript-exports)
- [Organizing Your Codebase](/tutorials/organize-your-codebase)
## Learn More
- [FastAPI Documentation](https://fastapi.tiangolo.com/)
- [Codegen API Reference](/api-reference)
- [Moving Symbols Guide](/building-with-codegen/moving-symbols)
- [Dependencies and Usages](/building-with-codegen/dependencies-and-usages)
---
title: "Building a Model Context Protocol server with Codegen"
sidebarTitle: "MCP Server"
icon: "boxes-stacked"
iconType: "solid"
---
Learn how to build a Model Context Protocol (MCP) server that enables AI models to understand and manipulate code using Codegen's powerful tools.
This guide will walk you through creating an MCP server that can provide semantic code search.
View the full code in our [examples repository](https://github.com/codegen-sh/codegen-sdk/tree/develop/src/codegen/extensions/mcp)
## Setup
Install the MCP Python library:
```bash
uv pip install mcp
```
## Step 1: Setting Up Your MCP Server
First, let's create a basic MCP server using Codegen's MCP tools:
server.py
```python
from codegen import Codebase
from mcp.server.fastmcp import FastMCP
from typing import Annotated
# Initialize the codebase
codebase = Codebase.from_repo(".")
# create the MCP server using FastMCP
mcp = FastMCP(name="demo-mcp", instructions="Use this server for semantic search of codebases")
if __name__ == "__main__":
# Initialize and run the server
print("Starting demo mpc server...")
mcp.run(transport="stdio")
```
## Step 2: Create the search tool
Let's implement the semantic search tool.
server.py
```python
from codegen.extensions.tools.semantic_search import semantic_search
# ...
@mcp.tool('codebase_semantic_search', "search codebase with the provided query")
def search(query: Annotated[str, "search query to run against codebase"]):
codebase = Codebase("provide location to codebase", language="provide codebase Language")
# use the semantic search tool from codegen.extensions.tools OR write your own
results = semantic_search(codebase=codebase, query=query)
return results
# ...
```
## Run Your MCP Server
You can run and inspect your MCP server with:
```bash
mcp dev server.py
```
If you'd like to integrate this into an IDE, check out this [setup guide](/introduction/ide-usage#mcp-server-setup).
And that's a wrap! Chime in on our [community Slack](https://community.codegen.com) if you have questions or ideas for additional MCP tools and capabilities.
---
title: "Neo4j Graph"
sidebarTitle: "Neo4j Graph"
icon: "database"
iconType: "solid"
---
Codegen can export codebase graphs to Neo4j for visualization and analysis.
## Installation
To use Neo4j, you will need to install it and run it locally with Docker.
### Neo4j
First, install Neo4j using the official [installation guide](https://neo4j.com/docs/desktop-manual/current/installation/download-installation/).
### Docker
To run Neo4j locally using Docker, follow the instructions [here](https://neo4j.com/docs/apoc/current/installation/#docker).
## Launch Neo4j Locally
```bash
docker run \
-p 7474:7474 -p 7687:7687 \
-v $PWD/data:/data -v $PWD/plugins:/plugins \
--name neo4j-apoc \
    -e NEO4J_AUTH=neo4j/password \
    -e NEO4J_apoc_export_file_enabled=true \
-e NEO4J_apoc_import_file_enabled=true \
-e NEO4J_apoc_import_file_use__neo4j__config=true \
-e NEO4J_PLUGINS=\[\"apoc\"\] \
neo4j:latest
```
## Usage
```python
from codegen import Codebase
from codegen.extensions.graph.main import visualize_codebase
# parse codebase
codebase = Codebase("path/to/codebase")
# export to Neo4j
visualize_codebase(codebase, "bolt://localhost:7687", "neo4j", "password")
```
## Visualization
Once exported, you can open the Neo4j browser at `http://localhost:7474`, sign in with the username `neo4j` and the password `password`, and use the following Cypher queries to visualize the codebase:
### Class Hierarchy
```cypher
MATCH (s:Class)-[r:INHERITS_FROM*]->(e:Class) RETURN s, e LIMIT 10
```
### Methods Defined by Each Class
```cypher
MATCH (s:Class)-[r:DEFINES]->(e:Method) RETURN s, e LIMIT 10
```
### Function Calls
```cypher
MATCH (s:Func)-[r:CALLS]->(e:Func) RETURN s, e LIMIT 10
```
### Call Graph
```cypher
MATCH path = (:Method|Func)-[:CALLS*5..10]->(:Method|Func)
RETURN path
LIMIT 20
```
---
title: "Code Attributions"
sidebarTitle: "Code Attributions"
description: "Learn how to analyze code statistics and attributions using Codegen"
icon: "network-wired"
iconType: "solid"
---
# AI Impact Analysis
This tutorial shows how to use Codegen's attribution extension to analyze the impact of AI on your
codebase. You'll learn how to identify which parts of your code were written by AI tools like
GitHub Copilot, Devin, or other AI assistants.
Note: the approach is flexible; you can track CI pipeline bots or any other contributors you want.
## Overview
The attribution extension analyzes git history to:
1. Identify which symbols (functions, classes, etc.) were authored or modified by AI tools
2. Calculate the percentage of AI contributions in your codebase
3. Find high-impact AI-written code (code that many other parts depend on)
4. Track the evolution of AI contributions over time
## Installation
The attribution extension is included with Codegen. No additional installation is required.
## Basic Usage
### Running the Analysis
You can run the AI impact analysis using the Codegen CLI:
```bash
codegen analyze-ai-impact
```
Or from Python code:
```python
from codegen import Codebase
from codegen.extensions.attribution.cli import run
# Initialize the codebase from a repository
codebase = Codebase.from_repo("your-org/your-repo", language="python")
# Run the analysis
run(codebase)
```
### Understanding the Results
The analysis will print a summary of AI contributions to your console and save detailed results to a JSON file (see the loading sketch after the list below). The summary includes:
- List of all contributors (human and AI)
- Percentage of commits made by AI
- Number of files and symbols touched by AI
- High-impact AI-written code (code with many dependents)
- Top files by AI contribution percentage
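To explore the detailed results programmatically, load the JSON file the analysis writes. The filename below is hypothetical; check the console output for the actual path:
```python
import json

# Hypothetical filename; the analysis prints the real path it writes to
with open("ai_impact_analysis.json") as f:
    results = json.load(f)

# Inspect the top-level sections of the report
for section in results:
    print(section)
```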
## Advanced Usage
### Accessing Attribution Information
After running the analysis, each symbol in your codebase will have attribution information attached to it:
```python
from codegen import Codebase
from codegen.extensions.attribution.main import add_attribution_to_symbols
# Initialize codebase
codebase = Codebase.from_repo("your-org/your-repo", language="python")
# Add attribution information to symbols
ai_authors = ['github-actions[bot]', 'dependabot[bot]', 'copilot[bot]']
add_attribution_to_symbols(codebase, ai_authors)
# Access attribution information on symbols
for symbol in codebase.symbols:
if hasattr(symbol, 'is_ai_authored') and symbol.is_ai_authored:
print(f"AI-authored symbol: {symbol.name} in {symbol.filepath}")
print(f"Last editor: {symbol.last_editor}")
print(f"All editors: {symbol.editor_history}")
```
### Customizing AI Author Detection
By default, the analysis looks for common AI bot names in commit authors.
You can customize this by providing your own list of AI authors:
```python
from codegen import Codebase
from codegen.extensions.attribution.main import analyze_ai_impact
# Initialize codebase
codebase = Codebase.from_repo("your-org/your-repo", language="python")
# Define custom AI authors
ai_authors = [
'github-actions[bot]',
'dependabot[bot]',
'copilot[bot]',
'devin[bot]',
'your-custom-ai-email@example.com'
]
# Run analysis with custom AI authors
results = analyze_ai_impact(codebase, ai_authors)
```
## Example: Contributor Analysis
Here's a complete example that analyzes contributors to your codebase and their impact:
```python
import os
from collections import Counter
from codegen import Codebase
from codegen.extensions.attribution.main import add_attribution_to_symbols
from codegen.git.repo_operator.repo_operator import RepoOperator
from codegen.git.schemas.repo_config import RepoConfig
from codegen.sdk.codebase.config import ProjectConfig
from codegen.shared.enums.programming_language import ProgrammingLanguage
def analyze_contributors(codebase):
"""Analyze contributors to the codebase and their impact."""
print("\nπ Contributor Analysis:")
# Define which authors are considered AI
ai_authors = ['devin[bot]', 'codegen[bot]', 'github-actions[bot]', 'dependabot[bot]']
# Add attribution information to all symbols
print("Adding attribution information to symbols...")
add_attribution_to_symbols(codebase, ai_authors)
# Collect statistics about contributors
contributor_stats = Counter()
ai_contributor_stats = Counter()
print("Analyzing symbol attributions...")
for symbol in codebase.symbols:
if hasattr(symbol, 'last_editor') and symbol.last_editor:
contributor_stats[symbol.last_editor] += 1
# Track if this is an AI contributor
if any(ai in symbol.last_editor for ai in ai_authors):
ai_contributor_stats[symbol.last_editor] += 1
# Print top contributors overall
print("\nπ₯ Top Contributors by Symbols Authored:")
for contributor, count in contributor_stats.most_common(10):
is_ai = any(ai in contributor for ai in ai_authors)
        ai_indicator = "🤖" if is_ai else "👤"
print(f" {ai_indicator} {contributor}: {count} symbols")
# Print top AI contributors if any
if ai_contributor_stats:
print("\nπ€ Top AI Contributors:")
for contributor, count in ai_contributor_stats.most_common(5):
print(f" β’ {contributor}: {count} symbols")
# Initialize codebase from current directory
if os.path.exists(".git"):
repo_path = os.getcwd()
repo_config = RepoConfig.from_repo_path(repo_path)
repo_operator = RepoOperator(repo_config=repo_config)
project = ProjectConfig.from_repo_operator(
repo_operator=repo_operator,
programming_language=ProgrammingLanguage.PYTHON
)
codebase = Codebase(projects=[project])
# Run the contributor analysis
analyze_contributors(codebase)
```
## Conclusion
The attribution extension provides valuable insights into how AI tools are being used in your
development process. By understanding which parts of your codebase are authored by AI, you can:
- Track the adoption of AI coding assistants in your team
- Identify areas where AI is most effective
- Ensure appropriate review of AI-generated code
- Measure the impact of AI on developer productivity