# SageMaker JumpStart - Deploy Chronos-2 endpoints to AWS for production use

In this demo notebook, we will walk through the process of using the **SageMaker Python SDK** to deploy a **Chronos-2** model to a cloud endpoint on AWS. To simplify deployment, we will leverage **SageMaker JumpStart**.

### Why Deploy to an Endpoint?
So far, we've seen how to run models locally, which is useful for experimentation. However, in a production setting, a forecasting model is typically just one component of a larger system. Running models locally doesn't scale well and lacks the reliability needed for real-world applications.

To address this, we deploy models as **endpoints** on AWS. An endpoint acts as a **hosted service**: we send it requests containing time series data, and it returns forecasts in response. This makes it straightforward to integrate the model into production workflows that need scalable, real-time inference.

<div class="alert alert-warning">
<b>⚠️ Looking for Chronos-Bolt or original Chronos?</b><br>
This notebook covers <b>Chronos-2</b>, our latest and recommended model. For documentation on older models (Chronos-Bolt and original Chronos), see the <a href="https://github.com/amazon-science/chronos-forecasting/blob/v1.5.3/notebooks/deploy-chronos-bolt-to-amazon-sagemaker.ipynb"><b>legacy deployment walkthrough</b></a>.
</div>

### Chronos-2 vs. Previous Models

**Chronos-2** is a foundation model for time series forecasting that builds on [Chronos](https://arxiv.org/abs/2403.07815) and [Chronos-Bolt](https://aws.amazon.com/blogs/machine-learning/fast-and-accurate-zero-shot-forecasting-with-chronos-bolt-and-autogluon/). It adds new capabilities, improves accuracy, and handles forecasting scenarios that earlier models did not support.

| Capability | Chronos-2 | Chronos-Bolt | Chronos |
|------------|-----------|--------------|----------|
| Univariate Forecasting | ✅ | ✅ | ✅ |
| Cross-learning across items | ✅ | ❌ | ❌ |
| Multivariate Forecasting | ✅ | ❌ | ❌ |
| Past-only (real/categorical) covariates | ✅ | ❌ | ❌ |
| Known future (real/categorical) covariates | ✅ | 🧩 | ❌ |
| Max. Context Length | 8192 | 2048 | 512 |
| Max. Prediction Length | 1024 | 64 | 64 |

🧩 Chronos-Bolt does not natively support future covariates, but it can be combined with external covariate regressors (see [AutoGluon tutorial](https://auto.gluon.ai/stable/tutorials/timeseries/forecasting-chronos.html#incorporating-the-covariates)). This only models per-timestep effects, not effects across time. In contrast, Chronos-2 supports all covariate types natively.

## Deploy the model

First, update the SageMaker SDK to access the latest models:

In [None]:
!pip install -U -q sagemaker

We create a `JumpStartModel` with the necessary configuration based on the model ID. The key parameters are:
- `model_id`: Specifies the model to use. We use `pytorch-forecasting-chronos-2` for the [Chronos-2](https://github.com/amazon-science/chronos-forecasting) model.
- `instance_type`: Defines the AWS instance for serving the endpoint. Chronos-2 currently requires a **GPU instance** from the `ml.g5`, `ml.g6`, `ml.g6e`, or `ml.g4dn` families with a single GPU. The model does not benefit from multi-GPU instances. **CPU support is coming soon**.

   You can check the pricing for different SageMaker instance types for real-time inference [here](https://aws.amazon.com/sagemaker-ai/pricing/).

The `JumpStartModel` will automatically set the necessary attributes such as `image_uri` based on the chosen `model_id` and `instance_type`.

In [None]:
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(
    model_id="pytorch-forecasting-chronos-2",
    instance_type="ml.g5.2xlarge",
    # You might need to provide the SageMaker execution role to ensure necessary AWS resources are accessible
    # role="arn:aws:iam::123456789012:role/service-role/AmazonSageMaker-ExecutionRole-XXXXXXXXXXXXXXX",
)

Next, we deploy the model and create an endpoint. Deployment typically takes a few minutes, as SageMaker provisions the instance, loads the model, and sets up the endpoint for inference.


In [None]:
predictor = model.deploy()

> **Note:** Once the endpoint is deployed, it remains active and incurs charges on your AWS account until it is deleted. The cost depends on factors such as the instance type, the region where the endpoint is hosted, and the duration it remains running. To avoid unnecessary charges, make sure to delete the endpoint when it is no longer needed. For detailed pricing information, refer to the [SageMaker AI pricing page](https://aws.amazon.com/sagemaker-ai/pricing/).

Alternatively, you can connect to an existing endpoint.

In [None]:
# from sagemaker.predictor import Predictor
# from sagemaker.serializers import JSONSerializer
# from sagemaker.deserializers import JSONDeserializer

# predictor = Predictor(
#     "NAME_OF_EXISTING_ENDPOINT",
#     serializer=JSONSerializer(),
#     deserializer=JSONDeserializer(),
# )

## Querying the endpoint

We can now invoke the endpoint to make a forecast. We send a **payload** to the endpoint, which includes historical time series values and configuration parameters, such as the prediction length. The endpoint processes this input and returns a **response** containing the forecasted values based on the provided data.

In [2]:
# Define a utility function to print the response in a pretty format
from pprint import pformat


def nested_round(data, decimals=2):
    """Round floats, including those inside nested dicts and lists."""
    if isinstance(data, float):
        return round(data, decimals)
    elif isinstance(data, list):
        return [nested_round(item, decimals) for item in data]
    elif isinstance(data, dict):
        return {key: nested_round(value, decimals) for key, value in data.items()}
    else:
        return data


def pretty_format(data):
    return pformat(nested_round(data), width=150, sort_dicts=False)

### Univariate forecasting

In [3]:
payload = {
    "inputs": [
        {"target": [0.0, 4.0, 5.0, 1.5, -3.0, -5.0, -3.0, 1.5, 5.0, 4.0, 0.0, -4.0, -5.0, -1.5, 3.0, 5.0, 3.0, -1.5, -5.0, -4.0]},
    ],
    "parameters": {
        "prediction_length": 10
    }
}
response = predictor.predict(payload)
print(pretty_format(response))

{'predictions': [{'mean': [-0.36, 4.03, 5.31, 2.44, -2.47, -5.09, -4.31, 0.07, 4.41, 5.16],
                  '0.1': [-1.69, 2.84, 4.0, 0.97, -3.77, -6.19, -5.34, -1.77, 2.55, 3.61],
                  '0.5': [-0.36, 4.03, 5.31, 2.44, -2.47, -5.09, -4.31, 0.07, 4.41, 5.16],
                  '0.9': [1.03, 5.0, 6.31, 3.81, -0.85, -3.89, -2.89, 1.84, 5.59, 6.44]}]}
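
Each key in a prediction other than `mean` is a quantile level, so the `0.1` and `0.9` forecasts together bound a nominal 80% prediction interval. The snippet below is a minimal sketch of reading such an interval from the response. It hard-codes the rounded values printed above so that it runs without an endpoint, and the helper name `central_interval` is our own, not part of the endpoint API.

```python
# Rounded response copied from the univariate example above.
response = {
    "predictions": [
        {
            "mean": [-0.36, 4.03, 5.31, 2.44, -2.47, -5.09, -4.31, 0.07, 4.41, 5.16],
            "0.1": [-1.69, 2.84, 4.0, 0.97, -3.77, -6.19, -5.34, -1.77, 2.55, 3.61],
            "0.5": [-0.36, 4.03, 5.31, 2.44, -2.47, -5.09, -4.31, 0.07, 4.41, 5.16],
            "0.9": [1.03, 5.0, 6.31, 3.81, -0.85, -3.89, -2.89, 1.84, 5.59, 6.44],
        }
    ]
}


def central_interval(forecast, lower="0.1", upper="0.9"):
    """Pair the lower and upper quantile forecasts for every forecast step."""
    return list(zip(forecast[lower], forecast[upper]))


forecast = response["predictions"][0]
intervals = central_interval(forecast)
for step, (median, (low, high)) in enumerate(zip(forecast["0.5"], intervals), start=1):
    print(f"step {step}: median={median:.2f}, 80% interval=[{low:.2f}, {high:.2f}]")
```

In this particular response the `mean` values coincide with the `0.5` quantile, so either key can serve as the point forecast here.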


A payload may also contain **multiple time series**, potentially including `start` and `item_id` fields.

In [4]:
payload = {
    "inputs": [
        {
            "target": [1.0, 2.0, 3.0, 2.0, 0.5, 2.0, 3.0, 2.0, 1.0],
            "item_id": "product_A",
            "start": "2024-01-01T01:00:00",
        },
        {
            "target": [5.4, 3.0, 3.0, 2.0, 1.5, 2.0, -1.0],
            "item_id": "product_B",
            "start": "2024-02-02T03:00:00",
        },
    ],
    "parameters": {
        "prediction_length": 5,
        "freq": "1h",
        "quantile_levels": [0.1, 0.5, 0.9],
        "batch_size": 2,
    },
}
response = predictor.predict(payload)
print(pretty_format(response))

{'predictions': [{'mean': [1.7, 1.95, 1.66, 1.55, 1.84],
                  '0.1': [0.28, 0.32, -0.08, -0.35, -0.18],
                  '0.5': [1.7, 1.95, 1.66, 1.55, 1.84],
                  '0.9': [3.09, 3.77, 3.62, 3.58, 4.22],
                  'item_id': 'product_A',
                  'start': '2024-01-01T10:00:00'},
                 {'mean': [-1.21, -1.4, -1.27, -1.34, -1.27],
                  '0.1': [-4.19, -5.84, -6.38, -7.53, -8.0],
                  '0.5': [-1.21, -1.4, -1.27, -1.34, -1.27],
                  '0.9': [2.02, 2.92, 3.55, 4.62, 5.66],
                  'item_id': 'product_B',
                  'start': '2024-02-02T10:00:00'}]}
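
The `start` field in each prediction is the timestamp of the first forecast step, that is, the input `start` shifted forward by one period per observation in `target`. As a quick sanity check, the sketch below reproduces the two timestamps from the response above using only the standard library. The helper name `forecast_start` is our own, and `timedelta(hours=1)` stands in for `freq="1h"`.

```python
from datetime import datetime, timedelta


def forecast_start(start, num_observations, step):
    """Return the ISO timestamp that immediately follows the last observation."""
    first_timestamp = datetime.fromisoformat(start)
    return (first_timestamp + num_observations * step).isoformat()


hourly = timedelta(hours=1)  # corresponds to freq="1h" in the payload

# product_A has 9 observations, product_B has 7
print(forecast_start("2024-01-01T01:00:00", 9, hourly))  # 2024-01-01T10:00:00
print(forecast_start("2024-02-02T03:00:00", 7, hourly))  # 2024-02-02T10:00:00
```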


### Forecasting with covariates

Chronos-2 models also support forecasting with **covariates** (a.k.a. exogenous features or related time series). These can be provided using the `past_covariates` and `future_covariates` keys.

**Note:** If you only provide `past_covariates` without matching keys in `future_covariates`, the model will treat them as past-only covariates (features that are only available historically but not in the future).
If future values of covariates are available, it is recommended to provide them in `future_covariates` as this typically results in more accurate forecasts.
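
To make the rule above concrete, the following sketch classifies the covariates of a single input by comparing the keys of the two dictionaries. The function name `split_covariates` and the covariate names `price`, `promo`, and `stock` are hypothetical. The sketch only mirrors the key-matching rule described in the note and does not reproduce the endpoint's actual implementation.

```python
def split_covariates(series):
    """Return (known_future, past_only) covariate names for one input time series."""
    past = series.get("past_covariates", {})
    future = series.get("future_covariates", {})
    # Present in both dictionaries -> known future covariate
    known_future = [name for name in past if name in future]
    # Present only in past_covariates -> past-only covariate
    past_only = [name for name in past if name not in future]
    return known_future, past_only


series = {
    "target": [1.0, 2.0, 3.0, 2.0, 0.5],
    "past_covariates": {
        "price": [3.0, 6.0, 9.0, 6.0, 1.5],
        "promo": ["A", "B", "B", "B", "A"],
        "stock": [10.0, 20.0, 30.0, 20.0, 5.0],
    },
    "future_covariates": {
        "price": [2.5, 2.2],
        "promo": ["B", "A"],
    },
}

known_future, past_only = split_covariates(series)
print("known future:", known_future)  # known future: ['price', 'promo']
print("past-only:", past_only)  # past-only: ['stock']
```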

In [5]:
payload = {
    "inputs": [
        {
            "target": [1.0, 2.0, 3.0, 2.0, 0.5, 2.0, 3.0, 2.0, 1.0],
            # past_covariates must have the same length as "target"
            "past_covariates": {
                "feat_1": [3.0, 6.0, 9.0, 6.0, 1.5, 6.0, 9.0, 6.0, 3.0],
                # Categorical covariates should be provided as strings
                "feat_2": ["A", "B", "B", "B", "A", "A", "A", "A", "B"],
                # feat_3 is a past-only covariate (not present in future_covariates)
                "feat_3": [10.0, 20.0, 30.0, 20.0, 5.0, 20.0, 30.0, 20.0, 10.0],
            },
            # future_covariates must have length equal to "prediction_length"
            "future_covariates": {
                "feat_1": [2.5, 2.2, 3.3],
                "feat_2": ["B", "A", "A"],
            },
        },
        {
            "target": [5.4, 3.0, 3.0, 2.0, 1.5, 2.0, -1.0],
            "past_covariates": {
                "feat_1": [0.6, 1.2, 1.8, 1.2, 0.3, 1.2, 1.8],
                "feat_2": ["A", "B", "B", "B", "A", "A", "A"],
                "feat_3": [5.4, 3.0, 3.0, 2.0, 1.5, 2.0, -1.0],
            },
            "future_covariates": {
                "feat_1": [1.2, 0.3, 4.4],
                "feat_2": ["A", "B", "A"],
            },
        },
    ],
    "parameters": {
        "prediction_length": 3,
        "quantile_levels": [0.1, 0.5, 0.9],
    },
}
response = predictor.predict(payload)
print(pretty_format(response))

{'predictions': [{'mean': [1.73, 2.09, 1.73], '0.1': [0.36, 0.6, 0.17], '0.5': [1.73, 2.09, 1.73], '0.9': [3.11, 3.8, 3.52]},
                 {'mean': [-0.61, -0.41, -1.43], '0.1': [-4.16, -5.59, -7.53], '0.5': [-0.61, -0.41, -1.43], '0.9': [3.12, 4.56, 3.91]}]}


### Multivariate forecasting

Chronos-2 also supports **multivariate forecasting**, where multiple related time series are forecasted jointly. For multivariate forecasting, provide the target as a list of lists, where each inner list represents one dimension of the multivariate series.

In [6]:
payload = {
    "inputs": [
        {
            # For multivariate forecasting, target is a list of lists
            # Each inner list represents one dimension with the same length
            # np.array(target) would have shape [num_dimensions, length]
            "target": [
                [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0],  # Dimension 1
                [5.0, 4.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0],  # Dimension 2
                [2.0, 2.5, 3.0, 2.5, 2.0, 2.5, 3.0, 3.5],  # Dimension 3
            ],
        },
    ],
    "parameters": {
        "prediction_length": 4,
        "quantile_levels": [0.1, 0.5, 0.9],
    },
}
response = predictor.predict(payload)
print(pretty_format(response))

{'predictions': [{'mean': [[3.66, 3.55, 3.5, 3.42], [2.0, 2.05, 2.19, 2.23], [3.33, 3.27, 3.25, 3.22]],
                  '0.1': [[1.98, 1.52, 1.17, 0.88], [0.84, 0.18, 0.0, -0.25], [2.5, 2.27, 2.08, 1.94]],
                  '0.5': [[3.66, 3.55, 3.5, 3.42], [2.0, 2.05, 2.19, 2.23], [3.33, 3.27, 3.25, 3.22]],
                  '0.9': [[5.75, 6.25, 6.59, 7.0], [3.8, 4.47, 4.88, 5.31], [4.38, 4.62, 4.78, 5.0]]}]}
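
The response mirrors the layout of the input: each quantile holds one list per dimension, and each inner list has `prediction_length` values. Since all dimensions of `target` must have the same length, it can help to check the shape before sending a request. The helper below is a dependency-free sketch of such a check; the name `target_shape` is our own and not part of the endpoint API.

```python
def target_shape(target):
    """Return (length,) for a univariate target or (num_dimensions, length) for a multivariate one."""
    if target and isinstance(target[0], list):
        lengths = {len(dimension) for dimension in target}
        if len(lengths) != 1:
            raise ValueError(f"All dimensions must have the same length, got lengths {sorted(lengths)}")
        return (len(target), lengths.pop())
    return (len(target),)


multivariate_target = [
    [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0],
    [5.0, 4.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0],
    [2.0, 2.5, 3.0, 2.5, 2.0, 2.5, 3.0, 3.5],
]
print(target_shape(multivariate_target))  # (3, 8)
print(target_shape([1.0, 2.0, 3.0]))  # (3,)
```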


## Endpoint API
So far, we have explored several examples of querying the endpoint with different payload structures. Below is a comprehensive API specification detailing all supported parameters, their meanings, and how they affect the model's predictions.

* **inputs** (required): List with at most 1000 time series that need to be forecasted. Each time series is represented by a dictionary with the following keys:
    * **target** (required): Observed time series values.
        - For univariate forecasting: List of numeric values.
        - For multivariate forecasting: List of lists, where each inner list represents one dimension. All dimensions must have the same length. If converted to a numpy array via `np.array(target)`, the shape would be `[num_dimensions, length]`.
        - It is recommended that each time series contains at least 30 observations.
        - If any time series contains fewer than 5 observations, an error will be raised.
    * **item_id**: String that uniquely identifies each time series.
        - If provided, the ID must be unique for each time series.
        - If provided, then the endpoint response will also include the **item_id** field for each forecast.
    * **start**: Timestamp of the first time series observation in ISO format (`YYYY-MM-DD` or `YYYY-MM-DDThh:mm:ss`).
        - If **start** field is provided, then **freq** must also be provided as part of **parameters**.
        - If provided, then the endpoint response will also include the **start** field indicating the first timestamp of each forecast.
    * **past_covariates**: Dictionary containing the past values of the covariates for this time series.
        - Each key in **past_covariates** corresponds to the name of a covariate. Each value must be an array consisting of all-numeric or all-string values, with length equal to the length of the **target**.
        - Covariates that appear only in **past_covariates** (and not in **future_covariates**) are treated as past-only covariates.
    * **future_covariates**: Dictionary containing the future values of the covariates for this time series (values during the forecast horizon).
        - Each key in **future_covariates** corresponds to the name of a covariate. Each value must be an array consisting of all-numeric or all-string values, with length equal to **prediction_length**.
        - Covariates that appear in both **past_covariates** and **future_covariates** are treated as known future covariates.
* **parameters**: Optional parameters to configure the model.
    * **prediction_length**: Integer corresponding to the number of future time series values that need to be predicted. Defaults to `1`. Values up to `1024` are supported.
    * **quantile_levels**: List of floats in range (0, 1) specifying which quantiles should be included in the probabilistic forecast. Defaults to `[0.1, 0.5, 0.9]`.
        - Chronos-2 natively supports quantile levels in range `[0.01, 0.99]`. Predictions outside the range will be clipped.
    * **freq**: Frequency of the time series observations in [pandas-compatible format](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). For example, `1h` for hourly data or `2W` for bi-weekly data.
        - If **freq** is provided, then **start** must also be provided for each time series in **inputs**.
    * **batch_size**: Number of time series processed in parallel by the model. Larger values speed up inference but may lead to out of memory errors. Defaults to `256`.
    * **predict_batches_jointly**: If `True`, the model will apply group attention to all items in the batch, instead of processing each item separately (described as "full cross-learning mode" in the [technical report](https://www.arxiv.org/abs/2510.15821)). This may produce more accurate forecasts at the cost of lower inference speed. Defaults to `False`.

All keys not marked with (required) are optional.

The endpoint response contains the probabilistic (quantile) forecast for each time series included in the request.
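
Several of the constraints in this specification can be checked on the client before a request is sent, which gives faster feedback than a failed endpoint call. The function below is a minimal sketch of such a pre-flight check. The name `validate_payload` is hypothetical, the lower bound of 1 for `prediction_length` is our assumption, and the endpoint's own validation may differ in detail.

```python
def validate_payload(payload):
    """Raise ValueError if the payload violates a constraint from the specification above."""
    inputs = payload.get("inputs")
    if inputs is None:
        raise ValueError("'inputs' is required")
    if len(inputs) > 1000:
        raise ValueError(f"At most 1000 time series are allowed, got {len(inputs)}")

    parameters = payload.get("parameters", {})
    prediction_length = parameters.get("prediction_length", 1)
    # Upper bound is from the specification; the lower bound of 1 is an assumption.
    if not 1 <= prediction_length <= 1024:
        raise ValueError(f"prediction_length must be between 1 and 1024, got {prediction_length}")
    for level in parameters.get("quantile_levels", [0.1, 0.5, 0.9]):
        if not 0 < level < 1:
            raise ValueError(f"Quantile levels must lie in (0, 1), got {level}")

    has_freq = "freq" in parameters
    item_ids = []
    for index, series in enumerate(inputs):
        if "target" not in series:
            raise ValueError(f"Series {index}: 'target' is required")
        target = series["target"]
        # Treat a univariate target as a single dimension
        dimensions = target if target and isinstance(target[0], list) else [target]
        lengths = {len(dimension) for dimension in dimensions}
        if len(lengths) != 1:
            raise ValueError(f"Series {index}: all target dimensions must have the same length")
        length = lengths.pop()
        if length < 5:
            raise ValueError(f"Series {index}: at least 5 observations are required, got {length}")
        if ("start" in series) != has_freq:
            raise ValueError(f"Series {index}: 'start' and 'freq' must be provided together")
        for name, values in series.get("past_covariates", {}).items():
            if len(values) != length:
                raise ValueError(f"Series {index}: past covariate '{name}' must have length {length}")
        for name, values in series.get("future_covariates", {}).items():
            if len(values) != prediction_length:
                raise ValueError(f"Series {index}: future covariate '{name}' must have length {prediction_length}")
        if "item_id" in series:
            item_ids.append(series["item_id"])

    if len(item_ids) != len(set(item_ids)):
        raise ValueError("item_id values must be unique")


valid_payload = {
    "inputs": [
        {
            "target": [1.0, 2.0, 3.0, 2.0, 0.5, 2.0],
            "item_id": "product_A",
            "start": "2024-01-01",
            "past_covariates": {"feat_1": [3.0, 6.0, 9.0, 6.0, 1.5, 6.0]},
            "future_covariates": {"feat_1": [2.5, 2.2]},
        }
    ],
    "parameters": {"prediction_length": 2, "freq": "D"},
}
validate_payload(valid_payload)  # passes silently
```

Calling `validate_payload` right before `predictor.predict(payload)` is one way to use it.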

## Working with long-format data frames

The endpoint communicates using JSON format for both input and output. However, in practice, time series data is often stored in a **long-format data frame** (where each row represents a timestamp for a specific item).

In the following example, we demonstrate how to:

1. Convert a long-format data frame into the JSON payload format required by the endpoint.
2. Send the request and retrieve predictions.
3. Convert the response back into a long-format data frame for further analysis.

First, we load an example dataset in long data frame format.

In [7]:
import pandas as pd

df = pd.read_csv(
    "https://autogluon.s3.amazonaws.com/datasets/timeseries/grocery_sales/test.csv",
    parse_dates=["timestamp"],
)
df.head()

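|   | item_id | timestamp | scaled_price | promotion_email | promotion_homepage | unit_sales |
|---|---------|-----------|--------------|-----------------|--------------------|------------|
| 0 | 1062_101 | 2018-01-01 | 0.87913 | 0.0 | 0.0 | 636.0 |
| 1 | 1062_101 | 2018-01-08 | 0.994517 | 0.0 | 0.0 | 123.0 |
| 2 | 1062_101 | 2018-01-15 | 1.005513 | 0.0 | 0.0 | 391.0 |
| 3 | 1062_101 | 2018-01-22 | 1.0 | 0.0 | 0.0 | 339.0 |
| 4 | 1062_101 | 2018-01-29 | 0.883309 | 0.0 | 0.0 | 661.0 |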


We split the data into two parts:
- Past data, including historic values of the target column and the covariates.
- Future data that contains the future values of the covariates during the forecast horizon.

In [8]:
prediction_length = 8
target_col = "unit_sales"
freq = pd.infer_freq(df[df.item_id == df.item_id[0]]["timestamp"])

past_df = df.groupby("item_id").head(-prediction_length)
future_df = df.groupby("item_id").tail(prediction_length).drop(columns=[target_col])

In [9]:
past_df.head()

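|   | item_id | timestamp | scaled_price | promotion_email | promotion_homepage | unit_sales |
|---|---------|-----------|--------------|-----------------|--------------------|------------|
| 0 | 1062_101 | 2018-01-01 | 0.87913 | 0.0 | 0.0 | 636.0 |
| 1 | 1062_101 | 2018-01-08 | 0.994517 | 0.0 | 0.0 | 123.0 |
| 2 | 1062_101 | 2018-01-15 | 1.005513 | 0.0 | 0.0 | 391.0 |
| 3 | 1062_101 | 2018-01-22 | 1.0 | 0.0 | 0.0 | 339.0 |
| 4 | 1062_101 | 2018-01-29 | 0.883309 | 0.0 | 0.0 | 661.0 |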


In [10]:
future_df.head()

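|    | item_id | timestamp | scaled_price | promotion_email | promotion_homepage |
|----|---------|-----------|--------------|-----------------|--------------------|
| 23 | 1062_101 | 2018-06-11 | 1.005425 | 0.0 | 0.0 |
| 24 | 1062_101 | 2018-06-18 | 1.005454 | 0.0 | 0.0 |
| 25 | 1062_101 | 2018-06-25 | 1.0 | 0.0 | 0.0 |
| 26 | 1062_101 | 2018-07-02 | 1.005513 | 0.0 | 0.0 |
| 27 | 1062_101 | 2018-07-09 | 1.0 | 0.0 | 0.0 |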


We can now convert this data into a JSON payload.

In [11]:
def convert_df_to_payload(
    past_df,
    future_df=None,
    prediction_length=1,
    freq="D",
    target="target",
    id_column="item_id",
    timestamp_column="timestamp",
):
    """
    Converts past and future DataFrames into JSON payload format for the Chronos endpoint.

    Args:
        past_df (pd.DataFrame): Historical data with `target`, `timestamp_column`, and `id_column`.
        future_df (pd.DataFrame, optional): Future covariates with `timestamp_column` and `id_column`.
            Covariates in past_df but not in future_df are treated as past-only covariates.
        prediction_length (int): Number of future time steps to predict.
        freq (str): Pandas-compatible frequency of the time series.
        target (str or list[str]): Column name(s) for target values.
            Use a string for univariate forecasting or a list of strings for multivariate forecasting.
        id_column (str): Column name for item IDs.
        timestamp_column (str): Column name for timestamps.

    Returns:
        dict: JSON payload formatted for the Chronos endpoint.
    """
    past_df = past_df.sort_values([id_column, timestamp_column])
    if future_df is not None:
        future_df = future_df.sort_values([id_column, timestamp_column])

    target_cols = [target] if isinstance(target, str) else target
    past_covariate_cols = list(past_df.columns.drop([*target_cols, id_column, timestamp_column]))
    future_covariate_cols = [] if future_df is None else [col for col in past_covariate_cols if col in future_df.columns]

    inputs = []
    for item_id, past_group in past_df.groupby(id_column):
        if len(target_cols) > 1:
            target_values = [past_group[col].tolist() for col in target_cols]
            series_length = len(target_values[0])
        else:
            target_values = past_group[target_cols[0]].tolist()
            series_length = len(target_values)

        if series_length < 5:
            raise ValueError(f"Time series '{item_id}' has fewer than 5 observations.")

        series_dict = {
            "target": target_values,
            "item_id": str(item_id),
            "start": past_group[timestamp_column].iloc[0].isoformat(),
        }

        if past_covariate_cols:
            series_dict["past_covariates"] = past_group[past_covariate_cols].to_dict(orient="list")

        if future_covariate_cols:
            future_group = future_df[future_df[id_column] == item_id]
            if len(future_group) != prediction_length:
                raise ValueError(
                    f"future_df must contain exactly {prediction_length=} values for each item_id from past_df "
                    f"(got {len(future_group)=}) for {item_id=}"
                )
            series_dict["future_covariates"] = future_group[future_covariate_cols].to_dict(orient="list")

        inputs.append(series_dict)

    return {
        "inputs": inputs,
        "parameters": {"prediction_length": prediction_length, "freq": freq},
    }

In [12]:
payload = convert_df_to_payload(
    past_df,
    future_df,
    prediction_length=prediction_length,
    freq=freq,
    target="unit_sales",
)

We can now send the payload to the endpoint.

In [13]:
response = predictor.predict(payload)

Note how Chronos-2 generated predictions for >300 time series in the dataset (with covariates!) in less than 2 seconds.
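
This dataset fits into a single request, but the API accepts at most 1000 time series per call. For larger datasets, one option is to split the payload into several requests that share the same `parameters`. The generator below is a sketch of that idea; the name `split_payload` is hypothetical, and the toy payload exists only to show the resulting chunk sizes.

```python
def split_payload(payload, max_series=1000):
    """Yield payloads with at most `max_series` time series each and identical parameters."""
    inputs = payload["inputs"]
    for begin in range(0, len(inputs), max_series):
        chunk = {"inputs": inputs[begin : begin + max_series]}
        if "parameters" in payload:
            chunk["parameters"] = payload["parameters"]
        yield chunk


# Toy payload with 2500 short series, used only to illustrate the chunk sizes
toy_payload = {
    "inputs": [{"target": [1.0, 2.0, 3.0, 2.0, 1.0], "item_id": f"item_{i}"} for i in range(2500)],
    "parameters": {"prediction_length": 3},
}
chunks = list(split_payload(toy_payload))
print([len(chunk["inputs"]) for chunk in chunks])  # [1000, 1000, 500]
```

Each chunk can then be sent with `predictor.predict(chunk)`, and the `predictions` lists of the individual responses concatenated in order.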

Finally, we can convert the response back to a long-format data frame.

In [14]:
def convert_response_to_df(response, freq="D"):
    """
    Converts a JSON response from the Chronos endpoint into a long-format DataFrame.

    Args:
        response (dict): JSON response containing forecasts.
        freq (str): Pandas-compatible frequency of the time series.

    Returns:
        pd.DataFrame: Long-format DataFrame with timestamps, item_id, and forecasted values.
            For multivariate forecasts, creates separate rows for each target dimension (target_1, target_2, etc.).
    """
    dfs = []
    for forecast in response["predictions"]:
        if isinstance(forecast["mean"], list) and isinstance(forecast["mean"][0], list):
            # Multivariate forecast
            timestamps = pd.date_range(forecast["start"], freq=freq, periods=len(forecast["mean"][0]))
            for dim_idx in range(len(forecast["mean"])):
                dim_data = {"item_id": forecast.get("item_id"), "timestamp": timestamps, "target": f"target_{dim_idx + 1}"}
                for key, value in forecast.items():
                    if key not in ["item_id", "start"]:
                        dim_data[key] = value[dim_idx]
                dfs.append(pd.DataFrame(dim_data))
        else:
            # Univariate forecast
            forecast_df = pd.DataFrame(forecast).drop(columns=["start"])
            forecast_df["timestamp"] = pd.date_range(forecast["start"], freq=freq, periods=len(forecast_df))
            # Reorder columns to have item_id and timestamp first
            cols = ["item_id", "timestamp"] + [c for c in forecast_df.columns if c not in ["item_id", "timestamp"]]
            forecast_df = forecast_df[cols]
            dfs.append(forecast_df)

    return pd.concat(dfs, ignore_index=True)

In [15]:
forecast_df = convert_response_to_df(response, freq=freq)
forecast_df.head()

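|   | item_id | timestamp | mean | 0.1 | 0.5 | 0.9 |
|---|---------|-----------|------|-----|-----|-----|
| 0 | 1062_101 | 2018-06-11 | 320.0 | 186.0 | 320.0 | 488.0 |
| 1 | 1062_101 | 2018-06-18 | 318.0 | 175.0 | 318.0 | 496.0 |
| 2 | 1062_101 | 2018-06-25 | 316.0 | 169.0 | 316.0 | 508.0 |
| 3 | 1062_101 | 2018-07-02 | 316.0 | 171.0 | 316.0 | 506.0 |
| 4 | 1062_101 | 2018-07-09 | 310.0 | 165.0 | 310.0 | 506.0 |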


## Clean up the endpoint
Don't forget to clean up resources when finished to avoid unnecessary charges.

In [16]:
predictor.delete_predictor()