--- name: sap-hana-ml description: | SAP HANA Machine Learning Python Client (hana-ml) development skill. Use when: Building ML solutions with SAP HANA's in-database machine learning using Python hana-ml library for PAL/APL algorithms, DataFrame operations, AutoML, model persistence, and visualization. Keywords: hana-ml, SAP HANA, machine learning, PAL, APL, predictive analytics, HANA DataFrame, ConnectionContext, classification, regression, clustering, time series, ARIMA, gradient boosting, AutoML, SHAP, model storage license: GPL-3.0 metadata: version: 1.1.0 last_verified: 2025-11-27 package_version: 2.22.241011 --- # SAP HANA ML Python Client (hana-ml) **Package Version**: 2.22.241011 **Last Verified**: 2025-11-27 ## Table of Contents - [Installation & Setup](#installation--setup) - [Quick Start](#quick-start) - [Core Libraries](#core-libraries) - [Common Patterns](#common-patterns) - [Best Practices](#best-practices) - [Bundled Resources](#bundled-resources) --- ## Installation & Setup ```bash pip install hana-ml ``` **Requirements**: Python 3.8+, SAP HANA 2.0 SPS03+ or SAP HANA Cloud --- ## Quick Start ### Connection & DataFrame ```python from hana_ml import ConnectionContext # Connect conn = ConnectionContext( address='', port=443, user='', password='', encrypt=True ) # Create DataFrame df = conn.table('MY_TABLE', schema='MY_SCHEMA') print(f"Shape: {df.shape}") df.head(10).collect() ``` ### PAL Classification ```python from hana_ml.algorithms.pal.unified_classification import UnifiedClassification # Train model clf = UnifiedClassification(func='RandomDecisionTree') clf.fit(train_df, features=['F1', 'F2', 'F3'], label='TARGET') # Predict & evaluate predictions = clf.predict(test_df, features=['F1', 'F2', 'F3']) score = clf.score(test_df, features=['F1', 'F2', 'F3'], label='TARGET') ``` ### APL AutoML ```python from hana_ml.algorithms.apl.classification import AutoClassifier # Automated classification auto_clf = AutoClassifier() auto_clf.fit(train_df, label='TARGET') predictions = auto_clf.predict(test_df) ``` ### Model Persistence ```python from hana_ml.model_storage import ModelStorage ms = ModelStorage(conn) clf.name = 'MY_CLASSIFIER' ms.save_model(model=clf, if_exists='replace') ``` --- ## Core Libraries ### PAL (Predictive Analysis Library) - **100+ algorithms** executed in-database - Categories: Classification, Regression, Clustering, Time Series, Preprocessing - **Key classes**: `UnifiedClassification`, `UnifiedRegression`, `KMeans`, `ARIMA` - See: `references/PAL_ALGORITHMS.md` for complete list ### APL (Automated Predictive Library) - **AutoML capabilities** with automatic feature engineering - **Key classes**: `AutoClassifier`, `AutoRegressor`, `GradientBoostingClassifier` - See: `references/APL_ALGORITHMS.md` for details ### DataFrames - **Lazy evaluation** - builds SQL until `collect()` called - **In-database processing** for optimal performance - See: `references/DATAFRAME_REFERENCE.md` for complete API ### Visualizers - **EDA plots**, model explanations, metrics - **SHAP integration** for model interpretability - See: `references/VISUALIZERS.md` for 14 visualization modules --- ## Common Patterns ### Train-Test Split ```python from hana_ml.algorithms.pal.partition import train_test_val_split train, test, val = train_test_val_split( data=df, training_percentage=0.7, testing_percentage=0.2, validation_percentage=0.1 ) ``` ### Feature Importance ```python # APL models importance = auto_clf.get_feature_importances() # PAL models from hana_ml.algorithms.pal.preprocessing import FeatureSelection fs = FeatureSelection() fs.fit(train_df, features=features, label='TARGET') ``` ### Pipeline ```python from hana_ml.algorithms.pal.pipeline import Pipeline from hana_ml.algorithms.pal.preprocessing import Imputer, FeatureNormalizer pipeline = Pipeline([ ('imputer', Imputer(strategy='mean')), ('normalizer', FeatureNormalizer()), ('classifier', UnifiedClassification(func='RandomDecisionTree')) ]) ``` --- ## Best Practices 1. **Use lazy evaluation** - Operations build SQL without execution until `collect()` 2. **Leverage in-database processing** - Keep data in HANA for performance 3. **Use Unified interfaces** - Consistent APIs across algorithms 4. **Save models** - Use `ModelStorage` for persistence 5. **Explain predictions** - Use SHAP explainers for interpretability 6. **Monitor AutoML** - Use `PipelineProgressStatusMonitor` for long-running jobs --- ## Bundled Resources ### Reference Files - **`references/DATAFRAME_REFERENCE.md`** (479 lines) - ConnectionContext API, DataFrame operations, SQL generation - **`references/PAL_ALGORITHMS.md`** (869 lines) - Complete PAL algorithm reference (100+ algorithms) - Classification, Regression, Clustering, Time Series, Preprocessing - **`references/APL_ALGORITHMS.md`** (534 lines) - AutoML capabilities, automated feature engineering - AutoClassifier, AutoRegressor, GradientBoosting classes - **`references/VISUALIZERS.md`** (704 lines) - 14 visualization modules (EDA, SHAP, metrics, time series) - Plot types, configuration, export options - **`references/SUPPORTING_MODULES.md`** (626 lines) - Model storage, spatial analytics, graph algorithms - Text mining, statistics, error handling --- ## Error Handling ```python from hana_ml.ml_exceptions import Error try: clf.fit(train_df, features=features, label='TARGET') except Error as e: print(f"HANA ML Error: {e}") ``` --- ## Documentation - **Official Docs**: [https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.07/en-US/hana_ml.html](https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.07/en-US/hana_ml.html) - **PyPI Package**: [https://pypi.org/project/hana-ml/](https://pypi.org/project/hana-ml/)