# Example: Convert an XGBoost model to CoreML

In this example, we will train an XGBoost regression model and convert it to CoreML. For more details on all the different ways to convert these models, see the [coremltools XGBoost documentation](https://coremltools.readme.io/reference/convertersxgboostconvert).

## Create an XGBoost Model

We will follow a [regression example](https://github.com/dmlc/xgboost/tree/master/demo/CLI/regression) from the XGBoost repository, which predicts the performance of a computer system from its hardware features. The data comes from the [UCI Machine Learning Repository Computer Hardware Data Set](https://archive.ics.uci.edu/ml/datasets/Computer+Hardware).

In [None]:
import xgboost as xgb
import matplotlib.pyplot as plt

First, we load the train and test data (already split):

In [None]:
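# The demo files are LIBSVM-formatted text; recent XGBoost releases expect the format in the URI
dtrain = xgb.DMatrix('machine.txt.train?format=libsvm')
dtest = xgb.DMatrix('machine.txt.test?format=libsvm')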
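
The train and test files produced by the XGBoost demo are plain text in LIBSVM format, where each line holds a label followed by sparse `index:value` pairs. As a rough illustration of that layout, the sketch below parses one such line using only the standard library. The sample line and the `parse_libsvm_line` helper are hypothetical and are not part of XGBoost.

```python
def parse_libsvm_line(line):
    """Split a LIBSVM-style line into (label, {feature_index: value})."""
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(':')
        features[int(index)] = float(value)
    return label, features

# Hypothetical sample line: a label, then sparse index:value pairs
sample_line = '198 1:1 31:125 32:256 33:6000'
label, features = parse_libsvm_line(sample_line)
print(label, features)
```

Features that do not appear on a line are treated as missing, which is why one-hot-encoded columns usually list only the active category.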

Train the model using parameters from the example:

In [None]:
param = {
 'objective': 'reg:squarederror',
 'eta': 1.0,
 'gamma': 1.0,
 'min_child_weight': 1,
 'max_depth': 3,
}
bst_model = xgb.train(param, dtrain, num_boost_round=2)

We can then plot the first tree, using the feature map so that nodes show readable feature names:

In [None]:
xgb.plot_tree(bst_model, fmap='machine.featmap.txt')
fig = plt.gcf()
fig.set_size_inches(20, 40)
plt.show()

Next, we can check how well the model fits the training data by looking at the prediction error:

In [None]:
prediction = bst_model.predict(dtrain)
actual = dtrain.get_label()
error = prediction - actual
print('mean error:', error.mean(), 'stdev error:', error.std())
plt.hist(error)
plt.xlabel('prediction - actual')
plt.show()
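
The mean of the signed error can be close to zero even when individual predictions are far off, because positive and negative errors cancel. Two complementary summaries are the mean absolute error (MAE) and the root-mean-square error (RMSE). The sketch below computes them with the standard library on hypothetical values; the `regression_errors` helper and the sample numbers are illustrative only and do not come from the dataset.

```python
import math

def regression_errors(predicted, actual):
    """Return (mean_error, mae, rmse) for two equal-length sequences."""
    errors = [p - a for p, a in zip(predicted, actual)]
    n = len(errors)
    mean_error = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mean_error, mae, rmse

# Hypothetical predictions and labels, for illustration only
sample_predicted = [10.0, 20.0, 30.0, 40.0]
sample_actual = [12.0, 18.0, 33.0, 37.0]
mean_error, mae, rmse = regression_errors(sample_predicted, sample_actual)
print('mean error:', mean_error, 'MAE:', mae, 'RMSE:', rmse)
```

In this sample the signed errors cancel to a mean of zero while MAE and RMSE remain clearly positive, which is the situation these metrics are meant to expose.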

## Convert to CoreML Model

Converting an XGBoost model to CoreML format is much simpler than converting a PyTorch or TensorFlow model. However, if we want the model inputs to carry the proper feature names, we need to load those names and pass them to the `convert` function.

In [None]:
feature_names = []
with open('machine.featmap.txt') as f:
    for line in f.readlines():
        # Each line is "<index> <name> <type>"; keep the name
        feature_name = line.split()[1]
        feature_names.append(feature_name)
feature_names
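
The loop above relies on the XGBoost feature-map layout, in which each line holds a feature index, a feature name, and a type code (`i` for indicator, `q` for quantitative, `int` for integer) separated by whitespace. The sketch below parses a few hypothetical lines in that layout so the structure is easier to see. The sample text and the `parse_featmap` helper are assumptions for illustration, not the contents of the real file.

```python
# Hypothetical excerpt of a feature-map file: "<index> <name> <type>"
sample_featmap = """\
0 vendor:adviser i
1 vendor:amdahl i
2 MYCT int
3 MMIN int
"""

def parse_featmap(text):
    """Return a list of (index, name, type_code) tuples, skipping short lines."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # ignore blank or malformed lines
        entries.append((int(parts[0]), parts[1], parts[2]))
    return entries

entries = parse_featmap(sample_featmap)
sample_names = [name for _, name, _ in entries]
print(sample_names)
```

Skipping short lines makes the parser tolerant of a trailing blank line, which would otherwise raise an `IndexError` in a bare `line.split()[1]`.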

We can then use the desired feature names and the name of the model target (the output) during model conversion:

In [None]:
import coremltools as ct

cml_model = ct.converters.xgboost.convert(bst_model, feature_names=feature_names, target='perf', mode='regressor')

We can see the feature names in the metadata describing this model by looking at the string representation:

In [None]:
cml_model

Note that the `vendor` categorical input is represented in the model as a one-hot-encoded value.

Finally, we can write the model to disk in the CoreML format.

In [None]:
cml_model.save('machine.mlmodel')

## Using the CoreML Model

As with other CoreML model types, if we are on a macOS system, we can use the `predict` method to run the model. We pass the input as a dictionary that maps each feature name to its value.

In [None]:
example = {
 'MYCT': 125,
 'MMIN': 256,
 'MMAX': 6000,
 'CACH': 256,
 'CHMIN': 16,
 'CHMAX': 128,
}
# Set the one-hot-encoded vendor feature
for feature_name in feature_names:
    if feature_name == 'vendor:ibm':
        example[feature_name] = 1
    elif feature_name.startswith('vendor'):
        example[feature_name] = 0
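
If several inputs need to be built, the one-hot step can be wrapped in a small helper. The sketch below is one possible way to do that, assuming vendor features follow the `vendor:<name>` naming seen above. The `build_example` function and the shortened feature list are hypothetical and exist only for illustration.

```python
def build_example(numeric_values, vendor, feature_names):
    """Combine numeric features with a one-hot encoding of `vendor`."""
    vendor_key = 'vendor:' + vendor
    if vendor_key not in feature_names:
        raise ValueError('unknown vendor: ' + vendor)
    row = dict(numeric_values)
    for name in feature_names:
        if name.startswith('vendor:'):
            # Exactly one vendor column is set to 1, all others to 0
            row[name] = 1 if name == vendor_key else 0
    return row

# Hypothetical, shortened feature list for illustration
sample_feature_names = ['vendor:amdahl', 'vendor:ibm', 'MYCT', 'MMIN']
row = build_example({'MYCT': 125, 'MMIN': 256}, 'ibm', sample_feature_names)
print(row)
```

Raising an error for an unknown vendor avoids silently producing a row in which every vendor column is zero.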

In [None]:
import sys
IS_MACOS = sys.platform == 'darwin'

if IS_MACOS:
    # Prediction with a CoreML model is only available on macOS
    loaded_model = ct.models.MLModel('machine.mlmodel')
    prediction = loaded_model.predict(example)
    print('prediction:', prediction)
else:
    print('Skipping prediction on non-macOS system')