Model Hosting

WARNING

Our Model Hosting API was released November 1st, and is still experimental. We may change it without notice, and you should not use it in production.

For a quickstart see our How-To Guides.

Source Packages

A source package is structured as follows:

```
- <your-project>
    - setup.py
    - <your-package>
        - __init__.py
        - model.py
              class Model:
                  train()
                  predict()
                  load()
```

Any additional files and packages may be included, but the entry point into the model must be a Model class defined in a model.py module, with the three methods train(), predict(), and load().

setup.py

The setup.py is no different from the setup.py file in any other Python package and may look something like this:

```python
from setuptools import find_packages, setup

REQUIRED_PACKAGES = ["pandas>=0.23"]

setup(
    name="my-package-name",
    version="0.1",
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    description="My Model",
)
```

Notice that this is where you specify your requirements, i.e. the other Python packages your model depends on.

The Model class

All source packages must have a class named Model located in a module named model (i.e. in a model.py file). This class is what Model Hosting uses to interact with your code. The class should have three methods:

  • static train() Model Hosting will call this method when training is initiated. It should perform the training routine and persist the results (parameters, a serialized model, or whatever makes sense in your case) so that the model can later be loaded. Persisting can be done through the file_io argument (described below). The user can pass arbitrary arguments (which must be JSON serializable) to the train method via the args parameter. Although training data could be passed directly through an argument, the recommended approach is to pass a data spec through a data_spec parameter in args, and then use the Data Transfer Service inside the training routine to fetch the data the spec describes. In addition to user-defined parameters, some parameters are always available in the train method:
    • file_io Works like the builtin open(), but reads from and writes to the root of a cloud storage location specific to the model version being trained.
    • api_key The API key that initiated the training. Can be used to authenticate with the Data Transfer Service.
    • project The CDP project that the model version belongs to. Can be used to authenticate with the Data Transfer Service.
  • static load() Model Hosting will call this method when deploying a model version, before it starts accepting predictions. It should load the necessary state persisted by the training routine and return an instance of the Model class. The method takes one argument:
    • file_io Works like the builtin open(), but reads from the root of the cloud storage location that the training routine could write to. This is how you access state persisted during training.
  • predict() Model Hosting will call this method when performing predictions. It can use the persisted state that was loaded in load(). As with train(), one can pass arbitrary arguments, but some parameters are always available:
    • instance The instance to perform the prediction on.
    • api_key The API key that initiated the prediction request. Can be used to authenticate with the Data Transfer Service.
    • project The CDP project that the model version belongs to. Can be used to authenticate with the Data Transfer Service.
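To make the args mechanism concrete, here is a hedged sketch of what a user-defined args payload for a training request might look like. The data_spec value is left as an empty placeholder since its actual structure is defined by the Data Transfer Service, and learning_rate is simply an example of an arbitrary user-defined parameter:

```python
import json

# Hypothetical user-defined args for a training request. Every value must be
# JSON serializable. The contents of "data_spec" are a placeholder here; the
# real structure is defined by the Data Transfer Service.
data_spec = {}  # placeholder for a data spec describing the training data

args = {
    "data_spec": data_spec,
    "learning_rate": 0.01,  # example of an arbitrary user-defined parameter
}

# Verify that the payload is JSON serializable, as Model Hosting requires.
serialized = json.dumps(args)
print(serialized)
```

Inside train(), these values then arrive as keyword arguments alongside the always-available file_io, api_key, and project parameters.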

Here is a sketch example:

```python
from cognite.data_transfer_service import DataTransferService


class Model:

    def __init__(self, my_state):
        self._my_state = my_state

    @staticmethod
    def train(file_io, data_spec, api_key, project, **kwargs):
        dts = DataTransferService(data_spec, api_key=api_key, project=project)
        training_result = ... # Do some training, using dts to fetch training data
        with file_io("my_state", "w") as f:
            f.write(training_result)

    @staticmethod
    def load(file_io, **kwargs):
        with file_io("my_state", "r") as f:
            my_state = f.read()
        return Model(my_state)

    def predict(self, instance, **kwargs):
        prediction = ... # use self._my_state to do some prediction
        return prediction
```

For a complete example see our How-To Guides.
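The sketch above can also be exercised locally by stubbing out the file_io argument. In the sketch below, local_file_io is a purely hypothetical stand-in (not part of Model Hosting) that behaves like the builtin open() rooted in a temporary directory, which lets you run the train → load → predict cycle without deploying anything:

```python
import os
import tempfile


class Model:
    def __init__(self, my_state):
        self._my_state = my_state

    @staticmethod
    def train(file_io, **kwargs):
        # Stand-in for a real training routine; persist the "result".
        training_result = "trained-parameters"
        with file_io("my_state", "w") as f:
            f.write(training_result)

    @staticmethod
    def load(file_io, **kwargs):
        with file_io("my_state", "r") as f:
            my_state = f.read()
        return Model(my_state)

    def predict(self, instance, **kwargs):
        return "prediction for %s using %s" % (instance, self._my_state)


# Hypothetical local stand-in for the file_io argument Model Hosting provides:
# behaves like the builtin open(), but rooted in a temporary directory.
_root = tempfile.mkdtemp()


def local_file_io(path, mode="r"):
    return open(os.path.join(_root, path), mode)


Model.train(local_file_io)
model = Model.load(local_file_io)
print(model.predict(42))  # -> prediction for 42 using trained-parameters
```

This mirrors the lifecycle Model Hosting drives for you: train() persists state through file_io, load() reads it back and returns a Model instance, and predict() uses that instance's state.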

Here is a diagram that illustrates how train(), load(), and predict() work together to function as a model version:

(Diagram: Active model version)

Last Updated: 11/27/2018, 10:00:23 AM