Model hosting

WARNING

The content in this section describes features that are currently in preview and not yet generally available. These features are subject to change, and we do not recommend using them in production systems.

You can create and upload your own models to perform predictions on data in Cognite Data Fusion, for example on time series, and write the output back to Cognite Data Fusion. The models you upload to our hosting environment automatically get an HTTP endpoint and are instantly available. You can also define schedules to run the predictions at regular intervals.

Overview of models and versions

A model is a routine that can perform predictions. It takes data as input, runs calculations on the data, and then outputs the results of the calculations.

Each model can have any number of versions - specific implementations of the prediction routine. You can decide which version is the active one, but only one version can be active at any time.

You can train a version on data and use persisted state to do predictions, or you can do stateless calculations that don't need training.

All models and versions are exposed through HTTP endpoints. Predictions performed on a model endpoint use the currently active version. To test a version that is not active, use the version endpoint instead of the parent model endpoint.

Active model version

Source packages and artifacts

A model version consists of two parts: a source package and an optional collection of artifacts.

  • A source package is a Python package that contains the Python code that defines the prediction routine. The source package contains additional metadata and specifies its dependencies. In combination with training data or optional artifacts, it defines a model version. Source packages can be reused across models and model versions.

  • Artifacts are files that represent state for the model version - for example a trained model. Artifacts are specific to each model version, and are either uploaded by the user (for example a model that has been trained locally) or produced during training in Model Hosting. You can also deploy stateless models without any artifacts (functions/calculations).

Model version

Schedules

Use schedules to run predictions at regular intervals, for example to monitor a piece of equipment using a sensor time series. To define a schedule, you need to specify the model that will do the prediction, which data from Cognite Data Fusion to use as input to the model, and where in Cognite Data Fusion to write the output from the model. You can set up any number of schedules to use the same model.

Schedule

Deploy to Model Hosting

Follow these steps to get your code running in Model Hosting:

  1. Create and upload the source package containing your Python code.
  2. Create a model.
  3. Create a model version using one of the two methods described below: deploy a pre-trained or stateless model, or train in Model Hosting and deploy.
  4. Optional: Set up a schedule.

Deploy a pre-trained or stateless model

Create a model version, upload the artifacts and trigger deployment manually.

Deploying pre-trained/stateless model version

Artifacts are optional; without them, you can create a stateless model that performs a simple function or calculation and doesn't require any training. To upload artifacts, you first have to produce them locally or in another service. This method is useful for simple models that you can train locally, and when you don't want to wait for training in Model Hosting, which takes a few minutes to initiate. A minimal stateless model.py is sketched below.
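
Here is a minimal sketch of what the model.py of a stateless model version might look like (the Model class contract is described under The Model class below). The averaging logic is just an illustration, not part of the API:

class Model:
    @staticmethod
    def load(open_artifact):
        # A stateless model has no artifacts to read; just return an instance.
        return Model()

    def predict(self, instance):
        # `instance` is arbitrary, JSON-serializable user input. Here we
        # assume it's a list of numbers and return their average.
        return sum(instance) / len(instance)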

Train in Model Hosting and deploy

Let Model Hosting do the training, create the artifacts, and deploy a model version with those artifacts.

Train and deploy a model version

This method is useful when you want to do resource-intensive training, or when you don't have the necessary local setup.

You can deploy multiple versions under the same model, and you can mix pre-trained versions and versions trained in Model Hosting. Also, you can reuse source packages across model versions.

Setting up a schedule

To define a schedule, you need to specify the model that will do the prediction, which data from Cognite Data Fusion to use as input to the model, and where in Cognite Data Fusion to write the output from the model. You can set up any number of schedules to use the same model. You can also create schedules without input or output data to define recurring jobs.

To learn more, see Using schedules.

Technical details

This section covers the technical details of Model Hosting in more depth:

Source packages

A source package is a Python package with these requirements:

  • The source package must contain a module named model.
  • The module must contain a class named Model.

A typical source package has this structure:

- <your-project>
    - setup.py
    - <your-package>
        - __init__.py
        - model.py
              class Model:
                train()
                load()
                predict()

You can include additional files and packages, but the entry point into the model is defined in the model.py module with the Model class.

setup.py

The setup.py looks similar to this:

from setuptools import find_packages, setup

REQUIRED_PACKAGES = ["cognite-model-hosting==0.1.0", "scikit-learn==0.20.3"]

setup(
    name="my-package-name",
    version="1.0",
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    description="My Model",
)

This is where you specify your requirements, that is, the other Python packages your model depends on. We recommend that you pin the versions of your requirements to avoid future breaking changes.

The Model class

All source packages must have a class named Model located in a module named model (in a model.py file). Model Hosting uses this class to interact with your code, and the class should have one or more of these three methods:

  • static train()

    Use this method to do training in Model Hosting. If you do training locally or don't need training, you can leave out this method. Model Hosting calls this method when the training is initiated.

    The method should perform the training routine and persist any artifacts (parameters, serialized model, etc.) so that the model can be loaded later. To persist artifacts, use the open_artifact argument (see below). You can pass arbitrary arguments (JSON serializable) to the train() method using the args parameter.

    You can pass training data directly through an argument, but we recommend that you pass a data spec through a data_spec parameter in args. Inside the training routine, use DataFetcher to fetch the data described in the data spec.

    In addition to user-defined parameters, these parameters are available in the train method:

    • open_artifact

      Required as the first argument. Works like the built-in open(), but reads and writes to the root of the model version specific artifacts (stored in the cloud).

    • api_key

      Optional. The API key that initiated the training. Use to authenticate against Cognite Data Fusion.

    • project

      Optional. The Cognite Data Fusion project that the model version belongs to. Use to authenticate against Cognite Data Fusion.

  • static load()

    Model Hosting calls this method when it's deploying a model version before accepting predictions. It loads the necessary artifacts and returns an instance of the Model class.

    Note that this method can be called multiple times, not just during the first deployment. A model version can be deployed on multiple machines and transferred between machines behind the scenes. The method takes one argument:

    • open_artifact

      Works like the built-in open(), but reads from the root of the model version specific artifacts (stored in the cloud).

  • predict()

    Model Hosting calls this method when performing predictions. It can use the persisted state that was loaded in load().

    You can pass arbitrary arguments the same way as with the train method, and these parameters are always available:

    • instance

      Required as the first argument. The instance you should perform the prediction on. This can be arbitrary user-defined values (JSON serializable).

    • api_key

      Optional. The API key that initiated the prediction. Use to authenticate against Cognite Data Fusion.

    • project

      Optional. The Cognite Data Fusion project that the model version belongs to. Use to authenticate against Cognite Data Fusion.

Here is an example Model class:

from cognite.model_hosting.data_fetcher import DataFetcher


class Model:
    def __init__(self, my_state):
        self._my_state = my_state

    @staticmethod
    def train(open_artifact, data_spec):
        data_fetcher = DataFetcher(data_spec)
        training_result = ... # Do some training, using data_fetcher to fetch training data
        with open_artifact("my_state", "w") as f:
            f.write(training_result)

    @staticmethod
    def load(open_artifact):
        with open_artifact("my_state", "r") as f:
            my_state = f.read()
        return Model(my_state)

    def predict(self, instance):
        prediction = ... # use self._my_state to do some prediction
        return prediction

How artifacts are used in Model Hosting:

Illustration of how artifacts are used

For more complete examples, see https://github.com/cognitedata/cognite-python-docs/tree/master/examples/model_hosting.

Runtime environment

All code runs in a Linux environment with Python 3.5.0. If you need other packages as dependencies, make sure you specify those in your package so that they are installed.

NOTE

Multiple predictions can run in parallel, both multithreaded in the same process and on multiple machines. Also, load() can be called several times to initiate multiple model instances.

Handling data in Model Hosting

To simplify data handling in Model Hosting, use the utilities in the cognite-model-hosting package to specify which data to work on, and to fetch the data. For details see the package documentation.

Data spec

A data spec, or data specification, specifies a set of data from one or more data sources in Cognite Data Fusion, for example time series and files. A data spec is a JSON document, and can easily be passed around. You can for example use it to define what data you want to train on, and pass it into the training routine.

Using data specs is much more scalable than passing data around. If you work with large amounts of data, you don't want to send it all through your local machine: it increases latency, takes more time, and bandwidth can be costly at scale.

Your models in Model Hosting run in the same data center where the Cognite Data Fusion data is stored. Passing data specs into Model Hosting ensures minimum latency and maximum bandwidth for your data fetching.

Data specs follow a strict format:

{
    "timeSeries": { // Optional
        "<time series alias>": {
            "id": 123,
            "start": 123456789000,
            "end": 123456789000,
            "aggregate": "average",       // Optional
            "granularity": "1m",          // Optional
            "includeOutsidePoints": false // Optional
        },
        ...
    },
    "files": { // Optional
        "<file alias>": {
            "id": 123
        },
        ...
    }
}

The DataSpec class from the cognite-model-hosting package helps you build data specs in Python.
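
For example, a data spec with a single aliased time series might be built like this. This is a sketch: the TimeSeriesSpec constructor arguments and the to_json() method are assumptions based on the format above, so check the cognite-model-hosting documentation for the exact API:

from cognite.model_hosting.data_spec import DataSpec, TimeSeriesSpec

data_spec = DataSpec(
    time_series={
        # "temperature" is a user-defined alias (see the note on aliases below)
        "temperature": TimeSeriesSpec(
            id=123,
            start=1557700000000,  # ms since epoch
            end=1557703600000,
            aggregate="average",
            granularity="1m",
        )
    }
)

print(data_spec.to_json())  # a JSON document you can pass around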

A key concept in data specs is aliases, user-defined identifiers for resources. Aliases let you abstract away specific resources from your algorithms. For example, instead of hard coding a specific time series from a temperature sensor in your code, you can accept a data spec that specifies a time series alias named temperature. This way you can easily re-use your code for another temperature sensor.

DataFetcher

Use the DataFetcher class to fetch the data that a data spec describes. You can for example pass a data spec to your prediction routine and then use the DataFetcher to fetch the data to run predictions on.
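
A sketch of what that might look like inside predict(); fetch_dataframe() is an assumed method name here, so consult the package documentation:

from cognite.model_hosting.data_fetcher import DataFetcher

def predict(self, instance):
    # `instance` is assumed to be a data spec in this sketch
    data_fetcher = DataFetcher(instance)
    df = data_fetcher.time_series.fetch_dataframe(["temperature"])
    prediction = ... # run the model on df
    return prediction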

ScheduleDataSpec

All schedules in Model Hosting need a schedule data spec that specifies the recurring data to run the schedule on. It follows a strict format:

{
    "stride": 123456789000,
    "windowSize": 123456789000,
    "start": 123456789000,
    "timeSeries": { // Optional
        "<time series alias>": {
            "id": 123,
            "aggregate": "average",       // Optional
            "granularity": "1m",          // Optional
            "includeOutsidePoints": false // Optional
        },
        ...
    }
}

The ScheduleDataSpec class from the cognite-model-hosting package helps you build schedule data specs in Python. You can also use it to create the individual data specs that will be passed as instances to a schedule.
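
A sketch of building one in Python. The class and argument names below (ScheduleDataSpec, ScheduleTimeSeriesSpec, stride, window_size, start) are assumptions mirroring the JSON format above; verify them against the cognite-model-hosting documentation:

from cognite.model_hosting.data_spec import ScheduleDataSpec, ScheduleTimeSeriesSpec

schedule_data_spec = ScheduleDataSpec(
    stride=10 * 60 * 1000,       # run every 10 minutes (ms)
    window_size=60 * 60 * 1000,  # each prediction sees the last hour (ms)
    start=1557700000000,         # first run, ms since epoch
    time_series={
        "temperature": ScheduleTimeSeriesSpec(id=123, aggregate="average", granularity="1m"),
    },
)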

ScheduleOutput

To work with schedules, prediction routines must output a specific format. The schedules.to_output() function helps you format the output correctly, and the ScheduleOutput class helps you read the output format if you want to call the prediction routine manually outside of a schedule.

Using schedules

Schedules perform recurring predictions on time series from Cognite Data Fusion using a model, and then write the output back to Cognite Data Fusion. A model is not tied to one specific input or output time series, so you can have multiple schedules per model. The input and output specs of the schedule must match the input and output fields of its corresponding model.

Two parameters of a schedule decide when it will run:

  • stride - defines how often a schedule will run. A stride is specified in millisecond granularity and must be at least 1 minute. A schedule with a stride of 10 minutes will run every ten minutes.

  • start - the timestamp of the first time the schedule will run. This parameter lets you delay when a schedule should start. For example, you can start a schedule the next day, or run it once a day at a specific time.

If you set the stride to 24 hours and start to 8 PM this evening, the schedule will run every day at 8 PM.
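
For example, a sketch of computing those two values in milliseconds with plain Python:

from datetime import datetime, timedelta

stride = 24 * 60 * 60 * 1000  # 24 hours in ms

# First run at the next 8 PM (local time), as ms since epoch
next_8pm = datetime.now().replace(hour=20, minute=0, second=0, microsecond=0)
if next_8pm < datetime.now():
    next_8pm += timedelta(days=1)
start = int(next_8pm.timestamp() * 1000)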

Illustration of stride, window size and start for a schedule

Schedule input

In a schedule, you can define how long back in time a prediction should look:

  • window_size - defines the time frame a scheduled prediction will take into account. If the window size is one hour, the prediction will receive input time series from the last hour.

You pass a time window to the prediction method through a data spec. The data spec that describes all the data in the time window for that prediction is passed as an instance (see Data spec above). The prediction routine can then use the DataFetcher to fetch the appropriate data to do the prediction on.

You can use both aggregates and raw data points from time series as input, but if you use aggregates, certain restrictions apply (see Schedules and aggregates).

If the input data might change or arrive late, use the slack parameter to control whether Model Hosting should redo predictions:

  • slack - defines how far back in time Model Hosting should look for changes to the input data. If you set slack to 10 minutes, Model Hosting will redo predictions that depend on input data that has changed in the last 10 minutes. Currently, slack is limited to a maximum of 30 times the stride.

Schedule output

A scheduled model must produce output in a specific format:

{
    "timeSeries": {
        "<time series alias>": [
            [t0, v0],
            [t1, v1],
            [t2, v2],
            ...
        ],
        "<time series alias>": [
            [t0, v0],
            [t1, v1],
            [t2, v2],
            ...
        ],
        ...
    }
}

Where t0, t1, t2, ... are timestamps and v0, v1, v2, ... are the corresponding values.

Use the cognite.model_hosting.schedules.to_output() function in the cognite-model-hosting package to help you format your output.
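
For illustration, here is a sketch of a predict() method that builds the documented structure by hand; the "my_output" alias is hypothetical:

def predict(self, instance):
    timestamps = [1557700000000, 1557700060000, 1557700120000]  # ms since epoch
    values = [20.5, 21.0, 20.8]
    # Same structure as the format above; to_output() can build this for you
    return {
        "timeSeries": {
            "my_output": [[t, v] for t, v in zip(timestamps, values)],
        }
    }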

A scheduled prediction cannot output to arbitrary timestamps; it is restricted to a time interval as wide as the schedule stride. If you have a stride of one hour (regardless of the window size), each prediction can output data points within an interval of one hour. This ensures that schedules don't write overlapping output.

You can use the offset parameter for each output time series to control where in time, relative to the prediction time window, the output is allowed to be.

Illustration of output offset for a time series

Schedules and aggregates

Aggregates have special properties, and it's easy to make mistakes when running schedules on them. To make sure aggregates are consumed correctly, we impose restrictions on schedules that use one or more aggregated time series as input.

The restrictions are:

  1. Stride, window size and start must all be a multiple of the largest granularity unit used in any input time series.
  2. Window size must be at least as large as the largest granularity used in any input time series.

If an aggregated time series has a granularity of "3h", its granularity unit is hour (but the granularity is three hours). It's important to understand that "60m" and "1h" are different granularities. Aggregates in Cognite Data Fusion are timestamped at the beginning of the interval they represent. For example, the hour aggregate for a time series between 3 PM and 4 PM will be timestamped 3 PM.

These restrictions ensure that aggregates in a time series are consumed correctly: all aggregates received from the data spec for a time window reflect data in that time window, and you will not accidentally or indirectly rely on data outside the window.
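
As a concrete example, a sketch checking the two restrictions for an input time series with granularity "3h" (granularity unit: one hour):

HOUR_MS = 60 * 60 * 1000

granularity_unit = 1 * HOUR_MS  # the unit of "3h" is hour
granularity = 3 * HOUR_MS       # the full granularity

def schedule_is_valid(stride, window_size, start):
    # 1. stride, window size and start must be multiples of the granularity unit
    multiples_ok = all(x % granularity_unit == 0 for x in (stride, window_size, start))
    # 2. window size must be at least as large as the granularity
    window_ok = window_size >= granularity
    return multiples_ok and window_ok

assert schedule_is_valid(stride=6 * HOUR_MS, window_size=3 * HOUR_MS, start=0)
assert not schedule_is_valid(stride=90 * 60 * 1000, window_size=3 * HOUR_MS, start=0)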

Managing the life cycle of resources

Resources in Model Hosting can consume significant compute and storage capacity, and it's important to manage the life cycle of the resources. Because resources in Model Hosting depend on each other, we restrict which resources you can delete to prevent you from breaking other resources and from losing important metadata.

The alternative to deleting resources is to deprecate them. When a resource is deprecated, it is blocked from being used in new instances, and its compute and storage resources are freed up as soon as the resource is no longer in use. You cannot reverse the deprecation of a resource, but all metadata for deprecated resources is kept for transparency and to keep track of history.

For example, you cannot delete a source package as long as it is used by one or more model versions, but you can deprecate the package to prevent new model versions from using it.

Illustration of dependencies between different resources

Deprecation

| Resource | Precondition | Effect |
| --- | --- | --- |
| Model | Cannot be deprecated. | - |
| Model version | Cannot be the active version. | Frees up compute resources. Deletes artifacts. Can no longer perform predictions. |
| Source package | - | Cannot be used by new model versions. |
| Schedule | Cannot be deprecated. | - |

Deletion

| Resource | Precondition | Effect |
| --- | --- | --- |
| Model | - | Deletes all model versions. Deletes all schedules. |
| Model version | Cannot be the active version. | Frees up compute resources. Deletes artifacts. |
| Source package | Cannot be in use by any non-deprecated model version. | - |
| Schedule | - | - |

FAQ

How can I use Keras in Model Hosting?

Because the load_model() method in Keras doesn't accept a standard Python file object, you must move its content into an h5py file object. Also, make sure that you store the TensorFlow graph.

import io

import h5py
import tensorflow as tf
from keras.models import load_model as keras_load_model

@staticmethod
def load(open_artifact):
    with open_artifact("model.h5", "rb") as artifact_file:
        with h5py.File(io.BytesIO(artifact_file.read()), "r") as f:
            model = keras_load_model(f)
            graph = tf.get_default_graph()
            return Model(model, graph)

When your model is deployed to Model Hosting, predictions will run in multiple threads, and it's important that your predict() method is thread-safe. Keras has some global state and will crash when used from multiple threads. To avoid this issue, set the default graph and initialize a new session for each prediction:

def predict(self, instance):
    ...
    with self.graph.as_default():
        with tf.Session().as_default() as sess:
            tf.initialize_all_variables().run(session=sess)
            prediction = self.model.predict(...)
    ...