Experimental

WARNING

This section covers functionality that is currently under development or stabilization. The following guides are provided to our alpha users to gather early feedback before the features are made generally available. Some functionality might only be available in API v0.6.

Get calculated time series

In this example we show how you can perform on-the-fly calculations to get a time series that is a function of one or more other time series. This is useful if you want to analyze or visualize, for example, the delta between two pressures, a rate, or other, more complex functions.

Query-time functions do not support aggregating the result or applying the function to raw data, as this will in general be too expensive and result in high latency. What is supported is evaluating functions on the aggregates. That is, we support (avg([id1]) + avg([id2])), but not avg([id1] + [id2]).

In addition to specifying raw time series in the query expression, you can define aliases for aggregated time series. An alias is a separate JSON object that specifies the time series ID, the aggregate to get, and the granularity. The alias can then be referenced in the function expression.

We allow mixing aggregates with raw data, as in [id1] - [average_of_id1], to give a graph of the deviation from the mean. In a similar vein, we allow specifying the granularity for each leaf node in the expression, so we can write [average_by_minute] - [average_by_hour] to graph how the minute averages deviate from the hourly average over time, as sketched below.
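For instance, the body of a request for [average_by_minute] - [average_by_hour] could look like the following sketch, using the alias syntax and endpoint described in the rest of this section (the time series ID and time range are the ones used in the examples below):

{
  "start": "1541210091980",
  "end": "1542210091980",
  "items": [
    {
      "function": "[average_by_minute]-[average_by_hour]",
      "name": "minute_vs_hour_deviation",
      "aliases": [
        {
          "aggregate": "avg",
          "alias": "average_by_minute",
          "granularity": "1m",
          "id": 127389341691737
        },
        {
          "aggregate": "avg",
          "alias": "average_by_hour",
          "granularity": "1h",
          "id": 127389341691737
        }
      ]
    }
  ]
}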

This functionality is currently only available in API v0.6 under the following endpoint.

If you simply want to plot the difference between two time series, you can call:

POST /api/0.6/projects/<project>/timeseries/dataquery
Host: api.cognitedata.com

Post body:

{
  "items": [
    {
      "function": "[127389341691737]-[6826834587953108]",
      "name": "delta_temperature",
      "start": "1541210091980",
      "end": "1542210091980"
    }
  ]
}

Response:

{
  "data": {
    "items": [
      {
        "name": "delta_temperature",
        "datapoints": [
          {
            "timestamp": 1541210400000,
            "value": 11.110638649316405
          },
          {
            "timestamp": 1541214000000,
            "value": 11.15609912846137
          },
          ...
          {
            "timestamp": 1542210071187,
            "value": 9.200000000000001
          },
          {
            "timestamp": 1542210082068,
            "value": 9.200000000000001
          }
        ]
      }
    ]
  }
}

Here the function describes the operation on the time series IDs (127389341691737 and 6826834587953108), evaluated within the start and end time. Supported operations are the standard SQL operations described here.
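As a rough illustration, the request above could be sent from Python with the requests library. This is a minimal sketch that assumes the api-key header for authentication and placeholder values for the project and key; adjust it to your own authentication setup:

import requests

API_KEY = "<your-api-key>"
PROJECT = "<your-project>"

# Experimental v0.6 dataquery endpoint
url = "https://api.cognitedata.com/api/0.6/projects/{}/timeseries/dataquery".format(PROJECT)

# Same body as in the example above
body = {
    "items": [
        {
            "function": "[127389341691737]-[6826834587953108]",
            "name": "delta_temperature",
            "start": "1541210091980",
            "end": "1542210091980"
        }
    ]
}

response = requests.post(url, json=body, headers={"api-key": API_KEY})
response.raise_for_status()

# Print the calculated datapoints
for datapoint in response.json()["data"]["items"][0]["datapoints"]:
    print(datapoint["timestamp"], datapoint["value"])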

If you want to specify more complex calculations, you can define aliases, which allow you to combine multiple aggregates and functions. As an example, we could calculate the deviation in temperature from the hourly average.

{
  "start": "1541210091980",
  "end": "1542210091980",
  "items": [
    {
      "function": "[insidetemp]-[avginsidetemp]",
      "name": "delta_temperature",
      "aliases": [
        {
          "aggregate": "avg",
          "alias": "avginsidetemp",
          "granularity": "1h",
          "id": 127389341691737
        },
        {
          "alias": "insidetemp",
          "id": 127389341691737
        }
      ]
    }
  ]
}

Response:

{
  "data": {
    "items": [
      {
        "name": "delta_temperature",
        "datapoints": [
          {
            "timestamp": 1541210094784,
            "value": 0.06935443055555623
          },
          {
            "timestamp": 1541210100548,
            "value": 0.06935443055555623
          },
          ...
          {
            "timestamp": 1542201392465,
            "value": 0.3948487499999995
          },
          {
            "timestamp": 1542201396948,
            "value": 0.3948487499999995
          }
        ]
      }
    ]
  }
}

This could be expanded further as needed. Supported aggregate operations are average/avg, max, min, count, sum, interpolation/int and stepinterpolation/step. The query parameter granularity must also be specified, and defines the individual periods to calculate over. Valid entries for the granularity parameter are day/d, hour/h, minute/m and second/s, or a multiple of these indicated by a number prefix, e.g. 12hour.
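For example, a request body that graphs the spread between the 12-hour maximum and minimum of a single time series could look like the sketch below (again reusing the time series ID and time range from the examples above):

{
  "start": "1541210091980",
  "end": "1542210091980",
  "items": [
    {
      "function": "[maxtemp]-[mintemp]",
      "name": "temperature_spread",
      "aliases": [
        {
          "aggregate": "max",
          "alias": "maxtemp",
          "granularity": "12hour",
          "id": 127389341691737
        },
        {
          "aggregate": "min",
          "alias": "mintemp",
          "granularity": "12hour",
          "id": 127389341691737
        }
      ]
    }
  ]
}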

Model Hosting

Pass data using a Data Spec

A data spec is used to specify which data to pass to a given analytics job. Data from different datasources, such as Time Series or Files, can be combined in a single data spec.

{
  "timeSeriesDataSpecs": [
    {
      "timeSeries": [
        {
          "id": "12345",
          "aggregates": ["agg1"],
          "missingDataStrategy": "strat1",
          "label": "mylabel"
        }
      ],
      "aggregates": ["agg1"],
      "granularity": "gran",
      "missingDataStrategy": "strat1",
      "start": 0,
      "end": 10,
      "label": "aUserSpecifiedLabel"
    }
  ],
  "filesDataSpec": {
      "fileIds": {"name1": 1234567, "name2": 12345678, "name3": 123456789, ...}
  }
}

Using the Data Transfer Service

The following example shows how you can use the Data Transfer Service to generate a data spec and download some time series data.

import json
from cognite.data_transfer_service import DataSpec, DataTransferService, TimeSeries, TimeSeriesDataSpec

API_KEY = "<your-api-key>"
PROJECT = "<your-project>"

# Define time series
ts1 = TimeSeries(id=4536445397018257, aggregates=["step"], missing_data_strategy="ffill", label="my_special_ts")
ts2 = TimeSeries(id=8953361644869258, label="my_other_special_ts")

# Define ts data spec
ts_data_spec = TimeSeriesDataSpec(
    time_series=[ts1, ts2],
    aggregates=["avg"],
    granularity="10m",
    label="my_special_dataframe",
    start=1522188000000,
    end=1522620000000,
)

# Define data spec
data_spec = DataSpec(time_series_data_specs=[ts_data_spec])

# Let's take a look at the data spec
print(data_spec)

# Instantiate data transfer service
dts = DataTransferService(data_spec, api_key=API_KEY, project=PROJECT)

# Download ts dataframe
df = dts.get_dataframe("my_special_dataframe")

# Output dataframe
print(df)
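
Since get_dataframe returns a regular pandas DataFrame, you can continue from here with ordinary pandas operations. As a small follow-up to the script above (the exact column names depend on the labels and aggregates in the data spec, so inspect them first):

# Inspect which columns the data spec produced
print(df.columns)

# Summary statistics per column
print(df.describe())

# Persist the result for later analysis
df.to_csv("my_special_dataframe.csv", index=False)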

Train and deploy a model in Model Hosting

The simplest path to training and deploying a model is:

  1. Create and upload a source package
  2. Create a Model
  3. Train a Model Version using provided training data and the uploaded source package

And then you can perform predictions with your model.

This Jupyter notebook walks you through these steps using a simple example model (you can find all the files here).

Schedule a model

If you want your model to monitor something or want to run some prediction regularly, you can set up model schedules. A schedule will regularly run predictions on a model with the specified input data and write the output back to a specified resource in CDP.

This Jupyter notebook walks you through a simple example of setting up a schedule (you can find all the files here).
