API Documentation

This document describes the Python API available in the SDK. The RESTful API exposed by the Data Attribute Recommendation service itself is described in the SAP Help Portal.

The API exposed by the Python SDK either maps directly to a RESTful API of the service or provides a convenient wrapper around the RESTful API.

This document is split into two sections. The Public APIs are classes and methods that we expect to be the most useful. They interface directly with the Data Attribute Recommendation service.

The Internal APIs are classes and methods which are used internally by the SDK. A user of the SDK is less likely to deal with them in their day-to-day work. We still consider documentation for these parts useful to serve as a reference.

This Internal API is still part of the API contract: if there is a breaking change to either the Internal or the Public API, this will warrant a release with an updated major version number as required by the semantic versioning scheme.

Note

Before upgrading to a new major version release of the SDK, carefully check the changelog for any breaking changes that might impact you.

Public API

Workflows

A workflow orchestrates calls over several of the Data Attribute Recommendation microservices.

Train a model from a CSV file.

class sap.aibus.dar.client.workflow.model.ModelCreator(url: str, source: CredentialsSource)

This class provides a high-level means of training a model from a CSV file.

To construct an instance of this class, see the various construct_ methods such as construct_from_credentials() in BaseClient.

Internally, the class wraps and orchestrates DataManagerClient and ModelManagerClient.

__init__(url: str, source: CredentialsSource)

create(data_stream: BinaryIO, model_template_id: str, dataset_schema: dict, model_name: str) → dict

Trains a model from a CSV file.

Internally, this method creates the required DatasetSchema and Dataset entities, uploads the data and starts the training job. The method will block until the training job finishes.

Once this method returns, the Model named model_name can be deployed and used for inference.

This method will raise an Exception if an error occurs.

No cleanup is performed: if, for example, a TrainingJobFailed or TrainingJobTimeOut exception occurs, the previously created Dataset and DatasetSchema remain within the service and must be cleaned up manually.

Parameters

data_stream – binary stream containing a CSV file in UTF-8 encoding
model_template_id – the model template ID
dataset_schema – dataset schema as dict
model_name – name of the model to be trained

Raises

TrainingJobFailed – When training job has status FAILED
TrainingJobTimeOut – When training job takes too long
DatasetValidationTimeout – If validation takes too long
DatasetValidationFailed – If validation does not finish in state SUCCEEDED
ModelAlreadyExists – If model already exists at start of process

Returns

static format_dataset_name(model_name: str) → str

Derives a Dataset name from a Model name.

For the purpose of automation, we automatically create a Dataset name from a Model name.

Return value has no more than 255 characters.

Parameters: model_name – Model name
Returns: suitable Dataset name
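The naming scheme itself is not specified here. As a minimal sketch, assuming a hypothetical scheme that appends a random UUID to the Model name, the 255-character limit can be met by truncating the Model name rather than the suffix, so the suffix keeps names unique:

```python
import uuid

MAX_DATASET_NAME_LENGTH = 255  # limit stated in the docstring above


def format_dataset_name(model_name: str) -> str:
    """Hypothetical stand-in for ModelCreator.format_dataset_name()."""
    # Hypothetical suffix: a random UUID keeps names unique across runs.
    suffix = "-" + str(uuid.uuid4())
    # Truncate the model name, never the suffix, so the result fits the limit.
    prefix = model_name[: MAX_DATASET_NAME_LENGTH - len(suffix)]
    return prefix + suffix


print(len(format_dataset_name("x" * 1000)))  # → 255
```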

Data Manager

Client API for the Data Manager microservice.

sap.aibus.dar.client.data_manager_client.TIMEOUT_DATASET_VALIDATION = 14400: How long to wait for a dataset validation job to succeed.

class sap.aibus.dar.client.data_manager_client.DataManagerClient(url: str, credentials_source: CredentialsSource)

The client class for the DAR DataManager microservice.

This class implements all basic API calls as well as some convenience methods which wrap individual API calls.

All methods return the JSON response returned by the server as dict, unless indicated otherwise.

If an HTTP API call fails, all methods raise a DARHTTPException.

static polling_class() → Type[Polling]

Returns the Polling implementation used to wait on asynchronous processes.

This is rarely of interest to the end-user.

Returns: Polling implementation

create_dataset_schema(dataset_schema: dict) → dict

Creates a DatasetSchema.

Parameters: dataset_schema – a DatasetSchema as python dict
Returns: the newly created DatasetSchema as dict

read_dataset_schema_collection() → dict

Reads the collection of DatasetSchemas.

Returns: DatasetSchema collection as dict

read_dataset_schema_by_id(dataset_schema_id: str) → dict

Reads the DatasetSchema with the given dataset_schema_id.

Parameters: dataset_schema_id – ID of the DatasetSchema to be retrieved
Returns: a single DatasetSchema as dict

delete_dataset_schema_by_id(dataset_schema_id: str) → None

Deletes the DatasetSchema with the given dataset_schema_id.

Parameters: dataset_schema_id – ID of the DatasetSchema to be deleted
Returns: None

create_dataset(dataset_name: str, dataset_schema_id: str) → dict

Creates a Dataset with the given dataset_name and dataset_schema_id.

The dataset_schema_id must reference a previously created DatasetSchema (see create_dataset_schema()).

Parameters

dataset_name – Name of the Dataset to be created
dataset_schema_id – ID of DatasetSchema used for the Dataset

Returns

the newly created Dataset as dict

read_dataset_collection() → dict

Reads the collection of Datasets.

Returns: Dataset collection as dict

read_dataset_by_id(dataset_id: str) → dict

Reads the Dataset identified by the given dataset_id.

Parameters: dataset_id – ID of the Dataset to be retrieved
Returns: Dataset as dict

delete_dataset_by_id(dataset_id: str) → None

Deletes the Dataset identified by dataset_id.

Parameters: dataset_id – ID of the Dataset to be deleted
Returns: None

upload_data_to_dataset(dataset_id: str, data_stream: BinaryIO) → dict

Uploads data to a Dataset.

Data can only be uploaded once per Dataset. If the Dataset status is not NO_DATA, the server will return a corresponding error message.

During the upload process, the Dataset will have status UPLOADING. In this state, it is not possible to delete the Dataset. If the upload is interrupted (e.g. due to network problems), wait fifteen minutes before deleting the Dataset. After fifteen minutes, it is possible to delete the Dataset even if it is in status UPLOADING.

After the upload, the status of the dataset will be VALIDATING.

Data upload is an asynchronous process. After data upload, the dataset will be validated in a background process.

Use read_dataset_by_id() to poll the dataset until is_dataset_validation_finished() returns True. An implementation of this algorithm is available in wait_for_dataset_validation().

A blocking version of the entire process, including upload and validation, is available in upload_data_and_validate().

The data_stream parameter must be a stream which returns bytes. When reading from a file, simply open the file in binary mode:

# client is an existing DataManagerClient instance
with open("my_file.csv", mode="rb") as file_handle:
    client.upload_data_to_dataset(
        "your-dataset-identifier",
        file_handle,
    )

Note

The file must already be encoded in UTF-8 format. The DAR service only supports UTF-8. If you are using a GZIP file, ensure the content of the file prior to compression is encoded as UTF-8. If the file is not encoded as UTF-8, the service will reject the file during validation.
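When a CSV file uses a different encoding, one option is to re-encode it before uploading. A minimal sketch using only the Python standard library, assuming the source encoding is known in advance (Latin-1 below is only an example); to_utf8_stream is a hypothetical helper, not part of the SDK:

```python
import gzip
import io


def to_utf8_stream(raw: bytes, source_encoding: str, compress: bool = False) -> io.BytesIO:
    """Hypothetical helper: re-encode CSV bytes as UTF-8, optionally gzip them."""
    # Decode with the known source encoding, then encode as UTF-8.
    utf8_bytes = raw.decode(source_encoding).encode("utf-8")
    if compress:
        # Compress after encoding, so the uncompressed content is UTF-8.
        utf8_bytes = gzip.compress(utf8_bytes)
    return io.BytesIO(utf8_bytes)


latin1_csv = "id,description\n1,Café\n".encode("latin-1")
stream = to_utf8_stream(latin1_csv, "latin-1", compress=True)
print(gzip.decompress(stream.read()).decode("utf-8"))
```

The returned BytesIO can be passed as data_stream to upload_data_to_dataset(). Because the whole file is read into memory, this approach suits moderately sized files.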

Parameters

dataset_id – identifier of the dataset
data_stream – a data stream returning bytes

Returns

API response as dict

wait_for_dataset_validation(dataset_id: str, timeout_seconds: int = 14400) → dict

Waits for a Dataset to finish validation.

This method will return once the validation process is finished. Do check the status to ensure that the validation process is SUCCEEDED.

This will repeatedly retrieve the Dataset from the DAR service until the Dataset is no longer in status VALIDATING.

The timeout_seconds parameter determines how long the method waits for the validation to finish. Note that this is not a hard limit on the execution time of this method: after the timeout expires, the Dataset is retrieved one last time to check its status.

Returns the API response of the last GET on the Dataset.

Note

Retrieving the Dataset can add a significant amount of time beyond timeout_seconds due to network latency and service behavior. Unless overridden, the underlying HTTP implementation in DARSession uses its own timeouts to prevent HTTP requests from blocking the entire application.

Parameters

dataset_id – identifier of the dataset
timeout_seconds – how long to wait before giving up

Returns

API response of final GET on dataset

Raises

DARDatasetInvalidStateException – If the Dataset is in status NO_DATA or UPLOADING
DatasetValidationTimeout – If validation takes longer than timeout_seconds
DatasetValidationFailed – If validation does not finish in state SUCCEEDED
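The waiting behavior described above can be approximated by a plain polling loop. This is a simplified sketch, not the SDK implementation (which relies on the Polling class returned by polling_class()): read_dataset stands in for a call to read_dataset_by_id(), the five-second interval is an arbitrary assumption, and only the timeout case is raised, leaving failed statuses to the caller:

```python
import time
from typing import Callable


class DatasetValidationTimeout(Exception):
    """Local stand-in for the SDK exception of the same name."""


def wait_for_validation(
    read_dataset: Callable[[], dict],
    timeout_seconds: float,
    interval_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the Dataset leaves status VALIDATING or the timeout expires."""
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        dataset = read_dataset()
        if dataset["status"] != "VALIDATING":
            return dataset
        sleep(interval_seconds)
    # After the timeout, retrieve the Dataset one last time, as described above.
    dataset = read_dataset()
    if dataset["status"] == "VALIDATING":
        raise DatasetValidationTimeout(f"still validating after {timeout_seconds}s")
    return dataset


# Usage with a fake reader that finishes on the second poll:
responses = iter([{"status": "VALIDATING"}, {"status": "SUCCEEDED"}])
result = wait_for_validation(lambda: next(responses), timeout_seconds=60, sleep=lambda s: None)
print(result["status"])  # → SUCCEEDED
```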

upload_data_and_validate(dataset_id: str, data_stream: BinaryIO) → dict

Uploads a dataset and waits for validation to finish.

This is a simple wrapper around upload_data_to_dataset() and wait_for_dataset_validation(). See these methods for possible exceptions.

Parameters

dataset_id – identifier of the dataset
data_stream – a data stream returning bytes

Returns

API response of final GET on Dataset as dict

static is_dataset_validation_finished(dataset: dict) → bool

Returns True if a Dataset has a final state.

This does not imply that the Dataset validation is SUCCEEDED; it merely checks if the process has finished.

Also see is_dataset_validation_failed().

Parameters: dataset – Dataset Resource as returned by API
Returns: True if validation process is finished, successful or not
Raises: DatasetInvalidStateException – if validation has not yet started

static is_dataset_validation_failed(dataset: dict) → bool

Returns True if a Dataset validation has failed.

A return value of False does not imply that the Dataset was validated successfully. The Dataset is simply in a non-failed state. This can also be any non-final state.

Also see is_dataset_validation_finished().

Parameters: dataset – Dataset Resource as returned by API
Returns: True if Dataset validation has failed
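Based on the values of DatasetStatus, both checks can be sketched as set lookups. Which statuses count as failed final states (INVALID_DATA, VALIDATION_FAILED and PROGRAM_ERROR here) is an assumption derived from the status descriptions, and ValueError stands in for the SDK's invalid-state exception:

```python
NOT_STARTED = {"NO_DATA", "UPLOADING"}
FAILED = {"INVALID_DATA", "VALIDATION_FAILED", "PROGRAM_ERROR"}  # assumed
FINAL = FAILED | {"SUCCEEDED"}


def is_dataset_validation_finished(dataset: dict) -> bool:
    status = dataset["status"]
    if status in NOT_STARTED:
        # Mirrors the documented behavior: validation has not started yet.
        raise ValueError(f"validation not started, status is {status}")
    return status in FINAL


def is_dataset_validation_failed(dataset: dict) -> bool:
    # False for SUCCEEDED and also for non-final states such as VALIDATING.
    return dataset["status"] in FAILED
```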

Constants for the DataManagerClient.

class sap.aibus.dar.client.data_manager_constants.DatasetStatus(value)

Possible values for the status field of a Dataset.

See the section on Dataset Lifecycle in the official DAR documentation.

NO_DATA = 'NO_DATA': No data has been uploaded yet.

UPLOADING = 'UPLOADING': Data is currently being uploaded.

VALIDATING = 'VALIDATING': Validation is in process.

INVALID_DATA = 'INVALID_DATA': Uploaded data is invalid, e.g. not a CSV file or not matching the DatasetSchema.

VALIDATION_FAILED = 'VALIDATION_FAILED': Internal Server Error occurred during validation. Create a new Dataset.

PROGRAM_ERROR = 'PROGRAM_ERROR': Internal Server Error occurred during validation. Create a new Dataset.

SUCCEEDED = 'SUCCEEDED': Validation finished successfully. The Dataset may be used for training.

class sap.aibus.dar.client.data_manager_constants.DataManagerPaths

Endpoints for the DAR DataManager microservice.

ENDPOINT_DATASET_SCHEMA_COLLECTION = '/data-manager/api/v3/datasetSchemas': Path for the DatasetSchema collection

ENDPOINT_DATASET_COLLECTION = '/data-manager/api/v3/datasets': Path for the Dataset collection

static format_dataset_schemas_endpoint_by_id(identifier: str) → str

Returns the path of a DatasetSchema with given identifier.

>>> DataManagerPaths.format_dataset_schemas_endpoint_by_id('9ac12220-b0b2-45ec-a81b-5dd5ca6536e9')
'/data-manager/api/v3/datasetSchemas/9ac12220-b0b2-45ec-a81b-5dd5ca6536e9'

Parameters: identifier – ID of DatasetSchema
Returns: endpoint path component

static format_dataset_endpoint_by_id(identifier: str) → str

Returns the path of a Dataset with given identifier.

>>> DataManagerPaths.format_dataset_endpoint_by_id('9678dcdd-239e-4dfc-8795-5924152c97a3')
'/data-manager/api/v3/datasets/9678dcdd-239e-4dfc-8795-5924152c97a3'

Parameters: identifier – ID of Dataset
Returns: endpoint path component

classmethod format_data_endpoint_by_id(identifier: str) → str

Returns the path of the upload endpoint for a Dataset with given identifier.

>>> DataManagerPaths.format_data_endpoint_by_id('d862fcba-06b1-4eaa-93c1-a0b5980938f5')
'/data-manager/api/v3/datasets/d862fcba-06b1-4eaa-93c1-a0b5980938f5/data'

Parameters: identifier – ID of Dataset
Returns: endpoint path component
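These helpers amount to string formatting on top of the collection constants. A minimal equivalent that reproduces the doctest outputs above; the function names are hypothetical, and the results are path components to be appended to the service URL:

```python
ENDPOINT_DATASET_SCHEMA_COLLECTION = "/data-manager/api/v3/datasetSchemas"
ENDPOINT_DATASET_COLLECTION = "/data-manager/api/v3/datasets"


def dataset_schema_path(identifier: str) -> str:
    return f"{ENDPOINT_DATASET_SCHEMA_COLLECTION}/{identifier}"


def dataset_path(identifier: str) -> str:
    return f"{ENDPOINT_DATASET_COLLECTION}/{identifier}"


def dataset_data_path(identifier: str) -> str:
    # The upload endpoint is the Dataset path plus a "/data" suffix.
    return f"{dataset_path(identifier)}/data"


print(dataset_data_path("d862fcba-06b1-4eaa-93c1-a0b5980938f5"))
# → /data-manager/api/v3/datasets/d862fcba-06b1-4eaa-93c1-a0b5980938f5/data
```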

Model Manager

Client API for the Model Manager microservice.

sap.aibus.dar.client.model_manager_client.TIMEOUT_DEPLOYMENT_SECONDS = 1800: How long to wait for a deployment to succeed.

sap.aibus.dar.client.model_manager_client.INTERVALL_DEPLOYMENT_SECONDS = 45: How frequently to poll a deployment for its status

sap.aibus.dar.client.model_manager_client.TIMEOUT_TRAINING_JOB_SECONDS = 86400: How long to wait for a training job to succeed.

sap.aibus.dar.client.model_manager_client.INTERVALL_TRAINING_JOB_SECONDS = 60: How frequently to poll a training job for its status

class sap.aibus.dar.client.model_manager_client.ModelManagerClient(url: str, credentials_source: CredentialsSource)

The client class for the DAR ModelManager microservice.

This class implements all basic API calls as well as some convenience methods which wrap individual API calls.

All methods return the JSON response returned by the server as dict, unless indicated otherwise.

If an HTTP API call fails, all methods raise a DARHTTPException.

static polling_class() → Type[Polling]

Returns the Polling implementation used to wait on asynchronous processes.

This is rarely of interest to the end-user.

Returns: Polling implementation

read_model_template_collection() → dict

Reads the collection of ModelTemplates.

For details, see the section on Model Templates in the official DAR documentation.

Returns: ModelTemplate collection as dict

read_model_template_by_id(model_template_id: str) → dict

Reads the ModelTemplate with the given model_template_id.

For details, see the section on Model Templates in the official DAR documentation.

Parameters: model_template_id – ID of the ModelTemplate to be retrieved
Returns: a single ModelTemplate as dict

read_job_collection() → dict

Reads the collection of all Jobs.

Returns: Job collection as dict

read_job_by_id(job_id: str) → dict

Reads the Job with the given job_id.

Parameters: job_id – ID of the Job to be retrieved.
Returns: a single Job as dict
Raises: JobNotFound – when no Job with the given job_id is found

read_job_by_model_name(model_name: str) → dict

Reads the Job with the given model_name.

Parameters: model_name – name of the Model
Returns: a single Job as dict
Raises:

delete_job_by_id(job_id: str) → None

Deletes the Job with the given job_id.

Will raise a DARHTTPException if operation fails.

Parameters: job_id – ID of the Job to be deleted
Returns: None
Raises: DARHTTPException – if server returned an error

create_job(model_name: str, dataset_id: str, model_template_id: Optional[str] = None, business_blueprint_id: Optional[str] = None) → dict

Creates a training Job.

A training Job is an asynchronous process and can take a few minutes or even several hours, depending on the dataset and the system load.

Initially, the training job will be in status RUNNING or PENDING. Use read_job_by_id() to poll for status changes. Alternatively, use wait_for_job() to wait for the job to succeed.

A convenience method is available at create_job_and_wait() which will submit a job and wait for its completion.

Parameters

model_name – Name of the model to train
dataset_id – ID of a previously uploaded, valid dataset
model_template_id – Model template ID for training
business_blueprint_id – Business Blueprint template ID for training

Raises

CreateTrainingJobFailed – When both business_blueprint_id and model_template_id are provided, or when neither is provided

Returns

newly created Job as dict
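The documented rule is that exactly one of model_template_id and business_blueprint_id must be given. A minimal sketch of that check, with ValueError standing in for CreateTrainingJobFailed; the helper name is hypothetical, and the request payload is not shown because its field names are not documented here:

```python
from typing import Optional


def check_training_source(
    model_template_id: Optional[str] = None,
    business_blueprint_id: Optional[str] = None,
) -> str:
    """Hypothetical helper: return the single ID to train from."""
    # "is not None" so that only omitted arguments count as missing.
    given = [i for i in (model_template_id, business_blueprint_id) if i is not None]
    if len(given) != 1:
        raise ValueError(
            "provide exactly one of model_template_id or business_blueprint_id"
        )
    return given[0]
```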

create_job_and_wait(model_name: str, dataset_id: str, model_template_id: Optional[str] = None, business_blueprint_id: Optional[str] = None)

Starts a job and waits for the job to finish.

This method is a thin wrapper around create_job() and wait_for_job().

Parameters

model_name – Name of the model to train
dataset_id – ID of a previously uploaded, valid dataset
model_template_id – Model template ID for training
business_blueprint_id – Business Blueprint ID for training

Raises

TrainingJobFailed – When training job has status FAILED
TrainingJobTimeOut – When training job takes too long

Returns

API response as dict

wait_for_job(job_id: str) → dict

Waits for a job to finish.

Parameters

job_id – ID of job

Raises

TrainingJobFailed – When training job has status FAILED
TrainingJobTimeOut – When training job takes too long

Returns

Job resource from last API call

static is_job_finished(job_resource: dict) → bool

Returns True if a Job has a final state.

This does not imply that the Job was successful; it merely checks if the process has finished.

Also see is_job_failed().

Parameters: job_resource – Job resource as returned by API
Returns: True if Job is in final state

static is_job_failed(job_resource: dict) → bool

Returns True if a Job has failed.

A return value of False does not imply that the Job has finished successfully. The Job is simply in a non-failed state, e.g. in RUNNING.

Also see is_job_finished().

Parameters: job_resource – Job resource as returned by API
Returns: True if Job has failed
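Using the JobStatus values, the two checks can be sketched as follows. Treating SUCCEEDED and FAILED as the only final states is an assumption based on the status descriptions; the JobStatus class here is a local mirror, not an import from the SDK:

```python
from enum import Enum


class JobStatus(str, Enum):
    """Local mirror of the JobStatus values documented in this section."""

    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


def is_job_finished(job_resource: dict) -> bool:
    # A str-based Enum compares equal to the plain status string from the API.
    return job_resource["status"] in (JobStatus.SUCCEEDED, JobStatus.FAILED)


def is_job_failed(job_resource: dict) -> bool:
    return job_resource["status"] == JobStatus.FAILED
```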

read_model_collection() → dict

Reads the collection of trained Models.

Returns: Model collection as dict

read_model_by_name(model_name: str) → dict

Reads a Model by name.

Parameters: model_name – name of Model
Returns: a single Model as dict

delete_model_by_name(model_name: str) → None

Deletes a Model by name.

Parameters: model_name – name of Model to be deleted
Returns: None

read_deployment_collection() → dict

Reads the collection of Deployments.

A deployment is a deployed Model and can be used for Inference.

Returns: Deployment collection as dict

read_deployment_by_id(deployment_id: str) → dict

Reads a Deployment by ID.

Parameters: deployment_id – ID of the Deployment
Returns: a single Deployment as dict

create_deployment(model_name: str) → dict

Creates a Deployment for the given model_name.

The creation of a Deployment is an asynchronous process and can take several minutes.

Initially, the Deployment will be in status PENDING. Use read_deployment_by_id() or the higher-level wait_for_deployment() to poll for status changes.

Parameters: model_name – name of the Model to deploy
Returns: a single Deployment as dict

delete_deployment_by_id(deployment_id: str) → None

Deletes a Deployment by ID.

To delete a Deployment by Model name, see ensure_model_is_undeployed().

Parameters: deployment_id – ID of the Deployment to be deleted
Returns: None

ensure_model_is_undeployed(model_name: str) → Optional[str]

Ensures that a Model is not deployed.

If the given Model is deployed, the Deployment is deleted. The status of the Deployment is not considered here. Returns the Deployment ID in this case.

If the Model is not deployed, the method does nothing. It is not an error if the Model is not deployed. Returns None if the Model is not deployed.

This method is a thin wrapper around lookup_deployment_id_by_model_name() and delete_deployment_by_id().

Parameters: model_name – name of the model to undeploy
Returns: ID of the deleted Deployment or None
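The composition of lookup_deployment_id_by_model_name() and delete_deployment_by_id() can be sketched against a hypothetical in-memory stand-in for the client. FakeModelManager and its mapping of Deployment IDs to Model names are illustrative only; the real client makes HTTP calls to the service:

```python
from typing import Dict, Optional


class FakeModelManager:
    """Hypothetical in-memory stand-in; maps Deployment ID -> Model name."""

    def __init__(self, deployments: Dict[str, str]):
        self.deployments = dict(deployments)

    def lookup_deployment_id_by_model_name(self, model_name: str) -> Optional[str]:
        for deployment_id, name in self.deployments.items():
            if name == model_name:
                return deployment_id
        return None

    def delete_deployment_by_id(self, deployment_id: str) -> None:
        del self.deployments[deployment_id]

    def ensure_model_is_undeployed(self, model_name: str) -> Optional[str]:
        deployment_id = self.lookup_deployment_id_by_model_name(model_name)
        if deployment_id is not None:
            # Deleted regardless of the Deployment's status.
            self.delete_deployment_by_id(deployment_id)
        # None means the Model was not deployed, which is not an error.
        return deployment_id
```

Calling the method twice is safe: the second call finds nothing to delete and returns None.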

wait_for_deployment(deployment_id: str) → dict

Waits for a deployment to succeed.

Raises a DeploymentTimeOut if the Deployment process does not finish within a given timeout (TIMEOUT_DEPLOYMENT_SECONDS). Even after the exception has been raised, the Deployment can still succeed in the background.

Note

A Deployment in status SUCCEEDED can incur costs.

Parameters

deployment_id – ID of the Deployment

Raises

DeploymentTimeOut – If Deployment does not finish within timeout
DeploymentFailed – If Deployment fails

Returns

Deployment resource as returned by final API call

deploy_and_wait(model_name: str) → dict

Deploys a Model and waits for Deployment to succeed.

This method is a thin wrapper around create_deployment() and wait_for_deployment().

Parameters

model_name – Name of the Model to deploy

Raises

DeploymentTimeOut – If Deployment does not finish within timeout
DeploymentFailed – If Deployment fails

Returns

Deployment resource from final API call

ensure_deployment_exists(model_name: str) → dict

Ensures a Deployment exists and is not failed.

Deploys the given model_name if no Deployment exists yet. If the existing Deployment is in a failed state, it is deleted and a new Deployment is created.

Note that the newly created Deployment will be in state PENDING. See the remarks on create_deployment() and wait_for_deployment().

Parameters: model_name – Name of the Model to deploy
Returns: Deployment resource
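The decision logic can be sketched as a standalone function. The four callables are hypothetical stand-ins for lookup_deployment_id_by_model_name(), read_deployment_by_id(), delete_deployment_by_id() and create_deployment(). Whether STOPPED also counts as failed is not stated here, so the sketch assumes only FAILED does:

```python
from typing import Callable, Optional


def ensure_deployment_exists(
    model_name: str,
    lookup_id: Callable[[str], Optional[str]],
    read_deployment: Callable[[str], dict],
    delete_deployment: Callable[[str], None],
    create_deployment: Callable[[str], dict],
) -> dict:
    deployment_id = lookup_id(model_name)
    if deployment_id is None:
        # Not deployed yet: create a new Deployment (initially PENDING).
        return create_deployment(model_name)
    deployment = read_deployment(deployment_id)
    if deployment["status"] == "FAILED":
        # Failed Deployments are replaced rather than reused.
        delete_deployment(deployment_id)
        return create_deployment(model_name)
    return deployment
```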

lookup_deployment_id_by_model_name(model_name: str) → Optional[str]

Returns the Deployment ID for a given Model name.

If the Model is not deployed, this will return None.

Parameters: model_name – name of the Model to check
Returns: Deployment ID or None

static is_deployment_finished(deployment_resource: dict)

Returns True if a Deployment has a final state.

This does not imply that the Deployment is operational; it merely checks if the creation of the Deployment failed or succeeded.

Also see is_deployment_failed().

Parameters: deployment_resource – Deployment resource as returned by API
Returns: True if Deployment has final state

static is_deployment_failed(deployment_resource: dict)

Returns True if a Deployment has failed.

A return value of False does not imply that the Deployment is operational. The Deployment can also be in state PENDING.

Also see is_deployment_finished().

Parameters: deployment_resource – Deployment resource as returned by API
Returns: True if Deployment is failed

read_business_blueprint_template_collection() → dict

Reads the collection of BusinessBlueprintTemplates.

Returns: BusinessBlueprint collection as dict

read_business_blueprint_template_by_id(business_blueprint_id: str) → dict

Reads the BusinessBlueprintTemplate with the given business_blueprint_id.

Parameters: business_blueprint_id – ID of the BusinessBlueprint to be retrieved
Returns: a single BusinessBlueprintTemplate as dict

Constants for the ModelManagerClient.

class sap.aibus.dar.client.model_manager_constants.JobStatus(value)

Possible values for the status field of a Job.

See the section on Training Job Lifecycle in the official DAR documentation.

PENDING = 'PENDING': Job has been enqueued.

RUNNING = 'RUNNING': Job is now being processed.

SUCCEEDED = 'SUCCEEDED': Job finished successfully and Model is ready for Deployment.

FAILED = 'FAILED': Training Job failed. Please try again.

class sap.aibus.dar.client.model_manager_constants.DeploymentStatus(value)

Possible values for the status field of a Deployment.

See the section on Deployment Lifecycle in the official DAR documentation.

PENDING = 'PENDING': status PENDING for a Deployment

SUCCEEDED = 'SUCCEEDED': Deployment is successful and theMmodel can now be used for Inference.

FAILED = 'FAILED': Deployment has failed. Delete Deployment and deploy Model again.

STOPPED = 'STOPPED': Deployment is stopped (e.g. on trial accounts). Delete Deployment and deploy Model again.

class sap.aibus.dar.client.model_manager_constants.ModelManagerPaths

Endpoints for the DAR ModelManager microservice.

ENDPOINT_MODEL_TEMPLATE_COLLECTION = '/model-manager/api/v3/modelTemplates': Path for the ModelTemplate collection

ENDPOINT_JOB_COLLECTION = '/model-manager/api/v3/jobs': Path for Job collection

ENDPOINT_MODEL_COLLECTION = '/model-manager/api/v3/models': Path for the Model collection

ENDPOINT_DEPLOYMENT_COLLECTION = '/model-manager/api/v3/deployments': Path for the Deployment collection

ENDPOINT_BUSINESS_BLUEPRINT_TEMPLATE_COLLECTION = '/model-manager/api/v3/businessBlueprints': Path for the BusinessBlueprint collection

classmethod format_model_templates_endpoint_by_id(model_template_id: str) → str

Returns the path of a ModelTemplate with given identifier.

>>> ModelManagerPaths.format_model_templates_endpoint_by_id('d7810207-ca31-4d4d-9b5a-841a644fd81f')
'/model-manager/api/v3/modelTemplates/d7810207-ca31-4d4d-9b5a-841a644fd81f'

Parameters: model_template_id – identifier of ModelTemplate
Returns: endpoint, to be used as URL component

classmethod format_job_endpoint_by_id(job_id: str) → str

Returns the path of a Job with given identifier.

>>> ModelManagerPaths.format_job_endpoint_by_id('222936e3-0350-4cd2-903d-67cb712b6af6')
'/model-manager/api/v3/jobs/222936e3-0350-4cd2-903d-67cb712b6af6'

Parameters: job_id – identifier of job
Returns: endpoint, to be used as URL component

classmethod format_model_endpoint_by_name(model_name: str)

Returns the path of a Model with given name.

>>> ModelManagerPaths.format_model_endpoint_by_name('my-model')
'/model-manager/api/v3/models/my-model'

Parameters: model_name – name of the Model
Returns: endpoint, to be used as URL component

classmethod format_deployment_endpoint_by_id(deployment_id: str)

Returns the path of a Deployment with given identifier.

>>> ModelManagerPaths.format_deployment_endpoint_by_id('c45928f5-179c-451e-ae0d-ea33c26391ea')
'/model-manager/api/v3/deployments/c45928f5-179c-451e-ae0d-ea33c26391ea'

Parameters: deployment_id – identifier of the Deployment
Returns: endpoint, to be used as URL component

classmethod format_business_blueprint_endpoint_by_id(business_blueprint_id: str) → str

Returns the path of a BusinessBlueprintTemplate with given identifier.

>>> ModelManagerPaths.format_business_blueprint_endpoint_by_id('4788254b-0bad-4757-a67f-92d5b55f322d')
'/model-manager/api/v3/businessBlueprints/4788254b-0bad-4757-a67f-92d5b55f322d'

Parameters: business_blueprint_id – identifier of BusinessBlueprintTemplate
Returns: endpoint, to be used as URL component

Inference

Client API for the Inference microservice.

sap.aibus.dar.client.inference_client.LIMIT_OBJECTS_PER_CALL = 50: How many objects can be processed per inference request

sap.aibus.dar.client.inference_client.TOP_N = 1: How many labels to predict for a single object by default

class sap.aibus.dar.client.inference_client.InferenceClient(url: str, credentials_source: CredentialsSource)

A client for the DAR Inference microservice.

This class implements all basic API calls as well as some convenience methods which wrap individual API calls.

If an API call fails, all methods raise a DARHTTPException.

create_inference_request(model_name: str, objects: List[dict], top_n: int = 1, retry: bool = True) → dict

Performs inference for the given objects with model_name.

For each object in objects, returns the top_n best predictions.

The retry parameter determines whether to retry on HTTP errors indicated by the remote API endpoint or for other connection problems. See Resilience and Error Recovery for trade-offs involved here.

Note

The endpoint called by this method accepts at most LIMIT_OBJECTS_PER_CALL objects per request. See do_bulk_inference() to circumvent this limit.

Changed in version 0.13.0: The retry parameter now defaults to True. This increases reliability of the call. See corresponding note on do_bulk_inference().

Parameters

model_name – name of the model used for inference
objects – Objects to be classified
top_n – How many predictions to return per object
retry – whether to retry on errors. Default: True

Returns

API response

do_bulk_inference(model_name: str, objects: List[dict], top_n: int = 1, retry: bool = True, worker_count: int = 4) → List[Optional[dict]]

Performs bulk inference for larger collections.

For object collections larger than LIMIT_OBJECTS_PER_CALL, splits the data into several smaller Inference requests.

Requests are executed in parallel.

Returns the aggregated values of the predictions of the original API response as returned by create_inference_request(). If one of the inference requests to the service fails, an artificial prediction object is inserted with the labels key set to None for each of the objects in the failing request.

Example of a prediction object which indicates an error:

{
    'objectId': 'b5cbcb34-7ab9-4da5-b7ec-654c90757eb9',
    'labels': None,
    '_sdk_error': 'RequestException: Request Error'
}

In case the objects passed to this method do not contain the objectId field, the value is set to None in the error prediction object:

{
    'objectId': None,
    'labels': None,
    '_sdk_error': 'RequestException: Request Error'
}

Note

This method calls the inference endpoint multiple times to process all data. For non-trial service instances, each call will incur a cost.

To reduce the impact of a failed request, this method will retry failed requests.

There is a small chance that even retried requests will be charged, e.g. if a problem occurs with the request on the client side outside the control of the service and after the service has processed the request. To disable retry behavior simply pass retry=False to the method.

Typically, the default behavior of retry=True is safe and improves reliability of bulk inference greatly.

Changed in version 0.7.0: The default for the retry parameter changed from retry=False to retry=True for increased reliability in day-to-day operations.

Changed in version 0.12.0: Requests are now executed in parallel with up to four threads.

Errors are now handled in this method instead of raising an exception and discarding inference results from previous requests. For objects where the inference request did not succeed, a replacement dict object is placed in the returned list. This dict follows the format of the ObjectPrediction object sent by the service. To indicate that this is a client-side generated placeholder, the labels key for all ObjectPrediction dicts of the failed inference request has value None. A _sdk_error key is added with the Exception details.

New in version 0.12.0: The worker_count parameter allows fine-tuning the number of concurrent request threads. Set worker_count to 1 to disable concurrent execution of requests.

Parameters

model_name – name of the model used for inference
objects – Objects to be classified
top_n – How many predictions to return per object
retry – whether to retry on errors. Default: True
worker_count – maximum number of concurrent requests

Raises

InvalidWorkerCount – if the worker_count parameter is invalid

Returns

the aggregated ObjectPrediction dictionaries
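The batching and error handling described above can be sketched with concurrent.futures from the standard library. infer_batch is a hypothetical callable standing in for create_inference_request() that, for simplicity, returns the list of ObjectPrediction dicts directly rather than the full API response; ValueError stands in for InvalidWorkerCount (the exact validation rule is an assumption), and retries are omitted:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

LIMIT_OBJECTS_PER_CALL = 50  # value documented for the Inference client


def bulk_inference(
    objects: List[dict],
    infer_batch: Callable[[List[dict]], List[dict]],
    worker_count: int = 4,
) -> List[Optional[dict]]:
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    # Split into chunks no larger than the per-call limit.
    chunks = [
        objects[i : i + LIMIT_OBJECTS_PER_CALL]
        for i in range(0, len(objects), LIMIT_OBJECTS_PER_CALL)
    ]

    def run(chunk: List[dict]) -> List[dict]:
        try:
            return infer_batch(chunk)
        except Exception as exc:
            # One placeholder per object of the failed chunk, labels set to None.
            return [
                {
                    "objectId": obj.get("objectId"),
                    "labels": None,
                    "_sdk_error": f"{type(exc).__name__}: {exc}",
                }
                for obj in chunk
            ]

    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        # map() yields results in chunk order, so output lines up with input.
        results = pool.map(run, chunks)
        return [prediction for batch in results for prediction in batch]
```

Because map() preserves submission order, the aggregated list stays aligned with the input objects even though requests run concurrently; with worker_count=1 the pool processes one chunk at a time.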

create_inference_request_with_url(url: str, objects: List[dict], top_n: int = 1, retry: bool = True) → dict

Performs inference for the given objects against a fully-qualified URL. Instead of constructing the inference URL from the base URL and model name, a complete inference URL can be passed to this method.

Changed in version 0.13.0: The retry parameter now defaults to True. This increases reliability of the call. See corresponding note on do_bulk_inference().

Parameters

url – fully-qualified inference URL
objects – Objects to be classified
top_n – How many predictions to return per object
retry – whether to retry on errors. Default: True

Returns

API response

Constants for the InferenceClient.

class sap.aibus.dar.client.inference_constants.InferencePaths[source]

Endpoints for the DAR Inference microservice.

static format_inference_endpoint_by_name(model_name: str)[source]

Returns the path of an InferenceRequest for the given model_name.

>>> InferencePaths.format_inference_endpoint_by_name("test-model")
'/inference/api/v3/models/test-model/versions/1'

Parameters: model_name – name of the model
Returns: endpoint, to be used as URL component

Internal API

The Credentials Module

This module is concerned with retrieval of access tokens for the DAR service.

The code here is a low-level detail and should rarely be used by regular users. Instead, refer to the higher-level API.

class sap.aibus.dar.client.util.credentials.CredentialsSource[source]

Abstract base class for sources of DAR access tokens.

token() → str[source]

Returns an access token for the DAR service.

Must be implemented by subclasses.

Returns: the token as string

class sap.aibus.dar.client.util.credentials.StaticCredentialsSource(token: str)[source]

CredentialsSource which is configured with a single token.

This class is mainly useful for compatibility. It allows the use of tokens obtained by some other means or where no credentials are known.

__init__(token: str)[source]

Constructor.

Parameters: token – an existing DAR access token

token()[source]

Returns the DAR access token given during object construction.

Returns: pre-configured DAR access token
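One way to express the contract of CredentialsSource and StaticCredentialsSource in Python is an abc.ABC with an abstract token() method. Whether the SDK uses the abc module is not stated here. The class names TokenSource and StaticTokenSource below are hypothetical stand-ins, shown only to illustrate the structure.

```python
from abc import ABC, abstractmethod


class TokenSource(ABC):
    """Hypothetical stand-in for CredentialsSource."""

    @abstractmethod
    def token(self) -> str:
        """Return an access token; concrete subclasses must implement this."""


class StaticTokenSource(TokenSource):
    """Hypothetical stand-in for StaticCredentialsSource."""

    def __init__(self, token: str):
        self._token = token

    def token(self) -> str:
        # Always return the token given at construction time; it is never refreshed.
        return self._token
```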

class sap.aibus.dar.client.util.credentials.OnlineCredentialsSource(url: str, clientid: str, clientsecret: str, session: Optional[HttpMethodsProtocol] = None, timer: Optional[Callable[[], float]] = None)[source]

Retrieves a token from the authentication server.

The token will be cached internally for the validity period indicated by the authentication server. Once the token is expired, a new token is fetched. It is thus a good idea to keep a single instance of this class instead of re-creating an instance on demand.

The token caching is internal to this class and opaque to the caller.

__init__(url: str, clientid: str, clientsecret: str, session: Optional[HttpMethodsProtocol] = None, timer: Optional[Callable[[], float]] = None)[source]

Constructor.

The session and timer parameters are mainly useful for unit testing and have sensible defaults.

See construct_from_service_key() to create an instance from a service key instead of giving the individual parameters.

Parameters

url – URL of OAuth server from DAR credentials
clientid – clientid from DAR credentials
clientsecret – clientsecret from DAR credentials
session – Optional: HTTP session class
timer – Optional: Timer function used for caching
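The caching behavior described above can be sketched as follows. This is a minimal illustration, not the SDK's code. The fetch callable is a hypothetical stand-in for the HTTP request to the authentication server and returns the token together with its lifetime in seconds (OAuth 2.0 servers typically report this as expires_in). A production implementation might also refresh slightly before expiry. The injectable timer mirrors the purpose of the timer parameter: it makes the expiry logic testable.

```python
import time
from typing import Callable, Optional, Tuple


class CachingTokenSource:
    """Hypothetical sketch of per-instance token caching."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],  # hypothetical: returns (token, lifetime_seconds)
        timer: Optional[Callable[[], float]] = None,
    ):
        self._fetch = fetch
        self._timer = timer or time.monotonic
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def token(self) -> str:
        now = self._timer()
        if self._token is None or now >= self._expires_at:
            # No token yet, or the cached one has expired: fetch a new one.
            self._token, lifetime = self._fetch()
            self._expires_at = now + lifetime
        return self._token
```

Because the cache lives in the instance, re-creating the object for every call would fetch a new token each time. This is why the text above recommends keeping a single instance.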

classmethod construct_from_service_key(service_key: dict) → OnlineCredentialsSource[source]

Creates an instance from a DAR service key.

>>> # service_key is abbreviated from real example
>>> service_key = {
...  "uaa": {
...   "clientid": "sb-d3287831-4997-9deb-a09cf1dcf!b4321|dar-v3-std!b4321",
...   "clientsecret": "XXXXXX",
...   "url": "https://abcd.authentication.sap.hana.ondemand.com",
...  },
...  "url": "https://aiservices-dar.cfapps.xxx.hana.ondemand.com/"
... }
>>> source = OnlineCredentialsSource.construct_from_service_key(service_key)
>>> source.url
'https://abcd.authentication.sap.hana.ondemand.com'

Parameters: service_key – DAR service key as Python dictionary
Returns: CredentialsSource instance

token() → str[source]

Returns an access token for the DAR service.

A cached token is returned while it is still valid; otherwise, a new token is fetched from the authentication server.

Returns: the token as string

Exceptions

All exceptions raised by the DAR client implementation itself.

exception sap.aibus.dar.client.exceptions.DARException[source]

General error in the DAR client.

This is the base exception class for the DAR client. All exceptions raised by the client itself inherit from this class.

Note that other libraries used internally will raise their own exceptions. In particular, see DARSession for its use of HTTP libraries and their exceptions.

exception sap.aibus.dar.client.exceptions.HTTPSRequired[source]

URLs must use the HTTPS scheme.

__init__()[source]

exception sap.aibus.dar.client.exceptions.DARPollingTimeoutException[source]: Operation being polled took too long to finish.

exception sap.aibus.dar.client.exceptions.DatasetValidationTimeout[source]: Dataset took too long to finish its validation process.

exception sap.aibus.dar.client.exceptions.DatasetValidationFailed[source]: Dataset validation finished with a non-success state.

exception sap.aibus.dar.client.exceptions.InvalidStateException[source]: A resource was in an unexpected state.

exception sap.aibus.dar.client.exceptions.DatasetInvalidStateException[source]: Dataset was in an unexpected state.

exception sap.aibus.dar.client.exceptions.TrainingJobTimeOut[source]: Training took too long to finish.

exception sap.aibus.dar.client.exceptions.TrainingJobFailed[source]: Training job failed.

exception sap.aibus.dar.client.exceptions.DeploymentTimeOut[source]: Deployment took too long too succeed.

exception sap.aibus.dar.client.exceptions.DeploymentFailed[source]: Deployment finished with a non-success state.

exception sap.aibus.dar.client.exceptions.CreateTrainingJobFailed[source]: Create training job failed.

exception sap.aibus.dar.client.exceptions.JobNotFound[source]: Training job not found

exception sap.aibus.dar.client.exceptions.InvalidWorkerCount[source]: Invalid worker_count parameter is specified.

New in version 0.12.0.

exception sap.aibus.dar.client.exceptions.ModelAlreadyExists(model_name: str)[source]

Model already exists and must be deleted first.

Note that this exception is not raised by the ModelManagerClient itself, but by higher-level methods in ModelCreator and similar classes.

For methods interacting directly with the API, a conflicting request instead raises a DARHTTPException with an appropriate HTTP status code.

__init__(model_name: str)[source]

Constructor.

Param: model_name: Name of the model which already exists

exception sap.aibus.dar.client.exceptions.DARHTTPException(url: str, response: Response)[source]

Error occurred when talking to the DAR service over HTTP.

This exception exposes many debug-level details which are highly useful when investigating a problem with the service.

Note that this exception will only be used if the server actually sent a response. Connection problems can cause the connection to abort before a response is sent.

When creating a ticket, please include as much information as possible.

__init__(url: str, response: Response)[source]

property response: Response

The full requests.Response object.

Returns: the original API response object

property request: PreparedRequest

The full requests.PreparedRequest sent to the DAR service.

Returns: the original request object

property status_code: int

The HTTP status of the response.

Returns: response status code

property response_body: str

Returns response body.

The body is pretty-printed if it is JSON and returned as-is otherwise.

Returns: response body as string
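The "pretty-print if JSON, otherwise as-is" rule can be sketched with the json module. This is a minimal illustration. The indentation width and key sorting are assumptions, and the SDK's exact output formatting may differ.

```python
import json


def pretty_body(body: str) -> str:
    """Pretty-print `body` if it parses as JSON, otherwise return it unchanged."""
    try:
        parsed = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return body
    # Assumed formatting: two-space indentation, keys sorted for stable output.
    return json.dumps(parsed, indent=2, sort_keys=True)
```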

property response_reason: str

Returns the reason phrase sent along with the status code.

This can help to better understand why the server sent a given status code.

Returns: reason phrase as string

property correlation_id: Optional[str]

The correlation ID, if sent by the server.

The correlation ID is a technical identifier for individual requests and useful when investigating any problems encountered while processing a request.

Returns: correlation ID

property vcap_request_id: Optional[str]

The VCAP request ID, if sent by the server.

The VCAP request ID is a technical identifier for individual requests and useful when investigating any problems encountered while processing a request.

Returns: VCAP request ID

property server_header: Optional[str]

Value of the SERVER HTTP header, if sent by the server.

Returns: SERVER HTTP header.

property cf_router_error: Optional[str]

Value of the X-CF-RouteError header, if sent by the server.

Returns: X-CF-RouteError HTTP header.

classmethod create_from_response(url: str, response: Response)[source]

Factory method to create exception from a server response.

Parameters

url – URL of the request
response – response sent by the server

Returns

the exception object

property debug_message

Returns a debug message with useful details on request and response.

Returns: details on request and response

HTTP Connections

This module contains the HTTP Transport layer used to interact with the DAR service.

class sap.aibus.dar.client.dar_session.DARSession(base_url: str, credentials_source: CredentialsSource)[source]

An HTTP client for the DAR service.

This client provides some lower-level primitives to interact with the REST API of the DAR service.

The client is aware of the base URL of the service and all request methods expect the path component to be passed instead of the full URL.

All requests are authenticated.

The request methods return a requests.Response object. All methods can raise a DARHTTPException. The underlying requests library may raise requests.RequestException.

This class internally uses TimeoutRetrySession.

__init__(base_url: str, credentials_source: CredentialsSource)[source]

Constructor.

Example construction:

Parameters

base_url – Base URL of the service.
credentials_source – CredentialsSource used for authentication

get_from_endpoint(endpoint: str) → Response[source]

Performs GET request against endpoint.

Parameters: endpoint – Path component of URL
Returns: the requests.Response object.
Raise: DARHTTPException
Raise: RequestException

delete_from_endpoint(endpoint: str) → Response[source]

Performs DELETE request against endpoint.

Parameters: endpoint – Path component of URL
Returns: requests.Response
Raise: DARHTTPException
Raise: RequestException

post_to_endpoint(endpoint: str, payload: dict, retry: bool = False) → Response[source]

Performs POST request against endpoint.

The given payload is encoded as JSON and sent as the body of the request.

If retry is True, the request will be retried in case of errors. This includes HTTP error status codes in the response returned by the remote API endpoint as well as network issues such as read timeouts or connection resets. Note that errors occurring before the connection is initially established are always retried.

See Resilience and Error Recovery for trade-offs involved here.

Parameters

endpoint – Path component of URL
payload – Body of the request. Will be encoded to JSON.
retry – whether to retry on failed requests. Defaults to False.

Returns

requests.Response

Raise

DARHTTPException

Raise

RequestException

post_data_to_endpoint(endpoint: str, data_stream: BinaryIO) → Response[source]

Performs POST request with raw data against endpoint.

The data_stream argument must be a binary file or a compatible object. In practice, data_stream must have a read() method which returns bytes, not str.

Parameters

endpoint – Path component of URL
data_stream – data to be uploaded as a file-like object

Returns

requests.Response

Raise

DARHTTPException

Raise

RequestException

post_to_url(url: str, payload: dict, retry: bool = False) → Response[source]

Performs POST request against a fully-qualified URL.

Parameters

url – a fully-qualified inference URL
payload – request body
retry – enables retrying a failed request

This module contains implementations of best practices for the interaction with other services over HTTP.

class sap.aibus.dar.client.util.http_transport.HttpMethodsProtocol(*args, **kwds)[source]

A protocol describing a basic HTTP client.

This is a Protocol to support structural subtyping via mypy. In the Java world, this would be similar to an Interface.

request(*wargs, **kwargs) → Response[source]

get(*args, **kwargs) → Response[source]

post(*args, **kwargs) → Response[source]

put(*args, **kwargs) → Response[source]

delete(*args, **kwargs) → Response[source]

patch(*args, **kwargs) → Response[source]

__init__(*args, **kwargs)
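HttpMethodsProtocol relies on structural subtyping: a type checker such as mypy accepts any object with matching methods, without requiring it to inherit from the protocol. The following minimal sketch uses a hypothetical, trimmed-down protocol that only declares get(). typing.Protocol requires Python 3.8 or later and is checked statically only. Nothing is enforced at runtime unless the protocol is decorated with typing.runtime_checkable.

```python
from typing import Any, Protocol


class SupportsGet(Protocol):
    """Hypothetical, trimmed-down variant of HttpMethodsProtocol."""

    def get(self, *args: Any, **kwargs: Any) -> Any:
        ...


class FakeSession:
    # Does not inherit from SupportsGet; a matching get() method is enough for mypy.
    def get(self, *args: Any, **kwargs: Any) -> Any:
        return ("GET", args, kwargs)


def fetch(session: SupportsGet, url: str) -> Any:
    # Type-checks for any object that structurally provides get().
    return session.get(url)
```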

class sap.aibus.dar.client.util.http_transport.HttpMethodsMixin(*args, **kwds)[source]

A mixin dispatching common HTTP methods to a session property.

default_kwargs() → dict[source]

A default set of keyword arguments to be passed to each invocation of a HTTP method on the session.

This default implementation returns an empty dictionary.

Returns: an empty dictionary

post(*args, **kwargs)[source]

Invokes the post method with given arguments on the session.

Parameters

*args – Any args to be passed to session.post
**kwargs – Any keyword args to be passed to session.post

Returns

the return value of session.post

get(*args, **kwargs)[source]

Invokes the get method with given arguments on the session.

Parameters

*args – Any args to be passed to session.get
**kwargs – Any keyword args to be passed to session.get

Returns: the return value of session.get

request(*args, **kwargs)[source]

Invokes the request method with given arguments on the session.

Parameters

*args – Any args to be passed to session.request
**kwargs – Any keyword args to be passed to session.request

Returns: the return value of session.request

put(*args, **kwargs)[source]

Invokes the put method with given arguments on the session.

Parameters

*args – Any args to be passed to session.put
**kwargs – Any keyword args to be passed to session.put

Returns

the return value of session.put

delete(*args, **kwargs)[source]

Invokes the delete method with given arguments on the session.

Parameters

*args – Any args to be passed to session.delete
**kwargs – Any keyword args to be passed to session.delete

Returns: the return value of session.delete

patch(*args, **kwargs)[source]

Invokes the patch method with given arguments on the session.

Parameters

*args – Any args to be passed to session.patch
**kwargs – Any keyword args to be passed to session.patch

Returns

the return value of session.patch

property adapters

Returns adapters of internally used session.

This is mainly useful for unit tests.

class sap.aibus.dar.client.util.http_transport.RetrySession(*args, **kwds)[source]

HTTP connection with retry built-in.

Retries are enabled for the GET, PUT and DELETE HTTP methods.

__init__(num_retries: int, session: Optional[Session] = None, backoff_factor: float = 0.05, status_forcelist: Tuple = (413, 429, 500, 502, 503, 504))[source]

Constructor.

Parameters

num_retries – number of retries; used as the limit for the total number of retries as well as for retries on connection errors, read errors and bad status codes
session – requests session
backoff_factor – factor that controls delay between single retry attempts
status_forcelist – a set of integer HTTP response codes that will lead to retry.
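According to the remarks on PostRetrySession, the underlying retry implementation is urllib3's Retry. The following stdlib-only sketch illustrates the same idea under simplifying assumptions, and is not the SDK's code:

- Only status-code retries are modeled; connection and read errors are not.
- send is a hypothetical callable that performs the request and returns the status code.
- The exponential backoff schedule is an assumption and need not match urllib3's exact timings.

Adding "POST" to allowed_methods corresponds to what PostRetrySession does.

```python
import time
from typing import Callable, Collection, Tuple

# Defaults taken from the RetrySession documentation above.
RETRY_METHODS = frozenset({"GET", "PUT", "DELETE"})
STATUS_FORCELIST: Tuple[int, ...] = (413, 429, 500, 502, 503, 504)


def send_with_retry(
    method: str,
    send: Callable[[], int],  # hypothetical: performs the request, returns the status code
    num_retries: int = 3,
    backoff_factor: float = 0.05,
    status_forcelist: Collection[int] = STATUS_FORCELIST,
    allowed_methods: Collection[str] = RETRY_METHODS,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    attempt = 0
    while True:
        status = send()
        retryable = method in allowed_methods and status in status_forcelist
        if not retryable or attempt >= num_retries:
            return status
        # Assumed exponential backoff: backoff_factor, then 2x, 4x, ... between attempts.
        sleep(backoff_factor * (2 ** attempt))
        attempt += 1
```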

class sap.aibus.dar.client.util.http_transport.PostRetrySession(*args, **kwds)[source]

A RetrySession with retry enabled for POST requests.

This is identical to RetrySession, but also enables retries for POST requests, which RetrySession does not retry by default. POST is not an idempotent method and is thus not guaranteed to be safe to retry.

This class should only be used with endpoints where retrying will not lead to undesired side-effects or where the side-effect is tolerable.

Note that connection-related errors which occur before the initial connection is established are always retried, no matter if the POST HTTP method is enabled for retries or not. For details, refer to the underlying implementation: see the documentation on the connect parameter in urllib3.util.retry.Retry.

See Resilience and Error Recovery for trade-offs involved here.

class sap.aibus.dar.client.util.http_transport.TimeoutSession(*args, **kwds)[source]

Session implementing timeouts to prevent HTTP connections from blocking indefinitely.

By default, the requests module does not set a timeout, so a connection can hang indefinitely. This class implements a sane timeout policy.

Note that this class does not protect against slow connections: if the server sends one byte per second, the timeout will not expire (unless set to < 1s). The read timeout only applies to the intervals between data transfers.

__init__(session: Optional[HttpMethodsProtocol] = None, connect_timeout: float = 240, read_timeout: float = 240)[source]

Constructor.

Parameters

session – requests Session or compatible
connect_timeout – timeout for establishing the connection, in seconds
read_timeout – maximum time between bytes after connecting, in seconds

default_kwargs() → dict[source]

Implements the timeout policy.

Returns: keyword args implementing the timeout policy.
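Together, HttpMethodsMixin and TimeoutSession form a simple pattern: each HTTP method call is forwarded to the wrapped session with the output of default_kwargs() merged in. The following minimal sketch shows that pattern for get() only. It assumes that keyword arguments passed by the caller override the defaults, since the SDK's precedence rule is not documented here. RecordingSession and TimeoutWrapper are hypothetical names. The (connect, read) tuple form of the timeout argument is accepted by requests.

```python
from typing import Any, Dict


class RecordingSession:
    """Hypothetical stand-in for requests.Session that echoes its arguments."""

    def get(self, *args: Any, **kwargs: Any) -> Dict[str, Any]:
        return {"args": args, "kwargs": kwargs}


class TimeoutWrapper:
    """Sketch: inject a default timeout into every get() on the wrapped session."""

    def __init__(self, session: Any, connect_timeout: float = 240, read_timeout: float = 240):
        self.session = session
        self.connect_timeout = connect_timeout
        self.read_timeout = read_timeout

    def default_kwargs(self) -> Dict[str, Any]:
        # requests accepts a (connect timeout, read timeout) tuple, in seconds.
        return {"timeout": (self.connect_timeout, self.read_timeout)}

    def get(self, *args: Any, **kwargs: Any) -> Any:
        # Assumption: explicit caller kwargs take precedence over the defaults.
        merged = {**self.default_kwargs(), **kwargs}
        return self.session.get(*args, **merged)
```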

class sap.aibus.dar.client.util.http_transport.TimeoutRetrySession(*args, **kwds)[source]

A session combining timeout and retry policies.

If a request times out, it is retried.

This can be tested manually as follows:


>>> sess = TimeoutRetrySession(read_timeout=1)
>>> # Remove the +SKIP flag below to execute the next line
>>> sess.get('https://httpstat.us/200?sleep=2000')  # doctest: +SKIP
Traceback (most recent call last):
...
requests.exceptions.ConnectionError: ... Max retries exceeded with url: ...

__init__(num_retries: int = 7, connect_timeout: float = 240, read_timeout: float = 240)[source]

Constructor.

See TimeoutSession for a discussion of the values.

Parameters

num_retries – number of retries
connect_timeout – connect timeout
read_timeout – read timeout

class sap.aibus.dar.client.util.http_transport.TimeoutPostRetrySession(*args, **kwds)[source]

A TimeoutRetrySession which retries on POST.

This is identical to TimeoutRetrySession, but uses PostRetrySession internally to implement retries for POST.

Note that retrying POST requests is not always safe. See the remarks on PostRetrySession.

sap.aibus.dar.client.util.http_transport.enforce_https_except_localhost(url: str)[source]

Raises an HTTPSRequired exception if the URL does not use HTTPS, unless it points to localhost.

Parameters: url – URL to be checked
Returns: None
Raises: HTTPSRequired – if given url does not start with https
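A check of this kind can be sketched with urllib.parse. This is a minimal illustration only. The set of hosts treated as "localhost" and the restriction to plain http for them are assumptions, not the SDK's documented behavior. HTTPSRequiredError and require_https_except_localhost are hypothetical names.

```python
from urllib.parse import urlparse


class HTTPSRequiredError(Exception):
    """Hypothetical stand-in for sap.aibus.dar.client.exceptions.HTTPSRequired."""


# Assumption: which host names count as local development servers.
LOCAL_HOSTS = ("localhost", "127.0.0.1")


def require_https_except_localhost(url: str) -> None:
    parsed = urlparse(url)
    if parsed.scheme == "https":
        return
    # Plain HTTP is tolerated only for local development servers.
    if parsed.scheme == "http" and parsed.hostname in LOCAL_HOSTS:
        return
    raise HTTPSRequiredError(url)
```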

Base Class for Client Classes

Shared infrastructure for microservice clients.

class sap.aibus.dar.client.base_client.BaseClient(url: str, credentials_source: CredentialsSource)[source]

Shared base class for all clients.

Contains shared class construction methods.

__init__(url: str, credentials_source: CredentialsSource)[source]

classmethod construct_from_credentials(dar_url: str, clientid: str, clientsecret: str, uaa_url: str) → DARClient[source]

Constructs a DARClient from credentials.

The credentials can be obtained from a service key. If a service key is available, see construct_from_service_key().

Parameters

dar_url – Service URL
clientid – Client ID
clientsecret – Client Secret
uaa_url – Authentication URL

Returns

the client instance

classmethod construct_from_service_key(service_key: dict) → DARClient[source]

Constructs a DARClient from a service key.

The service key should be provided as a Python dict after decoding it from JSON.

Parameters: service_key – DAR service key
Returns: the client instance
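The following minimal sketch maps a service key to the four arguments of construct_from_credentials(). It is based on the abbreviated service key structure shown in the OnlineCredentialsSource.construct_from_service_key() doctest. Real service keys contain more fields, and whether the SDK performs exactly this mapping is an assumption. The helper name is hypothetical.

```python
from typing import Dict


def credentials_from_service_key(service_key: dict) -> Dict[str, str]:
    """Hypothetical helper: map a DAR service key to construct_from_credentials() args."""
    uaa = service_key["uaa"]  # authentication server section of the key
    return {
        "dar_url": service_key["url"],
        "clientid": uaa["clientid"],
        "clientsecret": uaa["clientsecret"],
        "uaa_url": uaa["url"],
    }
```

The keys of the returned dictionary match the parameter names of construct_from_credentials(), so it could be passed as construct_from_credentials(**credentials).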

classmethod construct_from_jwt(dar_url: str, token: str) → DARClient[source]

Constructs a DARClient from service URL and a static token.

This is useful if a pre-existing token should be used instead of retrieving new tokens at runtime.

Note

Tokens expire after a certain amount of time, usually after several hours. It is preferable to use construct_from_service_key() or construct_from_credentials().

Parameters

dar_url – Service URL
token – Service token

Returns

the client instance
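Because construct_from_jwt() uses a static token, calls start failing once that token expires. When diagnosing such failures, it can help to look at the token's exp claim. The following minimal sketch assumes the token is a JWT in compact serialization whose payload carries an exp claim in seconds since the Unix epoch, as defined in RFC 7519. The signature is not verified, so use this for diagnostics only. The helper names are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> int:
    """Return the exp claim of a JWT without verifying its signature."""
    payload = token.split(".")[1]
    # JWT segments are base64url-encoded without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return int(json.loads(base64.urlsafe_b64decode(payload))["exp"])


def seconds_until_expiry(token: str, now: Optional[float] = None) -> float:
    # Negative values mean the token has already expired.
    current = time.time() if now is None else now
    return jwt_expiry(token) - current
```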

classmethod construct_from_cf_env() → DARClient[source]

Constructs a DARClient from a service binding in a Cloud Foundry app.

This is useful when the SDK is used in a Cloud Foundry application on the SAP Business Technology Platform where the application is bound to an instance of the Data Attribute Recommendation service.

This constructor assumes that only one instance of the service is bound to the app.

Returns: the client instance
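Cloud Foundry exposes bound services to an application in the VCAP_SERVICES environment variable. Its value is a JSON object keyed by service label, and each entry is a list of bindings with a credentials object. The following minimal sketch extracts the credentials of the single bound instance. The label is a parameter because the DAR service label is not given here, and the SDK's actual lookup logic may differ. The helper name is hypothetical.

```python
import json
import os
from typing import Mapping, Optional


def service_key_from_vcap(label: str, environ: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: return credentials of the single instance bound under `label`."""
    env = os.environ if environ is None else environ
    services = json.loads(env["VCAP_SERVICES"])
    bindings = services.get(label, [])
    # Mirrors the documented assumption that exactly one instance is bound.
    if len(bindings) != 1:
        raise ValueError(f"expected exactly one binding for {label!r}, found {len(bindings)}")
    return bindings[0]["credentials"]
```

The returned dictionary has the shape of a service key, so it could then be handed to construct_from_service_key().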

class sap.aibus.dar.client.base_client.BaseClientWithSession(url: str, credentials_source: CredentialsSource)[source]

Base class for individual microservice clients.

__init__(url: str, credentials_source: CredentialsSource)[source]

Utilities

This module contains a busy-wait polling implementation.

exception sap.aibus.dar.client.util.polling.PollingTimeoutException[source]: Exception to indicate that polling did not suceed before timeout.

class sap.aibus.dar.client.util.polling.Polling(intervall_seconds: int = 30, timeout_seconds: int = 14400)[source]

Simple busy-wait polling implementation: execute until a condition becomes true.

__init__(intervall_seconds: int = 30, timeout_seconds: int = 14400)[source]

static sleep(how_long: float) → None[source]

Sleeps for a certain amount of time.

Parameters: how_long – how long to sleep, in seconds
Returns: None

static timer() → float[source]

Returns the current timer value in seconds.

Note that this value does not necessarily correspond to the system clock or the wall clock.

The Python documentation for the internally used time.monotonic() states:

The reference point of the returned value is undefined, so that only the difference between the results of consecutive calls is valid.

Returns: current timer value

poll_until_success(polling_function: Callable[[], PolledItem], success_function: Callable[[PolledItem], bool]) → PolledItem[source]

Calls polling_function until success_function returns True.

The output of the polling_function will be the input to the success_function. The polling_function will be called repeatedly until the success_function returns True.

Between calls to polling_function, this method will sleep.

Parameters

polling_function – Function which retrieves an item
success_function – Function which checks item for success

Raises

PollingTimeoutException

Returns

final output of polling_function
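The busy-wait loop described above can be sketched as follows. The defaults mirror the 30-second interval and 14400-second timeout of the Polling constructor. Everything else is a hypothetical illustration: the parameter name interval_seconds, the PollingTimeout class, and the injectable timer and sleep functions, which make the loop testable without real waiting. The SDK's exact check order may differ.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class PollingTimeout(Exception):
    """Hypothetical stand-in for PollingTimeoutException."""


def poll_until_success(
    polling_function: Callable[[], T],
    success_function: Callable[[T], bool],
    interval_seconds: float = 30,
    timeout_seconds: float = 14400,
    timer: Callable[[], float] = time.monotonic,  # monotonic: only differences matter
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    deadline = timer() + timeout_seconds
    while True:
        item = polling_function()
        if success_function(item):
            return item
        if timer() >= deadline:
            raise PollingTimeout(f"no success within {timeout_seconds} seconds")
        sleep(interval_seconds)
```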

Logging functionality.

class sap.aibus.dar.client.util.logging.LoggerMixin[source]

A log mixin. Provides a log property.

property log: Logger

Returns a log instance for this class.

Returns: log for this class

static setup_basic_logging(debug=False) → None[source]

Initializes basic logging to stdout.

This is ideal for use in scripts to observe what actions the client library is performing.

It is not recommended to call this if the library is used in a bigger project, where usually custom logging setup is desired.

Utilities for lists.

sap.aibus.dar.client.util.lists.split_list(input_list: List[Item], slice_size: int) → Iterator[List[Item]][source]

Yields sub-lists of the input_list of size slice_size or less.

Parameters

input_list – input_list to be divided
slice_size – maximum number of items per sub-list

Returns

a generator
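A minimal generator matching the documented signature is sketched below. The ValueError guard for non-positive slice sizes is an assumption that is not part of the documentation. Because this is a generator, the guard only triggers once iteration starts.

```python
from typing import Iterator, List, TypeVar

Item = TypeVar("Item")


def split_list(input_list: List[Item], slice_size: int) -> Iterator[List[Item]]:
    """Yield consecutive sub-lists of at most slice_size items (sketch)."""
    if slice_size < 1:
        # Assumed guard; raised on the first next() call, not at call time.
        raise ValueError("slice_size must be positive")
    for start in range(0, len(input_list), slice_size):
        # The last slice may be shorter than slice_size.
        yield input_list[start:start + slice_size]
```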