Using lakeFS with Amazon SageMaker

Amazon SageMaker helps you prepare, build, train, and deploy ML models quickly by bringing together a broad set of capabilities purpose-built for ML.

Initializing session and client

Initialize a SageMaker session and an S3 client with lakeFS as the endpoint:

```python
import sagemaker
import boto3

# lakeFS endpoint and credentials (fill in your own values)
endpoint_url = ''            # URL of your lakeFS installation
aws_access_key_id = ''       # lakeFS access key ID
aws_secret_access_key = ''   # lakeFS secret access key
repo = 'example-repo'        # lakeFS repository, used as the bucket name

sm = boto3.client(
    'sagemaker',
    endpoint_url=endpoint_url,
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
)

s3_resource = boto3.resource(
    's3',
    endpoint_url=endpoint_url,
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
)

session = sagemaker.Session(boto3.Session(), sagemaker_client=sm, default_bucket=repo)
session.s3_resource = s3_resource
```
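To sanity-check that the session is talking to lakeFS, you can list a few objects through `s3_resource` (a minimal sketch; it assumes `example-repo` already exists in lakeFS and that its `main` branch contains data):

```python
# With lakeFS serving the S3 API, the branch name is the first
# path segment of each object key.
for obj in s3_resource.Bucket(repo).objects.filter(Prefix='main/').limit(5):
    print(obj.key)
```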

Usage Examples

Upload train and test data

Let's use the session we created to upload data to the 'main' branch:

```python
prefix = "/prefix-within-branch"
branch = 'main'

# train_data and test_data_no_target are assumed to be existing pandas DataFrames.
train_file = 'train_data.csv'
train_data.to_csv(train_file, index=False, header=True)
train_data_s3_path = session.upload_data(path=train_file, key_prefix=branch + prefix + "/train")

test_file = 'test_data.csv'
test_data_no_target.to_csv(test_file, index=False, header=False)
test_data_s3_path = session.upload_data(path=test_file, key_prefix=branch + prefix + "/test")
```
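`upload_data` returns an `s3://<repo>/<branch><prefix>/...` style URI for each object. To confirm the writes landed on the branch, you can list the keys under the prefix (a small sketch reusing the variables above):

```python
print(train_data_s3_path)   # e.g. s3://example-repo/main/prefix-within-branch/train/train_data.csv
print(test_data_s3_path)

# List what is now stored under the prefix on the branch.
for obj in s3_resource.Bucket(repo).objects.filter(Prefix=branch + prefix):
    print(obj.key)
```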

Download objects

You can use the lakeFS integration to download whichever portion of the data you need:

```python
repo = 'example-repo'
prefix = "/prefix-to-download"
branch = 'main'
localpath = './' + branch

session.download_data(path=localpath, bucket=repo, key_prefix=branch + prefix)
```
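To see what was fetched, you can walk the local directory (a minimal sketch; the local layout mirrors the object keys relative to `key_prefix` and may vary slightly between SageMaker SDK versions):

```python
import os

# Print every file that download_data wrote under localpath.
for root, _, files in os.walk(localpath):
    for name in files:
        print(os.path.join(root, name))
```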

Note

Advanced AWS SageMaker features, like Autopilot jobs, are encapsulated and don't expose a way to override the S3 endpoint. However, it is possible to export the required inputs from lakeFS to S3, as sketched below.
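For example, you could copy an object out of lakeFS into a plain S3 bucket using a second boto3 client (a minimal sketch; `my-aws-bucket` and the object key are hypothetical, and the regular client authenticates with your AWS credentials rather than the lakeFS ones):

```python
import boto3

# A regular S3 client: no endpoint override, so it talks to AWS S3.
s3 = boto3.client('s3')

export_bucket = 'my-aws-bucket'                          # hypothetical S3 bucket
key = 'main/prefix-within-branch/train/train_data.csv'   # hypothetical object key

# Read the object from lakeFS through the s3_resource created earlier,
# then write it to S3 where encapsulated features like Autopilot can read it.
body = s3_resource.Object(repo, key).get()['Body'].read()
s3.put_object(Bucket=export_bucket, Key=key, Body=body)
```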

If you're using SageMaker features that aren't supported by lakeFS, we'd love to hear from you.