
Sync Cloud Storage

Connect your existing cloud storage bucket to Roset, sync file metadata, and auto-process files through the transformation pipeline. Roset reads your bucket's file listing without copying or proxying bytes -- your data stays in your storage.

The Workflow

Your Bucket --> Connect --> Sync Metadata --> Process Files --> Webhook Notification
  1. Connect your S3, GCS, Azure, or MinIO bucket
  2. Sync file metadata from the bucket into Roset
  3. Process synced files through the transformation pipeline
  4. Get notified via webhook when transformations complete

Supported Providers

| Provider | Setup Method | Bucket Types |
| --- | --- | --- |
| AWS | CloudFormation role | S3, S3-compatible |
| GCP | Service Account | GCS buckets |
| Azure | Service Principal | Blob containers |
| MinIO | Access Key | MinIO buckets |
| Cloudflare R2 | Access Key | R2 buckets |
| Supabase Storage | Access Key | Supabase buckets |

Step 1: Create a Connection

```python
import os
from roset import Client

client = Client(api_key=os.getenv("ROSET_API_KEY"))

# Link an S3 bucket to Roset
connection = client.connections.create(
    provider="s3",
    name="Production Bucket",
    bucket="my-company-files",
    region="us-east-1",
    prefix="uploads/",    # only sync files under this prefix
)
print(f"Connection: {connection['id']}")
```

Step 2: Test the Connection

Verify that Roset can reach your bucket and has the required permissions.

```python
result = client.connections.test(connection["id"])
print(result)  # {"success": True}
```
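A failed test usually means a misconfigured region or missing bucket permissions, so it is worth failing fast before attempting a sync. The sketch below assumes the test response is a dict with a `success` key, as shown above; the `error` field it reads on failure is hypothetical.

```python
def ensure_connection(client, connection_id):
    """Raise early if Roset cannot reach the bucket.

    Assumes the test response is a dict with a "success" key and,
    on failure, a hypothetical "error" message field.
    """
    result = client.connections.test(connection_id)
    if not result.get("success"):
        raise RuntimeError(
            f"Connection {connection_id} test failed: "
            f"{result.get('error', 'unknown error')}"
        )
    return result
```

Calling this right after `connections.create` surfaces permission problems before the first sync rather than during it.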

Step 3: Sync Files

Syncing enumerates files from your bucket and creates node records in Roset. This is a metadata-only operation -- Roset reads the file listing but does not transfer any file bytes.

```python
sync = client.connections.sync(connection["id"])
print(f"Synced {sync['synced_count']} files")
```

Step 4: Browse Synced Files

After syncing, browse files through the nodes API. Nodes mirror your bucket's directory structure.

```python
# List nodes from the synced connection
result = client.nodes.list(connection_id=connection["id"])
for node in result["nodes"]:
    print(f"{node['type']} {node['name']} ({node['size_bytes']} bytes)")

# Get a signed download URL (direct from your bucket)
download = client.nodes.download("node-xyz")
print(download["url"])
```
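Synced files can then be sent through the transformation pipeline (step 3 of the workflow). The exact processing call is not shown in this section; the sketch below assumes a `client.files.process(file_id)` method that accepts a file node's id and queues processing asynchronously, with completion reported via the webhook events registered in the next step.

```python
def process_synced_files(client, connection_id):
    """Queue every synced file node for transformation.

    Assumes a client.files.process(file_id) method that accepts a
    file node's id; folders in the listing are skipped.
    """
    result = client.nodes.list(connection_id=connection_id)
    processed = []
    for node in result["nodes"]:
        if node["type"] != "file":  # skip folder nodes
            continue
        client.files.process(node["id"])
        processed.append(node["id"])
    return processed
```

Because processing is asynchronous, the return value is just the list of queued node ids; rely on the `file.processing.completed` webhook event rather than polling.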

Step 5: Set Up Webhooks

Register a webhook to get notified when syncs complete and files finish processing.

```python
webhook = client.webhooks.create(
    url="https://example.com/roset-webhook",
    events=[
        "connection.synced",
        "file.processing.completed",
        "file.processing.failed",
    ],
)
print(f"Webhook: {webhook['id']}")
print(f"Secret: {webhook['secret']}")  # Save this for signature verification
```
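The secret lets your endpoint confirm that incoming requests really come from Roset. The actual signing scheme is specified in the Webhooks guide; the sketch below assumes a common pattern (an HMAC-SHA256 hex digest of the raw request body, delivered in a signature header whose name is hypothetical here).

```python
import hashlib
import hmac

def verify_signature(secret: str, body: bytes, signature: str) -> bool:
    """Check a webhook payload against an HMAC-SHA256 hex signature.

    Assumes the raw request body is signed with the webhook secret;
    see the Webhooks guide for the actual scheme and header name.
    """
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, avoiding timing side channels
    return hmac.compare_digest(expected, signature)
```

In a handler you would read the raw body before JSON-decoding it and reject the request if `verify_signature(secret, body, request.headers["X-Roset-Signature"])` returns `False` (header name hypothetical).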

Manage Connections

```python
# List all connections
result = client.connections.list()

# Get a specific connection
conn = client.connections.get("conn-abc123")

# Delete a connection (your bucket is unaffected)
client.connections.delete("conn-abc123")
```

Metadata Only

Deleting a connection removes the metadata records from Roset. Your files in the cloud storage bucket are never modified or deleted by Roset.


Next Steps

  • Transform Any File -- understand the transformation pipeline that processes synced files.
  • Webhooks -- deep dive on event types and signature verification.
  • API Reference -- full connection and node endpoint documentation.