
Sync Cloud Storage

Connect your existing cloud storage bucket to Roset, sync file metadata, and auto-process files through the transformation pipeline. Roset reads your bucket's file listing without copying or proxying bytes -- your data stays in your storage.

The Workflow

Your Bucket --> Connect --> Sync Metadata --> Process Files --> Webhook Notification
  1. Connect your S3, GCS, Azure, or MinIO bucket
  2. Sync file metadata from the bucket into Roset
  3. Process synced files through the transformation pipeline
  4. Get notified via webhook when transformations complete

Supported Providers

| Provider | Setup Method | Bucket Types |
| --- | --- | --- |
| AWS | CloudFormation role | S3, S3-compatible |
| GCP | Service Account | GCS buckets |
| Azure | Service Principal | Blob containers |
| MinIO | Access Key | MinIO buckets |
| Cloudflare R2 | Access Key | R2 buckets |
| Supabase Storage | Access Key | Supabase buckets |

Step 1: Create a Connection

```python
import os
from roset import Client

client = Client(api_key=os.getenv("ROSET_API_KEY"))

# Link an S3 bucket to Roset
connection = client.connections.create(
    provider="s3",
    name="Production Bucket",
    bucket="my-company-files",
    region="us-east-1",
    prefix="uploads/",    # only sync files under this prefix
)
print(f"Connection: {connection['id']}")
```

Step 2: Test the Connection

Verify that Roset can reach your bucket and has the required permissions.

```python
result = client.connections.test(connection["id"])
print(result)  # {"success": True}
```
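A failed test usually means a misconfigured region or missing bucket permissions, so it is worth failing fast before attempting a sync. The sketch below assumes the test response is a dict with a `success` key, as shown above; the `error` field it reads on failure is hypothetical.

```python
def ensure_connection(client, connection_id):
    """Raise early if Roset cannot reach the bucket.

    Assumes the test response is a dict with a "success" key and,
    on failure, a hypothetical "error" message field.
    """
    result = client.connections.test(connection_id)
    if not result.get("success"):
        raise RuntimeError(
            f"Connection {connection_id} test failed: "
            f"{result.get('error', 'unknown error')}"
        )
    return result
```

Calling this right after `connections.create` surfaces permission problems before the first sync rather than during it.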

Step 3: Sync Files

Syncing enumerates files from your bucket and creates node records in Roset. This is a metadata-only operation -- Roset reads the file listing but does not transfer any file bytes.

```python
sync = client.connections.sync(connection["id"])
print(f"Synced {sync['synced_count']} files")
```

Step 4: Browse Synced Files

After syncing, browse files through the nodes API. Nodes mirror your bucket's directory structure.

```python
# List nodes from the synced connection
result = client.nodes.list(connection_id=connection["id"])
for node in result["nodes"]:
    print(f"{node['type']} {node['name']} ({node['size_bytes']} bytes)")

# Get a signed download URL (direct from your bucket)
download = client.nodes.download("node-xyz")
print(download["url"])
```
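Synced files can then be sent through the transformation pipeline (step 3 of the workflow). The exact processing call is not shown in this section; the sketch below assumes a `client.files.process(file_id)` method that accepts a file node's id and queues processing asynchronously, with completion reported via the webhook events registered in the next step.

```python
def process_synced_files(client, connection_id):
    """Queue every synced file node for transformation.

    Assumes a client.files.process(file_id) method that accepts a
    file node's id; folders in the listing are skipped.
    """
    result = client.nodes.list(connection_id=connection_id)
    processed = []
    for node in result["nodes"]:
        if node["type"] != "file":  # skip folder nodes
            continue
        client.files.process(node["id"])
        processed.append(node["id"])
    return processed
```

Because processing is asynchronous, the return value is just the list of queued node ids; rely on the `file.processing.completed` webhook event rather than polling.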

Step 5: Set Up Webhooks

Register a webhook to get notified when syncs complete and files finish processing.

```python
webhook = client.webhooks.create(
    url="https://example.com/roset-webhook",
    events=[
        "connection.synced",
        "file.processing.completed",
        "file.processing.failed",
    ],
)
print(f"Webhook: {webhook['id']}")
print(f"Secret: {webhook['secret']}")  # Save this for signature verification
```
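The secret lets your endpoint confirm that incoming requests really come from Roset. The actual signing scheme is specified in the Webhooks guide; the sketch below assumes a common pattern (an HMAC-SHA256 hex digest of the raw request body, delivered in a signature header whose name is hypothetical here).

```python
import hashlib
import hmac

def verify_signature(secret: str, body: bytes, signature: str) -> bool:
    """Check a webhook payload against an HMAC-SHA256 hex signature.

    Assumes the raw request body is signed with the webhook secret;
    see the Webhooks guide for the actual scheme and header name.
    """
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, avoiding timing side channels
    return hmac.compare_digest(expected, signature)
```

In a handler you would read the raw body before JSON-decoding it and reject the request if `verify_signature(secret, body, request.headers["X-Roset-Signature"])` returns `False` (header name hypothetical).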

Manage Connections

```python
# List all connections
result = client.connections.list()

# Get a specific connection
conn = client.connections.get("conn-abc123")

# Delete a connection (your bucket is unaffected)
client.connections.delete("conn-abc123")
```

Metadata Only

Deleting a connection removes the metadata records from Roset. Your files in the cloud storage bucket are never modified or deleted by Roset.


Next Steps

  • Transform Any File -- understand the transformation pipeline that processes synced files.
  • Webhooks -- deep dive on event types and signature verification.
  • API Reference -- full connection and node endpoint documentation.