Sync Cloud Storage
Connect your existing cloud storage bucket to Roset, sync file metadata, and auto-process files through the transformation pipeline. Roset reads your bucket's file listing without copying or proxying bytes -- your data stays in your storage.
The Workflow
Your Bucket --> Connect --> Sync Metadata --> Process Files --> Webhook Notification
- Connect your S3, GCS, Azure Blob, MinIO, Cloudflare R2, or Supabase Storage bucket
- Sync file metadata from the bucket into Roset
- Process synced files through the transformation pipeline
- Get notified via webhook when transformations complete
Supported Providers
| Provider | Setup Method | Bucket Types |
|---|---|---|
| AWS | CloudFormation role | S3, S3-compatible |
| GCP | Service Account | GCS buckets |
| Azure | Service Principal | Blob containers |
| MinIO | Access Key | MinIO buckets |
| Cloudflare R2 | Access Key | R2 buckets |
| Supabase Storage | Access Key | Supabase buckets |
Step 1: Create a Connection
import os

from roset import Client

client = Client(api_key=os.getenv("ROSET_API_KEY"))

# Link an S3 bucket to Roset
connection = client.connections.create(
    provider="s3",
    name="Production Bucket",
    bucket="my-company-files",
    region="us-east-1",
    prefix="uploads/",  # only sync files under this prefix
)
print(f"Connection: {connection['id']}")

Step 2: Test the Connection
Verify that Roset can reach your bucket and has the required permissions.
result = client.connections.test(connection["id"])
print(result)  # {"success": True}

Step 3: Sync Files
Syncing enumerates files from your bucket and creates node records in Roset. This is a metadata-only operation -- Roset reads the file listing but does not transfer any file bytes.
sync = client.connections.sync(connection["id"])
print(f"Synced {sync['synced_count']} files")

Step 4: Browse Synced Files
After syncing, browse files through the nodes API. Nodes mirror your bucket's directory structure.
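To illustrate what mirroring a directory structure means, the standalone sketch below turns a flat list of object keys into folder and file records. It is not Roset's implementation: `keys_to_nodes`, the `"folder"` and `"file"` type values, the `path` field, and the sample keys are all hypothetical and exist only for illustration.

```python
from pathlib import PurePosixPath


def keys_to_nodes(listing):
    """Turn a flat bucket listing into folder and file node records.

    `listing` is an iterable of (key, size_bytes) pairs. Record fields
    other than `name` and `size_bytes` are illustrative assumptions.
    """
    nodes = {}
    for key, size in listing:
        path = PurePosixPath(key)
        # Every ancestor directory of the key becomes a folder node.
        for parent in reversed(list(path.parents)):
            if str(parent) == ".":
                continue  # skip the implicit root
            nodes.setdefault(str(parent), {
                "type": "folder",
                "name": parent.name,
                "path": str(parent),
                "size_bytes": None,  # folders carry no size in this sketch
            })
        nodes[str(path)] = {
            "type": "file",
            "name": path.name,
            "path": str(path),
            "size_bytes": size,
        }
    return sorted(nodes.values(), key=lambda n: n["path"])


sample = [("uploads/2024/report.pdf", 52431), ("uploads/logo.png", 1024)]
for node in keys_to_nodes(sample):
    print(node["type"], node["path"])
```

Only keys and sizes are used here, which matches the metadata-only nature of a sync: no object contents are read.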
# List nodes from the synced connection
result = client.nodes.list(connection_id=connection["id"])
for node in result["nodes"]:
    print(f"{node['type']} {node['name']} ({node['size_bytes']} bytes)")

# Get a signed download URL (direct from your bucket)
download = client.nodes.download("node-xyz")
print(download["url"])

Step 5: Set Up Webhooks
Register a webhook to get notified when syncs complete and files finish processing.
webhook = client.webhooks.create(
    url="https://example.com/roset-webhook",
    events=[
        "connection.synced",
        "file.processing.completed",
        "file.processing.failed",
    ],
)
print(f"Webhook: {webhook['id']}")
print(f"Secret: {webhook['secret']}")  # Save this for signature verification

Manage Connections
# List all connections
result = client.connections.list()

# Get a specific connection
conn = client.connections.get("conn-abc123")

# Delete a connection (your bucket is unaffected)
client.connections.delete("conn-abc123")

Deleting a connection removes the metadata records from Roset. Your files in the cloud storage bucket are never modified or deleted by Roset.
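Because deletion only removes Roset-side metadata, cleaning up stale connections by name is a low-risk operation. The helper below is a sketch under assumptions: `delete_connections_named` is a hypothetical name, and the response shape of `client.connections.list()` (a dict with a `"connections"` list whose entries have `"id"` and `"name"`) is assumed by analogy with `client.nodes.list()` and is not confirmed by this page.

```python
def delete_connections_named(client, name):
    """Delete every connection whose name matches; return the deleted IDs."""
    result = client.connections.list()
    # Assumption: list() returns {"connections": [...]}, each entry having
    # "id" and "name" fields. Adjust to the actual response shape.
    matches = [c for c in result["connections"] if c.get("name") == name]
    for conn in matches:
        # Removes Roset metadata only; the bucket itself is untouched.
        client.connections.delete(conn["id"])
    return [conn["id"] for conn in matches]
```

Returning the deleted IDs makes it easy to log what was removed or to confirm that nothing matched.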
Next Steps
- Transform Any File -- understand the transformation pipeline that processes synced files.
- Webhooks -- deep dive on event types and signature verification.
- API Reference -- full connection and node endpoint documentation.