Quickstart
Upload a document, let Roset orchestrate extraction, and retrieve structured markdown -- all in under 5 minutes. This guide walks through the complete flow: upload, wait for processing, and retrieve variants.
Prerequisites
Before you start, make sure you have:
- A Roset account at console.roset.dev
- An API key (starts with rsk_) from Settings > API Keys
- Python 3.9+ or Node.js 18+
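The quickstart code reads the key from the ROSET_API_KEY environment variable. Below is a minimal sketch for failing fast when that variable is missing or malformed. load_roset_api_key is a hypothetical helper, and the rsk_ prefix check only mirrors the key format described above, so it is a local sanity check rather than anything Roset enforces.

```python
import os


def load_roset_api_key(env_var: str = "ROSET_API_KEY") -> str:
    """Read the Roset API key from the environment and sanity-check its prefix.

    Hypothetical helper: the rsk_ check mirrors the documented key format and is
    a local convenience, not a validation performed by Roset itself.
    """
    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; create a key under Settings > API Keys")
    if not key.startswith("rsk_"):
        raise ValueError(f"{env_var} does not look like a Roset key (expected the rsk_ prefix)")
    return key
```

With the key exported in your shell, you could then construct the client as Client(api_key=load_roset_api_key()).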
Step 1: Install the SDK
```shell
pip install roset
```

Step 2: Upload a File
Upload a document to Roset. The API creates a file record and a processing job automatically. Roset routes the file to the appropriate extraction provider based on content type.
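The upload call in this step passes filename, content_type, and size_bytes explicitly. If you are starting from a file on disk, a sketch like the one below can derive those values with Python's standard library. describe_local_file is a hypothetical helper, and guessing the MIME type from the file extension is an assumption that may not match how Roset classifies content.

```python
import mimetypes
import os


def describe_local_file(path: str) -> dict:
    """Build filename/content_type/size_bytes keyword arguments for an upload.

    Hypothetical helper: the MIME type is guessed from the file extension and
    falls back to application/octet-stream when it cannot be determined.
    """
    content_type, _encoding = mimetypes.guess_type(path)
    return {
        "filename": os.path.basename(path),
        "content_type": content_type or "application/octet-stream",
        "size_bytes": os.path.getsize(path),
    }
```

For example, client.files.upload(**describe_local_file("contract.pdf")) would pass the same three keyword arguments used in the upload call.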
```python
import os
from roset import Client

client = Client(api_key=os.getenv("ROSET_API_KEY"))

# Upload a PDF -- Roset routes it to Reducto for extraction
file = client.files.upload(
    filename="contract.pdf",
    content_type="application/pdf",
    size_bytes=45678,
)
print(f"Uploaded: {file['id']}, Job: {file['job_id']}")
```

Step 3: Wait for Processing
Roset processes files asynchronously. The processing job moves through a state machine: queued -> processing -> completed or failed. Poll the file status until it reaches a terminal state.
```python
import time

# Poll the file until its job reaches a terminal state
status = file["status"]
while status not in ("completed", "failed"):
    current = client.files.get(file["id"])
    status = current["status"]
    print(f"Status: {status}")
    if status not in ("completed", "failed"):
        time.sleep(2)
```

For production use, register a webhook instead of polling. Roset will POST to your endpoint when processing completes.
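If you do poll, the simple loop above never gives up when a job stalls. A slightly more defensive sketch adds a deadline and exponential backoff. wait_for_file, its parameters, and the default timings are assumptions for illustration; fetch_status is any zero-argument callable that returns the current status string.

```python
import time
from typing import Callable

# Terminal states of the processing state machine described in this step
TERMINAL_STATES = ("completed", "failed")


def wait_for_file(
    fetch_status: Callable[[], str],
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 15.0,
) -> str:
    """Poll until the status is terminal, doubling the delay between checks.

    Hypothetical helper. Returns the terminal status, or raises TimeoutError
    if the next sleep would overrun the deadline.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s}s")
        time.sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped
```

For example, final = wait_for_file(lambda: client.files.get(file["id"])["status"]), then check whether final is "failed" before retrieving variants.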
Step 4: Retrieve Results
Once processing completes, the extracted content is available as variants on the file. Variants are the outputs of the extraction pipeline -- typically markdown and optionally vector embeddings.
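Because embeddings are optional, it can help to check which variant types exist before requesting one. The sketch below indexes a list_variants response by type. It assumes the response shape used in this step (a "variants" list whose items carry "type" and "size_bytes"); variants_by_type is a hypothetical helper, the sample data is made up, and the "embeddings" type name is a guess based on the comment in the example code.

```python
from typing import Any, Dict


def variants_by_type(result: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Index a list_variants()-style response by variant type (hypothetical helper)."""
    return {v["type"]: v for v in result.get("variants", [])}


# Made-up response, shaped like the one iterated over in this step
sample = {"variants": [{"type": "markdown", "size_bytes": 1200}]}
available = variants_by_type(sample)
if "embeddings" not in available:  # assumed type name for the embeddings variant
    print("No embeddings variant; embeddings are optional and may not be enabled.")
```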
```python
# List all variants (markdown, embeddings, etc.)
result = client.files.list_variants(file["id"])
for v in result["variants"]:
    print(f"{v['type']}: {v['size_bytes']} bytes")

# Retrieve the markdown variant specifically
markdown = client.files.get_variant(file["id"], "markdown")
print(markdown["content"][:500])
```

What Happened
- You uploaded a document to Roset.
- Roset created a file metadata record and a processing job.
- The job was routed to Reducto (for PDFs/documents), Gemini (for images), or Whisper (for audio) based on content type.
- The extraction provider returned structured markdown, which Roset stored as a variant on the file.
- If embeddings are enabled, Roset also generated vector embeddings via OpenAI and stored them as a second variant.
Roset never touched the file bytes directly -- it orchestrated the extraction pipeline and stored the resulting metadata.
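The content-type routing described above can be approximated with a simple prefix match on the MIME type. This is an illustrative sketch, not Roset's actual routing logic: pick_provider is hypothetical, and treating every non-image, non-audio type as a document is an assumption.

```python
def pick_provider(content_type: str) -> str:
    """Map a MIME type to the provider family named in this guide (illustrative only)."""
    # Drop parameters such as "; charset=..." and normalize case
    ct = content_type.split(";")[0].strip().lower()
    if ct.startswith("image/"):
        return "gemini"
    if ct.startswith("audio/"):
        return "whisper"
    return "reducto"  # assumption: PDFs and everything else are treated as documents
```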
Next Steps
- Transform Any File -- understand the full transformation workflow.
- Sync Cloud Storage -- connect your S3, GCS, or Azure Blob Storage buckets.
- Webhooks -- get notified when processing completes instead of polling.
- Python SDK -- full Python client reference.
- TypeScript SDK -- full TypeScript client reference.
- API Reference -- complete endpoint documentation.