
Quickstart

Upload a document, let Roset orchestrate extraction, and retrieve structured markdown -- all in under 5 minutes. This guide walks through the complete flow: upload, wait for processing, and retrieve variants.

Prerequisites

Before you start, make sure you have:

  1. A Roset account at console.roset.dev
  2. An API key (starts with rsk_) from Settings > API Keys
  3. Python 3.9+ or Node.js 18+
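
The examples in this guide read the key from the ROSET_API_KEY environment variable. A small guard like the one below fails fast when that variable is missing or lacks the rsk_ prefix, so the problem does not show up later as an authentication error. This is a sketch: load_api_key is a hypothetical helper, not part of the roset SDK. It checks only the prefix, not whether Roset will accept the key.

```python
import os


def load_api_key(var: str = "ROSET_API_KEY") -> str:
    """Read the Roset API key from the environment and sanity-check its prefix.

    Hypothetical helper (not part of the roset SDK): it only verifies the
    rsk_ prefix described in the prerequisites, not that the key is valid.
    """
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; create a key under Settings > API Keys")
    if not key.startswith("rsk_"):
        raise ValueError(f"{var} does not look like a Roset key (expected prefix 'rsk_')")
    return key
```

You would then construct the client with Client(api_key=load_api_key()).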

Step 1: Install the SDK

bash
pip install roset

Step 2: Upload a File

Upload a document to Roset. The API creates a file record and a processing job automatically. Roset routes the file to the appropriate extraction provider based on content type.

python
import os
from roset import Client
 
# os.environ[...] raises KeyError right away if the key is not set
client = Client(api_key=os.environ["ROSET_API_KEY"])
 
# Upload a PDF -- Roset routes it to Reducto for extraction
file = client.files.upload(
    filename="contract.pdf",
    content_type="application/pdf",
    size_bytes=os.path.getsize("contract.pdf"),  # size of the local file
)
 
print(f"Uploaded: {file['id']}, Job: {file['job_id']}")
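
The upload call in this step passes the filename, MIME type, and size explicitly. For arbitrary local files, you can derive all three with the standard library. The sketch below assumes two things: the file exists on disk, and the upload call accepts exactly the filename, content_type, and size_bytes keyword arguments shown above. upload_args is a hypothetical helper name, and falling back to application/octet-stream for unknown extensions is also an assumption.

```python
import mimetypes
import os


def upload_args(path: str) -> dict:
    """Build keyword arguments for client.files.upload() from a local file.

    Hypothetical helper: the MIME type is guessed from the file extension and
    the size comes from the filesystem; the file bytes are not read.
    """
    content_type, _encoding = mimetypes.guess_type(path)
    return {
        "filename": os.path.basename(path),
        # Assumed fallback when the extension is not recognized.
        "content_type": content_type or "application/octet-stream",
        "size_bytes": os.path.getsize(path),
    }
```

With this helper, the upload becomes client.files.upload(**upload_args("contract.pdf")).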

Step 3: Wait for Processing

Roset processes files asynchronously. The processing job moves through a state machine: queued -> processing -> completed or failed. Poll the file status until it reaches a terminal state.
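
The lifecycle is small enough to write down as a transition table. The sketch below encodes only the four states this guide names. It illustrates the documented flow and is not Roset's internal implementation, which may track additional states.

```python
# Job lifecycle as described in this guide:
# queued -> processing -> completed | failed
TRANSITIONS = {
    "queued": {"processing"},
    "processing": {"completed", "failed"},
    "completed": set(),  # terminal: no further transitions
    "failed": set(),     # terminal: no further transitions
}

# A state is terminal when nothing can follow it.
TERMINAL_STATES = frozenset(s for s, nxt in TRANSITIONS.items() if not nxt)


def is_terminal(status: str) -> bool:
    """True once a job can no longer change state."""
    return status in TERMINAL_STATES


def is_valid_transition(old: str, new: str) -> bool:
    """True if moving from old to new follows the lifecycle above."""
    return new in TRANSITIONS.get(old, set())
```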

python
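import time

TERMINAL_STATES = ("completed", "failed")
deadline = time.monotonic() + 300  # give up after 5 minutes (arbitrary limit)

status = file["status"]
while status not in TERMINAL_STATES:
    if time.monotonic() > deadline:
        raise TimeoutError(f"File {file['id']} is still {status} after 5 minutes")
    time.sleep(2)  # poll every 2 seconds
    status = client.files.get(file["id"])["status"]
    print(f"Status: {status}")

if status == "failed":
    raise RuntimeError(f"Processing failed for file {file['id']}")

Note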

For production use, register a webhook instead of polling. Roset will POST to your endpoint when processing completes.
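
This guide does not document the webhook payload, so the receiver below is a sketch that rests on assumptions. It expects a JSON object with file_id and status fields; both field names are hypothetical. It uses only Python's standard library. Before you rely on it, confirm the actual payload format and any request-signing scheme with Roset. handle_event, RosetWebhookHandler, and serve are illustrative names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_event(payload: dict) -> str:
    """Decide what to do with one webhook event.

    ASSUMPTION: the {"file_id": ..., "status": ...} shape is hypothetical;
    rename the fields to match the body Roset actually sends.
    """
    file_id = payload.get("file_id")
    status = payload.get("status")
    if status == "completed":
        return f"fetch variants for {file_id}"
    if status == "failed":
        return f"alert: processing failed for {file_id}"
    return "ignore"


class RosetWebhookHandler(BaseHTTPRequestHandler):
    """Accepts POSTed JSON events and acknowledges them with 200."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers JSONDecodeError and bad encodings
            payload = None
        if not isinstance(payload, dict):
            # Reject bodies that are not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        print(handle_event(payload))
        # Acknowledge quickly; do slow work (e.g. fetching variants) elsewhere.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Block forever, handling webhook POSTs on all interfaces."""
    HTTPServer(("0.0.0.0", port), RosetWebhookHandler).serve_forever()
```

To try it locally, call serve(8000) and point the webhook you register with Roset at that server.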

Step 4: Retrieve Results

Once processing completes, the extracted content is available as variants on the file. Variants are the outputs of the extraction pipeline -- typically markdown and optionally vector embeddings.

python
# List all variants (markdown, embeddings, etc.)
result = client.files.list_variants(file["id"])
for v in result["variants"]:
    print(f"{v['type']}: {v['size_bytes']} bytes")
 
# Retrieve the markdown variant specifically
markdown = client.files.get_variant(file["id"], "markdown")
print(markdown["content"][:500])
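
The list_variants response in this step is a dict with a variants list, and each entry carries at least type and size_bytes. Two common follow-ups are looking up a variant by type and saving the markdown locally. The sketch below handles both. It works on plain dicts shaped like the responses above; the helper names and output path are hypothetical.

```python
from pathlib import Path


def variants_by_type(result: dict) -> dict:
    """Index a list_variants() response by variant type (e.g. "markdown").

    If several variants share a type, the last one in the list wins.
    """
    return {v["type"]: v for v in result.get("variants", [])}


def save_markdown(variant: dict, path: str) -> Path:
    """Write the "content" field of a markdown variant to disk as UTF-8."""
    out = Path(path)
    out.write_text(variant["content"], encoding="utf-8")
    return out
```

For example, save_markdown(client.files.get_variant(file["id"], "markdown"), "contract.md") writes the extracted markdown to contract.md.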

What Happened

  1. You uploaded a document to Roset.
  2. Roset created a file metadata record and a processing job.
  3. The job was routed to Reducto (for PDFs/documents), Gemini (for images), or Whisper (for audio) based on content type.
  4. The extraction provider returned structured markdown, which Roset stored as a variant on the file.
  5. Optionally, vector embeddings were generated via OpenAI and stored as a second variant.
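
The routing in step 3 happens inside Roset, not in your code, but writing it down makes the rule concrete. The function below is an illustrative approximation built only from the three cases listed above. Treating every non-image, non-audio type as a document for Reducto is an assumption; the real service may route or reject other types differently.

```python
def route_provider(content_type: str) -> str:
    """Approximate the provider choice described above from a MIME type.

    Illustrative only: Roset performs this routing server-side, and the
    fallback to Reducto for all other types is an assumption.
    """
    major = content_type.split("/", 1)[0].strip().lower()
    if major == "image":
        return "Gemini"
    if major == "audio":
        return "Whisper"
    return "Reducto"  # PDFs and other documents
```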

Roset never touched the file bytes directly; it orchestrated the extraction pipeline and stored the resulting variants and metadata.

Next Steps