Variant Types

Variants are the structured outputs of Roset's transformation pipeline. Each variant is a different representation of the same source file, and all variants of a file are linked to it by lineage. When you upload a file, Roset automatically produces up to four variant types.

Overview

Type	What it contains	Produced by	Use case
`markdown`	Extracted text in markdown format	Reducto, Gemini, Whisper	Display, downstream processing
`embeddings`	Vector embeddings for each chunk	OpenAI	Semantic search, RAG
`metadata`	Page count, language, confidence	Extraction provider	Filtering, quality checks
`searchable-index`	Full-text search index	Roset	Keyword search

Markdown

The primary extraction output. Contains the full text content of the file converted to markdown format.

Fields:

Field	Type	Description
`content`	string	The extracted text in markdown
`pageCount`	number	Number of pages extracted
`wordCount`	number	Total word count
`characterCount`	number	Total character count

When it's produced: For every file that has textual content. Documents (PDF, DOCX) are extracted by Reducto, images by Gemini (OCR), and audio by Whisper (transcription).

python

# `client` is an initialized Roset client; `file_id` is the ID of an uploaded file
markdown = client.files.get_variant(file_id, "markdown")
print(f"Pages: {markdown.get('pageCount')}")
print(f"Words: {markdown.get('wordCount')}")
print(markdown["content"][:500])
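
For downstream processing it is often handy to break the `content` string into sections. The following is a minimal sketch, assuming the extracted markdown uses ATX-style headings (`#`, `##`, ...); `split_sections` is a hypothetical helper, not part of the Roset SDK, and the sample string stands in for a real `content` value.

```python
import re

# Matches ATX headings such as "# Title" or "### Details".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def split_sections(content):
    """Split markdown text into a list of {title, level, text} dicts."""
    sections = []
    title, level, lines = None, 0, []

    def flush():
        # Keep a section if it has a heading or any non-blank body text.
        if title is not None or any(line.strip() for line in lines):
            sections.append(
                {"title": title, "level": level, "text": "\n".join(lines).strip()}
            )

    for line in content.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, level, lines = match.group(2).strip(), len(match.group(1)), []
        else:
            lines.append(line)
    flush()
    return sections

# Stand-in for markdown["content"]; real output will differ.
sample_content = "Intro text\n\n# Title\n\nBody one.\n\n## Sub\n\nBody two.\n"
sections = split_sections(sample_content)
for section in sections:
    print(section["level"], section["title"], "-", section["text"])
```

Text before the first heading is kept as a section with `title` set to `None`. The sketch does not special-case `#` lines inside fenced code blocks, so treat it as a starting point only.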

Embeddings

Vector embeddings generated from the extracted text, chunked for semantic search and RAG.

Fields:

Field	Type	Description
`chunks`	array	Array of chunk objects with text and vector
`model`	string	Embedding model used (e.g., `text-embedding-3-small`)
`dimensions`	number	Vector dimensions (e.g., 1536)
`totalChunks`	number	Number of chunks generated

When it's produced: After markdown extraction completes, provided an OpenAI key is available (a managed key is used by default).

python

embeddings = client.files.get_variant(file_id, "embeddings")
print(f"Model: {embeddings.get('model')}")
print(f"Chunks: {embeddings.get('totalChunks')}")
print(f"Dimensions: {embeddings.get('dimensions')}")
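
To illustrate how the chunk vectors can drive semantic search, here is a minimal sketch that ranks chunks by cosine similarity against a query vector. It assumes each chunk object exposes `text` and `vector` keys (the table above only says chunks carry text and a vector, so the exact key names are an assumption), and that the query vector was produced with the same embedding model. `cosine_similarity` and `top_chunks` are hypothetical helpers, and the three-dimensional sample data is invented for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def top_chunks(embeddings, query_vector, k=3):
    """Return up to k (score, text) pairs, most similar first."""
    scored = [
        (cosine_similarity(chunk["vector"], query_vector), chunk["text"])
        for chunk in embeddings.get("chunks", [])
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Stand-in for an embeddings variant; real vectors have many more dimensions.
sample_embeddings = {
    "model": "text-embedding-3-small",
    "dimensions": 3,
    "totalChunks": 3,
    "chunks": [
        {"text": "Quarterly revenue grew.", "vector": [1.0, 0.0, 0.0]},
        {"text": "Headcount stayed flat.", "vector": [0.0, 1.0, 0.0]},
        {"text": "Revenue outlook is positive.", "vector": [0.8, 0.6, 0.0]},
    ],
}

results = top_chunks(sample_embeddings, [1.0, 0.0, 0.0], k=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

A linear scan like this is fine for a single file's chunks; for corpus-wide queries the search API is the more natural fit.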

Metadata

Extraction metadata including page count, detected language, and quality signals.

Fields:

Field	Type	Description
`pageCount`	number	Number of pages in the source file
`language`	string	Detected language (ISO 639-1)
`extractionConfidence`	number	Confidence score (0 to 1)
`qualityWarnings`	string[]	Any quality issues detected

When it's produced: Alongside the markdown variant during extraction.

python

metadata = client.files.get_variant(file_id, "metadata")
print(f"Language: {metadata.get('language')}")
print(f"Confidence: {metadata.get('extractionConfidence')}")
if metadata.get("qualityWarnings"):
    print(f"Warnings: {', '.join(metadata['qualityWarnings'])}")
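
The metadata fields lend themselves to a simple quality gate before a file enters downstream processing. The sketch below works on a plain dict shaped like the fields table above; `passes_quality_gate`, the 0.8 threshold, and the sample warning string are illustrative assumptions rather than Roset defaults.

```python
def passes_quality_gate(metadata, min_confidence=0.8, allowed_languages=None):
    """Return (ok, reasons) for a metadata variant dict.

    min_confidence is an arbitrary example threshold, not a Roset default.
    """
    reasons = []

    confidence = metadata.get("extractionConfidence")
    if confidence is None:
        reasons.append("missing extractionConfidence")
    elif confidence < min_confidence:
        reasons.append(f"confidence {confidence:.2f} below {min_confidence:.2f}")

    language = metadata.get("language")
    if allowed_languages is not None and language not in allowed_languages:
        reasons.append(f"language {language!r} not allowed")

    # Any reported quality warning fails the gate in this sketch.
    for warning in metadata.get("qualityWarnings") or []:
        reasons.append(f"warning: {warning}")

    return (not reasons, reasons)

# Stand-ins for metadata variants; the warning text is made up.
good = {"pageCount": 12, "language": "en",
        "extractionConfidence": 0.93, "qualityWarnings": []}
poor = {"pageCount": 3, "language": "de",
        "extractionConfidence": 0.41, "qualityWarnings": ["low_resolution_scan"]}

good_result = passes_quality_gate(good, allowed_languages={"en"})
poor_result = passes_quality_gate(poor, allowed_languages={"en"})
print(good_result)
print(poor_result)
```

Returning the reasons alongside the boolean makes it easier to log why a file was held back.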

Searchable Index

A full-text search index built from the extracted content. Powers text and hybrid search modes.

Fields:

Field	Type	Description
`indexedAt`	string	When the index was last built
`termCount`	number	Number of unique terms indexed
`segmentCount`	number	Number of text segments

When it's produced: After markdown extraction completes. Used internally by the search API.

python

index = client.files.get_variant(file_id, "searchable-index")
print(f"Terms: {index.get('termCount')}")
print(f"Segments: {index.get('segmentCount')}")
print(f"Indexed at: {index.get('indexedAt')}")

List All Variants for a File

python

result = client.files.list_variants(file_id)
for v in result["variants"]:
    print(f"  {v['type']}: {v['size_bytes']} bytes (provider: {v.get('provider', 'roset')})")
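
Because a file can have anywhere from zero to four variants, it can be useful to check which ones are present. The sketch below summarizes a `list_variants`-style result, assuming the `variants` / `type` / `size_bytes` shape used in the snippet above; `summarize_variants` is a hypothetical helper, and the sample provider values are invented.

```python
# The four variant types described on this page.
EXPECTED_TYPES = ("markdown", "embeddings", "metadata", "searchable-index")

def summarize_variants(result):
    """Report which expected variant types are present or missing."""
    by_type = {v["type"]: v for v in result.get("variants", [])}
    missing = [t for t in EXPECTED_TYPES if t not in by_type]
    total_bytes = sum(v.get("size_bytes", 0) for v in by_type.values())
    return {
        "present": sorted(by_type),
        "missing": missing,
        "total_bytes": total_bytes,
    }

# Stand-in for client.files.list_variants(file_id); values are made up.
sample_result = {
    "variants": [
        {"type": "markdown", "size_bytes": 2048, "provider": "reducto"},
        {"type": "metadata", "size_bytes": 256, "provider": "reducto"},
    ]
}

summary = summarize_variants(sample_result)
print(summary)
```

A missing type is not necessarily an error: it may not have been requested, or it may not have been generated yet.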

Selective Variant Generation

You can request only specific variant types when uploading:

python

# Only generate markdown and embeddings
file = client.files.upload(
    filename="report.pdf",
    content_type="application/pdf",
    size_bytes=45678,
    variants=["markdown", "embeddings"],
)
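
A typo in a variant name is easy to make, so a small client-side check before uploading may help. This is a minimal sketch that validates names against the four types listed on this page; `validate_variants` is a hypothetical helper, and how the API itself treats unknown names is not covered here.

```python
# The four variant types described on this page.
KNOWN_VARIANTS = ("markdown", "embeddings", "metadata", "searchable-index")

def validate_variants(requested):
    """Return the requested variant types, de-duplicated in order.

    Raises ValueError if any name is not one of the four documented types.
    """
    unknown = [name for name in requested if name not in KNOWN_VARIANTS]
    if unknown:
        raise ValueError(
            f"unknown variant type(s): {', '.join(unknown)}; "
            f"expected one of: {', '.join(KNOWN_VARIANTS)}"
        )
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(requested))

variants = validate_variants(["markdown", "embeddings", "markdown"])
print(variants)
```

The returned list can then be passed as the `variants` argument of the upload call shown above.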

Next Steps

Transform Any File: the complete transformation workflow.
Search: how variants power search.
API Reference: full variant endpoint documentation.
