knitr::opts_chunk$set(
  collapse = TRUE, comment = "#>",
  eval = identical(tolower(Sys.getenv("LLMR_RUN_VIGNETTES", "false")), "true")
)

OpenAI-compatible (OpenAI, Groq, Together, x.ai, DeepSeek)
Chat Completions accept a response_format (e.g., {"type":"json_object"} or a
JSON-Schema payload). Enforcement varies by provider, but the interface is
OpenAI-shaped. See: OpenAI API overview; Groq API (OpenAI-compatible);
Together: OpenAI compatibility; x.ai: OpenAI API schema; DeepSeek:
OpenAI-compatible endpoint.
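For orientation, here are the two response_format shapes sketched as R lists
(field names follow the OpenAI Chat Completions API; how strictly the schema is
enforced varies by provider):

# JSON mode: any syntactically valid JSON object
response_format_json <- list(type = "json_object")

# JSON-Schema mode: constrain the output to a schema
response_format_schema <- list(
  type = "json_schema",
  json_schema = list(
    name   = "answer",
    strict = TRUE,
    schema = list(
      type = "object",
      properties = list(title = list(type = "string")),
      required = list("title"),
      additionalProperties = FALSE
    )
  )
)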
Anthropic (Claude)

No global “JSON mode.” Instead, you define a tool with an input_schema (JSON
Schema) and force it via tool_choice, so the model must return a JSON object
that validates against the schema. See Anthropic Messages API: tools &
input_schema.
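Sketched as an R list, the relevant request fragment looks roughly like this
(the tool name "record_answer" is arbitrary; enable_structured_output()
assembles an equivalent payload for you):

# One tool whose input must validate against the JSON Schema,
# forced via tool_choice
tools <- list(list(
  name         = "record_answer",
  description  = "Return the structured answer.",
  input_schema = list(
    type = "object",
    properties = list(answer = list(type = "string")),
    required = list("answer")
  )
))
tool_choice <- list(type = "tool", name = "record_answer")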
Google Gemini (REST)

Set responseMimeType = "application/json" in generationConfig to request JSON.
Some models also accept responseSchema for constrained JSON (model-dependent).
See the Gemini documentation.
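As a sketch, the generationConfig fragment might look like this (responseSchema
uses the uppercase type names shown in Gemini's REST examples and is only
honored on models that support it):

generationConfig <- list(
  responseMimeType = "application/json",
  # Optional, model-dependent: constrain the JSON to a schema
  responseSchema = list(
    type = "OBJECT",
    properties = list(
      name  = list(type = "STRING"),
      score = list(type = "NUMBER")
    )
  )
)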
llm_parse_structured() strips fences and extracts the largest balanced {...}
or [...] before parsing.

llm_parse_structured_col() hoists fields (supports dot/bracket paths and JSON
Pointer) and keeps non-scalars as list-columns.

llm_validate_structured_col() validates locally via jsonvalidate (AJV).

enable_structured_output() flips the right provider switch (OpenAI-compat
response_format; Anthropic tool + input_schema; Gemini
responseMimeType/responseSchema).

All chunks use a tiny helper so your document knits even without API keys.
safe({
  library(LLMR)
  cfg <- llm_config(
    provider = "openai",                # try "groq" or "together" too
    model    = "gpt-4.1-nano",
    temperature = 0
  )
  # Flip JSON mode on (OpenAI-compat shape)
  cfg_json <- enable_structured_output(cfg, schema = NULL)
  res    <- call_llm(cfg_json, 'Give me a JSON object {"ok": true, "n": 3}.')
  parsed <- llm_parse_structured(res)
  cat("Raw text:\n", as.character(res), "\n\n")
  str(parsed)
})

What could still fail? Proxies labeled
“OpenAI-compatible” sometimes accept response_format but
don’t strictly enforce it; LLMR’s parser recovers from fences or
pre/post text.
Groq serves Qwen 2.5 Instruct models with OpenAI-compatible APIs.
Their Structured Outputs feature enforces JSON Schema
and (notably) expects all properties to be listed under
required.
safe({
  library(LLMR); library(dplyr)
  # Schema: make every property required to satisfy Groq's stricter check
  schema <- list(
    type = "object",
    additionalProperties = FALSE,
    properties = list(
      title = list(type = "string"),
      year  = list(type = "integer"),
      tags  = list(type = "array", items = list(type = "string"))
    ),
    required = list("title","year","tags")
  )
  cfg <- llm_config(
    provider = "groq",
    model    = "qwen-2.5-72b-instruct",   # a Qwen Instruct model on Groq
    temperature = 0
  )
  cfg_strict <- enable_structured_output(cfg, schema = schema, strict = TRUE)
  df  <- tibble(x = c("BERT paper", "Vision Transformers"))
  out <- llm_fn_structured(
    df,
    prompt   = "Return JSON about '{x}' with fields title, year, tags.",
    .config  = cfg_strict,
    .schema  = schema,          # send schema to provider
    .fields  = c("title","year","tags"),
    .validate_local = TRUE
  )
  out %>% select(structured_ok, structured_valid, title, year, tags) %>% print(n = Inf)
})

If your key is set, you should see structured_ok = TRUE,
structured_valid = TRUE, plus parsed columns.
Common gotcha: If Groq returns a 400 error
complaining about required, ensure all
properties are listed in the required array.
Groq’s structured output implementation is stricter than OpenAI’s.
Anthropic (tool + input_schema; requires max_tokens)

safe({
  library(LLMR)
  schema <- list(
    type="object",
    properties=list(answer=list(type="string"), confidence=list(type="number")),
    required=list("answer","confidence"),
    additionalProperties=FALSE
  )
  cfg <- llm_config("anthropic","claude-3-5-haiku-latest", temperature = 0)
  cfg <- enable_structured_output(cfg, schema = schema, name = "llmr_schema")
  res <- call_llm(cfg, c(
    system = "Return only the tool result that matches the schema.",
    user   = "Answer: capital of Japan; include confidence in [0,1]."
  ))
  parsed <- llm_parse_structured(res)
  str(parsed)
})

Anthropic requires max_tokens; LLMR warns and defaults if you omit it.
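To set it explicitly (assuming max_tokens is passed through llm_config() the
same way as temperature):

# Explicit output cap instead of relying on LLMR's default
cfg <- llm_config("anthropic", "claude-3-5-haiku-latest",
                  temperature = 0, max_tokens = 500)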
safe({
  library(LLMR)
  cfg <- llm_config(
    "gemini", "gemini-2.5-flash-lite",
    response_mime_type = "application/json"  # ask for JSON back
    # Optionally: gemini_enable_response_schema = TRUE, response_schema = <your JSON Schema>
  )
  res <- call_llm(cfg, c(
    system = "Reply as JSON only.",
    user   = "Produce fields name and score about 'MNIST'."
  ))
  str(llm_parse_structured(res))
})

safe({
  library(LLMR); library(tibble)
  messy <- c(
    '```json\n{"x": 1, "y": [1,2,3]}\n```',
    'Sure! Here is JSON: {"x":"1","y":"oops"} trailing words',
    '{"x":1, "y":[2,3,4]}'
  )
  tibble(response_text = messy) |>
    llm_parse_structured_col(
      fields = c(x = "x", y = "/y/0")   # dot/bracket or JSON Pointer
    ) |>
    print(n = Inf)
})

Why this helps: it works when outputs arrive fenced,
with pre/post text, or when arrays sneak in. Non-scalars become
list-columns (set allow_list = FALSE to force scalars
only).
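For example, reusing the messy vector from the chunk above, you can refuse
list-columns (a sketch; the exact coercion behavior is up to the function):

# Same messy inputs, but scalars only: non-scalar values are not kept
# as list-columns
tibble(response_text = messy) |>
  llm_parse_structured_col(fields = c("x", "y"), allow_list = FALSE)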
For production ETL workflows, combine schema validation with parallelization:
library(LLMR); library(dplyr)

# JSON Schema for the fields we want back
schema <- list(
  type = "object", additionalProperties = FALSE,
  properties = list(label = list(type = "string"),
                    score = list(type = "number")),
  required = list("label", "score")
)

cfg_with_schema <- llm_config("openai", "gpt-4.1-nano")

setup_llm_parallel(workers = 10)

# Assuming a large data frame `large_df` with a `text` column
out <- large_df |>
  llm_mutate_structured(
    result,
    prompt  = "Extract: {text}",
    .config = cfg_with_schema,
    .schema = schema,
    .fields = c("label", "score"),
    tries   = 3  # auto-retry failures
  )

reset_llm_parallel()

This processes thousands of rows efficiently with automatic retries and validation.
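After a large run it is worth checking the hit rate before using the results.
A quick summary (assuming llm_mutate_structured() adds the same structured_ok
flag shown in the earlier llm_fn_structured() output):

# Share of rows that parsed into valid structured output
out |> summarise(parse_rate = mean(structured_ok, na.rm = TRUE))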
Whatever the provider, you can flip the right switch with
enable_structured_output() and then run llm_parse_structured() plus local
validation.

input_schema: https://docs.claude.com/en/api/messages#body-tool-choice