Evaluation

Use this async workflow to start model evaluations and poll for completion.

Start async evaluation

POST /cr/finetune_model/evaluate_async

Queues the evaluation and returns immediately with a job_id.

Field         Type    Required  Description
dataset_name  string  yes       Dataset id.
model_name    string  yes       Model name to evaluate.

Returns

  • 200 OK with evaluation job identifier.
{
  "job_id": "5b6273d2-c7e6-4a8f-a8cd-c4b2f9be17cd",
  "status": "queued"
}
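
A minimal sketch of the submit call, using only the Python standard library. The host in BASE_URL is hypothetical, and sending the two fields as a JSON body is an assumption: the reference above lists the fields but not the encoding or any authentication. The helper names build_evaluate_request and submit_evaluation are illustrative, not part of the API.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, not from the reference


def build_evaluate_request(dataset_name: str, model_name: str,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request that queues an evaluation job."""
    # Assumption: the fields travel as a JSON object in the request body.
    body = json.dumps({"dataset_name": dataset_name, "model_name": model_name})
    return urllib.request.Request(
        url=f"{base_url}/cr/finetune_model/evaluate_async",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_evaluation(dataset_name: str, model_name: str,
                      base_url: str = BASE_URL) -> str:
    """Send the request and return the job_id from the 200 OK response."""
    request = build_evaluate_request(dataset_name, model_name, base_url)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["job_id"]
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.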

Poll evaluation status

GET /cr/finetune_model/evaluate_status?job_id=...

Poll until status reaches completed or failed.

Field   Type          Required  Description
job_id  query string  yes       Job id from async submit.

Returns

  • 200 OK with job status.
  • Terminal states include completed (with result) and failed (with error).
{
  "status": "completed",
  "result": {
    "dataset_name": "your_dataset_name",
    "model_name": "Generalist: Base",
    "split": "test",
    "correct": 81,
    "total": 100,
    "accuracy": 0.81,
    "judge_model": "gpt-5.4-nano"
  }
}
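
The polling step could be sketched as the loop below. The host, the polling interval, the timeout, and the helper names (fetch_status, wait_for_evaluation) are all assumptions for illustration. The "error" key read on failure is inferred from "failed (with error)" above, and any status other than completed or failed is treated as still in progress.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, not from the reference


def fetch_status(job_id: str, base_url: str = BASE_URL) -> dict:
    """GET the status document for one evaluation job."""
    query = urllib.parse.urlencode({"job_id": job_id})
    url = f"{base_url}/cr/finetune_model/evaluate_status?{query}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_evaluation(job_id: str, fetch=fetch_status,
                        interval: float = 2.0, timeout: float = 600.0,
                        sleep=time.sleep) -> dict:
    """Poll until the job is completed or failed, then return its result."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch(job_id)
        status = payload["status"]
        if status == "completed":
            return payload["result"]
        if status == "failed":
            # Assumption: the failure detail is stored under an "error" key.
            raise RuntimeError(f"evaluation {job_id} failed: {payload.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"evaluation {job_id} still {status!r} after {timeout}s")
        sleep(interval)  # non-terminal status: wait before the next poll
```

Passing fetch and sleep as parameters lets the loop be exercised with canned responses, without a live server or real delays.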