Inference
POST /cr/gcloud_predict_firebase_base_oss
Primary prediction endpoint. Supports streaming responses and optional RAG retrieval.
| Field | Type | Required | Description |
|---|---|---|---|
| prompt | string | yes | User prompt text. |
| query_only | string | yes | Query form for routing/retrieval. |
| model_name | string | no | Model name from /cr/get_allowed_models. |
| stream | boolean | no | Defaults to true. |
| use_rag | boolean | no | Enable retrieval context. |
| sequential_budget | integer | no | Sequential reasoning budget. |
| parallel_budget | integer | no | Parallel reasoning budget. |
| temperature | number | no | Sampling temperature. |
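
As a rough illustration of the fields above, the following sketch assembles a request body in Python. It assumes the endpoint accepts the fields as a JSON object, which this page does not state explicitly; the helper name `build_predict_payload` is hypothetical, and authentication and the base URL are left out because they are not described here.

```python
import json
from typing import Any, Dict, Optional


def build_predict_payload(
    prompt: str,
    query_only: str,
    model_name: Optional[str] = None,
    stream: bool = True,
    use_rag: Optional[bool] = None,
    sequential_budget: Optional[int] = None,
    parallel_budget: Optional[int] = None,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Assemble a request body from the documented fields (hypothetical helper)."""
    # prompt and query_only are the two fields marked as required.
    if not prompt or not query_only:
        raise ValueError("prompt and query_only are required")

    # stream is sent explicitly; the table says the server defaults it to true.
    payload: Dict[str, Any] = {
        "prompt": prompt,
        "query_only": query_only,
        "stream": stream,
    }

    # Optional fields are included only when the caller sets them.
    optional = {
        "model_name": model_name,
        "use_rag": use_rag,
        "sequential_budget": sequential_budget,
        "parallel_budget": parallel_budget,
        "temperature": temperature,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


# Assumption: the body is serialized as JSON before being sent with POST.
body = json.dumps(build_predict_payload("What is RAG?", "RAG definition", use_rag=True))
```

Omitting unset optional fields leaves the server free to apply its own defaults, which seems safer than guessing values this page does not document.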
Returns
- `200 OK` as a streaming SSE response when `stream=true`.
- Events typically include `token` chunks and a final `final_result` payload.
{"type":"final_result","predictions":{"0":{"answers":["..."],"thoughtTraces":["..."]}}}
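
A client consuming the stream might separate token events from the final payload along these lines. This is a minimal sketch under several assumptions not confirmed by this page: that each event arrives on a standard SSE `data:` line containing a JSON object, and that token events carry `"type": "token"`. The function name `parse_sse_events` and the `text` field in the sample token event are hypothetical; only the `final_result` shape is taken from the example above.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple


def parse_sse_events(
    lines: Iterable[str],
) -> Tuple[List[Dict[str, Any]], Optional[Dict[str, Any]]]:
    """Split decoded SSE lines into token events and the final_result event."""
    tokens: List[Dict[str, Any]] = []
    final: Optional[Dict[str, Any]] = None
    for raw in lines:
        line = raw.strip()
        # Skip blank separators, comments, and SSE fields other than data.
        if not line.startswith("data:"):
            continue
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # ignore payloads that are not JSON
        if not isinstance(event, dict):
            continue
        if event.get("type") == "final_result":
            final = event
        elif event.get("type") == "token":  # assumed type value for token chunks
            tokens.append(event)
    return tokens, final


# Sample input; the "text" field on the token event is a made-up placeholder.
sample = [
    'data: {"type":"token","text":"Hel"}',
    "",
    'data: {"type":"final_result","predictions":{"0":{"answers":["Hello"],"thoughtTraces":["..."]}}}',
]
token_events, final_result = parse_sse_events(sample)
answers = final_result["predictions"]["0"]["answers"] if final_result else []
```

In practice the lines would come from an HTTP client's streaming response iterator rather than a list; the parsing logic stays the same.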