Full Pipeline
This guide walks through the full lifecycle: authenticate, discover available models, build a custom finetuned model from your files, run streaming inference in your application, evaluate quality, and deploy the resulting artifact where you need it.
Authentication header
All documented endpoints require the following request header:
X-API-Key: <your_api_key>
Check allowed models
Once you have your API key, first check which models your account can access.
curl -s "https://cr-api.icosacomputing.com/cr/get_allowed_models" \
-H "X-API-Key: <your_api_key>"
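The same call can be made from application code. The following is a minimal Python sketch using only the standard library. The helper names `build_request` and `get_allowed_models` are illustrative, and the response body is assumed to be JSON, which this guide does not state.

```python
import json
import urllib.request

BASE_URL = "https://cr-api.icosacomputing.com/cr"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that carries the X-API-Key header."""
    return urllib.request.Request(
        f"{BASE_URL}/{path.lstrip('/')}",
        # urllib normalizes the header name's casing; HTTP header names
        # are case-insensitive, so the server sees the same header.
        headers={"X-API-Key": api_key},
        method="GET",
    )


def get_allowed_models(api_key: str, timeout: float = 30.0):
    """Call get_allowed_models and decode the body (assumed to be JSON)."""
    req = build_request("get_allowed_models", api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `get_allowed_models("<your_api_key>")` performs the network request; `build_request` alone only constructs it, which makes the header handling easy to check in isolation.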
Create your own model from files
To specialize a model for your domain, submit source files (such as PDFs and DOCX files) to the finetuning workflow. The API reads your files, constructs an appropriate training dataset, and tunes a LoRA adapter on the base model you select. The result is model behavior aligned with your content and use case.
curl -s -X POST "https://cr-api.icosacomputing.com/cr/generate_questions" \
-H "X-API-Key: <your_api_key>" \
-F "files=@/absolute/path/to/file1.pdf" \
-F "files=@/absolute/path/to/file2.docx" \
-F "numQs=1000" \
-F "mode=pretraining" \
-F "modelName=my-custom-model" \
-F "modelType=openai/gpt-oss-20b" \
-F "use_case=an accurate enterprise assistant"
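Each `-F` flag in the command above becomes one part of a `multipart/form-data` body. As a rough sketch of what that looks like from Python without third-party packages, the helpers below (`encode_multipart` and `build_finetune_request`, both hypothetical names) assemble the same fields by hand. The field values are copied from the curl example, and the shape of the response is not assumed.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Iterable, Tuple

GENERATE_URL = "https://cr-api.icosacomputing.com/cr/generate_questions"


def encode_multipart(fields: Dict[str, str], file_paths: Iterable[str],
                     file_field: str = "files") -> Tuple[bytes, str]:
    """Encode text fields and files as multipart/form-data.

    Returns (body, content_type). Filenames containing double quotes
    are not escaped in this sketch.
    """
    boundary = uuid.uuid4().hex
    body = bytearray()
    for name, value in fields.items():
        body += (f"--{boundary}\r\n"
                 f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                 f"{value}\r\n").encode("utf-8")
    for raw_path in file_paths:
        path = Path(raw_path)
        content_type = (mimetypes.guess_type(path.name)[0]
                        or "application/octet-stream")
        body += (f"--{boundary}\r\n"
                 f'Content-Disposition: form-data; name="{file_field}"; '
                 f'filename="{path.name}"\r\n'
                 f"Content-Type: {content_type}\r\n\r\n").encode("utf-8")
        body += path.read_bytes() + b"\r\n"
    body += f"--{boundary}--\r\n".encode("utf-8")
    return bytes(body), f"multipart/form-data; boundary={boundary}"


def build_finetune_request(api_key: str,
                           file_paths: Iterable[str]) -> urllib.request.Request:
    """Mirror the curl example; the field values are taken from it."""
    fields = {
        "numQs": "1000",
        "mode": "pretraining",
        "modelName": "my-custom-model",
        "modelType": "openai/gpt-oss-20b",
        "use_case": "an accurate enterprise assistant",
    }
    body, content_type = encode_multipart(fields, file_paths)
    return urllib.request.Request(
        GENERATE_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": content_type},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` submits the job. The whole body is built in memory here, so very large source files may call for a streaming upload instead.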
Then poll the status endpoint with the job's ID until the job is done:
curl -s "https://cr-api.icosacomputing.com/cr/generate_questions_status?job_id=<job_id>" \
-H "X-API-Key: <your_api_key>"
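Polling by hand gets tedious for long jobs, so a small loop helps. The sketch below is one way to write it in Python. The `"status"` field and the terminal values `"completed"` and `"failed"` are assumptions, since this guide does not show the status response; adjust them to match what the endpoint actually returns. The function names are illustrative.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterable

STATUS_URL = "https://cr-api.icosacomputing.com/cr/generate_questions_status"


def fetch_job_status(api_key: str, job_id: str, timeout: float = 30.0) -> dict:
    """GET the status endpoint once and decode the body (assumed JSON)."""
    query = urllib.parse.urlencode({"job_id": job_id})
    req = urllib.request.Request(f"{STATUS_URL}?{query}",
                                 headers={"X-API-Key": api_key})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll_until_done(
    fetch_status: Callable[[], dict],
    done_states: Iterable[str] = ("completed", "failed"),  # assumed values
    interval_s: float = 5.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until its "status" field is in done_states."""
    terminal = set(done_states)
    for attempt in range(max_attempts):
        payload = fetch_status()
        # "status" is an assumed field name, not confirmed by this guide.
        if payload.get("status") in terminal:
            return payload
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"job not finished after {max_attempts} polls")
```

A typical call would be `poll_until_done(lambda: fetch_job_status("<your_api_key>", "<job_id>"))`. Because the fetcher is passed in as a callable, the same loop can be reused for other status endpoints by supplying a different fetcher.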
Run inference
Inference supports streaming responses ("stream": true), so tokens arrive progressively rather than in one final payload. This makes it straightforward to wire the endpoint into interactive UIs, chat surfaces, and backend pipelines that need partial output as soon as it is available.
curl -s -X POST "https://cr-api.icosacomputing.com/cr/gcloud_predict_firebase_base_oss" \
-H "Content-Type: application/json" \
-H "X-API-Key: <your_api_key>" \
-d '{
"prompt": "Summarize the main point in one sentence.",
"query_only": "Summarize the main point in one sentence.",
"model_name": "Generalist: Base",
"stream": true,
"use_rag": false,
"sequential_budget": 1,
"parallel_budget": 1,
"temperature": 0.0
}'
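To consume the stream from application code, the response has to be read incrementally rather than with a single blocking read. The sketch below shows one way to do that with Python's standard library. This guide does not specify how the stream is framed (raw text, server-sent events, or newline-delimited JSON), so `iter_text_chunks` only yields decoded text as bytes arrive and leaves any further parsing to the caller. Both function names are illustrative.

```python
import codecs
import io
import json
import urllib.request
from typing import Iterator

PREDICT_URL = ("https://cr-api.icosacomputing.com/cr/"
               "gcloud_predict_firebase_base_oss")


def build_inference_request(api_key: str, prompt: str,
                            model_name: str = "Generalist: Base"
                            ) -> urllib.request.Request:
    """Build the POST request with the same JSON body as the curl example."""
    payload = {
        "prompt": prompt,
        "query_only": prompt,
        "model_name": model_name,
        "stream": True,
        "use_rag": False,
        "sequential_budget": 1,
        "parallel_budget": 1,
        "temperature": 0.0,
    }
    return urllib.request.Request(
        PREDICT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def iter_text_chunks(stream: io.BufferedIOBase,
                     chunk_size: int = 1024) -> Iterator[str]:
    """Yield decoded text as soon as bytes are available on the stream."""
    # An incremental decoder keeps multi-byte UTF-8 characters intact
    # when they are split across two reads.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    while True:
        # read1 returns after at most one underlying read instead of
        # waiting for chunk_size bytes to accumulate.
        chunk = stream.read1(chunk_size)
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

In use, open the request with `urllib.request.urlopen(build_inference_request(api_key, prompt))` and loop over `iter_text_chunks(resp)`, printing or forwarding each piece as it arrives.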
Run evaluation
Once your model pipeline is ready, run an asynchronous evaluation to measure how well the model performs against your dataset. Treat this as the quality gate before wider rollout.
curl -s -X POST "https://cr-api.icosacomputing.com/cr/finetune_model/evaluate_async" \
-H "Content-Type: application/json" \
-H "X-API-Key: <your_api_key>" \
-d '{
"dataset_name": "your_dataset_name",
"model_name": "Generalist: Base"
}'
Then poll until completion:
curl -s "https://cr-api.icosacomputing.com/cr/finetune_model/evaluate_status?job_id=<job_id>" \
-H "X-API-Key: <your_api_key>"
Download and use your model
After evaluation, you can use the model directly on the Icosa platform at chat.icosa.co, or download the artifact and integrate it into your own infrastructure and applications.
curl -L "https://cr-api.icosacomputing.com/cr/finetune_model/download?model_name=<model_name>" \
-H "X-API-Key: <your_api_key>" \
-o "model.gguf"
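When scripting the download, two details matter: the model name must be URL-encoded (names such as "Generalist: Base" contain spaces and colons), and the artifact should be streamed to disk rather than loaded into memory. A minimal Python sketch under those assumptions follows; `build_download_url` and `download_model` are illustrative names.

```python
import shutil
import urllib.parse
import urllib.request

DOWNLOAD_URL = "https://cr-api.icosacomputing.com/cr/finetune_model/download"


def build_download_url(model_name: str) -> str:
    """Return the download URL with model_name safely URL-encoded."""
    query = urllib.parse.urlencode({"model_name": model_name})
    return f"{DOWNLOAD_URL}?{query}"


def download_model(api_key: str, model_name: str,
                   dest_path: str = "model.gguf",
                   timeout: float = 300.0) -> str:
    """Stream the artifact to dest_path and return that path."""
    req = urllib.request.Request(build_download_url(model_name),
                                 headers={"X-API-Key": api_key})
    # urlopen follows HTTP redirects by default, comparable to curl -L.
    with urllib.request.urlopen(req, timeout=timeout) as resp, \
            open(dest_path, "wb") as out:
        # copyfileobj copies in fixed-size chunks, so memory use stays
        # flat regardless of the artifact size.
        shutil.copyfileobj(resp, out)
    return dest_path
```

For example, `download_model("<your_api_key>", "my-custom-model")` writes `model.gguf` in the current directory.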