Exercise guide — refer to the official documentation for full details.
Inference workloads serve trained models as API endpoints. NVIDIA Run:ai manages scaling, GPU allocation, and routing for inference services.

1. Navigate to Workload Manager > Workloads
2. Click + New Workload and select Inference
3. Select a project
4. Select Start from Scratch
5. Select NVIDIA NIM as the inference type
6. Fill in:

    | Field | Value |
    |---|---|
    | Name | nim-llama3-8b-instruct |

7. Click Continue
8. Select:

    | Field | Value |
    |---|---|
    | Model | meta/llama3-8b-instruct |
    | Source | Shared secret |
    | Type | NGC API Key |
    | Secret name | genericsecret-ngcgs-ngc-private-registry |

9. Select the one-gpu compute resource

!!! note
    Inference workloads can use fractional GPUs (e.g., 0.25 GPU) for lightweight models.

!!! tip
    Inference workloads in NVIDIA Run:ai support autoscaling based on request volume. See the official docs for configuration.

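
To make the fractional GPU note concrete, here is a rough sizing sketch in Python. It assumes that a GPU fraction maps to a proportional share of that GPU's memory, and it uses a hypothetical 80 GiB device. Both are illustrative assumptions, and the helper names are not part of any NVIDIA Run:ai API.

```python
from fractions import Fraction


def gpu_memory_share_gib(total_gib, fraction):
    """Memory available to a workload requesting `fraction` of one GPU.

    Assumption: the fraction translates into a proportional memory share.
    """
    frac = Fraction(str(fraction))  # exact arithmetic, avoids float rounding
    if not 0 < frac <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    return float(total_gib * frac)


def max_workloads_per_gpu(fraction):
    """How many workloads of the same fraction fit on a single GPU."""
    frac = Fraction(str(fraction))
    if not 0 < frac <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    return int(1 // frac)


# Hypothetical 80 GiB GPU split into quarters.
quarter_share = gpu_memory_share_gib(80, 0.25)
quarter_count = max_workloads_per_gpu(0.25)
```

Under these assumptions a 0.25 request leaves about 20 GiB per workload, which is why fractions suit lightweight models rather than large ones.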
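Once the inference workload is Running, you can test it by sending an HTTP request to the NIM with curl from your terminal.
Replace [URL_Address] in the command below with the Address of the inference workload (click the field under the Connection(s) column in Workload Manager > Workloads to find it). The command already includes the https:// prefix, so do not repeat it.

curl -X 'POST' \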
'https://[URL_Address]/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "meta/llama3-8b-instruct",
"messages": [{"role":"user", "content":"Who was Isaac Newton?"}],
"max_tokens": 64
}'
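
The same request can also be sent from a script. The sketch below is a minimal Python equivalent of the curl command, using only the standard library. The function names are hypothetical, and the reply parsing assumes the endpoint returns an OpenAI-style chat completion body (`choices[0].message.content`), so check an actual response before relying on it.

```python
import json
import urllib.request


def build_chat_request(url_address, prompt,
                       model="meta/llama3-8b-instruct", max_tokens=64):
    """Build the same POST request as the curl command, without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"https://{url_address}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json",
                 "Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body):
    """Return the assistant text, assuming an OpenAI-style response body."""
    return body["choices"][0]["message"]["content"]


def ask(url_address, prompt, timeout=60):
    """Send the request and return the reply text (performs a network call)."""
    request = build_chat_request(url_address, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

Once the workload is Running, calling `ask` with your own [URL_Address] value and the prompt "Who was Isaac Newton?" should return the model's reply as a string.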
Once the inference workload is Running, you can test it by creating a Chatbot Workspace.

1. Navigate to Workload Manager > Workloads
2. Click on the field under the Connection(s) column
3. Copy the URL in the Address column and save it
4. Click + New Workload and select Workspace
5. Select your cluster and project
6. Select Start from Scratch
7. Fill in:

    | Field | Value |
    |---|---|
    | Name | chatbot |

8. Click Continue
9. Select the chatbot-ui environment
10. Click on Runtime Settings
11. Fill in:

    | Field | Value |
    |---|---|
    | RUNAI_MODEL_NAME | meta/llama3-8b-instruct |
    | RUNAI_MODEL_BASE_URL | [URL_Address] you copied in step 3 |

12. Select the cpu-only compute resource
13. Click Create Workspace
14. Wait for the workload to reach Running status
15. Select your workspace from the workload list
16. Click Connect

The chatbot opens in a new browser tab.
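
The two runtime settings above are all a chat client needs to reach the model. As a rough illustration (not a description of how the chatbot-ui environment is actually implemented), the Python sketch below reads the same two variables and builds the payload for each turn of a conversation. The function names are hypothetical, and appending `/v1/chat/completions` to the base URL is an assumption taken from the curl example.

```python
import os


def load_chat_settings(env=None):
    """Read the model name and base URL from environment-style settings."""
    env = os.environ if env is None else env
    base_url = env["RUNAI_MODEL_BASE_URL"].rstrip("/")
    return {
        "model": env["RUNAI_MODEL_NAME"],
        # Assumption: the chat route from the curl example hangs off the base URL.
        "endpoint": f"{base_url}/v1/chat/completions",
    }


def next_payload(settings, history, user_text, max_tokens=64):
    """Build the request body for one turn without mutating `history`."""
    messages = history + [{"role": "user", "content": user_text}]
    return {
        "model": settings["model"],
        "messages": messages,
        "max_tokens": max_tokens,
    }


def record_turn(history, user_text, assistant_text):
    """Return a new history that includes the completed turn."""
    return history + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]
```

Sending the full message history on every turn is what lets the model answer follow-up questions, since each request is otherwise independent.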