Vertex AI Batch Inference Setup
Set up Google Cloud Vertex AI for batch inference with the ANTS Platform AI Optimizer.
Vertex AI Batch Inference allows the AI Optimizer to evaluate model comparisons in bulk using Google Cloud's batch prediction API at reduced cost.
Prerequisites
- A Google Cloud project with Vertex AI API enabled
- A service account with appropriate permissions
- Access to the Gemini or other models you want to use for comparison
Setup
Step 1: Enable Vertex AI API
Enable the Vertex AI API in your Google Cloud project:
```bash
gcloud services enable aiplatform.googleapis.com
```
Step 2: Create a Service Account
Create a service account for batch inference:
```bash
gcloud iam service-accounts create ants-batch-inference \
  --display-name="ANTS Platform Batch Inference"
```
Step 3: Grant Required Permissions
The service account needs the following roles:
```bash
# Vertex AI access
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:ants-batch-inference@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

# Cloud Storage access (for batch input/output)
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:ants-batch-inference@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"
```
Step 4: Create a Service Account Key
Generate a JSON key for the service account:
```bash
gcloud iam service-accounts keys create key.json \
  --iam-account=ants-batch-inference@PROJECT_ID.iam.gserviceaccount.com
```
Store this key securely. It provides access to your Google Cloud resources.
Step 5: Configure in ANTS Platform
- Go to Project Settings > LLM API Keys
- Add a new key with Google Vertex AI adapter
- Enter your credentials:
  - Google Cloud Project ID
  - Service Account JSON Key (paste the contents of key.json)
  - Location (e.g., us-central1 or europe-west4)
- Check "I intend to use Batch Inference for LLM optimizer"
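Once configured, the optimizer submits batch jobs whose input is a JSON Lines file in Cloud Storage, one request per line. The sketch below shows the general shape of a Gemini batch input line, where each line wraps a `GenerateContentRequest`-style body; treat the field names as an illustration to check against the current Vertex AI batch prediction docs rather than a definitive schema.

```python
import json

# Hypothetical comparison prompts the optimizer might evaluate in bulk.
prompts = [
    "Compare response A and response B for helpfulness.",
    "Compare response C and response D for helpfulness.",
]

# One JSON object per line; each wraps a GenerateContentRequest-style body.
lines = [
    json.dumps({
        "request": {
            "contents": [{"role": "user", "parts": [{"text": p}]}]
        }
    })
    for p in prompts
]

# The resulting file would be uploaded to gs://<bucket>/input.jsonl
jsonl = "\n".join(lines)
print(jsonl.splitlines()[0])
```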
Supported Locations
Vertex AI batch inference is available in all regions that support Gemini models. Common regions include:
| Region | Location |
|---|---|
| US Central | us-central1 |
| US East | us-east4 |
| US West | us-west1 |
| Europe West | europe-west1, europe-west4 |
| Asia Northeast | asia-northeast1 |
| Asia Southeast | asia-southeast1 |
If no location is specified, the default us-central1 is used. See the Vertex AI available locations documentation for the full list.
Troubleshooting
"Permission denied" errors
Verify your service account has the roles/aiplatform.user and roles/storage.objectAdmin roles.
"API not enabled" errors
Ensure the Vertex AI API is enabled:
```bash
gcloud services list --enabled | grep aiplatform
```
Batch job failures
Check the Vertex AI console for detailed error logs. Common issues include:
- Insufficient quota for the selected model
- Invalid model ID or region mismatch
- Service account key expired or revoked
See Batch Inference Limits for provider-specific size requirements.