
Vertex AI Batch Inference Setup

Set up Google Cloud Vertex AI for batch inference with the ANTS Platform AI Optimizer.

Vertex AI Batch Inference allows the AI Optimizer to evaluate model comparisons in bulk using Google Cloud's batch prediction API at reduced cost.

Prerequisites

  • A Google Cloud project with Vertex AI API enabled
  • A service account with appropriate permissions
  • Access to the Gemini models (or other models) you want to use for comparison

Setup

Step 1: Enable Vertex AI API

Enable the Vertex AI API in your Google Cloud project:

gcloud services enable aiplatform.googleapis.com
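If you have not set a default project yet, set it first so the command above targets the right project (replace PROJECT_ID with your own project ID), then confirm the API is active:

```shell
# Point gcloud at your project (PROJECT_ID is a placeholder).
gcloud config set project PROJECT_ID

# Confirm the Vertex AI API now appears in the enabled-services list.
gcloud services list --enabled --filter="name:aiplatform.googleapis.com"
```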

Step 2: Create a Service Account

Create a service account for batch inference:

gcloud iam service-accounts create ants-batch-inference \
  --display-name="ANTS Platform Batch Inference"
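You can verify the account was created before moving on; the filter below assumes the default project is set:

```shell
# List the new service account by its email prefix.
gcloud iam service-accounts list \
  --filter="email:ants-batch-inference@*"
```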

Step 3: Grant Required Permissions

The service account needs the following roles:

# Vertex AI access
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:ants-batch-inference@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
 
# Cloud Storage access (for batch input/output)
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:ants-batch-inference@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"
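To double-check that both bindings took effect, you can flatten the project's IAM policy and filter for the service account (replace PROJECT_ID as before):

```shell
# Print every role bound to the batch-inference service account.
gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:ants-batch-inference@PROJECT_ID.iam.gserviceaccount.com" \
  --format="value(bindings.role)"
```

The output should include roles/aiplatform.user and roles/storage.objectAdmin.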

Step 4: Create a Service Account Key

Generate a JSON key for the service account:

gcloud iam service-accounts keys create key.json \
  --iam-account=ants-batch-inference@PROJECT_ID.iam.gserviceaccount.com
⚠️ Store this key securely. It provides access to your Google Cloud resources.
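Before pasting the key into ANTS Platform, you can optionally sanity-check it locally by authenticating gcloud as the service account:

```shell
# Authenticate as the service account using the freshly generated key.
# If this succeeds, the key file is valid and not yet revoked.
gcloud auth activate-service-account \
  ants-batch-inference@PROJECT_ID.iam.gserviceaccount.com \
  --key-file=key.json
```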

Step 5: Configure in ANTS Platform

  1. Go to Project Settings > LLM API Keys
  2. Add a new key with Google Vertex AI adapter
  3. Enter your credentials:
    • Google Cloud Project ID
    • Service Account JSON Key (paste the contents of key.json)
    • Location (e.g., us-central1, europe-west4)
  4. Check "I intend to use Batch Inference for LLM optimizer"
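Batch inference reads its input from and writes its output to Cloud Storage (which is why the service account was granted roles/storage.objectAdmin). If you do not already have a bucket, one sketch of creating it is below; the bucket name is a placeholder, and it should live in the same location you configure above:

```shell
# Hypothetical bucket name; use your own. Keep the bucket in the same
# region you select in the ANTS Platform location field.
gcloud storage buckets create gs://my-ants-batch-bucket \
  --location=us-central1
```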

Supported Locations

Vertex AI batch inference is available in all regions that support Gemini models. Common regions include:

Region          Location
US Central      us-central1
US East         us-east4
US West         us-west1
Europe West     europe-west1, europe-west4
Asia Northeast  asia-northeast1
Asia Southeast  asia-southeast1

If no location is specified, the default us-central1 is used. See the Vertex AI available locations for the full list.

Troubleshooting

"Permission denied" errors

Verify your service account has the roles/aiplatform.user and roles/storage.objectAdmin roles.

"API not enabled" errors

Ensure the Vertex AI API is enabled:

gcloud services list --enabled | grep aiplatform

Batch job failures

Check the Vertex AI console for detailed error logs. Common issues include:

  • Insufficient quota for the selected model
  • Invalid model ID or region mismatch
  • Service account key expired or revoked
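Alongside the console, you can also list recent batch prediction jobs and their states from the command line by calling the Vertex AI REST API directly; the project ID and region below are placeholders, so substitute the ones you configured:

```shell
# List batch prediction jobs in us-central1 (adjust region and PROJECT_ID).
# Each job entry includes a "state" field and, for failed jobs, an "error" field.
curl -s \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs"
```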

See Batch Inference Limits for provider-specific size requirements.