
Vertex AI Batch Inference Setup

Set up Google Cloud Vertex AI for batch inference with the ANTS Platform AI Optimizer.

Vertex AI Batch Inference allows the AI Optimizer to evaluate model comparisons in bulk using Google Cloud's batch prediction API at reduced cost.

Prerequisites

  • A Google Cloud project with Vertex AI API enabled
  • A service account with appropriate permissions
  • Access to the Gemini models (or other models) you want to use for comparison

Setup

Step 1: Enable Vertex AI API

Enable the Vertex AI API in your Google Cloud project:

gcloud services enable aiplatform.googleapis.com
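If you have not set a default project yet, set it first so the command above targets the right project (replace PROJECT_ID with your own project ID), then confirm the API is active:

```shell
# Point gcloud at your project (PROJECT_ID is a placeholder).
gcloud config set project PROJECT_ID

# Confirm the Vertex AI API now appears in the enabled-services list.
gcloud services list --enabled --filter="name:aiplatform.googleapis.com"
```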

Step 2: Create a Service Account

Create a service account for batch inference:

gcloud iam service-accounts create ants-batch-inference \
  --display-name="ANTS Platform Batch Inference"
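You can verify the account was created before moving on; the filter below assumes the default project is set:

```shell
# List the new service account by its email prefix.
gcloud iam service-accounts list \
  --filter="email:ants-batch-inference@*"
```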

Step 3: Grant Required Permissions

The service account needs the following roles:

# Vertex AI access
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:ants-batch-inference@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
 
# Cloud Storage access (for batch input/output)
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:ants-batch-inference@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"
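To double-check that both bindings took effect, you can flatten the project's IAM policy and filter for the service account (replace PROJECT_ID as before):

```shell
# Print every role bound to the batch-inference service account.
gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:ants-batch-inference@PROJECT_ID.iam.gserviceaccount.com" \
  --format="value(bindings.role)"
```

The output should include roles/aiplatform.user and roles/storage.objectAdmin.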

Step 4: Create a Service Account Key

Generate a JSON key for the service account:

gcloud iam service-accounts keys create key.json \
  --iam-account=ants-batch-inference@PROJECT_ID.iam.gserviceaccount.com
⚠️ Store this key securely. It provides access to your Google Cloud resources.
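Before pasting the key into ANTS Platform, you can optionally sanity-check it locally by authenticating gcloud as the service account:

```shell
# Authenticate as the service account using the freshly generated key.
# If this succeeds, the key file is valid and not yet revoked.
gcloud auth activate-service-account \
  ants-batch-inference@PROJECT_ID.iam.gserviceaccount.com \
  --key-file=key.json
```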

Step 5: Configure in ANTS Platform

  1. Go to Project Settings > LLM API Keys
  2. Add a new key with Google Vertex AI adapter
  3. Enter your credentials:
    • Google Cloud Project ID
    • Service Account JSON Key (paste the contents of key.json)
    • Location (e.g., us-central1, europe-west4)
  4. Check "I intend to use Batch Inference for LLM optimizer"
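Batch inference reads its input from and writes its output to Cloud Storage (which is why the service account was granted roles/storage.objectAdmin). If you do not already have a bucket, one sketch of creating it is below; the bucket name is a placeholder, and it should live in the same location you configure above:

```shell
# Hypothetical bucket name; use your own. Keep the bucket in the same
# region you select in the ANTS Platform location field.
gcloud storage buckets create gs://my-ants-batch-bucket \
  --location=us-central1
```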

Supported Locations

Vertex AI batch inference is available in all regions that support Gemini models. Common regions include:

Region          Location
US Central      us-central1
US East         us-east4
US West         us-west1
Europe West     europe-west1, europe-west4
Asia Northeast  asia-northeast1
Asia Southeast  asia-southeast1

If no location is specified, the default us-central1 is used. See the Vertex AI available locations for the full list.

Troubleshooting

"Permission denied" errors

Verify your service account has the roles/aiplatform.user and roles/storage.objectAdmin roles.

"API not enabled" errors

Ensure the Vertex AI API is enabled:

gcloud services list --enabled | grep aiplatform

Batch job failures

Check the Vertex AI console for detailed error logs. Common issues include:

  • Insufficient quota for the selected model
  • Invalid model ID or region mismatch
  • Service account key expired or revoked
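Alongside the console, you can also list recent batch prediction jobs and their states from the command line by calling the Vertex AI REST API directly; the project ID and region below are placeholders, so substitute the ones you configured:

```shell
# List batch prediction jobs in us-central1 (adjust region and PROJECT_ID).
# Each job entry includes a "state" field and, for failed jobs, an "error" field.
curl -s \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs"
```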

See Batch Inference Limits for provider-specific size requirements.