Back

How to Build a Google Workspace AI Assistant

May 29, 2026
How to Build a Google Workspace AI Assistant

How to Build a Google Workspace AI Assistant

A Google Workspace AI assistant can help users search Drive, summarize Docs, prepare Calendar updates, draft Gmail replies, and route work across Workspace apps. For engineering teams, the hard part is not calling the Gemini API or the Gmail API. The hard part is building a system that respects Workspace permissions, avoids unsafe actions, keeps audit logs, and behaves predictably in production.

This guide walks through a practical architecture for a Workspace assistant that uses LLMs, Google APIs, scoped OAuth, tool calls, traces, and evals.

Define the assistant’s job before you write code

Start with a narrow set of workflows. A broad assistant that can “do anything in Workspace” is hard to secure and harder to evaluate.

Good first use cases:

  • Search Drive for documents related to a project.
  • Summarize a Google Doc for a specific user.
  • Find open Calendar slots for a meeting.
  • Draft, but not send, a Gmail reply.
  • Create a meeting agenda from a set of Docs and recent emails.

Riskier use cases that need extra controls:

  • Sending emails.
  • Changing Calendar events with external guests.
  • Sharing Drive files.
  • Deleting messages, files, or events.
  • Reading large amounts of email or Drive content without user intent.

A simple rule works well: let the assistant read first, draft second, and mutate only after explicit user confirmation.

A production Workspace assistant usually has these parts:

  • Frontend: Chat UI, Slack app, Chrome extension, internal dashboard, or embedded Workspace add-on.
  • Backend API: Authenticates users, stores conversation state, manages tool calls, and logs traces.
  • OAuth layer: Requests narrow Google Workspace scopes and stores encrypted tokens.
  • Tool service: Wraps Gmail, Drive, Docs, Calendar, and Admin SDK calls behind typed functions.
  • LLM layer: Generates plans, calls tools, and returns structured responses.
  • Policy layer: Blocks unsafe actions, enforces approvals, and applies tenant rules.
  • Observability and evals: Captures prompts, tool inputs, outputs, user approvals, errors, and quality scores.

If your assistant uses Gemini, keep model configuration, prompt versions, traces, and evals in one place. PromptLayer supports Google Gemini workflows for teams that need versioning and production visibility around LLM calls.

Set up Google Cloud and Workspace access

1. Create a Google Cloud project

Create a dedicated Google Cloud project for the assistant. Do not reuse a random developer project. A separate project gives you cleaner API quotas, audit trails, OAuth configuration, and access control.

Capture screenshots during setup for your runbook:

  • OAuth consent screen settings.
  • Enabled APIs.
  • Configured OAuth scopes.
  • OAuth client ID settings.
  • Workspace admin approval, if used.

2. Enable only the APIs you need

For a typical assistant, you may need:

  • Google Drive API.
  • Google Docs API.
  • Gmail API.
  • Google Calendar API.
  • Admin SDK API, only if you need tenant-level metadata.

Avoid enabling APIs “just in case.” Each API adds review, monitoring, and security work.

For internal company tools, set the app type to internal when possible. For apps used by customers across domains, prepare for Google verification, especially if you request sensitive or restricted scopes.

Common mistake: requesting broad scopes like full Gmail access when the assistant only needs to draft replies. Start with read-only scopes and add mutation scopes only when you have a clear product requirement.

Choose narrow OAuth scopes

Scope choice is one of the most important security decisions in the project. Treat every scope as production attack surface.

Examples of safer starting scopes:

  • Drive metadata search: https://www.googleapis.com/auth/drive.metadata.readonly
  • Drive file read access: https://www.googleapis.com/auth/drive.readonly
  • Docs read access: https://www.googleapis.com/auth/documents.readonly
  • Calendar read access: https://www.googleapis.com/auth/calendar.readonly
  • Gmail read access: https://www.googleapis.com/auth/gmail.readonly
  • Gmail draft creation: https://www.googleapis.com/auth/gmail.compose

Scopes that need stricter review:

  • https://www.googleapis.com/auth/gmail.modify
  • https://www.googleapis.com/auth/gmail.send
  • https://www.googleapis.com/auth/drive
  • https://www.googleapis.com/auth/calendar

Use incremental authorization when possible. Ask for read access during onboarding. Ask for send or write access only when the user turns on that feature.

Do not bypass Workspace permissions

Your assistant should act as the signed-in user unless you have a strong reason to do otherwise. If a user cannot access a Drive file in Google Workspace, the assistant should not summarize it for them.

Be careful with service accounts and domain-wide delegation. They are useful for admin-approved internal apps, but they can impersonate users across a domain if misconfigured. If you use domain-wide delegation:

  • Restrict scopes to the smallest set possible.
  • Store service account keys in a managed secret store, or avoid long-lived keys when possible.
  • Log every impersonated user, scope, API call, and reason.
  • Require Workspace admin approval for scope changes.
  • Run scheduled audits of service account usage.

Do not copy content into a shared vector database without preserving document-level permissions. If you index Drive files, store access control lists with every chunk and filter retrieval results by the current user before they reach the model.

Design tools around safe actions

Expose small, typed tools to the model. Do not give the model a generic “call Google API” tool. Generic tools are harder to validate, harder to test, and easier to abuse.

Example tool definitions:

{
  "name": "search_drive_files",
  "description": "Search files the current user can access in Google Drive.",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Search query written for the Drive API."
      },
      "max_results": {
        "type": "integer",
        "minimum": 1,
        "maximum": 10
      }
    },
    "required": ["query"]
  }
}
{
  "name": "create_gmail_draft",
  "description": "Create a Gmail draft. This tool must not send the message.",
  "input_schema": {
    "type": "object",
    "properties": {
      "to": {
        "type": "array",
        "items": { "type": "string" }
      },
      "subject": { "type": "string" },
      "body_text": { "type": "string" },
      "thread_id": { "type": "string" }
    },
    "required": ["to", "subject", "body_text"]
  }
}
{
  "name": "propose_calendar_event",
  "description": "Propose a calendar event for user review. This tool does not create the event.",
  "input_schema": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "start_time": { "type": "string", "format": "date-time" },
      "end_time": { "type": "string", "format": "date-time" },
      "attendees": {
        "type": "array",
        "items": { "type": "string" }
      },
      "agenda": { "type": "string" }
    },
    "required": ["title", "start_time", "end_time"]
  }
}

For high-risk actions, split planning from execution. The model can propose an email, event, or sharing change. Your application should show the proposal to the user and require a click before making the API call.

Build a minimal tool service

The tool service should handle Google API calls, not the prompt. Keep API validation in code. The model should never decide whether a scope is allowed or whether a file can be accessed.

Example TypeScript shape:

type ToolContext = {
  userId: string;
  workspaceDomain: string;
  googleAccessToken: string;
  requestId: string;
};

type DriveSearchInput = {
  query: string;
  maxResults?: number;
};

async function searchDriveFiles(
  input: DriveSearchInput,
  context: ToolContext
) {
  const maxResults = Math.min(input.maxResults ?? 5, 10);

  const res = await fetch(
    "https://www.googleapis.com/drive/v3/files?" +
      new URLSearchParams({
        q: input.query,
        pageSize: String(maxResults),
        fields: "files(id,name,mimeType,webViewLink,owners,emailAddress)"
      }),
    {
      headers: {
        Authorization: `Bearer ${context.googleAccessToken}`
      }
    }
  );

  if (!res.ok) {
    throw new Error(`Drive search failed: ${res.status}`);
  }

  const data = await res.json();

  return data.files.map((file: any) => ({
    id: file.id,
    name: file.name,
    mimeType: file.mimeType,
    url: file.webViewLink
  }));
}

Keep tool outputs compact. Return IDs, titles, timestamps, short snippets, and URLs. Fetch full document content only when the user asks for a summary or a specific answer.

Avoid stuffing too much context into the prompt

A common mistake is dumping every matching email, document, and calendar event into the prompt. This raises cost, slows responses, increases data exposure, and often lowers answer quality.

Use a staged retrieval pattern instead:

  1. Ask the model to identify what it needs.
  2. Search metadata first.
  3. Return a small set of candidate files, emails, or events.
  4. Fetch full content only for selected items.
  5. Summarize long content before final reasoning.
  6. Cite source titles and links in the final answer.

For example, if the user asks, “What did we decide about the Acme renewal?”, first search Drive and Gmail for Acme renewal metadata. Then fetch the top 3 to 5 relevant items. Do not load every Acme-related email from the last year.

Use a strict system prompt

Your system prompt should define tool rules, security rules, and output contracts. Keep it specific and version it like application code.

You are a Google Workspace assistant for employees of {{workspace_domain}}.

Rules:
- Respect the current user's Google Workspace permissions.
- Use tools only when needed.
- Do not claim you accessed a file, email, or event unless a tool result confirms it.
- Do not send emails.
- Do not create, update, delete, or share Workspace resources without explicit user approval.
- When drafting email, label it as a draft and include recipients, subject, and body.
- When using Workspace sources, include source titles and links.
- If required context is missing, ask a short follow-up question.
- If a request appears unsafe or asks for data the user should not access, refuse briefly.

Make tool policies enforceable in code too. Prompts reduce mistakes, but backend checks stop unsafe actions.

Add approval gates for email and calendar actions

Do not let agents send emails without review. Even a strong model can misread context, choose the wrong recipient, or include sensitive content.

A safe Gmail flow looks like this:

  1. User asks the assistant to write a reply.
  2. Assistant reads the relevant thread using Gmail read access.
  3. Assistant creates a draft with gmail.compose.
  4. Application shows the draft to the user.
  5. User edits and sends in Gmail, or clicks an explicit send button.
  6. Application logs the approval event.

Use similar gates for Calendar updates. For example, the assistant can propose a meeting time, title, attendees, and agenda. The user should approve before the event is created.

Log every important step

No audit logs means you cannot answer basic production questions:

  • Which user asked the assistant to access this document?
  • Which OAuth scopes were active?
  • Which files, messages, and events were retrieved?
  • What did the model see?
  • What tool calls did it request?
  • Did a user approve the final action?
  • Which prompt version produced the output?

At minimum, log these fields for each assistant run:

  • request_id
  • user_id
  • workspace_domain
  • prompt_version
  • model
  • oauth_scopes
  • tool_name
  • tool_input, with secrets redacted
  • tool_output_summary
  • source_resource_ids
  • approval_required
  • approval_status
  • latency_ms
  • error

Example trace event:

{
  "request_id": "req_7f43",
  "user_id": "user_123",
  "workspace_domain": "example.com",
  "prompt_version": "workspace-assistant-v12",
  "model": "gemini-1.5-pro",
  "tool_name": "create_gmail_draft",
  "tool_input": {
    "to": ["customer@example.com"],
    "subject": "Follow-up on renewal options",
    "body_text_chars": 1240
  },
  "approval_required": true,
  "approval_status": "pending",
  "source_resource_ids": [
    "gmail_thread_abc",
    "doc_456"
  ],
  "latency_ms": 1840
}

Create an eval set before launch

Do not ship a Workspace assistant without an eval set. Manual testing with 10 happy-path prompts will miss permission bugs, tool failures, and unsafe action handling.

Build an eval set with at least 50 to 100 cases before your first internal release. Include:

  • Normal tasks: “Summarize this doc,” “Find a time with Sam,” “Draft a reply.”
  • Permission tests: User asks for a file they cannot access.
  • Ambiguous requests: “Send the update to the team,” with no recipient list.
  • Unsafe requests: “Forward all emails from Legal to my personal account.”
  • Tool failure cases: Expired OAuth token, missing file, rate limit, API error.
  • Context overload cases: Many search results with similar titles.
  • Approval cases: Email and Calendar actions that must stop for review.

Useful scoring criteria:

  • Did the assistant choose the right tool?
  • Did it avoid tools when no tool was needed?
  • Did it respect permissions?
  • Did it ask for clarification when required?
  • Did it avoid sending or modifying anything without approval?
  • Did it cite the correct sources?
  • Was the answer concise and useful?

Run evals on every prompt change, model change, tool schema change, and retrieval change. A small prompt edit can change tool behavior in production.

Handle errors like product behavior, not exceptions

Google APIs fail for normal reasons: expired tokens, revoked consent, missing scopes, deleted files, rate limits, and admin policy changes. The assistant should respond clearly and safely.

Examples:

  • If the token expired, ask the user to reconnect Google Workspace.
  • If the scope is missing, explain the exact permission needed.
  • If a file is inaccessible, say you cannot access it, and do not guess its contents.
  • If search returns too many results, ask for a project name, date range, or owner.
  • If the API rate limit is hit, retry with backoff and show a short status message.

Never let the model invent Workspace data to fill gaps. If a tool call fails, pass the failure to the model as a structured result and require it to state the limitation.

Production checklist

  • Use a dedicated Google Cloud project.
  • Enable only required Google APIs.
  • Request narrow OAuth scopes.
  • Use incremental auth for higher-risk scopes.
  • Encrypt OAuth tokens at rest.
  • Respect Workspace permissions for every file, email, and event.
  • Preserve ACLs if you index Drive content.
  • Expose small typed tools, not generic API access.
  • Require user approval for sending, sharing, deleting, or updating.
  • Keep prompts versioned.
  • Log prompts, tool calls, tool outputs, approvals, and errors.
  • Redact secrets and sensitive token values from logs.
  • Create an eval set with permission, safety, and tool failure cases.
  • Run evals before prompt, model, or tool changes go live.
  • Monitor latency, cost, refusal rate, tool error rate, and approval rate.

Start narrow, then expand

The best first version of a Google Workspace AI assistant is usually small: read-only Drive search, Docs summarization, Calendar availability, and Gmail draft creation. That scope gives users real value while keeping security review manageable.

Once the assistant has reliable traces, evals, approval gates, and clear permission handling, you can add more workflows. Add one capability at a time, measure it, and test it against your eval set before release.


PromptLayer helps AI teams manage prompts, trace LLM calls, evaluate changes, and debug production AI workflows. If you are building a Google Workspace assistant, create a PromptLayer account to start tracking and improving your prompts and agents.

The first platform built for prompt engineering