Reference

AI Engineering Glossary

Working definitions of the AI engineering, RAG, agent, AEO/GEO, and compliance terms enterprises encounter when planning production AI systems. Bookmark this page; we keep it current.

A3 terms

AEO

Answer Engine Optimization / Agent Engine Optimization

The practice of structuring content so AI answer engines and agents (ChatGPT, Claude, Perplexity, Google AI Overviews, Microsoft Copilot) cite it when answering user questions. The optimization target is citation inside an AI-generated answer, not a click.

AI Agent

An AI system with an outer reasoning loop where the model decides what to do next — including calling tools, branching, looping, and self-correcting. Agents handle multi-step tasks that fixed chains cannot complete reliably.

Agentic Memory

Persistent state across an agent's interactions, divided into four tiers: working (current task), episodic (past session summaries), semantic (extracted facts), and procedural (learned routines and policies).

B1 term

BAA

Business Associate Agreement

A legal instrument under HIPAA that makes a vendor handling PHI on a covered entity's behalf a 'business associate' with defined obligations. Without an unbroken BAA chain, a HIPAA-covered AI deployment is non-compliant regardless of technical posture.

E4 terms

Embedding

Vector Embedding

A high-dimensional numeric representation of text, images, or other media produced by an embedding model. Embeddings of similar content sit close in vector space, enabling semantic search and clustering.

Eval Harness

Evaluation Harness

An automated test infrastructure for AI systems comprising a golden set of scenarios, an LLM-as-judge rubric scorer, production trace replay, and quarterly human calibration. The minimum production-grade quality control.

EU AI Act

EU Artificial Intelligence Act

The European Union's risk-tiered AI regulation, fully applicable from August 2026. Imposes substantive obligations on high-risk AI systems (HR, credit scoring, critical infrastructure, public services) including documentation, transparency, and human oversight.

E-E-A-T

Experience, Expertise, Authoritativeness, Trustworthiness

Google's framing for content quality. Maps cleanly onto AI engine source-selection signals: identified authors, methodology disclosure, original observations, and verifiable affiliations.

F1 term

Fine-Tuning

Model Fine-Tuning

The process of adjusting a pre-trained model's weights (or adding an adapter) so that its default behaviour shifts toward patterns in a smaller training dataset. Useful for style, format, and decision rules — not for teaching new facts.

G3 terms

GEO

Generative Engine Optimization

A synonym for AEO. The practice of structuring web content so generative AI search engines surface and cite it. Key levers are answer-first structure, FAQPage schema, evidence density, and crawl-friendliness.

Groundedness

Groundedness Score

A metric measuring the proportion of factual claims in an AI-generated response that are supported by retrieved evidence. Production targets typically sit at 0.95 or above on a rolling 7-day window.

Guardrails

AI Guardrails

Runtime controls that enforce policy on AI inputs and outputs — prompt injection defenses, PII detection, toxicity scoring, citation requirement enforcement, and pre-action approval for destructive tools.

H3 terms

Hallucination

AI Hallucination

Any generated claim that is not supported by retrieved evidence or that contradicts ground truth. Types include intrinsic (contradicts retrieved passage), extrinsic (adds facts not in any passage), and citation fabrication (invents a source).

HIPAA

Health Insurance Portability and Accountability Act

US federal law governing protected health information (PHI). HIPAA-compliant AI requires a signed BAA with every vendor handling PHI, encryption in transit and at rest, audit logging at request granularity, and six-year log retention.

L5 terms

LLM

Large Language Model

A neural network trained on broad text corpora that generates language conditioned on a prompt. Production LLMs in 2026 range from sub-3B parameter edge models to frontier models above 400B parameters.

LangGraph

A framework for stateful, multi-actor AI applications built around cyclic graphs with explicit state, persistable checkpoints, and human-in-the-loop interruption. The strongest default for production agentic systems in 2026.

LoRA

Low-Rank Adaptation

A parameter-efficient fine-tuning technique that adds small low-rank matrices to a pre-trained model's weights, drastically reducing training cost and adapter size. Standard for adapting open-weight LLMs in 2026.

LLM-as-Judge

LLM-as-Judge Evaluation

A pattern where one LLM scores another's outputs against a rubric. Reliable when the judge is at least as capable as the model under test, the rubric is specific, and the judge is calibrated periodically against human labels.

llms.txt

A proposed convention (by Jeremy Howard) for a markdown digest of a website's content placed at the root URL, intended to give LLMs and AI agents a clean structured overview of the site for grounding and citation.

M3 terms

Multi-Agent System

An AI architecture with multiple specialised agents collaborating on a task — for example research, writing, fact-checking, and publishing. Common frameworks include LangGraph, CrewAI, and AutoGen.

Multi-Tier Routing

Multi-Tier Model Routing

A pattern where a small model classifies query difficulty and routes to small, medium, or large models accordingly. Typically cuts total LLM spend 40–60% with no measurable quality loss when the classifier is well-tuned.

MCP

Model Context Protocol

An open standard for connecting AI assistants to data sources and tools through a uniform interface. MCP servers expose resources, prompts, and tools that any MCP-compatible client can consume.

P3 terms

Prompt Caching

A provider feature that caches the prefill of stable prompt prefixes and charges 50–90% less for cached tokens. Materially reduces both cost and TTFT for production workloads with consistent system prompts.

Prompt Injection

An attack where malicious instructions in user input or retrieved content override the model's intended behaviour. Defences include input validation, output sanitisation, isolated tool sandboxes, and structured output enforcement.

Partial Prerendering

Partial Prerendering (PPR)

A Next.js feature combining static prerendering of the page shell with dynamic streaming of interactive components. Delivers LCP under 1.2s even when AI TTFT is 600ms or more.

Q1 term

Quantization

Model Quantization

A compression technique that reduces the numeric precision of model weights (e.g. FP16 to INT8 or INT4) to cut memory footprint and accelerate inference, with small accuracy trade-offs.

R2 terms

RAG

Retrieval-Augmented Generation

An architecture where a large language model generates answers grounded in documents retrieved at inference time, rather than relying solely on its training data. A typical RAG system has four stages: embedding, retrieval, optional reranking, and grounded generation.

Reranker

Cross-Encoder Reranker

A model that re-scores retrieved candidates by jointly encoding the (query, document) pair, producing more accurate relevance scores than the bi-encoder used in initial retrieval. Standard rerankers in 2026 include BGE-reranker-v2-m3 and Cohere Rerank 3.

S6 terms

Sovereign RAG

Sovereign Retrieval-Augmented Generation

A RAG architecture where every component — embedding model, vector store, retriever, and LLM — runs inside infrastructure the organisation legally controls. No request, embedding, or document ever leaves the customer-owned VPC, region, or air-gapped data center.

Semantic Caching

A cache that stores responses keyed by the embedding of the query, returning a cached response when a new query is semantically similar. Cuts hot-query latency to sub-50ms and reduces inference cost on repeat queries.

Schrems II

Schrems II Ruling

The 2020 Court of Justice of the European Union ruling that invalidated the EU-US Privacy Shield and made every trans-Atlantic personal data transfer a legal-risk decision rather than a default. A primary driver of sovereign AI adoption in the EU.

SOC 2

Service Organization Control 2

An AICPA framework demonstrating operational controls over Security, Availability, Processing Integrity, Confidentiality, and Privacy. Type I is point-in-time; Type II requires twelve months of operating-effectiveness evidence.

Server Components

React Server Components

React components that render exclusively on the server and ship zero JavaScript to the client. The largest single lever for bundle size and time-to-interactive in modern React applications.

Speakable Schema

A schema.org property identifying the CSS selectors on a page most suitable for voice assistants to read aloud. Telegraphs to AI engines which content is the cleanest extractable answer.

T2 terms

TTFT

Time To First Token

The latency between sending a prompt to an LLM and receiving the first character of the response. Below ~400ms feels instant; above ~1s users perceive the system as 'thinking too hard.' TTFT is the dominant UX metric for conversational AI.

Tool Use

Tool Use / Function Calling

The capability of an LLM to call structured external functions — search, database queries, code execution, API calls — as part of its response, enabling agents to take actions beyond text generation.

V1 term

Vector Database

A storage system optimised for high-dimensional vector embeddings with approximate nearest neighbour search. Major options in 2026 include Pinecone (managed), Qdrant (self-hosted), Weaviate (hybrid search), and pgvector (Postgres extension).

Strategic Deployment Active

Building With Any of
These Concepts?

Architecture audits, AI knowledge systems, autonomous agents — the engineering you need, built under your ownership.

Synthara Logo

SyntharaTechnologies

Your dedicated partner in enterprise AI transformation. We build production-ready, sovereign intelligence architectures designed explicitly to secure your strategic and competitive advantage.

Direct Communication

INITIATE
PROTOCOL.

Ready to secure your strategic advantage? Connect with our engineering nodes directly.

© 2026 SyntharaTechnologies
Private Limited Venture.Engineered in India • Deploying Strategic Nodes Globally.
Sovereign Excellence