Glossary

AI glossary for content assistants

Plain-English definitions of 13,917 AI terms for branded assistant teams.

Prediction Drift

Prediction drift is a change in the distribution of a model's output predictions over time, which may indicate data drift, concept drift, or model degradation.
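
One common way to quantify this is the Population Stability Index (PSI) between a reference window of scores and a recent window. The sketch below assumes scores lie in [0, 1] and uses equal-width bins; the function name psi and the sample data are illustrative, not part of any specific monitoring tool.

```python
import math

def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two lists of scores in [0, 1]."""
    def proportions(scores):
        counts = [0] * bins
        for s in scores:
            # Clamp so that a score of exactly 1.0 falls in the last bin.
            idx = min(int(s * bins), bins - 1)
            counts[idx] += 1
        total = len(scores)
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / total, eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))

reference_scores = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
shifted_scores = [0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 0.95]

print(psi(reference_scores, reference_scores))  # identical windows
print(psi(reference_scores, shifted_scores))    # scores moved upward
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the right threshold depends on the model and the bin count.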

Covariate Shift

Covariate shift is a type of data drift where the input feature distribution changes between training and production, while the relationship between features and labels remains the same.
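
A simple check for a shift in one numeric input feature is the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical cumulative distributions. This is a minimal standard-library sketch; ks_statistic is a hypothetical helper name, and it reports only the statistic, not a p-value.

```python
import bisect

def ks_statistic(train_values, prod_values):
    """Largest absolute gap between the two empirical CDFs (0.0 to 1.0)."""
    a = sorted(train_values)
    b = sorted(prod_values)
    largest_gap = 0.0
    for v in sorted(set(a) | set(b)):
        # Fraction of each sample that is <= v.
        cdf_a = bisect.bisect_right(a, v) / len(a)
        cdf_b = bisect.bisect_right(b, v) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap

training_ages = [22, 25, 31, 38, 44, 52]
production_ages = [41, 47, 55, 58, 63, 70]
print(ks_statistic(training_ages, production_ages))
```

A value near 0 suggests the feature looks similar in both samples; a value near 1 means the samples barely overlap.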

Performance Monitoring for ML

Performance monitoring for ML tracks both system-level metrics (latency, throughput, errors) and model-level metrics (accuracy, drift) for deployed AI systems.

Throughput Monitoring

Throughput monitoring tracks the number of inference requests an ML system processes per unit of time, ensuring capacity meets demand.
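
As a rough illustration, throughput can be estimated with a sliding window of request timestamps. ThroughputMonitor and its method names are assumptions made for this sketch; timestamps are passed in explicitly so the behavior is easy to test.

```python
from collections import deque

class ThroughputMonitor:
    """Counts requests seen within the last window_seconds."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp):
        # Call once per inference request, with a monotonically increasing time.
        self._timestamps.append(timestamp)

    def requests_per_second(self, now):
        cutoff = now - self.window_seconds
        # Drop requests that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) / self.window_seconds

monitor = ThroughputMonitor(window_seconds=10.0)
for second in range(20):
    monitor.record(float(second))
print(monitor.requests_per_second(now=20.0))
```

In a live service the timestamp would typically come from time.monotonic().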

Cost Monitoring for ML

Cost monitoring for ML tracks and optimizes the expenses associated with ML infrastructure, including compute, storage, data transfer, and API costs.

Token Usage Monitoring

Token usage monitoring tracks the consumption of input and output tokens in LLM applications to manage costs, enforce quotas, and optimize prompt engineering.
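
A minimal sketch of the bookkeeping involved: accumulate input and output tokens, estimate cost, and flag quota overruns. The TokenBudget class and the per-1,000-token prices are hypothetical placeholders, not any provider's real pricing.

```python
from dataclasses import dataclass

@dataclass
class TokenBudget:
    quota_tokens: int
    input_price_per_1k: float   # hypothetical price per 1,000 input tokens
    output_price_per_1k: float  # hypothetical price per 1,000 output tokens
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, input_tokens, output_tokens):
        """Add the token counts reported for one LLM call."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def total_tokens(self):
        return self.input_tokens + self.output_tokens

    @property
    def cost(self):
        return (self.input_tokens / 1000 * self.input_price_per_1k
                + self.output_tokens / 1000 * self.output_price_per_1k)

    def over_quota(self):
        return self.total_tokens > self.quota_tokens

budget = TokenBudget(quota_tokens=10_000, input_price_per_1k=0.5, output_price_per_1k=1.5)
budget.record(input_tokens=4000, output_tokens=2000)
print(budget.total_tokens, budget.cost, budget.over_quota())
```

Tracking input and output separately matters because providers commonly price the two differently.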

Alerting for ML

Alerting for ML automatically notifies teams when ML model or infrastructure metrics cross defined thresholds, enabling rapid response to issues.
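
Threshold-based alerting can be sketched as a list of rules evaluated against the latest metric values. AlertRule, evaluate_rules, and the metric names are illustrative assumptions; a real setup would send the resulting messages to a paging or chat system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AlertRule:
    metric: str
    threshold: float
    direction: str  # "above" or "below"

def evaluate_rules(rules, metrics):
    """Return one message per rule whose threshold is crossed."""
    alerts = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            continue  # metric not reported in this interval
        if rule.direction == "above":
            breached = value > rule.threshold
        else:
            breached = value < rule.threshold
        if breached:
            alerts.append(f"{rule.metric}={value} is {rule.direction} {rule.threshold}")
    return alerts

rules = [
    AlertRule("p95_latency_ms", 500.0, "above"),
    AlertRule("accuracy", 0.9, "below"),
]
print(evaluate_rules(rules, {"p95_latency_ms": 620.0, "accuracy": 0.93}))
```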

Anomaly Detection for Monitoring

Anomaly detection for monitoring uses statistical or ML methods to automatically identify unusual patterns in model behavior, data, or system metrics that may indicate problems.
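
One of the simplest statistical methods is a z-score check against recent history. The sketch below assumes the metric is roughly stable and normally distributed, which often does not hold in practice; zscore_anomalies and the threshold of 3 standard deviations are illustrative choices.

```python
import statistics

def zscore_anomalies(history, new_values, threshold=3.0):
    """Return the new values that lie more than threshold stdevs from the mean."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # Flat history: anything different from the constant value is unusual.
        return [v for v in new_values if v != mean]
    return [v for v in new_values if abs(v - mean) / stdev > threshold]

latency_history_ms = [100, 102, 98, 101, 99, 100, 103, 97]
print(zscore_anomalies(latency_history_ms, [101, 120, 95]))
```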

Azure Machine Learning

Azure Machine Learning is a cloud service for building, training, deploying, and managing ML models at scale with enterprise features for governance and collaboration.

Google Vertex AI Infrastructure

Google Vertex AI infrastructure provides managed compute, training, and serving capabilities for ML models on Google Cloud, including TPU access and AutoML.

Modal

Modal is a cloud platform for running compute-intensive Python functions serverlessly, offering GPU access, container management, and scaling for ML workloads.

RunPod

RunPod is a cloud platform providing on-demand GPU instances and serverless GPU endpoints for ML training, inference, and development.

Together AI Platform

Together AI is a cloud platform for running, fine-tuning, and serving open-source AI models with optimized inference.

Groq Cloud

Groq Cloud provides low-latency LLM inference using custom LPU (Language Processing Unit) hardware designed for sequential token generation.

Cerebras Cloud

Cerebras Cloud provides AI inference and training using the Cerebras Wafer-Scale Engine, a wafer-sized processor that Cerebras describes as the largest chip ever built, designed for extreme-scale AI compute.

Databricks

Databricks is a unified analytics and AI platform that combines data engineering, data science, and ML on a lakehouse architecture with Apache Spark.

Snowflake Cortex

Snowflake Cortex provides AI and ML capabilities directly within the Snowflake Data Cloud, enabling LLM functions, ML model building, and AI-powered analytics on warehouse data.

Data Pipeline Infrastructure

Data pipeline infrastructure is the technical foundation for building, running, and monitoring automated data workflows that move and transform data for ML and analytics.

dbt (Data Build Tool)

dbt is a transformation tool that enables data teams to build reliable data transformations in SQL, with version control, testing, and documentation for analytics and ML feature engineering.

Replicate Platform

Replicate is a cloud platform for running open-source ML models with a simple API, handling infrastructure, scaling, and model packaging automatically.

Ollama Infrastructure

Ollama provides local infrastructure for running large language models on personal hardware, with a simple CLI, model management, and an API server for application integration.

llama.cpp Infrastructure

llama.cpp provides the foundational C/C++ inference engine for running quantized LLMs efficiently on CPUs and consumer GPUs across all major platforms.

GPU Training

GPU training uses graphics processing units to accelerate machine learning model training through massive parallel computation of matrix operations and gradient calculations.

Model Governance Framework

A model governance framework is a structured set of policies, roles, and processes that organizations implement to manage ML models responsibly throughout their lifecycle.

ML Platform

An ML platform is a unified set of tools and infrastructure that enables data scientists and ML engineers to build, train, deploy, and monitor models efficiently.

Feature Engineering Pipeline

A feature engineering pipeline automates the process of transforming raw data into meaningful features that ML models can use for training and inference.
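
In its simplest form, such a pipeline is an ordered list of transformation steps applied identically at training and inference time. The step functions and field names below (amount, day_of_week) are invented for illustration.

```python
import math

def make_pipeline(*steps):
    """Compose feature steps into one function applied to a raw record."""
    def run(record):
        features = dict(record)  # copy so the raw record is not mutated
        for step in steps:
            features = step(features)
        return features
    return run

def add_log_amount(features):
    # log1p compresses large values and is defined at zero.
    return {**features, "log_amount": math.log1p(features["amount"])}

def add_is_weekend(features):
    # Assumes day_of_week uses 0 = Monday ... 6 = Sunday.
    return {**features, "is_weekend": features["day_of_week"] in (5, 6)}

pipeline = make_pipeline(add_log_amount, add_is_weekend)
print(pipeline({"amount": 120.0, "day_of_week": 6}))
```

Reusing the same composed function for training and serving is one way to avoid the two code paths drifting apart.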

GPU Memory Management

GPU memory management involves techniques for efficiently allocating, using, and freeing GPU memory during ML training and inference to maximize model size and throughput.

Model Artifact

A model artifact is the serialized file or collection of files that represent a trained ML model, including weights, architecture, configuration, and metadata needed for inference.

Model Reproducibility

Model reproducibility is the ability to recreate an ML model with identical or near-identical performance by reusing the same data, code, parameters, and environment.

Training Data Management

Training data management encompasses the processes and tools for collecting, storing, versioning, labeling, and governing the datasets used to train ML models.

Model Deployment Strategy

A model deployment strategy defines the approach for releasing new ML models to production, including rollout patterns, testing procedures, and rollback plans.

Infrastructure as Code for ML

Infrastructure as Code (IaC) for ML defines and manages ML infrastructure, including GPU clusters, serving endpoints, and pipelines, through version-controlled configuration files.

ML Observability

ML observability is the ability to understand the internal state of ML systems through monitoring, logging, tracing, and analysis of models, data, and infrastructure.

Model Rollback

Model rollback is the process of reverting a production ML model to a previous version when the current version exhibits issues like degraded performance or unexpected behavior.
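
The core mechanism can be sketched as a promotion history in which the last entry is live and a rollback discards it. ModelRegistry here is a toy in-memory class written for this example, not the API of any real model registry.

```python
class ModelRegistry:
    """Keeps promoted versions in order; the most recent one is live."""

    def __init__(self):
        self._history = []

    def promote(self, version):
        self._history.append(version)

    @property
    def live(self):
        return self._history[-1] if self._history else None

    def rollback(self):
        """Drop the live version and return the one that is live afterwards."""
        if len(self._history) < 2:
            raise RuntimeError("no previous version to roll back to")
        self._history.pop()
        return self.live

registry = ModelRegistry()
registry.promote("v1")
registry.promote("v2")
print(registry.rollback())  # "v1" is live again
```

Production systems usually pair this with keeping the previous artifact warm so the switch is fast.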

KV Cache

KV cache stores the key and value tensors from previous tokens during LLM inference, avoiding redundant computation and dramatically speeding up autoregressive text generation.
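
The idea can be shown with a toy single-head attention step in plain Python. For brevity this sketch uses the token vector itself as query, key, and value; a real transformer applies learned projection matrices first. KVCache and decode_step are illustrative names.

```python
import math

def attend(query, keys, values):
    """Scaled dot-product attention of one query over all cached keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]

class KVCache:
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

def decode_step(token_vec, cache):
    # Only the new token's key and value are computed and stored;
    # earlier tokens are read back from the cache instead of recomputed.
    cache.append(token_vec, token_vec)
    return attend(token_vec, cache.keys, cache.values)

cache = KVCache()
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs = [decode_step(t, cache) for t in tokens]
print(len(cache.keys), outputs[-1])
```

The trade-off is memory: the cache grows by one key and one value per layer and head for every generated token.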

Continuous Batching

Continuous batching dynamically adds new inference requests to an active batch as existing requests complete, maximizing GPU utilization for LLM serving.
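
A small scheduling simulation makes the behavior concrete. It assumes every active request produces exactly one token per decoding step; continuous_batching and the request tuples are invented for this sketch and ignore real constraints such as memory limits.

```python
from collections import deque

def continuous_batching(requests, max_batch_size):
    """requests: list of (request_id, tokens_to_generate).

    Returns the list of request ids processed at each decoding step.
    """
    waiting = deque(requests)
    active = {}  # request_id -> tokens still to generate
    schedule = []
    while waiting or active:
        # Refill free slots before every step instead of waiting for the batch to drain.
        while waiting and len(active) < max_batch_size:
            request_id, tokens = waiting.popleft()
            active[request_id] = tokens
        schedule.append(sorted(active))
        for request_id in list(active):
            active[request_id] -= 1
            if active[request_id] <= 0:
                del active[request_id]  # finished: slot is free for the next step
    return schedule

steps = continuous_batching([("a", 1), ("b", 3), ("c", 2)], max_batch_size=2)
for step_number, batch in enumerate(steps, start=1):
    print(step_number, batch)
```

In this example request "c" takes over the slot freed by "a" after the first step, so all three requests finish in 3 steps. A static batch of "a" and "b" followed by "c" alone would need 5.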

Model Optimization

Model optimization is the process of improving an ML model for production deployment by reducing size, increasing speed, and lowering resource requirements while maintaining quality.

Shadow Deployment

Shadow deployment runs a new ML model alongside the production model, sending real traffic to both but only serving responses from the current model, to validate the new model safely.
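
In code, the pattern amounts to calling both models, logging the comparison, and returning only the production result. serve_with_shadow and the log format are assumptions for this sketch; real systems usually run the shadow call asynchronously so it cannot add latency.

```python
def serve_with_shadow(request, production_model, shadow_model, shadow_log):
    """Serve the production response; record the shadow response for offline comparison."""
    response = production_model(request)
    try:
        shadow_response = shadow_model(request)
        shadow_log.append(
            {"request": request, "production": response, "shadow": shadow_response}
        )
    except Exception as exc:
        # A failing shadow model must never affect what the user receives.
        shadow_log.append(
            {"request": request, "production": response, "shadow_error": repr(exc)}
        )
    return response

log = []
answer = serve_with_shadow(
    3,
    production_model=lambda x: x * 2,
    shadow_model=lambda x: x * 2 + 1,
    shadow_log=log,
)
print(answer, log)
```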

Model Serving Cost

Model serving cost is the total expense of running ML inference in production, including compute, memory, storage, networking, and operational overhead.

Model Caching

Model caching stores model predictions, intermediate computations, or model weights in fast-access memory to reduce latency, compute costs, and loading times.
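
For the prediction-caching case, Python's functools.lru_cache is enough to show the effect. slow_model stands in for an expensive inference call and is purely illustrative; inputs are passed as tuples because cached arguments must be hashable.

```python
from functools import lru_cache

call_count = {"model": 0}

def slow_model(features):
    """Placeholder for an expensive inference call."""
    call_count["model"] += 1
    return sum(features) / len(features)

@lru_cache(maxsize=1024)
def cached_predict(features):
    # features must be hashable, so callers pass a tuple rather than a list.
    return slow_model(features)

first = cached_predict((1.0, 2.0, 3.0))
second = cached_predict((1.0, 2.0, 3.0))  # served from the cache
print(first, second, call_count["model"])
```

This only helps when identical inputs recur, and cached predictions have to be invalidated when the model version changes.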

GPU Orchestration

GPU orchestration manages the allocation, scheduling, and lifecycle of GPU resources across ML training and inference workloads in shared compute environments.

Model Registry Best Practices

Model registry best practices are guidelines for effectively organizing, versioning, and managing ML models within a registry to support reliable deployments and governance.

Vector Database Infrastructure

Vector database infrastructure provides specialized storage and retrieval systems optimized for high-dimensional embedding vectors used in AI applications like semantic search and RAG.

ML Security

ML security encompasses the practices and tools for protecting ML systems from adversarial attacks, data poisoning, model theft, and other security threats specific to AI.

Model Testing

Model testing systematically evaluates ML models beyond standard metrics, including behavioral tests, edge cases, fairness checks, and robustness assessments.

LLM Gateway

An LLM gateway is a proxy layer that routes requests to multiple LLM providers, providing unified access, cost optimization, fallback handling, and observability for AI applications.
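
The fallback-handling part can be sketched as trying providers in priority order. The provider callables below are stand-ins, not real SDK clients, and ProviderError and route_with_fallback are names invented for this example.

```python
class ProviderError(Exception):
    """Raised by a provider callable when a request fails."""

def route_with_fallback(prompt, providers):
    """providers: ordered list of (name, callable). Returns (name, response)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # remember why this provider was skipped
    raise ProviderError("all providers failed: " + "; ".join(errors))

def primary(prompt):
    raise ProviderError("timeout")

def backup(prompt):
    return f"echo: {prompt}"

print(route_with_fallback("hello", [("primary", primary), ("backup", backup)]))
```

A fuller gateway would add per-provider cost tracking, retries, and request logging around the same loop.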

Batch Processing for ML

Batch processing for ML runs model predictions on large datasets in bulk, optimizing for throughput and cost rather than latency for offline or scheduled workloads.
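
A minimal sketch of the chunking pattern: split a large input into fixed-size batches and call the model once per batch. batched, batch_predict, and the doubling "model" are illustrative placeholders.

```python
from itertools import islice

def batched(iterable, batch_size):
    """Yield successive lists of up to batch_size items."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def batch_predict(model, rows, batch_size=1000):
    """Run the model once per batch and collect predictions in input order."""
    predictions = []
    for batch in batched(rows, batch_size):
        predictions.extend(model(batch))
    return predictions

def double_model(batch):
    # Stand-in for a vectorized model call that scores a whole batch at once.
    return [x * 2 for x in batch]

print(batch_predict(double_model, range(5), batch_size=2))
```

Larger batches usually improve throughput up to the point where memory runs out, which is why batch size is typically tuned per model and hardware.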

Model Packaging

Model packaging bundles a trained ML model with its dependencies, preprocessing code, and configuration into a portable, deployable artifact.

Product FAQ

What is InsertChat?

InsertChat is a white-label AI assistant for your website. Train it, brand it, publish it, and learn from visitor questions.

How does InsertChat use my website content?

Connect approved pages, docs, videos, FAQs, policies, and other sources. InsertChat turns them into source-backed answers and next steps.

Can I control the assistant's tone and sources?

Yes. Choose its sources, tone, welcome message, and prompts so it stays on brand.

How does InsertChat stay accurate?

Answers use approved content and source links. Analytics show unclear or missing answers so you can improve coverage.

Can it collect leads or route support questions?

Yes. InsertChat can collect details, qualify intent, add context, and send chats to the right inbox, CRM, workflow, or person.

Can I control how the assistant behaves?

Yes. Control prompts, model choice, tool access, and the branded assistant experience so behavior stays consistent.

Which AI models can I use?

InsertChat supports multiple model providers. Choose each assistant's model for quality, speed, and cost, or use BYOK.

Can I pick different models for different workflows?

Yes. Use a faster model for common questions and a stronger model for complex reasoning. InsertChat supports that balance per conversation.

Where can I deploy an assistant?

Use a widget, embed, full-page assistant, custom domain, in-app embed, or API. Reuse one setup across surfaces.

Do I need coding skills?

No. Build and deploy AI assistants using our visual builder. The embed code is one line of JavaScript.

Can I customize the branding and UI?

Yes. Customize the assistant name, logo, colors, welcome message, suggested prompts, tone, domain, and white-label presentation.

Can I use my own domain?

Yes. Custom domains are supported, typically via enterprise options.

Does InsertChat support voice?

Yes. Voice dictation and text-to-speech let users speak instead of type.

Does InsertChat support vision?

Yes. Enable vision for assistants when images help clarify a request or context.

What tools and integrations are supported?

Zendesk, HubSpot, Shopify, WooCommerce, calendar booking, web search, Perplexity, and webhooks for your own systems.

Can I control which tools the assistant is allowed to use?

Yes. Tool access is controlled per assistant so you enable only what you need.

Can the agent hand off to a human?

Yes. Configure human handoff so the agent escalates when needed. Full conversation history is passed along.

Do you provide analytics?

Yes. Track chats, leads, feedback, top questions, unanswered questions, most-used sources, and content gaps.

Is it mobile friendly?

Yes. The widget and embeds work well on desktop and mobile with no separate experience needed.

What's the fastest path to a successful deployment?

Start with one assistant and a small set of high-value sources. Iterate using real questions from analytics.

What is the fastest way to get started?

Create an account. Connect one key source. Ask a test question, brand the assistant, then publish it on one page.