AI glossary for content assistants
Plain-English definitions of 13,917 AI terms for branded assistant teams.
Glossary
13,917 terms. Open one for definitions and related concepts.
Gaudi 2
Gaudi 2 is the second-generation AI training and inference processor from Intel (originally Habana Labs), designed to compete with NVIDIA A100-class GPUs.
Gaudi 3
Gaudi 3 is the third-generation AI accelerator from Intel, offering a significant performance leap targeting NVIDIA H100-class workloads for AI training and inference.
MI300X
The AMD Instinct MI300X is a data center GPU accelerator featuring 192GB of HBM3 memory, designed to compete with the NVIDIA H100 for AI training and inference.
SambaNova SN40L
The SambaNova SN40L is a reconfigurable dataflow AI chip that uses a unique architecture to accelerate both training and inference, particularly for enterprise AI workloads.
Qualcomm AI
Qualcomm AI encompasses the AI processing capabilities in Qualcomm Snapdragon chips, enabling on-device AI for smartphones, PCs, automotive, and IoT applications.
Google TPU Hardware
Google TPU Hardware refers to the physical infrastructure, including custom chips, pods, and interconnects, that makes up Google Cloud TPU systems.
HBM2
HBM2 (High Bandwidth Memory 2) is the second generation of HBM technology, which stacks DRAM dies vertically and places the stacks next to the processor die in the same package to deliver high memory bandwidth.
HBM2e
HBM2e is an enhanced version of HBM2 memory offering higher capacity and bandwidth per stack, used in GPUs such as the 80GB NVIDIA A100.
HBM3e
HBM3e is the enhanced version of HBM3 memory, offering higher bandwidth and capacity for next-generation AI accelerators like the NVIDIA H200 and B200.
Unified Memory
Unified memory is an architecture where the CPU and GPU (or other accelerators) share a single memory pool, eliminating the need for explicit data transfers between processors.
Memory Hierarchy
A memory hierarchy is a structured arrangement of storage levels from fast but small (registers, cache) to slow but large (DRAM, disk), designed to optimize data access for AI workloads.
Memory Offloading
Memory offloading moves portions of AI model data from GPU memory to CPU memory or storage to enable running larger models than GPU memory alone allows.
CPU Offloading
CPU offloading moves specific AI model components from GPU to CPU memory and processing, enabling larger models to run on limited GPU resources.
Fog Computing
Fog computing extends cloud computing to the network edge, providing distributed processing between end devices and centralized data centers for latency-sensitive AI applications.
Hybrid Cloud
Hybrid cloud combines on-premise infrastructure with public cloud resources, allowing AI workloads to run where they are most appropriate based on data sensitivity, cost, and performance needs.
Cluster Computing
Cluster computing connects multiple computers to work together as a unified system, providing the aggregate compute power needed for training large AI models.
Quantum Advantage
Quantum advantage is the demonstrated ability of a quantum computer to solve a problem faster or more efficiently than any classical computer, a milestone for quantum computing.
Neuromorphic Computing
Neuromorphic computing is a computing paradigm that mimics the structure and function of biological neural networks in silicon, using spiking neurons and event-driven processing.
In-Memory Computing
In-memory computing performs computations directly within memory arrays, eliminating the data transfer bottleneck between processing units and memory that limits AI performance.
AI Accelerator
An AI accelerator is a specialized hardware device designed to speed up artificial intelligence workloads, including training and inference of machine learning models.
Inference Chip
An inference chip is a processor optimized specifically for running trained AI models in production, prioritizing throughput, latency, and energy efficiency over training capability.
Systolic Array
A systolic array is a grid of processing elements that rhythmically pass data between neighbors, efficiently computing matrix multiplications central to AI workloads.
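As a rough illustration of the definition above, the sketch below simulates an output-stationary systolic array in plain Python. The function name `systolic_matmul` is hypothetical, and the model captures only the skewed timing with which operands reach each processing element, not the neighbor-to-neighbor registers a real chip would use.

```python
def systolic_matmul(a, b):
    """Multiply a (m x k) by b (k x n) the way an output-stationary
    systolic array would: processing element (i, j) holds c[i][j] and,
    at clock tick t, receives a[i][t-i-j] from the left and b[t-i-j][j]
    from above."""
    m, k_dim, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    # The last operand pair reaches the bottom-right element at tick m+n+k-3.
    for t in range(m + n + k_dim - 2):
        for i in range(m):
            for j in range(n):
                k = t - i - j  # which operand pair arrives at this tick
                if 0 <= k < k_dim:
                    c[i][j] += a[i][k] * b[k][j]
    return c


result = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result)
```

Each element does one multiply-accumulate per tick once data reaches it, which is why the whole product finishes in roughly m + n + k ticks instead of m * n * k sequential steps.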
Wafer-Scale Engine
A wafer-scale engine is a processor built from an entire silicon wafer rather than individual chips, providing massive compute and memory in a single device.
InfiniBand
InfiniBand is a high-speed, low-latency networking technology used to connect GPUs and servers in AI training clusters, providing the bandwidth needed for distributed training.
RDMA
Remote Direct Memory Access (RDMA) enables direct memory-to-memory data transfer between computers without involving the CPU or operating system kernel of either machine in the data path, which is essential for high-performance AI training networks.
PCIe
PCI Express (PCIe) is the standard high-speed interface connecting GPUs and other accelerators to the CPU and system memory in servers and workstations.
Multi-Instance GPU
Multi-Instance GPU (MIG) is an NVIDIA technology that partitions a single GPU into multiple isolated instances, each with dedicated compute, memory, and cache resources.
GPU Virtualization
GPU virtualization enables multiple virtual machines or containers to share a single physical GPU, improving utilization and enabling multi-tenant GPU access.
Power Usage Effectiveness
Power Usage Effectiveness (PUE) is a metric measuring data center energy efficiency, calculated as total facility power divided by IT equipment power.
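The ratio can be worked through in a few lines. This is a minimal sketch; the function name `pue` and the sample power figures are made up for illustration.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Return Power Usage Effectiveness: total facility power / IT power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


# Hypothetical facility: 1,500 kW drawn in total, 1,000 kW of it by IT gear.
print(pue(1500, 1000))  # 1.5
```

A value of 1.0 would mean every watt goes to IT equipment; anything above that is overhead such as cooling and power conversion.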
Liquid Cooling
Liquid cooling uses fluids to remove heat from high-power AI hardware, enabling dense GPU deployments that would be impossible with air cooling alone.
Tensor Processing
Tensor processing refers to hardware-accelerated operations on multi-dimensional arrays (tensors) that form the fundamental data structure and computation pattern in deep learning.
Thermal Design Power
Thermal Design Power (TDP) is the amount of heat, in watts, that a processor is expected to generate under sustained workload and that its cooling system must be able to dissipate, which determines cooling requirements and power delivery for AI hardware.
Chiplet
A chiplet is a small, modular die that can be combined with other chiplets in a single package to build larger, more complex processors for AI workloads.
Process Node
A process node (e.g., 5nm, 4nm, 3nm) refers to the semiconductor manufacturing technology generation used to fabricate AI chips, with smaller nodes generally enabling more transistors per unit area and better energy efficiency.
Interconnect
An interconnect is the communication link between processing elements in AI systems, from chip-level buses to data center networks, critically affecting distributed AI performance.
Data Center GPU
A data center GPU is a GPU specifically designed for deployment in servers and data centers, optimized for AI training, inference, and high-performance computing workloads.
AI Chip Startup
AI chip startups are companies developing novel processor architectures specifically for artificial intelligence, challenging established GPU vendors with specialized designs.
Hardware-Accelerated Inference
Hardware-accelerated inference uses specialized processors to run trained AI models faster and more efficiently than general-purpose CPUs, enabling real-time AI applications.
GPU Cluster
A GPU cluster is a group of interconnected servers each containing multiple GPUs, providing the aggregate compute power needed to train large AI models.
Batch Processing
Batch processing in AI hardware refers to processing multiple inputs simultaneously on a GPU or accelerator, maximizing throughput and hardware utilization.
FLOPS
FLOPS (Floating-Point Operations Per Second) measures the computational throughput of a processor and is a common headline metric for comparing AI hardware performance.
TOPS
TOPS (Tera Operations Per Second, meaning trillions of operations per second) measures the computational throughput of AI accelerators, usually for low-precision integer math such as INT8, and is commonly used to rate NPUs and edge AI chips.
Roofline Model
The roofline model is a performance analysis framework that shows whether an AI workload is limited by compute throughput or memory bandwidth on a given processor.
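The core of the roofline model is a single `min`: attainable throughput is capped either by peak compute or by memory bandwidth multiplied by the workload's arithmetic intensity (FLOPs per byte moved). The sketch below assumes hypothetical hardware numbers and helper names.

```python
def attainable_flops(peak_flops: float,
                     mem_bandwidth_bytes_per_s: float,
                     arithmetic_intensity: float) -> float:
    """Roofline bound: min(compute roof, bandwidth * FLOPs-per-byte)."""
    return min(peak_flops, mem_bandwidth_bytes_per_s * arithmetic_intensity)


def is_memory_bound(peak_flops: float,
                    mem_bandwidth_bytes_per_s: float,
                    arithmetic_intensity: float) -> bool:
    """True when the bandwidth roof sits below the compute roof."""
    return mem_bandwidth_bytes_per_s * arithmetic_intensity < peak_flops


# Hypothetical accelerator: 100 TFLOPS peak, 2 TB/s memory bandwidth.
PEAK = 100e12
BANDWIDTH = 2e12

for intensity in (10, 50, 200):  # FLOPs per byte, illustrative values
    bound = attainable_flops(PEAK, BANDWIDTH, intensity)
    kind = "memory" if is_memory_bound(PEAK, BANDWIDTH, intensity) else "compute"
    print(f"intensity={intensity}: {bound / 1e12:.0f} TFLOPS ({kind}-bound)")
```

With these assumed numbers the ridge point falls at 50 FLOPs per byte: workloads below it are limited by memory bandwidth, and workloads at or above it are limited by compute.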
Hardware-Software Co-Design
Hardware-software co-design is the practice of developing AI hardware and software together to achieve optimal performance, where each informs the design of the other.
Sparsity in Hardware
Hardware sparsity support enables processors to skip zero-valued computations in neural networks, which can raise throughput for sparse models (for example, up to 2x with NVIDIA's 2:4 structured sparsity).
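One sparsity pattern that some hardware supports is 2:4 structured sparsity, where at most two of every four consecutive weights are nonzero. The sketch below shows magnitude-based pruning into that pattern. The function name `prune_2_of_4` is hypothetical, and real toolchains typically fine-tune the model after pruning to recover accuracy.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four,
    producing a 2:4 structured-sparse weight list."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


print(prune_2_of_4([0.1, -0.9, 0.5, 0.05, 1.0, 2.0, 3.0, 4.0]))
```

Because the position of the zeros is constrained, hardware can store only the kept values plus small index metadata and skip the zeroed multiplications.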
AI Training Infrastructure
AI training infrastructure encompasses all hardware, networking, storage, and software systems required to train machine learning models at scale.
Hardware Lottery
The hardware lottery describes how certain AI research ideas succeed not because they are fundamentally better, but because they align well with available hardware capabilities.
NVIDIA Grace Hopper
NVIDIA Grace Hopper is a superchip combining a Grace CPU and H100 GPU with a high-bandwidth NVLink-C2C interconnect, designed for memory-intensive AI workloads.
Product FAQ
What is InsertChat?
InsertChat is a white-label AI assistant for your website. Train it, brand it, publish it, and learn from visitor questions.
How does InsertChat use my website content?
Connect approved pages, docs, videos, FAQs, policies, and other sources. InsertChat turns them into source-backed answers and next steps.
Can I control the assistant's tone and sources?
Yes. Choose its sources, tone, welcome message, and prompts so it stays on brand.
How does InsertChat stay accurate?
Answers use approved content and source links. Analytics show unclear or missing answers so you can improve coverage.
Can it collect leads or route support questions?
Yes. InsertChat can collect details, qualify intent, add context, and send chats to the right inbox, CRM, workflow, or person.
Can I control how the assistant behaves?
Yes. Control prompts, model choice, tool access, and the branded assistant experience so behavior stays consistent.
Which AI models can I use?
InsertChat supports multiple model providers. Choose each assistant's model for quality, speed, and cost, or use BYOK.
Can I pick different models for different workflows?
Yes. Use a faster model for common questions and a stronger model for complex reasoning. InsertChat supports that balance per conversation.
Where can I deploy an assistant?
Use a widget, embed, full-page assistant, custom domain, in-app embed, or API. Reuse one setup across surfaces.
Do I need coding skills?
No. Build and deploy AI assistants using our visual builder. The embed code is one line of JavaScript.
Can I customize the branding and UI?
Yes. Customize the assistant name, logo, colors, welcome message, suggested prompts, tone, domain, and white-label presentation.
Can I use my own domain?
Yes. Custom domains are supported, typically via enterprise options.
Does InsertChat support voice?
Yes. Voice dictation and text-to-speech let users speak instead of type.
Does InsertChat support vision?
Yes. Enable vision for assistants when images help clarify a request or context.
What tools and integrations are supported?
Zendesk, HubSpot, Shopify, WooCommerce, calendar booking, web search, Perplexity, and webhooks for your own systems.
Can I control which tools the assistant is allowed to use?
Yes. Tool access is controlled per assistant so you enable only what you need.
Can the agent hand off to a human?
Yes. Configure human handoff so the agent escalates when needed. Full conversation history is passed along.
Do you provide analytics?
Yes. Track chats, leads, feedback, top questions, unanswered questions, most-used sources, and content gaps.
Is it mobile friendly?
Yes. The widget and embeds work well on desktop and mobile with no separate experience needed.
What's the fastest path to a successful deployment?
Start with one assistant and a small set of high-value sources. Iterate using real questions from analytics.
What is the fastest way to get started?
Create an account. Connect one key source. Ask a test question, brand the assistant, then publish it on one page.