Glossary

AI glossary for content assistants

Plain-English definitions of 13,917 AI terms for branded assistant teams.

Gaudi 2

Gaudi 2 is the second-generation AI training and inference processor from Intel (originally Habana Labs), designed to compete with NVIDIA A100-class GPUs.

Gaudi 3

Gaudi 3 is the third-generation AI accelerator from Intel, positioned as a substantial step up from Gaudi 2 and aimed at NVIDIA H100-class AI training and inference workloads.

MI300X

The AMD Instinct MI300X is a data center GPU accelerator featuring 192GB of HBM3 memory, designed to compete with the NVIDIA H100 for AI training and inference.

SambaNova SN40L

The SambaNova SN40L is a reconfigurable dataflow AI chip designed to accelerate both training and inference, particularly for enterprise AI workloads.

Qualcomm AI

Qualcomm AI encompasses the AI processing capabilities in Qualcomm Snapdragon chips, enabling on-device AI for smartphones, PCs, automotive, and IoT applications.

Google TPU Hardware

Google TPU Hardware refers to the physical infrastructure, including custom chips, pods, and interconnects, that makes up Google Cloud TPU systems.

HBM2

HBM2 (High Bandwidth Memory 2) is the second generation of HBM technology, which stacks DRAM dies vertically and places the stacks in the same package as the processor to deliver high memory bandwidth.

HBM2e

HBM2e is an enhanced version of HBM2 memory offering higher capacity and bandwidth per stack, used in GPUs like the NVIDIA A100.

HBM3e

HBM3e is the enhanced version of HBM3 memory, offering higher bandwidth and capacity for next-generation AI accelerators like the NVIDIA H200 and B200.

Unified Memory

Unified memory is an architecture where the CPU and GPU (or other accelerators) share a single memory pool, eliminating the need for explicit data transfers between processors.

Memory Hierarchy

A memory hierarchy is a structured arrangement of storage levels from fast but small (registers, cache) to slow but large (DRAM, disk), designed to optimize data access for AI workloads.

Memory Offloading

Memory offloading moves portions of AI model data from GPU memory to CPU memory or storage to enable running larger models than GPU memory alone allows.
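
As a rough illustration of the idea, the sketch below decides which layers of a model stay in GPU memory and which are offloaded to CPU memory. The function name `plan_offload`, the per-layer sizes, and the greedy first-fit policy are all assumptions made for this example; real frameworks use more elaborate placement strategies.

```python
def plan_offload(layer_sizes_gb, gpu_budget_gb):
    """Greedily keep layers on the GPU until the budget is used up.

    Hypothetical helper: returns one placement ("gpu" or "cpu") per layer.
    """
    placement = []
    used_gb = 0.0
    for size_gb in layer_sizes_gb:
        if used_gb + size_gb <= gpu_budget_gb:
            placement.append("gpu")
            used_gb += size_gb
        else:
            # This layer does not fit in the remaining GPU memory, so offload it.
            placement.append("cpu")
    return placement


# Hypothetical model: four 4 GB layers and a 10 GB GPU memory budget.
print(plan_offload([4, 4, 4, 4], 10))  # ['gpu', 'gpu', 'cpu', 'cpu']
```

Layers placed on the CPU must be copied to the GPU when they are needed, so offloading trades speed for the ability to run the model at all.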

CPU Offloading

CPU offloading moves specific AI model components from GPU to CPU memory and processing, enabling larger models to run on limited GPU resources.

Fog Computing

Fog computing extends cloud computing to the network edge, providing distributed processing between end devices and centralized data centers for latency-sensitive AI applications.

Hybrid Cloud

Hybrid cloud combines on-premises infrastructure with public cloud resources, allowing AI workloads to run where they are most appropriate based on data sensitivity, cost, and performance needs.

Cluster Computing

Cluster computing connects multiple computers to work together as a unified system, providing the aggregate compute power needed for training large AI models.

Quantum Advantage

Quantum advantage is the demonstrated ability of a quantum computer to solve a problem faster or more efficiently than any classical computer, a milestone for quantum computing.

Neuromorphic Computing

Neuromorphic computing is a computing paradigm that mimics the structure and function of biological neural networks in silicon, using spiking neurons and event-driven processing.

In-Memory Computing

In-memory computing performs computations directly within memory arrays, greatly reducing the data transfer bottleneck between processing units and memory that limits AI performance.

AI Accelerator

An AI accelerator is a specialized hardware device designed to speed up artificial intelligence workloads, including training and inference of machine learning models.

Inference Chip

An inference chip is a processor optimized specifically for running trained AI models in production, prioritizing throughput, latency, and energy efficiency over training capability.

Systolic Array

A systolic array is a grid of processing elements that rhythmically pass data between neighbors, efficiently computing matrix multiplications central to AI workloads.
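
To make the "rhythmic" data movement concrete, here is a small software simulation of one common variant, an output-stationary systolic array. It is a sketch under stated assumptions, not a description of any particular chip: the function name `systolic_matmul` and the skewed feeding schedule are choices made for this example. Each processing element (PE) holds one output value, receives an `a` value from its left neighbor and a `b` value from the neighbor above, multiplies them, and accumulates.

```python
def systolic_matmul(a, b):
    """Multiply a (n x k) by b (k x m) on a simulated n x m grid of PEs."""
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]    # one accumulator per PE
    a_reg = [[0] * m for _ in range(n)]  # value each PE received from the left
    b_reg = [[0] * m for _ in range(n)]  # value each PE received from above

    for t in range(n + m + k - 2):       # cycles until the last product is done
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    # Left edge: row i of a enters, delayed by i cycles.
                    s = t - i
                    new_a[i][j] = a[i][s] if 0 <= s < k else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]  # pass along from left neighbor
                if i == 0:
                    # Top edge: column j of b enters, delayed by j cycles.
                    s = t - j
                    new_b[i][j] = b[s][j] if 0 <= s < k else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]  # pass along from neighbor above
                acc[i][j] += new_a[i][j] * new_b[i][j]
        a_reg, b_reg = new_a, new_b
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Every PE only ever talks to its immediate neighbors, which is why the design maps well to dense, regular hardware.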

Wafer-Scale Engine

A wafer-scale engine is a processor built from an entire silicon wafer rather than individual chips, providing massive compute and memory in a single device.

InfiniBand

InfiniBand is a high-speed, low-latency networking technology used to connect GPUs and servers in AI training clusters, providing the bandwidth needed for distributed training.

RDMA

Remote Direct Memory Access (RDMA) enables direct memory-to-memory data transfer between computers without involving either machine's CPU or operating system kernel in the data path, essential for high-performance AI training networks.

PCIe

PCI Express (PCIe) is the standard high-speed interface connecting GPUs and other accelerators to the CPU and system memory in servers and workstations.

Multi-Instance GPU

Multi-Instance GPU (MIG) is an NVIDIA technology that partitions a single GPU into multiple isolated instances, each with dedicated compute, memory, and cache resources.

GPU Virtualization

GPU virtualization enables multiple virtual machines or containers to share a single physical GPU, improving utilization and enabling multi-tenant GPU access.

Power Usage Effectiveness

Power Usage Effectiveness (PUE) is a metric measuring data center energy efficiency, calculated as total facility power divided by IT equipment power.
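
The ratio in this definition is simple enough to compute directly. The sketch below uses a hypothetical function name and made-up facility figures purely for illustration.

```python
def power_usage_effectiveness(total_facility_kw, it_equipment_kw):
    """Return PUE = total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    if total_facility_kw < it_equipment_kw:
        raise ValueError("total facility power cannot be below IT equipment power")
    return total_facility_kw / it_equipment_kw


# Hypothetical facility: 1,500 kW drawn in total, 1,000 kW of it by IT equipment.
print(power_usage_effectiveness(1500.0, 1000.0))  # 1.5
```

A PUE of 1.0 would mean every watt reaches IT equipment; the 1.5 in this example means an extra 0.5 W goes to cooling, power conversion, and other overhead for each watt of IT load.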

Liquid Cooling

Liquid cooling uses fluids to remove heat from high-power AI hardware, enabling dense GPU deployments that would be impossible with air cooling alone.

Tensor Processing

Tensor processing refers to hardware-accelerated operations on multi-dimensional arrays (tensors) that form the fundamental data structure and computation pattern in deep learning.

Thermal Design Power

Thermal Design Power (TDP) is the amount of heat a processor's cooling system is designed to dissipate under sustained workload, which determines cooling requirements and guides power delivery for AI hardware.

Chiplet

A chiplet is a small, modular die that can be combined with other chiplets in a single package to build larger, more complex processors for AI workloads.

Process Node

A process node (e.g., 5nm, 4nm, 3nm) refers to the semiconductor manufacturing technology used to fabricate AI chips, with smaller nodes enabling more transistors and better efficiency.

Interconnect

An interconnect is the communication link between processing elements in AI systems, from chip-level buses to data center networks, critically affecting distributed AI performance.

Data Center GPU

A data center GPU is a GPU specifically designed for deployment in servers and data centers, optimized for AI training, inference, and high-performance computing workloads.

AI Chip Startup

AI chip startups are companies developing novel processor architectures specifically for artificial intelligence, challenging established GPU vendors with specialized designs.

Hardware-Accelerated Inference

Hardware-accelerated inference uses specialized processors to run trained AI models faster and more efficiently than general-purpose CPUs, enabling real-time AI applications.

GPU Cluster

A GPU cluster is a group of interconnected servers each containing multiple GPUs, providing the aggregate compute power needed to train large AI models.

Batch Processing

Batch processing in AI hardware refers to processing multiple inputs simultaneously on a GPU or accelerator, maximizing throughput and hardware utilization.

FLOPS

FLOPS (Floating-Point Operations Per Second) measures the floating-point computational throughput of a processor and is a common headline figure for comparing AI hardware performance.
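
One way to see what the figure means in practice is to work out the throughput achieved on a matrix multiplication. The sketch below uses the usual convention of 2·m·n·k floating-point operations for an (m × k) by (k × n) product; the function names and the 0.5 second timing are hypothetical.

```python
def matmul_flop_count(m, n, k):
    """Floating-point operations for an (m x k) by (k x n) matrix product.

    Each of the m*n outputs takes k multiplies and k additions
    (the usual 2*m*n*k counting convention).
    """
    return 2 * m * n * k


def achieved_flops(flop_count, seconds):
    """Operations per second actually sustained over a measured run."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return flop_count / seconds


# Hypothetical measurement: a 1024 x 1024 x 1024 matmul that took 0.5 s.
ops = matmul_flop_count(1024, 1024, 1024)
print(ops)                       # 2147483648
print(achieved_flops(ops, 0.5))  # 4294967296.0, roughly 4.3 GFLOPS
```

Achieved FLOPS measured this way is usually well below a vendor's quoted peak, which assumes every arithmetic unit is busy on every cycle.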

TOPS

TOPS (Tera Operations Per Second, i.e. trillions of operations per second) measures the computational throughput of AI accelerators, usually for low-precision integer operations such as INT8, and is commonly used to rate NPUs and edge AI chips.

Roofline Model

The roofline model is a performance analysis framework that shows whether an AI workload is limited by compute throughput or memory bandwidth on a given processor.
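
The model reduces to a single comparison: attainable throughput is the smaller of peak compute and memory bandwidth multiplied by arithmetic intensity (operations performed per byte moved). The sketch below uses hypothetical function names and a made-up processor with 100 TFLOPS peak compute and 2 TB/s memory bandwidth.

```python
def attainable_flops(peak_flops, bandwidth_bytes_per_s, arithmetic_intensity):
    """Roofline bound: min(peak compute, bandwidth * FLOPs-per-byte)."""
    return min(peak_flops, bandwidth_bytes_per_s * arithmetic_intensity)


def limiting_factor(peak_flops, bandwidth_bytes_per_s, arithmetic_intensity):
    """Say which roof the workload hits on this processor."""
    ridge_point = peak_flops / bandwidth_bytes_per_s  # FLOPs per byte
    return "compute" if arithmetic_intensity >= ridge_point else "memory"


PEAK = 100e12      # hypothetical: 100 TFLOPS
BANDWIDTH = 2e12   # hypothetical: 2 TB/s

# A low-intensity kernel (10 FLOPs per byte) is capped by memory bandwidth.
print(attainable_flops(PEAK, BANDWIDTH, 10), limiting_factor(PEAK, BANDWIDTH, 10))
# A high-intensity kernel (200 FLOPs per byte) is capped by peak compute.
print(attainable_flops(PEAK, BANDWIDTH, 200), limiting_factor(PEAK, BANDWIDTH, 200))
```

For this hypothetical processor the crossover (the "ridge point") sits at 50 FLOPs per byte: workloads below it are memory-bound, workloads above it are compute-bound.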

Hardware-Software Co-Design

Hardware-software co-design is the practice of developing AI hardware and software together to achieve optimal performance, where each informs the design of the other.

Sparsity in Hardware

Hardware sparsity support enables processors to skip zero-valued computations in neural networks, raising throughput for sparse models by up to 2x on hardware such as NVIDIA Tensor Cores with 2:4 structured sparsity.
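
The effect of skipping zeros can be shown in software with a dot product that counts how many multiplications it actually performs. This is only an illustration with a hypothetical function name; real hardware skips work on fixed sparsity patterns inside its matrix units rather than checking values one at a time.

```python
def sparse_dot(weights, activations):
    """Dot product that skips zero weights and counts the multiplies performed."""
    total = 0.0
    multiplies = 0
    for w, x in zip(weights, activations):
        if w == 0:
            continue  # zero weight contributes nothing, so skip the multiply
        total += w * x
        multiplies += 1
    return total, multiplies


# Half of these hypothetical weights are zero, so half the multiplies are skipped.
print(sparse_dot([0.0, 2.0, 0.0, 3.0], [1.0, 4.0, 5.0, 6.0]))  # (26.0, 2)
```

With half the weights zero, only two of the four multiplications run, which is where a roughly 2x throughput figure comes from.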

AI Training Infrastructure

AI training infrastructure encompasses all hardware, networking, storage, and software systems required to train machine learning models at scale.

Hardware Lottery

The hardware lottery describes how certain AI research ideas succeed not because they are fundamentally better, but because they align well with available hardware capabilities.

NVIDIA Grace Hopper

NVIDIA Grace Hopper is a superchip combining a Grace CPU and a Hopper-architecture GPU over a high-bandwidth NVLink-C2C interconnect, designed for memory-intensive AI workloads.

Product FAQ

What is InsertChat?

InsertChat is a white-label AI assistant for your website. Train it, brand it, publish it, and learn from visitor questions.

How does InsertChat use my website content?

Connect approved pages, docs, videos, FAQs, policies, and other sources. InsertChat turns them into source-backed answers and next steps.

Can I control the assistant's tone and sources?

Yes. Choose its sources, tone, welcome message, and prompts so it stays on brand.

How does InsertChat stay accurate?

Answers use approved content and source links. Analytics show unclear or missing answers so you can improve coverage.

Can it collect leads or route support questions?

Yes. InsertChat can collect details, qualify intent, add context, and send chats to the right inbox, CRM, workflow, or person.

Can I control how the assistant behaves?

Yes. Control prompts, model choice, tool access, and the branded assistant experience so behavior stays consistent.

Which AI models can I use?

InsertChat supports multiple model providers. Choose each assistant's model for quality, speed, and cost, or use BYOK.

Can I pick different models for different workflows?

Yes. Use a faster model for common questions and a stronger model for complex reasoning. InsertChat supports that balance per conversation.

Where can I deploy an assistant?

Use a widget, embed, full-page assistant, custom domain, in-app embed, or API. Reuse one setup across surfaces.

Do I need coding skills?

No. Build and deploy AI assistants using our visual builder. The embed code is one line of JavaScript.

Can I customize the branding and UI?

Yes. Customize the assistant name, logo, colors, welcome message, suggested prompts, tone, domain, and white-label presentation.

Can I use my own domain?

Yes. Custom domains are supported, typically via enterprise options.

Does InsertChat support voice?

Yes. Voice dictation lets users speak instead of typing, and text-to-speech reads answers aloud.

Does InsertChat support vision?

Yes. Enable vision for assistants when images help clarify a request or context.

What tools and integrations are supported?

Zendesk, HubSpot, Shopify, WooCommerce, calendar booking, web search, Perplexity, and webhooks for your own systems.

Can I control which tools the assistant is allowed to use?

Yes. Tool access is controlled per assistant so you enable only what you need.

Can the agent hand off to a human?

Yes. Configure human handoff so the agent escalates when needed. Full conversation history is passed along.

Do you provide analytics?

Yes. Track chats, leads, feedback, top questions, unanswered questions, most-used sources, and content gaps.

Is it mobile friendly?

Yes. The widget and embeds work well on desktop and mobile with no separate experience needed.

What's the fastest path to a successful deployment?

Start with one assistant and a small set of high-value sources. Iterate using real questions from analytics.

What is the fastest way to get started?

Create an account. Connect one key source. Ask a test question, brand the assistant, then publish it on one page.