Glossary

AI glossary for content assistants

Plain-English definitions of 13,917 AI terms for branded assistant teams.

Plain English · RAG · LLMs

Start for Free

13,917 terms. Open one for definitions and related concepts.

Pairwise Comparison

Pairwise comparison evaluates models by directly comparing two responses to the same prompt and selecting the better one.

Elo System

The Elo system is a mathematical framework for computing relative skill levels from pairwise competition results, widely used for LLM ranking.
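
As a rough illustration of how ratings move after one pairwise comparison, here is a minimal sketch of the standard Elo update. The function names, the starting rating of 1000, and the K-factor of 32 are assumptions chosen for the example, not values taken from any particular leaderboard.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A's response was preferred, 0.0 if B's was, 0.5 for a tie.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two models start level; model A wins one pairwise comparison.
a, b = update_elo(1000.0, 1000.0, score_a=1.0)
print(a, b)  # 1016.0 984.0
```

A win against an equally rated opponent moves each rating by half the K-factor. A win against a much weaker opponent moves them far less.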

Win Rate

Win rate is the percentage of times a model is preferred over a baseline or competitor in pairwise evaluation comparisons.

Bootstrap Confidence

Bootstrap confidence intervals estimate the uncertainty in benchmark scores by repeatedly resampling evaluation data.
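
One way to picture this is a percentile bootstrap over win/loss outcomes from a pairwise evaluation. The sketch below is a simplified illustration: the data (60 wins, 40 losses), the number of resamples, and the function names are all assumptions.

```python
import random


def win_rate(outcomes):
    """Fraction of comparisons won; each outcome is 1 (win) or 0 (loss)."""
    return sum(outcomes) / len(outcomes)


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the win rate."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample the outcomes with replacement and recompute the statistic each time.
    stats = sorted(
        win_rate([outcomes[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical data: 60 wins and 40 losses against a baseline.
outcomes = [1] * 60 + [0] * 40
print(win_rate(outcomes))      # 0.6
print(bootstrap_ci(outcomes))  # an interval around 0.6; exact bounds depend on the seed
```

A wide interval is a signal that two models with similar scores may not be meaningfully different.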

RoPE

RoPE (Rotary Position Embedding) is a position encoding method that uses rotation matrices to encode token positions in the attention mechanism.
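
The sketch below illustrates the rotation idea on a single small vector. It assumes the common convention of rotating adjacent pairs of dimensions with a frequency base of 10000; real implementations differ in how they pair dimensions, and the function names here are hypothetical.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.1]

# The query-key score depends only on the relative offset (3 in both cases).
s1 = dot(rope(q, 5), rope(k, 2))
s2 = dot(rope(q, 13), rope(k, 10))
print(abs(s1 - s2) < 1e-9)  # True
```

Because the score depends on the offset between positions rather than the positions themselves, the attention mechanism sees relative position directly.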

Grouped Query Attention

Grouped query attention shares key-value heads across multiple query heads, reducing memory usage while maintaining model quality.
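
To make the memory saving concrete, here is a small sketch that maps query heads to shared key-value heads and estimates key-value cache size. The model shape (32 layers, 32 query heads, head dimension 128, 4,096 tokens, 2 bytes per value) is a hypothetical configuration chosen for the arithmetic, and the helper names are assumptions.

```python
def kv_head_for(query_head, n_query_heads, n_kv_heads):
    """Index of the shared key-value head used by a given query head."""
    assert n_query_heads % n_kv_heads == 0
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# 8 query heads sharing 2 key-value heads: groups of 4.
print([kv_head_for(h, 8, 2) for h in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]

# Hypothetical model: full multi-head attention vs. 8 shared key-value heads.
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)
print(mha // gqa)  # 4
```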

Multi-Head Attention

Multi-head attention runs multiple parallel attention operations, allowing the model to jointly attend to information from different representation subspaces.

Scaled Dot-Product Attention

Scaled dot-product attention is the core attention computation that measures token compatibility by computing scaled dot products of queries and keys.
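
The computation is usually written as softmax(Q K^T / sqrt(d_k)) V. Below is a minimal pure-Python sketch of that formula for small lists of vectors. It is meant for reading, not speed, and the function names are assumptions.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Compatibility of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


# Two identical keys get equal weight, so the output is the mean of the values.
print(attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [4.0, 6.0]]))
# [[2.0, 4.0]]
```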

QKV Projection

QKV projections are the learned linear transformations that produce query, key, and value vectors from input embeddings for attention computation.

Attention Mask

An attention mask controls which tokens can attend to which other tokens in the attention computation, enabling causal and selective attention.

Causal Mask

A causal mask is a triangular attention mask that prevents each token from attending to subsequent tokens, enabling autoregressive generation.
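
The triangular shape is easy to see in a small example. This sketch builds the mask for three tokens and applies it to one row of attention scores; the helper names are assumptions.

```python
import math


def causal_mask(n):
    """mask[i][j] is True when token i may attend to token j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, allowed):
    """Softmax over one row of scores, with disallowed positions set to -inf."""
    masked = [s if ok else float("-inf") for s, ok in zip(scores, allowed)]
    m = max(masked)
    exps = [math.exp(s - m) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


mask = causal_mask(3)
for row in mask:
    print([int(x) for x in row])
# [1, 0, 0]
# [1, 1, 0]
# [1, 1, 1]

# The second token can see itself and the first token, but not the third.
print(masked_softmax([0.0, 0.0, 0.0], mask[1]))  # [0.5, 0.5, 0.0]
```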

Padding Mask

A padding mask prevents the attention mechanism from attending to padding tokens added to equalize sequence lengths in batched processing.
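
As a small illustration, the sketch below pads a batch of token ID lists to the same length and returns a mask marking real tokens with 1 and padding with 0. The padding ID of 0 and the function name are assumptions. Real tokenizers define their own padding token.

```python
def pad_batch(sequences, pad_id=0):
    """Pad sequences to equal length and build the matching padding mask."""
    max_len = max(len(s) for s in sequences)
    padded = [s + [pad_id] * (max_len - len(s)) for s in sequences]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return padded, mask


padded, mask = pad_batch([[5, 6, 7], [8]])
print(padded)  # [[5, 6, 7], [8, 0, 0]]
print(mask)    # [[1, 1, 1], [1, 0, 0]]
```

Positions where the mask is 0 are typically given a score of negative infinity before the softmax, so they receive zero attention weight.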

Pre-Norm Architecture

Pre-norm architecture applies layer normalization before the attention and feed-forward sublayers rather than after, improving training stability.

Post-Norm Architecture

Post-norm architecture applies layer normalization after the attention and feed-forward sublayers, as in the original transformer design.
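
The difference between pre-norm and post-norm comes down to where the normalization sits relative to the residual connection. The sketch below shows both orderings with a toy sublayer standing in for attention or the feed-forward network. The layer norm has no learned parameters, and all names are assumptions made for illustration.

```python
import math


def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add(a, b):
    return [p + q for p, q in zip(a, b)]


def pre_norm_block(x, sublayer):
    # Normalize first; the residual path carries x through untouched.
    return add(x, sublayer(layer_norm(x)))


def post_norm_block(x, sublayer):
    # Add the residual first; then normalize the sum.
    return layer_norm(add(x, sublayer(x)))


def toy_sublayer(v):
    """Stand-in for attention or a feed-forward network."""
    return [2.0 * t for t in v]


x = [1.0, 2.0, 3.0]
print(pre_norm_block(x, toy_sublayer))
print(post_norm_block(x, toy_sublayer))
```

In the pre-norm version the input reaches the output through an unnormalized residual path, which is one common explanation for its more stable training in deep models.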

Parallel Attention

Parallel attention computes the attention and feed-forward sublayers simultaneously rather than sequentially within each transformer block.

SwiGLU Activation

SwiGLU is an activation function combining Swish and Gated Linear Units that has become standard in modern LLM feed-forward layers.
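
A SwiGLU feed-forward layer is commonly written as W_down (Swish(W_gate x) * (W_up x)), where * is element-wise multiplication. The sketch below follows that form with plain Python lists. The weight names are assumptions, and bias terms are left out.

```python
import math


def swish(x):
    """Swish (also called SiLU): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    gate = [swish(g) for g in matvec(w_gate, x)]  # gating branch
    up = matvec(w_up, x)                          # linear branch
    hidden = [g * u for g, u in zip(gate, up)]    # element-wise gate
    return matvec(w_down, hidden)


# One-dimensional toy example: swish(1.0) * 2.0
print(swiglu_ffn([1.0], w_gate=[[1.0]], w_up=[[2.0]], w_down=[[1.0]]))
```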

RMSNorm

RMSNorm is a simplified layer normalization that uses only root mean square statistics, providing faster computation with comparable quality.
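
The simplification is that RMSNorm skips the mean subtraction of standard layer normalization and divides by the root mean square only. A minimal sketch, with the function name and the epsilon value as assumptions:

```python
import math


def rms_norm(x, gain=None, eps=1e-6):
    """Divide by the root mean square of x, then apply an optional learned gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    if gain is None:
        gain = [1.0] * len(x)
    return [g * v / rms for g, v in zip(gain, x)]


out = rms_norm([3.0, 4.0])
print(out)  # roughly [0.85, 1.13]; the output has a root mean square close to 1
```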

Pre-Training Data

Pre-training data is the massive text corpus used to train the base language model, typically containing trillions of tokens from diverse sources.

Common Crawl

Common Crawl is a publicly available web archive containing petabytes of raw web data, serving as a primary source for many LLM pre-training datasets.

The Pile

The Pile is an 825 GiB curated dataset of diverse English text from 22 sources, designed specifically for training large language models.

RedPajama

RedPajama is an open-source pre-training dataset replicating the data recipe of the original Llama model with publicly available sources.

RefinedWeb

RefinedWeb is a high-quality web dataset demonstrating that properly filtered web data alone can match curated multi-source datasets for LLM training.

StarCoder Data

StarCoder Data is a large-scale code dataset with permissively licensed source code from GitHub, used for training code-focused language models.

SlimPajama

SlimPajama is a deduplicated and cleaned version of RedPajama, reducing 1.2 trillion tokens to 627 billion high-quality tokens.

Dolma

Dolma is an open pre-training dataset of 3 trillion tokens created by AI2 with full transparency about its composition and processing.

FineWeb

FineWeb is a 15-trillion-token web dataset from Hugging Face whose filtering and deduplication pipeline targets high quality for web-only training data.

CulturaX

CulturaX is a massive multilingual dataset covering 167 languages, designed for training language models with broad language coverage.

Data Deduplication

Data deduplication removes duplicate and near-duplicate documents from training data to improve training efficiency and reduce memorization of repeated text.
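
As a simplified sketch, exact duplicates can be caught by hashing normalized text, and near-duplicates by comparing sets of word n-grams (shingles) with Jaccard similarity. Production pipelines typically use scalable approximations such as MinHash. The functions below are hypothetical and only suit small collections.

```python
import hashlib


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash the same."""
    return " ".join(text.lower().split())


def dedupe_exact(docs):
    """Keep the first occurrence of each document, comparing normalized text."""
    seen, kept = set(), []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept


def jaccard(a, b, n=3):
    """Jaccard similarity of word n-gram (shingle) sets, for near-duplicates."""
    def shingles(text):
        words = normalize(text).split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


print(dedupe_exact(["Hello world", "hello   WORLD", "Something else"]))
# ['Hello world', 'Something else']
print(jaccard("the cat sat on the mat", "the cat sat on the rug"))  # 0.6
```

A near-duplicate filter would drop one document from any pair whose similarity exceeds a chosen threshold.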

Data Filtering

Data filtering applies rules and classifiers to remove low-quality, harmful, or irrelevant content from LLM training datasets.

Quality Filtering

Quality filtering uses heuristics and classifiers to score and select high-quality text for language model training.

Decontamination

Decontamination removes benchmark data from training sets to ensure evaluation scores reflect genuine model capability rather than memorization.
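
A common approach is to drop any training document that shares a long word n-gram with a benchmark item. The sketch below illustrates that idea. The n-gram length, the whitespace tokenization, and the function names are assumptions, and real pipelines vary in how strictly they match.

```python
def ngrams(text, n):
    """Set of lowercase word n-grams in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def is_contaminated(document, benchmark_items, n=8):
    """True if the document shares any word n-gram with a benchmark item."""
    doc_grams = ngrams(document, n)
    return any(doc_grams & ngrams(item, n) for item in benchmark_items)


def decontaminate(documents, benchmark_items, n=8):
    return [d for d in documents if not is_contaminated(d, benchmark_items, n)]


benchmark = ["what is the capital of france"]
docs = ["trivia: what is the capital of france? paris", "a recipe for bread"]
print(decontaminate(docs, benchmark, n=4))  # ['a recipe for bread']
```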

Toxicity Filtering

Toxicity filtering removes harmful, offensive, and unsafe content from training data to reduce the generation of toxic language model outputs.

Model Hosting

Model hosting is the infrastructure and services for deploying language models so they can serve inference requests at scale.

Model API

A model API provides programmatic access to a language model through HTTP endpoints, enabling applications to send prompts and receive responses.

Inference Cost

Inference cost is the computational expense of generating responses from a language model, measured in cost per token or cost per request.

Cost per Token

Cost per token is the price charged for each token processed by a language model API, typically different for input and output tokens.
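
A quick worked example shows how separate input and output prices combine into a request cost. The prices below are made-up placeholders, not any provider's actual rates, and the function name is an assumption.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost of one request, with prices quoted per million tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical prices: $0.50 per million input tokens, $1.50 per million output tokens.
cost = request_cost(input_tokens=1200, output_tokens=300,
                    input_price_per_m=0.50, output_price_per_m=1.50)
print(f"${cost:.5f}")  # $0.00105
```

Output tokens are often priced higher than input tokens, so long responses can dominate the bill even when prompts are large.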

Latency Optimization

Latency optimization reduces the time between sending a request to a language model and receiving the response or first token.

Throughput Optimization

Throughput optimization maximizes the number of tokens or requests a language model deployment can process per second.

Model Compression

Model compression reduces the size and computational requirements of a language model while preserving as much capability as possible.

Weight Sharing

Weight sharing reuses the same parameters across different parts of a model to reduce total parameter count and memory usage.

Model Sharding

Model sharding splits a language model across multiple GPUs or devices, enabling deployment of models too large for a single device.

Model Offloading

Model offloading stores parts of a model in CPU RAM or on disk and loads them onto the GPU only when needed, so that large models can run on limited hardware.

CPU Inference

CPU inference runs language model computations on a CPU rather than a GPU, enabling deployment without specialized hardware at the cost of slower generation.

GPU Inference

GPU inference uses graphics processing units to run language model computations, providing the parallel processing power needed for fast AI responses.

Edge Deployment

Edge deployment runs language models on local devices like phones and laptops rather than cloud servers, enabling offline and private AI.

Chatbot (LLM-Powered)

An LLM-powered chatbot uses large language models to understand natural language and generate contextual, human-like conversational responses.

Code Assistant

A code assistant is an AI tool powered by language models that helps developers write, debug, explain, and review code.

Writing Assistant

A writing assistant is an AI tool that helps users draft, edit, rewrite, and improve written content using language model capabilities.

Turn owned content into answers

Use InsertChat to launch a branded assistant visitors can ask directly.

7-day free trial · No card required

Interactive FAQ

Try the FAQ like a visitor.

Open product, pricing, security, integration, and free-tool questions in the same chat your visitors use.

Product FAQ

What is InsertChat?

InsertChat is a white-label AI assistant for your website. Train it, brand it, publish it, and learn from visitor questions.

How does InsertChat use my website content?

Connect approved pages, docs, videos, FAQs, policies, and other sources. InsertChat turns them into source-backed answers and next steps.

Can I control the assistant's tone and sources?

Yes. Choose its sources, tone, welcome message, and prompts so it stays on brand.

How does InsertChat stay accurate?

Answers use approved content and source links. Analytics show unclear or missing answers so you can improve coverage.

Can it collect leads or route support questions?

Yes. InsertChat can collect details, qualify intent, add context, and send chats to the right inbox, CRM, workflow, or person.

Can I control how the assistant behaves?

Yes. Control prompts, model choice, tool access, and the branded assistant experience so behavior stays consistent.

Which AI models can I use?

InsertChat supports multiple model providers. Choose each assistant's model for quality, speed, and cost, or use BYOK.

Can I pick different models for different workflows?

Yes. Use a faster model for common questions and a stronger model for complex reasoning. InsertChat supports that balance per conversation.

Where can I deploy an assistant?

Use a widget, embed, full-page assistant, custom domain, in-app embed, or API. Reuse one setup across surfaces.

Do I need coding skills?

No. Build and deploy AI assistants using our visual builder. The embed code is one line of JavaScript.

Can I customize the branding and UI?

Yes. Customize the assistant name, logo, colors, welcome message, suggested prompts, tone, domain, and white-label presentation.

Can I use my own domain?

Yes. Custom domains are supported, typically via enterprise options.

Does InsertChat support voice?

Yes. Voice dictation and text-to-speech let users speak instead of type.

Does InsertChat support vision?

Yes. Enable vision for assistants when images help clarify a request or context.

What tools and integrations are supported?

Zendesk, HubSpot, Shopify, WooCommerce, calendar booking, web search, Perplexity, and webhooks for your own systems.

Can I control which tools the assistant is allowed to use?

Yes. Tool access is controlled per assistant so you enable only what you need.

Can the agent hand off to a human?

Yes. Configure human handoff so the agent escalates when needed. Full conversation history is passed along.

Do you provide analytics?

Yes. Track chats, leads, feedback, top questions, unanswered questions, most-used sources, and content gaps.

Is it mobile friendly?

Yes. The widget and embeds work well on desktop and mobile with no separate experience needed.

What's the fastest path to a successful deployment?

Start with one assistant and a small set of high-value sources. Iterate using real questions from analytics.

What is the fastest way to get started?

Create an account. Connect one key source. Ask a test question, brand the assistant, then publish it on one page.