Glossary

AI glossary for content assistants

Plain-English definitions of 13,917 AI terms for branded assistant teams.

13,917 terms. Open one for definitions and related concepts.

ELU

ELU (Exponential Linear Unit) is an activation function that uses an exponential curve for negative inputs, giving smooth, non-zero outputs below zero and, in many cases, faster learning than ReLU.
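
As a rough illustration of the definition above, the sketch below compares ELU with ReLU on a few inputs. The function names and the default `alpha=1.0` are assumptions chosen for the example, not taken from any particular library.

```python
import math

def elu(x: float, alpha: float = 1.0) -> float:
    """ELU: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def relu(x: float) -> float:
    """ReLU for comparison: negative inputs are clamped to zero."""
    return max(0.0, x)

for x in (-2.0, -0.5, 0.0, 1.5):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  elu={elu(x):.4f}")
```

Negative inputs keep a small non-zero output under ELU, which saturates toward `-alpha` instead of being cut to zero.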

SELU

SELU (Scaled Exponential Linear Unit) is a self-normalizing activation function that, given suitable weight initialization, pushes activations toward zero mean and unit variance across layers.

Convolutional Neural Network

A convolutional neural network (CNN) is a deep learning architecture designed for processing grid-like data such as images, using learnable filters to detect spatial patterns.

Convolution

Convolution is the mathematical operation at the core of CNNs, where a small filter slides across input data to produce a feature map that highlights detected patterns.
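
A minimal sketch of the sliding-filter idea, assuming no padding and plain Python lists; `conv2d` and the example kernel are illustrative names. As in most deep learning frameworks, the kernel is not flipped, so this is strictly a cross-correlation.

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Multiply the kernel with the window under it and sum the result.
            for di in range(kh):
                for dj in range(kw):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]  # responds where values rise left to right
print(conv2d(image, vertical_edge))  # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The non-zero column in the output marks where the dark-to-bright edge sits in the input.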

Kernel

A kernel (or filter) in a CNN is a small matrix of learnable weights that slides across input data to detect specific features like edges or textures.

Feature Map

A feature map is the output produced by applying a convolutional filter to an input, representing where specific features are detected across the spatial dimensions.

Receptive Field

The receptive field is the region of the input that influences a particular neuron in a CNN, growing larger in deeper layers as features are combined.

Stride

Stride is the step size by which a convolutional filter moves across the input, controlling the spatial dimensions of the output feature map.

Padding

Padding adds extra values (typically zeros) around the edges of input data before convolution, controlling the output size and preserving spatial information at borders.
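
How stride and padding together set the output size can be sketched with the standard formula floor((n + 2p - k) / s) + 1 for one spatial dimension. The helper name `conv_output_size` is an assumption for this example.

```python
def conv_output_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one dimension for input length n."""
    return (n + 2 * padding - kernel) // stride + 1

print(conv_output_size(32, kernel=3, stride=1, padding=1))  # 32: size preserved
print(conv_output_size(32, kernel=3, stride=2, padding=1))  # 16: halved by stride 2
print(conv_output_size(28, kernel=5))                       # 24: shrinks without padding
```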

Pooling

Pooling is a downsampling operation in CNNs that reduces the spatial dimensions of feature maps by aggregating values in local regions.

Max Pooling

Max pooling is a downsampling technique that selects the maximum value from each local region of a feature map, preserving the most prominent features.
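
A small sketch of max pooling over non-overlapping 2x2 windows, using plain Python lists. `max_pool2d` and the sample feature map are made up for illustration.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [
        [
            max(
                fmap[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
print(max_pool2d(fmap))  # [[4, 2], [2, 8]]
```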

ResNet

ResNet (Residual Network) is a deep CNN architecture that uses skip connections to enable training of very deep networks by allowing gradients to flow through shortcut paths.

Recurrent Neural Network

A recurrent neural network (RNN) is a neural network designed for sequential data, maintaining a hidden state that captures information from previous time steps.

LSTM

LSTM (Long Short-Term Memory) is an RNN architecture that uses gating mechanisms to selectively remember and forget information over long sequences.

GRU

GRU (Gated Recurrent Unit) is a simplified gated RNN variant that uses two gates (update and reset) to control information flow, often performing comparably to LSTM with fewer parameters.

Hidden State

A hidden state is the internal memory vector maintained by a recurrent neural network that encodes information about previous elements in a sequence.
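
To make the idea of a carried memory concrete, here is a toy scalar recurrence of the form h_t = tanh(w_x * x_t + w_h * h_{t-1} + b). The weights are arbitrary values picked for the example, and real RNNs use vectors and learned matrices instead of single numbers.

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent update: mix the new input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence, h0=0.0):
    """Return the hidden state after each element of the sequence."""
    h = h0
    states = []
    for x in sequence:
        h = rnn_step(x, h)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states still reflect it.
print(run_rnn([1.0, 0.0, 0.0]))
```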

Sequence-to-Sequence

Sequence-to-sequence (seq2seq) is a neural network architecture that maps an input sequence to an output sequence, enabling tasks like translation and summarization.

Encoder-Decoder

Encoder-decoder is a neural network architecture pattern where an encoder compresses input into a representation and a decoder generates output from that representation.

Teacher Forcing

Teacher forcing is a training technique for sequence models where the ground-truth output from the previous step is fed as input to the next step, instead of the model's own prediction.

Bidirectional RNN

A bidirectional RNN processes a sequence in both forward and backward directions, capturing context from both past and future elements at each position.

Transformer

The transformer is a neural network architecture based on self-attention that processes all positions in a sequence simultaneously, powering modern language models and AI systems.

Self-Attention

Self-attention is a mechanism where each element in a sequence computes attention weights over all other elements, capturing contextual relationships regardless of distance.
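
A stripped-down sketch of scaled dot-product self-attention follows. As a simplifying assumption, the input vectors serve directly as queries, keys, and values; real models first apply learned projection matrices, which are omitted here. Function names are illustrative.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Return (outputs, weights) for a sequence of equal-length vectors."""
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Score this position against every position, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, x)) for c in range(d)])
    return outputs, weights

outputs, weights = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights sums to 1 and covers every position, which is why distance in the sequence does not limit what a position can attend to.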

Multi-Head Attention

Multi-head attention runs multiple self-attention operations in parallel, allowing the model to jointly attend to information from different representation subspaces.

Positional Encoding

Positional encoding adds information about the position of each element in a sequence to its representation, since self-attention has no inherent notion of order.
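
One common scheme is the fixed sinusoidal encoding, sketched below; learned position embeddings are another option. The function name and the base of 10000 follow the usual convention but are assumptions as far as this glossary entry goes.

```python
import math

def sinusoidal_encoding(position: int, d_model: int) -> list:
    """Sine on even dimensions, cosine on odd ones, at geometrically spaced frequencies."""
    encoding = []
    for i in range(d_model):
        angle = position / (10000 ** ((2 * (i // 2)) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding

for pos in range(3):
    print(pos, [round(v, 3) for v in sinusoidal_encoding(pos, 4)])
```

The resulting vector is added to the token representation, so identical tokens at different positions no longer look the same to self-attention.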

Feed-Forward Network

The feed-forward network in a transformer is a two-layer MLP applied independently to each position after attention, expanding and compressing the representation.

Layer Normalization

Layer normalization is a technique that normalizes the inputs across the feature dimension for each individual example, stabilizing and accelerating neural network training.
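
A minimal sketch of layer normalization for a single example's feature vector. The optional `gamma` and `beta` stand in for the learned scale and shift, and `eps` is a small constant for numerical stability; all names here are illustrative.

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one example across its features, then scale and shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    gamma = gamma or [1.0] * n
    beta = beta or [0.0] * n
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]

print([round(v, 3) for v in layer_norm([1.0, 2.0, 3.0, 4.0])])
```

Because the statistics come from one example only, the result does not depend on batch size, unlike batch normalization.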

Residual Connection

A residual connection adds the input of a layer directly to its output, creating a shortcut path that helps gradients flow through deep networks.

Causal Attention

Causal attention is a masked form of self-attention that prevents each position from attending to future positions, so each output depends only on earlier tokens, as autoregressive generation requires.
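
The masking step can be sketched as a row-wise softmax that only looks at positions up to the current one. `causal_softmax` is a hypothetical helper that takes a square matrix of raw attention scores; implementations more commonly add negative infinity to the masked scores before a regular softmax, which gives the same weights.

```python
import math

def causal_softmax(scores):
    """Row-wise softmax where position i may only see positions j <= i."""
    n = len(scores)
    weights = []
    for i in range(n):
        visible = scores[i][: i + 1]  # drop scores for future positions
        m = max(visible)
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        # Future positions get exactly zero weight.
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights

uniform_scores = [[0.0, 0.0, 0.0] for _ in range(3)]
for row in causal_softmax(uniform_scores):
    print([round(w, 3) for w in row])
```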

Cross-Attention

Cross-attention is an attention mechanism where queries come from one sequence and keys and values come from a different sequence, enabling information transfer between sequences or modalities, such as from an encoder to a decoder.

Rotary Position Embedding

Rotary position embedding (RoPE) encodes position information by rotating query and key vectors in pairs of dimensions, enabling relative position awareness.

Flash Attention

Flash Attention is an optimized attention algorithm that reduces memory usage and increases speed by computing attention in tiles without materializing the full attention matrix.

Grouped-Query Attention

Grouped-query attention shares key and value heads across multiple query heads, reducing memory bandwidth during inference while preserving most of multi-head attention quality.

Backpropagation

Backpropagation is the algorithm that computes gradients of the loss function with respect to each parameter by propagating error signals backward through the network.

Forward Pass

A forward pass is the computation that takes input data through all layers of a neural network to produce predictions and, during training, a loss value.

Backward Pass

A backward pass propagates the loss gradient from the output back through each layer, computing the gradient of the loss with respect to every parameter.

Vanishing Gradient

The vanishing gradient problem occurs when gradients become exponentially smaller as they propagate backward through many layers, preventing early layers from learning.

Exploding Gradient

The exploding gradient problem occurs when gradients grow exponentially during backpropagation, causing unstable training with wildly oscillating or diverging parameter updates.

Gradient Clipping

Gradient clipping limits the magnitude of gradients during training to prevent exploding gradients and stabilize the optimization process.
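
One widely used variant is clipping by norm: if the gradient vector is longer than a threshold, it is rescaled to that length while keeping its direction. The sketch below assumes a flat list of gradient values; `clip_by_norm` is an illustrative name.

```python
import math

def clip_by_norm(grads, max_norm):
    """Rescale grads so their L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)  # already within the limit, leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]

print(clip_by_norm([3.0, 4.0], max_norm=1.0))  # norm 5 scaled down to norm 1
print(clip_by_norm([0.3, 0.4], max_norm=1.0))  # unchanged
```

Clipping each value to a fixed range is the other common variant; it is simpler but can change the gradient's direction.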

Mixed-Precision Training

Mixed-precision training uses a combination of 16-bit and 32-bit floating-point numbers to reduce memory usage and increase training speed, typically with little or no loss in model quality.

Distributed Training

Distributed training spreads the computation of training a neural network across multiple GPUs or machines to reduce training time and handle models too large for a single device.

Data Parallelism

Data parallelism is a distributed training strategy that replicates the model on each GPU and partitions the training data, averaging gradients across all replicas.

Weight Initialization

Weight initialization sets the starting values of neural network parameters before training, with proper initialization being critical for stable gradient flow and convergence.

Xavier Initialization

Xavier initialization sets weights by sampling from a distribution scaled by the number of input and output neurons, preserving signal variance through layers with symmetric activations.

He Initialization

He initialization draws initial weights with variance 2/fan_in (standard deviation sqrt(2/fan_in)) to account for ReLU activations zeroing out roughly half the inputs, enabling stable training of deep ReLU networks.
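
The 2/fan_in factor is the variance of the weight distribution, so the standard deviation is sqrt(2/fan_in). The sketch below draws He-normal weights and checks the sample variance, with the Xavier (Glorot) normal standard deviation shown for comparison. Helper names and layer sizes are assumptions for the example.

```python
import math
import random

def he_std(fan_in: int) -> float:
    """Standard deviation for He normal initialization."""
    return math.sqrt(2.0 / fan_in)

def xavier_std(fan_in: int, fan_out: int) -> float:
    """Standard deviation for Xavier (Glorot) normal initialization."""
    return math.sqrt(2.0 / (fan_in + fan_out))

def he_normal(fan_in: int, fan_out: int, rng: random.Random):
    """Return a fan_out x fan_in weight matrix drawn from N(0, 2 / fan_in)."""
    std = he_std(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

rng = random.Random(0)  # fixed seed so the run is repeatable
weights = he_normal(fan_in=200, fan_out=100, rng=rng)
flat = [w for row in weights for w in row]
sample_var = sum(w * w for w in flat) / len(flat)
print(f"target variance={2.0 / 200:.4f}  sample variance={sample_var:.4f}")
print(f"he_std={he_std(200):.4f}  xavier_std={xavier_std(200, 100):.4f}")
```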

Dropout

Dropout is a regularization technique that randomly deactivates a fraction of neurons during each training step, preventing co-adaptation and reducing overfitting.
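
A sketch of the common "inverted" form of dropout, where surviving activations are scaled up by 1 / (1 - p) during training so that nothing needs to change at inference time. The function signature is an assumption for this example.

```python
import random

def dropout(x, p, rng, training=True):
    """Zero each value with probability p; scale survivors by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(x)  # dropout is disabled at inference
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]

rng = random.Random(0)
activations = [1.0] * 10
print(dropout(activations, p=0.5, rng=rng))
print(dropout(activations, p=0.5, rng=rng, training=False))
```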

Weight Decay

Weight decay is a regularization technique that shrinks weights toward zero at every update; with plain SGD this is equivalent to adding a penalty proportional to the squared magnitude of the weights to the loss function, discouraging large weight values.

Label Smoothing

Label smoothing is a regularization technique that replaces hard one-hot target labels with soft labels that distribute a small probability mass to incorrect classes.
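
In the usual formulation, the smoothed target is (1 - epsilon) times the one-hot vector plus epsilon / K spread over all K classes. A small sketch, with `smooth_labels` and `epsilon=0.1` as illustrative choices:

```python
def smooth_labels(target_index: int, num_classes: int, epsilon: float = 0.1):
    """Return a soft target distribution for one example."""
    off_value = epsilon / num_classes
    return [
        (1.0 - epsilon) + off_value if i == target_index else off_value
        for i in range(num_classes)
    ]

print(smooth_labels(target_index=1, num_classes=4))
```

With four classes and epsilon 0.1, the true class gets about 0.925 and each other class about 0.025.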

Mixup

Mixup is a data augmentation and regularization technique that trains on convex combinations of pairs of training examples and their labels.
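
A convex combination here means lam * a + (1 - lam) * b with lam between 0 and 1, applied to both inputs and labels. In the sketch below, lam is drawn from a Beta(alpha, alpha) distribution, which is the usual choice; the helper name, `alpha=0.2`, and the toy vectors are assumptions for the example.

```python
import random

def mixup(x_a, y_a, x_b, y_b, lam):
    """Blend two examples and their one-hot labels with the same weight."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y

rng = random.Random(0)
lam = rng.betavariate(0.2, 0.2)  # mixing weight in [0, 1]
x_mixed, y_mixed = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam)
print(lam, x_mixed, y_mixed)
```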

Product FAQ

What is InsertChat?

InsertChat is a white-label AI assistant for your website. Train it, brand it, publish it, and learn from visitor questions.

How does InsertChat use my website content?

Connect approved pages, docs, videos, FAQs, policies, and other sources. InsertChat turns them into source-backed answers and next steps.

Can I control the assistant's tone and sources?

Yes. Choose its sources, tone, welcome message, and prompts so it stays on brand.

How does InsertChat stay accurate?

Answers use approved content and source links. Analytics show unclear or missing answers so you can improve coverage.

Can it collect leads or route support questions?

Yes. InsertChat can collect details, qualify intent, add context, and send chats to the right inbox, CRM, workflow, or person.

Can I control how the assistant behaves?

Yes. Control prompts, model choice, tool access, and the branded assistant experience so behavior stays consistent.

Which AI models can I use?

InsertChat supports multiple model providers. Choose each assistant's model for quality, speed, and cost, or use BYOK.

Can I pick different models for different workflows?

Yes. Use a faster model for common questions and a stronger model for complex reasoning. InsertChat supports that balance per conversation.

Where can I deploy an assistant?

Use a widget, embed, full-page assistant, custom domain, in-app embed, or API. Reuse one setup across surfaces.

Do I need coding skills?

No. Build and deploy AI assistants using our visual builder. The embed code is one line of JavaScript.

Can I customize the branding and UI?

Yes. Customize the assistant name, logo, colors, welcome message, suggested prompts, tone, domain, and white-label presentation.

Can I use my own domain?

Yes. Custom domains are supported, typically via enterprise options.

Does InsertChat support voice?

Yes. Voice dictation and text-to-speech let users speak instead of type.

Does InsertChat support vision?

Yes. Enable vision for assistants when images help clarify a request or context.

What tools and integrations are supported?

Zendesk, HubSpot, Shopify, WooCommerce, calendar booking, web search, Perplexity, and webhooks for your own systems.

Can I control which tools the assistant is allowed to use?

Yes. Tool access is controlled per assistant so you enable only what you need.

Can the agent hand off to a human?

Yes. Configure human handoff so the agent escalates when needed. Full conversation history is passed along.

Do you provide analytics?

Yes. Track chats, leads, feedback, top questions, unanswered questions, most-used sources, and content gaps.

Is it mobile friendly?

Yes. The widget and embeds work well on desktop and mobile with no separate experience needed.

What's the fastest path to a successful deployment?

Start with one assistant and a small set of high-value sources. Iterate using real questions from analytics.

What is the fastest way to get started?

Create an account. Connect one key source. Ask a test question, brand the assistant, then publish it on one page.