Glossary

AI glossary for content assistants

Plain-English definitions of 13,917 AI terms for branded assistant teams.


13,917 terms. Open one for definitions and related concepts.

Self-Consistency

Self-consistency is a prompting technique that generates multiple reasoning paths for the same problem and selects the most common final answer.
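
A minimal sketch of the voting step, assuming the final answers have already been extracted from each sampled reasoning path. The function name and the sample answers are hypothetical, and normalizing by stripping whitespace and lowercasing is only an illustrative choice.

```python
from collections import Counter

def self_consistent_answer(final_answers):
    """Return the most common final answer and the share of paths that agree."""
    # Normalize lightly so "42" and "42 " count as the same answer.
    counts = Counter(a.strip().lower() for a in final_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(final_answers)

# Hypothetical final answers taken from five sampled reasoning paths.
samples = ["42", "42", "41", "42 ", "40"]
answer, agreement = self_consistent_answer(samples)
print(answer, agreement)
```

The agreement ratio can also serve as a rough confidence signal: a low value suggests the sampled paths disagree and the answer may need review.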


Prompt Chaining

Prompt chaining is a technique that breaks complex tasks into sequential steps, where each prompt builds on the output of the previous one.
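
A minimal sketch of a chain, assuming a `call_model` function that takes a prompt string and returns the model's text. Here `fake_model` is a hypothetical stand-in so the example runs without any API, and the `{previous}` placeholder name is an assumption of this sketch.

```python
def run_chain(steps, call_model, initial_input):
    """Run prompts in order, feeding each output into the next prompt."""
    output = initial_input
    for template in steps:
        # Each step's template receives the previous step's output.
        output = call_model(template.format(previous=output))
    return output

def fake_model(prompt):
    """Hypothetical stand-in for a real model call; it just wraps the prompt."""
    return f"<{prompt}>"

steps = ["Summarize: {previous}", "Translate to French: {previous}"]
print(run_chain(steps, fake_model, "a long support article"))
```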


Prompt Template

A prompt template is a reusable prompt structure with placeholder variables that gets filled with specific data at runtime for consistent AI interactions.
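
A minimal sketch using Python's standard `string.Template`. The template wording and the placeholder names (`brand`, `context`, `question`) are hypothetical examples, not a prescribed format.

```python
from string import Template

# Reusable structure; the $placeholders are filled at runtime.
SUPPORT_PROMPT = Template(
    "You are a support assistant for $brand.\n"
    "Answer using only this context:\n$context\n\n"
    "Question: $question"
)

def render_prompt(brand, context, question):
    """Fill the template; substitute() raises KeyError if a value is missing."""
    return SUPPORT_PROMPT.substitute(brand=brand, context=context, question=question)

print(render_prompt("Acme", "Returns are accepted within 30 days.", "Can I return an item?"))
```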


Role Prompting

Role prompting assigns a specific persona or expertise to a language model, causing it to respond as if it were that character or specialist.


Meta-Prompting

Meta-prompting uses a language model to generate, evaluate, or improve prompts, automating the prompt engineering process itself.


Prompt Injection

Prompt injection is a security vulnerability where malicious user input overrides system prompt instructions, causing the model to behave unexpectedly.


Jailbreaking

Jailbreaking is the practice of crafting prompts that bypass AI safety guardrails and alignment, making the model produce outputs it was trained to refuse.


Prompt Compression

Prompt compression reduces the token count of prompts while preserving essential meaning, fitting more context into limited context windows.


Pre-training

Pre-training is the initial phase of training a language model on vast amounts of text data to learn general language understanding and generation capabilities.


Next-Token Prediction

Next-token prediction is the core training objective of most LLMs, where the model learns to predict the most likely next token in a sequence of text.
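
As a rough illustration, this objective is usually implemented as a cross-entropy loss: the negative log-probability the model assigns to the token that actually came next. The sketch below assumes the model's raw scores (logits) for a single position are given as a plain list; the function name is hypothetical.

```python
import math

def next_token_loss(logits, target_index):
    """Cross-entropy for one position: -log softmax(logits)[target_index]."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]

# With equal scores over a 4-token vocabulary, the loss is log(4).
print(next_token_loss([0.0, 0.0, 0.0, 0.0], target_index=1))
```

Training averages this loss over every position in every sequence, so the model is pushed to give higher scores to the tokens that actually follow.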


Supervised Fine-Tuning

Supervised fine-tuning (SFT) trains a pre-trained model on labeled input-output pairs to specialize it for specific tasks or improve its response quality.


RLHF

RLHF (Reinforcement Learning from Human Feedback) is a training technique that aligns AI models with human preferences using feedback from human evaluators.


DPO

DPO (Direct Preference Optimization) is a simplified alternative to RLHF that directly optimizes language models on preference data without a separate reward model.
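
A minimal sketch of the DPO loss for one preference pair, assuming the summed log-probabilities of the chosen and rejected responses are already available under both the model being trained (the policy) and a frozen reference model. The parameter names are hypothetical, and `beta=0.1` is only an illustrative setting.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log(sigmoid(beta * margin)) for a single preference pair."""
    # How much more the policy favors the chosen response than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(x)) equals log(1 + exp(-x)).
    return math.log1p(math.exp(-beta * margin))

# Policy identical to the reference: margin is 0 and the loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy prefers the chosen response more than the reference: lower loss.
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```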


Reward Model

A reward model is a neural network trained to predict human preferences, scoring language model outputs to guide alignment training via RLHF.


Preference Data

Preference data consists of human comparisons between AI responses, indicating which response is better, used to train reward models and align language models.


Alignment

Alignment is the process of ensuring AI models behave in accordance with human values, intentions, and safety requirements.


Scalable Oversight

Scalable oversight refers to techniques for supervising AI systems effectively even as they become more capable than human evaluators at specific tasks.


RLAIF

RLAIF (Reinforcement Learning from AI Feedback) replaces human evaluators with AI models to generate preference data for alignment training.


PPO

PPO (Proximal Policy Optimization) is a reinforcement learning algorithm commonly used in RLHF to optimize language models based on reward model scores.


Human Feedback

Human feedback is the evaluative input from people used to train and align AI models, typically through preference comparisons or quality ratings.


LoRA

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that trains small adapter matrices instead of modifying all model weights.
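
To see where the savings come from: for a weight matrix of shape d_out × d_in, LoRA trains two small matrices of shapes d_out × r and r × d_in instead of the full matrix. The sketch below only counts parameters; the layer size and rank are illustrative assumptions, not values from any particular model.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full fine-tuning params, LoRA adapter params) for one weight matrix."""
    full = d_out * d_in
    # Adapter = (d_out x rank) matrix plus (rank x d_in) matrix.
    lora = rank * (d_out + d_in)
    return full, lora

full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(full, lora, f"{100 * lora / full:.2f}% of the full matrix")
```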


QLoRA

QLoRA combines quantization with LoRA, enabling fine-tuning of large models on a single GPU by keeping the base weights frozen in 4-bit precision and training LoRA adapters on top.


Adapter

An adapter is a small, trainable module inserted into a pre-trained model that allows task-specific customization without modifying the original weights.


Prefix Tuning

Prefix tuning prepends trainable continuous vectors to model input, learning task-specific prefixes that steer the frozen model toward desired behavior.


Prompt Tuning

Prompt tuning learns soft prompt embeddings prepended to model input, optimizing continuous vectors that replace hand-crafted text prompts.


Parameter-Efficient Fine-Tuning

Parameter-efficient fine-tuning (PEFT) encompasses methods that adapt pre-trained models by training only a small fraction of parameters, reducing cost and compute.


Full Fine-Tuning

Full fine-tuning updates all parameters of a pre-trained model on new data, providing maximum customization but requiring significant compute resources.


Layer Freezing

Layer freezing is a fine-tuning strategy that keeps certain model layers fixed while training others, balancing customization with preserved general knowledge.


Continued Pre-training

Continued pre-training extends the original pre-training process on domain-specific data, giving the model deep knowledge in a specialized area.


DoRA

DoRA (Weight-Decomposed Low-Rank Adaptation) improves on LoRA by separately adapting the magnitude and direction of weight matrices for better fine-tuning quality.


Long Context

Long context refers to language models capable of processing very large inputs, typically 100K tokens or more, enabling analysis of entire documents or codebases.


Sliding Window Attention

Sliding window attention limits each token to attend only to a fixed window of nearby tokens, reducing computation while maintaining local context.
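
A minimal sketch of the attention mask this implies, assuming the causal variant in which each token sees itself and the tokens just before it. The function name is hypothetical; `True` means "may attend".

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when token i may attend to token j."""
    # Token i sees positions i - window + 1 through i (itself and recent tokens).
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]

for row in sliding_window_mask(seq_len=5, window=2):
    print("".join("x" if allowed else "." for allowed in row))
```

Each row has at most `window` allowed positions, so the attention cost grows with `seq_len * window` rather than `seq_len * seq_len`.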


In-Context Learning

In-context learning is the ability of language models to learn new tasks from examples or instructions provided in the prompt, without any parameter updates.


Context Extension

Context extension refers to techniques that increase a pre-trained model's context window beyond its original training length without full retraining.


Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) combines information retrieval with text generation, letting AI answer from external knowledge rather than just training data.
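
A toy sketch of the two steps, retrieve then generate. It assumes simple word overlap as the retrieval score purely for illustration; production systems typically use embedding-based search. The function names, documents, and prompt wording are hypothetical.

```python
def retrieve(query, documents, top_k=1):
    """Rank documents by how many words they share with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_rag_prompt(query, documents, top_k=1):
    """Put the retrieved text into the prompt so the model answers from it."""
    context = "\n".join(retrieve(query, documents, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are accepted within 30 days.",
    "Shipping takes five business days.",
]
print(build_rag_prompt("How long does shipping take?", docs))
```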


Paged Attention

Paged attention manages KV cache memory in non-contiguous blocks inspired by OS virtual memory, dramatically reducing waste and enabling more concurrent requests.


Scaling Law

Scaling laws are empirical relationships showing how model performance predictably improves with increases in model size, training data, and compute.


Chinchilla Scaling

Chinchilla scaling refers to the compute-optimal balance between model parameters and training tokens (roughly 20 tokens per parameter), a finding that showed many earlier large models were under-trained relative to their size.


Emergent Ability

An emergent ability is a capability that appears in large language models only above a certain scale threshold, absent in smaller models.


Mixture of Experts

Mixture of Experts (MoE) is a model architecture that uses multiple specialized sub-networks, routing each input to only a subset for efficient computation.
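
A minimal sketch of the routing step, assuming top-k routing with a softmax over the selected experts' scores, which is one common choice rather than the only one. The function name and the router scores are hypothetical.

```python
import math

def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and normalize their weights."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected experts only; the rest are skipped entirely.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

# Four experts; only two run for this token.
print(route_top_k([0.1, 2.0, -1.0, 2.0], k=2))
```

The layer output is then the weighted sum of the selected experts' outputs, so compute per token scales with `k` rather than with the total number of experts.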


Sparse Model

A sparse model activates only a fraction of its total parameters for each input, achieving high capacity with lower computational cost per inference.


Unigram Tokenizer

A subword tokenization algorithm that starts with a large vocabulary and iteratively prunes it to find the optimal set of subword units.


Vocab Size

The total number of unique tokens in a language model tokenizer's vocabulary, typically ranging from 30,000 to 100,000 or more.


EOS Token

The End-of-Sequence token is a special token that signals the model to stop generating text.


BOS Token

The Beginning-of-Sequence token is a special token placed at the start of input to signal the beginning of a new text sequence.


Pad Token

A special token used to fill shorter sequences to a uniform length so that batches of inputs can be processed together efficiently.
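
A minimal sketch of padding a batch, assuming right-padding and a pad token ID of 0; both are illustrative choices, since tokenizers differ. The attention mask marks which positions are real tokens so the model can ignore the padding.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad token ID lists to the same length and build an attention mask."""
    max_len = max(len(seq) for seq in sequences)
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 marks a real token, 0 marks padding.
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return padded, mask

padded, mask = pad_batch([[5, 6, 7], [8]])
print(padded)
print(mask)
```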


Mask Token

A special token used in masked language models like BERT that replaces a word so the model can learn to predict it from surrounding context.


Byte-Level BPE

A variant of byte-pair encoding that operates on raw bytes instead of Unicode characters, enabling tokenization of any text without unknown tokens.
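
A small sketch of the starting point only, before any merges are learned: every string is first turned into its UTF-8 bytes, so the base alphabet has just 256 symbols and no input can fall outside it. The function name is hypothetical, and the merge-learning step is omitted.

```python
def to_byte_tokens(text):
    """Map text to its UTF-8 byte values, the base symbols of byte-level BPE."""
    return list(text.encode("utf-8"))

# The accented character becomes two bytes; nothing maps to an unknown token.
print(to_byte_tokens("h\u00e9llo"))
```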


Page 7 of 290. Showing 48 of 13,917 matching glossary pages.

Turn owned content into answers

Use InsertChat to launch a branded assistant visitors can ask directly.

Start for Free

7-day free trial · No card required

Interactive FAQ

Try the FAQ like a visitor.

Open product, pricing, security, integration, and free-tool questions in the same chat your visitors use.


Product FAQ

What is InsertChat?

InsertChat is a white-label AI assistant for your website. Train it, brand it, publish it, and learn from visitor questions.

How does InsertChat use my website content?

Connect approved pages, docs, videos, FAQs, policies, and other sources. InsertChat turns them into source-backed answers and next steps.

Can I control the assistant's tone and sources?

Yes. Choose its sources, tone, welcome message, and prompts so it stays on brand.

How does InsertChat stay accurate?

Answers use approved content and source links. Analytics show unclear or missing answers so you can improve coverage.

Can it collect leads or route support questions?

Yes. InsertChat can collect details, qualify intent, add context, and send chats to the right inbox, CRM, workflow, or person.

Can I control how the assistant behaves?

Yes. Control prompts, model choice, tool access, and the branded assistant experience so behavior stays consistent.

Which AI models can I use?

InsertChat supports multiple model providers. Choose each assistant's model for quality, speed, and cost, or use BYOK.

Can I pick different models for different workflows?

Yes. Use a faster model for common questions and a stronger model for complex reasoning. InsertChat supports that balance per conversation.

Where can I deploy an assistant?

Use a widget, embed, full-page assistant, custom domain, in-app embed, or API. Reuse one setup across surfaces.

Do I need coding skills?

No. Build and deploy AI assistants using our visual builder. The embed code is one line of JavaScript.

Can I customize the branding and UI?

Yes. Customize the assistant name, logo, colors, welcome message, suggested prompts, tone, domain, and white-label presentation.

Can I use my own domain?

Yes. Custom domains are supported, typically via enterprise options.

Does InsertChat support voice?

Yes. Voice dictation and text-to-speech let users speak instead of type.

Does InsertChat support vision?

Yes. Enable vision for assistants when images help clarify a request or context.

What tools and integrations are supported?

Zendesk, HubSpot, Shopify, WooCommerce, calendar booking, web search, Perplexity, and webhooks for your own systems.

Can I control which tools the assistant is allowed to use?

Yes. Tool access is controlled per assistant so you enable only what you need.

Can the assistant hand off to a human?

Yes. Configure human handoff so the assistant escalates when needed. Full conversation history is passed along.

Do you provide analytics?

Yes. Track chats, leads, feedback, top questions, unanswered questions, most-used sources, and content gaps.

Is it mobile friendly?

Yes. The widget and embeds work well on desktop and mobile with no separate experience needed.

What's the fastest path to a successful deployment?

Start with one assistant and a small set of high-value sources. Iterate using real questions from analytics.

What is the fastest way to get started?

Create an account. Connect one key source. Ask a test question, brand the assistant, then publish it on one page.