AI glossary for content assistants

Plain-English definitions of 13,917 AI terms for branded assistant teams.

Character-Level Tokenization

A tokenization approach that treats each individual character as a separate token, producing long sequences but requiring no vocabulary training.
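A minimal sketch of the idea, assuming plain Python strings; the function names `char_tokenize` and `build_vocab` are illustrative and not taken from any particular library:

```python
def char_tokenize(text: str) -> list[str]:
    """Split text into one token per character (spaces included)."""
    return list(text)


def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Assign an integer id to each distinct character, in sorted order."""
    return {ch: i for i, ch in enumerate(sorted(set(tokens)))}


tokens = char_tokenize("low lower")
vocab = build_vocab(tokens)          # no training step, just enumeration
ids = [vocab[ch] for ch in tokens]   # one id per character
```

The sequence is as long as the text itself, which is the trade-off the definition describes.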

Merge Rule

A rule in BPE tokenization that specifies which pair of tokens should be merged into a single new token, learned from training data frequency.
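One learning step can be sketched as follows, assuming a toy corpus stored as a mapping from symbol tuples to word counts. The helper names are hypothetical, and real BPE trainers repeat this step many times:

```python
from collections import Counter


def most_frequent_pair(words: dict[tuple[str, ...], int]) -> tuple[str, str]:
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts: Counter = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return max(counts, key=counts.get)


def apply_merge(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


corpus = {("l", "o", "w"): 5, ("l", "o", "t"): 2}
rule = most_frequent_pair(corpus)  # the learned merge rule
corpus = {apply_merge(word, rule): freq for word, freq in corpus.items()}
```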

Tokenizer Training

The process of learning a tokenizer vocabulary and rules from a text corpus before the language model itself is trained.

Sampling Strategy

The method used to select the next token from a probability distribution during text generation, ranging from greedy to highly random approaches.

Length Penalty

A parameter used in beam search and other decoding methods to control whether the model favors shorter or longer generated sequences.
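As an illustration, one common form divides a candidate's summed log-probability by its length raised to a power `alpha`. This is an assumed formulation; decoders differ in the exact formula they use:

```python
def length_normalized_score(sum_logprob: float, length: int, alpha: float) -> float:
    """Score a beam candidate; alpha=0 disables the penalty, larger alpha favors longer outputs."""
    return sum_logprob / (length ** alpha)


short = (-2.0, 2)   # (summed log-probability, token count)
long = (-3.0, 4)

no_penalty = [length_normalized_score(lp, n, alpha=0.0) for lp, n in (short, long)]
with_penalty = [length_normalized_score(lp, n, alpha=1.0) for lp, n in (short, long)]
```

Without the penalty the shorter candidate scores higher; with `alpha=1.0` the longer one does, because its average per-token log-probability is better.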

Min-p

A dynamic sampling method that discards any token whose probability falls below a set fraction of the most likely token's probability, so the cutoff tightens when the model is confident and loosens when it is not.
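A minimal sketch of the filter, assuming the next-token distribution is given as a plain dictionary; `min_p_filter` is an illustrative name, not a library function:

```python
def min_p_filter(probs: dict[str, float], min_p: float = 0.1) -> dict[str, float]:
    """Keep tokens with probability >= min_p * max probability, then renormalize."""
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


probs = {"a": 0.6, "b": 0.3, "c": 0.05, "d": 0.05}
filtered = min_p_filter(probs, min_p=0.1)  # cutoff is 10% of the top probability
```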

Mirostat

An adaptive sampling algorithm that dynamically adjusts the sampling parameters to maintain a target level of surprise (perplexity) in generated text.

Typical Sampling

A sampling method that selects tokens whose information content is close to the expected information content, filtering out both too-obvious and too-surprising tokens.
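A rough sketch, assuming a dictionary of strictly positive probabilities and a target probability mass. The name `typical_filter` and the `mass` parameter are illustrative:

```python
import math


def typical_filter(probs: dict[str, float], mass: float = 0.9) -> list[str]:
    """Keep tokens whose surprisal (-log p) is closest to the distribution's entropy."""
    entropy = -sum(p * math.log(p) for p in probs.values())
    ranked = sorted(probs, key=lambda tok: abs(-math.log(probs[tok]) - entropy))
    kept, total = [], 0.0
    for tok in ranked:
        kept.append(tok)
        total += probs[tok]
        if total >= mass:
            break
    return kept


probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
kept = typical_filter(probs, mass=0.7)
```

In this toy distribution the second most likely token ranks first, because its surprisal is nearest the entropy; the most likely token is only the second pick.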

One-Shot Prompting

A prompting technique that provides the model with exactly one example of the desired input-output format before the actual query.
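For illustration, a prompt of this shape could be assembled as below. The `Input:`/`Output:` labels are an assumed layout, not a required format:

```python
def one_shot_prompt(example_input: str, example_output: str, query: str) -> str:
    """Build a prompt with exactly one worked example followed by the real query."""
    return (
        f"Input: {example_input}\n"
        f"Output: {example_output}\n\n"
        f"Input: {query}\n"
        "Output:"
    )


prompt = one_shot_prompt("The food was great.", "positive", "The service was slow.")
```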

Reflexion

A prompting framework where the model reflects on its own outputs, identifies errors, and uses that self-feedback to improve subsequent attempts.

Plan-and-Solve

A prompting strategy that instructs the model to first create a step-by-step plan and then execute each step, improving multi-step reasoning accuracy.

Least-to-Most Prompting

A prompting technique that breaks complex problems into simpler subproblems, solving them in order from easiest to hardest and building on each result.

Persona Prompting

A prompting technique that assigns a specific identity, expertise, or personality to the model to shape the style and content of its responses.

Automatic Prompt Optimization

The use of algorithms and AI to automatically discover, refine, and improve prompts for better LLM performance on specific tasks.

Directional Stimulus Prompting

A prompting framework that provides small, targeted hints or keywords to guide the model toward a desired output without specifying the full answer.

Step-Back Prompting

A prompting technique that asks the model to first consider a higher-level or more abstract version of the question before answering the specific query.

Skeleton-of-Thought

A prompting technique that first generates an outline skeleton of the answer, then expands each point in parallel, reducing end-to-end latency.

Masked Language Modeling

A pre-training objective where random tokens are masked and the model learns to predict them from surrounding context, used by BERT-style models.

Causal Language Modeling

A pre-training objective where the model learns to predict the next token given all previous tokens, used by GPT-style generative models.
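The training targets implied by this objective can be sketched as follows, assuming a sequence that is already tokenized; `causal_pairs` is a hypothetical helper:

```python
def causal_pairs(tokens: list[str]) -> list[tuple[list[str], str]]:
    """Pair each prefix of the sequence with the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


pairs = causal_pairs(["the", "cat", "sat"])
```

Each pair is one prediction problem: the model sees only the prefix and is scored on the next token.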

Reward Hacking

When an AI model learns to exploit flaws in the reward signal to achieve high scores without actually performing the intended task well.

GRPO

Group Relative Policy Optimization is a reinforcement learning method that samples a group of outputs per prompt and scores each one against the group's average reward, rather than training a separate value (critic) model.
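The group-relative scoring step can be sketched like this, assuming each sampled output has already received a scalar reward from some scoring function. The function name and the `eps` stabilizer are illustrative:

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four outputs sampled for the same prompt, two judged correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Outputs that beat their group's average get a positive advantage, and the rest get a negative one.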

AdaLoRA

An adaptive variant of LoRA that dynamically allocates the rank of low-rank adaptation matrices based on the importance of each weight matrix.

LongLoRA

An efficient fine-tuning method that extends the context length of pre-trained models using shifted sparse attention and LoRA, requiring minimal additional compute.

P-Tuning

A parameter-efficient method that prepends learnable continuous embeddings to the input, trained with an LSTM-based prompt encoder for better optimization.

IA3

Infused Adapter by Inhibiting and Amplifying Inner Activations, a PEFT method that scales model activations with learned vectors, using even fewer parameters than LoRA.

BitFit

A parameter-efficient fine-tuning method that only updates the bias terms in a pre-trained model, leaving all weight matrices frozen.

Context Length

The maximum number of tokens, prompt and generated output combined, that a model can process in a single forward pass; synonymous with context window size.

RoPE Scaling

Techniques for extending the context length of models using Rotary Position Embeddings by modifying the frequency or interpolation of position encodings.
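The simplest variant, linear position interpolation, can be sketched as below. This assumes the standard RoPE frequency schedule with base 10000; the `scale` parameter name is illustrative:

```python
def rope_angles(position: float, dim: int, base: float = 10000.0, scale: float = 1.0) -> list[float]:
    """Rotation angle per frequency pair; scale > 1 squeezes positions into the trained range."""
    return [(position / scale) * base ** (-2 * i / dim) for i in range(dim // 2)]


# With scale=2, position 4096 gets the angles that position 2048 had originally.
stretched = rope_angles(4096, dim=8, scale=2.0)
original = rope_angles(2048, dim=8)
```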

YaRN

Yet another RoPE extensioN, an advanced method for extending model context length that combines NTK-aware interpolation with attention scaling.

ALiBi

Attention with Linear Biases, a position encoding method that adds a linear distance-based penalty to attention scores, enabling length generalization.
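A small sketch of the bias term for causal attention, assuming a power-of-two head count and the geometric slope schedule commonly used with this method; the function names are illustrative:

```python
def head_slopes(n_heads: int) -> list[float]:
    """Geometric sequence of per-head slopes (assumes n_heads is a power of two)."""
    return [2 ** (-8 * (h + 1) / n_heads) for h in range(n_heads)]


def alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    """Row i holds the bias added to the scores of query i over keys 0..i."""
    return [[slope * (j - i) for j in range(i + 1)] for i in range(seq_len)]


slopes = head_slopes(8)
bias = alibi_bias(3, slope=0.5)
```

The penalty grows linearly with the distance between query and key, and no position embedding is learned, so the same rule applies at lengths not seen in training.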

StreamingLLM

A framework that lets LLMs process arbitrarily long streaming input by keeping a few initial "attention sink" tokens plus a sliding window of recent tokens in the KV cache.
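The cache retention policy can be sketched as follows; the default sizes for `n_sink` and `window` are assumptions chosen for illustration:

```python
def streaming_cache_positions(seq_len: int, n_sink: int = 4, window: int = 8) -> list[int]:
    """Return the token positions kept in the cache: the first n_sink plus the last `window`."""
    if seq_len <= n_sink + window:
        return list(range(seq_len))
    return list(range(n_sink)) + list(range(seq_len - window, seq_len))


kept = streaming_cache_positions(10, n_sink=2, window=3)
```

The cache size stays bounded at `n_sink + window` entries however long the stream runs.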

KV Cache Compression

Techniques that reduce the memory footprint of the key-value cache during inference, enabling longer sequences and higher throughput.

Multi-Query Attention

An attention variant where all attention heads share a single set of key and value projections while maintaining separate queries, dramatically reducing KV cache size.
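A back-of-the-envelope calculation shows the effect on cache size. The model dimensions below are hypothetical and do not describe any specific model:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence (factor 2 = K and V)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical 32-layer model with 32 query heads of size 128, 16-bit values.
multi_head = kv_cache_bytes(32, n_kv_heads=32, head_dim=128, seq_len=4096)
multi_query = kv_cache_bytes(32, n_kv_heads=1, head_dim=128, seq_len=4096)
```

Under these assumptions the cache shrinks by a factor equal to the number of heads, from 2 GiB to 64 MiB per sequence.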

Flash Decoding

An optimized algorithm for the decoding phase of LLM inference that parallelizes attention computation across the KV cache sequence dimension.

Dynamic Batching

An inference optimization that groups incoming requests into batches dynamically based on arrival time, maximizing GPU utilization.

Model Size

The total number of parameters in a neural network, typically measured in billions for modern LLMs, determining capacity and computational requirements.

Parameter Count

The total number of trainable weights and biases in a neural network, serving as a primary measure of model capacity and complexity.

Dense Model

A neural network where all parameters are active for every input, in contrast to sparse models where only a subset of parameters is used per token.

Top-k Routing

The mechanism in Mixture of Experts models that selects the top-k most relevant experts for each input token based on a learned routing function.
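One way to sketch the selection for a single token, assuming the router has already produced one logit per expert. Here the softmax is taken over the selected logits only, which is one of several conventions in use:

```python
import math


def top_k_route(router_logits: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and normalize their weights to sum to 1."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]  # subtract peak for stability
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


route = top_k_route([0.1, 2.0, 1.0], k=2)  # list of (expert index, mixing weight)
```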

Expert Parallelism

A model parallelism strategy where different experts in a Mixture of Experts model are placed on different GPUs, enabling efficient distributed inference.

Load Balancing Loss

An auxiliary training loss that encourages even distribution of tokens across experts in Mixture of Experts models, preventing expert collapse.
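A sketch of one widely used formulation: the number of experts times the sum, over experts, of the fraction of tokens routed to that expert multiplied by its mean router probability. Treat the exact form as an assumption, since implementations vary:

```python
def load_balancing_loss(assignments: list[int], router_probs: list[list[float]],
                        n_experts: int) -> float:
    """Equals 1.0 when routing is perfectly uniform and grows as routing concentrates."""
    n_tokens = len(assignments)
    frac_tokens = [assignments.count(e) / n_tokens for e in range(n_experts)]
    mean_prob = [sum(p[e] for p in router_probs) / n_tokens for e in range(n_experts)]
    return n_experts * sum(f * p for f, p in zip(frac_tokens, mean_prob))


balanced = load_balancing_loss([0, 1], [[0.5, 0.5], [0.5, 0.5]], n_experts=2)
collapsed = load_balancing_loss([0, 0], [[0.9, 0.1], [0.9, 0.1]], n_experts=2)
```

The collapsed case, where every token goes to one expert, scores higher than the balanced case, so minimizing the loss pushes the router toward even usage.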

Over-training

Deliberately training a model on more data than is compute-optimal according to scaling laws, to produce a smaller model that is cheaper to serve at inference time.

GPT-4o Mini

A smaller, faster, and cheaper variant of GPT-4o designed for high-volume tasks that need good quality at lower cost.

o1

OpenAI's reasoning model that spends extra compute on an extended "thinking" phase before responding, with strong results on math, coding, and science tasks.

o3

OpenAI's advanced reasoning model succeeding o1, with improved reasoning capabilities and efficiency across math, science, and coding tasks.

Claude 3 Haiku

The fastest and most compact model in Anthropic's Claude 3 family, optimized for speed and cost-efficiency in high-volume applications.

Claude 3 Sonnet

The balanced mid-tier model in Anthropic's Claude 3 family, offering strong performance with good speed and reasonable cost.

Claude 3 Opus

The most capable model in Anthropic's Claude 3 family, excelling at complex reasoning, nuanced analysis, and sophisticated generation tasks.

Interactive FAQ

Product FAQ

What is InsertChat?

InsertChat is a white-label AI assistant for your website. Train it, brand it, publish it, and learn from visitor questions.

How does InsertChat use my website content?

Connect approved pages, docs, videos, FAQs, policies, and other sources. InsertChat turns them into source-backed answers and next steps.

Can I control the assistant's tone and sources?

Yes. Choose its sources, tone, welcome message, and prompts so it stays on brand.

How does InsertChat stay accurate?

Answers use approved content and source links. Analytics show unclear or missing answers so you can improve coverage.

Can it collect leads or route support questions?

Yes. InsertChat can collect details, qualify intent, add context, and send chats to the right inbox, CRM, workflow, or person.

Can I control how the assistant behaves?

Yes. Control prompts, model choice, tool access, and the branded assistant experience so behavior stays consistent.

Which AI models can I use?

InsertChat supports multiple model providers. Choose each assistant's model for quality, speed, and cost, or use BYOK.

Can I pick different models for different workflows?

Yes. Use a faster model for common questions and a stronger model for complex reasoning. InsertChat supports that balance per conversation.

Where can I deploy an assistant?

Use a widget, embed, full-page assistant, custom domain, in-app embed, or API. Reuse one setup across surfaces.

Do I need coding skills?

No. Build and deploy AI assistants using our visual builder. The embed code is one line of JavaScript.

Can I customize the branding and UI?

Yes. Customize the assistant name, logo, colors, welcome message, suggested prompts, tone, domain, and white-label presentation.

Can I use my own domain?

Yes. Custom domains are supported, typically via enterprise options.

Does InsertChat support voice?

Yes. Voice dictation and text-to-speech let users speak instead of type.

Does InsertChat support vision?

Yes. Enable vision for assistants when images help clarify a request or context.

What tools and integrations are supported?

Zendesk, HubSpot, Shopify, WooCommerce, calendar booking, web search, Perplexity, and webhooks for your own systems.

Can I control which tools the assistant is allowed to use?

Yes. Tool access is controlled per assistant so you enable only what you need.

Can the agent hand off to a human?

Yes. Configure human handoff so the agent escalates when needed. Full conversation history is passed along.

Do you provide analytics?

Yes. Track chats, leads, feedback, top questions, unanswered questions, most-used sources, and content gaps.

Is it mobile friendly?

Yes. The widget and embeds work well on desktop and mobile with no separate experience needed.

What's the fastest path to a successful deployment?

Start with one assistant and a small set of high-value sources. Iterate using real questions from analytics.

What is the fastest way to get started?

Create an account. Connect one key source. Ask a test question, brand the assistant, then publish it on one page.