AI glossary for content assistants

Plain-English definitions of 13,917 AI terms for branded assistant teams.

Plain English · RAG · LLMs

Prompt Engineering

Prompt engineering is the practice of crafting effective instructions and context for AI models to get better, more accurate, and more useful responses.

Hallucination

In AI, a hallucination is model output that sounds plausible but is factually incorrect or fabricated.

Context Window

The context window is the maximum amount of text (measured in tokens) that an AI model can process in a single request, including both input and output.
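
Because input and output share the same window, a request only fits if the two together stay under the limit. The following is a minimal sketch of that arithmetic; the function names and the 8,192-token window are illustrative assumptions, not values from any particular model.

```python
def fits_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Return True if the prompt plus the reserved output fits in the window."""
    return input_tokens + max_output_tokens <= context_window


def remaining_output_budget(input_tokens: int, context_window: int) -> int:
    """Tokens left for the response after the prompt is counted (never negative)."""
    return max(context_window - input_tokens, 0)


if __name__ == "__main__":
    window = 8192  # hypothetical window size, chosen only for illustration
    print(fits_context(7000, 1000, window))
    print(remaining_output_budget(7500, window))
```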

Token

A token is a unit of text that AI models process, typically representing about 4 characters or three-quarters of a word in English.
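
The 4-characters-per-token rule of thumb can be turned into a quick estimator. This is only a rough sketch for English text, and the helper name is an assumption; real counts depend on the model's tokenizer and can differ noticeably.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count; not a real tokenizer."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


if __name__ == "__main__":
    print(estimate_tokens("Prompt engineering improves responses."))
```

For billing or hard limits, count with the tokenizer that matches the model instead of relying on this estimate.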

Temperature

Temperature is a setting that controls how random or creative AI responses are, with lower values being more focused and higher values being more varied.
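
One common way temperature is applied is to divide the model's raw scores (logits) by the temperature before the softmax. The sketch below assumes that formulation; the function name and the example logits are illustrative.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after scaling by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]
    print(softmax_with_temperature(logits, 0.5))  # sharper distribution
    print(softmax_with_temperature(logits, 2.0))  # flatter distribution
```

Lower temperatures concentrate probability on the top-scoring token, while higher temperatures spread it across more candidates.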

Foundation Model

A foundation model is a large AI model trained on broad data that can be adapted to many downstream tasks through fine-tuning or prompting.

Base Model

A base model is the raw pre-trained version of a language model before any fine-tuning or alignment, trained only on next-token prediction.

Instruct Model

An instruct model is a language model fine-tuned to follow user instructions and produce helpful, direct responses to queries.

Chat Model

A chat model is a language model optimized for multi-turn conversational interactions, maintaining context across back-and-forth exchanges.

GPT

GPT (Generative Pre-trained Transformer) is a family of large language models developed by OpenAI that generate human-like text using transformer architecture.

Claude

Claude is a family of AI assistants developed by Anthropic, designed with a focus on safety, helpfulness, and harmlessness using constitutional AI techniques.

Gemini

Gemini is a family of multimodal AI models developed by Google DeepMind, designed to natively understand and generate text, images, code, and audio.

Llama

Llama is a family of open-weight large language models released by Meta, enabling researchers and developers to run and fine-tune capable models locally.

Mistral

Mistral is a family of efficient open-weight language models from Mistral AI, known for strong performance relative to their parameter count.

Open-Source Model

An open-source model is an AI model whose weights, code, architecture, and often training data are publicly available under a license that lets anyone use, modify, and distribute them.

Open-Weight Model

An open-weight model is an AI model whose trained parameters are publicly released, allowing anyone to run and fine-tune it, even though the training data and training code may not be disclosed.

Proprietary Model

A proprietary model is an AI model whose architecture, weights, and training data are kept private, typically accessible only through hosted APIs or products.

Small Language Model

A small language model (SLM) is a compact AI model with fewer parameters that runs efficiently on limited hardware while still handling many practical tasks.

Multimodal Model

A multimodal model is an AI model that can process and generate content across multiple types of data, such as text, images, audio, and video.

Code Model

A code model is a language model specifically trained or fine-tuned on source code to excel at code generation, completion, debugging, and explanation.

Reasoning Model

A reasoning model is an AI model designed to solve complex problems through step-by-step logical reasoning, often using chain-of-thought techniques.

Vision-Language Model

A vision-language model (VLM) is an AI model that jointly understands images and text, enabling tasks like image captioning, visual Q&A, and document analysis.

Tokenization

Tokenization is the process of breaking text into smaller units called tokens that language models can process numerically.

Byte-Pair Encoding

Byte-Pair Encoding (BPE) is a tokenization algorithm that iteratively merges the most frequent pairs of characters or subwords to build a vocabulary.
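
The merge loop described above can be sketched on a toy corpus. This is a simplified illustration that assumes whitespace-split words and character-level starting symbols; production tokenizers add byte-level handling, pre-tokenization rules, and tie-breaking that are omitted here. All function names are hypothetical.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    words = {tuple(word): count for word, count in Counter(corpus.split()).items()}
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges


if __name__ == "__main__":
    print(train_bpe("aaab aaab aab", 2))
```

Each learned merge adds one entry to the vocabulary, so the number of merges controls the final vocabulary size.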

WordPiece

WordPiece is a subword tokenization algorithm developed by Google that uses likelihood-based merging to build vocabularies, notably used in BERT.

SentencePiece

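SentencePiece is a language-independent tokenization library that operates directly on raw text as a sequence of Unicode characters, whitespace included, so it needs no language-specific pre-tokenization and tokenizes consistently across languages and scripts.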

Vocabulary

In the context of LLMs, the vocabulary is the fixed set of all tokens a model can recognize and generate, typically ranging from tens of thousands to a few hundred thousand entries.

Special Token

A special token is a reserved token in a language model vocabulary that serves a structural purpose, such as marking message boundaries or end of text.

Tiktoken

Tiktoken is a fast tokenization library by OpenAI used to count and encode tokens for GPT models, essential for managing context windows and costs.

Token Limit

A token limit is the maximum number of tokens a model can process in a single request, encompassing both input tokens and generated output tokens.

Top-p

Top-p (nucleus sampling) is a decoding parameter that limits token selection to the smallest set of tokens whose cumulative probability exceeds a threshold p.
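
A minimal sketch of the filtering step follows, assuming the next-token distribution is given as a plain dictionary of token to probability. The function name is hypothetical, and this sketch stops as soon as the running total reaches p, which is one common convention.

```python
def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose cumulative mass reaches p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    # Renormalize so the surviving probabilities sum to 1 before sampling.
    return {token: prob / total for token, prob in kept}


if __name__ == "__main__":
    distribution = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
    print(top_p_filter(distribution, 0.7))
```

Unlike a fixed cutoff, the number of surviving tokens changes with the shape of the distribution: a confident model keeps few candidates, an uncertain one keeps many.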

Top-k

Top-k is a decoding parameter that restricts token selection to the k most probable next tokens, reducing randomness in text generation.
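
The restriction to the k most probable tokens can be sketched as below, again assuming the distribution is a plain dictionary of token to probability. The function name is hypothetical.

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens and renormalize their probabilities."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(prob for _, prob in ranked)
    return {token: prob / total for token, prob in ranked}


if __name__ == "__main__":
    distribution = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
    print(top_k_filter(distribution, 2))
```

Setting k to 1 leaves only the single most probable token, which is equivalent to greedy decoding.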

Nucleus Sampling

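Nucleus sampling, also known as top-p sampling, is a text generation method that samples the next token from the "nucleus": the smallest set of highest-probability tokens whose combined probability reaches a threshold, so the candidate set grows or shrinks at each step.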

Greedy Decoding

Greedy decoding is a text generation strategy that always selects the single most probable next token, producing deterministic but often repetitive output.

Repetition Penalty

Repetition penalty is a generation parameter that reduces the probability of tokens that have already appeared, preventing the model from repeating itself.
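
One widely used formulation divides positive logits by the penalty and multiplies negative logits by it, so a previously seen token always becomes less likely. The sketch below assumes that formulation, with logits as a list indexed by token ID; the function name is hypothetical, and other libraries may define the penalty differently.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Lower the logits of tokens that already appeared (penalty > 1 discourages repeats)."""
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Dividing a positive score or multiplying a negative one both reduce it.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


if __name__ == "__main__":
    print(apply_repetition_penalty([2.0, -1.0, 0.5], generated_ids=[0, 1], penalty=2.0))
```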

Frequency Penalty

Frequency penalty is a generation parameter that reduces token probability proportionally to how often that token has already appeared in the output.

Presence Penalty

Presence penalty is a generation parameter that reduces token probability if that token has appeared at all in the output, regardless of how many times.
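
The difference between the frequency and presence penalties is easiest to see side by side. The sketch below assumes a common additive formulation in which each seen token's logit is reduced by its count times the frequency penalty, plus the presence penalty once. The function name is hypothetical, and exact behavior varies by provider.

```python
from collections import Counter


def apply_penalties(logits, generated_ids, frequency_penalty=0.0, presence_penalty=0.0):
    """Subtract count-scaled and one-time penalties from logits of seen tokens."""
    counts = Counter(generated_ids)
    adjusted = list(logits)
    for token_id, count in counts.items():
        # Frequency penalty grows with each repeat; presence penalty applies once.
        adjusted[token_id] -= count * frequency_penalty + presence_penalty
    return adjusted


if __name__ == "__main__":
    # Token 0 appeared twice, token 1 once, token 2 never.
    print(apply_penalties([1.0, 1.0, 1.0], [0, 0, 1],
                          frequency_penalty=0.5, presence_penalty=0.25))
```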

Stop Sequence

A stop sequence is a string that, when generated by the model, causes text generation to immediately halt and return the response.
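
Client code can apply the same idea to already-generated text by cutting at the earliest stop sequence. This is a minimal sketch with a hypothetical helper name; it assumes the stop sequence itself is excluded from the result, which is a common but not universal convention.

```python
def truncate_at_stop(text, stop_sequences):
    """Return the text up to (not including) the earliest stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # ignore empty strings, which would match at position 0
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


if __name__ == "__main__":
    raw = "Answer: 42\nUser: next question"
    print(truncate_at_stop(raw, ["\nUser:"]))
```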

Max Tokens

Max tokens is a parameter that sets the upper limit on how many tokens the model can generate in its response, controlling output length.

Sampling

Sampling is the process of selecting the next token from a probability distribution during text generation, introducing controlled randomness into outputs.

Deterministic Generation

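Deterministic generation aims to produce identical output for identical input by removing randomness from decoding, typically by setting temperature to zero (greedy decoding) or fixing the random seed, although hosted APIs may still show small run-to-run differences.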

Beam Search

Beam search is a decoding algorithm that explores multiple candidate sequences in parallel, keeping the top-scoring options at each step.
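
A toy version of the algorithm is sketched below. It assumes a hypothetical `next_log_probs` callback that returns log-probabilities for the next token given a prefix; `toy_model` is an invented stand-in for a real language model, chosen so that the best first token does not lead to the best overall sequence.

```python
import math


def beam_search(next_log_probs, beam_width, max_len, eos="<eos>"):
    """Keep the `beam_width` highest-scoring partial sequences at each step."""
    beams = [((), 0.0)]  # (token tuple, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == eos:
                candidates.append((seq, score))  # finished beams carry over
                continue
            for token, log_prob in next_log_probs(seq).items():
                candidates.append((seq + (token,), score + log_prob))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq and seq[-1] == eos for seq, _ in beams):
            break
    return beams


def toy_model(seq):
    """Hypothetical two-step model used only for this illustration."""
    if not seq:
        return {"a": math.log(0.6), "b": math.log(0.4)}
    if seq[-1] == "a":
        return {"x": math.log(0.5), "y": math.log(0.5)}
    return {"z": math.log(0.9), "w": math.log(0.1)}


if __name__ == "__main__":
    print(beam_search(toy_model, beam_width=2, max_len=2)[0][0])
    print(beam_search(toy_model, beam_width=1, max_len=2)[0][0])
```

With a beam width of 1 the search reduces to greedy decoding and commits to "a" first; a width of 2 keeps "b" alive long enough to find the higher-probability sequence that starts with it.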

Contrastive Search

Contrastive search is a decoding method that balances token probability with diversity by penalizing candidate tokens whose representations are too similar to the preceding context.

Streaming

Streaming is a technique that sends model output tokens to the client as they are generated, providing real-time progressive display instead of waiting for full completion.

Few-Shot Prompting

Few-shot prompting is a technique where examples of desired input-output pairs are included in the prompt to guide the model toward the expected behavior.
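
Assembling such a prompt is mostly string formatting. The sketch below assumes a simple "Input:/Output:" layout and a hypothetical helper name; the labels and separators are illustrative choices, not a required format.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Join an instruction, worked input-output pairs, and the new input into one prompt."""
    parts = [instruction.strip(), ""]
    for example_input, example_output in examples:
        parts += [f"Input: {example_input}", f"Output: {example_output}", ""]
    # End with an open "Output:" so the model completes the pattern.
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        "Classify the sentiment as positive or negative.",
        [("I love this product.", "positive"), ("This was a waste of money.", "negative")],
        "The support team was very helpful.",
    )
    print(prompt)
```

Passing an empty list of examples yields an instruction-only prompt, which corresponds to zero-shot prompting.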

Zero-Shot Prompting

Zero-shot prompting is asking a language model to perform a task with just instructions and no examples, relying on its pre-trained knowledge.

Tree-of-Thought

Tree-of-thought prompting extends chain-of-thought by branching into multiple reasoning paths, evaluating intermediate steps, and selecting the most promising path.

ReAct Prompting

ReAct is a prompting framework that interleaves reasoning and acting, allowing language models to think about what to do and then take actions like tool use.

Product FAQ

What is InsertChat?

InsertChat is a white-label AI assistant for your website. Train it, brand it, publish it, and learn from visitor questions.

How does InsertChat use my website content?

Connect approved pages, docs, videos, FAQs, policies, and other sources. InsertChat turns them into source-backed answers and next steps.

Can I control the assistant's tone and sources?

Yes. Choose its sources, tone, welcome message, and prompts so it stays on brand.

How does InsertChat stay accurate?

Answers use approved content and source links. Analytics show unclear or missing answers so you can improve coverage.

Can it collect leads or route support questions?

Yes. InsertChat can collect details, qualify intent, add context, and send chats to the right inbox, CRM, workflow, or person.

Can I control how the assistant behaves?

Yes. Control prompts, model choice, tool access, and the branded assistant experience so behavior stays consistent.

Which AI models can I use?

InsertChat supports multiple model providers. Choose each assistant's model for quality, speed, and cost, or use BYOK.

Can I pick different models for different workflows?

Yes. Use a faster model for common questions and a stronger model for complex reasoning. InsertChat supports that balance per conversation.

Where can I deploy an assistant?

Use a widget, embed, full-page assistant, custom domain, in-app embed, or API. Reuse one setup across surfaces.

Do I need coding skills?

No. Build and deploy AI assistants using our visual builder. The embed code is one line of JavaScript.

Can I customize the branding and UI?

Yes. Customize the assistant name, logo, colors, welcome message, suggested prompts, tone, domain, and white-label presentation.

Can I use my own domain?

Yes. Custom domains are supported, typically via enterprise options.

Does InsertChat support voice?

Yes. Voice dictation and text-to-speech let users speak instead of type.

Does InsertChat support vision?

Yes. Enable vision for assistants when images help clarify a request or context.

What tools and integrations are supported?

Zendesk, HubSpot, Shopify, WooCommerce, calendar booking, web search, Perplexity, and webhooks for your own systems.

Can I control which tools the assistant is allowed to use?

Yes. Tool access is controlled per assistant so you enable only what you need.

Can the agent hand off to a human?

Yes. Configure human handoff so the agent escalates when needed. Full conversation history is passed along.

Do you provide analytics?

Yes. Track chats, leads, feedback, top questions, unanswered questions, most-used sources, and content gaps.

Is it mobile friendly?

Yes. The widget and embeds work well on desktop and mobile with no separate experience needed.

What's the fastest path to a successful deployment?

Start with one assistant and a small set of high-value sources. Iterate using real questions from analytics.

What is the fastest way to get started?

Create an account. Connect one key source. Ask a test question, brand the assistant, then publish it on one page.