AI glossary for content assistants
Plain-English definitions of 13,917 AI terms for branded assistant teams.
Glossary
13,917 terms. Open one for definitions and related concepts.
Prompt Engineering
Prompt engineering is the practice of crafting effective instructions and context for AI models to get better, more accurate, and more useful responses.
Hallucination
In AI, a hallucination is model output that sounds plausible but is factually incorrect or made up.
Context Window
The context window is the maximum amount of text (measured in tokens) that an AI model can process in a single request, including both input and output.
Token
A token is a unit of text that AI models process, typically representing about 4 characters or three-quarters of a word in English.
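As a rough illustration of the 4-characters-per-token rule of thumb, the sketch below estimates a token count from string length. The function name and the `chars_per_token` default are assumptions made for this example; real counts depend on the model's tokenizer and on the language of the text.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (English rule of thumb only)."""
    if not text:
        return 0
    # Round up so a short string still counts as at least one token.
    return math.ceil(len(text) / chars_per_token)


print(estimate_tokens("Prompt engineering improves answers."))
```

Use an estimate like this for budgeting only; count with the model's own tokenizer when exact limits or costs matter.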
Temperature
Temperature is a setting that controls how random or creative AI responses are, with lower values being more focused and higher values being more varied.
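One common way temperature is applied is by dividing the model's raw scores (logits) by the temperature before converting them to probabilities. The sketch below assumes that formulation; the function name and sample logits are illustrative, not taken from any particular API.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; pick the argmax instead of using 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
print(softmax_with_temperature(logits, temperature=0.5))  # sharper distribution
print(softmax_with_temperature(logits, temperature=2.0))  # flatter distribution
```

Lower values concentrate probability on the top-scoring token; higher values spread it across more tokens.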
Foundation Model
A foundation model is a large AI model trained on broad data that can be adapted to many downstream tasks through fine-tuning or prompting.
Base Model
A base model is the raw pre-trained version of a language model before any fine-tuning or alignment, trained only on next-token prediction.
Instruct Model
An instruct model is a language model fine-tuned to follow user instructions and produce helpful, direct responses to queries.
Chat Model
A chat model is a language model optimized for multi-turn conversational interactions, maintaining context across back-and-forth exchanges.
GPT
GPT (Generative Pre-trained Transformer) is a family of large language models developed by OpenAI that generate human-like text using transformer architecture.
Claude
Claude is a family of AI assistants developed by Anthropic, designed with a focus on safety, helpfulness, and harmlessness using constitutional AI techniques.
Gemini
Gemini is a family of multimodal AI models developed by Google DeepMind, designed to natively understand and generate text, images, code, and audio.
Llama
Llama is a family of open-weight large language models released by Meta, enabling researchers and developers to run and fine-tune capable models locally.
Mistral
Mistral is a family of efficient open-weight language models from Mistral AI, known for strong performance relative to their parameter count.
Open-Source Model
An open-source model is an AI model whose code, architecture, and often training data are publicly available for anyone to use, modify, and distribute.
Open-Weight Model
An open-weight model is an AI model whose trained parameters are publicly released, allowing anyone to run and fine-tune it, even though the training data and training code may remain undisclosed.
Proprietary Model
A proprietary model is an AI model whose architecture, weights, and training data are kept private, accessible only through paid APIs or products.
Small Language Model
A small language model (SLM) is a compact AI model with fewer parameters that runs efficiently on limited hardware while still handling many practical tasks.
Multimodal Model
A multimodal model is an AI model that can process and generate content across multiple types of data, such as text, images, audio, and video.
Code Model
A code model is a language model specifically trained or fine-tuned on source code to excel at code generation, completion, debugging, and explanation.
Reasoning Model
A reasoning model is an AI model designed to solve complex problems through step-by-step logical reasoning, often using chain-of-thought techniques.
Vision-Language Model
A vision-language model (VLM) is an AI model that jointly understands images and text, enabling tasks like image captioning, visual Q&A, and document analysis.
Tokenization
Tokenization is the process of breaking text into smaller units called tokens that language models can process numerically.
Byte-Pair Encoding
Byte-Pair Encoding (BPE) is a tokenization algorithm that iteratively merges the most frequent pairs of characters or subwords to build a vocabulary.
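The merge loop can be sketched on a toy corpus. This is a minimal illustration of the idea, assuming words are pre-split into characters and stored with their frequencies; production tokenizers add byte-level handling, end-of-word markers, and much faster data structures. All names here are hypothetical.

```python
from collections import Counter


def most_frequent_pair(words):
    """words maps a tuple of symbols to how often that word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(words, num_merges):
    """Run `num_merges` merge steps and return the learned merges and the corpus."""
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words


corpus = {
    ("l", "o", "w"): 5,
    ("l", "o", "w", "e", "r"): 2,
    ("n", "e", "w", "e", "s", "t"): 6,
    ("w", "i", "d", "e", "s", "t"): 3,
}
merges, final_words = learn_bpe(corpus, num_merges=2)
print(merges)
```

Each learned merge becomes a vocabulary entry, so frequent fragments such as "est" end up as single tokens.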
WordPiece
WordPiece is a subword tokenization algorithm developed by Google that uses likelihood-based merging to build vocabularies, notably used in BERT.
SentencePiece
SentencePiece is a language-independent tokenization library that treats input as a raw stream of Unicode characters, whitespace included, so it needs no language-specific pre-tokenization and works consistently across languages and scripts.
Vocabulary
In the context of LLMs, a vocabulary is the fixed set of all tokens a model can recognize and generate, typically ranging from tens of thousands to a few hundred thousand entries.
Special Token
A special token is a reserved token in a language model vocabulary that serves a structural purpose, such as marking message boundaries or end of text.
Tiktoken
Tiktoken is a fast tokenization library by OpenAI used to count and encode tokens for GPT models, essential for managing context windows and costs.
Token Limit
A token limit is the maximum number of tokens a model can process in a single request, encompassing both input tokens and generated output tokens.
Top-p
Top-p (nucleus sampling) is a decoding parameter that limits token selection to the smallest set of tokens whose cumulative probability exceeds a threshold p.
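A minimal sketch of the filtering step, assuming the next-token distribution is available as a plain dictionary of probabilities. The function name is hypothetical, and this version stops as soon as the cumulative probability reaches or exceeds p; implementations differ slightly on that boundary.

```python
def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose total mass reaches p, then renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


next_token_probs = {"the": 0.5, "a": 0.3, "this": 0.15, "zebra": 0.05}
print(top_p_filter(next_token_probs, p=0.7))
```

The size of the kept set changes with the shape of the distribution: a confident model keeps very few tokens, an uncertain one keeps many.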
Top-k
Top-k is a decoding parameter that restricts token selection to the k most probable next tokens, reducing randomness in text generation.
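For comparison with top-p, top-k keeps a fixed number of candidates regardless of how the probability is spread. The sketch below assumes a dictionary of next-token probabilities; the function name and the optional `rng` parameter are illustrative.

```python
import random


def top_k_sample(probs, k, rng=None):
    """Sample one token from the k most probable candidates."""
    if rng is None:
        rng = random.Random()
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens = [token for token, _ in ranked]
    weights = [weight for _, weight in ranked]
    # random.choices treats weights as relative, so no renormalization is needed.
    return rng.choices(tokens, weights=weights, k=1)[0]


next_token_probs = {"the": 0.5, "a": 0.3, "this": 0.15, "zebra": 0.05}
print(top_k_sample(next_token_probs, k=2, rng=random.Random(0)))
```

With k=1 this reduces to always picking the most probable token.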
Nucleus Sampling
Nucleus sampling, also known as top-p, is a text generation method that samples the next token from the "nucleus": the smallest set of highest-probability tokens whose combined probability reaches a threshold. The size of that set changes at every step.
Greedy Decoding
Greedy decoding is a text generation strategy that always selects the single most probable next token, producing deterministic but often repetitive output.
Repetition Penalty
Repetition penalty is a generation parameter that reduces the probability of tokens that have already appeared, preventing the model from repeating itself.
Frequency Penalty
Frequency penalty is a generation parameter that reduces token probability proportionally to how often that token has already appeared in the output.
Presence Penalty
Presence penalty is a generation parameter that reduces token probability if that token has appeared at all in the output, regardless of how many times.
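The difference between the two penalties is easier to see side by side. The sketch below assumes the subtractive form described in OpenAI's API documentation, where a token's logit is reduced by its count times the frequency penalty plus a flat presence penalty; other libraries use different formulas, and the function name here is hypothetical.

```python
from collections import Counter


def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the logits of tokens that already appear in the generated output."""
    counts = Counter(generated)
    adjusted = dict(logits)
    for token, count in counts.items():
        if token in adjusted:
            # Frequency penalty grows with each repeat; presence penalty is applied once.
            adjusted[token] -= count * frequency_penalty + presence_penalty
    return adjusted


logits = {"the": 2.0, "cat": 1.0, "sat": 0.5}
generated_so_far = ["the", "the", "cat"]
print(apply_penalties(logits, generated_so_far, frequency_penalty=0.5, presence_penalty=0.2))
```

Here "the" is penalized more than "cat" only because of the frequency term; the presence term treats both the same.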
Stop Sequence
A stop sequence is a string that, when generated by the model, causes text generation to immediately halt and return the response.
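Hosted APIs apply stop sequences on the server, but the effect can be sketched client-side as a truncation at the earliest match. The function name is hypothetical, and this version drops the stop sequence itself from the result, which is an assumption about the desired behavior.

```python
def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # an empty string would match at position 0
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


raw_output = "Answer: 42\nUser: what about 43?"
print(truncate_at_stop(raw_output, ["\nUser:"]))
```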
Max Tokens
Max tokens is a parameter that sets the upper limit on how many tokens the model can generate in its response, controlling output length.
Sampling
Sampling is the process of selecting the next token from a probability distribution during text generation, introducing controlled randomness into outputs.
Deterministic Generation
Deterministic generation aims to produce identical output for identical input by eliminating randomness, typically by setting temperature to zero or using greedy decoding. Hosted models may still show small variations between runs.
Beam Search
Beam search is a decoding algorithm that explores multiple candidate sequences in parallel, keeping the top-scoring options at each step.
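A toy example shows why keeping several candidates can beat picking the best token at each step. The lookup table standing in for a model, and every name below, is hypothetical; scores are summed log-probabilities with no length normalization.

```python
import math

# Hypothetical next-token table standing in for a real model.
TOY_MODEL = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "<eos>": 0.1},
    "x": {"<eos>": 1.0},
    "y": {"<eos>": 1.0},
    "z": {"<eos>": 1.0},
}


def toy_next_token_probs(sequence):
    return TOY_MODEL[sequence[-1]]


def beam_search(next_token_probs, start, beam_width, max_len, eos="<eos>"):
    """Return up to beam_width (sequence, log_prob) pairs, best first."""
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:
                candidates.append((seq, score))  # finished beams carry over unchanged
                continue
            for token, prob in next_token_probs(seq).items():
                if prob > 0:
                    candidates.append((seq + [token], score + math.log(prob)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq[-1] == eos for seq, _ in beams):
            break
    return beams


best_sequence, best_score = beam_search(toy_next_token_probs, "<s>", beam_width=2, max_len=5)[0]
print(best_sequence, math.exp(best_score))
```

With a beam width of 1 the search behaves greedily and commits to "a" first; with a width of 2 it finds the higher-probability path through "b".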
Contrastive Search
Contrastive search is a decoding method that balances token probability with diversity by penalizing tokens too similar to previously generated ones.
Streaming
Streaming is a technique that sends model output tokens to the client as they are generated, providing real-time progressive display instead of waiting for full completion.
Few-Shot Prompting
Few-shot prompting is a technique where examples of desired input-output pairs are included in the prompt to guide the model toward the expected behavior.
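A few-shot prompt is ultimately just text assembled in a consistent pattern. The sketch below shows one possible layout; the "Input:" and "Output:" labels and the function name are assumptions for illustration, not a required format.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and the new query into one prompt."""
    lines = [instruction, ""]
    for example_input, example_output in examples:
        lines += [f"Input: {example_input}", f"Output: {example_output}", ""]
    # End on an open "Output:" so the model completes the pattern.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Loved it, works perfectly.", "positive"), ("Broke after two days.", "negative")],
    "Setup was quick and painless.",
)
print(prompt)
```

Dropping the examples list to empty turns the same template into a zero-shot prompt.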
Zero-Shot Prompting
Zero-shot prompting is asking a language model to perform a task with just instructions and no examples, relying on its pre-trained knowledge.
Tree-of-Thought
Tree-of-thought prompting extends chain-of-thought by exploring multiple reasoning paths simultaneously and selecting the best one.
ReAct Prompting
ReAct is a prompting framework that interleaves reasoning and acting, allowing language models to think about what to do and then take actions like tool use.
Product FAQ
What is InsertChat?
InsertChat is a white-label AI assistant for your website. Train it, brand it, publish it, and learn from visitor questions.
How does InsertChat use my website content?
Connect approved pages, docs, videos, FAQs, policies, and other sources. InsertChat turns them into source-backed answers and next steps.
Can I control the assistant's tone and sources?
Yes. Choose its sources, tone, welcome message, and prompts so it stays on brand.
How does InsertChat stay accurate?
Answers use approved content and source links. Analytics show unclear or missing answers so you can improve coverage.
Can it collect leads or route support questions?
Yes. InsertChat can collect details, qualify intent, add context, and send chats to the right inbox, CRM, workflow, or person.
Can I control how the assistant behaves?
Yes. Control prompts, model choice, tool access, and the branded assistant experience so behavior stays consistent.
Which AI models can I use?
InsertChat supports multiple model providers. Choose each assistant's model for quality, speed, and cost, or use BYOK.
Can I pick different models for different workflows?
Yes. Use a faster model for common questions and a stronger model for complex reasoning. InsertChat supports that balance per conversation.
Where can I deploy an assistant?
Use a widget, embed, full-page assistant, custom domain, in-app embed, or API. Reuse one setup across surfaces.
Do I need coding skills?
No. Build and deploy AI assistants using our visual builder. The embed code is one line of JavaScript.
Can I customize the branding and UI?
Yes. Customize the assistant name, logo, colors, welcome message, suggested prompts, tone, domain, and white-label presentation.
Can I use my own domain?
Yes. Custom domains are supported, typically via enterprise options.
Does InsertChat support voice?
Yes. Voice dictation and text-to-speech let users speak instead of type.
Does InsertChat support vision?
Yes. Enable vision for assistants when images help clarify a request or context.
What tools and integrations are supported?
Zendesk, HubSpot, Shopify, WooCommerce, calendar booking, web search, Perplexity, and webhooks for your own systems.
Can I control which tools the assistant is allowed to use?
Yes. Tool access is controlled per assistant so you enable only what you need.
Can the agent hand off to a human?
Yes. Configure human handoff so the agent escalates when needed. Full conversation history is passed along.
Do you provide analytics?
Yes. Track chats, leads, feedback, top questions, unanswered questions, most-used sources, and content gaps.
Is it mobile friendly?
Yes. The widget and embeds work well on desktop and mobile with no separate experience needed.
What's the fastest path to a successful deployment?
Start with one assistant and a small set of high-value sources. Iterate using real questions from analytics.
What is the fastest way to get started?
Create an account. Connect one key source. Ask a test question, brand the assistant, then publish it on one page.