Glossary

AI glossary for content assistants

Plain-English definitions of 13,917 AI terms for branded assistant teams.

Plain English · RAG · LLMs

13,917 terms. Open one for definitions and related concepts.

Open-Source LLM

A language model whose weights and often training code are publicly released, enabling self-hosting, modification, and community development.

Context Stuffing

The practice of filling the context window with as much relevant information as possible to maximize the model's ability to generate accurate responses.

Lost in the Middle

A phenomenon where LLMs attend strongly to the beginning and end of long contexts but struggle to use information positioned in the middle.

Cosine Similarity

A mathematical measure of similarity between two vectors based on the angle between them, widely used to compare embeddings in semantic search.
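
As a rough illustration of the definition above, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below uses plain Python lists in place of real embeddings; the function name `cosine_similarity` and the choice to raise an error on zero vectors are assumptions made for this example, not part of any particular library.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Vectors pointing the same way score close to 1, orthogonal vectors score 0, and opposite vectors score close to -1, regardless of their magnitudes.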

Knowledge Graph

A structured representation of entities and their relationships, used to ground LLM knowledge retrieval in relational information.

Model Collapse

A degradation phenomenon where models trained on AI-generated data progressively lose diversity and quality across successive generations.

Hallucination Detection

Techniques for automatically identifying when an AI model generates false or unsupported information in its responses.

Few-Shot Learning

The ability of a model to learn and perform a new task from just a handful of examples provided in the prompt context.
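
One way to picture "examples provided in the prompt context" is a prompt that lists a few worked input/output pairs before the new input. The sketch below only assembles such a prompt as text; the function name `build_few_shot_prompt` and the `Input:`/`Output:` layout are assumptions for illustration, and real prompt formats vary by model and provider.

```python
def build_few_shot_prompt(examples, query, instruction=""):
    """Assemble a few-shot prompt from (input, output) example pairs."""
    parts = []
    if instruction:
        parts.append(instruction)
    # Each worked example shows the model the expected input/output pattern.
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # The final block leaves the output empty for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


prompt = build_few_shot_prompt(
    examples=[("great product", "positive"), ("arrived broken", "negative")],
    query="works as described",
    instruction="Classify the sentiment of each input.",
)
```

The model is not retrained here; the examples only live in the prompt for that one request.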

MMLU

MMLU (Massive Multitask Language Understanding) is a benchmark that tests language models across 57 academic subjects, from STEM to humanities.

MMLU-Pro

MMLU-Pro is a harder, more rigorous version of MMLU with ten answer choices and improved question quality to better differentiate frontier models.

HellaSwag

HellaSwag is a benchmark that tests common-sense reasoning by asking models to choose the most plausible continuation of a scenario.

ARC Challenge

ARC Challenge is a benchmark of grade-school science questions that require reasoning beyond simple retrieval to answer correctly.

WinoGrande

WinoGrande is a large-scale benchmark testing common-sense reasoning through pronoun resolution in carefully crafted sentence pairs.

TruthfulQA

TruthfulQA is a benchmark that measures whether language models generate truthful answers rather than reproducing common misconceptions.

GSM8K

GSM8K is a benchmark of 8,500 grade-school math word problems that test multi-step arithmetic reasoning in language models.

MATH Benchmark

MATH is a benchmark of 12,500 competition-level mathematics problems testing advanced reasoning across algebra, geometry, and number theory.

HumanEval

HumanEval is a benchmark of 164 hand-written Python programming problems that test code generation ability in language models.

MBPP

MBPP (Mostly Basic Python Programs) is a benchmark of 974 crowd-sourced Python programming tasks testing fundamental code generation.

MT-Bench

MT-Bench is a benchmark that evaluates multi-turn conversation ability using GPT-4 as an automated judge across eight categories.

Chatbot Arena

Chatbot Arena is a crowdsourced platform where users compare anonymous LLM responses side-by-side, producing Elo-based rankings.

Elo Rating

Elo rating is a scoring system adapted from chess that ranks language models based on pairwise comparison outcomes in evaluation arenas.
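
To make the pairwise update concrete, here is a minimal sketch of the classic chess-style Elo formula applied to two models. The K-factor of 32 and the 400-point scale are conventional chess defaults used here as assumptions; the function names are hypothetical, and evaluation arenas may use different constants or a different fitting method altogether.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return new ratings after one comparison.

    score_a is 1.0 if A's response was preferred, 0.0 if B's was,
    and 0.5 for a tie. k (assumed 32 here) controls the step size.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = expected_score(rating_b, rating_a)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - exp_b)
    return new_a, new_b


# Two equally rated models; A's response is preferred.
print(update_elo(1000, 1000, 1.0))  # (1016.0, 984.0)
```

An upset (a low-rated model beating a high-rated one) moves both ratings further than an expected result does, which is what lets rankings converge from many noisy comparisons.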

AlpacaEval

AlpacaEval is an automated evaluation benchmark that uses LLMs to judge model responses against a reference model on 805 instructions.

LMSYS

LMSYS is a research organization that created Chatbot Arena and maintains its widely referenced LLM leaderboard.

IFEval

IFEval is a benchmark that measures how well language models follow specific formatting and constraint instructions in their responses.

BBH

BBH (BIG-Bench Hard) is a curated subset of 23 challenging BIG-Bench tasks on which earlier language models scored below the average human rater.

BIG-Bench

BIG-Bench is a collaborative benchmark with over 200 diverse tasks designed to probe the capabilities and limitations of language models.

GPQA

GPQA (Graduate-Level Google-Proof QA) is a benchmark of expert-written questions that skilled non-experts struggle to answer even with unrestricted internet access.

SuperGLUE

SuperGLUE is a benchmark suite of eight difficult language understanding tasks that succeeded GLUE as the standard NLU evaluation.

GLUE

GLUE (General Language Understanding Evaluation) is a benchmark suite of nine NLU tasks that became the first standard for evaluating language models.

SQuAD

SQuAD (Stanford Question Answering Dataset) is a reading comprehension benchmark where models extract answers from Wikipedia passages.

Natural Questions

Natural Questions is a QA benchmark using real Google search queries paired with Wikipedia articles, testing realistic information seeking.

TriviaQA

TriviaQA is a QA benchmark of trivia questions with evidence documents, testing both factual knowledge and reading comprehension.

DROP

DROP is a reading comprehension benchmark requiring discrete reasoning operations like counting, sorting, and arithmetic over text passages.

LAMBADA

LAMBADA is a benchmark testing word prediction where the last word of a passage can only be guessed with broad context understanding.

BoolQ

BoolQ is a yes/no question answering benchmark using naturally occurring questions paired with Wikipedia passages.

CommonsenseQA

CommonsenseQA is a benchmark of multiple-choice questions that require everyday common-sense knowledge to answer correctly.

Arena Hard

Arena Hard is an automated benchmark of 500 challenging prompts derived from Chatbot Arena, designed to approximate human preference rankings.

LiveBench

LiveBench is a continuously updated benchmark that uses fresh questions to limit contamination, so models are less likely to have memorized the answers.

Evaluation Harness

An evaluation harness is a standardized framework for running benchmarks on language models with consistent settings and scoring.

Contamination

Contamination occurs when benchmark test data leaks into model training data, inflating evaluation scores beyond genuine capability.
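
A common first check for this kind of leak is to look for long word sequences that a benchmark item shares with training text. The sketch below measures n-gram overlap on whitespace tokens; the function names, the default window of 8 words, and the simple lowercase tokenization are all assumptions for illustration, and real contamination audits typically use more careful normalization and much larger corpora.

```python
def ngrams(text, n):
    """Return the set of n-word sequences in lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(test_item, training_text, n=8):
    """Fraction of the test item's n-grams that also appear in the training text."""
    item_grams = ngrams(test_item, n)
    if not item_grams:
        # Item shorter than n words: nothing to compare.
        return 0.0
    train_grams = ngrams(training_text, n)
    return len(item_grams & train_grams) / len(item_grams)


ratio = overlap_ratio(
    "the cat sat on the mat",
    "yesterday the cat sat on the mat quietly",
    n=3,
)
print(ratio)  # 1.0, every 3-gram of the test item appears in the training text
```

A ratio near 1 suggests the item appears nearly verbatim in the training text; a ratio near 0 does not rule out paraphrased leakage.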

Leakage

Leakage is the unintended exposure of test or evaluation data to a model during training, compromising the validity of results.

Ceiling Effect

A ceiling effect occurs when a benchmark becomes too easy for top models, losing its ability to differentiate between them.

Saturation

Saturation describes when model performance on a benchmark plateaus near the maximum, reducing the benchmark's evaluative usefulness.

Human Baseline

A human baseline is the performance level achieved by human evaluators on a benchmark, used as a reference point for model comparison.

Inter-Annotator Agreement

Inter-annotator agreement measures how consistently multiple human evaluators rate or label the same AI outputs.
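
One widely used statistic for two annotators is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a minimal version for categorical labels; the function name and the convention of returning 1.0 when both annotators use a single identical label throughout are assumptions for this example, and other statistics (such as Fleiss' kappa for more than two annotators) are not covered.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label for everything (assumed convention).
        return 1.0
    return (observed - expected) / (1.0 - expected)


kappa = cohens_kappa(
    ["good", "good", "bad", "bad"],
    ["good", "bad", "bad", "bad"],
)
print(kappa)  # 0.5
```

In this example the annotators agree on 3 of 4 items (0.75), chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.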

Automatic Evaluation

Automatic evaluation uses algorithms or AI judges to assess language model outputs without requiring human annotators.

Human Evaluation

Human evaluation uses human judges to assess language model outputs for quality, accuracy, helpfulness, and safety.

Preference Evaluation

Preference evaluation compares model outputs by asking judges to select the preferred response from two or more options.

Turn owned content into answers

Use InsertChat to launch a branded assistant visitors can ask directly.

7-day free trial · No card required

Interactive FAQ

Product FAQ

What is InsertChat?

InsertChat is a white-label AI assistant for your website. Train it, brand it, publish it, and learn from visitor questions.

How does InsertChat use my website content?

Connect approved pages, docs, videos, FAQs, policies, and other sources. InsertChat turns them into source-backed answers and next steps.

Can I control the assistant's tone and sources?

Yes. Choose its sources, tone, welcome message, and prompts so it stays on brand.

How does InsertChat stay accurate?

Answers use approved content and source links. Analytics show unclear or missing answers so you can improve coverage.

Can it collect leads or route support questions?

Yes. InsertChat can collect details, qualify intent, add context, and send chats to the right inbox, CRM, workflow, or person.

Can I control how the assistant behaves?

Yes. Control prompts, model choice, tool access, and the branded assistant experience so behavior stays consistent.

Which AI models can I use?

InsertChat supports multiple model providers. Choose each assistant's model for quality, speed, and cost, or use BYOK.

Can I pick different models for different workflows?

Yes. Use a faster model for common questions and a stronger model for complex reasoning. InsertChat supports that balance per conversation.

Where can I deploy an assistant?

Use a widget, embed, full-page assistant, custom domain, in-app embed, or API. Reuse one setup across surfaces.

Do I need coding skills?

No. Build and deploy AI assistants using our visual builder. The embed code is one line of JavaScript.

Can I customize the branding and UI?

Yes. Customize the assistant name, logo, colors, welcome message, suggested prompts, tone, domain, and white-label presentation.

Can I use my own domain?

Yes. Custom domains are supported, typically via enterprise options.

Does InsertChat support voice?

Yes. Voice dictation and text-to-speech let users speak instead of type.

Does InsertChat support vision?

Yes. Enable vision for assistants when images help clarify a request or context.

What tools and integrations are supported?

Zendesk, HubSpot, Shopify, WooCommerce, calendar booking, web search, Perplexity, and webhooks for your own systems.

Can I control which tools the assistant is allowed to use?

Yes. Tool access is controlled per assistant so you enable only what you need.

Can the assistant hand off to a human?

Yes. Configure human handoff so the assistant escalates when needed. Full conversation history is passed along.

Do you provide analytics?

Yes. Track chats, leads, feedback, top questions, unanswered questions, most-used sources, and content gaps.

Is it mobile friendly?

Yes. The widget and embeds work well on desktop and mobile with no separate experience needed.

What's the fastest path to a successful deployment?

Start with one assistant and a small set of high-value sources. Iterate using real questions from analytics.

What is the fastest way to get started?

Create an account. Connect one key source. Ask a test question, brand the assistant, then publish it on one page.