Glossary

AI glossary for content assistants

Plain-English definitions of 13,917 AI terms for branded assistant teams.

AI Image Tagging

AI image tagging automatically assigns descriptive labels, keywords, and metadata to images, enabling efficient organization, search, and content moderation at scale.

Text-Guided Image Editing

Text-guided image editing uses natural language instructions to modify existing images, changing specific attributes, adding elements, or transforming content based on text prompts.

AI Microscopy

AI microscopy applies deep learning to automate image analysis of biological and material samples captured by optical, electron, and fluorescence microscopes.

Pathology AI

Pathology AI uses deep learning to analyze whole-slide digital pathology images for cancer detection, grading, biomarker quantification, and prognosis prediction.

Visual Dialog

Visual dialog is a task in which an AI system holds a multi-turn conversation about an image, answering follow-up questions that require tracking conversation history and image context together.

3D Semantic Segmentation

3D semantic segmentation assigns a semantic class label (such as person, car, or building) to every point in a 3D point cloud or every voxel in a voxel grid, enabling spatial understanding of 3D scenes.

Retail Computer Vision

Retail computer vision applies AI image analysis to physical stores for shelf monitoring, checkout automation, customer behavior analytics, and loss prevention.

Video Emotion Recognition

Video emotion recognition analyzes facial expressions, body language, and vocal cues across video frames to identify emotional states, sentiment, and engagement levels.

Speech Recognition

Speech recognition is the AI technology that converts spoken language into text, enabling machines to understand and process human speech.

Automatic Speech Recognition

Automatic Speech Recognition (ASR) is the computational process of converting audio speech signals into text transcriptions using machine learning models.

ASR

ASR is the abbreviation for Automatic Speech Recognition, the technology that converts spoken audio into written text using AI models.

STT

STT stands for Speech-to-Text, the technology and services that convert spoken audio into written text transcriptions.

Speaker Recognition

Speaker recognition identifies or verifies a person's identity based on their voice characteristics, distinguishing who is speaking rather than what they are saying.

Speaker Diarization

Speaker diarization segments audio into speaker-homogeneous regions, determining who spoke when in a multi-speaker recording.

Voice Activity Detection

Voice Activity Detection (VAD) identifies segments of audio that contain human speech versus silence, noise, or music, serving as a preprocessing step for speech systems.
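
As a rough illustration of the idea, the sketch below flags frames whose energy exceeds a fixed threshold. This is a deliberate simplification: energy alone cannot tell speech from loud noise or music, and production VAD systems typically use trained models. The function names, the 160-sample frame length, and the 0.1 threshold are assumptions chosen for this example only.

```python
import math

def frame_rms(samples, frame_len):
    """Split samples into non-overlapping frames and return the RMS energy of each."""
    frames = [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    return [math.sqrt(sum(x * x for x in f) / frame_len) for f in frames]

def energy_vad(samples, frame_len=160, threshold=0.1):
    """Return one boolean per frame: True where RMS energy exceeds the threshold."""
    return [rms > threshold for rms in frame_rms(samples, frame_len)]

# Synthetic signal: two silent frames, two frames of a 440 Hz tone
# (sampled at 16 kHz, standing in for speech), then one silent frame.
quiet = [0.0] * 160
loud = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]
signal = quiet * 2 + loud + quiet

flags = energy_vad(signal)
print(flags)
```

A downstream speech system would then process only the frames flagged True, which is what makes VAD useful as a preprocessing step.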

Real-time Transcription

Real-time transcription converts speech to text as it is spoken, producing live text output with minimal delay for applications like live captioning and voice assistants.

Keyword Spotting

Keyword spotting detects specific words or phrases in an audio stream without performing full speech recognition, used for triggers, commands, and monitoring.

Wake Word Detection

Wake word detection listens continuously for a specific trigger phrase like 'Hey Siri' or 'Alexa' to activate a voice assistant, running efficiently on-device.

Whisper

Whisper is OpenAI's open-source speech recognition model that supports 99 languages, automatic language detection, translation, and timestamp generation.

Deepgram

Deepgram is a speech AI platform providing fast, accurate speech-to-text, text-to-speech, and audio intelligence APIs optimized for real-time and enterprise applications.

AssemblyAI

AssemblyAI is a speech AI platform offering transcription, speaker diarization, content moderation, and audio intelligence through developer-friendly APIs.

Google Speech-to-Text

Google Speech-to-Text is Google Cloud's speech recognition service supporting 125+ languages with real-time streaming, batch processing, and custom model adaptation.

Wav2Vec 2.0

Wav2Vec 2.0 is a self-supervised speech representation model from Meta that learns from unlabeled audio, enabling speech recognition with very little labeled training data.

TTS

TTS stands for Text-to-Speech, the technology that converts written text into spoken audio using AI voice synthesis.

Speech Synthesis

Speech synthesis is the artificial production of human speech, encompassing TTS systems, voice generation, and the creation of spoken audio from various input formats.

Voice Cloning

Voice cloning creates a synthetic replica of a specific person's voice using AI, enabling generation of speech in that person's voice from any text input.

Voice Conversion

Voice conversion transforms the voice characteristics of spoken audio from one speaker to sound like another speaker while preserving the linguistic content.

Neural TTS

Neural TTS uses deep learning models to generate highly natural synthetic speech, replacing older concatenative and parametric approaches with end-to-end learned systems.

ElevenLabs

ElevenLabs is an AI voice technology company offering high-quality text-to-speech, voice cloning, and audio generation through APIs and consumer products.

Amazon Polly

Amazon Polly is AWS's text-to-speech service offering dozens of voices across 30+ languages with Neural TTS technology and SSML control for enterprise applications.

Bark

Bark is an open-source text-to-audio model from Suno that generates highly expressive speech with laughter, breathing, music, and sound effects alongside spoken words.

VALL-E

VALL-E is a neural codec language model from Microsoft that generates speech from text using just 3 seconds of reference audio for voice cloning.

XTTS

XTTS is an open-source multilingual text-to-speech model from Coqui AI that supports voice cloning and 17 languages with a single model.

Voice Assistant

A voice assistant is an AI system that understands spoken commands and responds with voice, combining speech recognition, language understanding, and text-to-speech.

Conversational IVR

Conversational IVR replaces traditional phone menu trees with natural language voice interaction, allowing callers to state their needs in natural speech.

Voice User Interface

A Voice User Interface (VUI) is a speech-based interface that allows users to interact with devices and applications through spoken commands and natural conversation.

Voice Commerce

Voice commerce enables purchasing products and services through voice-activated devices and assistants, allowing hands-free shopping and transactions.

Voice Analytics

Voice analytics uses AI to extract insights from voice conversations, analyzing speech patterns, sentiment, keywords, and conversational dynamics.

Call Transcription

Call transcription converts phone call audio into text transcripts, typically including speaker separation, timestamps, and additional analysis like sentiment and topics.

Call Summarization

Call summarization uses AI to generate concise summaries of phone conversations, capturing key topics, action items, decisions, and customer sentiment.

Sentiment from Voice

Sentiment from voice detects emotional states and attitudes directly from speech audio, analyzing tone, pitch, pace, and energy beyond just the words spoken.

Audio Classification

Audio classification identifies the type of sound in audio recordings, categorizing them as speech, music, noise, environmental sounds, or specific events.

Sound Event Detection

Sound event detection identifies and locates specific sounds within audio recordings over time, determining what sounds occurred and when they happened.

Noise Reduction

AI noise reduction removes unwanted background noise from audio recordings using deep learning, preserving speech clarity while eliminating distractions.

Audio Enhancement

Audio enhancement uses AI to improve overall audio quality by reducing noise, removing reverb, equalizing levels, and restoring clarity in degraded recordings.

Audio Fingerprinting

Audio fingerprinting creates a compact digital signature of an audio recording that can identify the content even from short, noisy clips.

Spectrogram

A spectrogram is a visual representation of audio showing how the energy at each frequency changes over time, used as a common input format for many speech and audio AI models.
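
One common way to compute a spectrogram is the short-time Fourier transform: slice the signal into overlapping frames, window each frame, and take the magnitude of its Fourier transform. The sketch below does this with a naive DFT from the standard library so it stays self-contained. The function names, the 16-sample frame, and the 8-sample hop are illustrative assumptions, and real pipelines would use an FFT library instead.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame (naive O(n^2))."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(samples, frame_len=16, hop=8):
    """Return a list of rows: each row is one time frame, each column one frequency bin."""
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # A Hann window reduces leakage between neighboring frequency bins.
        windowed = [
            x * (0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len))
            for t, x in enumerate(frame)
        ]
        rows.append(dft_magnitudes(windowed))
    return rows

# A tone that completes 4 cycles per 16-sample frame should peak in bin 4.
tone = [math.sin(2 * math.pi * 4 * n / 16) for n in range(64)]
spec = spectrogram(tone)
peak_bins = [row.index(max(row)) for row in spec]
print(len(spec), len(spec[0]), peak_bins)
```

Plotting the rows as columns of an image, with brightness for magnitude, gives the familiar time-versus-frequency picture.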

Mel Spectrogram

A mel spectrogram is a spectrogram whose frequency axis is mapped to the mel scale, which approximates human pitch perception, and is a common input for speech AI models.
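
To make the mel mapping concrete, the sketch below uses one widely used conversion formula, m = 2595 * log10(1 + f / 700). Other variants of the mel scale exist, so treat this as one convention and not the only definition. The helper names and the choice of 10 bands over 0 to 8000 Hz are assumptions for illustration.

```python
import math

def hz_to_mel(hz):
    """Convert a frequency in Hz to mels using the 2595 * log10(1 + f/700) formula."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_centers(f_min, f_max, n_bands):
    """Center frequencies in Hz of bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]

centers = mel_band_centers(0.0, 8000.0, 10)
gaps = [b - a for a, b in zip(centers, centers[1:])]
print([round(c) for c in centers])
```

The band centers are evenly spaced in mels but increasingly far apart in Hz, so a mel spectrogram gives finer resolution to low frequencies than to high ones.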

Product FAQ

What is InsertChat?

InsertChat is a white-label AI assistant for your website. Train it, brand it, publish it, and learn from visitor questions.

How does InsertChat use my website content?

Connect approved pages, docs, videos, FAQs, policies, and other sources. InsertChat turns them into source-backed answers and next steps.

Can I control the assistant's tone and sources?

Yes. Choose its sources, tone, welcome message, and prompts so it stays on brand.

How does InsertChat stay accurate?

Answers use approved content and source links. Analytics show unclear or missing answers so you can improve coverage.

Can it collect leads or route support questions?

Yes. InsertChat can collect details, qualify intent, add context, and send chats to the right inbox, CRM, workflow, or person.

Can I control how the assistant behaves?

Yes. Control prompts, model choice, tool access, and the branded assistant experience so behavior stays consistent.

Which AI models can I use?

InsertChat supports multiple model providers. Choose each assistant's model for quality, speed, and cost, or use BYOK.

Can I pick different models for different workflows?

Yes. Use a faster model for common questions and a stronger model for complex reasoning. InsertChat supports that balance per conversation.

Where can I deploy an assistant?

Use a widget, embed, full-page assistant, custom domain, in-app embed, or API. Reuse one setup across surfaces.

Do I need coding skills?

No. Build and deploy AI assistants using our visual builder. The embed code is one line of JavaScript.

Can I customize the branding and UI?

Yes. Customize the assistant name, logo, colors, welcome message, suggested prompts, tone, domain, and white-label presentation.

Can I use my own domain?

Yes. Custom domains are supported, typically via enterprise options.

Does InsertChat support voice?

Yes. Voice dictation and text-to-speech let users speak instead of type.

Does InsertChat support vision?

Yes. Enable vision for assistants when images help clarify a request or context.

What tools and integrations are supported?

Zendesk, HubSpot, Shopify, WooCommerce, calendar booking, web search, Perplexity, and webhooks for your own systems.

Can I control which tools the assistant is allowed to use?

Yes. Tool access is controlled per assistant so you enable only what you need.

Can the agent hand off to a human?

Yes. Configure human handoff so the assistant escalates when needed. Full conversation history is passed along.

Do you provide analytics?

Yes. Track chats, leads, feedback, top questions, unanswered questions, most-used sources, and content gaps.

Is it mobile friendly?

Yes. The widget and embeds work well on desktop and mobile with no separate experience needed.

What's the fastest path to a successful deployment?

Start with one assistant and a small set of high-value sources. Iterate using real questions from analytics.

What is the fastest way to get started?

Create an account. Connect one key source. Ask a test question, brand the assistant, then publish it on one page.