Glossary

AI glossary for content assistants

Plain-English definitions of 13,917 AI terms for branded assistant teams.

MFCC

MFCCs (Mel-Frequency Cepstral Coefficients) are compact audio features derived from mel spectrograms that capture the spectral shape of speech, widely used in traditional speech processing.
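As a rough illustration of the final steps of an MFCC pipeline, the sketch below converts between hertz and the mel scale and applies a DCT-II to one frame of log mel energies. The function names, the 2595/700 mel formula (one common variant), the unnormalized DCT, and the sample numbers are assumptions chosen for illustration; real libraries differ in windowing, filterbank shape, and normalization.

```python
import math

def hz_to_mel(hz: float) -> float:
    """Convert a frequency in Hz to mels (one common formula variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def cepstral_coefficients(log_mel_energies: list[float], n_coeffs: int) -> list[float]:
    """Apply an unnormalized DCT-II to one frame of log mel energies."""
    n = len(log_mel_energies)
    return [
        sum(e * math.cos(math.pi * k * (i + 0.5) / n)
            for i, e in enumerate(log_mel_energies))
        for k in range(n_coeffs)
    ]

# Hypothetical frame: log energies from 6 mel bands (made-up numbers).
frame = [2.0, 1.5, 1.0, 0.8, 0.5, 0.3]
mfcc = cepstral_coefficients(frame, n_coeffs=4)
```

Keeping only the first few coefficients is what makes the representation compact: low-order coefficients describe the broad spectral envelope, while higher-order ones carry finer detail.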

Voice Recognition

Voice recognition identifies who is speaking by analyzing unique vocal characteristics. The term is often used interchangeably with speaker recognition.

Speaker Identification

Speaker identification determines which person from a known set of speakers is speaking in an audio recording.

Speaker Verification

Speaker verification confirms whether a speaker is who they claim to be by comparing their voice against a stored voiceprint.
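One common way to compare a voice against a stored voiceprint is cosine similarity between fixed-length embedding vectors. The sketch below assumes the embeddings have already been produced by some speaker model; the vectors, function names, and the 0.7 threshold are hypothetical and would need tuning on real data.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def verify_speaker(embedding: list[float], voiceprint: list[float],
                   threshold: float = 0.7) -> bool:
    """Accept the identity claim if similarity reaches the (assumed) threshold."""
    return cosine_similarity(embedding, voiceprint) >= threshold

# Made-up 3-dimensional embeddings; real systems use hundreds of dimensions.
enrolled = [0.9, 0.1, 0.4]
same_speaker = [0.85, 0.15, 0.42]
other_speaker = [-0.2, 0.9, 0.1]
```

Raising the threshold reduces false accepts at the cost of more false rejects, so the value is usually chosen from an evaluation set rather than fixed in advance.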

Voiceprint

A voiceprint is a mathematical representation of the unique characteristics of a person's voice used for identification or verification.

Batch Transcription

Batch transcription processes pre-recorded audio files asynchronously, converting them to text without real-time constraints.

Live Captioning

Live captioning generates real-time text captions from spoken audio during live events, meetings, or broadcasts.

Subtitle Generation

Subtitle generation automatically creates timed text overlays for video content using speech recognition and timing algorithms.

Word-Level Timestamp

Word-level timestamps assign precise start and end times to each individual word in a transcription, enabling exact audio-text alignment.
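One use of word-level timestamps is building subtitle cues. The sketch below groups hypothetical (text, start, end) word tuples into SRT-style cues; the tuple layout, the fixed four-words-per-cue rule, and the sample words are assumptions for illustration, not the output format of any particular speech recognizer.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def words_to_srt(words: list[tuple[str, float, float]], max_words: int = 4) -> str:
    """Group (text, start, end) word tuples into numbered SRT cues."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        # A cue spans from its first word's start to its last word's end.
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(word[0] for word in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)

# Made-up word timings in seconds.
words = [("Welcome", 0.00, 0.42), ("to", 0.42, 0.55), ("the", 0.55, 0.68),
         ("demo", 0.68, 1.10), ("everyone", 1.25, 1.90)]
srt = words_to_srt(words)
```

A production subtitler would more likely break cues on pauses, punctuation, or a character limit rather than a fixed word count.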

Endpoint Detection

Endpoint detection identifies the start and end of speech utterances in an audio stream, determining when a speaker begins and stops talking.
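A minimal energy-based sketch of the idea follows. It assumes per-frame energies have already been computed, and the threshold and hangover values are made-up defaults; practical endpointers often use trained voice activity models instead of a fixed energy threshold.

```python
def detect_endpoints(energies: list[float], threshold: float = 0.1,
                     hangover: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) frame index pairs, with end exclusive.

    An utterance starts at the first frame whose energy reaches the
    threshold and ends once `hangover` consecutive frames fall below it.
    """
    segments = []
    start = None      # frame index where the current utterance began
    silent_run = 0    # consecutive low-energy frames inside an utterance
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= hangover:
                # End just after the last frame that had speech energy.
                segments.append((start, i - silent_run + 1))
                start, silent_run = None, 0
    if start is not None:  # audio ended while an utterance was still open
        segments.append((start, len(energies) - silent_run))
    return segments

# Made-up frame energies: two utterances separated by a pause.
energies = [0.0, 0.02, 0.5, 0.7, 0.05, 0.6, 0.01, 0.0, 0.0, 0.4, 0.3]
segments = detect_endpoints(energies)
```

The hangover keeps a single quiet frame (index 4 in the sample) from splitting one utterance in two, which is the usual trade-off between responsiveness and cutting speakers off mid-sentence.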

Hotword Detection

Hotword detection, also known as wake word detection, continuously listens for a specific trigger phrase that activates a voice system.

Voice Command

Voice commands are spoken instructions that trigger specific actions in a device or application, enabling hands-free control.

Voice Search

Voice search allows users to perform search queries by speaking instead of typing, using speech recognition to convert spoken queries to text.

Dictation

Dictation converts continuous speech into formatted written text, enabling hands-free document creation and text input.

Whisper Model

Whisper is an open-source speech recognition model from OpenAI trained on 680,000 hours of multilingual audio data.

Distil-Whisper

Distil-Whisper is a distilled version of OpenAI Whisper that is reported to run about 6x faster while staying within roughly 1% word error rate of the original model on out-of-distribution test sets.

Faster Whisper

Faster Whisper is a reimplementation of OpenAI Whisper using CTranslate2 that delivers up to 4x faster inference with lower memory usage.

Wav2Vec

Wav2Vec is a self-supervised speech representation model from Meta that learns powerful audio features from unlabeled speech data.

HuBERT

HuBERT is a self-supervised speech representation model that learns acoustic units through an offline clustering and prediction approach.

Conformer ASR

Conformer is a speech recognition architecture that combines convolution and transformer layers to capture both local and global audio patterns.

Transducer

A transducer is a sequence-to-sequence model architecture for speech recognition that jointly models acoustic and language information for streaming ASR.

CTC Decoding

CTC (Connectionist Temporal Classification) is a training and decoding technique for speech recognition that handles variable-length alignment between audio and text.
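The simplest CTC decoding rule, often called greedy or best-path decoding, can be sketched as follows. It assumes the most likely label for each audio frame has already been chosen, and it uses "_" as a hypothetical blank symbol; beam search decoding and the CTC training loss are not shown.

```python
from itertools import groupby

def ctc_greedy_decode(frame_labels: list[str], blank: str = "_") -> str:
    """Collapse consecutive repeated labels, then drop blank symbols."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in collapsed if label != blank)

# Twelve frames of per-frame best labels for a five-letter word.
frames = list("hh_e_ll_lloo")
text = ctc_greedy_decode(frames)  # "hello"
```

The blank between the two runs of "l" is what lets the output keep a genuine double letter, which is how CTC maps many audio frames onto a shorter text sequence without a frame-by-frame alignment.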

Streaming ASR

Streaming ASR processes audio in real time, producing transcription results incrementally as speech is received rather than waiting for the complete utterance.

Hybrid ASR

Hybrid ASR combines multiple recognition approaches or models to achieve higher accuracy than any single system alone.

Multi-Speaker TTS

Multi-speaker TTS generates speech in multiple distinct voices from a single model, supporting voice selection at inference time.

Zero-Shot TTS

Zero-shot TTS generates speech in a new voice from just a few seconds of reference audio, without any fine-tuning or training on that voice.

Expressive TTS

Expressive TTS generates speech with natural emotion, emphasis, and intonation, going beyond monotone synthesis to convey meaning and feeling.

Emotional TTS

Emotional TTS explicitly controls the emotional tone of synthesized speech, generating audio that conveys specific emotions like happiness, sadness, or anger.

Prosody Control

Prosody control allows fine-grained manipulation of speech rhythm, intonation, stress, and timing in text-to-speech systems.

Pitch Control

Pitch control adjusts the fundamental frequency of synthesized speech, allowing modification of how high or low the voice sounds.

Speaking Rate

Speaking rate controls how fast or slow synthesized speech is delivered, measured in words per minute or as a relative speed factor.

Naturalness

Naturalness measures how human-like and natural synthesized speech sounds, often evaluated through Mean Opinion Score listening tests.

Mean Opinion Score

Mean Opinion Score (MOS) is a standardized subjective quality measure where human listeners rate speech on a 1-5 scale.
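As a small worked example, MOS is the arithmetic mean of the listener ratings, usually reported with a confidence interval. The sketch below uses a normal approximation for the 95% interval; the function name and the sample ratings are made up for illustration.

```python
import math
import statistics

def mean_opinion_score(ratings: list[int]) -> tuple[float, float]:
    """Return (mean, 95% confidence half-width) for ratings on a 1-5 scale."""
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    mean = statistics.mean(ratings)
    if len(ratings) < 2:
        return mean, 0.0  # no spread can be estimated from one rating
    std_err = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, 1.96 * std_err  # normal approximation to the 95% interval

# Hypothetical ratings from eight listeners for one synthesized clip.
ratings = [4, 5, 3, 4, 4, 5, 4, 3]
mos, half_width = mean_opinion_score(ratings)
```

With only eight listeners the interval is wide (about half a point either way here), which is why listening tests typically collect many ratings per system before comparing scores.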

ElevenLabs TTS

ElevenLabs is a leading AI voice platform offering high-quality text-to-speech, voice cloning, and voice design capabilities.

Google TTS

Google Text-to-Speech is a cloud-based speech synthesis service offering neural voices across 50+ languages as part of Google Cloud.

Azure Speech

Azure Speech is Microsoft's cloud speech service providing text-to-speech, speech-to-text, speech translation, and custom voice capabilities.

Bark TTS

Bark is an open-source transformer-based text-to-audio model by Suno that generates speech, music, and sound effects from text prompts.

Tortoise TTS

Tortoise TTS is an open-source multi-voice text-to-speech system known for producing extremely high-quality speech at slow generation speeds.

Coqui TTS

Coqui TTS is an open-source text-to-speech toolkit offering multiple TTS architectures and pre-trained models for research and production use.

Piper TTS

Piper is a fast, lightweight open-source TTS system designed for edge devices and offline use, supporting over 30 languages.

StyleTTS

StyleTTS is a style-based speech synthesis approach. Its successor, StyleTTS 2, uses style diffusion to generate highly natural speech by modeling style as a latent random variable.

OpenVoice

OpenVoice is an open-source instant voice cloning model that separates voice style from language content for flexible cross-lingual cloning.

Fish Speech

Fish Speech is an open-source multilingual text-to-speech model supporting voice cloning and real-time synthesis across multiple languages.

Voicebot

A voicebot is an AI-powered conversational agent that communicates with users through voice, handling phone calls and voice interactions autonomously.

Intelligent IVR

Intelligent IVR uses AI and natural language understanding to create dynamic, conversational phone menu systems that understand caller intent.

Voice Search Optimization

Voice search optimization adapts content and SEO strategies for voice-based search queries, which tend to be conversational and question-based.

Call Analytics

Call analytics uses AI to extract insights from phone conversations, analyzing content, sentiment, compliance, and performance metrics.

Speech Analytics

Speech analytics analyzes spoken interactions to extract patterns, trends, and insights from voice data across an organization.

Product FAQ

What is InsertChat?

InsertChat is a white-label AI assistant for your website. Train it, brand it, publish it, and learn from visitor questions.

How does InsertChat use my website content?

Connect approved pages, docs, videos, FAQs, policies, and other sources. InsertChat turns them into source-backed answers and next steps.

Can I control the assistant's tone and sources?

Yes. Choose its sources, tone, welcome message, and prompts so it stays on brand.

How does InsertChat stay accurate?

Answers use approved content and source links. Analytics show unclear or missing answers so you can improve coverage.

Can it collect leads or route support questions?

Yes. InsertChat can collect details, qualify intent, add context, and send chats to the right inbox, CRM, workflow, or person.

Can I control how the assistant behaves?

Yes. Control prompts, model choice, tool access, and the branded assistant experience so behavior stays consistent.

Which AI models can I use?

InsertChat supports multiple model providers. Choose each assistant's model for quality, speed, and cost, or use BYOK.

Can I pick different models for different workflows?

Yes. Use a faster model for common questions and a stronger model for complex reasoning. InsertChat supports that balance per conversation.

Where can I deploy an assistant?

Use a widget, embed, full-page assistant, custom domain, in-app embed, or API. Reuse one setup across surfaces.

Do I need coding skills?

No. Build and deploy AI assistants using our visual builder. The embed code is one line of JavaScript.

Can I customize the branding and UI?

Yes. Customize the assistant name, logo, colors, welcome message, suggested prompts, tone, domain, and white-label presentation.

Can I use my own domain?

Yes. Custom domains are supported, typically via enterprise options.

Does InsertChat support voice?

Yes. Voice dictation and text-to-speech let users speak instead of type.

Does InsertChat support vision?

Yes. Enable vision for assistants when images help clarify a request or context.

What tools and integrations are supported?

Zendesk, HubSpot, Shopify, WooCommerce, calendar booking, web search, Perplexity, and webhooks for your own systems.

Can I control which tools the assistant is allowed to use?

Yes. Tool access is controlled per assistant so you enable only what you need.

Can the agent hand off to a human?

Yes. Configure human handoff so the agent escalates when needed. Full conversation history is passed along.

Do you provide analytics?

Yes. Track chats, leads, feedback, top questions, unanswered questions, most-used sources, and content gaps.

Is it mobile friendly?

Yes. The widget and embeds work well on desktop and mobile with no separate experience needed.

What's the fastest path to a successful deployment?

Start with one assistant and a small set of high-value sources. Iterate using real questions from analytics.

What is the fastest way to get started?

Create an account. Connect one key source. Ask a test question, brand the assistant, then publish it on one page.