XTTS

🧠 XTTS in SkyrimNet: the Default-Quality TTS

XTTS (Cross-lingual Text-to-Speech) is a powerful, deep-learning-based TTS engine that brings realistic, emotionally expressive, and cloneable voices to Skyrim. Unlike simpler TTS engines, XTTS can replicate a specific voice from a short audio clip, making it ideal for immersive, character-specific dialogue in modded Skyrim.

In SkyrimNet, XTTS is used via a local HTTP endpoint, making it easy to integrate and fast enough for real-time use.
It’s currently considered the default voice generation system in SkyrimNet, valued especially for its voice cloning and emotional fidelity.


🎙️ What XTTS Does

XTTS converts any input text into high-quality, expressive speech, optionally mimicking a specific voice using a voice reference sample.

Input:
Text: "You're not from around here, are you?"
Voice sample: 30-second clip of a female Nord NPC

Output:
High-fidelity audio of that line, spoken in the same voice and tone as the sample

XTTS produces rich, natural speech, with subtle pauses, intonation, and personality, which suits Skyrim’s varied characters.


🌐 How XTTS Works in SkyrimNet​

Unlike Piper, XTTS is not currently embedded in SkyrimNet. Instead, it runs as a separate local TTS service, typically at:

http://localhost:8020

Here’s how SkyrimNet uses it:

  1. SkyrimNet sends a request to the XTTS server with:

    • The text to speak
    • Optional voice reference audio
    • Optional speaker ID or emotion hints
  2. XTTS returns a fully rendered WAV or PCM audio clip

  3. SkyrimNet plays the audio in-game, synced with dialogue

This architecture keeps SkyrimNet lightweight while still offering powerful voice features via XTTS.
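
The request flow above can be sketched in Python. This is only an illustration under stated assumptions: the endpoint path `/tts_to_audio/` and the JSON field names `text`, `speaker_wav`, and `language` are placeholders rather than confirmed details of the SkyrimNet or XTTS server API, and the default base URL is simply the one used in the setup steps. Check your server's own documentation before relying on any of them.

```python
import json
import urllib.request

# Hypothetical endpoint path and field names: adjust to your server's real API.
TTS_PATH = "/tts_to_audio/"


def build_tts_request(text, speaker, base_url="http://localhost:8020", language="en"):
    """Build (but do not send) a POST request asking the server to speak `text`."""
    payload = {"text": text, "speaker_wav": speaker, "language": language}
    return urllib.request.Request(
        base_url.rstrip("/") + TTS_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, speaker, out_path, **kwargs):
    """Send the request and save the returned audio bytes to `out_path`."""
    request = build_tts_request(text, speaker, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before a server is running. A call such as `synthesize("You're not from around here, are you?", "adrianne", "line.wav")` would then perform the round trip, assuming the server accepts this request shape.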


🧬 Key Features of XTTS in SkyrimNet​

  • 🎭 Voice Cloning: Easily assign unique voices to NPCs using short reference clips
  • 🌍 Cross-lingual Support: Speak English in a French, Argonian, or Dunmer accent
  • 🧠 Emotion Control (planned): Adjust mood and tone of delivery for immersive reactions
  • ♻️ Reusable Voices: Store and reuse custom voices for followers, companions, or even the player

📦 XTTS vs Piper

Feature | Piper (In-Process) | XTTS (External API)
Speed | ⚡ Very fast | ⚠️ Slower (1–2 s latency)
Voice Quality | ✅ Good | ✅✅ Excellent
Voice Cloning | ❌ Not supported | ✅ Full support
Integration | ✅ Native DLL | 🔌 HTTP endpoint

🚀 Why XTTS is SkyrimNet's Default Quality TTS

  • 🎧 Offers good audio realism
    Natural cadence, clear articulation, and emotional depth, ideal for immersive dialogue.

  • 🔁 Supports voice reuse and identity
    Easily assign consistent voices to NPCs using short reference samples.

  • 🧠 Enables AI-driven dialogue to feel grounded and believable
    Dynamic lines generated by LLMs sound intentional, as if a real voice actor spoke them.

  • 💬 Works with any line, whether typed in or LLM-generated
    Perfect for branching narratives, roleplay mods, and reactive NPC behavior.


🗣️ Setting Up the XTTS Mantella API Server for SkyrimNet

Follow these steps to set up XTTS as your TTS backend:


📦 Step 1: Download and Extract

  1. Download the XTTS Mantella API Server from its Nexus Mods page.

  2. Unzip it to a folder of your choice (avoid system folders like C:\Program Files).

  3. Download the latent speaker folder for the language(s) you plan to use (available on the same Nexus page).

  4. Extract the speaker folder into the same directory as the server.


▶️ Step 2: Start the XTTS Server

  • Launch xtts-api-server-mantella.exe inside the extracted folder.

  • On first launch, it will prompt you to confirm several settings. You can press Enter to accept defaults.

  • Device:

    • Use cuda if you have an NVIDIA GPU
    • Use cpu otherwise
  • Deepspeed:

    • Set to yes only if you have an NVIDIA GPU that supports it (check Nexus description for compatible cards)
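
If you are unsure which device to choose, a small Python check can give a hint. This sketch assumes that finding the `nvidia-smi` tool on your PATH means an NVIDIA driver is installed. That is only a heuristic: it does not confirm that the card works with the server or with Deepspeed.

```python
import shutil


def pick_device(nvidia_smi_path):
    """Return "cuda" when an NVIDIA driver tool was found, otherwise "cpu"."""
    return "cuda" if nvidia_smi_path else "cpu"


if __name__ == "__main__":
    # shutil.which returns None when nvidia-smi is not on PATH.
    print(pick_device(shutil.which("nvidia-smi")))
```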

⚙️ Step 3: Configure SkyrimNet

In the SkyrimNet Web UI:

  1. Go to Test and Easy Setup
  2. Under Text-to-Speech, set:
    • TTS Backend → XTTS
    • TTS Server URL → http://localhost:8020
      (or your XTTS server's IP address if running on a separate machine)

You're now ready to generate voices using XTTS! ✅
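
If SkyrimNet cannot reach the server, it can help to confirm that something is listening at the configured URL. The following is a minimal sketch using only the Python standard library. The helper names `host_and_port` and `is_listening` are made up for this example, and a successful TCP connection only shows that the port is open, not that the XTTS server itself is healthy.

```python
import socket
from urllib.parse import urlparse


def host_and_port(url):
    """Split a server URL into (host, port), defaulting to 80/443 by scheme."""
    parsed = urlparse(url)
    default_port = 443 if parsed.scheme == "https" else 80
    return parsed.hostname, parsed.port or default_port


def is_listening(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_listening("http://localhost:8020")` should return `True` once the server window from Step 2 has finished starting up.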

🔊 For Mantella XTTS Users: Fast & Easy Way to Make a Custom Voice Latent

Want your custom NPC to use a unique voice, or fix a vanilla one that doesn’t quite fit? Here’s how to create a high-quality voice latent (custom voice model) using just a .wav file.


✅ Step-by-Step Guide

🧱 Step 1: (Optional) Get a Clean Voice Sample​

If you already have a clean .wav sample, skip to Step 4.

  1. Download LazyVoiceFinder.
  2. Read the mod description carefully to install requirements.
  3. Extract the tool outside your game or Windows folders.
  4. Download the Patch from the "Update Files" section and overwrite the original files.

🎧 Step 2: Extract a Voice Line​

  1. Launch LazyVoiceFinder.exe.
  2. Select Skyrim from the Game Mode dropdown.
  3. Click File → Open (for vanilla/DLC voices)
    or File → Open from file (for modded voices).
  4. Use the filters to find voice lines by:
    • Plugin
    • Voice type
    • Dialogue content

Example:​

  • Adrianne Avenicci uses FemaleCommander, but you want a version that better reflects her subtle Imperial accent.
  • Use keywords like "I don't claim to be the best blacksmith..." in Dialogue 1.
  • Click the green play button to preview.
  • Right-click the best-sounding line → Copy voice file as WAV Format.
  • Paste the .wav into your XTTS\speakers\en folder.

A clean sample = a better latent!

Tips for a good sample:

  • Clear voice only, no background sounds or music.
  • Natural flow (no long pauses or clipped audio).
  • Length: 7–10 seconds is ideal.
  • Format: Mono, 22050 Hz, 16-bit WAV
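
The format and length tips above can be checked automatically. This is a minimal sketch using Python's built-in `wave` module, which reads uncompressed PCM WAV files only. The function name `check_sample` and the exact thresholds are illustrative choices taken from the tips, not something the XTTS server enforces.

```python
import wave


def check_sample(path, min_seconds=7.0, max_seconds=10.0):
    """Return a list of ways the WAV at `path` differs from the tips above."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
        seconds = wav.getnframes() / rate
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate != 22050:
        problems.append(f"expected 22050 Hz, found {rate} Hz")
    if bits != 16:
        problems.append(f"expected 16-bit, found {bits}-bit")
    if not min_seconds <= seconds <= max_seconds:
        problems.append(f"length {seconds:.1f}s is outside {min_seconds}-{max_seconds}s")
    return problems
```

An empty list means the file matches the recommended format. The check cannot judge background noise or clipped audio, so listening to the sample is still worthwhile.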

🔧 Step 3: Clean Up the Sample in Audacity

  1. Launch Audacity.
  2. File → Import → Audio → select your .wav.
  3. Tracks → Resample → enter 22050 Hz.
  4. Tracks → Mix → Mix Stereo Down to Mono (if needed).
  5. File → Export Audio → Save as .wav (Mono, 22050 Hz, 16-bit).

🛠️ Step 4: Generate the Voice Latent

  1. Move your finalized .wav into the XTTS\speakers\en folder.
  2. Rename it (e.g., adrianne.wav).
  3. Run xtts-api-server-mantella.exe.
    • It will automatically generate a .json voice latent in XTTS\latent_speaker_folder\en.
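
To see which samples still lack a latent after the server has run, you can compare the two folders. This sketch assumes the generated .json file has the same base name as the .wav (for example, adrianne.wav and adrianne.json). That naming is an assumption made for illustration, and `missing_latents` is a made-up helper, so verify against your own folder contents.

```python
from pathlib import Path


def missing_latents(xtts_root, language="en"):
    """List sample names in speakers/<language> with no matching latent JSON."""
    root = Path(xtts_root)
    samples = root / "speakers" / language
    latents = root / "latent_speaker_folder" / language
    return sorted(
        wav.stem
        for wav in samples.glob("*.wav")
        if not (latents / f"{wav.stem}.json").exists()
    )
```

Pass the folder where you extracted the server, for instance `missing_latents(r"D:\Tools\XTTS")` (a hypothetical path). Any name in the result is a sample that the server has not yet converted.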

Step 5: Assign the Voice to an NPC​

  1. Launch Skyrim and get near the NPC.
  2. Open SkyrimNet Web UI.
  3. Navigate to:

Advanced Configuration → Character Overrides → Nearby → NPC name → Entity → Voice ID

  4. Set the Voice ID to your new voice (e.g., adrianne).

🎉 Done!

Your NPC now speaks with their custom voice! Enjoy the immersion.


⚠️ Note:
DO NOT share any .wav or .json latent files unless you own the voice or have clear permission to redistribute.