Chatterbox-Multilingual-TTS

Zihan428 committed on Sep 23

Commit

364e836

1 Parent(s): 4a71cd1

Normalize Polish text to NFC before tokenization

Files changed (1)

src/chatterbox/models/tokenizers/tokenizer.py CHANGED

@@ -306,6 +306,10 @@ class MTLTokenizer:
             txt = korean_normalize(txt)
         elif language_id == 'ru':
             txt = self.russian_stress_labeler(txt)
+        elif language_id == 'pl':
+            # Polish text normalization: ensure diacritic characters are preserved
+            import unicodedata
+            txt = unicodedata.normalize('NFC', txt)
         # Prepend language token
         if language_id:
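
To illustrate what the added branch does, here is a minimal standalone sketch of NFC normalization on a Polish word. The helper name `normalize_polish` and the sample string are assumptions made for this example and are not part of the Chatterbox code base; only `unicodedata.normalize('NFC', ...)` comes from the commit itself. NFC composes a base letter followed by a combining mark into the single precomposed code point, so decomposed input (as some keyboards, file systems, or copy-paste sources produce) should reach the tokenizer in the same form as precomposed input.

```python
import unicodedata


def normalize_polish(txt: str) -> str:
    """Hypothetical helper mirroring the 'pl' branch: compose to NFC."""
    return unicodedata.normalize("NFC", txt)


# "żółw" (turtle) written with combining marks:
# z + U+0307 (dot above), o + U+0301 (acute), then ł and w.
# Note that ł (U+0142) has no decomposition, so it is unaffected.
decomposed = "z\u0307o\u0301\u0142w"
composed = normalize_polish(decomposed)

print(len(decomposed), len(composed))
print(composed == "\u017c\u00f3\u0142w")
```

Applying the helper to text that is already in NFC leaves it unchanged, so the branch should be safe to run on every Polish input regardless of how it was produced.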