The sections below show some testing I did with vosk-api.

First, install vosk-api with pip:

`pip3 install vosk`

The test.wav example given in the repository contains three sentences, spoken in a perfect American English accent and with perfect sound quality, which I transcribe as: "one zero zero zero one" …

The "nine oh two one oh" is said very fast, but still clearly. The "z" of the second-to-last "zero" sounds a bit like an "s".

The same directory as test.wav also contains an SRT subtitle output example, which is more human readable and can be directly useful to people with that use case. It needs the srt package:

`python3 -m pip install srt`

So we can see that several mistakes were made, presumably in part because we humans know that all the words are numbers, which helps us, while the model has no such hint.

Next, I also tried the vosk-model-en-us-aspire-0.2 model, which was a 1.4GB download compared to the 36MB of vosk-model-small-en-us-0.3, and is listed at …. To swap it in, I first moved the small model out of the way:

`mv model model.vosk-model-small-en-us-0.3`

Finally, I tested the Think_Thomas_J_Watson_Sr.ogg recording, from … (IBM) (public domain in USA). Download it with wget, then convert it with ffmpeg to a 16000 Hz (`-ar 16000`), single-channel (`-ac 1`) WAV file:

`wget …`

`ffmpeg -i Think_Thomas_J_Watson_Sr.ogg -ar 16000 -ac 1 think.wav`
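The ffmpeg conversion above exists because the vosk-api Python examples only accept mono, 16-bit, uncompressed PCM WAV input. A quick check before running recognition can save some confusion. The following is a minimal sketch using only the standard library `wave` module; the function name `check_vosk_wav` is hypothetical and not part of the vosk API. It returns the sample rate rather than enforcing 16000 Hz, on the assumption (based on the examples) that the recognizer is told the rate explicitly.

```python
import wave


def check_vosk_wav(path):
    """Return (sample_rate, channels) if `path` is a mono 16-bit PCM WAV file.

    Hypothetical helper: mirrors the format check done in the vosk-api
    Python examples, but is not part of the vosk API itself.
    """
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        sample_width = wf.getsampwidth()  # bytes per sample; 2 means 16-bit
        comptype = wf.getcomptype()       # "NONE" means uncompressed PCM
        rate = wf.getframerate()          # returned, not checked (see lead-in)
    if channels != 1 or sample_width != 2 or comptype != "NONE":
        raise ValueError(
            f"{path}: expected mono 16-bit PCM WAV, got "
            f"{channels} channel(s), {8 * sample_width}-bit, compression {comptype}"
        )
    return rate, channels
```

For example, `check_vosk_wav("think.wav")` should return `(16000, 1)` for the file produced by the ffmpeg command, and raise `ValueError` for the original stereo or compressed audio converted without `-ac 1`.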
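To put a number on "several mistakes", speech recognition output is commonly scored by word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcription, divided by the number of reference words. The tests above do not compute it; this is a plain dynamic-programming sketch with a hypothetical function name, and the strings in the usage line are invented for illustration, not actual vosk output.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words.

    Computed as the Levenshtein edit distance over whitespace-separated words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words (one row of the DP table at a time).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

Usage: `word_error_rate("one zero zero zero one", "one zero zero so one")` returns `0.2`, one substitution out of five reference words.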
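The SRT subtitle output mentioned above is a simple plain-text format: each cue is a running index, a `start --> end` line with `HH:MM:SS,mmm` timestamps (note the comma before the milliseconds), the subtitle text, and a blank line. The srt package takes care of this for you; purely to show what the format looks like, here is a hand-rolled sketch with hypothetical helper names, assuming start and end times are given in seconds.

```python
def srt_timestamp(seconds):
    """Format a non-negative offset in seconds as an SRT timestamp HH:MM:SS,mmm."""
    if seconds < 0:
        raise ValueError("timestamp must be non-negative")
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_block(index, start, end, text):
    """Return one SRT cue: index line, 'start --> end' line, text, blank line."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n\n"


if __name__ == "__main__":
    # Hypothetical cue; the text and times are illustrative only.
    print(srt_block(1, 0.5, 2.25, "one zero zero zero one"), end="")
```

Concatenating the cues in order gives a complete `.srt` file that most video players can load.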