Use speech recognition in Gambas

#1 (In Topic #1018)
Enthusiast
gambafeliz is in the usergroup ‘Enthusiast’
Hello everyone,

I am trying to start a project that involves voice recognition, but only of very short utterances, probably just two words. My questions are:

1. Is this possible?
2. If so, can an application written in Gambas obtain the recognized text?

Any guidance on this challenge is welcome.

Thank you.

Note: I am running Debian as the operating system.

Addendum:

I may not have explained myself well.

What I want is this:
1. A user says two words into a microphone.
2. Those two words are received by a free-software speech recognizer; I don't yet know which one.
3. That recognizer converts the speech to text.
4. This is the part I care about: retrieving the text in Gambas and comparing it with the commands I will define for the system.

So I need to know:
1. Which speech recognizer can convert microphone audio to text in a way that Gambas can use, or at least how to run a speech recognizer and get its text result into Gambas.

I hope my idea is clear now.

For your misfortunes I am Spanish and I only know Spanish, please, be patient with me, Thank you. :)

#2
Regular
vuott is in the usergroup ‘Regular’
Put simply, you have two options: run a speech-to-text program from Gambas with the "Shell" command, or call the functions of that program's library directly as external functions.
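
As a rough sketch of the "Shell" route, assuming only that the speech-to-text program prints the recognized text on its standard output: here `echo` is a stand-in for the real recognizer command (none has been chosen in this thread yet), and the expected phrase "four a" is purely illustrative.

```gambas
' Sketch only: "echo" stands in for a real speech-to-text command.
Public Sub Main()

  ' Placeholder command line; replace it with the recognizer you choose
  Dim sCommand As String = "echo 'four a'"
  Dim sText As String

  ' Shell ... To runs the command, waits for it to finish and
  ' stores everything it wrote to standard output in sText
  Shell sCommand To sText

  ' Remove the trailing newline and ignore upper/lower case
  sText = LCase(Trim(sText))

  If sText = "four a" Then
    Print "Command 4A recognized"
  Else
    Print "Unknown command: "; sText
  Endif

End
```

The same pattern should work with any command-line recognizer, as long as its output can be reduced to plain text before the comparison.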

Europaeus sum !

Amare memorentes atque deflentes ad mortem silenter labimur.

#3
Enthusiast
gambafeliz is in the usergroup ‘Enthusiast’
Yes, of course, you are right. But has anyone here actually done this, for example with a library, and could show me the process so that I can then write the code in Gambas?

I have seen this:
Vosk Speech Recognition Toolkit

But to be honest I have no idea how I can talk to Vosk and then use it on Gambas. Because at the end of everything I only want this:

A person says something to the computer; Vosk, for example, converts the speech into text; I then take that text and compare it in Gambas against what I expect. If it matches, I execute a command that sends another command over the network to carry out the request somewhere else.

#4
Regular
vuott is in the usergroup ‘Regular’

gambafeliz said

But to be honest I have no idea how I can talk to Vosk and then use it on Gambas.
Well, I found this C code:
   https://github.com/alphacep/vosk-api/blob/master/c/test_vosk_speaker.c
I don't know whether it is suitable, but it seems to be.
Note that this code does not convert speech from the microphone directly to text; it reads a "wav" audio file in which the speech has been recorded beforehand.
I have not installed the Vosk library; nevertheless, I tried to translate the example into Gambas using the external functions of the Vosk API:
   https://github.com/alphacep/vosk-api/blob/master/src/vosk_api.h
Since I have not installed Vosk, I obviously could not test my code.

Code (gambas)

' Untested translation of test_vosk_speaker.c.
' Complete the library name to match the installed Vosk library.
Library "libvosk..."

' VoskModel *vosk_model_new(const char *model_path)
' Loads model data from the file and returns the model object.
Private Extern vosk_model_new(model_path As String) As Pointer

' VoskSpkModel *vosk_spk_model_new(const char *model_path)
' Loads speaker model data from the file and returns the model object.
Private Extern vosk_spk_model_new(model_path As String) As Pointer

' VoskRecognizer *vosk_recognizer_new_spk(VoskModel *model, float sample_rate, VoskSpkModel *spk_model)
' Creates the recognizer object with speaker recognition.
Private Extern vosk_recognizer_new_spk(model As Pointer, sample_rate As Single, spk_model As Pointer) As Pointer

' int vosk_recognizer_accept_waveform(VoskRecognizer *recognizer, const char *data, int length)
' Accept voice data
Private Extern vosk_recognizer_accept_waveform(recognizer As Pointer, data As Pointer, length As Integer) As Integer

' const char *vosk_recognizer_result(VoskRecognizer *recognizer)
' Returns speech recognition result.
Private Extern vosk_recognizer_result(recognizer As Pointer) As String

' const char *vosk_recognizer_partial_result(VoskRecognizer *recognizer)
' Returns partial speech recognition.
Private Extern vosk_recognizer_partial_result(recognizer As Pointer) As String

' const char *vosk_recognizer_final_result(VoskRecognizer *recognizer)
' Returns speech recognition result. It doesn't wait for silence.
Private Extern vosk_recognizer_final_result(recognizer As Pointer) As String

' void vosk_recognizer_free(VoskRecognizer *recognizer)
' Releases recognizer object.
Private Extern vosk_recognizer_free(recognizer As Pointer)

' void vosk_spk_model_free(VoskSpkModel *model)
' Releases the model memory.
Private Extern vosk_spk_model_free(model As Pointer)

' void vosk_model_free(VoskModel *model)
' Releases the model memory.
Private Extern vosk_model_free(model As Pointer)


Library "libc:6"

Private Enum SEEK_SET = 0, SEEK_CUR, SEEK_END

' FILE *fopen (const char *__restrict __filename, const char *__restrict __modes)
' Open a file and create a new stream for it.
Private Extern fopen(__filename As String, __modes As String) As Pointer

' int fseek(FILE *__stream, long int __off, int __whence)
' Seek to a certain position on STREAM.
Private Extern fseek(__stream As Pointer, __off As Long, __whence As Integer) As Integer

' int feof (FILE *__stream)
' Return the EOF indicator for STREAM.
Private Extern feof(__stream As Pointer) As Integer

' size_t fread(void *__restrict __ptr, size_t __size, size_t __n, FILE *__restrict __stream)
' Read chunks of generic data from STREAM.
Private Extern fread(__ptr As Pointer, __size As Long, __n As Long, __stream As Pointer) As Long

' int fclose (FILE *__stream)
' Close STREAM.
Private Extern fclose(__stream As Pointer) As Integer


Public Sub Main()

  Dim wavin, model, spk_model, recognizer As Pointer
  Dim buf As New Byte[3200]
  Dim nread, isFinal As Integer

  model = vosk_model_new("model")
  spk_model = vosk_spk_model_new("spk-model")
  recognizer = vosk_recognizer_new_spk(model, 16000.0, spk_model)

  wavin = fopen("/path/of/file.wav", "rb")
  If wavin = 0 Then Error.Raise("Unable to open the audio file")
  ' Skip the 44-byte WAV header
  fseek(wavin, 44, SEEK_SET)

  ' feof() returns an Integer, so compare it with 0 ("Not" would negate it bitwise)
  While feof(wavin) = 0
    ' Pass the address of the array data to the C functions
    nread = fread(buf.Data, 1, buf.Count, wavin)
    isFinal = vosk_recognizer_accept_waveform(recognizer, buf.Data, nread)
    If isFinal Then
      Print vosk_recognizer_result(recognizer)
    Else
      Print vosk_recognizer_partial_result(recognizer)
    Endif
  Wend
  Print vosk_recognizer_final_result(recognizer)

  fclose(wavin)
  vosk_recognizer_free(recognizer)
  vosk_spk_model_free(spk_model)
  vosk_model_free(model)

End
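
The Vosk result functions return their result as a JSON string with the recognized words in a "text" field, so the Gambas side still has to extract that field before comparing it with anything. A minimal sketch, assuming the gb.util.web component (which provides the JSON class) is enabled in the project; the sample string is made up for illustration, not captured from a real run.

```gambas
' Requires the gb.util.web component (Project > Properties > Components).
Public Sub Main()

  Dim sResult As String
  Dim cResult As Collection
  Dim sText As String

  ' Made-up sample; in the real program this would be the string
  ' returned by vosk_recognizer_final_result(recognizer)
  sResult = "{\"text\": \"four a\"}"

  ' A JSON object is decoded into a Collection
  cResult = JSON.Decode(sResult)

  ' Keep only the recognized words, if the field is present
  If cResult.Exist("text") Then sText = cResult["text"]

  Print sText

End
```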

#5
Guru
cogier is in the usergroup ‘Guru’
This might be of interest: https://unix.stackexch…nition-software-for-linux It shows some examples of the Vosk software that vuott mentions.

#6
Enthusiast
gambafeliz is in the usergroup ‘Enthusiast’
Thank you very much, gentlemen.

As always, you both come to the rescue. I will try what you suggest and see whether I can get the idea started.

#7
Regular
thatbruce is in the usergroup ‘Regular’
Interesting! In fact, I have spent the entire afternoon looking at the state of Linux speech-to-text software options.

To be frank, most of them are still close to useless: the accuracy is generally very poor.

I assume that you want a utility that doesn't require training by the speaker; in other words, you want to use a default model provided by the utility. Now, most of these are "english as she is spoke by Amer-kans", which is to be expected. (I am a "Strine", which has much nicer phonemes, by the way!)
After repeating the following input into the microphone a dozen times, I gave up trying to say the phrase the same way without mistakes and installed an audio recorder instead. I used gnome-audio-recorder by Osmo Antero, just for convenience's sake. It's pretty rudimentary but does the job. This is the input phrase:

"There was movement at the station for the word had passed around,
that the colt from 'Old Regret' had got away."

I'll ignore the other dozen or so that I tried that just delivered garbage like "ten wars moon men" and just report the two that stood out.

pocketsphinx
PRO: fast
CON: medium accuracy for untrained models
RESULT: there was movement at the station the word that caused the rare that the call from all regret had gone already

vosk
PRO: much better untrained accuracy than any other I tried
CON: very slow at first, as it has to generate its default model, but it speeds up as long as you don't reboot.
RESULT: there was movement at the station for the word had passed around that the cult from all regret had got away
 
Both have APIs that can be used as vuott says. I haven't looked at them in detail yet, except to note that the pocketsphinx API looks a lot simpler than the Vosk one, which is very complex (but possibly worth the effort).
Looking forward to your results!

p.s. Tried to attach the input mp3 file I used but it appears that phpBB has never heard of audio files :-)

#8
Enthusiast
gambafeliz is in the usergroup ‘Enthusiast’
Thank you very much, I appreciate your interest.
I also see that you found something useful.

To be clear, I don't want a conversation recognizer.

I want the user to say something they see on the screen, for example 4A, and that is enough for me to interpret it as a command.

I'm telling you this so you know exactly what I'm looking for.

Let's say it's this:

1. I show a set of codes on the screen.
2. The user chooses one by voice.
3. I get the spoken choice converted to text in Gambas.
4. Finally, I execute the command programmed for the resulting string.
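
The steps above could be sketched roughly as follows, assuming the recognized text has already been reduced to a code such as "4A". The codes, the `RunCommand` procedure and the `Print` actions are hypothetical placeholders for illustration only.

```gambas
Public Sub Main()

  ' Codes currently shown on screen (illustrative values)
  Dim aCodes As String[] = ["4A", "4B", "7C"]
  Dim sHeard As String

  ' Stand-in for the text delivered by the recognizer
  sHeard = "4b"

  ' Normalize before comparing: trim blanks, ignore case
  sHeard = UCase(Trim(sHeard))

  If aCodes.Exist(sHeard) Then
    RunCommand(sHeard)
  Else
    Print "Not one of the codes on screen: "; sHeard
  Endif

End

' Hypothetical dispatcher: replace each Print with the real action
Private Sub RunCommand(sCode As String)

  Select Case sCode
    Case "4A"
      Print "Executing the command for 4A"
    Case "4B"
      Print "Executing the command for 4B"
    Case Else
      Print "No action programmed for "; sCode
  End Select

End
```

Checking the text against the codes on screen first means anything else the recognizer hears is simply rejected.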

#9
Regular
thatbruce is in the usergroup ‘Regular’
Here are just a couple of thoughts.
Regardless of the speech-to-text library you employ, you will need to "convert" the string it "heard" into something your program knows. For example, suppose the user picks "4X": with English models (and an English speaker) you are likely to end up with something like "four eggs" or "for eggs". So you need to teach your program, not the speech-to-text library, that "four eggs" and "for eggs" are both possible texts for the "4X" command.
Now, given your location, I raise the question: what language(s) are your users going to use? Are dialects going to be a problem? And so on.
So, to get you moving, I think you will need some sort of lookup table to convert the delivered text, as spoken by user X in language Y with dialect Z, into the required command.
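
A minimal sketch of such a lookup table, using a Gambas Collection keyed by the text the recognizer might deliver. The phrases and command codes are illustrative assumptions; a real table would be filled in from what the chosen recognizer actually returns for your users, and `Normalize` is a hypothetical helper.

```gambas
Public Sub Main()

  Dim cCommands As New Collection
  Dim sHeard As String

  ' Several possible "heard" texts map to the same command
  cCommands["four a"] = "4A"
  cCommands["for a"] = "4A"
  cCommands["four eggs"] = "4X"
  cCommands["for eggs"] = "4X"

  ' Stand-in for the text delivered by the recognizer
  sHeard = Normalize("  Four   Eggs ")

  If cCommands.Exist(sHeard) Then
    Print "Command: "; cCommands[sHeard]
  Else
    Print "No command known for: "; sHeard
  Endif

End

' Lower-case the text, trim it and collapse repeated spaces,
' so that small differences in the output still match a key
Private Function Normalize(sText As String) As String

  sText = LCase(Trim(sText))
  While InStr(sText, "  ") > 0
    sText = Replace(sText, "  ", " ")
  Wend
  Return sText

End
```

Supporting another language or dialect then only means adding more keys, not changing the program logic.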
 b
