Speech-to-text transcription is the conversion of spoken audio into written text using automatic speech recognition (ASR) models.