# Speech to Text

![Speech to Text](images/speech_to_text_node.png)

The speech to text node accepts a Buffer of data representing a fragment of audio, supplied in `msg.payload`. When the node executes, the audio is passed through a speech recognition algorithm and the corresponding text representation is returned as the new `msg.payload` value. The structure of the response is defined by the GCP Speech to Text API and is an instance of a [RecognizeResponse](https://googleapis.dev/nodejs/speech/latest/google.cloud.speech.v1p1beta1.html#.RecognizeResponse) object.

Processing an audio fragment takes time. While the node is processing audio, it displays a status visualization.

The node has configuration options including:

* Language Code - The language code to be used for processing. The allowed codes are listed in the [supported languages documentation](https://cloud.google.com/speech-to-text/docs/languages). The default is `en-US`.
* Sample Rate - The sampling rate of the audio input.
* Encoding - One of:
  * LINEAR16
  * FLAC
  * MULAW
  * AMR
  * AMR_WB
  * OGG_OPUS
  * MP3
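
Because the output is a RecognizeResponse object rather than a plain string, a downstream node usually has to dig the text out of it. The following is a minimal sketch of how that might look, assuming `msg.payload` holds the response with its standard `results[].alternatives[].transcript` fields. The helper name `extractTranscript` and the sample data are hypothetical and are not part of this node.

```javascript
// Sketch: collect the top-ranked transcript from each result in a
// RecognizeResponse-shaped object and join them into one string.
// extractTranscript is a hypothetical helper, not part of the node.
function extractTranscript(response) {
  const results = (response && response.results) || [];
  return results
    .map((result) => {
      // The first alternative is the most likely one.
      const best = result.alternatives && result.alternatives[0];
      return best && best.transcript ? best.transcript.trim() : "";
    })
    .filter((text) => text.length > 0)
    .join(" ");
}

// Hand-written sample data for illustration only.
const sample = {
  results: [
    { alternatives: [{ transcript: "hello world", confidence: 0.92 }] },
    { alternatives: [{ transcript: " how are you", confidence: 0.88 }] },
  ],
};

console.log(extractTranscript(sample));
```

In a Node-RED Function node wired after the speech to text node, the same helper could be applied with `msg.payload = extractTranscript(msg.payload); return msg;`. Results with no alternatives are skipped, so an empty response yields an empty string.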