FBM3 - “Speech Understanding”

Format of input audio files and output CFR format

Input: WAV files or live spoken sentences

Output: text file (named results.txt or <teamname>.txt) with one line for each sentence


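Example of an output file (each line holds the audio file name, the transcription, and the CFR interpretation, separated by "|")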

 file1.wav|move to the living room|MOTION(goal:"living room")
 file2.wav|could you please find my jacket?|SEARCHING(theme:"my jacket")

Please refer to the RoCKIn@Home Rulebook for more details about input and output formats.
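The line layout shown in the example can be produced with a few lines of Python. This is a minimal sketch, not an official tool: the function names (format_cfr, format_result_line, write_results) are hypothetical, and joining several arguments with a comma is an assumption, since the example above only shows frames with a single argument. The Rulebook remains the authoritative reference for the syntax.

```python
def format_cfr(frame, arguments):
    """Render one frame as FRAME(role:"value").

    `arguments` is a list of (role, value) pairs. Joining several pairs
    with "," is an assumption; the example on this page shows only one.
    Values containing double quotes are not escaped in this sketch.
    """
    rendered = ",".join(f'{role}:"{value}"' for role, value in arguments)
    return f"{frame}({rendered})"


def format_result_line(audio_file, transcription, frame, arguments):
    """Build one output line: audio file | transcription | interpretation."""
    return "|".join([audio_file, transcription, format_cfr(frame, arguments)])


def write_results(path, results):
    """Write one line per sentence to `path` (e.g. results.txt).

    `results` is an iterable of
    (audio_file, transcription, frame, arguments) tuples.
    """
    with open(path, "w", encoding="utf-8") as out:
        for audio_file, transcription, frame, arguments in results:
            line = format_result_line(audio_file, transcription, frame, arguments)
            out.write(line + "\n")
```

For instance, format_result_line("file1.wav", "move to the living room", "MOTION", [("goal", "living room")]) reproduces the first line of the example output file.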

The following resources are available for designing and validating a processing system for this benchmark:

  • Updated data sets containing audio files, correct transcriptions, and interpretations are available on the RoCKIn datasets page.
  • Additional data can be found at: http://sag.art.uniroma2.it/HuRIC.html
  • A parser that implements the CFR grammar described in the specification module, and that can be used to check whether the produced output follows the correct syntax, can be downloaded at "Download the parser".
  • The UPDATED lexicon of the commands that will be used for the benchmark, organized by grammatical category. It is provided in a zip file called lexicon.zip ("Download lexicon.zip") containing:
    1. verbs.txt: contains all the verbs used in the commands, including modal and auxiliary verbs (e.g. "can").
    2. nouns.txt: contains all the nouns present in the commands.
    3. prepositions_and_positional_adverbs.txt: contains all the prepositions and positional adverbs (e.g. "close") used in the commands.
    4. personal_pronouns.txt: contains the personal pronouns used in the commands (e.g. "me").
    5. adjectives.txt: contains the adjectives used in the commands, including those derived from verbs (e.g. "dining" in "dining room").
    6. others.txt: contains all the words not covered by the previous categories, such as articles or other kinds of adverbs (e.g. "carefully").

Note: these files contain the entire lexicon that may appear in the commands. No words outside this lexicon will be used. However, the commands may draw on only a subset of it, so not every listed word will necessarily appear.
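Because the lexicon is closed, a transcription containing a word outside it is likely to be a recognition error. The sketch below shows one way to flag such words. It assumes that lexicon.zip has been extracted to a directory and that each file lists one entry per line; the internal layout of the files is not described on this page, so adjust the loader if it differs. The function names and the simple tokenization are illustrative choices.

```python
import re
from pathlib import Path

# File names as listed on this page.
LEXICON_FILES = [
    "verbs.txt",
    "nouns.txt",
    "prepositions_and_positional_adverbs.txt",
    "personal_pronouns.txt",
    "adjectives.txt",
    "others.txt",
]


def load_lexicon(directory):
    """Collect every entry from the six lexicon files into one set.

    Assumes one entry per line (not confirmed by this page).
    Entries are lower-cased so that matching is case-insensitive.
    """
    words = set()
    for name in LEXICON_FILES:
        text = (Path(directory) / name).read_text(encoding="utf-8")
        for line in text.splitlines():
            entry = line.strip().lower()
            if entry:
                words.add(entry)
    return words


def out_of_lexicon(transcription, lexicon):
    """Return the tokens of `transcription` that are not in `lexicon`.

    Tokens are runs of letters and apostrophes; punctuation is ignored.
    """
    tokens = re.findall(r"[a-z']+", transcription.lower())
    return [token for token in tokens if token not in lexicon]
```

An empty list from out_of_lexicon means every word of the transcription belongs to the lexicon. A non-empty list points at words a speech recognizer may have misheard.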