User:M!dgard/Record your voice for OsmAnd

From OpenStreetMap Wiki

The available voices in OsmAnd aren't great, and the TTS voice of your phone likely isn't great either. The good news is that you can record the prompts yourself! The bad news is that this tutorial is still in its very early days: you'll need basic JavaScript or Prolog skills, and you'll have to figure out a lot by yourself to get this to work. Ping me on the talk page if you're interested; that'll motivate me to get back to this sooner.

We use the voice/ directory of https://github.com/osmandapp/OsmAnd-resources as base.

Requirements

To create your own voice, you'll need:

  • A quiet space
  • A microphone and audio editing software such as Audacity
  • A computer running Linux or macOS[1], because you need to run a shell script. It may be possible on WSL, but we don't support that
  • An Android device. You need to place a file within the OsmAnd directory on your phone. This is not possible with iPhones

Testing voice prompts

Before starting the project, it is useful to test the voice prompts. To do this, go to 'plugins' and enable 'OsmAnd development' (at the bottom of the list).

In its 'settings', there is an entry 'Test Voice Prompts' which lets you select a voice. Tapping one of the instructions reads it aloud, as if you were using the navigation functionality.

You can focus on creating one of those sentences first, in order to have a complete test of your voice. Good candidates are "Turn left / then / in / 100 / meters / keep right / and arrive at your destination." and "In / 450 / meters / enter the roundabout and take the / first / exit." because they involve a lot of different parts.
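To get a feel for how few recordings a first test needs, here is a minimal Python sketch that splits the two candidate sentences at the " / " marks and lists the distinct fragments. Treating "In" and "in" as the same recording is a simplifying assumption made for this illustration; in practice the intonation may differ enough to justify two recordings.

```python
# The two test sentences from the text, with "/" marking fragment boundaries.
TEST_SENTENCES = [
    "Turn left / then / in / 100 / meters / keep right / and arrive at your destination.",
    "In / 450 / meters / enter the roundabout and take the / first / exit.",
]


def fragments(sentence):
    """Split a slash-separated sentence into its individual fragments."""
    return [part.strip() for part in sentence.split("/") if part.strip()]


def distinct_fragments(sentences):
    """Return every fragment needed for the sentences, once each, in first-seen order.

    Comparison is case-insensitive (an assumption of this sketch), so "In" and
    "in" count as a single recording.
    """
    seen = set()
    ordered = []
    for sentence in sentences:
        for fragment in fragments(sentence):
            key = fragment.lower()
            if key not in seen:
                seen.add(key)
                ordered.append(fragment)
    return ordered


if __name__ == "__main__":
    for fragment in distinct_fragments(TEST_SENTENCES):
        print(fragment)
```

Under that assumption, both sentences together are covered by eleven recordings.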

Message definitions

I found the messages OsmAnd uses in Dutch unsatisfactory. For example, they say sla linksaf na 300 meter, but I like na 300 meter, sla linksaf better, so I modified them. I also introduced a distinction between "middle of sentence" units (Na 300 meter, …) and "end of sentence" units (Afstand: 50 kilometer.). Pieter Vander Vennet built upon my initial work; you can download his files for Dutch at https://pietervdvn.github.io/OsmandVoicesCreation.zip . For other languages, it may currently be better to start from the version in OsmAnd's repository at https://github.com/osmandapp/OsmAnd-resources .

At the time of writing (2019-02), the definitions are in Prolog, but the OsmAnd team is working on a JavaScript implementation. If the Prolog method ever breaks, please update this wiki.

Write the text you're going to read

I created this text you can read. It contains all necessary messages and is designed to have non-clashing sounds between messages. (In my first iteration I noticed that e.g. in "tien / meter" the N and M blend together, which makes it impossible to extract the messages correctly.) Do note that this is designed for my modified message definitions: it doesn't work with the normal ones.

  • Dutch. To do: there's still room for improvement, e.g. "eerste / afrit" can blend together

Record

Reading all the phrases out loud takes around 6 minutes.

  • Choose the quietest location you can find to record.
  • Keep a glass of water at the ready.
  • Make sure you do it in one go. Keep a constant tempo, constant timbre, constant volume and constant distance from the microphone.
  • First, record a few seconds of silence. This is later used to filter out noise.
  • If you messed up a phrase or there was temporary background noise, just wait a second and read it again. Don't get agitated: you would hear that in the recorded voice.

I recommend Audacity for recording.

Process and split

You'll have to split the different messages and export them as OGG with the appropriate names.[2]
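For readers who want to experiment with automatic splitting anyway, the following is a minimal Python sketch of threshold-based silence detection. It is not the method used for this tutorial (the splitting here was done by hand), and it makes several assumptions: the recording has first been exported as a 16-bit mono WAV file, and the `threshold` and `min_silence_s` defaults are arbitrary starting values that need tuning per recording. Fragments whose sounds blend together will still be merged, so check every segment by ear.

```python
import array
import sys
import wave


def read_mono_16bit_wav(path):
    """Read a 16-bit mono WAV file; return (samples, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono WAV file")
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
        if sys.byteorder == "big":
            samples.byteswap()  # WAV data is little-endian
        return samples, wav.getframerate()


def find_segments(samples, sample_rate, threshold=500, min_silence_s=0.4):
    """Return (start, end) sample indices of the non-silent stretches.

    A stretch is closed once at least `min_silence_s` seconds of samples
    stay below `threshold` in absolute value. `end` is exclusive.
    """
    min_gap = int(min_silence_s * sample_rate)
    segments = []
    start = None      # index where the current stretch began
    last_loud = None  # index of the most recent loud sample
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= min_gap:
            segments.append((start, last_loud + 1))
            start = None
    if start is not None:
        segments.append((start, last_loud + 1))
    return segments
```

Dividing the returned indices by the sample rate gives timestamps in seconds, which you can compare against the waveform in your audio editor before cutting anything.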

As mentioned under "Testing voice prompts", it's a good idea to test with a small number of strings first. This gives you a chance to evaluate how well you have split the audio.

If you use Audacity, do this:

  1. First do noise reduction: Select a second of silence. From the "Effect" menu, select "Noise Reduction". Click "Get Noise Profile". Select all audio with Ctrl+A, then open "Noise Reduction" again but this time click "OK".
  2. Split the different parts as you prepared. Create a new track for each segment, align all tracks to zero (drag them all to the left with the drag tool). Remove audio you don't need.
  3. Name each track after the file that its audio should be exported to.
  4. Click 'File' → 'Export' → 'Export Multiple' to export all tracks to separate files. The format should be '.ogg'.
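After exporting, it can help to check that no file is missing or misnamed. The Python sketch below compares the exported .ogg files against a list of expected base names. The list itself is something you have to derive from the message definitions you are using; the names in the usage comment are hypothetical placeholders, not actual OsmAnd file names.

```python
from pathlib import Path


def compare_names(exported_files, expected_names):
    """Compare exported file names with the expected base names.

    Returns (missing, unexpected): two sorted lists of base names without
    the .ogg extension. Files with other extensions are ignored.
    """
    found = {
        Path(name).stem
        for name in exported_files
        if Path(name).suffix.lower() == ".ogg"
    }
    expected = set(expected_names)
    return sorted(expected - found), sorted(found - expected)


def check_export_dir(export_dir, expected_names):
    """Run compare_names on the contents of a directory."""
    names = [path.name for path in Path(export_dir).iterdir()]
    return compare_names(names, expected_names)


# Hypothetical usage; replace the placeholder names with the real ones:
# missing, unexpected = check_export_dir("export", ["prompt_one", "prompt_two"])
# print("missing:", missing, "unexpected:", unexpected)
```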

Create the voice pack

Normally the gen_voice.sh script requests voices from an online TTS service (that's why OsmAnd's "recorded" voices don't sound natural!) and would overwrite your files. I hacked it not to do that, but that was for an old version. To do: do it for a new version and upload the modifications.

Upload to your device

Use adb or any other method to get the voice pack to your phone, and unzip it in files/voice/<langcode> relative to your OsmAnd directory (which might be somewhere like /storage/emulated/0/Android/data/net.osmand.plus).
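If you go the adb route, the Python sketch below builds one `adb push` command per file. It assumes that you have already unzipped the voice pack on your computer, that adb is installed and the phone is connected with USB debugging enabled, and that your OsmAnd directory is the example location given above (yours may differ). Whether the target directory has to exist beforehand may depend on your adb version.

```python
from pathlib import Path, PurePosixPath

# Example location from the text; the OsmAnd directory on your phone may differ.
OSMAND_DIR = "/storage/emulated/0/Android/data/net.osmand.plus"


def adb_push_commands(local_files, lang_code, osmand_dir=OSMAND_DIR):
    """Build one `adb push` command (as an argument list) per local file.

    Each file is sent to files/voice/<lang_code>/ inside the OsmAnd directory,
    keeping its file name.
    """
    target_dir = PurePosixPath(osmand_dir) / "files" / "voice" / lang_code
    commands = []
    for local_file in local_files:
        remote_path = target_dir / Path(local_file).name
        commands.append(["adb", "push", str(local_file), str(remote_path)])
    return commands


# Hypothetical usage, with "voicepack" as the unzipped pack and "nl" as language code:
# import subprocess
# files = [p for p in Path("voicepack").iterdir() if p.is_file()]
# for command in adb_push_commands(files, "nl"):
#     subprocess.run(command, check=True)
```

Building the commands separately from running them lets you print and inspect them before anything is written to the phone.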

After restarting OsmAnd (swipe it away in your task manager), the new or updated voice should be available. You can test it if you enable the "OsmAnd development" plugin and go to OsmAnd's "Settings" → "OsmAnd development" → "Test voice prompts".

Footnotes

  1. or another *NIX like Solaris, *BSD…
  2. I tried to automate this with sox (split on silence) but the results were unsatisfactory. I ended up doing it manually, selecting and exporting the audio for each message. I might try creating an automated method again in the future.