Good day!

The internet and the world wide web have brought us countless delightful and useful tools and sources of information, but they have also brought a never-ending stream of dangers, such as deep fakes and identity theft. This post will focus on voice replication.

Voice replication is the process of recording voice samples from a selected individual and using them to train an AI that can then generate an artificial version of that person’s voice. What’s most shocking is that the trained AI can speak words the person never recorded, and never said in that order, and still produce an almost realistic replica of their voice.

I find that voice replication deeply undermines authenticity, as the line between what is real and what is fake becomes blurred, ambiguous and almost indistinguishable. Impersonators using voice replication to make someone else appear to say something can cause real harm. Furthermore, it invades privacy and your rights over what you say, almost taking away the responsibility for and ownership of your own voice.

I tested out voice replication using a web application called Lyrebird. It requires a minimum of 30 voice samples in order to create an artificial voice avatar; recording more samples improves your avatar’s voice, making it more accurate and believable. Afterwards, you can type in a text, or have one randomly generated, and Lyrebird will speak that text in your voice.
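To give a feel for the workflow, here is a minimal sketch of the record–train–speak loop described above. To be clear, this is not Lyrebird’s real API: every URL, endpoint and field name below is a hypothetical stand-in, purely for illustration.

```python
import requests

# Hypothetical voice-cloning service; endpoints are illustrative only.
API_BASE = "https://api.example-voice-clone.com/v1"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Step 1: upload the ~30 recorded samples used to build the voice avatar.
for i in range(30):
    with open(f"sample_{i:02d}.wav", "rb") as f:
        requests.post(f"{API_BASE}/avatar/samples",
                      headers=HEADERS, files={"audio": f})

# Step 2: train the avatar; more samples generally mean a more believable voice.
requests.post(f"{API_BASE}/avatar/train", headers=HEADERS)

# Step 3: have arbitrary text spoken in the cloned voice.
resp = requests.post(f"{API_BASE}/avatar/speak", headers=HEADERS,
                     json={"text": "Words I never actually said."})
with open("cloned_voice.wav", "wb") as out:
    out.write(resp.content)
```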

I found that the voice replication by Lyrebird was not persuasive enough to be believable. I would compare the voice to an AI such as Siri, or another computerised voice used to read out text on a webpage. The tone and speed were slightly robotic and unnatural, and the phrasing felt like the voice was saying the text word by word rather than as a whole sentence. I also couldn’t sense any emotion when the text was spoken. I would honestly say that there are elements of my voice that are recognisable, but overall it feels quite fake, as I would naturally emphasise some words or change my pace.

There are many implications of voice replication. The main stakeholders are the general public, but especially people who have had their voice recorded in speeches, on the radio or on TV. These high-profile individuals could be harmed and/or wrongfully framed, damaging their reputation in the long term and making it harder for them to find a job or lead a successful life. The situation would be even worse if the replicated voice were accompanied by synchronised video (deep fakes), because the source would seem very believable and more people would blindly believe it.

Politicians would be most affected, as they have given a great number of recorded speeches. Their voices could be replicated and used harmfully and maliciously, with distressing consequences, especially if what they appear to say creates tension and conflict between nations or across the world. Such political tension could set off a chain reaction leading to trade wars, armed conflict, broken political ties, irreparably damaged relationships, and much more.

Voice replication has already been implemented in artificial intelligence (AI) systems such as Apple’s Siri, Microsoft’s Cortana, and Amazon’s Alexa. The benefit of training an AI to talk is that far less time is wasted recording every single word in the English language (or in other languages, which could take even longer). Furthermore, nobody has to record every sentence ever created, which would be impossible given the extremely varied lengths of sentences and the astronomical number of word combinations. Voice replication applied in this situation therefore saves time, money and effort.
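A quick back-of-the-envelope calculation shows just how impossible pre-recording every sentence would be. The vocabulary size and sentence length below are illustrative assumptions, not measured figures:

```python
# Back-of-the-envelope: why pre-recording every sentence is impossible.
# Both numbers below are illustrative assumptions, not measurements.
vocabulary_size = 20_000      # rough working vocabulary of one speaker
sentence_length = 10          # words in a fairly modest sentence

combinations = vocabulary_size ** sentence_length
print(f"{combinations:.2e} possible 10-word sentences")    # ~1.02e+43

seconds_per_year = 60 * 60 * 24 * 365
years_to_record = combinations / seconds_per_year          # one per second
print(f"~{years_to_record:.1e} years to record them all")  # ~3.2e+35 years
```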

An advanced example currently in use is in Japan, where robots are replacing receptionists at hotels. They handle simple instructions, are quicker, and are just as helpful. Robots of this kind are also due to be trialled at the Tokyo 2020 Olympics.

Finally, voice replication could be utilised to bring a deceased individual ‘back to life’ through their voice. Combined with deep fakes, the deceased individual could seemingly exist again as a hologram! Perhaps with more research into their personality and way of thinking, these holograms could be very educational and exciting for the future, improving history lessons and allowing students to have an interactive lesson with a significant figure from recent history (e.g. Martin Luther King Jr.).

I believe that heuristics can be used to detect vocal irregularities in voice recordings, giving us concrete evidence that a voice is fake. This will get increasingly difficult as the technology improves and the ‘fake’ voice blends in ever more seamlessly with the real one.
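As a toy illustration of the idea (not a production detector), one could look for the unnaturally flat pitch and energy that made my Lyrebird avatar sound robotic. The threshold and feature choices below are assumptions for demonstration only; real detectors use trained classifiers rather than a single hand-picked number:

```python
import numpy as np
import librosa  # pip install librosa

def flatness_score(path, threshold=0.5):
    """Toy heuristic: synthetic speech often shows unnaturally flat pitch
    and energy. Flags a recording when variation falls below a made-up
    threshold."""
    y, sr = librosa.load(path, sr=16000)

    # Pitch contour (fundamental frequency) via the YIN algorithm.
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)
    pitch_variation = np.std(f0) / np.mean(f0)

    # Frame-by-frame loudness variation.
    rms = librosa.feature.rms(y=y)[0]
    energy_variation = np.std(rms) / np.mean(rms)

    score = pitch_variation + energy_variation
    return score, score < threshold  # True => suspiciously flat

score, suspicious = flatness_score("recording.wav")
print(f"variation score: {score:.2f}, flagged as synthetic: {suspicious}")
```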

I also believe that another solution would be for the local government to issue an official statement confirming that a replicated voice is fake. Coming from the government, this would sound more reliable, as they have far more information and power, making them more trustworthy. Yet in this day and age, even we don’t know whether the government is telling the truth! Their systems could be hacked and a false statement could be issued!

Overall, I believe voice replication should be used more sensitively and wisely to achieve a useful outcome. This relates to the Common Good Approach, which asks us to weigh benefits and harms to society as a whole.

If you enjoyed this post, make sure you subscribe to my RSS feed!