

I'd like to talk today about a powerful and fundamental aspect of who we are: our voice. 我今天要和大家讲述的是 关于我们自身的一个非常强大 非常重要的方面:我们的声音,
Each one of us has a unique voiceprint that reflects our age, our size, even our lifestyle and personality. 每一个人的声音都带有独特的标记, 这个声音的标记能反映出我们的年龄,我们的胖瘦高矮, 甚至是我们的生活方式和性格。
In the words of the poet Longfellow, "the human voice is the organ of the soul." 用诗人朗费罗的话来说, “人类的声音是灵魂的重要器官。”
As a speech scientist, I'm fascinated by how the voice is produced, and I have an idea for how it can be engineered. 身为一个语音科学家,我非常热衷于研究 声音的产生, 而且我有一个如何制造声音的想法。
That's what I'd like to share with you. 这就是我今天想和大家分享的东西。
I'm going to start by playing you a sample of a voice that you may recognize. 首先,我想为大家播放一个声音样本, 这个声音你们可能听过。
(Recording) Stephen Hawking: "I would have thought it was fairly obvious what I meant." (录音)史蒂芬·霍金:“我本来以为, 我想说的意思很显而易见。”
Rupal Patel: That was the voice of Professor Stephen Hawking. 卢帕尔·帕特尔:那是 史蒂芬·霍金教授的声音。
What you may not know is that same voice may also be used by this little girl who is unable to speak because of a neurological condition. 你们可能不知道的是,同样的声音 也被用于这个小女孩身上, 她因为大脑神经系统缺陷 而不能讲话。
In fact, all of these individuals may be using the same voice, and that's because there's only a few options available. 事实上,很多不能说话的人 都可能在使用同样的声音 那是因为可以使用的声音样本太少了。
In the U.S. alone, there are 2.5 million Americans who are unable to speak, and many of whom use computerized devices to communicate. 单单在美国,就有250万人 不能说话, 而且在这些人中很多都是使用电脑化的设备 进行交流。
Now that's millions of people worldwide who are using generic voices, including Professor Hawking, who uses an American-accented voice. 也就是全世界数百万的人 都在使用一些毫无个性的声音, 其中就包括史蒂芬·霍金教授, 他使用的声音是带有美国口音的。
This lack of individuation of the synthetic voice really hit home when I was at an assistive technology conference a few years ago, and I recall walking into an exhibit hall and seeing a little girl and a grown man having a conversation using their devices, different devices, but the same voice. 我真正开始意识到 合成声音缺乏个性 是我在几年前参加一个 辅助技术会议的时候, 我记得走进一个展厅, 看到一个小女孩和一个成年男子 正在用他们的设备进行对话, 不同的设备,却是同样的声音。
And I looked around and I saw this happening all around me, literally hundreds of individuals using a handful of voices, voices that didn't fit their bodies or their personalities. 我看向四周,发现身边这种情况很多, 几乎是上百个人 却只用着为数不多的几种声音, 这些声音跟他们的身体特征 和性格都很不匹配。
We wouldn't dream of fitting a little girl with the prosthetic limb of a grown man. 我们肯定做梦也不会想到把一个成年男子的假肢 装在一个小女孩身上。
So why then the same prosthetic voice? 那为什么他们要用同样的合成声音呢?
It really struck me, and I wanted to do something about this. 这深深的触动了我, 我想做些什么。
I'm going to play you now a sample of someone who has, two people actually, who have severe speech disorders. 现在我想为大家播放一个人的录音—— 不对,其实是两个人, 他们都有很严重的言语障碍。
I want you to take a listen to how they sound. 我想让大家听听他们的声音。
They're saying the same utterance. 他们在发出同样一个音。
(First voice) (第一个声音)
(Second voice) (第二个声音)
You probably didn't understand what they said, but I hope that you heard their unique vocal identities. 大家可能并不明白他们说了什么, 但我希望大家听到了 他们独特的声音标志。
So what I wanted to do next is, 所以接下来我想要做的事情就是,
I wanted to find out how we could harness these residual vocal abilities and build a technology that could be customized for them, voices that could be customized for them. 我想要找出如何可以利用 他们残留的发声能力, 并发明一项技术, 这项技术能为他们创造出个性化的声音, 就是专门为他们定制的声音。
So I reached out to my collaborator, Tim Bunnell. 所以我联系了我的合作伙伴,蒂姆·邦内尔。
Dr. Bunnell is an expert in speech synthesis, and what he'd been doing is building personalized voices for people by putting together pre-recorded samples of their voice and reconstructing a voice for them. 邦内尔博士是一位语言合成方面的专家, 他一直在为需要帮助的人合成 个性化的声音, 他把这些人 预先录制好的声音样本组合在一起, 并重新建立他们的声音。
These are people who had lost their voice later in life. 这些人都是在人生后来的某个阶段 才失去了语言能力。
We didn't have the luxury of pre-recorded samples of speech for those born with speech disorder. 可是我们没有 那些生来就有言语障碍的人的 预先录制好的声音样本。
But I thought, there had to be a way to reverse engineer a voice from whatever little is left over. 但我想,肯定有一个办法 可以利用仅存的不管剩下多少的语言能力 来逆向重组声音。
So we decided to do exactly that. 于是我们决定去做这样的工作。
We set out with a little bit of funding from the National Science Foundation, to create custom-crafted voices that captured their unique vocal identities. 我们从国家科学基金会的一小笔资金开始, 努力打造反映了他们的独特声印的 定制的声音。
We call this project VocaliD, or vocal I.D., for vocal identity. 我们称之为VocaliD计划,即声音ID, 用于区别不同的声音。
Now before I get into the details of how the voice is made and let you listen to it, 那么,在我开始讲述 声音是如何制作的,以及让大家听这些声音之前,
I need to give you a real quick speech science lesson. Okay? 我需要先给大家上一堂简短的语音科学课。可以么?
So first, we know that the voice is changing dramatically over the course of development. 首先,我们知道声音 在其发展过程中会发生巨大的改变。
Children sound different from teens who sound different from adults. 儿童的声音与青少年的声音不同, 而青少年的声音则与成人的声音不同。
We've all experienced this. 我们都经历过这样的改变。
Fact number two is that speech is a combination of the source, which is the vibrations generated by your voice box, which are then pushed through the rest of the vocal tract. 第二,语音是 声源的组合, 也就是你的喉部产生的震动 通过声道 传出来。
These are the chambers of your head and neck that vibrate, and they actually filter that source sound to produce consonants and vowels. 这些是你的头部和颈部 会震动的腔室, 他们会过滤声源 并产生辅音和元音。
So the combination of source and filter is how we produce speech. 所以声源和过滤器的组合 使得我们能够制造语言。
And that happens in one individual. 而这发生在一个个体身上。
Now I told you earlier that I'd spent a good part of my career understanding and studying the source characteristics of people with severe speech disorder, and what I've found is that even though their filters were impaired, they were able to modulate their source: the pitch, the loudness, the tempo of their voice. 早先我告诉过你们 我花了我职业生涯中的很大一部分时间 来了解和学习 那些有着严重言语障碍的人的 声源的特征, 我发现 虽然他们的过滤器受损, 他们仍然能够控制他们的声源, 包括音高、响度和声音的节奏。
These are called prosody, and I've been documenting for years that the prosodic abilities of these individuals are preserved. 我们称这些为韵律,而我多年的记录表明 这些人的韵律能力 被保留了下来。
So when I realized that those same cues are also important for speaker identity, 所以当我意识到这些同样的线索 对讲者身份也是非常重要的时候,
I had this idea. 我有了这样一个想法。
Why don't we take the source from the person we want the voice to sound like, because it's preserved, and borrow the filter from someone about the same age and size, because they can articulate speech, and then mix them? 为什么不利用那些 我们希望听到的声音的声源, 因为这个声源是好的, 再借助一个 差不多年龄和体型的人的过滤器, 因为他们可以清晰地发声, 然后把他们组合在一起?
Because when we mix them, we can get a voice that's as clear as our surrogate talker -- that's the person we borrowed the filter from -- and is similar in identity to our target talker. 因为当我们把它们组合在一起的时候, 我们就可以获得一个 像代理说话者一样清晰的声音, 代理说话者就是我们向其借了过滤器的那个人, 而这个声音又跟我们的目标说话者的身份一致。
It's that simple. 就这么简单。
That's the science behind what we're doing. 这就是我们在做的研究背后的科学。
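The source-and-filter mixing described above can be illustrated with a toy sketch. This is not the VocaliD method, only a minimal model under stated assumptions: the target talker's source is reduced to an impulse train with a pitch period and a loudness, the surrogate's vocal-tract filter is reduced to a short impulse response, and "mixing" is plain convolution. The names `glottal_source`, `apply_filter`, and all the numbers are hypothetical.

```python
from typing import List


def glottal_source(period: int, n_samples: int, loudness: float = 1.0) -> List[float]:
    """Toy source: one pulse every `period` samples, scaled by `loudness`."""
    return [loudness if n % period == 0 else 0.0 for n in range(n_samples)]


def apply_filter(source: List[float], impulse_response: List[float]) -> List[float]:
    """Shape the source with a vocal-tract filter (direct convolution)."""
    out = [0.0] * (len(source) + len(impulse_response) - 1)
    for i, s in enumerate(source):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


# Assumed values for illustration only.
target_source = glottal_source(period=4, n_samples=8, loudness=2.0)  # from the target talker
surrogate_filter = [1.0, 0.5]                                        # from the surrogate talker
mixed = apply_filter(target_source, surrogate_filter)
```

In this sketch the pitch and loudness in `mixed` come from the target talker, while the shape of each pulse comes from the surrogate's filter.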
So once you have that in mind, how do you go about building this voice? 有了这样的想法以后, 我们又该如何真正去打造这样的声音呢?
Well, you have to find someone who is willing to be a surrogate. 嗯,你必须找到 愿意做代理说话者的人。
It's not such an ominous thing. 这并不是什么有着不祥之兆的事情。
Being a surrogate donor only requires you to say a few hundred to a few thousand utterances. 作为一个代理说话者, 你只需要说上几百个 到几千个话语。
The process goes something like this. 过程大致是这样的。
(Video) Voice: Things happen in pairs. (视频)声音:事情成对发生。
I love to sleep. 我爱睡觉。
The sky is blue without clouds. 天空很蓝,无云。
RP: Now she's going to go on like this for about three to four hours, and the idea is not for her to say everything that the target is going to want to say, 卢帕尔·帕特尔:她就这样继续说上 大约三到四个小时, 当然她并不需要说出 目标说话者会说的所有东西,
but the idea is to cover all the different combinations of the sounds that occur in the language. 而只需覆盖到一门语言中的 所有发音的不同组合。
The more speech you have, the better sounding voice you're going to have. 越多的语音样本 就意味着越好的声音质量。
Once you have those recordings, what we need to do is we have to parse these recordings into little snippets of speech, one- or two-sound combinations, sometimes even whole words that start populating a dataset or a database. 一旦有了这些录音之后, 我们需要做的就是 将这些录音 解析成语音的小片段, 一两个发声的组合, 有的时候甚至整个的词语 也会出现在数据库里边。
We're going to call this database a voice bank. 我们就将这个数据库称为声音银行。
Now the power of the voice bank is that from this voice bank, we can now say any new utterance, like, "I love chocolate" -- everyone needs to be able to say that -- fish through that database and find all the segments necessary to say that utterance. 这个声音银行的作用在于: 基于这个声音银行, 我们现在可以说出任何新的话语, 比如:“我爱巧克力”—— 每个人都应该有可以说出这句话的能力—— 从这个数据库中寻找 并找到说这句话需要的 所有必要的片段。
(Video) Voice: I love chocolate. (视频)声音:我爱巧克力。
RP: So that's speech synthesis. 卢帕尔·帕特尔:这就是语音合成。
It's called concatenative synthesis, and that's what we're using. 这个被称之为衔接合成,而我们用的就是它。
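The voice-bank lookup just described can be sketched in a few lines. This is a simplified illustration, not the project's actual code: the bank is assumed to be a dictionary keyed by tuples of sound labels, the "recordings" are short lists of numbers standing in for audio samples, and the lookup is a greedy longest-match search. The names `synthesize` and `toy_bank` and the sound labels are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical voice bank: sound-label tuples mapped to recorded snippets.
VoiceBank = Dict[Tuple[str, ...], List[float]]


def synthesize(sounds: List[str], bank: VoiceBank, max_unit: int = 3) -> List[float]:
    """Find snippets covering `sounds` (longest unit first) and concatenate them."""
    output: List[float] = []
    i = 0
    while i < len(sounds):
        # Prefer longer stored units, then fall back to shorter ones.
        for size in range(min(max_unit, len(sounds) - i), 0, -1):
            unit = tuple(sounds[i:i + size])
            if unit in bank:
                output.extend(bank[unit])
                i += size
                break
        else:
            raise KeyError(f"no snippet in the voice bank covers {sounds[i]!r}")
    return output


# Made-up snippets for illustration only.
toy_bank: VoiceBank = {
    ("ay",): [0.1, 0.2],
    ("l", "ah"): [0.3, 0.4, 0.5],
    ("v",): [0.6],
}
audio = synthesize(["ay", "l", "ah", "v"], toy_bank)
```

A larger bank makes longer matching units more likely, which fits the point that more recorded speech gives a better-sounding voice.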
That's not the novel part. 其实这部分并不新奇。
What's novel is how we make it sound like this young woman. 新奇的部分是我们如何制作出听起来 像是这个年轻女性的声音。
This is Samantha. 这是萨曼莎。
I met her when she was nine, and since then, my team and I have been trying to build her a personalized voice. 我第一次见到她的时候,她九岁, 从那时候起,我和我的团队 就一直在努力给她打造一个属于她自己的声音。
We first had to find a surrogate donor, and then we had to have Samantha produce some utterances. 我们首先要找到一个代理说话者, 然后我们让萨曼莎 发出一些声音。
What she can produce are mostly vowel-like sounds, but that's enough for us to extract her source characteristics. 她能做的就是发出一些类似元音的声音, 但这对于我们提取她的声源特征 已经足够了。
What happens next is best described by my daughter's analogy. She's six. 接下来发生的事情最好可以 用我女儿的比喻来描述。她六岁。
She calls it mixing colors to paint voices. 她称其为“用不同的颜色画声音”。
It's beautiful. It's exactly that. 美极了。正是这样。
Samantha's voice is like a concentrated sample of red food dye which we can infuse into the recordings of her surrogate to get a pink voice just like this. 萨曼莎的声音就好比是 浓缩的红色食用色素注入了 她的代理说话者的录音里面, 而产生了这样的粉红色的声音。
(Video) Samantha: Aaaaaah. (视频)萨曼莎:啊……
RP: So now, Samantha can say this. 卢帕尔·帕特尔:那么现在,萨曼莎可以说这样的话。
(Video) Samantha: This voice is only for me. (视频)萨曼莎:这是只属于我的声音。
I can't wait to use my new voice with my friends. 我迫不及待地想跟我的朋友用我 的新声音交流。
RP: Thank you. (Applause) 卢帕尔·帕特尔:谢谢。(掌声)
I'll never forget the gentle smile that spread across her face when she heard that voice for the first time. 我永远不会忘记 当她第一次听到自己的声音的时候, 那个绽放在她脸上的温柔的笑脸。
Now there's millions of people around the world like Samantha, millions, and we've only begun to scratch the surface. 这个世界有上百万 和萨曼莎一样的人,上百万, 而我们其实才刚刚开始。
What we've done so far is we have a few surrogate talkers from around the U.S. 我们到目前为止所做的就是, 我们有来自美国的几个代理说话者,
who have donated their voices, and we have been using those to build our first few personalized voices. 他们捐献了自己的声音, 而我们正在用这些声音 来打造最初的一些个性化的声音。
But there's so much more work to be done. 但是接下来的任务还很重。
For Samantha, her surrogate came from somewhere in the Midwest, a stranger who gave her the gift of voice. 就萨曼莎,她的代理说话者 来自中西部的一个地方, 一个将声音赠送给她的陌生人。
And as a scientist, I'm so excited to take this work out of the laboratory and finally into the real world so it can have real-world impact. 作为一名科学家,我很期待 将这项工作搬到实验室之外, 最终搬进现实世界 并产生真正的影响。
What I want to share with you next is how I envision taking this work to that next level. 我接下来想跟你们分享的是 我对如何将这项工作 推进到下一个层次的展望。
I imagine a whole world of surrogate donors from all walks of life, different sizes, different ages, coming together in this voice drive to give people voices that are as colorful as their personalities. 我想象到一个充满了代理说话者的世界, 他们来自不同的行业,有着不同的体型和年龄, 他们为这个声音计划走到一起, 希望赋予人们 和他们的性格一样丰富多彩的声音。
To do that as a first step, we've put together this website, VocaliD.org, as a way to bring together those who want to join us as voice donors, as expertise donors, in whatever way to make this vision a reality. 实现这个目标的第一步, 我们建立了一个网站:VocaliD.org, 通过这个网站,我们把 愿意以声音捐献者或专业知识捐献者的身份 加入到我们的人们团结在一起, 不管以何种方式,来一起实现这个愿景。
They say that giving blood can save lives. 人们说献血可以拯救生命。
Well, giving your voice can change lives. 那么,捐献您的声音可以改变生命。
All we need is a few hours of speech from our surrogate talker, and as little as a vowel from our target talker, to create a unique vocal identity. 我们需要的仅仅是几小时的 代理说话者的话语, 以及目标说话者的一个小小的元音, 就可以打造一个独特的声音。
So that's the science behind what we're doing. 这就是我们所做的研究背后的科学。
I want to end by circling back to the human side that is really the inspiration for this work. 作为结尾,我还是想回到人的主题, 这也是这项工作的真正灵感来源。
About five years ago, we built our very first voice for a little boy named William. 大约五年前,我们第一次给一个名为威廉的男孩 打造了他的声音。
When his mom first heard this voice, she said, "This is what William would have sounded like had he been able to speak." 当他的妈妈第一次听到这个声音的时候, 她说:“如果威廉 可以讲话, 他的声音就该是这样的。”
And then I saw William typing a message on his device. 然后我看到威廉在他的设备上 打出一条消息。
I wondered, what was he thinking? 我在想,他在想什么?
Imagine carrying around someone else's voice for nine years and finally finding your own voice. 想象一下九年来一直用着 别人的声音, 然后最终找到了你自己的声音。
Imagine that. 想象一下。
This is what William said: "Never heard me before." 威廉说的是: “我从来没有听过我自己的声音。”
Thank you. 谢谢。
(Applause) (掌声)