
Joseph Redmon (2017): How computers learn to recognize objects instantly (电脑是如何学习即时辨识物体的?)

Ten years ago, computer vision researchers thought that getting a computer to tell the difference between a cat and a dog would be almost impossible, even with the significant advance in the state of artificial intelligence. 10 年前, 电脑视觉研究人员认为, 要让电脑辨别猫与狗的差别, 几乎是比登天还难, 即使用了相当先进的 人工智慧都很难办到。
vision:n.视力;美景;幻象;想象力;v.想象;显现;梦见; significant:adj.重大的;有效的;有意义的;值得注意的;意味深长的;n.象征;有意义的事物; artificial intelligence:n.人工智能;
Now we can do it at a level greater than 99 percent accuracy. 现在我们可以把辨别的准确度 提升到 99% 以上。
accuracy:n.[数]精确度,准确性;
This is called image classification -- give it an image, put a label to that image -- and computers know thousands of other categories as well. 这技术叫做图像分类: 给电脑看一张图片, 并给图片贴上标签。 电脑还能辨识出 数千种其他类别的东西。
classification:n.分类;分级;分类法;归类; label:n.标签;标记;谓;唱片公司;v.贴标签于;用标签标明; categories:n.(人或事物的)类别,种类(category的复数)
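Image classification, as described above, comes down to scoring every known category and reporting the best ones. The sketch below is a minimal illustration, not Darknet's code: it assumes a classifier has already produced one raw score (logit) per label, turns the scores into probabilities with a softmax, and returns the top-k labels, which is how a classifier can report a specific breed rather than just "dog". The labels and scores are made up.

```python
import math

def softmax(scores):
    """Turn raw classifier scores (logits) into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, labels, k=1):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda p: p[1], reverse=True)
    return ranked[:k]

# Hypothetical logits for one image; the numbers and labels are invented.
labels = ["malamute", "husky", "tabby cat", "teddy bear"]
scores = [4.0, 2.5, 0.1, -1.0]
print(classify(scores, labels, k=2))
```

A single label is just the case `k=1`; a fine-grained label set (breeds instead of "dog") changes only the list of categories, not the procedure.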
I'm a graduate student at the University of Washington, and I work on a project called Darknet, which is a neural network framework for training and testing computer vision models. 我目前是华盛顿大学的研究生, 我正在做一个专题叫做「Darknet」, 它是一个用来训练及测试 电脑视觉模型的神经网路框架。
neural network:n.神经网络;
So let's just see what Darknet thinks of this image that we have. 所以,让我们来瞧瞧 Darknet 对我们这张照片有什么看法。
When we run our classifier on this image, we see we don't just get a prediction of dog or cat, we actually get specific breed predictions. 当我们在这张照片上 执行我们的分类器, 可以看到我们得到的 不只是狗或猫的预测, 实际上还有特定品种的预测。
classifier:n.[测][遥感]分类器; specific:adj.特殊的,特定的;明确的;详细的;[药]具有特效的;n.特性;细节;特效药; breed:v.繁殖;孕育;培育(动植物);导致;以…方式教育;n.品种; predictions:n.预测,预言(prediction复数形式);
That's the level of granularity we have now. 这就是现在我们电脑的粒度等级。
granularity:n.间隔尺寸,[岩]粒度;
And it's correct. 辨别正确。
My dog is in fact a malamute. 我的狗的确是只雪橇犬。
malamute:n.北极狗;爱斯基摩狗(等于malemute,Alaskan malamute);
So we've made amazing strides in image classification, but what happens when we run our classifier on an image that looks like this? 所以,我们在图像识别上 已经有了很大的进步, 但如果我们用识别器 来辨别这样的照片呢?
strides:n.大步;步幅(stride的复数形式);v.跨过;迈步(stride的第三人称单数形式);
Well ... 嗯……
We see that the classifier comes back with a pretty similar prediction. 可以看到从分类器 得到的预测也相当类似。
And it's correct, there is a malamute in the image, but just given this label, we don't actually know that much about what's going on in the image. 没错,图片中有一只雪橇犬, 但它只给出一个标签, 我们对这张照片的理解 还不是很完整。
We need something more powerful. 我们需要更强的东西。
I work on a problem called object detection, where we look at an image and try to find all of the objects, put bounding boxes around them and say what those objects are. 我正在研究一个问题, 叫做「物件侦测」, 我们把一张照片中的 所有物体都找出来, 用边界框把它们框起来, 然后标示它们是什么东西。
detection:n.侦查,探测;发觉,发现;察觉;
So here's what happens when we run a detector on this image. 我们来看一下当我们在这一张图片上 执行侦测软体时,会发生什么事。
detector:n.探测器;检测器;发现者;侦察器;
Now, with this kind of result, we can do a lot more with our computer vision algorithms. 现在,有了这类的结果, 我们就可以利用电脑视觉演算法, 帮我们做更多的事。
We see that it knows that there's a cat and a dog. 我们可以看到, 电脑知道图片中有一只猫和狗。
It knows their relative locations, their size. 它知道它们彼此的相对位置、 大小。
relative:adj.相对的;有关系的;成比例的;n.亲戚;相关物;[语]关系词;亲缘植物; locations:n.地方;地点;位置;定位(location的复数)
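A detector's output is richer than a single label: each detection pairs a class label and a confidence with a bounding box, and facts such as relative location and size follow directly from the box coordinates. The sketch below assumes a hypothetical `Detection` record with pixel-based (x, y, w, h) boxes; the coordinates are invented for illustration and do not come from the talk.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One detected object (hypothetical record layout)."""
    label: str
    score: float  # detector confidence in [0, 1]
    x: float      # left edge, in pixels
    y: float      # top edge, in pixels
    w: float      # box width, in pixels
    h: float      # box height, in pixels

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)

    @property
    def area(self):
        return self.w * self.h

def describe_pair(a, b):
    """Say whether b sits left or right of a, and compare their box areas."""
    ax, _ = a.center
    bx, _ = b.center
    side = "left of" if bx < ax else "right of"
    ratio = b.area / a.area
    return f"{b.label} is {side} {a.label}, box area {ratio:.2f}x"

# Invented detections for a cat, a dog and a book in the background.
dets = [
    Detection("dog", 0.91, 300, 120, 200, 240),
    Detection("cat", 0.88, 80, 200, 120, 140),
    Detection("book", 0.42, 520, 40, 60, 80),
]
print(describe_pair(dets[0], dets[1]))
```

A downstream system (a robot, a car) would consume records like these rather than a single image-level label.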
It may even know some extra information. 电脑甚至可能知道其它的资讯。
extra:adj.额外的;n.额外的事物;adv.额外;另外;
There's a book sitting in the background. 它也看到了背景中有一本书。
And if you want to build a system on top of computer vision, say a self-driving vehicle or a robotic system, this is the kind of information that you want. 如果你想要建立一个 基于电脑视觉系统的实用系统, 比如说,自动驾驶车或机械人系统, 这类就会是你想要的资讯。
self-driving:自驾; vehicle:n.[车辆]车辆;工具;交通工具;运载工具;传播媒介;媒介物; robotic:adj.机器人的,像机器人的;自动的;n.机器人学;
You want something so that you can interact with the physical world. 你会想要一个可以 与实体世界互动的东西。
interact:v.互相影响;互相作用;n.幕间剧;幕间休息; physical:adj.[物]物理的;身体的;物质的;符合自然法则的;n.体格检查;
Now, when I started working on object detection, it took 20 seconds to process a single image. 当我开始做物件侦测时, 它要花 20 秒才能处理一张图片。
process:v.处理;加工;列队行进;n.过程,进行;方法;adj.经过特殊加工(或处理)的;
And to get a feel for why speed is so important in this domain, here's an example of an object detector that takes two seconds to process an image. 为了让各位体会 为什么这个领域这么讲究速度, 我这边做个执行物件侦测器的示范, 一张照片只要 2 秒的处理时间。
So this is 10 times faster than the 20-seconds-per-image detector, and you can see that by the time it makes predictions, the entire state of the world has changed, and this wouldn't be very useful for an application. 所以,比 20 秒一张的侦测器 快了 10 倍, 各位可以看到, 等它做出预测时, 周围环境早已发生了变化, 对一个应用软体而言, 这样的速度是很鸡肋的。
application:n.应用;申请;应用程序;敷用;
If we speed this up by another factor of 10, this is a detector running at five frames per second. 如果我们再把速度提升 10 倍, 这个侦测器每秒 就可以处理 5 帧画面。
factor:n.因素;要素;[物]因数;代理人;v.做代理商;v.把…作为因素计入; frames:n.[计][电子][通信]帧,[电影]画面;[建][计]框架;眼镜架(frame的复数);
This is a lot better, but for example, if there's any significant movement, 这样好多了, 但,假如, 移动很快的时候……
I wouldn't want a system like this driving my car. 我可不想在我车上装这样慢的系统。
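To see why latency matters for a car, multiply an object's speed by the time the detector spends on one image: that is how far the world moves before the prediction arrives. The speed below (50 km/h) is an assumption chosen for illustration; the per-image latencies are the ones mentioned in the talk.

```python
def distance_moved(speed_m_per_s, latency_s):
    """Distance an object travels while the detector is still processing one image."""
    return speed_m_per_s * latency_s

# Assumption: a car moving at 50 km/h, converted to metres per second.
speed = 50 / 3.6

# Per-image latencies mentioned in the talk, in seconds.
for latency in (20.0, 2.0, 0.2, 0.02):
    print(f"{latency:>6} s per image -> {distance_moved(speed, latency):7.2f} m")
```

At 2 seconds per image, a car at that speed has moved nearly 28 m before the detector reports where it was; at 20 ms it has moved under 30 cm.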
This is our detection system running in real time on my laptop. 这是在我笔电上运行的 即时侦测系统。
real time:adj.实时的;接到指示立即执行的; laptop:n.便携式电脑;笔记本电脑;
So it smoothly tracks me as I move around the frame, and it's robust to a wide variety of changes in size, pose, forward, backward. 我在画面中移动的时候, 它可以很顺畅地追踪着我, 而且对大小、 姿势、 前进、后退等各种变化 都很稳健。
smoothly:adv.平稳地,平滑地;流畅地,流利地; tracks:n.小道;足迹;车辙;轨道;v.追踪;跟踪;(track的第三人称单数和复数) move around:v.走来走去;绕着…来回转; robust:adj.强健的;健康的;粗野的;粗鲁的; variety:n.多样;种类;杂耍;变化,多样化; pose:v.引起; n.装腔作势; (为画像、拍照等摆的)姿势; backward:adj.向后的;反向的;发展迟缓的;adv.向后地;相反地;
This is great. 太棒了。
This is what we really need if we're going to build systems on top of computer vision. 如果我们要建立一个 基于电脑视觉系统的实用系统, 这个才会是我们真正需要的。
(Applause) (掌声)
So in just a few years, we've gone from 20 seconds per image to 20 milliseconds per image, a thousand times faster. 所以,才几年的时间, 我们就从每张照片 20 秒 进步到每张照片只要 20 毫秒, 快了 1000 倍。
milliseconds:n.[计量]毫秒(millisecond的复数形式);
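The speed figures in the talk are easy to check: frames per second is the reciprocal of the time per image, and the overall gain is the ratio of the first and last latencies. The sketch below also compares 20 ms against the frame interval of 30 fps video; the 30 fps rate is an assumed typical camera rate, not a number from the talk.

```python
def fps(seconds_per_image):
    """Images processed per second for a given per-image latency."""
    return 1.0 / seconds_per_image

# Per-image latencies mentioned in the talk, in seconds.
stages = [("first detector", 20.0), ("10x faster", 2.0),
          ("another 10x", 0.2), ("20 ms detector", 0.020)]
for name, seconds in stages:
    print(f"{name:>16}: {seconds * 1000:8.0f} ms/image = {fps(seconds):6.2f} fps")

speedup = 20.0 / 0.020
print(f"overall speedup: {speedup:.0f}x")

# Assumption: a camera delivering 30 frames per second.
frame_budget_ms = 1000 / 30
print(f"30 fps budget: {frame_budget_ms:.1f} ms, 20 ms fits: {20 < frame_budget_ms}")
```

The 2-second detector runs at 0.5 fps, one more factor of 10 gives the 5 fps mentioned earlier, and 20 ms per image (50 fps) fits inside the roughly 33 ms between frames of 30 fps video, which is what "real time" requires.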
How did we get there? 我们是如何办到的?
Well, in the past, object detection systems would take an image like this and split it into a bunch of regions and then run a classifier on each of these regions, and high scores for that classifier would be considered detections in the image. 过去,物件侦测系统, 会把一张像这样的照片, 分割成好几个小区块, 然后在每一个小区块 运行分类器软体, 分类器得分比较高的区块 就会被视为照片中侦测到的物体。
split:v.分离;使分离;劈开;离开;分解;n.劈开;裂缝;adj.劈开的; a bunch of:一群;一束;一堆; regions:n.地区;地域;行政区;左近;(region的复数)
But this involved running a classifier thousands of times over an image, thousands of neural network evaluations to produce detection. 但这样一张图片要执行 好几千次的识别指令、 经过好几千次的神经网路评估 才有办法侦测出来。
involved:adj.有关的; v.涉及; (involve的过去式和过去分词) evaluations:n.[审计]评估(evaluation的复数);
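The older approach can be sketched as a loop over image regions. This is an illustration under stated assumptions, not any specific system: it uses a simple sliding window at three hypothetical sizes with a 16-pixel stride (real systems also used other region schemes, such as region proposals), and `fake_classifier` is a hypothetical stand-in for a neural network. The point is the count: every region costs one classifier evaluation.

```python
def sliding_windows(width, height, size, stride):
    """Yield (x, y, size, size) square regions that fit inside the image."""
    for y in range(0, height - size + 1, stride):
        for x in range(0, width - size + 1, stride):
            yield (x, y, size, size)

def detect_by_regions(image_size, classify_region, threshold=0.5,
                      sizes=(64, 128, 256), stride=16):
    """Run a classifier on every region; keep high-scoring regions as detections.

    Returns (detections, number_of_classifier_calls).
    """
    width, height = image_size
    detections, calls = [], 0
    for size in sizes:
        for box in sliding_windows(width, height, size, stride):
            calls += 1
            label, score = classify_region(box)  # one network evaluation per region
            if score >= threshold:
                detections.append((label, score, box))
    return detections, calls

def fake_classifier(box):
    """Hypothetical stand-in for a CNN: it 'sees' a dog in exactly one region."""
    return ("dog", 0.9) if box == (320, 160, 128, 128) else ("background", 0.1)

dets, calls = detect_by_regions((640, 480), fake_classifier)
print(len(dets), "detection(s) from", calls, "classifier calls")
```

For a 640×480 image these settings make 2,133 classifier calls to find one object, which is the cost a single-pass detector avoids.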
Instead, we trained a single network to do all of detection for us. 但我们不是这样做,我们训练了单一个 网路模型来帮我们完成所有的侦测。
It produces all of the bounding boxes and class probabilities simultaneously. 它可以同时产出 所有的边界框及类别概率。
probabilities:可能性;[统计]概率(probability的复数); simultaneously:adv.同时地;
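What "one network, all boxes and class probabilities at once" means can be shown by decoding a single output tensor. The sketch assumes the layout described in the original YOLO paper (an S x S grid, B boxes per cell, each box as x, y, w, h plus a confidence, and C class probabilities per cell, with S = 7, B = 2, C = 20); later YOLO versions use different layouts. For brevity it keeps only each cell's best class and skips non-maximum suppression, which real detectors apply to remove duplicate boxes. The filled-in values are invented.

```python
S, B, C = 7, 2, 20  # grid size, boxes per cell, classes (original YOLO settings)

def decode(output, threshold=0.2):
    """Turn one S x S x (B*5 + C) prediction into scored boxes.

    Each cell holds B boxes as (x, y, w, h, confidence) followed by C class
    probabilities shared by the cell. Class-specific score = confidence * P(class).
    """
    detections = []
    for row in range(S):
        for col in range(S):
            cell = output[row][col]
            class_probs = cell[B * 5:]
            best_class = max(range(C), key=lambda c: class_probs[c])
            for b in range(B):
                x, y, w, h, confidence = cell[b * 5:(b + 1) * 5]
                score = confidence * class_probs[best_class]
                if score >= threshold:
                    # x, y are offsets inside the cell; convert to image fractions.
                    # w, h are already fractions of the image size in this layout.
                    cx, cy = (col + x) / S, (row + y) / S
                    detections.append((best_class, score, (cx, cy, w, h)))
    return detections

# Invented network output: all zeros except one confident box in cell (3, 4).
output = [[[0.0] * (B * 5 + C) for _ in range(S)] for _ in range(S)]
cell = output[3][4]
cell[0:5] = [0.5, 0.5, 0.3, 0.4, 0.9]  # first box: centered in the cell
cell[B * 5 + 11] = 0.8                 # hypothetical class index 11
print(decode(output))
```

Every box and class score comes from one forward pass; decoding is a cheap loop over 7 × 7 × 2 = 98 candidate boxes instead of thousands of separate classifier runs.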
With our system, instead of looking at an image thousands of times to produce detection, you only look once, and that's why we call it the YOLO method of object detection. 有了我们的系统, 你就不用一张图片看了好几千遍 才能侦测出来。 你只要看一眼 (YOLO), 所以我们简称这个 物件侦测技术为「YOLO」。
So with this speed, we're not just limited to images; we can process video in real time. 所以,有了这样的辨识速度, 我们不只可以侦测图片; 还可以处理即时的影片。
limited:adj.有限的; n.高级快车; v.限制; (limit的过去分词和过去式)
And now, instead of just seeing that cat and dog, we can see them move around and interact with each other. 现在各位看到的不是 猫、狗的静态图片, 而是有牠们在移动、 互动的动态影片。
This is a detector that we trained on 80 different classes in Microsoft's COCO dataset. 这是我们用微软 COCO 资料集里的 80 个不同类别 训练出来的侦测器。
COCO:n.椰子(果);椰子树(等于coconut palm);脑袋;adj.椰子壳的纤维所制的;(此处为 Microsoft COCO: Common Objects in Context) dataset:na.数据集;数传机;
It has all sorts of things like spoon and fork, bowl, common objects like that. 它包含各种东西, 像是汤匙、叉子、碗 这类的日常用品。
It has a variety of more exotic things: animals, cars, zebras, giraffes. 它还有很多奇妙的东西: 动物、车子、斑马、长颈鹿。
exotic:adj.异国的;外来的;异国情调的;
And now we're going to do something fun. 现在我们要进行一件好玩的事。
We're just going to go out into the audience and see what kind of things we can detect. 我们会进到观众席, 去看看能辨识到哪些东西。
Does anyone want a stuffed animal? 有谁要填充娃娃?
stuffed animal:n.填充玩具动物;(动物造型的)布绒玩具;动物标本;
There are some teddy bears out there. 这边还有一些泰迪熊。
teddy:n.泰迪玩具熊
And we can turn down our threshold for detection a little bit, so we can find more of you guys out in the audience. 我们现在降低一下 对侦测结果的精确度的要求, 这样我们可以在观众席中 找到更多东西。
threshold:n.入口;门槛;开始;极限;临界值;
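Lowering the detection threshold is a filter on confidence scores: anything scoring at or above the cut-off is reported. The sketch below uses made-up (label, confidence) pairs, not output from the demo, to show how a lower threshold surfaces more objects, at the cost of admitting more low-confidence and possibly wrong detections.

```python
def above_threshold(detections, threshold):
    """Keep only detections whose confidence is at least `threshold`."""
    return [d for d in detections if d[1] >= threshold]

# Hypothetical (label, confidence) pairs from one frame of the audience.
frame = [("person", 0.93), ("teddy bear", 0.61), ("backpack", 0.34),
         ("stop sign", 0.22), ("person", 0.18)]

for t in (0.5, 0.25, 0.15):
    kept = above_threshold(frame, t)
    print(f"threshold {t}: {len(kept)} detections -> {[label for label, _ in kept]}")
```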
Let's see if we can get these stop signs. 我们来看看能不能侦测到停止标志。
We find some backpacks. 我们有侦测到一些背包。
backpacks:n.背包(backpack的复数);双肩背包;
Let's just zoom in a little bit. 现在把镜头拉近一点。
And this is great. 这真的很厉害。
And all of the processing is happening in real time on the laptop. 所有的侦测流程 都可以在笔电里即时呈现。
processing:v.加工;处理;审核;数据处理;v.列队行进;缓缓前进;(process的现在分词)
And it's important to remember that this is a general-purpose object detection system, so we can train this for any image domain. 更重要的是, 这是一个通用的物件侦测系统, 我们可以训练它 辨别任何领域的照片。
The same code that we use to find stop signs or pedestrians, bicycles in a self-driving vehicle, can be used to find cancer cells in a tissue biopsy. 同样的程式码, 放在自动驾驶车里, 可以侦测到停止标志、行人、 脚踏车, 但放到组织切片 就可以侦测出癌症细胞。
pedestrians:n.行人(pedestrian的复数); cancer:n.癌症;恶性肿瘤; tissue:n.纸巾,手巾纸;(人、动植物细胞的)组织; biopsy:n.活组织检查;切片检查法;v.活组织检查;切片检查法;
And there are researchers around the globe already using this technology for advances in things like medicine, robotics. 现在全球有很多研究人员 已经开始在使用这项技术 做进一步的研究, 像是医药、机械人领域。
technology:n.技术;工艺;术语; robotics:n.机器人学;
This morning, I read a paper where they were taking a census of animals in Nairobi National Park with YOLO as part of this detection system. 今天早上,我读到一篇论文, 在奈洛比国家公园里, 他们要对动物们进行统计调查, YOLO 就是其使用的 侦测系统的一部分。
census:vt.实施统计调查;n.人口普查,人口调查; National Park:n.国家公园;
And that's because Darknet is open source and in the public domain, free for anyone to use. 而这一切都是因为 Darknet 是开放原始码, 属于公众领域, 任何人都可以免费使用。
source:n.来源;水源;原始资料; public domain:n.(用于不受版权保护的财产)公有领域;
(Applause) (掌声)
But we wanted to make detection even more accessible and usable, so through a combination of model optimization, network binarization and approximation, we actually have object detection running on a phone. 但我们希望侦测系统 可以更亲民、更好用, 所以在结合模型优化、 网路二值化及近似运算后, 我们终于可以在手机上侦测物件。
accessible:adj.易接近的;可进入的;可理解的; combination:n.结合;组合;联合;[化学]化合; optimization:n.最佳化,最优化; approximation:n.[数]近似法;接近;[数]近似值;
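The talk does not say exactly how the network was binarized for the phone. One well-known scheme, used for the binary weights in XNOR-Net, replaces a layer's real-valued weights W with alpha * sign(W), where alpha is the mean absolute weight. Each sign takes 1 bit instead of 32 for a float, and a dot product becomes additions and subtractions followed by one multiply. The sketch below assumes that scheme and uses invented numbers.

```python
def binarize(weights):
    """Approximate real-valued weights W by alpha * sign(W).

    alpha = mean(|W|); the sign vector can be stored as 1 bit per weight.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

# Invented weights and inputs for a single neuron.
weights = [0.5, -0.4, 0.3, -0.6]
inputs = [1.0, 0.5, 2.0, -1.0]

alpha, signs = binarize(weights)
exact = dot(weights, inputs)
approx = alpha * dot(signs, inputs)  # additions/subtractions, then one multiply
print(f"exact {exact:.3f}, binarized approx {approx:.3f}")
```

The approximation is not exact (1.575 versus 1.5 here), which is why such schemes are usually combined with retraining so the network learns to work with binary weights.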
(Applause) (掌声)
And I'm really excited because now we have a pretty powerful solution to this low-level computer vision problem, and anyone can take it and build something with it. 而我真的相当兴奋,因为我们现在 在低阶的电脑影像处理问题上 有了相当强力的解决方式, 任何人都可以拿去并创造一些东西。
solution:n.解决方案;溶液;溶解;解答; low-level:adj.低水平的;低级别的;
So now the rest is up to all of you and people around the world with access to this software, and I can't wait to see what people will build with this technology. 所以,接下来就看各位 以及全世界所有人 用这个软体大展身手了, 我真的等不及想看看你们 用这项科技所做出来的产品。
Thank you. 谢谢。
(Applause) (掌声)