Efficient Computing for Deep Learning, Robotics, and AI (Vivienne Sze) | MIT Deep Learning Series

Lecture by Vivienne Sze in January 2020, part of the MIT Deep Learning Lecture Series.

Today we have Vivienne Sze here with us. She's a professor here at MIT working in the very important and exciting space of developing energy-efficient and high-performance systems for machine learning, computer vision, and other multimedia applications. This involves the joint design of algorithms, architectures, circuits, and systems to enable optimal trade-offs between power, speed, and quality of results. One of the important differences between the human brain and AI systems is the energy efficiency of the brain, and Vivienne is a world-class researcher at the forefront of discovering how we can close that gap. So please give her a warm welcome.

I'm really happy to be here to share some of our research and an overview of this area of efficient computing. What I'm going to be talking about today is going to be a little bit broader than just deep learning. We'll start with deep learning, but we'll also move on to how we might apply this to robotics and other AI tasks, and why it's really important to have efficient computing to enable a lot of these exciting applications. I also want to mention that a lot of the work I'm going to present today was not done by myself but in collaboration with a lot of folks at MIT, and if you want access to the slides, they are available on our website.

Given that it's the deep learning lecture series, I want to first start out talking a little bit about deep neural nets. We know that deep neural nets have generated a lot of interest and have many very compelling applications, but one of the things that has come to light over the past few years is an increasing need for compute. OpenAI actually showed that there's been a significant increase in the amount of compute required to train deep learning models. It has grown exponentially over the past few years, in fact by over 300,000 times, in terms of the amount of compute we need to drive increases in accuracy on a lot of the tasks we're trying to achieve.

At the same time, the environmental implications of all this processing can be quite severe. If we look, for example, at the carbon footprint of training neural nets and compare it to the carbon footprint of flying across North America from New York to San Francisco, or to the carbon footprint of an average human life, you can see that neural networks can be orders of magnitude greater. So the carbon footprint implications of computing for deep neural nets can be quite severe as well.
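As a rough sanity check on that growth figure, a factor of 300,000 over the roughly six years covered by OpenAI's analysis (2012 to 2018, a window taken from the published analysis rather than stated in the talk) implies a doubling time of only a few months. A minimal back-of-envelope sketch in Python:

    import math

    # Compute growth reported by OpenAI: ~300,000x increase in
    # training compute between 2012 and 2018.
    growth_factor = 300_000
    window_months = 6 * 12  # ~six-year window (assumption, see above)

    # Number of doublings needed to reach the growth factor,
    # then the implied doubling time in months.
    doublings = math.log2(growth_factor)       # ~18.2 doublings
    doubling_time = window_months / doublings  # ~4 months

    print(f"{doublings:.1f} doublings -> one doubling every "
          f"{doubling_time:.1f} months")

OpenAI's own headline number for this trend is a doubling roughly every 3.4 months; this rough version lands in the same range.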
Now, that has a lot to do with compute in the cloud. Another important direction is moving the compute from the cloud to the edge itself, into the device where a lot of the data is being collected. Why would we want to do that? There are a few reasons.

First of all, communication: in a lot of places around the world you might not have a very strong communication infrastructure, so you don't necessarily want to rely on a communication network for these applications. Removing the tether to the cloud is important.

Another reason is that we often apply deep learning to applications where the data is very sensitive. You can think about things like health care, where you're collecting very sensitive data, so privacy and security are really critical; rather than sending the data to the cloud, you'd like to bring the compute to the data itself.

Finally, another compelling reason for bringing the compute into the device or into the robot is latency. This is particularly true for interactive applications, things like autonomous navigation, robotics, or self-driving vehicles, where you need to interact with the real world. You can imagine that if you're driving very quickly down the highway and you detect an obstacle, you might not have enough time to send the data to the cloud, wait for it to be processed, and get the instruction sent back. So again, you want to move the compute into the robot or into the vehicle itself.

Hopefully this establishes why we want to move the compute to the edge, but one of the big challenges of doing processing in the robot or in the device has to do with power consumption itself. If we take the self-driving car as an example, it's been reported that it consumes over 2,000 watts of power just for the computation, just to process all the sensor data it's collecting. This generates a lot of heat and takes up a lot of space; you can see in this prototype that all the compute is placed in the trunk, and it often needs water cooling. So this can pose big cost and logistical challenges for self-driving vehicles.

You can imagine this becomes much more challenging if we shrink down the form factor of the device to something that is portable in your hands; think about smaller robots or something like your smartphone. For these portable devices you have very limited energy capacity, because the battery itself is limited in terms of its size, weight, and cost, so you can't have a very large amount of energy on the device itself. Secondly, the embedded platforms currently used for these applications tend to consume over 10 watts, which is an order of magnitude higher than the power consumption you would typically allow for handheld devices. In handheld devices you're typically limited to under a watt due to heat dissipation; you don't want your cell phone to get super hot, for example.
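To make those power budgets concrete, here is a minimal sketch in Python. The battery capacity is an illustrative assumption (roughly that of a modern smartphone), not a figure from the talk:

    # Illustrative battery capacity: ~15 Wh, roughly a modern
    # smartphone battery (assumption, not a figure from the talk).
    battery_capacity_wh = 15.0

    # Power draw of a typical embedded deep learning platform
    # (>10 W, per the talk) versus a handheld budget (<1 W).
    for power_w in (10.0, 1.0):
        runtime_h = battery_capacity_wh / power_w
        print(f"At {power_w:>4.1f} W: battery lasts ~{runtime_h:.1f} h")

At 10 W, the battery would be drained in about an hour and a half before the phone did anything else, which is one reason handheld devices are budgeted at under a watt.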
In the past decade or so, what we would do to address this challenge is wait for transistors to become smaller, faster, and more efficient. However, this has become a problem over the past few years: transistors are not getting more efficient. Moore's law, which typically made transistors smaller and faster, has been slowing down, and Dennard scaling, which made transistors more efficient, has also slowed down or ended; you can see here that over the past 10 years this trend has really flattened out. This is a particular challenge because we want more and more compute to drive deep neural network applications, but the transistors are not becoming more efficient. So what we have to turn to in order to address this is specialized hardware, to achieve the significant speed and energy-efficiency gains that we require for a particular application.

When we talk about designing specialized hardware, this is really about thinking about how we can redesign the hardware from the ground up, targeted specifically at the AI, deep learning, and robotics tasks that we're really excited about. This notion is not new; in fact, it's become extremely popular over the past few years, and a large number of startups and companies have focused on building specialized hardware for deep learning. The New York Times reported, I guess two years ago, that a record number of startups are looking at building specialized hardware for AI and for deep learning. So we'll talk a little bit about what specialized hardware looks like for th