Inside the Brain Science of Driver-Assist Technology


A conversation between Rick Duchalski, CarriersEdge, and Dr. Benjamin Wolfe, University of Toronto’s Applied Perception and Psychophysics Lab. 

This conversation has been edited down for conciseness. To listen to the entire conversation, please check out the August episode of the Inside Webinar series on the CarriersEdge YouTube channel.

My lab, the APPLY lab, sits in an interesting middle ground between vision science, which is what I’m originally trained in (hence why I hadn’t started studying driving until after my Ph.D.), and human factors and driver behavior. Vision science is the study of how we perceive, how our brains interpret visual input; human factors and driver behavior, I think, need little explanation for this audience. What I’m really interested in is how studying driving teaches us about how we see and how we interact with the world. The nice thing about doing that work is that it puts me in a middle ground where I can draw on what we think we know about the mind and the brain, but look at driving as a behavior that we all do all the time, and ask: how do we do it? How can we make it safer? And what are the gaps in our knowledge that might be impairing our ability to improve safety and improve design?

So, what that means in practice is that we build laboratory experiments using real road video. We are the lab that created one of the major corpora of that kind of road video. And we really ask, you know, how quickly can you understand your environment? What happens when we distract you? What happens when you look at different places in the video? There are a lot of questions we can ask safely in the lab that are very difficult to ask either in a simulator or on the road. We do this in a way that is both informative to an applied practitioner community and relevant to a more fundamental research community. We tend to call this doing “use-inspired research,” which is to say we look to the world for things that we do – as opposed to things that live in the textbook – drag them into the lab, and try to come up with good experiments that let us understand what’s going on. Part of that work is not just, you know, publishing a paper that a few of my friends and colleagues might read, but communicating those results to communities that will use them and benefit from them.

Yes, I would say that is really the core of the lab’s interests: the question of, okay, we all know that’s exactly what we’re doing behind the wheel, but how do we do it? Yes, you look at the road. What does it mean for you to process that visual information? How quickly can you do it?

A piece of work we did a few years ago, and here I’m going to sort of segue between the vision side and the automated driving side, got at the question of what’s the worst-case scenario for a distracted driver or a driver in a semi-autonomous vehicle. Which is to say, you’re looking away from the road, and let’s say you’ve been looking away from the road for long enough that you don’t know anything about what’s going on around you. We’ll put a pin in whether there are any actual autonomous systems being sold that would safely let you do this. You look back at the road: how long does it take you to understand that the giant thousand-pound moose is walking into the road? We’ve built that experiment in the lab. And the answer is that you can understand that scene well enough, with no prior knowledge of that particular environment, very quickly. You can do it in the time it takes you to make a couple of eye movements, so maybe half a second at most. That’s not enough for you to be a safe driver, but it says some interesting things about how quickly you can update your representation of the world, which is what, of course, sits underneath processes like situation awareness.
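
To put that half second in concrete terms, here is a small back-of-the-envelope Python sketch of how far a vehicle travels while the driver is rebuilding their picture of the road; the speeds chosen are illustrative and are not figures from the conversation.

```python
# Rough illustration: how far a vehicle travels while the driver spends
# ~0.5 s (a couple of eye movements) updating their representation of the road.
# The speeds below are illustrative examples, not figures from the conversation.

def distance_travelled(speed_kmh: float, seconds: float = 0.5) -> float:
    """Distance covered in metres at a given speed over `seconds`."""
    metres_per_second = speed_kmh * 1000 / 3600
    return metres_per_second * seconds

for speed_kmh in (50, 80, 100):  # e.g., city street, rural road, highway
    print(f"{speed_kmh} km/h -> {distance_travelled(speed_kmh):.1f} m in 0.5 s")
```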

There are a ton of them. Initially, one of the big ones is thinking about, just from a visual input standpoint, how do we gather information? Our intuition might be, oh, if I want to know about something, I’m going to look at it, and that makes some degree of sense on the surface. The issue there is that if you need to make an eye movement to see something, to be aware of it, that directed eye movement, what I would call a saccade (that’s the term we use for these), takes time, and you can only make so many of them. Behind the wheel, you don’t want to be making lots of eye movements that you have to think about. You don’t think about your own eye movements most of the time. You can make two or three of these per second. One of the big questions we’ve been asking is, do you have to actually look at a particular object, or can you understand the scene well enough with visual input from around where you’re looking, what we would call peripheral vision?

There are a whole lot of ideas in this space. One of the things that we’re actively studying is basically, where do you look? When do you look? How do you acquire information, both from right where you’re looking and from your surroundings?

That’s an incredibly cool question. My intuition, as someone who studies the acquisition process that these drivers are using, is that this is probably learnable, but it’s probably hard to turn into training. So unfortunately, I probably don’t have a “here’s the silver bullet for how to turn people into perfect or near-perfect drivers.” At a guess – and this is based on my lab’s research and my own scholarship – what they’re doing is getting very good at using peripheral vision. They’re getting very good at monitoring their environment without having to make a lot of eye movements, without having to make a lot of head movements, and they’re learning to interpret what would otherwise be kind of ambiguous information around them in ways that help guide their behavior on the road.

There’s actually quite a lot of precedent for this. There are 50-plus years of on-road eye tracking data showing that as people become more skilled drivers, they make relatively fewer large eye movements away from their lane of travel. That turns out to be a very well-known signature of expertise. What’s very interesting is when we’ve looked across other categories of visual expertise. Obviously, driving is a learned expert skill; in fact, our lives are full of them. I always like to describe it as, you know, probably all of us are expert pedestrians, even if we don’t think about it. It’s the same kind of skill: something we’ve learned to be extremely good at, that we don’t think about, and that’s also driven by information from around where we’re looking, from peripheral vision. So, my guess is some of the key to this is probably figuring out how to train drivers to interpret somewhat ambiguous information, to have enough experience of variable environments to know what that means, and to act accordingly. Whether that’s something that we could turn into a new training system tomorrow, perhaps not, but that might give us the starting point to think about how to do that.

Cognitive load and general driver distraction is, quite justifiably, a huge question for safety. I would say that there’s some complexity here that’s worth thinking about. The simple way to think of distraction in a driving context is that you should try to avoid it at all costs. That seems like the nice, simple take-home message. My view of things, coming out of a cognitive psychology background, is that it would be lovely if we could single-task on driving, but driving is really a constellation of tasks flying in loose formation. The idea of actually mono-tasking on the driving task is probably at odds with reality. So, we’re always multitasking to some degree.

What’s interesting in my mind, and what’s a critical question in safety, is to pick apart what we mean by distraction. One piece is the classic distraction, often studied as audio-verbal distraction: you’re on a cell phone, you’re on the radio, something like that. That’s, for instance, why in Ontario we forbid people to hold their phone while driving and require hands-free operation. But there’s an entire additional layer of distraction that tends to get lumped in here that we don’t think about, and that’s the question of visual distraction. If you, the driver, must engage with a control or a display in the cab, that’s going to take your eyes off the road. We see this all the time, whether it’s in passenger vehicles or in commercial vehicles. This is basically ubiquitous, and yet we don’t think of it as being as much of a problem as it really is.

If you’ve taken a ride share, of course, you’ve seen this with the driver’s smartphone on the windshield or on the dash. If you’re thinking about a commercial vehicle, you probably have not just GPS guidance but additional displays in the vehicle; any of those that draw the driver’s gaze off the road will basically impart what I would call visual distraction. I’m not saying that this is necessarily a bad thing, but it’s something that we need to think about as a somewhat separate problem, rather than lumping it under canonical distraction as one big umbrella.

If we think about visual distraction, you’re moving your gaze away from the road. That’s reducing the driver’s ability to actually monitor their environment, build a representation of it, and really understand what’s going on around them, whereas with audio-verbal distraction, in theory, the driver keeps their eyes on the road. Both will draw on the available mental and cognitive resources that support safe driving, but they’ll do it in different ways, and the limitations they impose will differ.

It isn’t inherently problematic. You want to be doing some of it. Every time you move your eyes, your view of the world changes, and one of the things your brain does is stitch those different views of the world together into your continuous perceived experience of the world. In human factors, we tend to think of this as situation awareness. I tend to think of it as a more visual process, but that’s fundamentally what you’re doing. You will always be moving your eyes around, gathering information that you think you need. So yes, absolutely, you’re going to look at the mirrors. You might look at the center console, you might look at the speedometer, and all of this is part and parcel of driving.

Where things can go awry is when we add additional places for the driver to look. So, if you add, let’s say, a smartphone, and it’s picking up all of the text messages from your friends and family, maybe you shouldn’t be trying to read those on the road.

You’re always going to be making these eye movements, and the question is how we make sure the driver has the information they need; we could almost call it the information they need versus the information they want.

My sense is, provided there’s enough training for that kind of new system, what you will wind up with is probably a better representation of the environment around the vehicle than you had before. I would bet that the transition from mirrors, with the knowledge of their blind spots and of what you can’t see, to these camera-based displays is probably a somewhat rough one. I’m thinking back to a conversation I had with representatives of one of the major transport unions in the United States, who were interested in exactly this tech from a peripheral vision standpoint, and in whether we thought it was a good idea. My view, as someone who studies the visual components of driving, is that yes, this could be quite useful, but you wouldn’t necessarily want to drop someone cold into it, because it’s not how they’re going to be expecting to acquire the information that they need. That’s not an unsolvable problem; my guess is you’d adapt readily, but it would definitely be sufficiently different from what everyone would expect in the vehicle that it would take some significant training.

Yes. I sit in a very interesting middle ground. On one side is very fundamental vision research, which tends not to get consulted by practitioner communities at all; generally, people look at the very fundamental work and go, this is way too basic for me. On the other, I’m also not quite in the main human factors community, because I’m interested in these underlying mechanisms and processes. And so, I would say, yes, people should be talking to me and my lab, because we’re answering particular questions that often sit underneath a lot of what people want to build. I’m often very interested in what humans are actually capable of. What can we do? What can’t we do, and why? Having that understanding and that approach really means that I’m able to answer the questions they have in ways they might not get from my colleagues in engineering rather than in psychology.

Something like cruise control, which has been around for decades, is very simple. Something like adaptive cruise control (ACC) requires the vehicle to have some monitoring of the road environment, and if it also has some ability to maintain lane keeping, that takes a camera array. That’s something we’ve seen coming to passenger vehicles in the last decade. We have everything from ACC and systems like that all the way to what we’re getting right now in the passenger vehicle space, which is what I would call mid-level autonomy. This is principally SAE levels two and three. Most of what we’re seeing is SAE level two, where the vehicle itself can be in operational control, with the driver assumed to be in an active monitoring role. So, if you’re thinking of, you know, Tesla Autopilot, that’s an SAE level two system. There are the very beginnings of publicly available systems approaching SAE level three, like GM Super Cruise, where the assumption is that the vehicle can be in operational control and notify the driver when they need to take over. That’s a particularly interesting system from my vantage point, because one of the ways they’re trying to make it safe is not just mapping roads in advance and geofencing where it functions, but also yoking it to in-cab eye tracking of the driver.

Those are where actual autonomy lives. The SAE levels go up to four and five, which are, I think, wonderfully optimistic views of how difficult it is to solve vehicular automation. Level four would be the driver very rarely has to intervene. Level five, I usually refer to as “what driver?” That’s the perfect autonomous taxi. We’re not seeing that for a while.
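
A compact way to keep the levels straight, as they come up in this conversation, is the sketch below; the wording is a paraphrase of the discussion above, not the official SAE J3016 definitions.

```python
# SAE levels as characterized in this conversation (paraphrased; for the
# authoritative definitions, see SAE J3016).
SAE_LEVELS = {
    2: "Vehicle can be in operational control; driver assumed to be actively monitoring (e.g., Tesla Autopilot).",
    3: "Vehicle can be in operational control and notifies the driver when they need to take over.",
    4: "Driver very rarely has to intervene.",
    5: "No driver at all; the perfect autonomous taxi.",
}

for level, description in SAE_LEVELS.items():
    print(f"SAE Level {level}: {description}")
```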

What I would say for autonomous trucks is, in a highway environment, a grade-separated road environment, you have a better chance of making that work than you do otherwise. As soon as you want to move into a non-highway environment, it becomes an enormously difficult problem. This is the problem that every passenger vehicle automation company has run aground on. You’ll notice that there are no commercial products right now sitting at even level four, or even high level three. The idea that the robot could load up your trailer, drive out of the facility, get itself on the highway, drive on the highway, get off the highway, back itself into the loading bay, and the next robot can unload – that sounds like good science fiction to me, and not perhaps likely, given our current state of reality. We must actually understand what people will do, what human behavior looks like, and what the limits of that mean for the technology and how to use it safely.

There’s a lot of complexity baked into that. One of the pieces here is the reliability of these systems. This is something we should think about not just when it comes to driver assistance tech, like, say, emergency braking or ACC, but even for something as simple as driver alerts. Almost everyone’s vehicle these days is going to have blind spot warning. We probably also all have the experience of that going off randomly: it thinks there’s something there, and there obviously isn’t. The reliability of those cues needs to be significantly better than what the driver themselves will have. This is based on work that my lab has done on questions of cueing, which is a classic question in cognitive psychology. What we found is that if these systems are insufficiently reliable, if they’re not better than you, in fact worse than you, you ignore them, you turn them off, which is actually a huge issue from a trust and deployment standpoint.

If they’re mandatory systems, if you can’t turn them off, what you have done is frustrate the driver and probably reduce safety, even though you have built a system that might be designed to keep them safe. That doesn’t mean you don’t build it, and it doesn’t mean you don’t deploy it, but it means thinking not just about the reliability of the technology, but also about the second-order consequences of deploying it badly.

I think that’s the big piece. How do you build trust in these systems? That comes down to classic problems in human factors and automation: you want reliable systems, you want transparent systems, and the driver needs to know what they’re doing. You need systems that behave in an expected manner.

Absolutely. I always think of this as looking for pain points. I think it’s remarkably important to test that new tech with a small group and get a sense of what failure modes they encounter, and they will encounter them, regardless of how well a new product is designed. It can be tested, and tested by lots of very careful, diligent people, and when you put it into the field, people are still going to find ways to use it and misuse it in ways designers never conceived of. So yes, absolutely: testing it in small groups, really listening to those users for what they can tell you about that technology and their experience with it, and using that to adapt the training and the large-scale deployment is undoubtedly enormously helpful, because it really will support that rollout, and, frankly, if you’re the safety manager, it’ll make your job easier.

That’s a piece of it. Let’s say you have an auditory or a visual alert in the vehicle. One, you need those alerts to be sufficiently distinct and recognizable that the driver knows what they’re being told. If it’s a bunch of beeps that all sound the same, the driver will merely be confused and not particularly helped by that alert. If the alert itself is provided too early, they won’t necessarily know what it pertains to. Let’s say we have cameras monitoring the road environment around the driver, trying to support the driver and their awareness of the environment, and let’s say that system picks up on something it thinks is going to happen in a few seconds. There’s enough change on the roadway over the space of multiple seconds that it may be very hard for the driver to know what to respond to.

On the flip side, you could have an alert that was too close in time to something changing. If something is changing, whether it’s in the vehicle itself or outside, and the driver gets an alert with half a second built into that system to respond, that’s perhaps a little ambitious. So that’s one of the big issues in that space.

Another piece of this is that the reliability of these cues is key, whether it is driver alerting or anything else. If they’re not sufficiently valid – and this is actually one of these interesting places my lab has run into – what we think of as a valid cue in the fundamental research world is 80 per cent accurate. No one would tolerate an 80 per cent accurate cue in any vehicle they ever drove. You would turn that system off instantly, because 20 per cent of the time it wouldn’t be of any utility. Imagine that on the road: it’s an enormous safety liability, because leaving the driver without reliable information about surrounding vehicles is going to increase your chance of a collision enormously. What’s interesting with all these systems, to me and from the work that my lab has done, is that we need to know the range in which we actually need to deploy this. How much better can we make the driver with the right alerts at the right time? For something like a simple auditory cue – and this is literally based on a study we ran last year – if we’re looking for dangerous things on the road, which is often what my lab studies, people were about 92 per cent accurate in the lab. If the cue was any less accurate than that, it didn’t matter. You ignored it because it didn’t help you.
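
The qualitative rule in that result can be sketched very simply: a cue only earns the driver’s attention if it is at least as reliable as the driver already is. The 92 per cent baseline comes from the study described above; the candidate cue accuracies in this Python sketch are illustrative values.

```python
# Minimal sketch of the point above: a cue that is less reliable than the
# driver's own hazard detection tends to be ignored or switched off.
DRIVER_BASELINE = 0.92  # lab accuracy at spotting road hazards, per the study described above

def cue_is_useful(cue_accuracy: float, baseline: float = DRIVER_BASELINE) -> bool:
    """A cue helps only if it is at least as accurate as the driver."""
    return cue_accuracy >= baseline

for cue_accuracy in (0.80, 0.90, 0.95):  # illustrative cue accuracies
    verdict = "worth attending to" if cue_is_useful(cue_accuracy) else "likely ignored or turned off"
    print(f"Cue accuracy {cue_accuracy:.0%}: {verdict}")
```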

Knowing what drivers can already do is then going to dictate how you support them and figure out what their needs truly are.

It depends on what you want to do with it. If what you want to know is when your driver looked away from the forward roadway, or when they weren’t looking at the windshield or at the mirrors or something like that, a system like that can probably tell you. What it can’t tell you is what they were doing. All it can say is, hey, they were looking here or here. What a lot of people have wanted to do with these systems, and where things get very, very tricky, is to use them to build a computational model of the driver’s situation awareness, and that is something those systems cannot do. Doing that requires knowing where the driver was looking in quite a lot of detail, and those systems are not capable of that. You need to know what was in the driver’s environment, so you need that big camera array facing out, as well as the computer vision systems to interpret it, and you need some ability to interpret the driver’s internal mental state and try to figure out, okay, why did they do this? Were they looking down at a display in the vehicle? Did they decide to be reckless and pull out their phone and text?

Without knowing all of that, you can’t build that whole model. If you just want a crude metric of attention, like gaze on or off the roadway, yes, that totally works. When you want to go beyond that, no. That’s a problem that people in the fundamental research space have been trying to solve since the 1950s; people have been trying to find ways around this since at least 1957, with some of the earliest modern eye tracking research. It is a continuous fight in my research world, and the answer right now is that if someone is trying to sell you a magical AI-based gaze monitoring system that claims “I know exactly what your driver did and when they did it,” they’re probably selling you a certain amount of snake oil, and I hope you have a squeaky snake.
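
For a sense of what that crude on-road/off-road metric looks like in practice, here is a minimal Python sketch; the gaze samples, the 0.5-second sampling period, and the variable names are all hypothetical, not taken from any particular driver-monitoring product.

```python
# Crude gaze metric: total off-road time and longest single off-road glance,
# computed from a hypothetical stream of on-road/off-road gaze samples.
from itertools import groupby

SAMPLE_PERIOD_S = 0.5  # assumed time between gaze samples (hypothetical)

# True = gaze on the forward roadway, False = gaze elsewhere (mirror, display, phone...)
gaze_on_road = [True, True, False, False, False, True, True, False, True]

off_road_time = gaze_on_road.count(False) * SAMPLE_PERIOD_S
longest_off_road_glance = max(
    (sum(1 for _ in run) * SAMPLE_PERIOD_S
     for is_on_road, run in groupby(gaze_on_road) if not is_on_road),
    default=0.0,
)

print(f"Total off-road time: {off_road_time:.1f} s")
print(f"Longest off-road glance: {longest_off_road_glance:.1f} s")
# Note: this says nothing about *why* the driver looked away, which is
# exactly the limitation described above.
```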

I would say think of them not only as people who will use this because you say to, but as people you really want to listen to, both for what they like and what they don’t like, and to understand where that’s coming from. They won’t necessarily be able to tell you what they really need, but what they do tell you might help you understand it. And that’s a very fine distinction.

The goal with any new technology is to fold it into their expertise, and that’s going to take time, that’s going to take training, that’s going to take adaptation to the new system and understanding that that can’t and won’t be instant.


About

Doctor Benjamin Wolfe is an assistant professor of psychology at the University of Toronto, where he is the Director of the Applied Perception and Psychophysics Laboratory. His trainees study how drivers acquire visual information from their surroundings, how attention and inattention impact driver awareness, and the role of cognitive factors in road safety. Doctor Wolfe received his B.A. in Psychology from Boston University in 2008 and his Ph.D. in Psychology from the University of California, Berkeley in 2015. He then completed postdoctoral training at MIT from 2015 to 2020, in the Center for Transportation and Logistics and the Computer Science and Artificial Intelligence Laboratory, where his work was funded by the Toyota Research Institute. Since joining the faculty at the University of Toronto in 2021, his research and scholarship have focused on perceptual and cognitive questions at the heart of safe driving, and he has received funding from the Natural Sciences and Engineering Research Council, the University of Toronto Connaught Fund, and the University of Toronto XSeed Program, bringing together fundamental research in human perception and cognition with human factors research in driver behavior.

CarriersEdge is a leading provider of online driver training for the trucking industry. With a comprehensive library of safety and compliance courses, supported by advanced management and reporting functions, CarriersEdge helps over 2000 fleets train their drivers without sacrificing miles or requiring people to come in on weekends. CarriersEdge is also the creator of the Best Fleets to Drive For program, an annual evaluation of the best workplaces in the North American trucking industry.