NDRLDA
NeuroDRL: Adaptive Driver Vigilance Control and Real-Time Alerting

A Deep Reinforcement Learning framework for real-time drowsiness detection with adaptive multi-level intervention, driven by multi-modal physiological signals.

System Architecture

Figure 1: Architecture of the SEED-VIG dataset-driven driver vigilance monitoring and warning system. The framework comprises four layers: (a) multi-modal input signals including EEG spectral features, frontal EEG, EOG, and PERCLOS labels; (b) feature processing pipeline producing a 105-dimensional state vector; (c) a PPO-based DRL agent with Actor-Critic networks for adaptive decision-making; (d) a graduated intervention strategy with closed-loop feedback returning st+1 to the state representation layer.

Section 3.1  |  Table 1

Layer 1 — Multi-Modal Input

Four complementary signal modalities are captured simultaneously, each providing an independent window into the driver's neurophysiological state.

| Signal Modality | Channels | Frequency / Features | Dimensions | Extraction Method |
|---|---|---|---|---|
| EEG Multi-Band | 14-ch (full scalp) | δ 1–4 Hz, θ 4–8 Hz, α 8–13 Hz, β 13–30 Hz, γ 30+ Hz | 70-d | PSD (Welch) / differential entropy per band–channel pair |
| Frontal EEG | Fp1, Fp2, F3, F4, Fz | 5 bands × 5 channels; θ/α ratio; frontal asymmetry index ln(R)−ln(L) | 25-d | Same spectral decomposition; asymmetry computed as inter-hemispheric power difference |
| EOG Features | H-EOG + V-EOG | Blink frequency, blink duration, saccade velocity, fixation duration, slow eyelid movement amplitude | 8-d | Peak detection for blinks; velocity threshold for saccades; amplitude integration for slow closure |
| PERCLOS Labels | — | Percentage of eyelid closure (≥80% closure threshold); vigilance ground truth | Label | Frame-wise eye aspect ratio; 80% closure criterion; windowed aggregation |
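The per-band extraction in the table above can be sketched as follows. This is a minimal, dependency-light illustration: it uses a plain periodogram in place of the Welch PSD estimator named in the table, and the sampling rate, band edges, and function name are assumptions, not part of the original specification.

```python
import numpy as np

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 50)}

def band_de_features(eeg, fs=200):
    """Per-channel, per-band differential entropy for one analysis window.

    eeg: (n_channels, n_samples) array; 14 channels x 5 bands -> 70-d vector.
    A plain periodogram stands in for the Welch estimate to keep this sketch
    free of external dependencies.
    """
    n = eeg.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(eeg, axis=-1)) ** 2 / (fs * n)  # periodogram
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        band_power = psd[:, mask].mean(axis=-1)  # mean band power per channel
        # Differential entropy of a Gaussian band-limited signal: 0.5*ln(2*pi*e*sigma^2)
        feats.append(0.5 * np.log(2 * np.pi * np.e * band_power))
    return np.concatenate(feats)
```

With 14 scalp channels and the five bands above, the output is the 70-d EEG block of the state vector.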

Rationale for Multi-Modal Fusion

EEG spectral features capture neurophysiological fatigue progression, while EOG captures behavioral manifestations (slow eyelid closure) that spectral analysis may miss. The frontal region degrades earliest during fatigue, warranting dedicated extraction. PERCLOS provides an independent ground truth for both supervised pre-training and DRL reward calibration.

PERCLOS Threshold Criteria

<15% — Alert state  |  15–40% — Mild fatigue  |  40–80% — Moderate drowsiness  |  >80% — Severe drowsiness
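The threshold criteria map directly to a small helper; a trivial sketch, assuming PERCLOS is expressed as a fraction in [0, 1]:

```python
def perclos_category(perclos: float) -> str:
    """Map a windowed PERCLOS value (fraction of time eyelids >=80% closed)
    to the vigilance category defined by the threshold criteria above."""
    if perclos < 0.15:
        return "alert"
    if perclos < 0.40:
        return "mild fatigue"
    if perclos < 0.80:
        return "moderate drowsiness"
    return "severe drowsiness"
```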

Section 3.2  |  Table 2

Layer 2 — Feature Processing & State Representation

Raw multi-modal signals are transformed through a deterministic pipeline into a compact state vector that serves as the observation space for the DRL agent.

| Processing Stage | Method | Parameters | Output |
|---|---|---|---|
| Sliding Window | Fixed-length segmentation with overlap | Window: 3 s; Overlap: 50%; Step: 1.5 s | Windowed feature matrices per modality |
| Normalization | Z-score standardization (per-subject, per-channel) | z = (x − μ) / σ | Zero-mean, unit-variance features |
| Concatenation | Modality-wise feature vector assembly | EEG (70) + Frontal (25) + EOG (8) | 103-dim feature vector |
| State Augmentation | Append previous action and intervention flag | at−1 ∈ {0,…,4}; It ∈ {0,1} | st ∈ R105 |

State Vector Composition

st = [ fEEG70 ‖ fFrontal25 ‖ fEOG8 ‖ at−1 ‖ It ] ∈ R105

The inclusion of the previous action at−1 and intervention state It provides the agent with temporal context, enabling it to learn intervention policies that account for the driver's response to prior warnings. Per-subject normalization mitigates inter-subject variability, supporting generalization across drivers.
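The windowing and state-assembly stages can be sketched as below. Function names and the 200 Hz sampling rate are illustrative assumptions; the dimensions follow Table 2.

```python
import numpy as np

def sliding_windows(x, fs=200, win_s=3.0, step_s=1.5):
    """Yield fixed 3 s windows with 50% overlap (1.5 s step) along the last axis."""
    win, step = int(win_s * fs), int(step_s * fs)
    for start in range(0, x.shape[-1] - win + 1, step):
        yield x[..., start:start + win]

def build_state(f_eeg, f_frontal, f_eog, prev_action, intervention_flag):
    """Assemble st in R^105 = [EEG(70) | Frontal(25) | EOG(8) | a_{t-1} | I_t]."""
    assert f_eeg.shape == (70,) and f_frontal.shape == (25,) and f_eog.shape == (8,)
    return np.concatenate([f_eeg, f_frontal, f_eog,
                           [float(prev_action)], [float(intervention_flag)]])
```

The 103 concatenated features plus the two temporal-context scalars give the 105-d observation the agent sees.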

Section 3.3  |  Tables 3–5

Layer 3 — DRL Agent (PPO)

The decision core is formulated as a Markov Decision Process and solved via Proximal Policy Optimization with an Actor-Critic architecture.

3.3.1   MDP Formulation

| Component | Definition | Specification |
|---|---|---|
| State S | Observation at each decision step | st ∈ R105 — multi-modal feature vector + temporal context |
| Action A | Discrete intervention level selection | at ∈ {0, 1, 2, 3, 4} — 5 graduated intervention levels |
| Reward R | Shaped reward signal | r = rdetect + α·rcost + β·rrecovery |
| Transition P | State dynamics under fatigue | Driver state evolves along the natural fatigue trajectory, modulated by intervention effectiveness |
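The shaped reward can be sketched as below. Only the overall form r = rdetect + α·rcost + β·rrecovery comes from the table; the component definitions and the coefficient values are illustrative assumptions, since this section does not specify them.

```python
def shaped_reward(correct_detect, action_level, recovered, alpha=0.1, beta=0.5):
    """r = r_detect + alpha * r_cost + beta * r_recovery (component forms assumed).

    r_detect   : +1 if the chosen level matches the PERCLOS-derived ground truth,
                 -1 otherwise.
    r_cost     : negative, penalizing stronger interventions (driver annoyance).
    r_recovery : +1 if vigilance improved after the intervention.
    """
    r_detect = 1.0 if correct_detect else -1.0
    r_cost = -float(action_level)          # levels 0-4
    r_recovery = 1.0 if recovered else 0.0
    return r_detect + alpha * r_cost + beta * r_recovery
```

The cost term discourages the agent from defaulting to maximal alarms, while the recovery term rewards interventions that actually restore vigilance.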

3.3.2   Network Architecture

Actor Network πθ(a|s)

| Layer | Dimensions | Activation |
|---|---|---|
| Input | 105 | — |
| Hidden 1 + LayerNorm | 256 | ReLU |
| Hidden 2 + LayerNorm | 128 | ReLU |
| Output | 5 | Softmax |

Critic Network Vφ(s)

| Layer | Dimensions | Activation |
|---|---|---|
| Input | 105 | — |
| Hidden 1 + LayerNorm | 256 | ReLU |
| Hidden 2 + LayerNorm | 128 | ReLU |
| Output | 1 | Linear |
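A shape-checking sketch of the actor's forward pass, using plain numpy with randomly initialized weights. A real implementation would use a deep-learning framework; the ordering linear → LayerNorm → ReLU is an assumption, as the table does not fix it.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    """Normalize activations to zero mean, unit variance over the feature axis."""
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

# Randomly initialized parameters matching the table's 105 -> 256 -> 128 -> 5 shape
dims = [105, 256, 128, 5]
params = [(rng.normal(0, 0.05, (i, o)), np.zeros(o)) for i, o in zip(dims, dims[1:])]

def actor_forward(s):
    """pi_theta(a|s): two LayerNorm+ReLU hidden layers, softmax over 5 actions."""
    h = s
    for w, b in params[:-1]:
        h = np.maximum(layer_norm(h @ w + b), 0.0)   # linear -> LayerNorm -> ReLU
    logits = h @ params[-1][0] + params[-1][1]
    e = np.exp(logits - logits.max())                # stable softmax
    return e / e.sum()
```

The critic differs only in its head: a single linear output unit estimating Vφ(s) instead of the 5-way softmax.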

3.3.3   PPO Core Mechanisms

Clipped Surrogate Objective

LCLIP(θ) = Et[ min( rt(θ)·Ât, clip(rt(θ), 1−ε, 1+ε)·Ât ) ]

where rt(θ) = πθ(at|st) / πθold(at|st) is the probability ratio, Ât is the GAE advantage estimate, and ε = 0.2 limits the policy update magnitude to prevent destructive large steps.

Generalized Advantage Estimation

ÂtGAE(γ,λ) = Σl=0..∞ (γλ)l · δt+l

where δt = rt + γV(st+1) − V(st) is the TD residual. The λ = 0.95 parameter balances bias and variance, enabling stable advantage estimation across multiple timesteps.
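In practice the infinite sum is computed with the backward recursion Ât = δt + γλ·Ât+1 over a finite rollout; a minimal sketch:

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation via the backward recursion
    A_t = delta_t + gamma*lam*A_{t+1}, where
    delta_t = r_t + gamma*V(s_{t+1}) - V(s_t).

    values must have one more entry than rewards (the bootstrap value V(s_T)).
    """
    adv = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv
```

With λ = 0 this reduces to the one-step TD advantage (low variance, high bias); with λ = 1 it becomes the full Monte Carlo return minus the baseline (high variance, low bias).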

3.3.4   Training Hyperparameters

| Hyperparameter | Value | Description |
|---|---|---|
| Learning Rate | 3 × 10−4 | Adam optimizer with linear decay |
| Discount Factor γ | 0.99 | Long-horizon reward consideration |
| GAE Parameter λ | 0.95 | Bias–variance trade-off in advantage |
| Clip Range ε | 0.2 | Policy update trust region |
| Epochs per Update | 10 | Multiple passes over collected buffer |
| Mini-batch Size | 64 | Stochastic gradient estimation |
| Gradient Clip Norm | 0.5 | Prevents gradient explosion |
| Buffer Size | 2048 | Steps collected before each update |

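The table translates directly into a configuration object; a small sketch (the class and field names are assumptions, the values are from the table):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PPOConfig:
    """Training hyperparameters mirroring the table above."""
    learning_rate: float = 3e-4   # Adam, with linear decay
    gamma: float = 0.99           # discount factor
    gae_lambda: float = 0.95     # bias-variance trade-off in GAE
    clip_range: float = 0.2       # policy update trust region
    epochs_per_update: int = 10   # passes over the collected buffer
    minibatch_size: int = 64
    max_grad_norm: float = 0.5    # gradient clipping threshold
    buffer_size: int = 2048       # steps collected before each update
```

Each update thus processes the 2048-step buffer in 32 mini-batches, repeated for 10 epochs.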
Section 3.4  |  Table 6

Layer 4 — Action & Intervention

The agent's discrete action maps to a graduated intervention strategy delivered through in-vehicle interfaces. Post-intervention state changes feed back as the next observation, closing the decision loop.

| Level | Name | Condition | Intervention Methods | Output Devices |
|---|---|---|---|---|
| 0 | No Intervention | PERCLOS < 15% | None — driver assessed as alert | — |
| 1 | Mild Reminder | 15% ≤ PERCLOS < 40% | Dashboard text prompt; gentle notification tone | HMI Screen |
| 2 | Moderate Warning | 40% ≤ PERCLOS < 60% | Audio beep alert; visual flash warning; emphasized text notification | HMI Screen, Speaker |
| 3 | Strong Warning | 60% ≤ PERCLOS < 80% | Loud alarm; rapid visual flashing; seat vibration pulse; voice announcement | HMI Screen, Speaker, Seat Motor |
| 4 | Emergency | PERCLOS ≥ 80% | Continuous alarm; maximum vibration; voice command: "Pull over safely"; hazard light activation | All Systems, Hazard Lights |
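The action-to-intervention mapping is a simple dispatch table; a minimal sketch, with device identifiers chosen for illustration:

```python
# Graduated intervention levels, keyed by the agent's discrete action
INTERVENTIONS = {
    0: ("No Intervention", []),
    1: ("Mild Reminder", ["hmi_screen"]),
    2: ("Moderate Warning", ["hmi_screen", "speaker"]),
    3: ("Strong Warning", ["hmi_screen", "speaker", "seat_motor"]),
    4: ("Emergency", ["hmi_screen", "speaker", "seat_motor", "hazard_lights"]),
}

def dispatch(action: int):
    """Resolve the agent's action a_t to an intervention name and device list."""
    name, devices = INTERVENTIONS[action]
    return name, devices
```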

3.4.1   Closed-Loop Feedback Mechanism

The system operates as a continuous closed-loop cycle. At each timestep t:

1. The agent observes state st and selects action at via the learned policy πθ.
2. The corresponding intervention is delivered through the designated output devices (HMI, speaker, vibration motor).
3. The driver's physiological response is captured, forming st+1, which feeds back into Layer 2 for the next decision cycle.
s0 → πθ → a0 → Intervention → s1 → πθ → a1 → … → sT
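The closed-loop cycle can be sketched as a simple rollout. The policy and transition functions here are toy stand-ins (a random policy and a random next state), since the real ones are the trained PPO actor and the driver's actual physiological response:

```python
import numpy as np

rng = np.random.default_rng(1)

def policy(state):
    """Stand-in for pi_theta: a random but valid choice over the 5 levels."""
    return int(rng.integers(0, 5))

def step_env(state, action):
    """Stand-in for the driver's response: a toy 105-d next state whose last
    two slots record a_{t-1} and the intervention flag I_t, as in Layer 2."""
    nxt = rng.normal(size=105)
    nxt[103] = float(action)
    nxt[104] = float(action > 0)
    return nxt

# s0 -> pi -> a0 -> intervention -> s1 -> ... rollout over 5 decision steps
state = rng.normal(size=105)
trajectory = []
for t in range(5):
    action = policy(state)
    trajectory.append(action)
    state = step_env(state, action)
```

Swapping in the trained actor for `policy` and the live feature pipeline for `step_env` yields the deployed decision loop.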