NDRLDA
NeuroDRL: Adaptive Driver Vigilance Control and Real-Time Alerting

A Deep Reinforcement Learning framework for real-time drowsiness detection with adaptive multi-level intervention, driven by multi-modal physiological signals.

System Architecture

Figure 1: Architecture of the SEED-VIG dataset-driven driver vigilance monitoring and warning system. The framework comprises four layers: (a) multi-modal input signals including EEG spectral features, frontal EEG, EOG, and PERCLOS labels; (b) feature processing pipeline producing a 105-dimensional state vector; (c) a PPO-based DRL agent with Actor-Critic networks for adaptive decision-making; (d) a graduated intervention strategy with closed-loop feedback returning st+1 to the state representation layer.

Section 3.1  |  Table 1

Layer 1 — Multi-Modal Input

Four complementary signal modalities are captured simultaneously, each providing an independent window into the driver's neurophysiological state.

| Signal Modality | Channels | Frequency / Features | Dimensions | Extraction Method |
|---|---|---|---|---|
| EEG Multi-Band | 14-ch (full scalp) | δ 1–4 Hz, θ 4–8 Hz, α 8–13 Hz, β 13–30 Hz, γ 30+ Hz | 70-d | PSD (Welch) / differential entropy per band–channel pair |
| Frontal EEG | Fp1, Fp2, F3, F4, Fz | 5 bands × 5 channels; θ/α ratio; frontal asymmetry index ln(R)−ln(L) | 25-d | Same spectral decomposition; asymmetry computed as inter-hemispheric power difference |
| EOG Features | H-EOG + V-EOG | Blink frequency, blink duration, saccade velocity, fixation duration, slow eyelid movement amplitude | 8-d | Peak detection for blinks; velocity threshold for saccades; amplitude integration for slow closure |
| PERCLOS Labels | — | Percentage of eyelid closure (≥80% closure threshold); vigilance ground truth | Label | Frame-wise eye aspect ratio; 80% closure criterion; windowed aggregation |
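The per-band extraction in the table above can be sketched as follows. This is a minimal, dependency-light illustration: it uses a plain periodogram in place of the Welch PSD estimator named in the table, and the sampling rate, band edges, and function name are assumptions, not part of the original specification.

```python
import numpy as np

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 50)}

def band_de_features(eeg, fs=200):
    """Per-channel, per-band differential entropy for one analysis window.

    eeg: (n_channels, n_samples) array; 14 channels x 5 bands -> 70-d vector.
    A plain periodogram stands in for the Welch estimate to keep this sketch
    free of external dependencies.
    """
    n = eeg.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(eeg, axis=-1)) ** 2 / (fs * n)  # periodogram
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        band_power = psd[:, mask].mean(axis=-1)  # mean band power per channel
        # Differential entropy of a Gaussian band-limited signal: 0.5*ln(2*pi*e*sigma^2)
        feats.append(0.5 * np.log(2 * np.pi * np.e * band_power))
    return np.concatenate(feats)
```

With 14 scalp channels and the five bands above, the output is the 70-d EEG block of the state vector.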

Rationale for Multi-Modal Fusion

EEG spectral features capture neurophysiological fatigue progression, while EOG captures behavioral manifestations (slow eyelid closure) that spectral analysis may miss. The frontal region degrades earliest during fatigue, warranting dedicated extraction. PERCLOS provides an independent ground truth for both supervised pre-training and DRL reward calibration.

PERCLOS Threshold Criteria

<15% — Alert state  |  15–40% — Mild fatigue  |  40–80% — Moderate drowsiness  |  >80% — Severe drowsiness
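The threshold criteria map directly to a small helper; a trivial sketch, assuming PERCLOS is expressed as a fraction in [0, 1]:

```python
def perclos_category(perclos: float) -> str:
    """Map a windowed PERCLOS value (fraction of time eyelids >=80% closed)
    to the vigilance category defined by the threshold criteria above."""
    if perclos < 0.15:
        return "alert"
    if perclos < 0.40:
        return "mild fatigue"
    if perclos < 0.80:
        return "moderate drowsiness"
    return "severe drowsiness"
```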

Section 3.2  |  Table 2

Layer 2 — Feature Processing & State Representation

Raw multi-modal signals are transformed through a deterministic pipeline into a compact state vector that serves as the observation space for the DRL agent.

| Processing Stage | Method | Parameters | Output |
|---|---|---|---|
| Sliding Window | Fixed-length segmentation with overlap | Window: 3 s; Overlap: 50%; Step: 1.5 s | Windowed feature matrices per modality |
| Normalization | Z-score standardization (per-subject, per-channel) | z = (x − μ) / σ | Zero-mean, unit-variance features |
| Concatenation | Modality-wise feature vector assembly | EEG (70) + Frontal (25) + EOG (8) | 103-dim feature vector |
| State Augmentation | Append previous action and intervention flag | at−1 ∈ {0,…,4}; It ∈ {0,1} | st ∈ R105 |

State Vector Composition

st = [ fEEG70 ‖ fFrontal25 ‖ fEOG8 ‖ at−1 ‖ It ] ∈ R105

The inclusion of the previous action at−1 and intervention state It provides the agent with temporal context, enabling it to learn intervention policies that account for the driver's response to prior warnings. Per-subject normalization mitigates inter-subject variability, supporting generalization across drivers.
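The windowing and state-assembly stages can be sketched as below. Function names and the 200 Hz sampling rate are illustrative assumptions; the dimensions follow Table 2.

```python
import numpy as np

def sliding_windows(x, fs=200, win_s=3.0, step_s=1.5):
    """Yield fixed 3 s windows with 50% overlap (1.5 s step) along the last axis."""
    win, step = int(win_s * fs), int(step_s * fs)
    for start in range(0, x.shape[-1] - win + 1, step):
        yield x[..., start:start + win]

def build_state(f_eeg, f_frontal, f_eog, prev_action, intervention_flag):
    """Assemble st in R^105 = [EEG(70) | Frontal(25) | EOG(8) | a_{t-1} | I_t]."""
    assert f_eeg.shape == (70,) and f_frontal.shape == (25,) and f_eog.shape == (8,)
    return np.concatenate([f_eeg, f_frontal, f_eog,
                           [float(prev_action)], [float(intervention_flag)]])
```

The 103 concatenated features plus the two temporal-context scalars give the 105-d observation the agent sees.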

Section 3.3  |  Tables 3–5

Layer 3 — DRL Agent (PPO)

The decision core is formulated as a Markov Decision Process and solved via Proximal Policy Optimization with an Actor-Critic architecture.

3.3.1   MDP Formulation

| Component | Definition | Specification |
|---|---|---|
| State S | Observation at each decision step | st ∈ R105 — multi-modal feature vector + temporal context |
| Action A | Discrete intervention level selection | at ∈ {0, 1, 2, 3, 4} — 5 graduated intervention levels |
| Reward R | Shaped reward signal | r = rdetect + α·rcost + β·rrecovery |
| Transition P | State dynamics under fatigue | Driver state evolves along the natural fatigue trajectory, modulated by intervention effectiveness |
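The shaped reward can be sketched as below. Only the overall form r = rdetect + α·rcost + β·rrecovery comes from the table; the component definitions and the coefficient values are illustrative assumptions, since this section does not specify them.

```python
def shaped_reward(correct_detect, action_level, recovered, alpha=0.1, beta=0.5):
    """r = r_detect + alpha * r_cost + beta * r_recovery (component forms assumed).

    r_detect   : +1 if the chosen level matches the PERCLOS-derived ground truth,
                 -1 otherwise.
    r_cost     : negative, penalizing stronger interventions (driver annoyance).
    r_recovery : +1 if vigilance improved after the intervention.
    """
    r_detect = 1.0 if correct_detect else -1.0
    r_cost = -float(action_level)          # levels 0-4
    r_recovery = 1.0 if recovered else 0.0
    return r_detect + alpha * r_cost + beta * r_recovery
```

The cost term discourages the agent from defaulting to maximal alarms, while the recovery term rewards interventions that actually restore vigilance.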

3.3.2   Network Architecture

Actor Network πθ(a|s)

| Layer | Dimensions | Activation |
|---|---|---|
| Input | 105 | — |
| Hidden 1 + LayerNorm | 256 | ReLU |
| Hidden 2 + LayerNorm | 128 | ReLU |
| Output | 5 | Softmax |

Critic Network Vφ(s)

| Layer | Dimensions | Activation |
|---|---|---|
| Input | 105 | — |
| Hidden 1 + LayerNorm | 256 | ReLU |
| Hidden 2 + LayerNorm | 128 | ReLU |
| Output | 1 | Linear |
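A shape-checking sketch of the actor's forward pass, using plain numpy with randomly initialized weights. A real implementation would use a deep-learning framework; the ordering linear → LayerNorm → ReLU is an assumption, as the table does not fix it.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    """Normalize activations to zero mean, unit variance over the feature axis."""
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

# Randomly initialized parameters matching the table's 105 -> 256 -> 128 -> 5 shape
dims = [105, 256, 128, 5]
params = [(rng.normal(0, 0.05, (i, o)), np.zeros(o)) for i, o in zip(dims, dims[1:])]

def actor_forward(s):
    """pi_theta(a|s): two LayerNorm+ReLU hidden layers, softmax over 5 actions."""
    h = s
    for w, b in params[:-1]:
        h = np.maximum(layer_norm(h @ w + b), 0.0)   # linear -> LayerNorm -> ReLU
    logits = h @ params[-1][0] + params[-1][1]
    e = np.exp(logits - logits.max())                # stable softmax
    return e / e.sum()
```

The critic differs only in its head: a single linear output unit estimating Vφ(s) instead of the 5-way softmax.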

3.3.3   PPO Core Mechanisms

Clipped Surrogate Objective

LCLIP(θ) = Et[ min( rt(θ)·Ât, clip(rt(θ), 1−ε, 1+ε)·Ât ) ]

where rt(θ) = πθ(at|st) / πθold(at|st) is the probability ratio, Ât is the GAE advantage estimate, and ε = 0.2 limits the policy update magnitude to prevent destructive large steps.

Generalized Advantage Estimation

ÂtGAE(γ,λ) = Σl=0..∞ (γλ)l · δt+l

where δt = rt + γV(st+1) − V(st) is the TD residual. The λ = 0.95 parameter balances bias and variance, enabling stable advantage estimation across multiple timesteps.
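In practice the infinite sum is computed with the backward recursion Ât = δt + γλ·Ât+1 over a finite rollout; a minimal sketch:

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation via the backward recursion
    A_t = delta_t + gamma*lam*A_{t+1}, where
    delta_t = r_t + gamma*V(s_{t+1}) - V(s_t).

    values must have one more entry than rewards (the bootstrap value V(s_T)).
    """
    adv = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv
```

With λ = 0 this reduces to the one-step TD advantage (low variance, high bias); with λ = 1 it becomes the full Monte Carlo return minus the baseline (high variance, low bias).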

3.3.4   Training Hyperparameters

| Hyperparameter | Value | Description |
|---|---|---|
| Learning Rate | 3 × 10−4 | Adam optimizer with linear decay |
| Discount Factor γ | 0.99 | Long-horizon reward consideration |
| GAE Parameter λ | 0.95 | Bias–variance trade-off in advantage |
| Clip Range ε | 0.2 | Policy update trust region |
| Epochs per Update | 10 | Multiple passes over collected buffer |
| Mini-batch Size | 64 | Stochastic gradient estimation |
| Gradient Clip Norm | 0.5 | Prevents gradient explosion |
| Buffer Size | 2048 | Steps collected before each update |

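The table translates directly into a configuration object; a small sketch (the class and field names are assumptions, the values are from the table):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PPOConfig:
    """Training hyperparameters mirroring the table above."""
    learning_rate: float = 3e-4   # Adam, with linear decay
    gamma: float = 0.99           # discount factor
    gae_lambda: float = 0.95     # bias-variance trade-off in GAE
    clip_range: float = 0.2       # policy update trust region
    epochs_per_update: int = 10   # passes over the collected buffer
    minibatch_size: int = 64
    max_grad_norm: float = 0.5    # gradient clipping threshold
    buffer_size: int = 2048       # steps collected before each update
```

Each update thus processes the 2048-step buffer in 32 mini-batches, repeated for 10 epochs.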
Section 3.4  |  Table 6

Layer 4 — Action & Intervention

The agent's discrete action maps to a graduated intervention strategy delivered through in-vehicle interfaces. Post-intervention state changes feed back as the next observation, closing the decision loop.

| Level | Name | Condition | Intervention Methods | Output Devices |
|---|---|---|---|---|
| 0 | No Intervention | PERCLOS < 15% | None — driver assessed as alert | — |
| 1 | Mild Reminder | 15% ≤ PERCLOS < 40% | Dashboard text prompt; gentle notification tone | HMI Screen |
| 2 | Moderate Warning | 40% ≤ PERCLOS < 60% | Audio beep alert; visual flash warning; emphasized text notification | HMI Screen, Speaker |
| 3 | Strong Warning | 60% ≤ PERCLOS < 80% | Loud alarm; rapid visual flashing; seat vibration pulse; voice announcement | HMI Screen, Speaker, Seat Motor |
| 4 | Emergency | PERCLOS ≥ 80% | Continuous alarm; maximum vibration; voice command: "Pull over safely"; hazard light activation | All Systems, Hazard Lights |
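The action-to-intervention mapping is a simple dispatch table; a minimal sketch, with device identifiers chosen for illustration:

```python
# Graduated intervention levels, keyed by the agent's discrete action
INTERVENTIONS = {
    0: ("No Intervention", []),
    1: ("Mild Reminder", ["hmi_screen"]),
    2: ("Moderate Warning", ["hmi_screen", "speaker"]),
    3: ("Strong Warning", ["hmi_screen", "speaker", "seat_motor"]),
    4: ("Emergency", ["hmi_screen", "speaker", "seat_motor", "hazard_lights"]),
}

def dispatch(action: int):
    """Resolve the agent's action a_t to an intervention name and device list."""
    name, devices = INTERVENTIONS[action]
    return name, devices
```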

3.4.1   Closed-Loop Feedback Mechanism

The system operates as a continuous closed-loop cycle. At each timestep t:

1. The agent observes state st and selects action at via the learned policy πθ.
2. The corresponding intervention is delivered through the designated output devices (HMI, speaker, vibration motor).
3. The driver's physiological response is captured, forming st+1, which feeds back into Layer 2 for the next decision cycle.
s0 → πθ → a0 → Intervention → s1 → πθ → a1 → … → sT
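The closed-loop cycle can be sketched as a simple rollout. The policy and transition functions here are toy stand-ins (a random policy and a random next state), since the real ones are the trained PPO actor and the driver's actual physiological response:

```python
import numpy as np

rng = np.random.default_rng(1)

def policy(state):
    """Stand-in for pi_theta: a random but valid choice over the 5 levels."""
    return int(rng.integers(0, 5))

def step_env(state, action):
    """Stand-in for the driver's response: a toy 105-d next state whose last
    two slots record a_{t-1} and the intervention flag I_t, as in Layer 2."""
    nxt = rng.normal(size=105)
    nxt[103] = float(action)
    nxt[104] = float(action > 0)
    return nxt

# s0 -> pi -> a0 -> intervention -> s1 -> ... rollout over 5 decision steps
state = rng.normal(size=105)
trajectory = []
for t in range(5):
    action = policy(state)
    trajectory.append(action)
    state = step_env(state, action)
```

Swapping in the trained actor for `policy` and the live feature pipeline for `step_env` yields the deployed decision loop.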