A Deep Reinforcement Learning framework for real-time drowsiness detection with adaptive multi-level intervention, driven by multi-modal physiological signals.
Figure 1: Architecture of the SEED-VIG dataset-driven driver vigilance monitoring and warning system. The framework comprises four layers: (a) multi-modal input signals, including EEG spectral features, frontal EEG, EOG, and PERCLOS labels; (b) a feature-processing pipeline producing a 105-dimensional state vector; (c) a PPO-based DRL agent with actor-critic networks for adaptive decision-making; (d) a graduated intervention strategy whose closed-loop feedback returns st+1 to the state-representation layer.
Four complementary signal modalities are captured simultaneously, each providing an independent window into the driver's neurophysiological state.
| Signal Modality | Channels | Frequency / Features | Dimensions | Extraction Method |
|---|---|---|---|---|
| EEG Multi-Band | 14-ch (full scalp) | δ 1–4 Hz, θ 4–8 Hz, α 8–13 Hz, β 13–30 Hz, γ 30+ Hz | 70-d | PSD (Welch) / differential entropy per band–channel pair |
| Frontal EEG | Fp1, Fp2, F3, F4, Fz | 5 bands × 5 channels; θ/α ratio; frontal asymmetry index ln(R) − ln(L) | 25-d | Same spectral decomposition; asymmetry computed as inter-hemispheric power difference |
| EOG Features | H-EOG + V-EOG | Blink frequency, blink duration, saccade velocity, fixation duration, slow eyelid-movement amplitude | 8-d | Peak detection for blinks; velocity threshold for saccades; amplitude integration for slow closure |
| PERCLOS Labels | — | Percentage of Eyelid Closure (≥80% closure threshold); vigilance ground truth | Label | Frame-wise eye aspect ratio; 80% closure criterion; windowed aggregation |
EEG spectral features capture neurophysiological fatigue progression, while EOG captures behavioral manifestations (slow eyelid closure) that spectral analysis may miss. Frontal-region activity changes earliest during fatigue onset, warranting dedicated feature extraction. PERCLOS provides an independent ground truth for both supervised pre-training and DRL reward calibration.
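As an illustration of the per-band feature extraction, the sketch below isolates each band with a crude FFT mask and computes the Gaussian-assumption differential entropy per band–channel pair. This is a simplification: the table specifies a Welch PSD estimator, and the 200 Hz sampling rate and band edges here are assumptions for the example.

```python
import numpy as np

def differential_entropy(x):
    """Gaussian-assumption DE: 0.5 * ln(2*pi*e*var(x)), a common SEED-style per-band feature."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x))

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 50)}

def eeg_features(window, fs=200):
    """window: (channels, samples) -> (channels * 5,) vector of per-band DE features."""
    feats = []
    for ch in window:
        spec_full = np.fft.rfft(ch)
        freqs = np.fft.rfftfreq(len(ch), d=1.0 / fs)
        for lo, hi in BANDS.values():
            # crude band isolation: zero all FFT bins outside [lo, hi)
            spec = spec_full.copy()
            spec[(freqs < lo) | (freqs >= hi)] = 0
            band_signal = np.fft.irfft(spec, n=len(ch))
            feats.append(differential_entropy(band_signal))
    return np.array(feats)

rng = np.random.default_rng(0)
window = rng.standard_normal((14, 600))   # 14 channels, one 3 s window at 200 Hz
features = eeg_features(window)           # 14 channels x 5 bands = 70-d
```

With 14 channels and 5 bands this yields exactly the 70-dimensional EEG feature block of the table.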
PERCLOS vigilance bands: <15% alert | 15–40% mild fatigue | 40–80% moderate drowsiness | >80% severe drowsiness
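These bands translate into a simple threshold function; a minimal sketch:

```python
def vigilance_level(perclos: float) -> str:
    """Map a PERCLOS percentage to the four vigilance bands used for labeling."""
    if perclos < 15:
        return "alert"
    if perclos < 40:
        return "mild fatigue"
    if perclos < 80:
        return "moderate drowsiness"
    return "severe drowsiness"
```

For example, a windowed PERCLOS of 52% falls in the moderate-drowsiness band.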
Raw multi-modal signals are transformed through a deterministic pipeline into a compact state vector that serves as the observation space for the DRL agent.
| Processing Stage | Method | Parameters | Output |
|---|---|---|---|
| Sliding Window | Fixed-length segmentation with overlap | Window: 3 s; overlap: 50%; step: 1.5 s | Windowed feature matrices per modality |
| Normalization | Z-score standardization (per-subject, per-channel) | z = (x − μ) / σ | Zero-mean, unit-variance features |
| Concatenation | Modality-wise feature vector assembly | EEG(70) + Frontal(25) + EOG(8) | 103-dim feature vector |
| State Augmentation | Append previous action and intervention flag | at−1 ∈ {0, ..., 4}; It ∈ {0, 1} | st ∈ R105 |
The inclusion of the previous action at−1 and intervention state It gives the agent temporal context, enabling it to learn intervention policies that account for the driver's response to prior warnings. Per-subject normalization mitigates inter-subject variability and thereby supports generalization across drivers.
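The stages above can be sketched end-to-end. In this illustration the modality feature vectors are random placeholders, and normalization is applied jointly over the 103 concatenated features for brevity, whereas the actual pipeline standardizes per subject and per channel.

```python
import numpy as np

def sliding_windows(signal, fs=200, win_s=3.0, overlap=0.5):
    """Segment a (channels, samples) array into 3 s windows with 50% overlap (step 1.5 s)."""
    win = int(win_s * fs)
    step = int(win * (1.0 - overlap))
    return [signal[:, i:i + win] for i in range(0, signal.shape[1] - win + 1, step)]

def zscore(x, eps=1e-8):
    """Z-score standardization (the pipeline uses per-subject, per-channel statistics)."""
    return (x - x.mean()) / (x.std() + eps)

def build_state(eeg_70, frontal_25, eog_8, prev_action, intervention_flag):
    """Assemble s_t in R^105: 70 + 25 + 8 = 103 features, plus a_{t-1} and I_t."""
    features = np.concatenate([eeg_70, frontal_25, eog_8])                 # 103-d
    context = np.array([float(prev_action), float(intervention_flag)])     # a_{t-1}, I_t
    return np.concatenate([zscore(features), context])                     # 105-d

rng = np.random.default_rng(0)
raw = rng.standard_normal((14, 2000))   # 14-channel EEG, 10 s at 200 Hz
windows = sliding_windows(raw)          # 3 s windows, 1.5 s step
s_t = build_state(rng.normal(size=70), rng.normal(size=25), rng.normal(size=8),
                  prev_action=2, intervention_flag=1)
```

Ten seconds of signal at a 1.5 s step yields five windows; each state vector has the 105 dimensions the table specifies.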
The decision core is formulated as a Markov Decision Process and solved via Proximal Policy Optimization with Actor-Critic architecture.
| Component | Definition | Specification |
|---|---|---|
| State S | Observation at each decision step | st ∈ R105 — multi-modal feature vector + temporal context |
| Action A | Discrete intervention level selection | at ∈ {0, 1, 2, 3, 4} — 5 graduated intervention levels |
| Reward R | Shaped reward signal | r = rdetect + α·rcost + β·rrecovery |
| Transition P | State dynamics under fatigue | Driver state evolves according to natural fatigue trajectory, modulated by intervention effectiveness |
Actor network (policy head):
| Layer | Dimensions | Activation |
|---|---|---|
| Input | 105 | — |
| Hidden 1 + LayerNorm | 256 | ReLU |
| Hidden 2 + LayerNorm | 128 | ReLU |
| Output | 5 | Softmax |
Critic network (value head):
| Layer | Dimensions | Activation |
|---|---|---|
| Input | 105 | — |
| Hidden 1 + LayerNorm | 256 | ReLU |
| Hidden 2 + LayerNorm | 128 | ReLU |
| Output | 1 | Linear |
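As a framework-free illustration of the two tables above, the sketch below runs both forward passes in numpy. Randomly initialized weights stand in for trained parameters; a real implementation would use a deep-learning framework.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean, unit variance (LayerNorm without learned scale/shift)."""
    return (x - x.mean()) / (x.std() + eps)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

class MLP:
    """Forward pass only: hidden layers use LayerNorm + ReLU, the output layer is linear."""
    def __init__(self, sizes, seed=0):
        rng = np.random.default_rng(seed)
        self.W = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(n) for n in sizes[1:]]

    def forward(self, x):
        for W, b in zip(self.W[:-1], self.b[:-1]):
            x = relu(layer_norm(x @ W + b))
        return x @ self.W[-1] + self.b[-1]

actor = MLP([105, 256, 128, 5])      # policy head: 5 intervention levels
critic = MLP([105, 256, 128, 1])     # value head: scalar V(s)
s = np.random.default_rng(2).normal(size=105)
probs = softmax(actor.forward(s))    # action distribution over 5 levels
value = critic.forward(s)[0]         # state-value estimate
```

The actor's softmax output is a distribution over the five intervention levels, while the critic's single linear output estimates V(st).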
The policy is trained with the clipped surrogate objective LCLIP(θ) = Et[min(rt(θ)Ât, clip(rt(θ), 1−ε, 1+ε)Ât)], where rt(θ) = πθ(at|st) / πθold(at|st) is the probability ratio, Ât is the GAE advantage estimate, and ε = 0.2 limits the policy-update magnitude to prevent destructively large steps.
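A minimal numpy sketch of the clipped surrogate loss (negated for minimization):

```python
import numpy as np

def ppo_clip_loss(ratio, adv, eps=0.2):
    """-mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)): the PPO pessimistic bound."""
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return -np.mean(np.minimum(unclipped, clipped))

# A ratio of 1.5 with positive advantage is clipped to 1.2, capping the incentive
# to push the policy further in that direction.
loss = ppo_clip_loss(np.array([1.5]), np.array([1.0]))   # -> -1.2
```

Taking the minimum means the clip only removes incentive for large updates; it never rewards them.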
Advantages are computed with Generalized Advantage Estimation, Ât = Σl≥0 (γλ)^l δt+l, where δt = rt + γV(st+1) − V(st) is the TD residual. The parameter λ = 0.95 balances bias and variance, enabling stable advantage estimation across multiple timesteps.
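The GAE sum is usually evaluated as a backward recursion, Ât = δt + γλÂt+1, over a finite rollout; a sketch assuming a bootstrap value for the final state:

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Backward GAE recursion: A_t = delta_t + gamma * lam * A_{t+1},
    with delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    `values` has length len(rewards) + 1 (bootstrap value for the final state)."""
    adv = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv

# Sanity check with gamma = lam = 1 and zero values: advantages reduce to reward-to-go.
adv = gae(np.array([1.0, 1.0]), np.zeros(3), gamma=1.0, lam=1.0)   # -> [2.0, 1.0]
```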
| Hyperparameter | Value | Description |
|---|---|---|
| Learning Rate | 3 × 10−4 | Adam optimizer with linear decay |
| Discount Factor γ | 0.99 | Long-horizon reward consideration |
| GAE Parameter λ | 0.95 | Bias-variance trade-off in advantage |
| Clip Range ε | 0.2 | Policy update trust region |
| Epochs per Update | 10 | Multiple passes over collected buffer |
| Mini-batch Size | 64 | Stochastic gradient estimation |
| Gradient Clip Norm | 0.5 | Prevents gradient explosion |
| Buffer Size | 2048 | Steps collected before each update |
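These settings can be gathered into a single configuration object. The sketch below is illustrative (the field names are not from the source); it also derives the number of gradient steps each update performs from the buffer, mini-batch, and epoch settings.

```python
from dataclasses import dataclass

@dataclass
class PPOConfig:
    """Hyperparameters from the table above (field names are illustrative)."""
    lr: float = 3e-4              # Adam, with linear decay
    gamma: float = 0.99           # discount factor
    gae_lambda: float = 0.95      # GAE bias-variance trade-off
    clip_eps: float = 0.2         # policy-update trust region
    epochs: int = 10              # passes over each collected buffer
    minibatch_size: int = 64
    grad_clip_norm: float = 0.5
    buffer_size: int = 2048       # steps collected before each update

    def gradient_steps_per_update(self) -> int:
        # 2048 / 64 = 32 mini-batches per epoch, times 10 epochs = 320 steps
        return (self.buffer_size // self.minibatch_size) * self.epochs

cfg = PPOConfig()
steps = cfg.gradient_steps_per_update()   # -> 320
```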
The agent's discrete action maps to a graduated intervention strategy delivered through in-vehicle interfaces. Post-intervention state changes feed back as the next observation, closing the decision loop.
| Level | Condition | Intervention Methods | Output Devices |
|---|---|---|---|
| 0 (No Intervention) | PERCLOS < 15% | None; driver assessed as alert | — |
| 1 (Mild Reminder) | 15% ≤ PERCLOS < 40% | Dashboard text prompt; gentle notification tone | HMI screen |
| 2 (Moderate Warning) | 40% ≤ PERCLOS < 60% | Audio beep alert; visual flash warning; emphasized text notification | HMI screen, speaker |
| 3 (Strong Warning) | 60% ≤ PERCLOS < 80% | Loud alarm; rapid visual flashing; seat vibration pulse; voice announcement | HMI screen, speaker, seat motor |
| 4 (Emergency) | PERCLOS ≥ 80% | Continuous alarm; maximum vibration; voice command "Pull over safely"; hazard-light activation | All systems, hazard lights |
The system operates as a continuous closed-loop cycle. At each timestep t: