Geethika's Portfolio

ML, Cybersecurity, CAN, ISO 11898, IDS, Automotive

Developing an Intrusion Detection System (IDS) for Controller Area Networks (CAN) using Machine Learning based on Real-Life Traffic Datasets

Institute

SCPS - TUHH

Role

Master's Thesis

Duration

Dec 2024 - Jul 2025

Abstract

Modern vehicles rely on Controller Area Network (CAN) buses for in-vehicle communication. The CAN protocol is reliable and safe but was not designed to be secure, which leaves it vulnerable to attacks. This thesis presents a modular Intrusion Detection System (IDS) architecture that uses Machine Learning (ML), is based on real-world traffic datasets, and consists of two modules: a Detector and a Classifier. In real-world scenarios, attack traffic is scarce compared to benign traffic, which makes it difficult for ML algorithms to learn attack patterns for detection and classification. The proposed IDS architecture uses sliding windows to capture the time dependency of attack traffic so that it can detect and classify attack patterns in CAN traffic. The study uses a reference dataset of traffic recorded live on vehicles from the manufacturers Chevrolet and Subaru. It evaluates how effectively the IDS detects an attack and assigns it to the correct attack type, such as denial of service, spoofing, or fuzzing. It also evaluates the consistency and generalisation of the IDS across CAN traffic from different vehicles and across the varying attack types in the reference dataset. The thesis further examines how the use of sliding windows during training affects the performance of the IDS. The proposed IDS detected attacks effectively, with a true positive rate of up to 94%, but was only partially robust in classification: it classified attacks consistently only for vehicles and attack types that it had seen during training. The modular architecture can be expanded and improved upon in future work.

Keywords: Controller Area Networks, CAN, Intrusion Detection System, Machine Learning, Real-World Datasets, Classifier, Detector, Chevrolet, Subaru
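
The two-module design described in the abstract can be pictured as a cascade: the Detector screens every window, and only flagged windows reach the Classifier. The following is a minimal Python sketch under that assumption. The function `run_ids`, the `threshold` parameter, and the toy stand-in functions are hypothetical names chosen for illustration; they are not the thesis's models or interfaces.

```python
from typing import Callable, Optional, Sequence

# A window is a sequence of CAN frames; here each frame is reduced to its arbitration ID.
Window = Sequence[int]


def run_ids(
    window: Window,
    detector: Callable[[Window], float],
    classifier: Callable[[Window], str],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> Optional[str]:
    """Return None for a benign window, otherwise the predicted attack type."""
    if detector(window) < threshold:
        return None  # nothing suspicious, so the Classifier is skipped
    return classifier(window)


# Toy stand-ins (NOT trained models): score a non-empty window by how strongly
# a single arbitration ID dominates it.
def toy_detector(window: Window) -> float:
    most_common = max(set(window), key=window.count)
    return window.count(most_common) / len(window)


def toy_classifier(window: Window) -> str:
    return "DoS" if 0x000 in window else "spoofing"


benign = [0x101, 0x1A0, 0x2C4, 0x101, 0x3E8, 0x1A0]
flood = [0x000, 0x000, 0x000, 0x000, 0x101, 0x000]

print(run_ids(benign, toy_detector, toy_classifier))  # None
print(run_ids(flood, toy_detector, toy_classifier))   # DoS
```

Keeping the two stages behind plain callables is one way to preserve the modularity the abstract emphasises: either stage could be swapped out without touching the other.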

Dataset Used

can-train-and-test: A curated CAN dataset for automotive intrusion detection https://doi.org/10.1016/j.cose.2024.103777

91,827,504 benign samples; 74,183,508 attack samples; total 166,011,012 samples.
11 types of attacks, including DoS, gear spoofing, interval, RPM spoofing, speed spoofing, standstill, systematic, suppress, and masquerade.
4 different vehicles and 6 drivers of various ages and genders.
Data (benign and attack) collected live on the road through the OBD (on-board diagnostics) port.
High-severity attacks on RPM, speed, and gear (double and triple signal modification).
Labeled dataset.
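
As a quick sanity check on the sample counts listed above, the short Python sketch below confirms that the two counts add up to the stated total and computes the attack share over the whole dataset. The inverse-frequency class weights at the end are a common generic technique shown only for illustration; whether the thesis used class weighting is an assumption, not something stated here, and the ratio within any single capture or selected subset may differ from the aggregate.

```python
# Sample counts as listed for the can-train-and-test dataset.
benign = 91_827_504
attack = 74_183_508
total = benign + attack

# The two counts should reproduce the stated total.
assert total == 166_011_012

# Fraction of all samples that are labelled as attack traffic (about 0.447).
attack_share = attack / total
print(f"attack share: {attack_share:.3f}")

# Illustrative inverse-frequency class weights: total / (n_classes * class_count).
# ASSUMPTION: shown as a generic balancing technique, not the thesis's method.
n_classes = 2
weights = {
    "benign": total / (n_classes * benign),
    "attack": total / (n_classes * attack),
}
print(weights)
```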

Train Methodology
Both the Detector and the Classifier are BiLSTM models trained on selected data, using sliding windows of approximately 0.05 seconds.
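
To make the windowing step concrete, here is a minimal Python sketch of time-based sliding windows over timestamped CAN frames. Only the window width of roughly 0.05 seconds comes from the description above. The frame layout, the `step` parameter, and the rule that a window counts as an attack if it contains at least one attack frame are assumptions made for illustration.

```python
from typing import List, Tuple

# (timestamp in seconds, arbitration ID, label) where label 1 = attack, 0 = benign.
# ASSUMPTION: this frame layout is hypothetical, chosen to keep the sketch small.
Frame = Tuple[float, int, int]


def sliding_windows(
    frames: List[Frame], width: float = 0.05, step: float = 0.025
) -> List[Tuple[List[Frame], int]]:
    """Cut time-sorted frames into windows of `width` seconds, moved by `step`.

    Returns (frames_in_window, window_label) pairs. Empty windows are skipped.
    """
    if not frames:
        return []
    t0, t_end = frames[0][0], frames[-1][0]
    windows = []
    lo = 0
    k = 0
    while t0 + k * step <= t_end:
        start = t0 + k * step
        # Advance the left edge past frames that precede this window.
        while lo < len(frames) and frames[lo][0] < start:
            lo += 1
        # Collect frames with start <= timestamp < start + width.
        hi = lo
        while hi < len(frames) and frames[hi][0] < start + width:
            hi += 1
        chunk = frames[lo:hi]
        if chunk:
            # ASSUMPTION: one attack frame is enough to label the window as attack.
            windows.append((chunk, int(any(f[2] for f in chunk))))
        k += 1
    return windows


frames = [
    (0.00, 0x101, 0),
    (0.01, 0x1A0, 0),
    (0.02, 0x2C4, 0),
    (0.06, 0x000, 1),
    (0.07, 0x101, 0),
]

for chunk, label in sliding_windows(frames, width=0.05, step=0.05):
    print(len(chunk), label)
```

With `step` smaller than `width` the windows overlap, which yields more training examples from the same capture; with `step` equal to `width` they tile the capture without overlap.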
Test Methodology
The IDS is tested with the experimental setup shown in the 'Evaluate' box to check its effectiveness and robustness.
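
Detection effectiveness is reported in this project as a true positive rate (TPR). As a reminder of how that metric is defined, the sketch below computes TPR and the false positive rate (FPR) from binary labels. The function name `detection_rates` and the toy label vectors are hypothetical and do not reproduce any result from the thesis.

```python
from typing import Dict, Sequence


def detection_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute TPR and FPR for binary labels (1 = attack, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # attacks that were flagged
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # attacks that were missed
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # benign windows flagged
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # benign windows passed
    return {
        "tpr": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


# Toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(detection_rates(y_true, y_pred))  # {'tpr': 0.75, 'fpr': 0.5}
```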
Models
The Detector and Classifier models are shown below.

What I Learned

Through this project, I learned to work independently, balancing my responsibilities as a working student with my university studies while staying dedicated to the thesis. The experience taught me to be analytical, accountable, and reliable, so that I could meet all deadlines and deliver high-quality work despite the demands of multiple commitments.
