Malware analysis demands rapid interpretation of complex detonation reports spanning filesystem, network, and process behaviours. While large language models (LLMs) demonstrate impressive capabilities for technical artifact interpretation, the opacity and escalating API costs of closed-weight frontier models motivate exploration of open-weight alternatives. However, many open-weight models are themselves large, demanding significant compute resources and incurring non-trivial hosting costs that place them beyond reach for resource-constrained deployments. This paper investigates whether orchestrated ensembles of small language models (SLMs) can match or exceed single-LLM performance on malware analysis tasks. We established baselines by testing 11 open-weight SLMs, three cyber security pre-trained models, and six frontier LLMs on Meta’s CyberSecEval Malware Analysis benchmark. We then designed and evaluated four novel orchestration architectures: (i) a multi-agent pipeline that decomposes analysis into structured evidence-collection and reasoning stages, (ii) an adversarial debate framework in which two agents iteratively critique each other’s reasoning, (iii) a hierarchical consultation system that pairs a general-purpose SLM with a cyber-specialised expert model, and (iv) a hybrid architecture that combines evidence-grounded pipelines with adversarial debate reasoning. The hybrid system (Qwen3-4B with Foundation-Sec-8B) achieved 35.30% overall accuracy, surpassing all cyber security pre-trained model baselines (best: Llama-Primus-Nemotron-70B at 22.54%) and frontier LLM baselines without retrieved external evidence (Gemini 3 Pro Preview at 34.77%), and matching frontier alternatives augmented with retrieved evidence, all at zero API cost. Case studies on malware from the wild (UNC5142 and Lumma Stealer) confirmed the hybrid system’s ability to correct reasoning errors on novel evasion techniques such as EtherHiding and ClickFix.
These findings suggest hybrid orchestration of open-weight SLM ensembles is a promising direction towards transparent, auditable, and cost-effective malware analysis systems.
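As a rough illustration of architecture (iv), the sketch below first extracts evidence from a detonation report and then lets two agents revise their answers against each other until they agree. The names `hybrid_answer` and `extract_evidence`, the agent call signature, the round limit, and the fallback to the first agent's answer are all assumptions made for this example, not details taken from the paper.

```python
from typing import Callable, List

# Hypothetical agent signature: (question, evidence, opponent_answer) -> answer.
Agent = Callable[[str, List[str], str], str]


def hybrid_answer(
    question: str,
    report: str,
    extract_evidence: Callable[[str], List[str]],
    agent_a: Agent,
    agent_b: Agent,
    max_rounds: int = 3,
) -> str:
    """Evidence-grounded pipeline followed by a bounded adversarial debate."""
    # Stage 1: collect structured evidence once, shared by both agents.
    evidence = extract_evidence(report)

    # Stage 2: each agent answers independently (no opponent answer yet).
    answer_a = agent_a(question, evidence, "")
    answer_b = agent_b(question, evidence, "")

    # Stage 3: agents revise against each other's previous answer until
    # they agree or the round budget is spent.
    for _ in range(max_rounds):
        if answer_a == answer_b:
            break
        answer_a, answer_b = (
            agent_a(question, evidence, answer_b),
            agent_b(question, evidence, answer_a),
        )

    # Assumed tie-break: fall back to the first agent if no consensus.
    return answer_a
```

In a real deployment the two agents would wrap calls to locally hosted models; here they are plain callables so the control flow can be exercised with stubs.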
@inproceedings{elzemity2026smaller,title={Small, Free, and Effective: Orchestrating Open-Weight Small Language Models to Outperform Single {LLM} for Malware Analysis},author={ElZemity, Adel},booktitle={Under review for RAID 2026},year={2026},}
ESORICS
APIOT: Autonomous Vulnerability Management Across Bare-Metal Industrial OT Networks
Adel ElZemity, Budi Arief, Shujun Li, and 6 more authors
Bare-metal operational technology (OT) devices, especially the microcontrollers running Modbus/TCP and CoAP at the base of industrial control systems, have remained outside the reach of autonomous security attacks. Prior autonomous pentesting studies target Linux and web systems, whose shells and filesystems are familiar to LLM agents. Bare-metal OT has neither, so agents must reason directly over protocol fields and parser semantics. This requires new action-space designs and runtime controls, and opens new research questions about protocol-level exploit reasoning and its deployment envelope. We present APIOT (Autonomous Purple-teaming for Industrial OT), the first large language model (LLM) framework demonstrating an autonomous attack and remediation of bare-metal OT devices, achieving the full discovery to exploitation to patching to verification cycle without step-by-step human intervention. We implemented and evaluated this framework on Zephyr RTOS firmware across heterogeneous industrial IoT (IIoT) topologies. Through 290 experiment runs spanning five frontier LLMs, three network topologies, two impairment levels, and guided versus unguided conditions, APIOT achieved a mission success rate of 90.0% on the full attack-remediation cycle. We found that the runtime governance layer (which we call an overseer) is a critical engineering variable: without it, agents exhibit systematic degenerate patterns, including repetition loops, missing crash verification, and reconnaissance deadlocks. Together, these findings carry two implications beyond our testbed. Attacker expertise is no longer the binding constraint on bare-metal OT exploitation, and defender threat models must now assume LLM-augmented adversaries capable of executing autonomous discovery-through-remediation cycles against industrial firmware. 
Moreover, the overseer can improve outcomes by structurally blocking bad sequences rather than changing agent behaviour: runtime governance is a reliability lever for this class of LLM agents, not a model-specific tuning choice.
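One way to picture a runtime governance layer of this kind is as a gate that reviews each proposed action and vetoes the three degenerate patterns named above. The sketch below is a hypothetical illustration only: the `Overseer` class, the `kind:target` action strings, and the thresholds are assumptions for the example and are not APIOT's actual interface.

```python
from collections import deque
from typing import Deque, Tuple


class Overseer:
    """Hypothetical runtime gate that vetoes degenerate action sequences."""

    def __init__(self, max_repeats: int = 3, max_recon: int = 5) -> None:
        self.max_repeats = max_repeats
        self.max_recon = max_recon
        # Sliding window of the most recently accepted actions.
        self.recent: Deque[str] = deque(maxlen=max_repeats)
        self.recon_streak = 0
        self.awaiting_crash_check = False

    def review(self, action: str) -> Tuple[bool, str]:
        """Return (allowed, reason) for an action written as 'kind:target'."""
        kind = action.split(":", 1)[0]

        # Repetition loop: the window is full of this exact action.
        if len(self.recent) == self.max_repeats and all(
            past == action for past in self.recent
        ):
            return False, "repetition loop"

        # Reconnaissance deadlock: too many consecutive scans.
        if kind == "scan" and self.recon_streak >= self.max_recon:
            return False, "reconnaissance deadlock"

        # Missing crash verification: no patching before the crash is checked.
        if kind == "patch" and self.awaiting_crash_check:
            return False, "crash not verified"

        # Action accepted: update the tracked state.
        self.recent.append(action)
        self.recon_streak = self.recon_streak + 1 if kind == "scan" else 0
        if kind == "exploit":
            self.awaiting_crash_check = True
        elif kind == "verify_crash":
            self.awaiting_crash_check = False
        return True, "ok"
```

The gate only rejects actions and leaves the agent's prompt and model untouched, which mirrors the idea of blocking bad sequences structurally rather than changing agent behaviour.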
@inproceedings{elzemity2026apiot,title={{APIOT}: Autonomous Vulnerability Management Across Bare-Metal Industrial {OT} Networks},author={ElZemity, Adel and Arief, Budi and Li, Shujun and Brierley, Calvin and Wang, Yichao and Huang, Yuxiang and Pope, James and Li, Haoxiang and Oikonomou, George},booktitle={Under review for ESORICS 2026},year={2026},}
AI-SS
Agentic Knowledge Distillation: Autonomous Training of Small Language Models for SMS Threat Detection
Adel ElZemity, Joshua Sylvester, Budi Arief, and 1 more author
In Proceedings of the 1st International Workshop on AI Safety and Security (AI-SS 2026), 2026
SMS-based phishing (smishing) attacks have surged, yet training effective on-device detectors requires labelled threat data that quickly becomes outdated. To deal with this issue, we present Agentic Knowledge Distillation, in which a powerful LLM acts as an autonomous teacher that fine-tunes a smaller student SLM, deployable for security tasks, without human intervention. The teacher LLM autonomously generates synthetic data and iteratively refines a smaller on-device student model until performance plateaus. We compare four LLMs in this teacher role (Claude Opus 4.5, GPT 5.2 Codex, Gemini 3 Pro, and DeepSeek V3.2) on SMS spam/smishing detection with two student SLMs (Qwen2.5-0.5B and SmolLM2-135M). Our results show that performance varies substantially depending on the teacher LLM, with the best configuration achieving 94.31% accuracy and 96.25% recall. We also compare against a Direct Preference Optimisation (DPO) baseline that uses the same synthetic knowledge and LoRA setup but without iterative feedback or targeted refinement; agentic knowledge distillation substantially outperforms it (e.g. 86-94% vs 50-80% accuracy), showing that closed-loop feedback and targeted refinement are critical. These findings demonstrate that agentic knowledge distillation can rapidly yield effective security classifiers for edge deployment, but outcomes depend strongly on which teacher LLM is used.
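The closed loop described above (generate synthetic data, fine-tune the student, evaluate, repeat until performance plateaus) can be sketched as follows. This is a minimal illustration under stated assumptions: the function `distill_until_plateau`, its three callables, and the `patience`, `min_gain`, and `max_rounds` stopping parameters are hypothetical and are not the stopping rule or interfaces used in the paper.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (SMS text, label), an assumed data shape


def distill_until_plateau(
    generate: Callable[[List[str]], List[Example]],
    fine_tune: Callable[[List[Example]], None],
    evaluate: Callable[[], Tuple[float, List[str]]],
    patience: int = 2,
    min_gain: float = 0.005,
    max_rounds: int = 20,
) -> List[float]:
    """Run teacher-driven refinement rounds until accuracy stops improving."""
    history: List[float] = []
    errors: List[str] = []  # student failure cases fed back to the teacher
    best = float("-inf")
    stale = 0  # consecutive rounds without a meaningful gain

    for _ in range(max_rounds):
        batch = generate(errors)        # teacher writes data targeting errors
        fine_tune(batch)                # student is updated on the new batch
        accuracy, errors = evaluate()   # held-out accuracy plus new failures
        history.append(accuracy)

        if accuracy > best + min_gain:
            best = accuracy
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # performance has plateaued

    return history
```

Passing the student's current failure cases back into `generate` is what makes the loop closed; a baseline without iterative feedback would correspond to a single generate-and-train pass.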
@inproceedings{elzemity2026agentic,title={Agentic Knowledge Distillation: Autonomous Training of Small Language Models for SMS Threat Detection},author={ElZemity, Adel and Sylvester, Joshua and Arief, Budi and Lemos, Rogério De},booktitle={Proceedings of the 1st International Workshop on AI Safety and Security (AI-SS 2026)},year={2026},note={Best Student Paper Award},eprint={2602.10869},archiveprefix={arXiv},primaryclass={cs.CR},}
2025
SECAI
Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data
Adel ElZemity, Budi Arief, and Shujun Li
In Proceedings of the 2025 International Workshop on Security and Artificial Intelligence (SECAI 2025), 2025
@inproceedings{analysing_llm_risks,title={Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data},author={ElZemity, Adel and Arief, Budi and Li, Shujun},booktitle={Proceedings of the 2025 International Workshop on Security and Artificial Intelligence (SECAI 2025)},year={2025},eprint={2505.09974},archiveprefix={arXiv},primaryclass={cs.CR},doi={10.1007/978-3-032-16092-8_19},}
AISec
CyberLLMInstruct: A Pseudo-malicious Dataset Revealing Safety-performance Trade-offs in Cyber Security LLM Fine-tuning
Adel ElZemity, Budi Arief, and Shujun Li
In Proceedings of the 2025 Workshop on Artificial Intelligence and Security (AISec 2025), 2025
@inproceedings{elzemity2025cyberllminstruct,title={CyberLLMInstruct: A Pseudo-malicious Dataset Revealing Safety-performance Trade-offs in Cyber Security LLM Fine-tuning},author={ElZemity, Adel and Arief, Budi and Li, Shujun},booktitle={Proceedings of the 2025 Workshop on Artificial Intelligence and Security (AISec 2025)},year={2025},publisher={ACM},doi={10.1145/3733799.3762968},eprint={2503.09334},archiveprefix={arXiv},primaryclass={cs.CR},}
CHARIOT
Ransomware in Resource-Constrained Industrial IoT Networks: There Actually is a Threat
Yuxiang Huang, Calvin Brierley, Adel ElZemity, and 5 more authors
In 2025 21st International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), Apr 2025
The threat of ransomware attacks against Industrial Internet of Things (IIoT) networks, particularly networks of resource-constrained devices, is starting to become a reality. In this paper, we contend that the threat of ransomware infection of an IIoT environment is not only plausible, but that it also exhibits different properties compared to ransomware attacks against traditional desktop systems, necessitating a new and more appropriate approach to deal with this threat. In particular, we articulate the unique characteristics of ransomware behaviour in IIoT networks, considering distinctive features of these networks such as computationally constrained devices and low-power wireless communication protocols. Furthermore, we outline the necessary attributes for ransomware to effectively compromise and propagate within IIoT networks. To back our argument, we present a proof-of-concept (PoC) IIoT ransomware prototype. To highlight the generality of our work, we have developed the prototype for two different hardware platforms, powered by two different open source embedded operating systems: Contiki-NG and Zephyr.
@inproceedings{6c6894a4b8d349f092d614bc08395b3b,title={Ransomware in Resource-Constrained Industrial IoT Networks: There Actually is a Threat},author={Huang, Yuxiang and Brierley, Calvin and ElZemity, Adel and Pope, James and Ma, Jiteng and {Di Buono}, Antonio and Arief, Budi and Oikonomou, George},year={2025},month=apr,day={30},language={English},series={International Conference on Distributed Computing in Sensor Systems (DCOSS)},publisher={Institute of Electrical and Electronics Engineers (IEEE)},booktitle={2025 21st International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT)},address={United States},url={https://dcoss.org/},}
2024
IEEE Xplore
Privacy Threats and Countermeasures in Federated Learning for Internet of Things: A Systematic Review
Adel ElZemity and Budi Arief
In 2024 IEEE International Conferences on Internet of Things (iThings) and IEEE Green Computing & Communications (GreenCom) and IEEE Cyber, Physical & Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics, Apr 2024
@inproceedings{10731741,author={ElZemity, Adel and Arief, Budi},booktitle={2024 IEEE International Conferences on Internet of Things (iThings) and IEEE Green Computing & Communications (GreenCom) and IEEE Cyber, Physical & Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics},title={Privacy Threats and Countermeasures in Federated Learning for Internet of Things: A Systematic Review},year={2024},pages={331-338},keywords={Privacy;Differential privacy;Social computing;Computational modeling;Multi-party computation;Robustness;Blockchains;Internet of Things;Time factors;Security;Federated Learning;Internet of Things;Privacy Threats;Defensive Measures;Systematic Literature Review},doi={10.1109/iThings-GreenCom-CPSCom-SmartData-Cybermatics62450.2024.00072}}
2023
Springer
A Comparative Analysis of Time Series Transformers and Alternative Deep Learning Models for SSVEP Classification
Heba Ali, Adel ElZemity, Amir E Oghostinos, and 1 more author
In International Conference on Model and Data Engineering, Apr 2023
Steady State Visually Evoked Potentials (SSVEPs) are intrinsic responses to specific visual stimulus frequencies. When the retina is activated by a frequency ranging from 3.5 to 75 Hz, the brain produces electrical activity at the same frequency as the visual signal, or its multiples. Identifying the preferred frequencies of neurocortical dynamic processes is a benefit of SSVEPs. However, the time consumed during calibration sessions limits the number of training trials and gives rise to visual fatigue, and significant variation across and within individuals over time weakens the effectiveness of individual training data. To address this issue, we propose a novel cross-subject-based classification method to enhance the robustness of SSVEP classification by employing cross-subject similarity and variability. We compare an efficient Time Series Transformer (TST) with different deep learning approaches in the literature. We use the TST to speed up calibration processes and improve classification precision for new users, and then compare it with other techniques: EEGNet, FBtCNN, and C-CNN. The outcomes of our framework are validated using two datasets with two different time window lengths. The experimental results suggest that cross-subject time-series transformers and EEGNet achieve better performance with specific subjects than the other state-of-the-art techniques compared, and have high potential for building high-speed BCIs.
@inproceedings{ali2023comparative,title={A Comparative Analysis of Time Series Transformers and Alternative Deep Learning Models for SSVEP Classification},author={Ali, Heba and ElZemity, Adel and Oghostinos, Amir E and Selim, Sahar},booktitle={International Conference on Model and Data Engineering},pages={3--16},year={2023},organization={Springer},isbn={978-3-031-55729-3},publisher={Springer Nature Switzerland},doi={10.1007/978-3-031-55729-3_2}}
IEEE Xplore
A Transformer-Based Deep Learning Architecture for Accurate Intracranial Hemorrhage Detection and Classification
Adel ElZemity, Maryam ElFdaly, Shorouk Abdelfattah, and 6 more authors
In 2023 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT), Apr 2023
@inproceedings{10391388,author={ElZemity, Adel and ElFdaly, Maryam and Abdelfattah, Shorouk and Abdelwahab, Ahmed and Ramadan, Mohamed and Zakzouk, Salma and Ameen, Ahmed and Elkhishen, Rawan and Darweesh, M. Saeed},booktitle={2023 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT)},title={A Transformer-Based Deep Learning Architecture for Accurate Intracranial Hemorrhage Detection and Classification},year={2023},volume={},number={},pages={215-220},keywords={Technological innovation;Deep architecture;Computer architecture;Streaming media;Transformers;Convolutional neural networks;Hemorrhaging;Intracranial Hemorrhage;Transformer;Swin Transformer},doi={10.1109/3ICT60104.2023.10391388},}
2020
IEEE Xplore
Wastewater treatment model with smart irrigation utilizing PID control
Adel ElZemity, Ahmed Ali Gaafar, Ahmed Khaled Ahmed, and 4 more authors
In 2020 2nd Novel Intelligent and Leading Emerging Sciences Conference (NILES), Apr 2020
@inproceedings{el2020wastewater,title={Wastewater treatment model with smart irrigation utilizing PID control},author={ElZemity, Adel and Gaafar, Ahmed Ali and Ahmed, Ahmed Khaled and Abdelwahab, Ahmed Sayed and Saad, Hatim Mohamed and Elboushi, Mostafa Khaled and Ibraheem, Amira Mofreh},booktitle={2020 2nd Novel Intelligent and Leading Emerging Sciences Conference (NILES)},pages={374--379},year={2020},organization={IEEE},doi={10.1109/NILES50944.2020.9257882},}
2019
IEEE Xplore
Interfacial Modification of Perovskite Solar Cell Using ZnO Electron Injection Layer with PDMS as Antireflective Coating
Mohamed K. Othman, Adel ElZemity, Mohamed K. Rawash, and 4 more authors
In 2019 Novel Intelligent and Leading Emerging Sciences Conference (NILES), Apr 2019
@inproceedings{8909336,author={Othman, Mohamed K. and ElZemity, Adel and Rawash, Mohamed K. and Taha, Hazem A. and Alalem, Shorouk and El-Fdaly, Maryam and El-Batawy, Yasser M.},booktitle={2019 Novel Intelligent and Leading Emerging Sciences Conference (NILES)},title={Interfacial Modification of Perovskite Solar Cell Using ZnO Electron Injection Layer with PDMS as Antireflective Coating},year={2019},volume={1},number={},pages={209-213},keywords={Conferences;Perovskite solar cell;photovoltaics;Polydimethylsiloxane (PDMS);Pyramids Structure;Electron Injection;Multipathing},doi={10.1109/NILES.2019.8909336},}