Anamika Gupta, Sakshi Garg and Harsh Bamotra
Adv. Know. Base. Syst. Data Sci. Cyber., 2 (1):184-196
Anamika Gupta : Shaheed Sukhdev College of Business Studies
Sakshi Garg : Shaheed Sukhdev College of Business Studies, University of Delhi
Harsh Bamotra : Acharya Narendra Dev College, University of Delhi
DOI: https://dx.doi.org/10.54364/cybersecurityjournal.2025.1109
Article History: Received on: 25-Feb-25, Accepted on: 22-Apr-25, Published on: 29-Apr-25
Corresponding Author: Anamika Gupta
Email: anamikargupta@sscbsdu.ac.in
Citation: Anamika Gupta (2025). Evaluation of Prompting Strategies for Cyberbullying Detection using various Large Language Models. Adv. Know. Base. Syst. Data Sci. Cyber., 2 (1):184-196
Sentiment analysis detects toxic language for safer online spaces and helps businesses refine strategies through customer feedback analysis. Advancements in Large Language Models (LLMs) and prompt engineering have introduced novel approaches to sentiment analysis, cyberbullying detection, and toxicity classification. However, several challenges persist, particularly in handling text ambiguity, sarcasm, multilingual contexts, and nuanced emotional comprehension, which limit the ability to achieve accurate and human-aligned results. This study uses the CYBY23 dataset, which contains 112 human-annotated threads. To balance the dataset, synthetic threads were generated using ChatGPT, resulting in a final dataset of 148 threads evenly distributed across two labels: 0 (bullying with no aggression) and 1 (bullying with aggression). Three publicly available LLMs, Deepseek-r1-distill-llama-70b (Deepseek), Qwen-2.5-32b (Qwen), and llama3-70b-8192 (Llama), were systematically evaluated using zero-shot, one-shot, and few-shot prompting strategies, with all models accessed via Groq Cloud APIs. The model outputs were assessed using recall, precision, F1 score, and accuracy to measure performance across the different prompting techniques (PT). Among the evaluated configurations, Qwen achieved the highest overall accuracy (82.43%) in the few-shot 2 setting, while Llama matched that accuracy in one-shot 2 and also performed solidly in few-shot settings. Deepseek showed high variability, thriving with contextual enhancements in zero-shot 2 but struggling in one-shot and fluctuating in few-shot settings. One-shot prompting proved most effective for Llama, while few-shot methods worked best for both Qwen and Llama.
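The synthetic-thread generation step mentioned above could, for illustration, be driven programmatically rather than through the ChatGPT web interface. Below is a minimal Python sketch using the OpenAI chat completions API; the model name, prompt wording, and workflow are assumptions for illustration and are not documented in the paper.

# Illustrative sketch only: the paper states that synthetic threads were generated
# with ChatGPT but does not specify the exact prompt or workflow.
# pip install openai
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Hypothetical instruction asking for one synthetic thread of the under-represented class.
prompt = (
    "Write a short, realistic social-media discussion thread (3-5 comments) that "
    "contains bullying with aggression. Return only the thread text."
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model; "ChatGPT" is not tied to a single API model
    messages=[{"role": "user", "content": prompt}],
    temperature=0.9,  # higher temperature encourages varied synthetic threads
)
synthetic_thread = response.choices[0].message.content
print(synthetic_thread)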
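A minimal sketch of how the zero-shot, one-shot, and few-shot prompts might be issued through the Groq Python client follows. The model names mirror those in the abstract (exact Groq model identifiers may differ); the prompt template, label-parsing convention, and the classify helper are assumptions for illustration, not the authors' actual prompts.

# Illustrative sketch: prompt wording and label parsing are assumptions.
# pip install groq
from groq import Groq

client = Groq()  # expects GROQ_API_KEY in the environment

MODELS = ["deepseek-r1-distill-llama-70b", "qwen-2.5-32b", "llama3-70b-8192"]

TASK = (
    "Classify the following thread as 0 (bullying with no aggression) or "
    "1 (bullying with aggression). Answer with the label only.\n\nThread:\n{thread}"
)

def classify(thread: str, model: str, examples: str = "") -> str:
    """Zero-shot when `examples` is empty; one-/few-shot when labelled examples are prepended."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": examples + TASK.format(thread=thread)}],
        temperature=0,  # deterministic output for classification
    )
    return response.choices[0].message.content.strip()

# One-shot usage: prepend a single labelled (hypothetical) example before the task.
one_shot = "Thread:\nYou are worthless, nobody wants you here.\nLabel: 1\n\n"
print(classify("Example thread text to classify.", MODELS[2], examples=one_shot))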
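The evaluation metrics named above (recall, precision, F1 score, and accuracy) can be computed for any run once gold labels and parsed model outputs are available. A short scikit-learn sketch follows, using placeholder label lists rather than the study's data.

# pip install scikit-learn
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Placeholder labels: 0 = bullying with no aggression, 1 = bullying with aggression.
y_true = [0, 1, 1, 0, 1, 0]  # human-annotated gold labels
y_pred = [0, 1, 0, 0, 1, 1]  # labels parsed from the LLM responses

print(f"Accuracy : {accuracy_score(y_true, y_pred):.4f}")
print(f"Precision: {precision_score(y_true, y_pred):.4f}")
print(f"Recall   : {recall_score(y_true, y_pred):.4f}")
print(f"F1 score : {f1_score(y_true, y_pred):.4f}")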