Accepted Papers
Reframing Standard Language Ideology in the Language of AI

Atticus Yang, Department of Digital Studies, The University of Chicago, Chicago, USA

ABSTRACT

In this paper we deconstruct and clarify concerns raised by Genevieve Smith, Eve Fleisig, et al. on standard language ideology in language generated by large language models (LLMs), as described in their 2024 paper “Standard Language Ideology in AI-Generated Language”. First, we return to theoretical interpretations of language ideology and reconstruct our understanding of how certain variants of a language achieve prestige within a society. We apply this paradigm to the context of humans interacting with nonhuman LLMs, especially as it relates to the default status of Standard American English (SAE) in AI. Through this process we discuss the contrasting advantages and disadvantages experienced by speakers of SAE who exist inside Standard American Society and those who exist outside of it, ultimately allowing us to comment on the complex relationship humans have with prestige dialects and to highlight the power of AI developers themselves to deconstruct the bias within their work.

KEYWORDS

Language Ideology, Dialects, Language Models, Natural Language Processing, Generative AI


Injecting Perceptual Features Into T5 for Figurative Language Generation

WU Yufeng, Department of Linguistics and Translation, City University of Hong Kong, Hong Kong, China

ABSTRACT

Understanding metaphors remains a core challenge for NLP systems, especially when metaphorical meaning depends on perceptual grounding. This paper explores whether injecting perceptual color features into a T5-based language model can enhance metaphor explanation generation. We propose a low-cost, interpretable approach by mapping 12-dimensional color vectors (JzAzBz space) into prefix embeddings that condition the model during fine-tuning. Evaluation on held-out test sets shows that the color-injected model outperforms the text-only baseline in both automatic metrics (BLEU +144%, ROUGE-L F1 +150%) and human ratings of correctness and general quality. However, a significant drop in comprehensiveness is observed, suggesting a trade-off between precision and coverage. Rater agreement analyses reveal high within-item agreement but modest inter-rater consistency, underscoring the subjective difficulty of metaphor evaluation. Our findings demonstrate the utility of perceptual grounding for figurative language generation and offer insights into balancing accuracy and elaboration in metaphor explanation tasks.
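The conditioning mechanism the abstract describes can be illustrated with a minimal sketch: a 12-dimensional color vector is projected into a small number of prefix embeddings that are prepended to the token embeddings before they reach the encoder. This is not the authors' implementation; the hidden size, number of prefix slots, and the projection matrix here are hypothetical placeholders (a trained model would learn the projection).

```python
import numpy as np

D_MODEL = 512    # assumed T5 hidden size (hypothetical)
N_PREFIX = 4     # assumed number of prefix slots (hypothetical)
rng = np.random.default_rng(0)

# Projection from a 12-dim color vector to N_PREFIX * D_MODEL values.
# In a real system this matrix would be learned during fine-tuning;
# here it is random purely for illustration.
W = rng.normal(size=(12, N_PREFIX * D_MODEL))

def color_to_prefix(color_vec):
    """Map a 12-dim JzAzBz-derived color vector to prefix embeddings."""
    flat = color_vec @ W                      # shape: (N_PREFIX * D_MODEL,)
    return flat.reshape(N_PREFIX, D_MODEL)    # shape: (N_PREFIX, D_MODEL)

def prepend_prefix(token_embeds, color_vec):
    """Concatenate color-derived prefix embeddings before the token
    embeddings along the sequence axis, so they condition the encoder."""
    prefix = color_to_prefix(color_vec)
    return np.concatenate([prefix, token_embeds], axis=0)
```

The resulting sequence is simply N_PREFIX positions longer; frameworks that accept precomputed input embeddings can consume it directly.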

KEYWORDS

Metaphor Explanation, Embodied Cognition, Multimodal NLP


Dynamic Generalized IoU Threshold: Optimizing AP and mAP for Object Detection using a Data-Driven Approach

Jie Zhao and Meng Su, Department of Computer Science and Software Engineering, Penn State University, Erie, USA

ABSTRACT

Object detection, a key technology in computer vision applications such as facial recognition, self-driving vehicles, and industrial surveillance, is widely utilized in research and industrial settings. The evaluation of object detection algorithms relies heavily on performance metrics, but challenges arise with models like the Single Shot Detector (SSD), which features variable bounding box areas and diverse aspect ratios. These properties complicate the application of traditional metrics such as Receiver Operating Characteristic (ROC), Area Under the Curve (AUC), and Youden's J Statistic, which are better suited for classification, as well as the Jaccard Index (IoU), a localization measure. This paper introduces a data-driven approach to address these challenges by generating dynamic IoU threshold constants tailored to different object categories, guided by three constraints derived from machine learning and statistics. Experiments using the 2017 COCO validation dataset demonstrate that this method enhances Average Precision (AP) and mean Average Precision (mAP) compared to fixed-threshold approaches.
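The role a per-category IoU threshold plays in evaluation can be sketched as follows: a prediction counts as a true positive only if its IoU with a ground-truth box clears the threshold assigned to that object category, rather than a single fixed cutoff such as 0.5. This is a minimal illustration, not the paper's threshold-generation method; the category names and threshold values are hypothetical.

```python
def iou(box_a, box_b):
    """Jaccard Index of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical per-category thresholds replacing a single fixed 0.5.
dynamic_thresholds = {"person": 0.55, "car": 0.45}

def is_true_positive(pred_box, gt_box, category, thresholds):
    """A detection is a TP when its IoU clears the category's threshold."""
    return iou(pred_box, gt_box) >= thresholds.get(category, 0.5)
```

Since AP is computed from the ranked TP/FP decisions per category, changing the threshold a category uses directly changes that category's AP and, in aggregate, mAP.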

KEYWORDS

Dynamic IoU Threshold Constant Generation, Data-Driven Approach, Single Shot Detector (SSD), Youden's J Statistic, Precision, Recall, Optimized AP, Optimized mAP


The Convergence Quincunx: A Theoretical Framework for Rebuilding Consumer Trust in the Digital Age

Luis Arieira, Faculty of Digital Economy, NUM, Phnom Penh, Cambodia

ABSTRACT

The contemporary digital marketing landscape is defined by a significant paradox: unprecedented reach is juxtaposed with a profound erosion of consumer trust. This paper introduces "The Convergence Quincunx," a five-pillar theoretical framework designed to address this trust deficit by transitioning from an "attention economy" to a "trust economy." The framework's core novelty lies in its holistic synthesis of traditional marketing principles with the transformative capabilities of blockchain technology, centered on the concept of an immutable brand promise. We argue that while existing research explores individual applications of blockchain in marketing (e.g., supply chain transparency, tokenized loyalty), a comprehensive, integrated framework addressing trust as a systemic issue has been absent. This paper outlines a conceptual model, posits a "Theory of Verifiable Trust," and proposes a mixed-methods research agenda to empirically validate the framework's efficacy.


CyberDTD: A Multimodal Benchmark Dataset for Cyberbullying Detection in Tunisian Dialect

Sahar Ben Bechir, Asma Mekki, Ismail Badache, Mariem Ellouze, and Lamia Hadrich Belguith, ANLP Research Group, MIRACL Lab., University of Sfax

ABSTRACT

Cyberbullying has become a pressing issue on social media, particularly in low-resource and multilingual contexts such as Tunisia. Effective detection requires understanding both textual and visual signals, including images with embedded text and user-generated comments. In this work, we introduce CyberDTD (Cyberbullying Detection in Tunisian Dialect), a multimodal dataset designed to support research on cyberbullying detection in the Tunisian Arabic dialect. CyberDTD contains 10,802 images across five categories (humor, sarcasm, hate, violence, and neutral), making it the first large-scale multimodal cyberbullying dataset in Tunisian Arabic. It thus encompasses a wide range of online harassment while also providing neutral examples for balanced analysis, and it will be publicly released to foster future research. We offer a comprehensive descriptive analysis, highlighting key challenges such as class imbalance, multimodality, and cultural specificity. CyberDTD constitutes a valuable resource for developing and evaluating machine learning models in low-resource settings, paving the way for more robust and culturally aware cyberbullying detection systems.

KEYWORDS

Multimodal Dataset, Tunisian Dialect, Cyberbullying, Natural Language Processing (NLP).


Leveraging Big Data Analytics For Evidence-based Social Policy: A Computational Sociology Approach

Oritsemeyiwa Gabriel Orugboh

ABSTRACT

The accelerating digitalization of society has generated unprecedented volumes of social, economic, and behavioural data, offering new opportunities to transform social policy from reactive to predictive and adaptive. Yet traditional social policy frameworks remain constrained by slow data cycles, fragmented records, and linear analysis models that fail to reflect complex societal dynamics. This paper introduces an integrated Computational Sociology Framework that leverages Big Data Analytics and Machine Learning to inform evidence-based social policy. Drawing from open government data, social media streams, and census databases, the study applies unsupervised clustering and regression modelling (Python, SQL, and scikit-learn) to detect spatial patterns of digital exclusion, income disparity, and urban vulnerability. Comparative case studies of Nigeria's Conditional Cash Transfer (CCT) Program and Kenya's Hunger Safety Net Programme (HSNP) illustrate how algorithmic targeting improves accuracy, responsiveness, and transparency in welfare delivery. The results demonstrate that big data-driven policy can improve targeting precision by over 30% and reduce decision lags by up to 80%. Finally, the paper presents a decision-support dashboard prototype for policymakers, integrating explainable AI (XAI) for transparency and interpretability. This research advances a scalable model for computational governance, bridging sociology, data science, and public administration to promote inclusive, equitable, and evidence-driven policy innovation.
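The clustering-plus-regression pipeline the abstract names can be sketched with scikit-learn, which the paper itself cites as part of its toolchain. This is a minimal illustration with fabricated toy indicators, not the study's actual data or model: the two columns (broadband rate, median income) and the district values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Hypothetical district-level indicators: [broadband_rate, median_income].
# Real inputs would come from open government data and census databases.
X = np.array([[0.90, 60.0],
              [0.85, 58.0],
              [0.20, 22.0],
              [0.25, 20.0]])

# Unsupervised clustering to surface groups of digitally excluded districts.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Regression modelling linking broadband access to income in the same data.
reg = LinearRegression().fit(X[:, [0]], X[:, 1])
```

Cluster membership (km.labels_) flags candidate districts for targeted interventions, while the regression slope (reg.coef_) quantifies how strongly digital access tracks income in the sample.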

KEYWORDS

Big Data Analytics, Computational Sociology, Evidence-Based Policy, Machine Learning, Social Inequality, Smart Governance


Reach Us

dsml@csen2025.org


dsmlconfen@gmail.com
