Prompting fairness: Learning prompts for debiasing large language models

ManualCenter for AI Measurement

Authors: Camelia Lemnaru, Cristian Andrei Rad

Large language models are prone to internalizing social biases due to the characteristics of the data used for their self-supervised training scheme. Considering their recent emergence and wide availability to the general public, it is mandatory to identify and alleviate these biases to avoid perpetuating stereotypes towards underrepresented groups. We present a novel prompt-tuning method for reducing biases in encoder models such as BERT or RoBERTa. Unlike other methods, we only train a small set of additional reusable token embeddings that can be concatenated to any input sequence to reduce bias in the outputs. We particularize this method to gender bias by providing a set of templates used for training the prompts. Evaluations on two benchmarks show that our method is on par with the state of the art while having a limited impact on language modeling ability.
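The idea of reusable token embeddings concatenated to an input sequence can be illustrated with a minimal sketch. It assumes the prompt is placed in front of the input and that embeddings are plain lists of floats; the function names, the prompt position, and the example values are hypothetical and not taken from the paper.

```python
from typing import List

Vector = List[float]


def prepend_soft_prompt(input_embeds: List[Vector], prompt_embeds: List[Vector]) -> List[Vector]:
    """Return the prompt vectors followed by the input token vectors.

    In prompt tuning, only `prompt_embeds` would be trained; the encoder
    and the input embeddings stay frozen. Prepending is an assumption here.
    """
    hidden = len(prompt_embeds[0])
    if any(len(v) != hidden for v in prompt_embeds + input_embeds):
        raise ValueError("all embeddings must share the same hidden size")
    return [list(v) for v in prompt_embeds] + [list(v) for v in input_embeds]


def extend_attention_mask(mask: List[int], n_prompt: int) -> List[int]:
    """Mark the prompt positions as attended (1) in front of the original mask."""
    return [1] * n_prompt + list(mask)


# Hypothetical values: a 2-token learned prompt and a 3-token input, hidden size 2.
prompt = [[0.1, 0.2], [0.3, 0.4]]
tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
sequence = prepend_soft_prompt(tokens, prompt)
mask = extend_attention_mask([1, 1, 0], len(prompt))
```

Because the same prompt vectors are reused for every input, the trained artifact is only the small `prompt` matrix rather than a full copy of the model.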

Energy forecasting under missing data: Comparative evaluation of augmented representations and decoder-only time-series imputation

2026OpenAlex automated

Authors: Tudor Cioara

Data-related issues, including missing values and irregular measurements, challenge the accuracy of short-term energy forecasting in smart grids. In data-scarce scenarios, two approaches are commonly considered, but their strengths and weaknesses are not fully mapped. Embedding-based models learn joint representations from heterogeneous data, compensating for the lack of time-series measurements via additional contextual or external sources, whereas imputation pipelines restore temporal continuity but may smooth variability or produce implausible values. To address these limitations, we propose a unified forecasting framework for energy systems that integrates a shared Temporal Fusion Transformer predictor with a controlled degradation protocol to simulate realistic missing-data patterns. This enables a fair and systematic comparison between two pipelines: representation-augmented learning and decoder-only time-series imputation. The former integrates TS2Vec temporal embeddings and BERT-based static contextual representations to provide a richer forecasting space without explicit reconstruction of missing values. The latter uses a Chronos-2 model to reconstruct missing time-series segments, followed by physics-based correction to enforce physically plausible outputs. We evaluate both pipelines under a controlled data degradation protocol to map the trade-offs between representation learning and data continuity restoration through imputation. We use real-world non-residential building electricity consumption and wind generation datasets. The imputation-based pipeline achieves a mean sMAPE of 10.14% and MAE of 8.43 kWh across 100 buildings, compared to 12.11% and 10.89 kWh for the representation-based approach (p < 0.01). On the wind generation dataset, imputation also improves predictive accuracy (R² = 0.870 vs. R² = 0.794). However, representation-based models remain competitive in scenarios with irregular, spike-dominated, or event-driven consumption patterns where imputation provides limited additional benefits.
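For readers unfamiliar with the two error metrics reported here, the following sketch shows one common way to compute them. The sMAPE variant used (mean of 2|y − ŷ| / (|y| + |ŷ|), expressed in percent) is an assumption, since several definitions are in use and the abstract does not state which one the authors apply.

```python
from typing import Sequence


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error, in the same unit as the data (e.g. kWh)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        # Convention assumed here: a point where both values are zero contributes no error.
        total += 0.0 if denom == 0 else 2.0 * abs(a - f) / denom
    return 100.0 * total / len(actual)
```

Unlike MAE, sMAPE is scale-free, which is why it can be averaged across buildings with very different consumption levels.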

A systematic review of generative AI usage for IT project management

2026OpenAlex automated

Authors: Tudor Cioara

This paper aims to synthesize current knowledge on generative AI in IT project management using the PRISMA methodology to provide researchers with a comprehensive perspective on techniques, applications, adoption trends, limitations, and integration across project management tools and process groups. The analysis reveals a clear dominance of OpenAI's GPT in the included studies, which rely primarily on prompt engineering, suggesting that research in this area remains at an exploratory stage. Finally, it identifies and discusses three promising research directions for AI-enabled project management: process group-specific AI agents, project role-based AI agents, and hybrid collaborative networks that enable human-guided orchestration.

AlloyGraph: Data and Evaluation Results for Multi-Agent AI Superalloy Property Prediction

2026OpenAlex automated

Authors: Alexandru Lecu, Adrian Petru Groza

Training data (77 alloys from the Nickel Institute handbook), evaluation data (88 alloys from manufacturer datasheets), prediction results for six model configurations, chatbot evaluation benchmarks (250 MCQ questions, 100 RAGAS questions, 12 expert-graded questions), inverse design results (20 target specifications), and OWL ontology for the AlloyGraph platform. Associated repository: https://github.com/AlexLecu/AlloyGraph

ConvU-NExT: An Asymmetrical Encoder–Decoder for Denoising Low Dose CT

2026ArticleManualTrusted AI

Authors: Adrian Petru Groza

Low-dose computed tomography (LDCT) is a medical imaging modality designed to minimize ionizing radiation exposure while maintaining the ability to produce detailed cross-sectional images. It is particularly valuable in scenarios requiring repeated imaging, such as cancer screening, follow-up examinations or pediatric diagnostics, where reducing radiation dose is critical to patient safety. For example, to reduce noise by half, four times the radiation dose is required in the slice. The goal is to achieve postprocessed LDCT images with comparable quality to those obtained from standard-dose CT imaging. We start with a brief overview of the CT procedures and their limitations. Then we introduce a novel denoising method based on an asymmetric integration of the ConvNeXt backbone with the U-Net architecture. This novel approach obtained 2–3 times less noise than the original LDCT, with a 10%–20% increase in performance compared to the U-Net implementation, checked against three metrics: MSE, SSIMLoss, and combinations of both. The results suggest that: (i) augmenting the images during training with specific noise, obtained from a water phantom CT scan test, yields superior results compared to generic noise augmentations; (ii) a larger kernel size better extracts features; and (iii) a smaller kernel size was mandatory for feature reconstruction.
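The dose-noise trade-off quoted in this abstract follows from the standard assumption that CT image noise is dominated by quantum (Poisson) statistics, so the noise standard deviation scales with the inverse square root of the dose. A short derivation under that assumption, where the symbols σ (noise) and D (dose) are introduced here for illustration only:

```latex
\sigma \propto \frac{1}{\sqrt{D}}
\quad\Longrightarrow\quad
\frac{\sigma_2}{\sigma_1} = \sqrt{\frac{D_1}{D_2}},
\qquad
\sigma_2 = \tfrac{1}{2}\,\sigma_1 \;\Longrightarrow\; D_2 = 4\,D_1 .
```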

Colonic Polyp Detection with Object Detection Models

2026OpenAlex automated

Authors: Eugen Richard Ardelean

In recent years, deep learning has been applied more and more to medical image analysis. One such application of deep learning is the automated polyp detection in colonoscopy with the target of reducing miss rates. This study presents a comprehensive evaluation of nine state-of-the-art object detection models for colonic polyp detection: YOLOv8, YOLOv9, YOLOv10, YOLO11, YOLO12, YOLO26, RT-DETR, YOLO-World, and YOLOE. The models were evaluated on three publicly available datasets: CVC-ClinicDB, CVC-ColonDB, and ETIS-LaribPolypDB. All models were trained under standardized conditions using identical hyperparameters and data augmentation strategies to guarantee fair comparison. Performance was evaluated using multiple metrics: mAP@50, mAP@50–95, F1 score, precision, recall, inference time, and computational cost. YOLO11 demonstrated the best overall performance, achieving mAP@50 scores of 0.995, 0.944, and 0.978 on the three datasets respectively, while maintaining the fastest inference time of approximately 150 ms per image and the third-lowest computational cost at 21.3 GFLOPs. Cross-dataset generalization experiments revealed a significant loss of performance, with mAP@50 dropping by 20–40% when models were tested on an unseen dataset, highlighting the challenge of true generalization with limited datasets. Statistical analysis by polyp size showed that while all models achieved F1 scores exceeding 0.95 for large polyps, performance decreased to 0.60–0.85 for small polyps, indicating a limitation in detecting small lesions. The analysis of failure modes showed that missed detections, false positives and boundary errors constitute 60–75% of all failures, suggesting that domain adaptation of object detection models may be required.
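The mAP@50 metric counts a predicted box as a true positive when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below shows that overlap test together with the F1 score; the box format `(x_min, y_min, x_max, y_max)` and the function names are assumptions chosen for illustration and are not tied to any of the evaluated frameworks.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred: Box, truth: Box) -> bool:
    """True when the prediction would count as a hit under the IoU >= 0.5 rule."""
    return iou(pred, truth) >= 0.5


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2.0 * precision * recall / total if total > 0 else 0.0
```

For a fixed pixel offset, a small box loses a larger share of its overlap than a large one, so small polyps fall below the 0.5 threshold more easily. This is one plausible reading of the size-dependent F1 scores reported above, not a finding stated in the abstract.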

Performance Evaluation of LLMs in Automated RDF Knowledge Graph Generation

2026OpenAlex automated

Authors: Tudor Cioara

Cloud systems generate large, heterogeneous log data containing critical infrastructure, application, and security information. Transforming these logs into RDF triples enables their integration into knowledge graphs, improving interpretability, root-cause analysis, and cross-service reasoning beyond what raw logs allow. Large Language Models (LLMs) offer a promising approach to automate RDF knowledge graph generation; however, their effectiveness on complex cloud logs remains largely unexplored. In this paper, we evaluate multiple LLM architectures and prompting strategies for automated RDF extraction using a controlled framework with two pipelines for systematically processing semi-structured log data. The extraction pipeline integrates multiple LLMs to identify relevant entities and relationships, automatically generating subject-predicate-object triples. These outputs are evaluated using a dedicated validation pipeline with both syntactic and semantic metrics to assess accuracy, completeness, and quality. Due to the lack of public ground-truth datasets, we created a reference Log-to-KG dataset from OpenStack logs using manual annotation and ontology-driven methods, enabling an objective baseline. Our analysis shows that Few-Shot learning is the most effective strategy, with Llama achieving a 99.35% F1 score and 100% valid RDF output, while Qwen, NuExtract, and Gemma also perform well under Few-Shot prompting, with Chain-of-Thought approaches maintaining similar accuracy. One-Shot prompting offers a lighter but effective alternative, while Zero-Shot and advanced strategies such as Tree-of-Thought, Self-Critique, and Generate-Multiple perform substantially worse. These results highlight the importance of contextual examples and prompt design for accurate RDF extraction and reveal model-specific limitations across LLM architectures.
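To make the evaluation concrete, here is a minimal sketch of how extracted triples could be scored against a reference set. It assumes exact matching on (subject, predicate, object) tuples. The validation pipeline in the paper combines syntactic and semantic metrics and may well differ, and the `ex:` identifiers below are hypothetical placeholders rather than terms from the authors' dataset.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def triple_scores(predicted: Iterable[Triple], reference: Iterable[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact triple matching."""
    pred, ref = set(predicted), set(reference)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2.0 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Hypothetical triples for a single log line.
reference = [
    ("ex:instance-42", "ex:hasState", "ex:Active"),
    ("ex:instance-42", "ex:runsOn", "ex:host-7"),
]
predicted = [
    ("ex:instance-42", "ex:hasState", "ex:Active"),
    ("ex:instance-42", "ex:runsOn", "ex:host-9"),  # wrong object, counts as a miss
]
scores = triple_scores(predicted, reference)
```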

A Hybrid Machine Learning–Genetic Algorithm for Optimizing Surface-Mount Technology Planning

2026conference paperManualTrusted AI

Authors: Adrian Petru Groza

We tackle the problem of improving the Surface-Mount Technology (SMT) process planning in an automotive manufacturing setting. Current simulations show low accuracy across production lines as the existing approach relies on predefined setups rather than adapting to product-specific configurations. We propose a hybrid framework that couples machine learning with a genetic algorithm to generate product-specific plans. Our solution involves three tasks: (i) assigning boards to lines, (ii) allocating components to Pick-and-Place (PnP) machines, and (iii) balancing workloads across machines. Our hybrid pipeline embeds supervised learning in a genetic optimizer. A multi-class classifier selects feasible PnP head configurations per Bill of Materials (BOM) part number (precision = 0.73). A genetic algorithm assigns components to compatible feeder tables/machines, while a regression model estimates table cycle times (R² = 0.88). The fitness jointly optimizes Components Placed per Hour (CPH) and Line Balancing (LB) under process constraints. Different mutation methods are explored, revealing that mutation based on balancing the workload by leveling the number of placements on the tables with minimum and maximum cycle time results in an LB of 0.83, with a CPH of 0.37 and an average delta cycle time of -3.27% across 105 part numbers.
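The workload-balancing mutation described at the end of this abstract can be sketched as follows. This is an illustrative simplification, not the authors' operator: it moves a single randomly chosen placement from the slowest table to the fastest one, ignores feeder compatibility constraints, and takes the cycle-time estimator as a caller-supplied function (in the paper that role is played by a regression model). All names are hypothetical.

```python
import random
from typing import Callable, Dict, List

Assignment = Dict[str, List[str]]  # table id -> component ids placed from that table


def balancing_mutation(
    assignment: Assignment,
    cycle_time: Callable[[List[str]], float],
    rng: random.Random,
) -> Assignment:
    """Move one placement from the table with the highest estimated
    cycle time to the table with the lowest; the input is left unchanged."""
    times = {table: cycle_time(parts) for table, parts in assignment.items()}
    slowest = max(times, key=times.get)
    fastest = min(times, key=times.get)
    mutated = {table: list(parts) for table, parts in assignment.items()}
    if slowest == fastest or not mutated[slowest]:
        return mutated  # nothing to level
    moved = mutated[slowest].pop(rng.randrange(len(mutated[slowest])))
    mutated[fastest].append(moved)
    return mutated


# Stand-in estimator: cycle time proportional to the number of placements.
original = {"T1": ["a", "b", "c", "d"], "T2": ["e"]}
mutated = balancing_mutation(original, cycle_time=len, rng=random.Random(0))
```

Passing the estimator in as a parameter keeps the mutation independent of how cycle times are predicted, so a learned regressor can be swapped in without changing the operator.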

OCTA-Based Biomarker Characterization in nAMD

2026OpenAlex automated

Authors: Adrian Petru Groza

We aim to enhance ophthalmologists' decision-making when diagnosing neovascular age-related macular degeneration (nAMD). We developed three tools to analyze Optical Coherence Tomography Angiography images: (1) extracting biomarkers such as mCNV area and vessel density using image processing; (2) generating a 3D visualization of the neovascularization for a better view of the affected regions; and (3) applying an ensemble of three white-box machine learning algorithms (decision tree, support vector machines and DL-Learner) for nAMD diagnosis. The learned expressions reached 100% accuracy for the training data and 68% accuracy in testing. The main advantage is that all the learned models are white-box, which ensures explainability and transparency, allowing clinicians to better understand the decision-making process.

Edge-Oriented Orchestration of Energy Services Using Graph-Driven Swarm Intelligence

2026OpenAlex automated

Authors: Tudor Cioara

As smart grids increasingly depend on IoT devices and distributed energy management, they require decentralized, low-latency orchestration of energy services. We address this with a unified framework for edge-fog-cloud infrastructures tailored to smart energy systems. It features a graph-based data model that captures infrastructure and workload, enabling efficient topology exploration and task placement. Leveraging this model, a swarm-based heuristic algorithm handles task offloading in a resource-aware, latency-sensitive manner. Our framework ensures data interoperability via energy data space compliance and guarantees traceability using blockchain-based workload notarization. We validate our approach with a real-world KubeEdge deployment, demonstrating zero-downtime service migration under dynamic workloads while maintaining service continuity.

Replay Attacks Against Audio Deepfake Detection

2025ConferenceManualTrusted AI

We show how replay attacks undermine audio deepfake detection: by playing and re-recording deepfake audio through various speakers and microphones, we make spoofed samples appear authentic to the detection model. To study this phenomenon in more detail, we introduce ReplayDF, a dataset of recordings derived from M-AILABS and MLAAD, featuring 109 speaker-microphone combinations across six languages and four TTS models. It includes diverse acoustic conditions, some highly challenging for detection. Our analysis of six open-source detection models across five datasets reveals significant vulnerability, with the top-performing W2V2-AASIST model’s Equal Error Rate (EER) surging from 4.7% to 18.2%. Even with adaptive Room Impulse Response (RIR) retraining, performance remains compromised with an 11.0% EER. We release ReplayDF for noncommercial research use.
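The Equal Error Rate is the operating point at which the false acceptance rate (spoofed audio accepted as genuine) equals the false rejection rate (genuine audio rejected). A minimal sketch of estimating it from detector scores is shown below; it assumes that higher scores mean "more likely bona fide" and uses a simple threshold sweep over the observed scores rather than any particular toolkit's interpolation.

```python
from typing import Sequence


def equal_error_rate(bonafide_scores: Sequence[float], spoof_scores: Sequence[float]) -> float:
    """Approximate the EER by sweeping thresholds over all observed scores."""
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # A sample is accepted as bona fide when its score is >= t.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

Under this convention, replayed spoofs that receive higher scores push the false acceptance rate up at every threshold, which raises the EER.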

TADA: Training-free Attribution and Out-of-Domain Detection of Audio Deepfakes

2025ConferenceManualTrusted AI

Authors: David Combei

Deepfake detection has gained significant attention across audio, text, and image modalities, with high accuracy in distinguishing real from fake. However, identifying the exact source, such as the system or model behind a deepfake, remains a less studied problem. In this paper, we take a significant step forward in audio deepfake model attribution or source tracing by proposing a training-free, green AI approach based entirely on k-Nearest Neighbors (kNN). Leveraging a pre-trained self-supervised learning (SSL) model, we show that grouping samples from the same generator is straightforward: we obtain a 0.93 F1-score across five deepfake datasets. The method also demonstrates strong out-of-domain (OOD) detection, effectively identifying samples from unseen models at an F1-score of 0.84.
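A training-free kNN attribution step with out-of-domain rejection might look like the sketch below. The Euclidean distance, the mean-distance rejection rule, the threshold value, and the generator labels are all assumptions made for illustration; the abstract does not specify the distance measure or the OOD criterion, and in practice the embeddings would come from the pre-trained SSL model.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Bank = List[Tuple[Sequence[float], str]]  # (embedding, generator label)


def knn_attribute(query: Sequence[float], bank: Bank, k: int = 3, ood_threshold: float = 1.0) -> str:
    """Return the majority label among the k nearest stored embeddings,
    or "unknown" when the neighbours are too far away on average."""
    distances = sorted((math.dist(query, emb), label) for emb, label in bank)
    nearest = distances[:k]
    mean_distance = sum(d for d, _ in nearest) / len(nearest)
    if mean_distance > ood_threshold:
        return "unknown"  # likely produced by a generator not in the bank
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical 2-D embeddings for two known generators.
bank = [
    ([0.0, 0.0], "tts_a"), ([0.1, 0.0], "tts_a"), ([0.0, 0.1], "tts_a"),
    ([5.0, 5.0], "tts_b"), ([5.1, 5.0], "tts_b"), ([5.0, 5.1], "tts_b"),
]
```

No parameters are fitted: adding a new generator only requires appending its embeddings to `bank`.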

Prompts and Prayers: the Rise of GPTheology

2025ManualCenter for AI Measurement

Authors: Adrian Petru Groza

Increasingly, artificial intelligence (AI) has been cast in “god-like” roles (to name a few: film industry – Matrix, The Creator, Mission Impossible, Foundation, Dune etc.; literature – Children of Time, Permutation City, Neuromancer, I Have no Mouth and I Must Scream, Alphaville etc.). This trend has accelerated with the advent of sophisticated Large Language Models such as ChatGPT. For this phenomenon, where AI is perceived as divine, we use the term GPTheology, where ChatGPT and other AI models are treated as potential oracles of a semi-divine nature. This paper explores the emergence of GPTheology as a form of techno-religion, examining how narratives around AI echo traditional religious constructs. We draw on community narratives from online forums – Reddit – and recent projects – AI-powered Mazu Statue in Malaysia (Lu, 2025); “ShamAIn” Project in Korea (He-rim, 2025); AI Jesus in a Swiss Church (Kennedy, 2024). These examples show striking similarities to technological notions of the Singularity and the development of Artificial General Intelligence (AGI). Additionally, we analyse how daily interactions with AI are acquiring ritualistic associations and how AI-centric ideologies clash with or are integrated into established religions. This study uses a dataset of Reddit posts discussing AI to identify recurring themes of salvation, prophecy, and demonization surrounding AI. Our findings suggest that new belief systems are developing around AI, and this carries both philosophical and sociotechnical implications. Our paper critically analyses the benefits and dangers, as well as the social, political and ethical challenges of this development. This transdisciplinary inquiry highlights how AI and religion are increasingly intertwined, prompting necessary questions about humanity’s relationship with its creations and the future of belief.

Reducing Hallucinations in Medical AI: A Knowledge Graph-Augmented Retrieval System for Evidence-Based Age-Related Macular Degeneration Information

2025articleManualTrusted AI

Authors: Alexandru Lecu, Adrian Petru Groza

Large language models (LLMs) have significantly advanced natural language generation but frequently produce unverified outputs, compromising their reliability in critical medical applications. We present a framework that combines structured biomedical knowledge with LLMs through retrieval-augmented generation to address this challenge. Our system automatically extracts causal relationships from 5 000 age-related macular degeneration (AMD) abstracts, building a knowledge graph with over 43 200 validated relations. Using vector-based retrieval, the framework generates contextually relevant and verifiable responses with direct clinical evidence links. We evaluated our approach across eight language models, including open-source models from 1B to 70B parameters (Llama, Mistral, Qwen, SmolLM) and GPT-5-mini, on 3 000 queries with varying question types and reasoning complexity. Smaller models (3B parameters) showed substantial improvements: SmolLM3-3B reached 95.6% accuracy on single-hop true/false questions (from 78.2% baseline). The medium-scale model Mistral-7B demonstrated the largest gains on complex multi-hop reasoning, improving from 45% to 76% accuracy on multiple-choice questions. Larger models (70B parameters) showed minimal improvement due to already high baseline performance (97-98% accuracy). Our results demonstrate that RAG-enhanced knowledge graphs enable resource-efficient smaller models to achieve performance levels approaching or matching larger models, reducing hallucinations while maintaining computational efficiency for clinical deployment. [PDF](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11298209)
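The vector-based retrieval step can be illustrated with a small sketch that ranks stored knowledge-graph relations by similarity to a query embedding. Cosine similarity, the `top_k` parameter, and the placeholder relation texts and vectors are assumptions for illustration; the abstract does not name the similarity measure or the embedding model.

```python
import math
from typing import List, Sequence, Tuple

Relation = Tuple[str, Sequence[float]]  # (relation text, embedding)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], relations: List[Relation], top_k: int = 2) -> List[str]:
    """Return the texts of the top_k relations most similar to the query."""
    ranked = sorted(relations, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


# Placeholder relations with hypothetical 2-D embeddings.
relations = [
    ("relation A", [1.0, 0.0]),
    ("relation B", [0.0, 1.0]),
    ("relation C", [0.9, 0.1]),
]
context = retrieve([1.0, 0.0], relations)
```

The retrieved relation texts would then be placed in the model's prompt, so each generated statement can be traced back to a stored relation and its source abstract.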

Unmasking real-world audio deepfakes: A data-centric approach

2025ConferenceManualTrusted AI

Authors: David Combei

The growing prevalence of real-world deepfakes presents a critical challenge for existing detection systems, which are often evaluated on datasets collected just for scientific purposes. To address this gap, we introduce a novel dataset of real-world audio deepfakes. Our analysis reveals that these real-world examples pose significant challenges, even for the most performant detection models. Rather than increasing model complexity or exhaustively searching for a better alternative, in this work we focus on a data-centric paradigm, employing strategies like dataset curation, pruning, and augmentation to improve model robustness and generalization. Through these methods, we achieve a 55% relative reduction in EER on the In-the-Wild dataset, reaching an absolute EER of 1.7%, and a 63% reduction on our newly proposed real-world deepfakes dataset, AI4T. These results highlight the transformative potential of data-centric approaches in enhancing deepfake detection for real-world applications.

Structural Retinal Analysis in Toxoplasmic Retinochoroiditis: OCT Follow-Up with Three-Dimensional Reconstruction

2025OpenAlex automated

Authors: Adrian Petru Groza

Background: Ocular toxoplasmosis remains the leading cause of posterior uveitis worldwide. Optical coherence tomography (OCT) provides valuable insights into the structural alterations associated with this condition. The present study aimed to characterize the vitreous, retinal, and choroidal morphological changes observed during both the active and scarred stages of ocular toxoplasmosis using OCT imaging. A secondary objective was to evaluate the added value of three-dimensional reconstruction in the assessment of retinal lesions. Methods: A retrospective study was conducted on 12 eyes belonging to 12 patients diagnosed with toxoplasmic retinochoroiditis (TRC). OCT scans centered on the active lesions were qualitatively analyzed at baseline and follow-up. Additionally, a ResUNet model was trained to generate a full volumetric reconstruction of the retinochoroidal lesions in selected cases. Results: Twelve eyes were analyzed at a mean of 16.2 days from symptom onset. The mean follow-up duration was 144 days (range: 12–490 days). OCT imaging revealed characteristic alterations in the retina, choroid, and vitreous body, which were documented both at baseline and at follow-up. Representative cases were selected for three-dimensional reconstruction to illustrate the extent of retinal architectural involvement. Conclusions: OCT analysis refines our understanding of the structural damage associated with ocular toxoplasmosis, while three-dimensional reconstruction enhances our ability to visualize and interpret these alterations on a larger scale.

Fine-Grained Complexity of Ontology Mediated Queries

2025ManualTrusted AI

Authors: Cristina Feier

Hybrid transformer model with liquid neural networks and learnable encodings for buildings’ energy forecasting

2025OpenAlex automated

Authors: Tudor Cioara

• Hybrid transformer with liquid neural networks model for building energy forecasting.
• Convolutional neural network encoders to understand temporal dynamics in energy data through spatial mappings.
• Reservoir processing module implemented with liquid neural networks to capture non-linear relations in energy data.
• Validation on various building contexts, including large apartment buildings and small households.

Accurate forecasting of buildings' energy demand is essential for building operators to manage loads and resources efficiently, and for grid operators to balance local production with demand. However, current models still struggle to capture nonlinear relationships influenced by external factors like weather and consumer behavior, assume constant variance in energy data over time, and often fail to model sequential data. To address these limitations, we propose a hybrid Transformer-based model with Liquid Neural Networks and learnable encodings for building energy forecasting. The model leverages Dense Layers to learn non-linear mappings to create embeddings that capture underlying patterns in time series energy data. Additionally, a Convolutional Neural Network encoder is integrated to enhance the model's ability to understand temporal dynamics through spatial mappings. To address the limitations of classic attention mechanisms, we implement a reservoir processing module using Liquid Neural Networks which introduces a controlled non-linearity through dynamic reservoir computing, enabling the model to capture complex patterns in the data. For model evaluation, we utilized both pilot data and state-of-the-art datasets to determine the model's performance across various building contexts, including large apartment and commercial buildings and small households, with and without on-site energy production. The proposed transformer model demonstrates good predictive accuracy and training time efficiency across various types of buildings and testing configurations. Specifically, SMAPE scores indicate a reduction in prediction error, with improvements ranging from 1.5% to 50% over basic transformer, LSTM and ANN models, while the higher R² values further confirm the model's reliability in capturing energy time series variance. The 8% improvement in training time over the basic transformer model highlights the hybrid model's computational efficiency without compromising accuracy.

Technical and socio-economic perspectives for microgrid control and cooperation

2025OpenAlex automated

Authors: Tudor Cioara

Microgrids offer benefits across technical, economic, environmental, and social dimensions for local energy management. However, their development faces several challenges, requiring multidimensional studies. In this paper, we provide a multidisciplinary overview to sustain the implementation of microgrids and to improve power supply and renewable integration. Beginning with a building-scale microgrid (MG), the study first focuses on optimal techno-economic sizing to secure off-grid operation and on-site consumer specificities. The aim is then to demonstrate that resilient control solutions ensure the effectiveness of this concept under critical conditions. We explore the deployment of these solutions at a neighborhood scale using cooperative energy management between off-grid microgrids, such as virtual power plants, energy trading, and cooperative game theory. Finally, we address the social dimension by assessing the social factors influencing microgrid development, including a projection study on the economic growth of microgrids and several regulation guidelines.

MCP-Orchestrated Multi-Agent System for Automated Disinformation Detection

2025conference paperManualTrusted AI

Authors: Adrian Petru Groza, Alexandru Lecu

The large spread of disinformation across digital platforms creates significant challenges to information integrity. This paper presents a multi-agent system that uses relation extraction to detect disinformation in news articles, focusing on titles and short text snippets. The proposed Agentic AI system combines four agents: (i) a machine learning agent (logistic regression), (ii) a Wikipedia knowledge check agent (which relies on named entity recognition), (iii) a coherence detection agent (using LLM prompt engineering), and (iv) a web-scraped data analyzer that extracts relational triplets for fact checking. The system is orchestrated via the Model Context Protocol (MCP), offering shared context and live learning across components. Results demonstrate that the multi-agent ensemble achieves 95.3% accuracy with an F1 score of 0.964, significantly outperforming individual agents and traditional approaches. The weighted aggregation method, mathematically derived from individual agent misclassification rates, proves superior to algorithmic threshold optimization. The modular architecture makes the system easily scalable, while also maintaining details of the decision processes.
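One way to derive aggregation weights from per-agent misclassification rates is the log-odds weighting familiar from boosting, sketched below. This specific formula is an assumption: the abstract states only that the weights are mathematically derived from the agents' error rates, not how. The label encoding and the example error rates are likewise hypothetical.

```python
import math
from typing import Sequence


def agent_weight(error_rate: float) -> float:
    """Log-odds weight: more accurate agents get larger weights,
    and an agent at chance level (error 0.5) gets weight 0."""
    e = min(max(error_rate, 1e-6), 1.0 - 1e-6)  # avoid division by zero and log(0)
    return math.log((1.0 - e) / e)


def aggregate(votes: Sequence[int], error_rates: Sequence[float]) -> int:
    """Weighted vote over binary labels (1 = disinformation, 0 = credible)."""
    score = sum(
        agent_weight(e) * (1 if v == 1 else -1)
        for v, e in zip(votes, error_rates)
    )
    return 1 if score > 0 else 0
```

With this weighting, a single reliable agent can outvote several weak ones, which is the behaviour a plain majority vote cannot provide.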

On the Contribution of Lexical Features to Speech Emotion Recognition

2025ConferenceManualTrusted AI

Authors: David Combei

Although paralinguistic cues are often considered the primary drivers of speech emotion recognition (SER), we investigate the role of lexical content extracted from speech and show that it can achieve competitive, and in some cases higher, performance compared to acoustic models. On the MELD dataset, our lexical-based approach obtains a weighted F1-score (WF1) of 51.5%, compared to 49.3% for an acoustic-only pipeline with a larger parameter count. Furthermore, we analyze different self-supervised (SSL) speech and text representations, conduct a layer-wise study of transformer-based encoders, and evaluate the effect of audio denoising.
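The weighted F1-score averages per-class F1 values using each class's share of the true labels as its weight, which matters on imbalanced emotion datasets. A minimal sketch, with a function name chosen here for illustration:

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)
```

Because frequent classes dominate the average, a model can score well on WF1 while still performing poorly on rare emotions.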