Showing New Listings For Friday, 16 May 2025

May 18, 2025 by ADMIN

Introduction

The proliferation of Text-to-Music (TTM) platforms has democratized music creation, enabling users to effortlessly generate high-quality compositions. However, this innovation also presents new challenges to musicians and the broader music industry. This study investigates the detection of AI-generated songs using the FakeMusicCaps dataset by classifying audio as either deepfake or human.

Methodology

To simulate real-world adversarial conditions, tempo stretching and pitch shifting were applied to the dataset. Mel spectrograms were generated from the modified audio, then used to train and evaluate a convolutional neural network.
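The arithmetic behind these augmentations and the mel scale can be sketched as follows. This is a minimal illustration, not the study's pipeline: the function names are hypothetical, the mel conversion uses the common HTK-style formula, and the band count and frequency range in the usage note are assumed values.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def pitch_shift_hz(f_hz: float, semitones: float) -> float:
    """Frequency of a tone after shifting it by a number of semitones."""
    return f_hz * 2.0 ** (semitones / 12.0)


def stretched_duration(duration_s: float, rate: float) -> float:
    """Clip duration after tempo stretching; rate > 1 speeds the clip up."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return duration_s / rate


def mel_band_centers(n_mels: int, f_min: float, f_max: float) -> List[float]:
    """Center frequencies (Hz) of n_mels bands spaced evenly on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_mels)]
```

For example, `pitch_shift_hz(440.0, 12)` moves a 440 Hz tone up one octave, and `mel_band_centers(8, 0.0, 8000.0)` returns eight band centers that are dense at low frequencies and sparse at high ones, which is why a pitch shift displaces low and high partials by different amounts on a mel spectrogram.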

Results

In addition to presenting technical results, the study explores the ethical and societal implications of TTM platforms, arguing that carefully designed detection systems are essential to both protecting artists and unlocking the positive potential of generative AI in music.

Conclusion

The study demonstrates the effectiveness of a convolutional neural network in detecting AI-generated songs, highlighting the importance of developing detection systems to protect artists and promote the responsible use of generative AI in music.

Introduction

This paper focuses on explaining the timbre conveyed by speech signals and introduces a task termed voice timbre attribute detection (vTAD). In this task, voice timbre is explained with a set of sensory attributes describing its human perception.

Methodology

A pair of speech utterances is processed, and their intensity is compared in a designated timbre descriptor. Moreover, a framework is proposed, which is built upon the speaker embeddings extracted from the speech utterances.
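One way to picture a pairwise comparison built on speaker embeddings is the sketch below. It is an assumption for illustration only: the linear scorer, the `weights` vector, and the function name are hypothetical and are not taken from the paper's framework.

```python
import math
from typing import Sequence


def compare_attribute(
    emb_a: Sequence[float],
    emb_b: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Return a score in (0, 1): how likely utterance B is stronger than
    utterance A in one timbre descriptor.

    The score is a logistic function of a weighted embedding difference,
    so swapping A and B (with bias 0) yields the complementary score.
    """
    if not (len(emb_a) == len(emb_b) == len(weights)):
        raise ValueError("embeddings and weights must have the same length")
    z = sum(w * (b - a) for w, a, b in zip(weights, emb_a, emb_b)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

Identical embeddings give a score of 0.5, meaning neither utterance is judged stronger in the descriptor.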

Results

The investigation is conducted on the VCTK-RVA dataset. Experimental examinations on the ECAPA-TDNN and FACodec speaker encoders demonstrated that: 1) the ECAPA-TDNN speaker encoder was more capable in the seen scenario, where the testing speakers were included in the training set; 2) the FACodec speaker encoder was superior in the unseen scenario, where the testing speakers were not part of the training, indicating enhanced generalization capability.

Conclusion

The study presents a novel approach to voice timbre attribute detection, highlighting the importance of understanding the timbre conveyed by speech signals and its applications in various fields.

Introduction

Location-based service (LBS) applications proliferate and support transportation, entertainment, and more. Modern mobile platforms, with smartphones being a prominent example, rely on terrestrial and satellite infrastructures (e.g., global navigation satellite system (GNSS) and crowdsourced Wi-Fi, Bluetooth, cellular, and IP databases) for correct positioning.

Methodology

Our work reveals that GNSS spoofing attacks succeed even though smartphones have multiple sources of positioning information. Moreover, Wi-Fi spoofing attacks combined with GNSS jamming are surprisingly effective. More concerning is the evidence that sophisticated, coordinated spoofing attacks are highly effective.

Results

Attacks can target GNSS in combination with other positioning methods, so defenses that assume only GNSS is under attack cannot be effective. Moreover, resilient GNSS receivers and special-purpose antennas are not feasible on smartphones.

Conclusion

The study proposes an extended receiver autonomous integrity monitoring (RAIM) framework that leverages the readily available, redundant, often so-called opportunistic positioning information on off-the-shelf platforms. We jointly use onboard sensors, terrestrial infrastructures, and GNSS.
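The core idea of cross-checking redundant position sources can be sketched as below. This is a simplified consistency test under stated assumptions, not the paper's extended RAIM algorithm: the source names, the local planar coordinates in meters, and the median-based reference are all hypothetical choices.

```python
import math
from statistics import median
from typing import Dict, Set, Tuple


def flag_inconsistent(
    estimates: Dict[str, Tuple[float, float]],
    threshold_m: float,
) -> Set[str]:
    """Flag position sources that disagree with the consensus.

    estimates maps a source name to an (x, y) position in meters in a
    local planar frame. The consensus is the coordinate-wise median, and
    a source is flagged when it lies farther than threshold_m from it.
    """
    if not estimates:
        return set()
    cx = median(p[0] for p in estimates.values())
    cy = median(p[1] for p in estimates.values())
    return {
        name
        for name, (x, y) in estimates.items()
        if math.hypot(x - cx, y - cy) > threshold_m
    }
```

A median consensus only holds while most sources are honest, so a coordinated attack on several sources at once would defeat this simple check.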

Introduction

Deep learning models have shown promise in lung pathology detection from chest X-rays, but widespread clinical adoption remains limited due to opaque model decision-making.

Methodology

We extend our previous approach and present XpertXAI, a generalizable expert-driven model that preserves human-interpretable clinical concepts while scaling to detect multiple lung pathologies.

Results

We find that existing techniques frequently fail to produce clinically meaningful explanations, omitting key diagnostic features and disagreeing with radiologist judgments. XpertXAI not only outperforms these baselines in predictive accuracy but also delivers concept-level explanations that better align with expert reasoning.

Conclusion

The study demonstrates the effectiveness of human-centric model design in explainable AI for lung cancer detection, highlighting the importance of developing models that provide transparent and interpretable explanations.

Introduction

Modern robots face challenges shared by humans, where machines must learn multiple sensorimotor skills and express them adaptively.

Methodology

We introduce Neural Associative Skill Memories (ASMs), a framework that utilises self-supervised predictive coding for temporal prediction to unify skill learning and expression, using biologically plausible learning rules.

Results

Our model achieves comparable qualitative performance in skill memory expression while using local learning rules and predicts a biologically relevant speed-accuracy trade-off during skill memory expression.

Conclusion

The study presents a novel approach to neural associative skill memories, highlighting the importance of developing robots that can learn and express multiple skills in a safe and adaptive manner.

Introduction

Research projects, including those focused on cancer, rely on the manual extraction of information from clinical reports. This process is time-consuming and prone to errors, limiting the efficiency of data-driven approaches in healthcare.

Methodology

We utilize GMV's NLP tool uQuery, which excels at identifying relevant entities in clinical texts and converting them into standardized formats such as SNOMED and OMOP.

Results

Our results demonstrate strong overall performance, particularly in identifying entities like MET and PAT, although challenges remain with less frequent entities like EVOL.

Conclusion

The study presents a novel approach to automated detection of clinical entities in lung and breast cancer reports using NLP techniques, highlighting the importance of developing efficient and accurate methods for extracting relevant information from clinical texts.

Introduction

Several recent works argue that LLMs have a universal truth direction where true and false statements are linearly separable in the activation space of the model.
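Linear separability along a single direction can be illustrated with a difference-of-means probe. The sketch below is an assumption for illustration: it operates on toy vectors rather than real model activations, and the function names are hypothetical rather than the probing method used in the works cited.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def fit_truth_direction(
    true_acts: Sequence[Vector],
    false_acts: Sequence[Vector],
) -> Tuple[List[float], List[float]]:
    """Return (direction, midpoint) from labeled activations.

    direction points from the mean of false statements to the mean of
    true statements; midpoint lies halfway between the two means.
    """
    mu_true = mean_vector(true_acts)
    mu_false = mean_vector(false_acts)
    direction = [t - f for t, f in zip(mu_true, mu_false)]
    midpoint = [(t + f) / 2.0 for t, f in zip(mu_true, mu_false)]
    return direction, midpoint


def is_true(activation: Vector, direction: Vector, midpoint: Vector) -> bool:
    """Classify by the sign of the projection onto the truth direction."""
    score = sum(d * (a - m) for d, a, m in zip(direction, activation, midpoint))
    return score > 0.0
```

If such a direction were truly universal, a probe fitted on one conversational format would classify activations from another format equally well.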

Methodology

We explore how this truth direction generalizes between various conversational formats. We find good generalization between short conversations that end on a lie, but poor generalization to longer formats where the lie appears earlier in the input prompt.

Results

We propose a solution that significantly improves this type of generalization by adding a fixed key phrase at the end of each conversation.

Conclusion

The study demonstrates the challenges of developing reliable LLM lie detectors that generalize to new settings, highlighting the importance of further research in this area.

Introduction

Multi-modal generative AI models integrated into wearable devices have shown significant promise in enhancing the accessibility of visual information for blind or visually impaired (BVI) individuals.

Methodology

We introduce WhatsAI, a prototype extensible framework that empowers BVI enthusiasts to leverage Meta Ray-Bans to create personalized wearable visual accessibility technologies.

Results

Our system is the first to offer a fully hackable template that integrates with WhatsApp, facilitating robust Accessible Artificial Intelligence Implementations (AAII) that enable blind users to conduct essential visual assistance tasks.

Conclusion

The study presents a novel approach to transforming Meta Ray-Bans into an extensible generative AI platform for accessibility, highlighting the importance of developing accessible technologies that empower visually impaired individuals.

Introduction

The fields of autonomous systems and robotics are receiving considerable attention in civil applications such as construction, logistics, and firefighting.

Methodology

We present a novel Edge-AI-enabled drone-based surveillance system for autonomous multi-robot operations at construction sites.

Results

Our system integrates a lightweight MCU-based object detection model within a custom-built UAV platform and a 5G-enabled multi-agent coordination infrastructure.

Conclusion

The study demonstrates the effectiveness of Edge-AI solutions in enabling low-power, cost-effective robotics that can automate civil services, improve safety, and enhance sustainability.

Introduction

Imaging and genomic data offer distinct and rich features, and their integration can unveil new insights into the complex landscape of diseases.

Methodology

We present a novel approach utilizing radiogenomic data including structural MRI images and gene expression data, for Alzheimer's disease detection.

Results

Our framework introduces a novel heterogeneous bipartite graph representation learning featuring two distinct node types: genes and images.
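A heterogeneous bipartite graph of this kind can be sketched as a plain data structure. The class below is a hypothetical illustration, not the paper's implementation: the node identifiers and edge weights are assumed, and no representation learning is performed.

```python
from collections import defaultdict
from typing import Dict, List


class GeneImageGraph:
    """Bipartite graph with two node types: genes and images.

    Edges only connect a gene to an image, never two nodes of the same
    type, and each edge carries a weight.
    """

    def __init__(self) -> None:
        self.gene_to_images: Dict[str, Dict[str, float]] = defaultdict(dict)
        self.image_to_genes: Dict[str, Dict[str, float]] = defaultdict(dict)

    def add_edge(self, gene: str, image: str, weight: float = 1.0) -> None:
        """Store the edge in both directions so either side can be queried."""
        self.gene_to_images[gene][image] = weight
        self.image_to_genes[image][gene] = weight

    def images_of(self, gene: str) -> List[str]:
        """Image nodes adjacent to a gene node, in sorted order."""
        return sorted(self.gene_to_images.get(gene, {}))

    def genes_of(self, image: str) -> List[str]:
        """Gene nodes adjacent to an image node, in sorted order."""
        return sorted(self.image_to_genes.get(image, {}))
```

Keeping the two node types in separate maps makes it impossible to add a gene-to-gene or image-to-image edge by accident.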

Conclusion

The study presents a novel approach to radiogenomic bipartite graph representation learning, highlighting the importance of integrating imaging and genomic data for Alzheimer's disease detection.

Introduction

Safe handover in shared autonomy for vehicle control is well-established in modern vehicles.

Methodology

We propose Diffusion-SAFE, a closed-loop shared autonomy framework leveraging diffusion models to: (1) predict human driving behavior for detection of potential risks, (2) generate safe expert trajectories, and (3) enable smooth handovers by blending human and expert policies over a short time horizon.
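Step (3), blending human and expert policies over a short horizon, can be sketched as a weighted average whose weight ramps from the human to the expert. The linear ramp and the function name below are assumptions for illustration, not the blending rule used by Diffusion-SAFE.

```python
from typing import List, Sequence


def blend_controls(human: Sequence[float], expert: Sequence[float]) -> List[float]:
    """Blend two control sequences over a handover horizon.

    The expert weight alpha rises linearly from 0 at the first step to 1
    at the last step, so control authority shifts gradually:
        u_t = (1 - alpha_t) * human_t + alpha_t * expert_t
    """
    n = len(human)
    if n == 0 or n != len(expert):
        raise ValueError("sequences must be non-empty and of equal length")
    blended = []
    for t, (h, e) in enumerate(zip(human, expert)):
        alpha = t / (n - 1) if n > 1 else 1.0
        blended.append((1.0 - alpha) * h + alpha * e)
    return blended
```

The first blended command equals the human's and the last equals the expert's, which is what avoids an abrupt jump in control.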

Results

Our method ensures a gradual transition of control authority by mimicking the driver's behavior before intervention, which mitigates abrupt takeovers and leads to smooth transitions.

Conclusion

The study presents a novel approach to shared autonomy, highlighting the importance of developing safe and smooth handover systems for vehicle control.

Q&A: Detection and AI-Generated Content

Q: What is the main focus of the study on detecting musical deepfakes?

A: The main focus of the study is to investigate the detection of AI-generated songs using the FakeMusicCaps dataset by classifying audio as either deepfake or human.

Q: What is the significance of the study on voice timbre attribute detection?

A: The study focuses on explaining the timbre conveyed by speech signals and introduces a task termed voice timbre attribute detection (vTAD). This is significant because it can be applied in various fields such as speech recognition, speaker identification, and emotion recognition.

Q: What is the main challenge in location-based services (LBS) applications?

A: The main challenge in LBS applications is the vulnerability to attacks that manipulate positions to control and undermine LBS functionality.

Q: What is the proposed solution for location-based services (LBS) applications?

A: The proposed solution is an extended receiver autonomous integrity monitoring (RAIM) framework that leverages the readily available, redundant, often so-called opportunistic positioning information on off-the-shelf platforms.

Q: What is the main goal of the study on explainability through human-centric design for XAI in lung cancer detection?

A: The main goal of the study is to develop a generalizable expert-driven model that preserves human-interpretable clinical concepts while scaling to detect multiple lung pathologies.

Q: What is the significance of the study on neural associative skill memories for safer robotics and modeling human sensorimotor repertoires?

A: The study presents a novel approach to neural associative skill memories, highlighting the importance of developing robots that can learn and express multiple skills in a safe and adaptive manner.

Q: What is the main challenge in automated detection of clinical entities in lung and breast cancer reports using NLP techniques?

A: The main challenge is that manually extracting information from clinical reports is time-consuming and error-prone.

Q: What is the proposed solution for automated detection of clinical entities in lung and breast cancer reports using NLP techniques?

A: The proposed solution is the use of GMV's NLP tool uQuery, which excels at identifying relevant entities in clinical texts and converting them into standardized formats such as SNOMED and OMOP.

Q: What is the main goal of the study on exploring the generalization of LLM truth directions on conversational formats?

A: The main goal of the study is to explore how the truth direction generalizes between various conversational formats.

Q: What is the significance of the study on WhatsAI: transforming Meta Ray-Bans into an extensible generative AI platform for accessibility?

A: The study presents a novel approach to transforming Meta Ray-Bans into an extensible generative AI platform for accessibility, highlighting the importance of developing accessible technologies that empower visually impaired individuals.

Q: What is the main challenge in EdgeAI drone for autonomous construction site demonstrator?

A: The main challenge is the need for robust processing units to run AI models.

Q: What is the proposed solution for EdgeAI drone for autonomous construction site demonstrator?

A: The proposed solution is the use of Edge-AI solutions that enable low-power, cost-effective robotics that can automate civil services, improve safety, and enhance sustainability.

Q: What is the main goal of the study on radiogenomic bipartite graph representation learning for Alzheimer's disease detection?

A: The main goal of the study is to present a novel approach to radiogenomic bipartite graph representation learning, highlighting the importance of integrating imaging and genomic data for Alzheimer's disease detection.

Q: What is the main challenge in diffusion-SAFE: shared autonomy framework with diffusion for safe human-to-robot driving handover?

A: The main challenge is the need for safe and smooth handover systems for vehicle control.

Q: What is the proposed solution for diffusion-SAFE: shared autonomy framework with diffusion for safe human-to-robot driving handover?

A: The proposed solution is the use of a closed-loop shared autonomy framework leveraging diffusion models to predict human driving behavior, generate safe expert trajectories, and enable smooth handovers.

Q: What is the main goal of the study on correlating account on Ethereum mixing service via domain-invariant feature learning?

A: The main goal of the study is to propose a novel framework that addresses the limitations of existing methods for correlating mixing accounts by leveraging domain-invariant feature learning.

Q: What is the significance of the study on explainability through human-centric design for XAI in lung cancer detection?

A: The study presents a novel approach to explainability through human-centric design for XAI in lung cancer detection, highlighting the importance of developing models that provide transparent and interpretable explanations.

Q: What is the main challenge in neural associative skill memories for safer robotics and modeling human sensorimotor repertoires?

A: The main challenge is the need for robots that can learn and express multiple skills in a safe and adaptive manner.

Q: What is the proposed solution for neural associative skill memories for safer robotics and modeling human sensorimotor repertoires?

A: The proposed solution is the use of a novel framework that utilises self-supervised predictive coding for temporal prediction to unify skill learning and expression.

Q: What is the main goal of the study on automated detection of clinical entities in lung and breast cancer reports using NLP techniques?

A: The main goal of the study is to develop efficient and accurate methods for extracting relevant information from clinical texts.

Q: What is the significance of the study on exploring the generalization of LLM truth directions on conversational formats?

A: The study demonstrates the challenges of developing reliable LLM lie detectors that generalize to new settings, highlighting the importance of further research in this area.

Q: What is the main challenge in WhatsAI: transforming Meta Ray-Bans into an extensible generative AI platform for accessibility?

A: The main challenge is the need for accessible technologies that empower visually impaired individuals.

Q: What is the proposed solution for WhatsAI: transforming Meta Ray-Bans into an extensible generative AI platform for accessibility?

A: The proposed solution is the use of a novel framework that empowers BVI enthusiasts to leverage Meta Ray-Bans to create personalized wearable visual accessibility technologies.

Q: What is the main goal of the study on EdgeAI drone for autonomous construction site demonstrator?

A: The main goal of the study is to present a novel Edge-AI-enabled drone-based surveillance system for autonomous multi-robot operations at construction sites.

Q: What is the significance of the study on radiogenomic bipartite graph representation learning for Alzheimer's disease detection?

A: The study presents a novel approach to radiogenomic bipartite graph representation learning, highlighting the importance of integrating imaging and genomic data for Alzheimer's disease detection.