In automotive audio systems, musical beats and drum events can control synchronized in-cabin experiences such as ambient lighting and music-driven visual effects. Compared to beat tracking, automatic drum transcription (ADT) offers richer control signals by detecting and classifying drum onsets of multiple drum classes (e.g., kick, snare, and hi-hat), enabling more precise and musically meaningful synchronization. Deploying ADT in vehicles, however, requires low latency, computational efficiency, and robust performance for various input signals. This paper investigates improvements to low-latency ADT suitable for automotive deployment, using the Separate-Tracks-Annotate-Resynthesize Drums (STAR Drums) dataset and a block-based processing strategy that achieves an average detection delay of around 60 ms. We explore three strategies: (1) lightweight architecture modifications inspired by recent advances in image classification, combined with a temporal convolutional network (TCN); (2) re-rendering STAR Drums to increase drum timbre diversity and augmenting the re-synthesized drum stems; and (3) refinement training with pseudo labels obtained from source-separated mixtures. Our results show that data augmentation and increased drum timbre diversity yield modest performance gains, whereas pseudo-label refinement provides the largest effect, with up to 18 % relative improvement in global F-measure. In the real-time eight-class setting, our best model achieves a global F-measure of 0.76 on MDB Drums, competitive with state-of-the-art offline systems, demonstrating that accurate and efficient ADT is feasible for automotive deployment.
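As an illustration of the evaluation metric named above, the following is a minimal sketch of a pooled ("global") onset F-measure. It assumes that detections are matched to reference onsets within a fixed tolerance window and that true positives, false positives, and false negatives are summed over all drum classes before computing the score. The function names, the 50 ms tolerance, and the greedy matching are illustrative assumptions, not details taken from the paper.

```python
def match_counts(reference, detected, tolerance=0.05):
    """Greedily match sorted onset times (seconds); return (tp, fp, fn)."""
    ref, det = sorted(reference), sorted(detected)
    i = j = tp = 0
    while i < len(ref) and j < len(det):
        if abs(ref[i] - det[j]) <= tolerance:
            tp += 1          # detection falls inside the tolerance window
            i += 1
            j += 1
        elif det[j] < ref[i]:
            j += 1           # unmatched detection (false positive)
        else:
            i += 1           # unmatched reference (false negative)
    return tp, len(det) - tp, len(ref) - tp


def global_f_measure(per_class, tolerance=0.05):
    """per_class maps a class name to a (reference, detected) pair of onset lists."""
    tp = fp = fn = 0
    for reference, detected in per_class.values():
        c_tp, c_fp, c_fn = match_counts(reference, detected, tolerance)
        tp, fp, fn = tp + c_tp, fp + c_fp, fn + c_fn
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)
```

Pooling the counts before computing precision and recall means that frequent classes weigh more heavily than rare ones; a per-class average would behave differently.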
Signal processing using artificial intelligence (AI) has gained increasing interest because it outperforms existing solutions in many fields. A significant challenge for deep neural networks lies in meeting strict requirements regarding latency, computational load, and memory, which is vital in automotive audio. This paper presents CUpGAN (Conditional Upmix GAN), a computationally efficient method for extracting upmix signals with low latency, leveraging signal separation for two upmixing concepts using a conditional generative adversarial network (CGAN). The first upmix approach utilizes the spatial positions of direct sources within the stereo image, allowing for the distribution of sources around the listener. The second approach separates direct and diffuse signals to create an ambience signal for rear surround loudspeakers. By employing phase-aware loss functions, integrating residual connections in the generator, and training with coherent input and target signals, we achieve high sound quality in the generated signals. This methodology also facilitates the computation of a cost-efficient complementary signal for both upmixing concepts through the difference between input and generated signals. The proposed technique reduces memory requirements, as 96% of the parameters can be shared between both applications. This allows seamless switching between upmixing approaches without the need for parameter loading; instead, the remaining parameters are computed by a small control network. The GAN generator is trained on synthetically generated data, enabling control over separation characteristics that surpasses traditional methods. We present an evaluation using listening tests and computational metrics, demonstrating the advantages of our approach compared to classical signal processing methods.
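The complementary signal mentioned above can be sketched as a sample-wise difference. This minimal example assumes that the generated signal is time-aligned and phase-coherent with the input, so that the two parts sum back to the input exactly. The function name and the sample values are hypothetical.

```python
def complementary_signal(mixture, generated):
    """Return mixture minus generated, sample by sample.

    Assumes both signals have the same length and are time-aligned;
    otherwise the difference would contain leakage from the extracted part.
    """
    if len(mixture) != len(generated):
        raise ValueError("signals must have the same length")
    return [m - g for m, g in zip(mixture, generated)]


# Illustrative values only: one input channel and a hypothetical generator output.
mixture = [0.50, -0.25, 0.75, 0.00]
generated = [0.25, -0.25, 0.50, 0.25]
complement = complementary_signal(mixture, generated)
```

Because the complement is obtained by subtraction, no second network pass is needed for it, which is consistent with the abstract's description of it as cost-efficient.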
As intelligent cockpits develop, music source separation (MSS) is increasingly used in automotive audio because complex sound mixtures cannot meet users’ diverse needs. In karaoke, it separates vocals and accompaniment for humming and enables independent male/female vocal volume adjustment for duets. For in-vehicle audio up-mixing, extracted stems are used to reconfigure stereo mixes into cockpit-optimized multichannel layouts. For real-time rendering, specific tracks must be enhanced to adapt to cabin noise and user preferences. However, audio sources other than vocals, bass, and drums have received little research attention for real-time automotive applications. This paper proposes targeted optimizations of data augmentation and model structure for the target tracks (guitar, piano, male/female lead/backing vocals): a parallel single-track model is used for piano/guitar separation, and a two-stage model is used for male/female voice separation (first separating general vocals, then splitting into lead and backing vocals). A combined "Random Mixing" and "Aligned Mixing" method adapts to the harmonic overlap found in real songs. For the loss function, in addition to a time-domain L1 loss and a multi-scale STFT loss, a GAN-based training procedure with an individual discriminator for each instrument stem improves audio quality and separation accuracy. Training uses a dataset drawn from MedleyDB, MoisesDB, and 3,000 private songs. To enlarge the temporal receptive field of the real-time causal model, a modified SCNet with dilated convolutions and source-based band splitting is adopted. The model achieves 64 ms latency with 4.36M parameters, and its SDR values (6 dB piano, 5.6 dB guitar, 8.9 dB male vocals, 7.6 dB female vocals) outperform SOTA models such as DTTNet and SCNet.
With the increasing adoption of electric vehicles (EVs), the number of vehicles equipped with active driving sound (ASD) systems has also grown. Although many EV driving sounds emulate the acoustic characteristics of internal combustion engine (ICE) vehicles, EVs lack engine induced vibrations, resulting in a mismatch between auditory cues and tactile seat sensations. Providing vibrations synchronized with virtual engine sound can mitigate this discrepancy and enhance driver immersion. This study analyzes the seat vibration characteristics of an ICE vehicle and proposes a vibration generation algorithm that integrates vehicle state information with already existing ASD sound. The algorithm was implemented on a DSP platform, and vibration actuators were installed in an EV seat to deliver tactile feedback in conjunction with ASD sound.
Engine orders are a major source of tonal noise in vehicle cabins. In production engine order cancellation (EOC), good performance is not only high attenuation, but also stable behavior in real conditions. After successful cancellation, the target order can become masked by broadband noise (low SNR). In addition, uncorrelated in-band intrusions may occur in the same frequency region. In these cases, observation-based on/off rules that rely on the current visibility of the tone can cause false deactivation and mode chattering. This paper presents a multi-channel EOC method for an in-cabin audio system. Orders are modeled in real time using a quadrature sin–cos basis synchronized with rotational-speed information. The main contribution is an order-level state logic that uses a reconstructed estimate of the current control contribution through secondary-path models. This enables a practical distinction between “absent/irrelevant” and “suppressed but masked,” and supports stable operation by scheduling the adaptation step size and the accumulation length in order periods. The method also accounts for fast RPM changes by shortening the accumulation window while keeping a moderate adaptation rate for tracking. A second contribution is a dynamic order-management layer that periodically evaluates residual-based order salience (accounting for current cancellation) and selects a bounded top-K set for control. The layer includes optional fade-in/out transitions for switching and a lightweight safeguard that de-rates poorly controllable orders to reduce the risk of noise amplification. The method is evaluated on a low-reverberation bench and in a vehicle under steady-speed and run-up conditions, including background noise and audio playback.
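The quadrature sin–cos order model described above can be illustrated with a minimal single-channel sketch. It assumes a unity secondary path (so the anti-noise subtracts directly from the disturbance) and a plain LMS update with a fixed step size. The paper's state logic, step-size scheduling, accumulation windows, and order management are not reproduced here, and all names and parameter values are hypothetical.

```python
import math


def order_frequency_hz(rpm, order):
    """Frequency of an engine order: order times rotations per second."""
    return order * rpm / 60.0


def cancel_order(disturbance, rpm_track, order, fs, mu=0.01):
    """Adaptive cancellation of one order with a sin/cos reference pair.

    disturbance: tonal noise samples at the error microphone
    rpm_track:   rotational speed per sample, used to synchronize the reference
    Returns the residual error signal. Assumes a unity secondary path.
    """
    phase = 0.0
    w_cos = w_sin = 0.0
    residual = []
    for d, rpm in zip(disturbance, rpm_track):
        # Accumulate phase so the reference follows RPM changes smoothly.
        phase = (phase + 2.0 * math.pi * order_frequency_hz(rpm, order) / fs) % (2.0 * math.pi)
        c, s = math.cos(phase), math.sin(phase)
        y = w_cos * c + w_sin * s      # estimate of the order component
        e = d - y                      # residual after cancellation
        w_cos += mu * e * c            # LMS update of the quadrature weights
        w_sin += mu * e * s
        residual.append(e)
    return residual
```

In a real system the reference would be filtered through a secondary-path model before the weight update (filtered-x LMS); with the unity-path assumption used here, that step reduces to the plain LMS update shown.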
Active Road Noise Control (RNC) has become an important complement to passive treatments for mitigating low frequency tire–pavement noise in automotive cabins. While numerous global and local RNC systems have been reported in the literature and implemented in production vehicles, their performance is often evaluated using inconsistent indicators and measurement methodologies, making direct comparison difficult. This paper presents a critical review of performance evaluation metrics and measurement methods for automotive RNC systems. Key performance indicators, including sound pressure level reduction, effective frequency range, spatial effectiveness, adaptability, and robustness, are discussed with an emphasis on noise reduction measurements used in both industrial practice and academic research. Existing standards and test procedures are reviewed alongside commonly used experimental methods, such as single point microphone measurements, artificial head measurements, and small and large microphone array techniques. Reported noise reduction performance of representative global and local RNC systems is summarized to illustrate the influence of system architecture and measurement methodology on published results. The review highlights current limitations in RNC performance assessment, including test scenario complexity, measurement variability, and the lack of unified international standards. These findings provide guidance for selecting appropriate evaluation methods and support future development and standardization of automotive RNC system evaluation and measurement.
This research focuses on two of the main challenges for headrest loudspeakers in cars: their directivity and their low-frequency performance. In part one, we discuss a method to characterize the directivity in meaningful ways despite the complex acoustic target environment (acoustic near-field conditions, presence of head and torso, car interior). Headrest speakers of different classes (open back, closed back, panel) are investigated with a near-field scanning technique, and best practices for measurements are derived. In part two, we study how nonlinear adaptive control of transducers can improve the bass response and the overall quality of sound reproduction of headrest speakers, especially in typical applications relying on linear, time-invariant characteristics. The theoretical advantages of this approach have been discussed in earlier papers. Here, we provide measurements to quantify the effect for the different driver concepts from part one and discuss challenges and implications for the active sound algorithms concerned, such as individual sound zone control, hands-free communication, and active noise control. The results show that nonlinear adaptive loudspeaker control can considerably expand the usable frequency range while retaining robustness, reducing distortion, and providing a stable response even under varying environmental conditions.
Acoustic vehicle alerting system (AVAS) design and optimization is a critical part of electric and hybrid vehicle development for regulatory purposes. Trial-and-error testing is often done to ensure compliance. However, this comes late in the vehicle design phase, takes time, effort, and specialized test equipment and facilities, and is subject to testing variance, which may overestimate or underestimate AVAS design viability. Increasingly, vehicle manufacturers and suppliers look to support AVAS design using simulation techniques. During early design phases, these can indicate the sound levels at the AVAS certification measurement points from a given transducer and the sensitivity to different locations and angles of orientation. In the same simulation, the sound pressure on the vehicle glazing and body panels can be predicted, which can be combined with other simulation methods to predict how much noise from the AVAS transducer is transmitted to the interior and perceived by the vehicle occupants. In combination with objective optimization of transducer positioning using acoustic transfer function simulation, subjective evaluation may also be carried out early in the design phase: candidate or measured AVAS transducer signals are combined with the simulated acoustic transfer functions, which act as a filter from the transducer output to the AVAS measurement positions and yield the expected virtual result there. This allows psychoacoustic evaluation of the expected sound at the AVAS measurement positions.
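The filtering step described above can be sketched as a time-domain convolution. This minimal example assumes that the simulated acoustic transfer function is available as an impulse response sampled at the same rate as the transducer signal and scaled so that the output is sound pressure in pascals. The function names and all values are illustrative assumptions, and the psychoacoustic evaluation itself is not covered.

```python
import math

P_REF = 20e-6  # reference sound pressure in Pa for dB SPL


def apply_transfer_function(signal, impulse_response):
    """Convolve a transducer signal with an impulse response (full convolution)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def level_db_spl(pressure_pa):
    """Unweighted RMS sound pressure level of a pressure signal in dB SPL."""
    rms = math.sqrt(sum(p * p for p in pressure_pa) / len(pressure_pa))
    return 20.0 * math.log10(rms / P_REF)


# Illustrative values only: a short candidate signal and a two-tap response.
virtual_mic = apply_transfer_function([1.0, 2.0, 3.0], [1.0, 0.5])
```

The resulting signal at the virtual measurement position can then be auditioned or passed to level and psychoacoustic metrics. Regulatory AVAS measurements typically use frequency weighting and band analysis, which this sketch omits.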