Minsuk presents a neural vehicle sound synthesis framework based on differentiable digital signal processing (DDSP), conditioned on driving signals collected from the CAN bus of an internal combustion engine (ICE) vehicle, and demonstrates the feasibility of realistic and coherent vehicle sound synthesis within this framework. Three design choices are investigated for the proposed framework: the definition of the fundamental frequency (F0), the configuration of driving signal inputs, and the conditioning representation.
Specifically, the study compares crank-based and firing-based F0 definitions; multiple driving signal combinations constructed from engine RPM, gear level, accelerator pedal position, vehicle speed, and longitudinal acceleration; and two conditioning representations, direct and encoded conditioning. The framework is evaluated using objective and subjective measures together with qualitative spectrogram analysis. The results show that the crank-based F0 provides more accurate synthesis than the firing-based F0 in the present four-cylinder, four-stroke vehicle setting. Driving signal configurations with more complementary signals generally improve synthesis quality, while the contribution of each signal depends on its relationship with the other inputs. Encoded conditioning yields better objective performance, especially when the available driving signals are limited, whereas direct conditioning achieves the best perceptual results under the full driving signal configuration and offers practical advantages in simplicity and efficiency. These findings provide practical guidelines for DDSP-based neural vehicle sound synthesis and suggest that conditioning DDSP on driving signals is a promising approach for automotive audio applications such as vehicle sound design and driving simulation.
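For a four-stroke engine, the crankshaft turns at RPM/60 Hz (engine order 1), and each cylinder fires once every two revolutions, so the firing frequency of an N-cylinder engine is N/2 times the crank frequency (order 2 in the four-cylinder case). The sketch below, which is not the authors' implementation, computes both F0 definitions from an RPM trace and feeds them to a DDSP-style additive harmonic synthesizer; the function names, the per-harmonic amplitude input, and the sample rate are illustrative assumptions. Note that the harmonics of a crank-based F0 cover every integer engine order, whereas the harmonics of a four-cylinder firing-based F0 cover only the even orders.

```python
import numpy as np


def crank_f0(rpm):
    """Crankshaft rotation frequency in Hz (engine order 1)."""
    return np.asarray(rpm, dtype=float) / 60.0


def firing_f0(rpm, n_cylinders=4):
    """Firing frequency in Hz for a four-stroke engine: every cylinder fires
    once per two crankshaft revolutions."""
    return crank_f0(rpm) * n_cylinders / 2.0


def harmonic_synth(f0, amplitudes, sr):
    """DDSP-style additive synthesis: a sum of harmonics of a time-varying F0.

    f0:         (T,) per-sample fundamental frequency in Hz
    amplitudes: (T, K) per-sample amplitude of harmonics 1..K
    """
    f0 = np.asarray(f0, dtype=float)
    k = np.arange(1, amplitudes.shape[1] + 1)
    freqs = f0[:, None] * k[None, :]                    # (T, K) harmonic frequencies
    amps = np.where(freqs < sr / 2, amplitudes, 0.0)    # drop harmonics above Nyquist
    phases = 2 * np.pi * np.cumsum(freqs / sr, axis=0)  # integrate frequency to phase
    return np.sum(amps * np.sin(phases), axis=1)


# Example: a 1 s run-up from 1000 to 3000 RPM, 8 harmonics at constant amplitude.
sr = 16000
rpm = np.linspace(1000.0, 3000.0, sr)
y_crank = harmonic_synth(crank_f0(rpm), np.full((sr, 8), 0.1), sr)
y_firing = harmonic_synth(firing_f0(rpm), np.full((sr, 8), 0.1), sr)
```

In a full DDSP model, the harmonic amplitudes (and typically a filtered-noise component) would be predicted frame by frame by a network from the conditioning signals and upsampled to audio rate; here they are simply held constant.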
In automotive audio systems, musical beats and drum events can control synchronized in-cabin experiences such as ambient lighting and music-driven visual effects. Compared to beat tracking, automatic drum transcription (ADT) offers richer control signals by detecting and classifying drum onsets of multiple drum classes (e.g., kick, snare, and hi-hat), enabling more precise and musically meaningful synchronization. Deploying ADT in vehicles, however, requires low latency, computational efficiency, and robust performance across diverse input signals. This paper investigates improvements to low-latency ADT suitable for automotive deployment, using the Separate-Tracks-Annotate-Resynthesize Drums (STAR Drums) dataset and a block-based processing strategy that achieves an average detection delay of around 60 ms. We explore three strategies: (1) lightweight architecture modifications inspired by recent advances in image classification, combined with a temporal convolutional network (TCN); (2) re-rendering STAR Drums to increase drum timbre diversity and augmenting the re-synthesized drum stems; and (3) refinement training with pseudo labels obtained from source-separated mixtures. Our results show that data augmentation and increased drum timbre diversity yield modest performance gains, whereas pseudo-label refinement provides the largest effect, with up to 18% relative improvement in global F-measure. In the real-time eight-class setting, our best model achieves a global F-measure of 0.76 on MDB Drums, competitive with state-of-the-art offline systems, demonstrating that accurate and efficient ADT is feasible for automotive deployment.
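A global F-measure pools true positives, false positives, and false negatives over all drum classes before computing F = 2TP / (2TP + FP + FN), instead of averaging per-class scores. The sketch below assumes a ±50 ms matching tolerance and a simple greedy one-to-one matching of sorted onset times; the paper's exact tolerance and matching procedure are not stated here, and evaluation toolkits such as mir_eval use an explicit bipartite matching instead. All function names are illustrative.

```python
from typing import Dict, List


def match_onsets(ref: List[float], est: List[float], tol: float) -> int:
    """Count one-to-one matches between reference and estimated onset times
    (seconds) that lie within `tol` of each other, via a greedy sweep over
    the sorted lists."""
    ref, est = sorted(ref), sorted(est)
    i = j = matched = 0
    while i < len(ref) and j < len(est):
        diff = est[j] - ref[i]
        if abs(diff) <= tol:
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # estimate is too early for this and every later reference
        else:
            i += 1  # no remaining estimate is close enough to this reference
    return matched


def global_f_measure(ref: Dict[str, List[float]],
                     est: Dict[str, List[float]],
                     tol: float = 0.05) -> float:
    """Pool TP/FP/FN over all drum classes, then compute a single F-measure."""
    tp = fp = fn = 0
    for cls in set(ref) | set(est):
        r, e = ref.get(cls, []), est.get(cls, [])
        m = match_onsets(r, e, tol)
        tp += m
        fp += len(e) - m
        fn += len(r) - m
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


ref = {"kick": [0.0, 0.5, 1.0], "snare": [0.25, 0.75]}
est = {"kick": [0.01, 0.52, 1.2], "snare": [0.26], "hihat": [0.1]}
print(global_f_measure(ref, est))  # 0.6 (TP=3, FP=2, FN=2)
```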
Signal processing using artificial intelligence (AI) has gained increasing interest because it outperforms existing solutions in many fields. A significant challenge for deep neural networks lies in meeting strict requirements on latency, computational load, and memory, which are vital in automotive audio. This paper presents CUpGAN (Conditional Upmix GAN), a computationally efficient, low-latency method for extracting upmix signals that leverages signal separation for two upmixing concepts using a conditional generative adversarial network (CGAN). One upmix approach utilizes the spatial positions of direct sources within the stereo image, allowing sources to be distributed around the listener. The second approach separates direct and diffuse signals to create an ambience signal for the rear surround loudspeakers. By employing phase-aware loss functions, integrating residual connections in the generator, and training with coherent input and target signals, we achieve high sound quality in the generated signals. This methodology also allows a cost-efficient complementary signal to be computed for both upmixing concepts as the difference between the input and generated signals. The proposed technique reduces memory requirements because 96% of the parameters are shared between both applications, allowing seamless switching between upmixing approaches without loading new parameters; instead, parameters are computed by a small control network. The GAN generator is trained on synthetically generated data, which enables finer control over separation characteristics than traditional methods offer. We present an evaluation using listening tests and computational metrics, demonstrating the advantages of our approach compared to classical signal processing methods.
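The complementary signal described above is the input minus the generator output, so the pair always sums back to the input and no second network is needed. In the sketch below, `stand_in_generator` is a hypothetical placeholder (a crude mid-signal "direct" estimate) standing in for the trained CGAN generator; only the subtraction step reflects the described method.

```python
import numpy as np


def stand_in_generator(x: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder for the trained CGAN generator.

    x: (2, N) stereo input. Returns a crude 'direct' estimate that puts the
    mid component (L + R) / 2 into both channels.
    """
    mid = 0.5 * (x[0] + x[1])
    return np.stack([mid, mid])


def split_with_complement(x: np.ndarray, generator=stand_in_generator):
    """Return the generated signal and its complement (input - generated).

    The complement costs one subtraction per sample and needs no extra model.
    """
    generated = generator(x)
    complement = x - generated
    return generated, complement
```

With this placeholder, the complement reduces to the side signal (L - R) / 2 with opposite signs in the two channels; with a real generator, it would be whatever the network did not extract, for example the diffuse part when the generator extracts direct sound.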
As intelligent cockpits develop, music source separation (MSS) is increasingly used in automotive audio to address the fact that complex sound mixtures often fail to meet users' diverse needs. In karaoke, MSS separates vocals from accompaniment for humming along and enables independent volume adjustment of male and female vocals in duets. For in-vehicle audio upmixing, extracted stems are used to reconfigure stereo mixes into cockpit-optimized multichannel layouts. Real-time rendering requires enhancing specific tracks to adapt to cabin noise and user preferences. However, sources other than vocals, bass, and drums have received little research attention for real-time automotive applications. This paper proposes targeted optimizations of data augmentation and model structure for the target tracks (guitar, piano, and male/female lead/backing vocals): a parallel single-track model is used for piano/guitar separation, and a two-stage model for male/female voice separation (first separating general vocals, then splitting them into lead and backing vocals). A combination of "Random Mixing" and "Aligned Mixing" augmentation accounts for harmonic overlap in real songs. For the loss function, in addition to a time-domain L1 loss and a multi-scale STFT loss, a GAN-based training procedure with individual discriminators for each instrument stem improves audio quality and separation accuracy. Training uses data from MedleyDB, MoisesDB, and 3,000 private songs. To enlarge the temporal receptive field of the real-time causal model, a modified SCNet with dilated convolutions and source-based band splitting is adopted. The model achieves 64 ms latency with 4.36M parameters, and its SDRs (6 dB for piano, 5.6 dB for guitar, 8.9 dB for male vocals, 7.6 dB for female vocals) outperform state-of-the-art models such as DTTNet and SCNet.
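SDR figures like those quoted above are commonly computed as the energy ratio between the reference stem and the estimation error, SDR = 10 log10(‖s‖² / ‖s − ŝ‖²) in dB. The sketch below implements that plain definition, as used for example in the Music Demixing Challenge; the abstract does not state which SDR variant was used, and BSS Eval-style SDR (e.g., museval) can yield different numbers. The small `eps` term is an assumption to avoid division by zero.

```python
import numpy as np


def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Signal-to-distortion ratio in dB between a reference stem and its estimate."""
    num = np.sum(reference ** 2)
    den = np.sum((reference - estimate) ** 2)
    return float(10.0 * np.log10((num + eps) / (den + eps)))
```

For example, an estimate equal to 0.9 times the reference leaves an error of 0.1 times the reference, an energy ratio of 100, and therefore an SDR of 20 dB; an all-zero estimate gives 0 dB.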
MEMS microphones in general and automotive – a tutorial
Just as the semiconductor content in cars is increasing, MEMS microphones are playing an increasing role, driven by both technology and applications. The technology driver is that legacy electret condenser microphones are giving way to MEMS microphones in automotive, as they already have in consumer electronics, so that vehicles can leverage the benefits of MEMS technology. The application driver is that MEMS microphones play an increasing role in driver and passenger comfort, safety, and the way we interact with our cars. This tutorial aims to give a broad overview of MEMS microphones with a particular focus on automotive. Participants will come away knowing about MEMS microphone technology, its capabilities and (current) limits, and its applications in vehicles.
With the increasing adoption of electric vehicles (EVs), the number of vehicles equipped with active sound design (ASD) systems has also grown. Although many EV driving sounds emulate the acoustic characteristics of internal combustion engine (ICE) vehicles, EVs lack engine-induced vibrations, resulting in a mismatch between auditory cues and tactile seat sensations. Providing vibrations synchronized with the virtual engine sound can mitigate this discrepancy and enhance driver immersion. This study analyzes the seat vibration characteristics of an ICE vehicle and proposes a vibration generation algorithm that integrates vehicle state information with the existing ASD sound. The algorithm was implemented on a DSP platform, and vibration actuators were installed in an EV seat to deliver tactile feedback in conjunction with the ASD sound.
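The abstract does not detail the vibration generation algorithm. Purely to illustrate what combining vehicle state with an existing ASD signal can look like, the sketch below low-pass filters the ASD sound to keep its low-frequency content and scales it by accelerator pedal position. The 200 Hz cutoff, the one-pole filter, the pedal-to-gain mapping, and all names are hypothetical assumptions, not the study's method.

```python
import numpy as np


def one_pole_lowpass(x, fc, fs):
    """First-order IIR low-pass filter with cutoff fc (Hz) at sample rate fs."""
    a = 1.0 - np.exp(-2.0 * np.pi * fc / fs)
    y = np.empty_like(x, dtype=float)
    state = 0.0
    for n, xn in enumerate(x):
        state += a * (xn - state)
        y[n] = state
    return y


def vibration_drive(asd_sound, pedal, fs, fc=200.0, min_gain=0.2):
    """Hypothetical actuator signal: low-frequency part of the ASD sound,
    scaled by accelerator pedal position (0 = released, 1 = fully pressed).

    `pedal` may be a scalar or a per-sample array the same length as the sound.
    """
    pedal = np.clip(np.asarray(pedal, dtype=float), 0.0, 1.0)
    gain = min_gain + (1.0 - min_gain) * pedal  # keep some vibration at idle
    return gain * one_pole_lowpass(np.asarray(asd_sound, dtype=float), fc, fs)
```

A production system would more likely band-limit the signal to the actuator's usable range and compensate the actuator and seat transfer function; the one-pole filter only keeps the example short.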
Engine orders are a major source of tonal noise in vehicle cabins. In production engine order cancellation (EOC), good performance means not only high attenuation but also stable behavior under real operating conditions. After successful cancellation, the target order can become masked by broadband noise (low SNR). In addition, uncorrelated in-band intrusions may occur in the same frequency region. In these cases, observation-based on/off rules that rely on the current visibility of the tone can cause false deactivation and mode chattering. This paper presents a multi-channel EOC method for an in-cabin audio system. Orders are modeled in real time using a quadrature sin–cos basis synchronized with rotational-speed information. The main contribution is an order-level state logic that uses a reconstructed estimate of the current control contribution obtained through secondary-path models. This enables a practical distinction between “absent/irrelevant” and “suppressed but masked” orders, and supports stable operation by scheduling the adaptation step size and the accumulation length (measured in order periods). The method also accounts for fast RPM changes by shortening the accumulation window while keeping a moderate adaptation rate for tracking. A second contribution is a dynamic order-management layer that periodically evaluates residual-based order salience (accounting for current cancellation) and selects a bounded top-K set of orders for control. The layer includes optional fade-in/out transitions for switching and a lightweight safeguard that de-rates poorly controllable orders to reduce the risk of noise amplification. The method is evaluated on a low-reverberation bench and in a vehicle under steady-speed and run-up conditions, including background noise and audio playback.
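A quadrature order model represents one engine order as w_c·cos φ + w_s·sin φ, where the phase φ is integrated from the rotational speed. The sketch below pairs that basis with a narrowband filtered-x LMS update for a single order and a single error microphone, and adds a trivial top-K selection by salience. It assumes the secondary path acts on the order as a fixed gain and phase shift that the controller knows exactly; the paper's multi-channel structure, state logic, step-size scheduling, and accumulation windows are not reproduced, and all names are illustrative.

```python
import numpy as np


def order_phase(rpm, order, fs):
    """Instantaneous phase (rad) of an engine order, integrated from an RPM
    trace sampled at fs."""
    f = order * np.asarray(rpm, dtype=float) / 60.0
    return 2.0 * np.pi * np.cumsum(f) / fs


def cancel_order(d, phase, sec_gain, sec_phase, mu=0.01):
    """Narrowband filtered-x LMS for one order on a quadrature (cos, sin) basis.

    The loudspeaker signal is u[n] = w[0]*cos(phase[n]) + w[1]*sin(phase[n]).
    Assumption: the secondary path acts on this order as a fixed gain and phase
    shift, so its effect at the error microphone is w @ xf with the filtered
    reference xf below, and the controller's model of it is exact.
    """
    w = np.zeros(2)
    e = np.empty(len(d))
    for n in range(len(d)):
        xf = sec_gain * np.array([np.cos(phase[n] + sec_phase),
                                  np.sin(phase[n] + sec_phase)])
        e[n] = d[n] + w @ xf   # residual = disturbance + anti-noise
        w -= mu * e[n] * xf    # gradient step on e[n]**2 / 2
    return e, w


def select_top_k(salience, k):
    """Keep the k orders with the highest (residual-based) salience."""
    return sorted(salience, key=salience.get, reverse=True)[:k]
```

In this idealized setup, with secondary-path gain and phase held constant, a slow RPM ramp leaves the optimal weights unchanged because the basis follows the measured phase; in a real cabin the secondary path varies with order frequency, so the weights must keep adapting.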
Active Road Noise Control (RNC) has become an important complement to passive treatments for mitigating low-frequency tire–pavement noise in automotive cabins. While numerous global and local RNC systems have been reported in the literature and implemented in production vehicles, their performance is often evaluated using inconsistent indicators and measurement methodologies, making direct comparison difficult. This paper presents a critical review of performance evaluation metrics and measurement methods for automotive RNC systems. Key performance indicators, including sound pressure level reduction, effective frequency range, spatial effectiveness, adaptability, and robustness, are discussed with an emphasis on noise reduction measurements used in both industrial practice and academic research. Existing standards and test procedures are reviewed alongside commonly used experimental methods, such as single-point microphone measurements, artificial head measurements, and small and large microphone array techniques. Reported noise reduction performance of representative global and local RNC systems is summarized to illustrate the influence of system architecture and measurement methodology on published results. The review highlights current limitations in RNC performance assessment, including test scenario complexity, measurement variability, and the lack of unified international standards. These findings provide guidance for selecting appropriate evaluation methods and support future development and standardization of automotive RNC system evaluation and measurement.
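Sound pressure level reduction is typically reported as the band-limited level difference between control off and control on, per microphone. How the microphones are then combined is one of the methodological choices that affect published numbers: averaging the per-microphone dB values and averaging the energies first generally give different results. The sketch below computes both; the 20 to 300 Hz band and the function names are assumptions for illustration.

```python
import numpy as np


def band_energy(p, fs, f_lo, f_hi):
    """Energy of signal p within [f_lo, f_hi] Hz, up to a constant factor that
    cancels in level differences."""
    spec = np.fft.rfft(p)
    freqs = np.fft.rfftfreq(len(p), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return np.sum(np.abs(spec[band]) ** 2)


def noise_reduction_db(p_off, p_on, fs, f_lo=20.0, f_hi=300.0):
    """p_off, p_on: (n_mics, n_samples) pressure recordings with control off/on.

    Returns the per-mic reduction in dB, the mean of those dB values, and the
    reduction of the energy-averaged level across all mics.
    """
    off = np.array([band_energy(p, fs, f_lo, f_hi) for p in p_off])
    on = np.array([band_energy(p, fs, f_lo, f_hi) for p in p_on])
    per_mic = 10.0 * np.log10(off / on)
    mean_of_db = float(np.mean(per_mic))
    energetic = float(10.0 * np.log10(off.sum() / on.sum()))
    return per_mic, mean_of_db, energetic
```

For example, if control reduces a tone of amplitude 1 to 0.5 at one microphone and a tone of amplitude 2 to 0.2 at another, the per-microphone reductions are about 6.0 dB and 20 dB, their mean is about 13.0 dB, and the energy-averaged reduction is about 12.4 dB.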
This research focuses on two of the main challenges for headrest loudspeakers in cars: their directivity and their low-frequency performance. In part one, we discuss a method to characterize the directivity in meaningful ways despite the complex acoustic target environment (acoustic near-field conditions, presence of head and torso, car interior). Headrest speakers of different classes (open-back, closed-back, panel) are investigated with a near-field scanning technique, and best practices for measurements are derived. In part two, we study how nonlinear adaptive control of transducers can improve the bass response and the overall quality of sound reproduction of headrest speakers, especially in typical applications that rely on linear, time-invariant characteristics. The theoretical advantages of this approach have been discussed in earlier papers. Here, we provide measurements that quantify the effect for different driver concepts (see part one) and discuss challenges and implications for the affected active sound algorithms, such as individual sound zone control, hands-free communication, and active noise control. The results show that nonlinear adaptive loudspeaker control can considerably expand the usable frequency range while retaining robustness, reducing distortion, and providing a stable response even under varying environmental conditions.
Acoustic Vehicle Alerting System (AVAS) design and optimization is a critical part of electric and hybrid vehicle development for regulatory purposes. Trial-and-error testing is often done to ensure compliance. However, this testing comes late in the vehicle design phase; it takes time, effort, and specialized test equipment and facilities; and it is subject to testing variance that may overestimate or underestimate AVAS design viability. Increasingly, vehicle manufacturers and suppliers look to support AVAS design with simulation techniques. During early design phases, these can indicate the sound levels that a given transducer produces at the AVAS certification measurement points, and how sensitive those levels are to different transducer locations and orientation angles. In the same simulation, the sound pressure on the vehicle glazing and body panels can be predicted and combined with other simulation methods to predict how much noise from the AVAS transducer is transmitted to the interior and perceived by the vehicle occupants. In combination with objective optimization of AVAS transducer positioning using acoustic transfer function simulation, subjective evaluation can also be carried out early in the design phase. Candidate or measured AVAS transducer signals and inputs are combined with the simulated acoustic transfer functions, which act as a filter from the transducer output to the AVAS measurement positions and yield the virtual expected result there. This allows psychoacoustic evaluation of the expected sound at the AVAS measurement positions.
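The filtering step described above amounts to convolving the transducer signal with the impulse response of the simulated transfer function for each measurement position. In the sketch below, `free_field_ir` is a hypothetical stand-in that models only propagation delay and 1/r spreading from a 1 m reference distance; a real simulated transfer function would include reflection and diffraction by the vehicle body and the ground. The transducer signal is assumed to be the on-axis sound pressure at 1 m, in pascals, and the 2 m distance in the usage note is arbitrary.

```python
import numpy as np


def free_field_ir(distance_m, fs, c=343.0, length=0):
    """Hypothetical stand-in for a simulated acoustic transfer function:
    a pure delay plus 1/r spreading relative to 1 m (no vehicle or ground)."""
    delay = int(round(distance_m / c * fs))
    h = np.zeros(max(length, delay + 1))
    h[delay] = 1.0 / distance_m
    return h


def auralize(transducer_signal, ir):
    """Virtual pressure signal at a measurement position (full convolution)."""
    return np.convolve(transducer_signal, ir)


def spl_db(p, p_ref=20e-6):
    """Sound pressure level in dB re 20 µPa of a pressure signal in Pa."""
    return float(20.0 * np.log10(np.sqrt(np.mean(p ** 2)) / p_ref))
```

In this free-field stand-in, auralizing a signal at 2 m instead of 1 m delays it by about 280 samples at 48 kHz and lowers its level by about 6 dB, as expected from 1/r spreading; the resulting signal could then be played back for listening or passed to psychoacoustic metrics.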
Currently, no widely accepted measurement methodology exists for in-car road noise control (RNC) systems. Automotive OEMs, suppliers, and research institutions apply different procedures, measurement microphone configurations, driving conditions, and performance metrics, making it difficult to compare system performance across vehicles, development teams, and research studies. In response, the NVH And Sound (NAS) Technical Subcommittee under TCAA established a dedicated Work Group to work toward a common measurement and evaluation framework for in-car RNC systems.
In the current electric vehicle market, in-cabin noise is becoming quieter with each new generation of vehicles. But as the physical and economic limits of acoustic isolation are reached, the market converges on the same experience of quietness for every car, regardless of brand. What if this quiet vehicle were instead thought of as a canvas, allowing every aspect of the user experience to be designed intentionally? Active Sound Design is one way to harness these ideas, and when it is augmented with Active Vibration Design, automakers can embed a deep sense of identity within every vehicle. This workshop, hosted by HEAD acoustics and GHSP, invites attendees to consider how a vehicle communicates its identity to its occupants and how that can be harnessed through sound and vibration design. The workshop details the process of developing, prototyping, tuning, and deploying multiple Active Experience Design profiles within a real vehicle that attendees of the AES Automotive Audio Conference can drive.
In automotive audio, sound design plays a key role in defining brand identity and perceived quality. At the same time, development is constrained by long approval cycles, tight timelines, and the need for high consistency from early concept phases to series production. This workshop presents a holistic workflow that combines state-of-the-art Active Noise Control algorithms and extensive sound design capabilities into a seamless, production-ready process. The workshop focuses on the integration of m|klang® e by Müller-BBM, a software framework for Active Noise Control and Sound Synthesis, with Max by Cycling ’74, a widely adopted visual programming environment used by sound designers and digital media creatives. Together, these tools enable designers and engineers to collaborate within a shared environment, starting at the earliest creative stages, continuing through in-vehicle tuning and production deployment, and iterating through these steps as often as necessary. Realistic vehicle behavior can be made available within the early-stage design process by replaying recorded vehicle signals (e.g., CAN traces), connecting to NVH simulators, or interfacing directly with a vehicle. This allows both the acoustic content and the associated control logic to be designed and evaluated under realistic operating conditions. Acoustic consistency is ensured across all development stages: the same sound behavior heard on a laptop during design is reproduced on the target hardware in the vehicle. The workflow further supports live in-vehicle tuning and the generation of production datasets, including post-production sound updates via software or over-the-air deployment.