Minsuk presents a neural vehicle sound synthesis framework based on differentiable digital signal processing (DDSP), conditioned on driving signals collected from the CAN bus of an internal combustion engine (ICE) vehicle, and demonstrates the feasibility of realistic and coherent vehicle sound synthesis within this framework. Three design choices are investigated for the proposed framework: the definition of the fundamental frequency (F0), the configuration of driving signal inputs, and the conditioning representation.
Specifically, crank-based and firing-based F0 definitions are compared, together with multiple driving signal combinations constructed from engine RPM, gear level, accelerator pedal position, vehicle speed, and longitudinal acceleration, and two conditioning representations: direct and encoded conditioning. The framework is evaluated using objective and subjective measures together with qualitative spectrogram analysis. The results show that the crank-based F0 provides more accurate synthesis than the firing-based F0 in the present four-cylinder four-stroke vehicle setting. Driving signal configurations with more complementary signals generally improve synthesis quality, while the contribution of each signal depends on its relationship with the other inputs. Encoded conditioning yields better objective performance, especially when the available driving signals are limited, whereas direct conditioning achieves the best perceptual results under the full driving signal configuration and offers practical advantages in simplicity and efficiency. These findings provide practical guidelines for DDSP-based neural vehicle sound synthesis and suggest that conditioning DDSP on driving signals is a promising approach for automotive audio applications such as vehicle sound design and driving simulation.
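To make the two F0 definitions concrete, the following minimal sketch assumes the standard engine-order relations: the crank-based F0 is the crankshaft rotation frequency (RPM divided by 60), and the firing-based F0 is the combustion event rate, in which each cylinder of a four-stroke engine fires once every two crankshaft revolutions. The function names and default parameters are hypothetical and chosen for illustration only; the exact definitions used in the framework may differ.

```python
def crank_f0(rpm: float) -> float:
    """Crankshaft rotation frequency in Hz (assumed crank-based F0)."""
    return rpm / 60.0


def firing_f0(rpm: float, cylinders: int = 4, strokes: int = 4) -> float:
    """Combustion event rate in Hz (assumed firing-based F0).

    Each cylinder fires once per (strokes / 2) crankshaft revolutions,
    so a four-cylinder four-stroke engine fires twice per revolution.
    """
    revolutions_per_firing = strokes / 2
    return crank_f0(rpm) * cylinders / revolutions_per_firing


if __name__ == "__main__":
    rpm = 3000.0
    print(f"crank-based F0:  {crank_f0(rpm):.1f} Hz")
    print(f"firing-based F0: {firing_f0(rpm):.1f} Hz")
```

Under these assumptions, the firing-based F0 of a four-cylinder four-stroke engine is exactly twice the crank-based F0, so the firing-based harmonic series contains only the even crank harmonics.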