As intelligent cockpits develop, music source separation (MSS) is increasingly used in automotive audio, because a fixed stereo mix cannot meet users’ diverse needs. In karaoke, MSS separates vocals from accompaniment so that users can hum along, and enables independent volume adjustment of male and female vocals in duets. For in-vehicle audio up-mixing, extracted stems are used to reconfigure stereo mixes into cockpit-optimized multichannel layouts. For real-time rendering, specific tracks must be enhanced to adapt to cabin noise and user preferences. However, sources other than vocals, bass, and drums have received little attention in real-time automotive applications. This paper proposes targeted optimizations of data augmentation and model structure for the target tracks (guitar, piano, and male/female lead/backing vocals): parallel single-track models are used for piano and guitar separation, and a two-stage model is used for male/female voice separation, which first separates the overall vocals and then splits them into lead and backing vocals. A data augmentation method that combines "Random Mixing" and "Aligned Mixing" adapts the model to the harmonic overlap found in real songs. For the loss function, in addition to a time-domain L1 loss and a multi-scale STFT loss, a GAN-based training procedure with an individual discriminator for each instrument stem improves audio quality and separation accuracy. The training set combines MedleyDB, MoisesDB, and 3,000 private songs. To enlarge the time-dimension receptive field of the real-time causal model, a modified SCNet with dilated convolutions and source-based band split is adopted. The model achieves a latency of 64 ms with 4.36M parameters, and its SDR values (6 dB for piano, 5.6 dB for guitar, 8.9 dB for male vocals, and 7.6 dB for female vocals) outperform those of state-of-the-art models such as DTTNet and SCNet.
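The abstract does not define the two mixing strategies in detail, so the following is only a minimal sketch under stated assumptions: "Random Mixing" is taken to mean that each stem is drawn from an independently chosen song and crop position, and "Aligned Mixing" that all stems are drawn from the same song at the same position, which preserves the harmonic and rhythmic relationships of a real recording. The function names, the `stem_bank` layout, the gain range, and the 50/50 sampling ratio are all hypothetical and are not taken from the paper.

```python
import numpy as np


def random_mix(stem_bank, rng, length):
    """Assumed 'Random Mixing': each stem comes from its own random song and offset."""
    sources = {}
    for name, clips in stem_bank.items():
        clip = clips[rng.integers(len(clips))]           # independent song per stem
        start = rng.integers(0, len(clip) - length + 1)  # independent crop per stem
        gain = rng.uniform(0.5, 1.0)                     # hypothetical gain range
        sources[name] = gain * clip[start:start + length]
    mixture = sum(sources.values())
    return mixture, sources


def aligned_mix(stem_bank, rng, length):
    """Assumed 'Aligned Mixing': all stems share one song and one crop offset."""
    names = list(stem_bank)
    song = rng.integers(len(stem_bank[names[0]]))        # same song for every stem
    n_samples = len(stem_bank[names[0]][song])
    start = rng.integers(0, n_samples - length + 1)      # same crop for every stem
    sources = {n: stem_bank[n][song][start:start + length] for n in names}
    mixture = sum(sources.values())
    return mixture, sources


def sample_example(stem_bank, rng, length, p_aligned=0.5):
    """Combine both strategies; p_aligned is a hypothetical mixing ratio."""
    if rng.random() < p_aligned:
        return aligned_mix(stem_bank, rng, length)
    return random_mix(stem_bank, rng, length)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stem bank: stem name -> list of mono waveforms, one per song.
    # Stems that share a list index are assumed to be time-aligned and equally long.
    bank = {
        name: [rng.standard_normal(16000) for _ in range(4)]
        for name in ("piano", "guitar", "vocals")
    }
    mixture, sources = sample_example(bank, rng, length=4096)
    print(mixture.shape, sorted(sources))
```

In this sketch the mixture is the model input and the per-stem arrays in `sources` are the training targets. Random mixes increase the variety of source combinations, while aligned mixes expose the model to the correlated, overlapping harmonics of real songs.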