Feature Selection from Lyric-Annotated Musical Scores
for Singing Voice Synthesis Based on Feature Contribution Analysis

Evaluation Summary

Conditions:
  A  Existing data
  B  Phoneme + note length + pitch (heuristic)
  C  Position info + phoneme
  D  04_static_trj

Metric                           Better   A          B          C          D
Mel-cepstrum distortion (dB)     lower    5.43778    5.53042    5.50459    5.38162 ★
GV distance                      lower    0.46402 ★  0.57736    0.60858    0.48721
F0 RMSE (cent)                   lower    74.19561   74.00517   69.39025 ★ 72.32802
F0 correlation                   higher   0.97226    0.97221    0.97552 ★  0.97217
Total voiced/unvoiced error (%)  lower    2.14051    2.35665    2.13791    2.10536 ★

★ indicates the best performance for each metric. All experiments used the same test set (10 utterances, 76804 frames).
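The report does not define how these metrics were computed. As a rough sketch, assuming common conventions (time-aligned frames, mel-cepstra without the 0th coefficient, F0 in Hz with 0 marking unvoiced frames), MCD, F0 RMSE in cents, F0 correlation and V/UV error could be computed as below. GV distance is left out because its definition is not given, and the function names are hypothetical.

```python
import math

# Assumptions (not stated in the report): frames are already time-aligned,
# mel-cepstra exclude the 0th (energy) coefficient, and F0 is in Hz with
# 0.0 marking an unvoiced frame.


def mel_cepstral_distortion(ref, gen):
    """Mean mel-cepstral distortion in dB over frames."""
    # Per frame: (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2)
    scale = 10.0 / math.log(10.0)
    per_frame = [
        scale * math.sqrt(2.0 * sum((a - b) ** 2 for a, b in zip(r, g)))
        for r, g in zip(ref, gen)
    ]
    return sum(per_frame) / len(per_frame)


def _pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def f0_metrics(ref_f0, gen_f0):
    """Return (F0 RMSE in cents, F0 correlation, V/UV error in %)."""
    # F0 RMSE and correlation use only frames voiced in both signals.
    voiced = [(r, g) for r, g in zip(ref_f0, gen_f0) if r > 0 and g > 0]
    cents = [1200.0 * math.log2(g / r) for r, g in voiced]
    rmse = math.sqrt(sum(c * c for c in cents) / len(cents))
    # Correlation on Hz values here; some toolkits use log F0 instead.
    corr = _pearson([r for r, _ in voiced], [g for _, g in voiced])
    # V/UV error: percentage of frames whose voicing decisions disagree.
    mismatched = sum((r > 0) != (g > 0) for r, g in zip(ref_f0, gen_f0))
    vuv_error = 100.0 * mismatched / len(ref_f0)
    return rmse, corr, vuv_error
```

Pooling all frames of the 10 test utterances before averaging, versus averaging per utterance, can give slightly different numbers; which one the report used is not stated.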

SHAP Analysis Results

SHAP Summary Plot


Summary plot showing the distribution of per-sample SHAP values for each input feature.

SHAP Group Bar Plot


Bar plot of SHAP values aggregated by feature category.

Average SHAP Summary Plot


Average SHAP summary plot showing the overall feature importance across all samples.

SHAP Analysis Details

Feature Importance by Category

Top 20 Feature Categories by SHAP Value (input dimension index range in parentheses):

1. Pos_Cos (829-843) - 1.03e-04
2. C-Phone_Basic (195-320) - 3.34e-05
3. R-Phone_Basic (321-424) - 2.92e-05
4. Pos_Log (824-826) - 2.91e-05
5. RR-Phone_Basic (425-528) - 2.84e-05
6. L-Phone_Basic (100-194) - 1.79e-05
7. LL-Phone_Basic (5-99) - 1.56e-05
8. C-Note_Dynamic (710-710) - 1.46e-05
9. L-Syllable_Language (571-597) - 9.07e-06
10. RR-Phone_Boin_N (812-819) - 8.11e-06
11. L-Note_Abs_Scale (658-659) - 7.83e-06
12. R-Syllable_Language (631-657) - 7.53e-06
13. C-Phone_Flags (543-549) - 7.18e-06
14. R-Phone_Flags (550-556) - 5.50e-06
15. C-Syllable_Language (601-627) - 5.47e-06
16. RR-Phone_Flags (557-563) - 5.43e-06
17. Pos_Percent (827-828) - 5.31e-06
18. C-Phone_Boin_N (796-803) - 5.00e-06
19. C-Note_Prev_Delta_Abs_Scale (739-740) - 4.86e-06
20. C-Note_Prev_Tie_Slur (706-707) - 4.23e-06
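The category scores above come from aggregating per-dimension SHAP values over each index range, but the report does not say how. A minimal sketch, assuming the score is the mean absolute SHAP value over all samples and all dimensions in the inclusive range; `category_importance` and `shap_values` are hypothetical names, and only three of the report's ranges are listed.

```python
# Inclusive input-dimension ranges copied from the list above (subset only).
CATEGORY_RANGES = {
    "Pos_Cos": (829, 843),
    "C-Phone_Basic": (195, 320),
    "Pos_Log": (824, 826),
}


def category_importance(shap_values, ranges):
    """Rank categories by mean |SHAP| (assumed aggregation rule).

    shap_values: list of rows, one per sample (frame); row[j] is the SHAP
    value of input dimension j for that sample.
    ranges: category name -> (first_dim, last_dim), both inclusive.
    """
    scores = {}
    for name, (lo, hi) in ranges.items():
        vals = [abs(row[j]) for row in shap_values for j in range(lo, hi + 1)]
        scores[name] = sum(vals) / len(vals)
    # Highest-scoring category first; slice [:20] for a top-20 list.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Averaging rather than summing over dimensions keeps a wide category such as C-Phone_Basic (126 dimensions) from outranking a narrow one such as Pos_Log (3 dimensions) purely because of its size.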

Key Findings:
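• Position features rank highest: Pos_Cos is first by a wide margin (1.03e-04, about three times the second-ranked C-Phone_Basic at 3.34e-05) and Pos_Log is fourth, while Pos_Percent ranks only 17th
• Basic phone features dominate the remaining top ranks: Phone_Basic for all five context positions (LL, L, C, R, RR) occupies ranks 2, 3, 5, 6 and 7
• Note features are moderately important: C-Note_Dynamic ranks 8th, L-Note_Abs_Scale 11th and C-Note_Prev_Delta_Abs_Scale 19th
• Syllable language features contribute unevenly across context positions: L-Syllable_Language ranks 9th, R-Syllable_Language 12th and C-Syllable_Language 15th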

Feature Sparsity Analysis

Feature Sparsity by Category:

High Sparsity (>70% zeros):
• C-Phone_Flags: 71.4% zeros
• R-Phone_Flags: 71.4% zeros
• RR-Phone_Flags: 71.4% zeros
• L-Phone_Flags: 71.4% zeros
• LL-Phone_Flags: 71.4% zeros

Medium Sparsity (25-70% zeros):
• L-Phone_Basic: 25.3% zeros
• LL-Phone_Basic: 27.4% zeros
• C-Note_Measure_Position: 25.0% zeros

Low Sparsity (<25% zeros):
• C-Phone_Basic: 19.0% zeros
• R-Phone_Basic: 24.0% zeros
• RR-Phone_Basic: 24.0% zeros
• Most note and position features: 0% zeros
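Sparsity here is presumably the share of exactly-zero values among a category's input dimensions over all frames. A sketch under that assumption, with the band thresholds taken from the headings above (whether exactly 25% and 70% count as Medium is an assumption); the function names and toy data are hypothetical.

```python
def zero_percentage(features, lo, hi):
    """Percentage of exactly-zero entries in columns lo..hi (inclusive), all frames."""
    vals = [row[j] for row in features for j in range(lo, hi + 1)]
    return 100.0 * sum(v == 0 for v in vals) / len(vals)


def sparsity_band(pct):
    """Map a zero percentage to the bands used above (>70, 25-70, <25)."""
    if pct > 70.0:
        return "High"
    if pct >= 25.0:
        return "Medium"
    return "Low"


# Toy 7-column block with two nonzero entries per frame (illustrative data).
toy = [
    [1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
]
pct = zero_percentage(toy, 0, 6)
print(f"{pct:.1f}% zeros -> {sparsity_band(pct)}")
# prints: 71.4% zeros -> High
```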

Implications:
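• Phone flag features are highly sparse (71.4% zeros), yet C-, R- and RR-Phone_Flags still rank 13th, 14th and 16th by SHAP value
• Basic phone features show low to moderate sparsity (19.0-27.4% zeros)
• Position features are dense and the most informative; note features are also dense but only moderately important
• Combining sparsity with SHAP importance points to feature selection candidates: L-Phone_Flags and LL-Phone_Flags are highly sparse and do not appear in the top 20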
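One way to act on the last point is to flag categories that are both highly sparse and absent from the SHAP top 20. The rule, its thresholds and the name `removal_candidates` are illustrative assumptions, not the report's selection method; the input values are copied from the lists above.

```python
# Top-20 categories by SHAP value, in rank order (from the list above).
TOP20 = [
    "Pos_Cos", "C-Phone_Basic", "R-Phone_Basic", "Pos_Log", "RR-Phone_Basic",
    "L-Phone_Basic", "LL-Phone_Basic", "C-Note_Dynamic", "L-Syllable_Language",
    "RR-Phone_Boin_N", "L-Note_Abs_Scale", "R-Syllable_Language",
    "C-Phone_Flags", "R-Phone_Flags", "C-Syllable_Language", "RR-Phone_Flags",
    "Pos_Percent", "C-Phone_Boin_N", "C-Note_Prev_Delta_Abs_Scale",
    "C-Note_Prev_Tie_Slur",
]

# Percentage of zeros per category (subset of the sparsity figures above).
ZERO_PCT = {
    "C-Phone_Flags": 71.4, "R-Phone_Flags": 71.4, "RR-Phone_Flags": 71.4,
    "L-Phone_Flags": 71.4, "LL-Phone_Flags": 71.4,
    "C-Phone_Basic": 19.0, "R-Phone_Basic": 24.0, "RR-Phone_Basic": 24.0,
    "L-Phone_Basic": 25.3, "LL-Phone_Basic": 27.4,
    "C-Note_Measure_Position": 25.0,
}


def removal_candidates(zero_pct, ranked, sparse_above=70.0, top_k=20):
    """Categories sparser than `sparse_above` % that fall outside the top_k ranks."""
    keep = set(ranked[:top_k])
    return sorted(
        name for name, pct in zero_pct.items()
        if pct > sparse_above and name not in keep
    )


print(removal_candidates(ZERO_PCT, TOP20))
# prints: ['L-Phone_Flags', 'LL-Phone_Flags']
```

A category outside the top 20 may still matter through interactions with other features, so any candidate would need to be confirmed by retraining and repeating the evaluation.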