Study on the Impact of Multicollinearity Among Components on Near-Infrared Spectroscopy Quantitative Models
-
Abstract: In quantitative analysis by near-infrared (NIR) spectroscopy combined with chemometrics, multicollinearity among variables is a key factor affecting the performance of spectral models. This study examines how multicollinearity between component concentrations affects chemometric quantitative models. Two systems were designed in which the concentrations of vitamin B6 (low concentration) and vitamin B1 (high concentration) were strongly correlated and weakly correlated, respectively. With vitamin B6 as the target component, prediction models for its concentration were built from the NIR spectra of each system using partial least squares (PLS) regression. The results show that when the system contains a coexisting component at higher concentration that is strongly correlated with the target component, the model can draw on the information from that coexisting component to predict the lower-concentration target component more accurately, which improves the precision of its quantitative analysis. Applying the approach to commercially available oral solutions containing vitamins B6 and B1 further verified that strong multicollinearity between component concentrations can improve the quantitative predictive ability of NIR spectral models. The conclusion has theoretical and practical value and can be applied to the simultaneous quantitative analysis of components in complex mixture systems.
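The effect described in the abstract can be illustrated with a small synthetic simulation. The sketch below is not the authors' data or procedure: the Gaussian band shapes, concentration ranges, noise levels, number of components, and the helper names `pls1_fit` and `simulate_rmsep` are all assumptions chosen for illustration. It builds two-component spectra in which the high-concentration component is either strongly correlated with or independent of the low-concentration target, fits a PLS1 model (NIPALS), and compares the prediction error for the target.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (PLS1, NIPALS). Returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # weight vector: covariance of X with y
        w /= np.linalg.norm(w)
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        qk = (yk @ t) / tt            # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - qk * t              # deflate y
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in original X space
    return x_mean, y_mean, coef

def simulate_rmsep(strong, seed=0, n_cal=80, n_test=40, noise_sd=0.4):
    """RMSEP for the low-concentration target on synthetic spectra (all values assumed)."""
    rng = np.random.default_rng(seed)
    grid = np.arange(200)                                    # hypothetical wavelength index
    band_b6 = np.exp(-0.5 * ((grid - 80) / 15.0) ** 2)       # assumed band of the target
    band_b1 = np.exp(-0.5 * ((grid - 120) / 15.0) ** 2)      # assumed band of the co-component
    n = n_cal + n_test
    c_b6 = rng.uniform(0.1, 1.0, n)                          # low-concentration target
    if strong:
        c_b1 = 10.0 * c_b6 + rng.normal(0.0, 0.2, n)         # strongly correlated with target
    else:
        c_b1 = rng.uniform(1.0, 10.0, n)                     # independent of target
    X = (np.outer(c_b6, band_b6) + np.outer(c_b1, band_b1)
         + rng.normal(0.0, noise_sd, (n, grid.size)))        # additive measurement noise
    x_mean, y_mean, coef = pls1_fit(X[:n_cal], c_b6[:n_cal], n_components=3)
    pred = (X[n_cal:] - x_mean) @ coef + y_mean
    return float(np.sqrt(np.mean((pred - c_b6[n_cal:]) ** 2)))

rmsep_strong = simulate_rmsep(strong=True)
rmsep_weak = simulate_rmsep(strong=False)
print(f"RMSEP, strongly correlated system: {rmsep_strong:.4f}")
print(f"RMSEP, weakly correlated system:   {rmsep_weak:.4f}")
```

In this toy setting the target's own band is weak relative to the noise, so the model in the strongly correlated system benefits from the much larger signal of the co-component. The same caveat applies as in any such calibration: the benefit holds only while the concentration correlation seen in calibration also holds in the samples being predicted.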
-
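Table 1. Prediction results for vitamin B6
| Target component | Solution system | Optimal number of principal components | $R_c^2$ (calibration set) | RMSEC (calibration set) | $R_p^2$ (prediction set) | RMSEP (prediction set) | RPD |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Vitamin B6 | Vitamin B6 single-component solution | 2 | 0.9540 | 1.2369 | 0.8988 | 1.4997 | 3.0489 |
| Vitamin B6 | Strongly correlated two-component solution | 5 | 0.9898 | 0.6101 | 0.9676 | 1.0272 | 4.6435 |
| Vitamin B6 | Weakly correlated two-component solution | 5 | 0.9492 | 1.2657 | 0.8969 | 2.1114 | 2.7562 |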
Table 2. Prediction results for actual drug samples
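| Target component | Optimal number of principal components | $R_c^2$ (calibration set) | RMSEC (calibration set) | $R_p^2$ (prediction set) | RMSEP (prediction set) | RPD |
| --- | --- | --- | --- | --- | --- | --- |
| Vitamin B6 | 5 | 0.9624 | 1.2435 | 0.9348 | 1.6019 | 2.8418 |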
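The tables report the coefficient of determination, the root mean square error, and RPD for each model. This excerpt does not give the formulas, so the sketch below assumes the commonly used definitions: RMSE as the square root of the mean squared residual, $R^2 = 1 - SS_{res}/SS_{tot}$, and RPD as the sample standard deviation of the reference values divided by the RMSE. The function name `regression_metrics` and the example values are hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, RPD) under the assumed definitions described above."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)             # total sum of squares
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    sd = math.sqrt(ss_tot / (n - 1))   # sample standard deviation of reference values
    rpd = sd / rmse
    return r2, rmse, rpd

# Made-up reference and predicted values, for illustration only.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, rmse, rpd = regression_metrics(reference, predicted)
print(f"R2={r2:.4f}  RMSE={rmse:.4f}  RPD={rpd:.4f}")
```

Applied to a calibration set this gives $R_c^2$ and RMSEC; applied to a prediction set it gives $R_p^2$, RMSEP, and RPD. Some authors divide by the degrees of freedom instead of $n$ in the RMSE, which changes the values slightly.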