Abstract:
Advances in high-throughput sequencing technology have enabled researchers to analyze and process human genome sequencing data in depth. The quality of these data is a pivotal factor in the credibility of analysis results, so precise quality assessment is essential for avoiding unnecessary losses and ensuring accurate outcomes. Both academia and industry place significant emphasis on data quality assessment: they have proposed numerous assessment methods and developed many tools, such as FastQC and Qualimap, along with various standard materials and standard reference data, which together underpin data quality assessment. However, few studies have systematically examined the toolsets used at each assessment stage or summarized their characteristics. Moreover, the data quality assessment process itself still faces numerous issues and challenges. To support human genome data quality assessment, this paper explores potential solutions to these problems and offers several practical suggestions for reference.