Keywords: data indexing algorithms, tree data structure, parallel computing, multithreading, databases, search queries, start node, target node
Development of a data indexing algorithm based on the CW-tree data structure using parallel computing
UDC 004.657
DOI: 10.26102/2310-6018/2021.32.1.013
The purpose of this scientific research was to develop an algorithm for data indexing using paral-lel computing technologies, the use of which would lead to a decrease in the time for processing queries to the database. In the course of the research, an algorithm for data indexing was devel-oped based on a special tree-based data storage structure called CW-tree (Constantly Wide tree). The CW-tree traversal in accordance with the proposed indexing algorithm is carried out in an asynchronous mode, which is achieved through the use of additional start vertices. Testing has shown that, in comparison with existing analogs of indexing algorithms, the proposed algorithm based on the CW-tree has advantages, the main of which is the complete parallelization of data retrieval both at the top level and at the level of the tree leaves. To establish the effect of using the CW-tree algorithm, a comparative study was carried out, namely: reading data using the B-tree, which is widely used in almost all modern database management systems, and reading data using the proposed CW-tree ... In the course of the study, queries to the database were built in such a way that a search was carried out for a specific key, it was established which approach would process a certain number of search queries faster. According to the results of the testing, it was found that searching for data in the CW-tree takes much less time than searching in the B-tree.
1. Nouri Z., Tu Y. GPU-Based Parallel Indexing for Concurrent Spatial Query Processing; Distributed and Parallel Databases. SSDBM'18: Proceedings of the 30th International Confer-ence on Scientific and Statistical Database Management. 2018;1-12. Available at: https://doi.org/10.1145/3221269.3221296 (accessed 20.01.202)
2. Shun J., Blelloch G. E. A Simple Parallel Cartesian Tree Algorithm and its Application to Parallel Suffix Tree Construction. Proceedings of the Thirteenth Workshop on Algorithm En-gineering and Experiments (ALENEX). 2011;(1):48-58.
3. Jarke M., Koch J. Query optimization in database systems. ACM Computing Surveys (CsUR). 1984;16(2):111-152.
4. Smith J. M., Chang P. Y. T. Optimizing the performance of a relational algebra database interface. Communications of the ACM (CACM). 1975;18(10):568-579.
5. Wu S., Li F., Mehrotra S., Chhin Ooi B. Query Optimization for Massively Parallel Da-ta Processing. SoCC Conference. 2011;(12):1–13.
6. Shnaiderman L., Shmueli O. A Parallel Tree Pattern Query Processing Algorithm for Graph Databases using a GPGPU. Proceedings of the Workshops of the EDBT/ICDT 2015 Joint Conference (EDBT/ICDT). 2020;1330(24):141. Available at: http://ceur-ws.org/Vol-1330/paper-24.pdf (accessed 20.01.2020)
7. Battre D., Angulo D. S. MPI framework for parallel searching in large biological data-bases. Journal of Parallel and Distributed Computing. 2006;66(12):1503–1511.
8. Dewan H. M., Stolfo S. J. Scalable Parallel and Distributed Expert Database Systems with Predictive Load Balancing. Journal of Parallel and Distributed Computing. 1994;22(3):506–522.
9. Ren D. Q. Algorithm level power efficiency optimization for CPU-GPU processing element in data intensive SIMD/SPMD computing. Journal of Parallel and Distributed Computing. 2011;72(2):245–253.
10. Liu P., Sun W., Zhang J., Zheng B. An automaton-based index scheme supporting twig queries for on-demand XML data broadcast. Journal of Parallel and Distributed Computing. 2015;(86):82–97.
11. Wani M. A., Bhat W. A. Dataset for forensic analysis of B-tree file system. Data in Brief. 2018;(18):2013–2018.
Keywords: data indexing algorithms, tree data structure, parallel computing, multithreading, databases, search queries, start node, target node
For citation: Shevskiy V.S. Development of a data indexing algorithm based on the CW-tree data structure using parallel computing. Modeling, Optimization and Information Technology. 2021;9(1). URL: https://moitvivt.ru/ru/journal/pdf?id=905 DOI: 10.26102/2310-6018/2021.32.1.013 (In Russ).
Published 31.03.2021