Malik, Zargham Nazeer (2025) Implementation and parallelization of SOMs in Structured Query Language (SQL). Masters thesis, Universität Rostock.
Full text: Master_Thesis_final.pdf (2MB)
Abstract
Self-Organizing Maps (SOMs) are widely used for pattern discovery and clustering, but they are computationally expensive to train, particularly on large datasets. This thesis investigates parallelizing SOM training in PostgreSQL by distributing neuron computations across multiple schemas to improve performance and scalability. The study was originally planned as a multi-server parallelization using PostgreSQL-XL. Because multiple servers were not available and PostgreSQL-XL was not deployed in the university environment, the work focused instead on schema-based parallelization within a single-server database. Processing neuron data in parallel across schemas made the workload run considerably faster. The experiment was run on a high-end machine using PostgreSQL with parallel query execution, with a dataset of 100,000 feature vectors trained against a 100x100 SOM grid. The results showed that schema-based parallelization performs the computation more efficiently than the legacy single-schema solution and reduces training time. Despite these improvements, some limitations remain. Scalability across multiple servers could not be tested in this study, and the performance gain is bounded by the capacity of a single server. Other distributed database solutions such as CitusDB, as well as hybrid approaches that combine schema-based parallelism with GPU or cloud processing, are left for further study. In summary, distributing SOM training across schemas in PostgreSQL greatly improves efficiency. Parallelization across multiple servers has yet to be attempted, but schema-based distribution is an effective optimization technique even on minimal infrastructure. Future work should test distributed database platforms and hybrid parallelization techniques to further improve the scalability of SOM training.
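The partition-then-reduce idea behind schema-based parallelization can be illustrated with a minimal Python sketch. This is not the thesis's SQL implementation: the abstract does not describe how neurons are assigned to schemas or how results are combined, so the round-robin split, the best-matching-unit (BMU) search as the parallelized step, and all function names (`make_grid`, `partition`, `local_bmu`, `global_bmu`) are assumptions made for illustration. Each partition stands in for one schema holding a subset of the neurons.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def make_grid(rows, cols, dim, seed=0):
    """Random weight vectors keyed by (row, col) grid position."""
    rng = random.Random(seed)
    return {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}


def partition(grid, n_parts):
    """Split neurons round-robin into n_parts groups (hypothetical stand-ins for schemas)."""
    parts = [dict() for _ in range(n_parts)]
    for i, (pos, w) in enumerate(sorted(grid.items())):
        parts[i % n_parts][pos] = w
    return parts


def local_bmu(part, x):
    """Best-matching unit within one partition, as (squared distance, position)."""
    return min((sum((wi - xi) ** 2 for wi, xi in zip(w, x)), pos)
               for pos, w in part.items())


def global_bmu(parts, x, pool):
    """Search all non-empty partitions concurrently, then reduce to the overall winner."""
    non_empty = [p for p in parts if p]
    return min(pool.map(lambda p: local_bmu(p, x), non_empty))[1]


# Usage: a small 10x10 grid split across 4 partitions.
grid = make_grid(10, 10, 3)
parts = partition(grid, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    bmu = global_bmu(parts, [0.2, 0.5, 0.9], pool)
```

The sketch shows only the structure of the computation: each partition finds its local winner independently, and a cheap final reduction picks the global one. Python threads do not provide real CPU parallelism for this kind of work, whereas in a database setting the per-schema queries could be executed by separate workers.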
| Item Type: | Thesis (Masters) | 
|---|---|
| Subjects: | Author Type > Student Theses > Master's Thesis; Research Topics > Big Data Analytics; Author Type > Student Theses | 
| Depositing User: | Dbis Admin | 
| Date Deposited: | 01 Jul 2025 08:03 | 
| Last Modified: | 01 Jul 2025 08:03 | 
| URI: | https://eprints.dbis.informatik.uni-rostock.de/id/eprint/1131 | 