Fast and Accurate SED modelling using Machine Learning
Author(s)
Papaleonidas, Petros
Advisor(s)
Panayidou, Klea
Abstract
Fast and Accurate SED modelling using Machine Learning is a supervised learning project that explores efficient representations of spectral energy distribution (SED) fitting codes, using the machine learning algorithms that have emerged over the last decade.
The focus is on the supervised training of two alternative (surrogate) models, an artificial neural network (ANN) and an ensemble regressor (HGB), on a sufficiently large dataset of simulated galaxy spectra generated with the CYGNUS (CYprus models for Galaxies and their NUclear Spectra) models.
The project is structured into the following chapters:
Chapter 1 provides background on the problem of effectively retrieving the fundamental physical properties of galaxies by studying and modelling the vast and complex cosmological and astrophysical data collected by an ever-growing number of sources.
Chapter 2 introduces the principles of the proposed machine learning models (ANN, HGB), their architecture, and their fundamental functions. The objective is to familiarize the reader with the terminology and data processing methods used later in the project.
Chapter 3 describes surrogate models and justifies their use for non-linear models with intractable parameter distributions.
Specifically, the project covers the supervised training of selected SED surrogate models, which are ultimately intended for the inverse problem of fast and accurate retrieval of physical parameters.
Chapter 4 is dedicated to the specifications and implementation of the end-to-end MARGE (Machine learning Algorithm for Radiative transfer of Generated Exoplanets) package. A distinctive part of MARGE is its deep learning (neural network) functionality, which is used to train the SED surrogate model.
Chapter 5 outlines the complete training pipeline of this project, starting with the rationale for choosing the CYGNUS model. The training data for the proposed machine learning models consist of one million CYGNUS simulations, each pairing a set of input model parameters with the corresponding output spectrum. The most important hyperparameters of the predictive models (ANN, HGB) are optimized on a subset of the data, and the optimal models are then trained on the entire dataset. Finally, a regression analysis assesses the suitability of both machine learning models and compares their performance by means of aggregate measures and individual plots.
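The two-stage protocol described above (tune hyperparameters on a subset, retrain the best configuration on the full training set, then score with the coefficient of determination) can be illustrated with a minimal plain-Python sketch. It rests on several assumptions that are not taken from the thesis: a k-nearest-neighbour regressor stands in for the ANN and HGB models, the hyperparameter being tuned is the neighbour count k, the target is a single scalar rather than a full spectrum, the subset is split 80/20 into training and validation parts, and the data are synthetic. The names `KNNRegressor` and `tune_on_subset` are hypothetical.

```python
import random
from statistics import fmean


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


class KNNRegressor:
    """Hypothetical stand-in surrogate (k-nearest-neighbour mean), not the thesis's ANN/HGB."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Squared Euclidean distance to every stored training point.
            dists = sorted(
                (sum((a - b) ** 2 for a, b in zip(x, xi)), yi)
                for xi, yi in zip(self.X, self.y)
            )
            # Average the targets of the k closest points.
            preds.append(fmean(yi for _, yi in dists[: self.k]))
        return preds


def tune_on_subset(X, y, grid, subset_size, seed=0):
    """Pick the best k on a random subset, using an assumed 80/20 train/validation split."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(X)), subset_size)
    cut = int(0.8 * subset_size)
    tr, va = idx[:cut], idx[cut:]
    best_k, best_score = None, float("-inf")
    for k in grid:
        model = KNNRegressor(k).fit([X[i] for i in tr], [y[i] for i in tr])
        score = r2_score([y[i] for i in va], model.predict([X[i] for i in va]))
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Synthetic toy data standing in for (input parameters -> one output value).
rng = random.Random(42)
X = [(rng.random(), rng.random()) for _ in range(600)]
y = [x0 + 2 * x1 + rng.gauss(0, 0.05) for x0, x1 in X]
X_train, y_train, X_test, y_test = X[:500], y[:500], X[500:], y[500:]

# Stage 1: tune on a subset; Stage 2: retrain the best setting on all training data.
best_k = tune_on_subset(X_train, y_train, grid=[1, 3, 5, 9], subset_size=200)
final = KNNRegressor(best_k).fit(X_train, y_train)
test_r2 = r2_score(y_test, final.predict(X_test))
print(best_k, round(test_r2, 3))
```

The point of tuning on a subset is cost: each candidate configuration is evaluated on far fewer samples than the full dataset, and only the winning configuration pays the price of training on everything. For a multi-output surrogate such as a spectrum, the same R² formula would be applied per output (for example per wavelength bin), which is an assumption about how per-output scores could be reported.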
Appendix 1 lists the code and data adjustments that must be applied to the source files for a functional implementation of the MARGE package.
Appendix 2 provides the complete coefficients of determination (R² scores) achieved in testing by both predictive models.
Date Issued
2023-10-26
Open Access
No
School
File(s)
Name
Michaloutsos+Michael+MSc+in+AI+Thesis.pdf
Type
main article
Size
5.37 MB
Format
Checksum