gal_goku Package
This is the main package containing the emulator framework and galaxy clustering models.
Main Modules
gal Module
GalBase Class
LargeScaleGal Class
emus Module
To evaluate all the emulators for the Halo Mass Function. This is the interface of single_fid.py and summary_stats.py to work with the generated emulated Halo Mass Fucntion functions.
- class gal_goku.emus.BaseStatEmu(X, Y, model_error, logging_level='info', multi_bin=False)[source]
Bases:
object- __init__(X, Y, model_error, logging_level='info', multi_bin=False)[source]
The base emu to be inherited by single fidelity emulators built on any summary statistics This is the interface for all classses above.
- class gal_goku.emus.Hmf(data_dir, y_log=True, fid='L2', multi_bin=False, logging_level='INFO', narrow=False, no_merge=True)[source]
Bases:
BaseStatEmu- __init__(data_dir, y_log=True, fid='L2', multi_bin=False, logging_level='INFO', narrow=False, no_merge=True)[source]
data_dir: Directory where the data is stored r_range: Range of r to consider fid: The fiducial cosmology to consider, default is ‘L2’ logging_level: Logging level cleaning_method: Method to clean the negative bins,
default is ‘linear_interp’; otherwuse replace with a small number, e.g. 1e-10
BaseStatEmu Class
- class gal_goku.emus.BaseStatEmu(X, Y, model_error, logging_level='info', multi_bin=False)[source]
Bases:
objectBase emulator class for summary statistics.
Key Methods:
- leave_bunch_out(n_out=5, narrow=0)[source]
Leaves out a random bunch of samples out n_out: Number of samples to leave out
- __init__(X, Y, model_error, logging_level='info', multi_bin=False)[source]
The base emu to be inherited by single fidelity emulators built on any summary statistics This is the interface for all classses above.
Hmf Class
- class gal_goku.emus.Hmf(data_dir, y_log=True, fid='L2', multi_bin=False, logging_level='INFO', narrow=False, no_merge=True)[source]
Bases:
BaseStatEmuHalo Mass Function emulator.
Parameters:
data_dir (str): Directory containing the training data
y_log (bool): Whether to use log-space for predictions
fid (str): Fiducial cosmology identifier
multi_bin (bool): Use multi-bin mode
logging_level (str): Logging verbosity level
narrow (bool): Use narrow parameter range
no_merge (bool): Don’t merge data from multiple sources
- __init__(data_dir, y_log=True, fid='L2', multi_bin=False, logging_level='INFO', narrow=False, no_merge=True)[source]
data_dir: Directory where the data is stored r_range: Range of r to consider fid: The fiducial cosmology to consider, default is ‘L2’ logging_level: Logging level cleaning_method: Method to clean the negative bins,
default is ‘linear_interp’; otherwuse replace with a small number, e.g. 1e-10
emu_cosmo Module
Cosmological emulator utilities.
halo_tools Module
Tools for halo manipulations, mass conversions, and halo model calculations.
init_power Module
Class to generate simulation ICS, separated out for clarity.
- class gal_goku.init_power.SimulationICs(*, seed: int = 9281110, redshifts: float = 99, redend: float = 0, omega0: float = 0.288, omegab: float = 0.0472, hubble: float = 0.7, scalar_amp: float = 2.427e-09, ns: float = 0.97, rscatter: bool = False, m_nu: float = 0, nu_hierarchy: str = 'normal', uvb: str = 'pu', nu_acc: float = 1e-05, unitary: bool = True, w0_fld: float = -1.0, wa_fld: float = 0.0, N_ur: float = 3.044, alpha_s: float = 0, MWDM_therm: float = 0, python: str = 'python')[source]
Bases:
objectClass for creating the initial conditions for a simulation. There are a few things this class needs to do:
Generate CAMB input files
Generate MP-GenIC input files (to use CAMB output)
Run CAMB and MP-GenIC to generate ICs
The class will store the parameters of the simulation.
We also save a copy of the input and enough information to reproduce the results exactly in SimulationICs.json.
Many things are left hard-coded.
We assume flatness.
Init parameters:
redshifts - redshifts at which to generate ICs omegab - baryon density. Note that if we do not have gas particles,
still set omegab, but set separate_gas = False
- omega0 - Total matter density at z=0 (includes massive neutrinos and
baryons)
hubble - Hubble parameter, h, which is H0 / (100 km/s/Mpc) scalar_amp - A_s at k = 0.05, comparable to the Planck value. ns - Scalar spectral index m_nu - neutrino mass unitary - if true, do not scatter modes, but use a unitary gaussian
amplitude.
Remove:
- separate_gas - if true the ICs will contain baryonic particles;
If false, just DM.
- cambfile(maxk=0.5) Tuple[numpy.ndarray, numpy.ndarray][source]
Generate the IC power spectrum using classylss.
Basically is using pre_params feed into class and compute the powerspec files based on the range of redshift and redend. All files are stored in the directory camb_out.
Return:
camb_output (str) : power specs directory. default: “camb_linear/”
- gal_goku.init_power.get_neutrino_masses(total_mass: float, hierarchy: str) numpy.ndarray[source]
Get the three neutrino masses, including the mass splittings. Hierarchy is ‘inverted’ (two heavy), ‘normal’ (two light) or degenerate.
Initial conditions and linear power spectrum calculations using ClassyLSS.
summary_stats Module
Loading the computed summary statistics from the data files.
- class gal_goku.summary_stats.BaseSummaryStats(data_dir, fid, narrow=False, MPI=None, logging_level='INFO')[source]
Bases:
objectBase class for summary statistics
- configure_logging(logging_level='INFO', logger_name='summary_stats')[source]
Sets up logging based on the provided logging level in an MPI environment.
- class gal_goku.summary_stats.HMF(data_dir, fid, z=2.5, mass_range=(11.1, 12.5), narrow=False, no_merge=True, chi2=False, logging_level='INFO')[source]
Bases:
BaseSummaryStatsHalo mass function
- get_coeffs(ind=None, delta_r=None, *kwargs)[source]
Retrun the spline fits in an array Parameters: ————– ind: np.ndarray, optional, default=None
Index of the simulations to fit. If None, fit all simulations
- kwargs: dict
Keyword arguments for utils.ConstrainedSplineFitter
Returns:
2D array of the spline coefficients and the knots (coeffs, knots)
- get_counts()[source]
Return the halo mass function for all the simulations and the corresponding mask for the missing bins. TODO: We can improve this by not removing the very low count bins at high mass end and just rely on the Poisson likelihood to use that bins too.
- Returns:
full_bins (np.ndarray) – The uniform mass bins used for all the simulations
log_hmfs (np.ndarray) – The halo mass function for all the simulations, natural log scale
mult_fac (np.ndarray) – The multiplicative factor to convert the hmf to counts in bins - It is zeros for the massive missing bins.
params (np.ndarray) – The cosmological parameters for each simulation
sim_tags (np.ndarray) – The simulation tags
- get_data(get_counts=False, noise_floor=0.0)[source]
Get the data for the emulator :param get_counts: Whether to get log(hmf) with uncertainties (True) or
halo counts in bins (False)
- Parameters:
noise_floor (float) – The fractional noise floor to add in quadrature to the Poisson errors when get_counts is False
- Returns:
full_bins (np.ndarray) – The uniform mass bins used for all the simulations
log_hmfs or counts (np.ndarray) – if get_counts is False: The halo mass function for all the simulations, log10 scale else: The halo counts in the bins for all the simulations
log_hmf_errs (np.ndarray) –
- if get_counts is False:
The uncertainties for the halo mass function, log10 scale
- else:
Array of zeros
params (np.ndarray) – The cosmological parameters for each simulation
sim_tags (np.ndarray) – The simulation tags
- get_smoothed(x, delta_r=None, ind=None, *kwargs)[source]
Get the smoothed halo mass function evaluated at x Parameters: ————– x: np.ndarray or list of np.ndarray
Array of x values to evaluate the spline at
- ind: np.ndarray, optional, default=None
Index of the simulations to fit. If None, fit all simulations
- kwargs: dict
Keyword arguments for utils.ConstrainedSplineFitter
- class gal_goku.summary_stats.HaloHaloPower(data_dir, z, max_k=0.5, log_k_interp=None, fid='HF', logging_level='INFO')[source]
Bases:
BaseSummaryStatsClass to prepare the propagator for the emulator.
- class gal_goku.summary_stats.MatterPower(data_dir, z, max_k=0.5, log_k_interp=None, fid='HF', logging_level='INFO')[source]
Bases:
BaseSummaryStatsClass to prepare the propagator for the emulator.
- class gal_goku.summary_stats.ProjCorr(data_dir, fid, logging_level='INFO')[source]
Bases:
BaseSummaryStatsProjected correlation function, w_p
- bin_in_param_space(r_range=(0, 100), num_bins=2)[source]
Computet the average wp of all the simulations with low and high values of the cosmological parameters :param r_range: The range of r values to consider :type r_range: tuple :param num_bins: Number of bins to divide the cosmological parameters :type num_bins: int
- Returns:
rp (array) – The r values
av_wp (array) – The average wp for each bin
all_bins (array) – The bin edges for each parameter
- get_labels()[source]
Get the labels we use for each simulation, they are in this format
10p_Box{BoxSize}_Par{Npart}_0001
- class gal_goku.summary_stats.Propagator(data_dir, z, max_k=0.5, fid='HF', logging_level='INFO')[source]
Bases:
BaseSummaryStatsClass to prepare the propagator for the emulator.
- __init__(data_dir, z, max_k=0.5, fid='HF', logging_level='INFO')[source]
Initialize the Propagator class.
- Parameters:
basedir (str) – Base directory where the data files are located.
- get_model_gk(k)[source]
Get the propagator G(k) = P_hm / P_lin at arbitrary k values. Returns: ——- gk: np.ndarray, shape=(n_sims, n_mbins, n_k)
- get_powers()[source]
Get the computed propagator computed for each simulation. It is:
G(k, m) = P_hlin(k, m) / P_init(k, m) where P_hlin is the cross correlation of the halos and the initial power spectrum, P_init is the initial power spectrum, k and m are the wavenumber and mass bins respectively.
- model(k, g0, g2, g4, sigma_d)[source]
The model for the propagator G(k) = P_hm / P_lin G(k) = (g_0 + g_2 k^2 + g_4 k^4) exp(- sigma_d^2_lin k^2 / 2 ) where sigma_d is the Zeldovich displacement. Returns: ——- G(k): np.ndarray, shape=(n_sims, n_mbins)
The propagator G(k) for each simulation and mass bin.
- class gal_goku.summary_stats.Xi(data_dir, fid, mass_range=(11.0, 12.32), narrow=False, MPI=None, logging_level='INFO')[source]
Bases:
BaseSummaryStatsSpeherically Averaged correlation function
- __init__(data_dir, fid, mass_range=(11.0, 12.32), narrow=False, MPI=None, logging_level='INFO')[source]
Calling this will load the xi(r, n1, n2) for all the simulations Parameters: ————– data_dir: str
The directory containing the data
- fid: str
The folder id
- narrow: bool, optional, default=False
Whether it is Goku-narrow or not
- logging_level: str, optional, default=’INFO’
The logging level
- get_wt_err(rcut=(0.2, 61), remove_sims=None)[source]
Return the correlation function for all the simulations and the corresponding uncertainties. NOT: For now we assign very large uncertainties to the bins with NaN values and simulations missing more than alpha_bad fraction of the r-bins for some massive mass pairs. ToDo: Use Jacknife resampling to get the uncertainties :param rcut: The range of r values to consider for fitting :type rcut: tuple :param alpha_bad: The fraction of bad points to consider :type alpha_bad: float
- Returns:
bins (np.ndarray, shape=(n,3)) – The bins of the correlation function in (r, m1, m2) format
corr (np.ndarray, shape=(n*n_bins,)) – The correlation function for the “remaining” bins at (r, m1, m2) bins
corr_err (np.ndarray, shape=(n*n_bins)) – The correlation function errors for the “remaining” bins at (r, m1, m2) bins
params (np.ndarray, shape=(n*n_bins, prams_size)) – The cosmological parameters for the “remaining” bins
- get_wt_err_interped_deprecated(rcut=(0.2, 61), good_sims=False)[source]
- Parameters:
rcut (tuple) – The range of r values to consider for fitting
good_sims (bool) – Whether to use only the good simulations. The good simulations are defined as those with a low fraction of bad points which is hard-coded in the method hard_coded_good_l2_sims() for now. This removes ~ 150 L2 simulations.
- Returns:
bins (np.ndarray, shape=(n,3)) – The bins of the correlation function in (r, m1, m2) format
interped_log_corr (np.ndarray, shape=(n*n_bins,)) – The correlation function for the “remaining” bins at (r, m1, m2) bins. Replacing the missing values with a minimal spline interpolation
corr_err (np.ndarray, shape=(n*n_bins)) – The correlation function errors for the “remaining” bins at (r, m1, m2) bins
params (np.ndarray, shape=(n*n_bins, prams_size)) – The cosmological parameters for the “remaining” bins
- get_xi_n1_n2_r()[source]
Get xi(n1, n2, r) for all simulations Returns: ————– rbins: np.ndarray, shape=(n_rbins,)
The r values
- mass_bins: np.ndarray, shape=(n_mass_bins,)
The mass bins, storing the mass values in descending order. This is the order the corr is stored
- all_corrs: np.ndarray, shape=(n_sims, n_mass_bins, n_mass_bins, n_rbins)
The xi(n1, n2, r) for all the simulations
- hard_coded_good_l2_sims()[source]
Following on https://github.com/qezlou/private-gal-emu/issues/39#issuecomment-3226453844 there are ~ 150 L2-W sims that miss mroe than 25% of their xi(r) values for the largest mass bins, this is the list of those sims
- make_3d_corr(corr, symmetric=False)[source]
Pass xi(n_mass_pairs, n_rbins) and get a 3D array (n_mass_pairs, n_mass_pairs, n_rbins)
- minimal_spline_interp(good_sims, ind_r)[source]
Use Cubic spline only for recovering random NaN values. No extrapolation is done here, if the NaN values are at the edges, they will be replaced by a small value of xi = 1e-4 and a very large uncertainty of 1e6. Parameters: ————– good_sims: np.ndarray
The indices of the simulations to consider. Simulation with too many NaN values should be removed beforehand. A list is provided in hard_coded_good_l2_sims() which is curated for highest mass bin and is applicable for lower mass bins as well.
- ind_r: np.ndarray
The indices of the r values to consider
Returns:
- interpolated_values: np.ndarray, shape=(len(good_sims), n_mass_pairs, len(ind_r))
The xi(r, n1, n2) values for the good simulations with NaN values replaced by interpolated values or a small value of 1e-4.
- corr_err: np.ndarray, shape=(len(good_sims), n_mass_pairs, len(ind_r))
The uncertainty estimates for the interpolated values
- remove_sims(sim_id)[source]
get the indices to the simulations to remove Parameters: ———– sim_id: list
List of simulation IDs to remove
Returns:
- keep_sims: np.ndarray
Indices of the simulations to keep
- spline_nan_interp(mass_pair, spline_s=0.01, rcut=(0.2, 61), alpha_bad=0.35)[source]
Fit univariate spline to xi(r) for a specific mass pair, handling NaN values.
- Parameters:
- Returns:
rbins (ndarray) – Radial bins within rcut range (Mpc/h)
log_corr (ndarray) – Log10 of correlation function values
eval_splines (ndarray) – Spline evaluations at each rbin, it is interpolated log10(xi(r))
bad_sims_mask (ndarray) – Mask of simulations with a fraction of less than alpha_bad valid points
- unconcatenate(corr, bins)[source]
Unconcatenate the large xi arrays used for the emulator :param corr: The correlation function :type corr: np.ndarray, shape=(n_sims, n_bins) :param bins: The bins of the correlation function in (m1, m2, r) format :type bins: np.ndarray, shape=(n_bins, 3)
- Returns:
corr – The correlation function for each mass pair
- Return type:
np.ndarray, shape=(n_sims, len(mass_pairs), len(rbins))
Summary statistics computations from simulation data.
single_fid Module
Train a single fidelity emualtor
- class gal_goku.single_fid.EvaluateSingleFid(X, Y, model_err, logging_level='INFO')[source]
Bases:
object- loo_train_pred(mbins, savefile=None, labels=None, sub_sample=None)[source]
Iterate over the samples to leave one out and train the model Parameters: ——- mbins: array
The projected bins radii that used for training
- savefile:
The file to save the results to
Returns:
mean_pred: the predictions for the left out samples while trained on the rest var_pred: the variance of the predictions
- class gal_goku.single_fid.EvaluateSingleFidMultiBins(X, Y, model_err, logging_level='INFO')[source]
Bases:
EvaluateSingleFidBuild a single fidelity emulator for each bin of the summary statistics
- class gal_goku.single_fid.SingleFid(X, Y, model_err=None, logging_level='INFO')[source]
Bases:
object- normalize(X, Y, model_err)[source]
Normalize the input data such it is between 0 and 1 We don’t normalize the output data as it the median of the stack could be 0. Returns: ——– X_normalized: normalized input data between 0 and 1 X_min: minimum value of the input data X_max: maximum value of the input data mean_func: the mean of the output to be used as the mean function in the GP model
Single-fidelity Gaussian Process emulator implementations.
multi_fid Module
Multi-fidelity Gaussian Process emulator implementations.
utils Module
- class gal_goku.utils.BayesianPiecewisePolynomialFitter[source]
Bases:
object- __init__()[source]
Initialize the fitter for piecewise quadratic polynomials with Bayesian inference.
- fit_piecewise(x, y, m)[source]
Fit piecewise quadratic polynomials to m consecutive bins using Bayesian inference.
Parameters: x (array): 1D array of x-coordinates. y (2D array): 2D array of y-coordinates with shape (num_examples, num_dimensions). m (int): Number of bins to group for fitting.
Returns: list: A list of lists containing posterior samples for polynomial coefficients. Outer list corresponds to dimensions.
- class gal_goku.utils.ConstrainedSplineFitter(degree=2, s=None, constraints=True, logging_level='INFO')[source]
Bases:
object- __init__(degree=2, s=None, constraints=True, logging_level='INFO')[source]
A Spline fitter which requires the second derivative to be all negative. Of particular use in fitting the halo mass function where we expect the slope to be decreasing with mass. THis is similar to what implemented in MiraTitanII arxiv:2003.12116 Parameters: ———– degree: int, default=2
Degree of the spline
- s: int, default=x.size, optional
Smoothing factor, refer to scipy.interpolate.generate_knots. This helps find the internal knots to achieve the desired smoothness.
- constraints: bool, default=True
If True, enforce the constraints on the second derivative to be negative
- fit_spline(x, y, knots, sigma=1)[source]
Fit a spline with constraints on the coefficients of the quadratic term.
Parameters: x (array): 1D array of x-coordinates. y (array): 1D array of y-coordinates. knots (array): Internal knot positions for the spline.
Returns: BSpline: A spline object with the specified constraints.
Utility functions for data processing and manipulation.
plot Module
Plotting utilities for visualization of results.
plot_gal Module
Specialized plotting functions for galaxy clustering visualizations.