gal_goku_sims Package
This package handles simulation data processing and computation of summary statistics.
Main Modules
hmf Module
Halo Mass Function computations from simulation catalogs.
Main Classes:
HMF
Computes the halo mass function from simulation halo catalogs.
Key functionality:
Read halo catalogs from simulations
Compute mass functions with proper binning
Handle multiple cosmologies and redshifts
Export results in standardized formats
Typical Usage:
from gal_goku_sims import hmf
# Initialize HMF computer
hmf_calc = hmf.HMF(
    catalog_path='path/to/halos.hdf5',
    box_size=1000.0,  # Mpc/h
    redshift=2.0
)
# Compute mass function
masses, phi = hmf_calc.compute()
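To make the quantity returned above concrete, here is a minimal NumPy sketch of a binned halo mass function, dn/dlog10(M), for a periodic box. It is not the package's implementation: the function name `halo_mass_function`, the `n_bins` parameter, the logarithmic binning, and the mock masses are all assumptions made for illustration.

```python
import numpy as np


def halo_mass_function(masses, box_size, n_bins=10):
    """Hypothetical helper: return bin centers [Msun/h] and dn/dlog10(M) [(h/Mpc)^3 per dex]."""
    log_m = np.log10(masses)
    # Equal-width bins in log10(M) spanning the sample (an assumed binning choice)
    edges = np.linspace(log_m.min(), log_m.max(), n_bins + 1)
    counts, _ = np.histogram(log_m, bins=edges)
    dlog_m = np.diff(edges)
    volume = box_size ** 3
    # Number density per unit logarithmic mass interval
    phi = counts / (volume * dlog_m)
    # Geometric bin centers
    centers = 10.0 ** (0.5 * (edges[:-1] + edges[1:]))
    return centers, phi


# Mock halo masses, log-uniform between 1e11 and 1e14 Msun/h (illustrative only)
rng = np.random.default_rng(0)
mock_masses = 10.0 ** rng.uniform(11.0, 14.0, size=5000)
centers, phi = halo_mass_function(mock_masses, box_size=1000.0)
```

Integrating `phi` over the bins and multiplying by the box volume recovers the total number of halos, which is a quick sanity check on any binning scheme.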
xi Module
Correlation function computations from simulation data.
Main Classes:
HaloXi
Computes halo-halo correlation functions from simulation catalogs.
Key functionality:
Compute 2-point correlation functions
Support for mass-threshold samples
Jackknife error estimation
Parallel computation with MPI
Typical Usage:
from gal_goku_sims import xi
# Initialize correlation function computer
xi_calc = xi.HaloXi(
    catalog_path='path/to/halos.hdf5',
    box_size=1000.0,  # Mpc/h
    mass_threshold=1e12  # Msun/h
)
# Compute correlation function
r, xi_r = xi_calc.compute()
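For readers unfamiliar with the statistic, the following sketch shows one common way to estimate a 2-point correlation function in a periodic box: the natural estimator, xi = DD/RR - 1, with the random pair counts computed analytically. This is an assumption about the technique, not a description of what `HaloXi` does internally; the function `xi_natural` and the mock positions are hypothetical, and the brute-force O(N^2) pair count is only suitable for small samples.

```python
import numpy as np


def xi_natural(pos, box_size, r_edges):
    """Hypothetical helper: natural estimator xi = DD/RR - 1 in a periodic box."""
    n = len(pos)
    # Pairwise separation vectors, wrapped with the minimum-image convention
    diff = pos[:, None, :] - pos[None, :, :]
    diff -= box_size * np.round(diff / box_size)
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Count each pair once: upper triangle, excluding self-pairs
    iu = np.triu_indices(n, k=1)
    dd, _ = np.histogram(dist[iu], bins=r_edges)
    # Expected pair counts for uniformly distributed points in the same volume
    shell_vol = 4.0 / 3.0 * np.pi * np.diff(r_edges ** 3)
    rr = 0.5 * n * (n - 1) * shell_vol / box_size ** 3
    r_mid = 0.5 * (r_edges[:-1] + r_edges[1:])
    return r_mid, dd / rr - 1.0


# Mock, unclustered positions; the largest separation stays below box_size / 2
rng = np.random.default_rng(1)
mock_pos = rng.uniform(0.0, 100.0, size=(500, 3))
r_edges = np.linspace(5.0, 40.0, 8)
r_mid, xi_r = xi_natural(mock_pos, box_size=100.0, r_edges=r_edges)
```

Because the mock points are unclustered, `xi_r` should scatter around zero. A mass-threshold sample would be obtained by selecting `pos[mass >= mass_threshold]` before calling the function.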
mpi_helper Module
- gal_goku_sims.mpi_helper.Allgatherv_helper(MPI, comm, data, data_type)
Each rank should call this with the data on that rank.
MPI : pass the mpi4py.MPI
comm : The MPI communicator
data : The 1D array on each rank. The size of data on each rank could be different.
data_type : Type of each element of the data array
- gal_goku_sims.mpi_helper.distribute_array(comm, data)
Distribute array “data” equally between ranks and return the load for each rank individually.
- gal_goku_sims.mpi_helper.distribute_array_split_comm(size, color, data)
Similar to distribute_array(), but useful for a split communicator.
- gal_goku_sims.mpi_helper.distribute_files(comm, fnames)
Distribute a list of files among the available ranks.
comm : MPI communicator
fnames : A list of file names
Returns : A list of files for each rank
- gal_goku_sims.mpi_helper.into_chunks(comm, length)
Similar to distribute_array, but returns the start and end indexes of all ranks. Use this if each rank needs to know the start and end index of all other ranks.
Parameters:
comm : MPI communicator
length : The total length of the array to be distributed
Returns:
start, end : The start and end index of the array for each rank, sorted by rank number. If padding is not zero, the start am
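The index bookkeeping described for into_chunks can be illustrated without MPI. The sketch below is a hypothetical stand-in, not the package's code: `chunk_bounds` takes the number of ranks directly instead of a communicator, ignores padding, and assumes that any remainder is given to the lowest-numbered ranks.

```python
def chunk_bounds(size, length):
    """Hypothetical helper: start and end (exclusive) indices for every rank."""
    base, extra = divmod(length, size)
    starts, ends = [], []
    offset = 0
    for rank in range(size):
        # The first `extra` ranks take one additional element (assumed policy)
        n_items = base + (1 if rank < extra else 0)
        starts.append(offset)
        offset += n_items
        ends.append(offset)
    return starts, ends


starts, ends = chunk_bounds(size=4, length=10)
print(starts, ends)  # [0, 3, 6, 8] [3, 6, 8, 10]
```

In an MPI program, `size` would come from `comm.Get_size()`, and rank `i` would process `data[starts[i]:ends[i]]`.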
MPI utilities for parallel processing of simulation data.
Key Functions:
Process distribution across MPI ranks
Collective operations for data gathering
Efficient parallel I/O
Error handling in MPI context
Typical Usage:
import numpy as np
from mpi4py import MPI
from gal_goku_sims import mpi_helper
comm = MPI.COMM_WORLD
# Distribute work across ranks: each rank receives its own share of the task indices
tasks = np.arange(100)
my_tasks = mpi_helper.distribute_array(comm, tasks)
Data Formats
Halo Catalogs
Halo catalogs are expected in HDF5 format with the following structure:
halos.hdf5
├── mass              # Halo masses [Msun/h]
├── pos               # Positions [Mpc/h], shape (N, 3)
├── vel               # Velocities [km/s], shape (N, 3)
└── metadata
    ├── box_size      # Box size [Mpc/h]
    ├── redshift      # Redshift
    └── cosmology     # Cosmological parameters
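As a rough illustration of this layout, the sketch below writes a small mock catalog with h5py and reads it back. It assumes that `metadata` is an HDF5 group holding scalar datasets; the source does not say whether these are datasets or attributes, and it does not specify the internal layout of `cosmology`, so that entry is left out. All values are mock data.

```python
import os
import tempfile

import h5py
import numpy as np

rng = np.random.default_rng(2)
n_halos = 100
path = os.path.join(tempfile.mkdtemp(), "halos.hdf5")

# Write a mock catalog following the layout above
with h5py.File(path, "w") as f:
    f.create_dataset("mass", data=10.0 ** rng.uniform(11.0, 14.0, n_halos))
    f.create_dataset("pos", data=rng.uniform(0.0, 1000.0, (n_halos, 3)))
    f.create_dataset("vel", data=rng.normal(0.0, 300.0, (n_halos, 3)))
    meta = f.create_group("metadata")  # assumed to be a group of scalar datasets
    meta.create_dataset("box_size", data=1000.0)
    meta.create_dataset("redshift", data=2.0)

# Read it back
with h5py.File(path, "r") as f:
    mass = f["mass"][:]
    pos = f["pos"][:]
    box_size = float(f["metadata/box_size"][()])
    redshift = float(f["metadata/redshift"][()])
```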
Correlation Functions
Correlation function outputs are saved in HDF5 format:
xi.hdf5
├── r                     # Separation bins [Mpc/h]
├── xi                    # Correlation function values
├── xi_err                # Errors (if computed)
└── metadata
    ├── mass_threshold    # Mass threshold [Msun/h]
    ├── redshift          # Redshift
    └── n_pairs           # Number of pairs per bin
Performance Considerations
MPI Parallelization
For large datasets, use MPI parallelization:
mpirun -np 16 python compute_correlations.py
Memory Management
When working with large catalogs:
Use chunked reading with HDF5
Process data in batches
Clear memory explicitly with del statements
Monitor memory usage with memory_profiler
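The batching advice above can be sketched as follows. The example accumulates a mass histogram one batch at a time, so that only a single batch is held in working memory. The generator `iter_batches` and the mock data are hypothetical; an in-memory NumPy array stands in for the catalog here.

```python
import numpy as np


def iter_batches(array, batch_size):
    """Hypothetical helper: yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(array), batch_size):
        yield array[start:start + batch_size]


# Mock log10 masses standing in for a large on-disk dataset
rng = np.random.default_rng(3)
log_mass = rng.uniform(11.0, 14.0, size=10_000)
edges = np.linspace(11.0, 14.0, 13)

counts = np.zeros(len(edges) - 1, dtype=np.int64)
for batch in iter_batches(log_mass, batch_size=1_000):
    batch_counts, _ = np.histogram(batch, bins=edges)
    counts += batch_counts
    del batch_counts  # release the temporary explicitly
```

The same generator should also work on an open h5py dataset, since slicing a dataset reads only the requested rows from disk.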
Optimization Tips
Use pre-computed pair counts when possible
Cache frequently accessed data
Vectorize operations with NumPy
Profile code to identify bottlenecks