Taming randomness in ML models with hypothesis testing and marimo