Tech stack: SnapML, scikit-learn, Matplotlib, Seaborn, Pandas, NumPy.
Dataset used: TLC Yellow Taxi Trip Records for June 2019 (link **here**), sourced from the NYC Taxi and Limousine Commission (TLC).
Description: A Decision Tree Regressor from IBM’s SnapML library was trained on real taxi trip data published by the NYC Taxi and Limousine Commission (TLC). The training features were selected using a visualised correlation matrix. After cleaning, the model was trained on over 2 million samples in 0.636 seconds using multithreaded CPU/GPU acceleration.
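The workflow described above (correlation-based feature selection, then a timed fit and evaluation) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code. The data is synthetic, and the column names, the `target` variable, the 0.1 correlation threshold, and the `max_depth` value are all assumptions made for the example. The import falls back to scikit-learn's `DecisionTreeRegressor` when SnapML is not installed, since both expose the same `fit`/`predict` interface.

```python
import time

import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

try:
    # SnapML's regressor follows the scikit-learn fit/predict interface.
    from snapml import DecisionTreeRegressor
except ImportError:
    # Fallback so the sketch still runs where SnapML is not installed.
    from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the cleaned trip data (hypothetical column names).
rng = np.random.default_rng(0)
n = 5000
trips = pd.DataFrame({
    "trip_distance": rng.uniform(0.5, 20.0, n),
    "trip_minutes": rng.uniform(2.0, 60.0, n),
    "passenger_count": rng.integers(1, 7, n).astype(float),
})
# Hypothetical target that depends on distance and duration only.
trips["target"] = (
    2.5
    + 1.8 * trips["trip_distance"]
    + 0.4 * trips["trip_minutes"]
    + rng.normal(0.0, 1.0, n)
)

# Rank features by absolute correlation with the target; keep the strong ones.
# The 0.1 threshold is an arbitrary choice for this example.
corr = trips.corr()["target"].drop("target").abs()
features = corr[corr > 0.1].index.tolist()

X = trips[features].to_numpy(dtype=np.float32)
y = trips["target"].to_numpy(dtype=np.float32)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Only parameters shared by both libraries are passed here.
model = DecisionTreeRegressor(max_depth=8, random_state=42)

# Time the fit only, excluding data preparation.
start = time.perf_counter()
model.fit(X_train, y_train)
train_seconds = time.perf_counter() - start

pred = model.predict(X_test)
mse = mean_squared_error(y_test, pred)
r2 = r2_score(y_test, pred)
print(f"features={features} train_time={train_seconds:.3f}s "
      f"MSE={mse:.3f} R2={r2:.3f}")
```

On the real dataset, the synthetic frame would be replaced by the cleaned trip records, and the printed correlation ranking would correspond to what the visualised correlation matrix shows graphically.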
Resulting accuracy metrics: