Predicting the food delivery time
- Machine learning models are mathematical equations that take inputs, called predictors, and try to estimate some future output value, called outcome.
\underset{outcome}{Y} \leftarrow f(\underset{predictors}{X_1,\ldots,X_p})
For example, we want to predict how long it takes to deliver food ordered from a restaurant.
The outcome is the time from the initial order (in minutes).
There are multiple predictors, including:
- the distance from the restaurant to the delivery location,
- the date/time of the order,
- which items were included in the order.
Food Delivery Time Data
- The data are tabular, where the 31 variables (1 outcome + 30 predictors) are arranged in columns and the the n=10012 observations in rows:
time_to_delivery hour day distance item_01 item_02 item_03 item_04 item_27
1 16.1106 11.899 Thu 5.069421 0 0 2 0 0
2 22.9466 19.230 Tue 5.938465 0 0 0 0 0
3 30.2882 18.374 Fri 3.315240 0 0 0 0 0
4 33.4266 15.836 Thu 9.607760 0 0 0 0 1
5 27.2255 19.619 Fri 4.055537 0 0 0 1 1
6 19.6459 12.952 Sat 5.391289 1 0 0 1 0
- Note that the predictor values are known. For future data, the outcome is unknown; it is a machine learning model’s job to predict unknown outcome values.
Outcome Y
Predictor X_1
Regression function
A machine learning model has a defined mathematical prediction equation, called regression function f(\cdot), defining exactly how the predictors X_1,\ldots,X_n relate to the outcome Y: Y \approx f(X_1,\ldots,X_p)
Here is a simple example of regression function: the linear model with a single predictor (the distance X_1) and two unknown parameters \beta_0 and \beta_1 that have been estimated:
\begin{aligned}
Y &\approx \hat{\beta}_0 + \hat{\beta}_1 X_1\\
\\
delivery\,\,time &\approx 17.557 + 1.781\,\times \,distance
\end{aligned}
Predictor X_2
Regression plane
With two predictors, say X_1 and X_2, the linear regression function becomes a plane in the three-dimensional space: Y \approx \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2.
The fitted regression function is delivery\,\,time \approx −11.17 + 1.76 \times distance + 1.77 \times order\,\,time
If an order were placed at 12:00 with a distance of 7 km, the predicted delivery time would be −11.17 + 1.76 \times 7 + 1.77 \times 12 = 22.4 minutes.
Regression spline (non-parametric)
Flexibility versus Interpretability
![]()
Figure 2.7 (ISL). Representation of the tradeoff between flexibility and interpretability across statistical learning methods. In general, as flexibility increases, interpretability decreases.
Regression tree
A different regression function
\begin{align}
delivery \,\, time \approx \:&17\times\, I\left(\,order\,\,time < 13 \text{ hours } \right) + \notag \\
\:&22\times\, I\left(\,13\leq \, order\,\,time < 15 \text{ hours } \right) + \notag \\
\:&28\times\, I\left(\,order\,\,time \geq 15 \text{ hours and }distance < 6.4 \text{ kilometers }\right) + \notag \\
\:&36\times\, I\left(\,order\,\,time \geq 15 \text{ hours and }distance \geq 6.4 \text{ kilometers }\right)\notag
\end{align}
Partition of the predictors space (X_1,X_2)
Predictor X_3
![]()
Figure 2.2 in Kuhn, M and Johnson, K (2023) Applied Machine Learning for Tabular Data. https://aml4td.org/