Gett, previously known as GetTaxi, is an Israeli-developed technology platform solely focused on corporate Ground Transportation Management (GTM). They have an application where clients can order taxis, and drivers can accept their rides (offers). At the moment, when the client clicks the Order button in the application, the matching system searches for the most relevant drivers and offers them the order. In this task, we would like to investigate some matching metrics for orders that did not completed successfully, i.e., the customer didn't end up getting a car.
Please complete the following tasks.
- Build up distribution of orders according to reasons for failure: cancellations before and after driver assignment, and reasons for order rejection. Analyse the resulting plot. Which category has the highest number of orders?
- Plot the distribution of failed orders by hours. Is there a trend that certain hours have an abnormally high proportion of one category or another? What hours are the biggest fails? How can this be explained?
- Plot the average time to cancellation with and without driver, by the hour. If there are any outliers in the data, it would be better to remove them. Can we draw any conclusions from this plot?
- Plot the distribution of average ETA by hours. How can this plot be explained?
- BONUS Hexagons. Using the h3 and folium packages, calculate how many sizes 8 hexes contain 80% of all orders from the original data sets and visualise the hexes, colouring them by the number of fails on the map.
We have two data sets: data_orders and data_offers, both being stored in a CSV format. The data_orders data set contains the following columns:
order_datetime
- time of the orderorigin_longitude
- longitude of the orderorigin_latitude
- latitude of the orderm_order_eta
- time before order arrivalorder_gk
- order numberorder_status_key
- status, an enumeration consisting of the following mapping:4
- cancelled by client,9
- cancelled by system, i.e., a reject
is_driver_assigned_key
- whether a driver has been assignedcancellation_time_in_seconds
- how many seconds passed before cancellation
The data_offers
data set is a simple map with 2 columns:
order_gk
- order number, associated with the same column from the orders data set`offer_id
- ID of an offer
- Create a Jupyter Notebook to finish tasks (and write your answer with figure for each quesion). Submit your code via a Pull Request (PR) back to the original repository.
- Export your Jupyter Notebook (.ipynb file) as a PDF (or write a new PDF file with 1-2 pages) and submit it via the Blackboard system.