## Problem

In most retail businesses, salespeople visit stores regularly for various purposes like product promotion,
market analysis, customer relationship building, and more.
However, manually planning these visits,
considering the geographical locations, frequency of visits, and available time,
often lead to inefficient routes, wasted time, and unnecessary travel expenses.

So let's define our problem here with a clear statement...

**Objective:** Create an optimized scheduling system for a sales team
spread across the country, attending various stores with different locations, frequency,
and time duration constraints. The goal is to maximize the number of stores visited,
minimize travel time, comply with working hours limits, and recommend locations for
new hires if not all stores can be visited.

## Data Available

These are the fixed inputs and known data that define the problem:

**Store Location**- in geographical coordinates;**Salesman Starting Location**- (usually homes) in geographical coordinates;**Store Frequency and Time Duration**- for attendance;**Travel Times and Distances**- required to travel between different locations (provided by API);**Working Hours Limit**- for each salesperson per week (including travel and attendance time).

## Decision Variables

These are the variables that we can control and change to solve the problem:

**Allocation of Salesmen**- How many and which stores are assigned to each salespeople;**Travel Routes**- The travel paths considering stores order to minimize travel time;**New Salesmen Locations**- Locations to relocate or hire new salespeople if necessary.

## Constraints and Assumptions

These are the limitations and rules that the solution must adhere to:

**Working Hours Limit**- Salespeople can't exceed a fixed number of working hours per week;**Salesmen Availability**- A fixed number of salespeople with known locations;**Store Requirements**- The specific frequency and time duration needed for each store;**Travel Constraints**- Consideration of real-world travel conditions, including traffic, routes, etc.

## Objective Function

This is the goal we're trying to achieve. Here, it's a multi-objective function:

**Minimize Travel Time**- Reduce the total travel time for salespeople;**Maximize Store Attendance**- Ensure that the maximum number of stores is attended by salespeople;**Optimize Hiring Locations**- If not possible to visit all stores, find the best locations to hire new salespeople or relocate idle ones.

## Proposed Solution

It's like trying to solve a complex jigsaw puzzle.

The solution I propose harnesses the power of data science to solve this problem efficiently.

**Data Preparation** -> **Clustering** -> **
Task Aggregation
** -> **Allocation Model**

## Data Preparation

The first step in any data science project is preparing the data. We cleaned the unnecessary columns and saved the missing data. We then disentangled correlated rows and added coordinates, which were then saved into a dictionary.

## Clustering

The next step involved running a clustering algorithm, which is like grouping similar things together. In this case, we grouped stores based on their locations to minimize travel distance. The clustering was done in two layers —an initial clustering throughout the whole country and an inner clustering inside the large clusters for task aggregation later.

The ideal number of clusters was determined using the Silhouette Coefficient, a measure of how similar an object is to its own cluster compared to other clusters.

## Task Aggregation

The clusters created were then used to group tasks into daily work packages based on the proximity of locations, creating 'Task units.' The goal here was to establish tasks within a defined limit of working hours. Taking the edge Door and successively adding the closest, based on a distance matrix between all, until reaches the limit.

## Allocation Model

Finally, the task allocation model was implemented using Google's OR-Tools, specifically the Minimum Cost Flow CP-SAT Solver. These tools helped to optimize the workforce allocation based on time and frequency.

## API Endpoints

#### View Clusters Positions

**GET /api/v1/view_clusters_positions/<division>**

Figure 1: Plot of the position of each clusters centroids.

#### View Tasks Cluster

GET /api/v1/view_tasks_cluster/<division>/<n_cluster>

Figure 1: Plot of the position of each clusters centroids.

#### View Allocation Cluster

GET /api/v1/view_allocation_cluster/<division>/<n_cluster>

Figure 1: Plot of the position of each clusters centroids.

#### View Allocation Cluster with Doors

GET /api/v1/view_allocation_cluster_with_doors/<division>/<n_cluster>

Figure 1: Plot of the position of each clusters centroids.

#### View Outliers

**GET /api/v1/view_outliers/<division>/<n_cluster>**

Figure 1: Plot of the position of Doors and STEs outside of US territory.

#### View Allocation Division

GET /api/v1/view_allocation_division/<division>/<n_cluster>

Figure 1: Plot of the whole division allocation throughout the territory.

## Results

We compared the new method with traditional ones using specific metrics like travel time, costs, and task completion rate. The new method outperformed the old ones, showcasing the power of data science in solving real-world problems.

Figure 1: .

Figure 1: .

Figure 1: .

Although this approach significantly improves workforce allocation, it's not without its limitations. Future improvements might include more sophisticated clustering techniques or evolutionary optimization algorithms.

In conclusion, the combination of clustering algorithms and Google's OR-Tools presents a robust solution to the complex problem of workforce allocation in sales. I hope that this method will encourage other data scientists to tackle similar problems in various industries.

## References

For readers interested in learning more about this topic, I've compiled a list of the references and additional resources below. Happy exploring!

Google Machine Learning: Clustering Algorithms

Google OR-Tools: A Comprehensive Guide

Google OR-Tools: Assignment as Minimum Cost Flow Problem

Google OR-Tools: CP-SAT Solver