C Erlang Regression

Published on 20 March 2022 at 13:36

"The best thing to do is create a lagom number of processes. Erlang comes from Sweden, and the word lagom roughly translated means not too few, not too many, just about right. Some say this summarises the Swedish character" - Joe Armstrong

The above is a nice quote, but it is about the Erlang programming language and not the man who originally discovered the formula whose Python implementation we are about to discuss. I have only recently come across the Erlang C formula, but I found it so nifty, and a bit of a revelation, that I thought it would be good to share.

Erlang C is a formula for calculating the number of staff needed in a call center. It was developed by the Danish mathematician A.K. Erlang, originally to determine the number of staff needed at a switching desk, where a caller had to call a number and ask the advisor to patch them through to the person they wanted to reach. The programming language named after him is used to build massively scalable soft real-time systems with high-availability requirements in the telecoms industry.

PS I decided not to share the full code for this. There is enough here that, if you are a programmer, you should be able to recreate it from the notes below and some Google searches. I wrote it to help someone else, and it feels dishonest to share code written for someone else with parts relevant to them. So I share this blog as something I learned, and, if you are an analyst, as an argument for why you should be interested in Erlang C.

Something great about Erlang C is that it makes explicit the relationship between the number of staff and the throughput of a work function, and lets a business estimate staff requirements.

I am still working on my AI and have found a few improvements, but I think I am into the long twilight of finding lots of small improvements, so I thought I would focus on other projects for the blog while I work on it in the background. As an update, I have got it to say some more words in testing, and I am currently trialling different architectures.

Why I like it

I have done a lot of project planning in my time, and when planning resources I have usually defaulted to calculating the number of staff needed by taking the total amount of work, working out the average throughput, and dividing one by the other.

You will notice this assumption is naïve in several ways. It assumes that the work is all available at the start. In reality, most tasks come into the queue randomly: there might be busy periods, and you might get several jobs that come in all at once. Likewise, when a task comes in, it is unknown whether a worker or resource will be free or tied up.
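To make the naive approach concrete, here is a minimal sketch of the "total work divided by throughput" calculation. The function name and the example numbers are my own assumptions for illustration and are not taken from the original code.

```python
def naive_staff_estimate(total_tasks, tasks_per_person_per_hour, hours_available):
    """Staff needed if all the work were available up front and evenly spread."""
    work_hours = total_tasks / tasks_per_person_per_hour  # person-hours of work
    return work_hours / hours_available


# 200 calls to handle in one hour, each person handling 20 calls an hour
print(naive_staff_estimate(200, 20, 1))  # 10.0
```

This says 10 people can just about clear the work, but it says nothing about how long a caller waits when everyone happens to be busy.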

What Erlang C does well is to factor in an estimate of the likelihood that a worker will be available when a call arrives at random. It can be applied to similar tasks, and a lot of work types can be analysed in this way.

This can be useful for analysts and project planners because it moves the problem of how to resource a task away from a simplified way of thinking, where we resource a task only to complete the pot of work. Instead it can be used to discuss the SLA, or Service Level Agreement: given that we can calculate the likelihood that a member of staff will be available when a random job comes in, we can estimate the number of staff needed to meet a given wait time.

Then, as analysts, we can talk about the problem of how many staff are needed to do a task as a trade-off between the number of staff and the delays caused by understaffing the call center (or similar business function).

The Formula

The formula is a bit frightening, but this is not a maths blog, so let's set about splitting it up into bits that we can analyse. In my code E=A and m=N, X=((E**m)/m!)*(m/(m-E)), and Y is the part in brackets, the sum of a series (I am not sure where on my keyboard I could find a sum of a series character). When X and Y are calculated, the full formula is X/(X+Y).

It is simpler than you think; see below for the code.
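For reference, here is the standard Erlang C formula written out in the notation of the code (A for the traffic intensity, N for the number of staff). The symbol P_W for the probability of waiting is my own label, and the split into X and Y follows the description above.

```latex
P_W = \frac{X}{X + Y}, \qquad
X = \frac{A^{N}}{N!} \cdot \frac{N}{N - A}, \qquad
Y = \sum_{i=0}^{N-1} \frac{A^{i}}{i!}
```

The formula only makes sense when N is greater than A, otherwise the queue grows without limit.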

Making It Code

The full code can be downloaded at the end of the article.

The inputs to the main function look like this. To use Erlang C you need the number of calls and the length of time, in hours, that this count covers. It also requires AHT_in_secs, the Average Handling Time, which is the average time it takes to handle a task. From the call count, the period in hours and the average handling time, the code calculates the variable A (the A in the above formula), which is the traffic intensity in erlangs. The work cannot be kept up with unless there are more than A workers, so this sets the minimum number of workers needed to do the task.

Shrinkage represents things like staff being off ill, on breaks or otherwise unavailable, and lets us factor in a safety level of extra staff as we calculate. It is 0.3 below, which means 30% of staff time is assumed lost to people being sick or on holiday. The code divides the staff needed by (1 - shrinkage), which works out at roughly 43% extra staff. This way we can factor staff being off into our calculations upfront.

def erlangs(
    no_calls=100,#number of calls
    in_a_period_of_hours=0.5,#period the calls arrive over, used to normalise to one hour
    AHT_in_secs=180,#average handling time
    Required_service_level=0.80,#target service level (fraction of calls answered within the target time)
    Target_Answer_Time=20,#target time in seconds to answer
    Max_occupancy=0.85,#target occupancy, occupancy greater than 85% risks burn out as it requires staff to always be available
    shrinkage=0.3):#people on break, holiday etc
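As a quick check on what a shrinkage of 0.3 does to a headcount, here is a small sketch. The helper name is hypothetical, and the figure of 14 staff is just an example input.

```python
def staff_with_shrinkage(staff_needed, shrinkage):
    """Scale up the headcount so staff_needed people remain after shrinkage."""
    if not 0 <= shrinkage < 1:
        raise ValueError("shrinkage must be at least 0 and below 1")
    return staff_needed / (1 - shrinkage)


# 14 people needed on the phones, 30% of staff time lost to breaks, sickness etc.
print(round(staff_with_shrinkage(14, 0.3), 1))  # 20.0
```

Dividing by (1 - shrinkage) rather than multiplying by (1 + shrinkage) is what makes 0.3 come out as roughly 43% extra staff rather than 30%.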

We repeat the calculation in a loop. Each time we work out the forecast Service Level, which represents the percentage of calls answered within the target answer time, and if it is still below the target SLA we increase the number of staff available by one. This tells us the staffing level at which we forecast to meet our targets. We start from the minimum number of staff needed just to do the work, which is calculated below.

You can see the logic here: start with a baseline, increase the staff by 1, and recalculate all the different estimates and how they will impact the business.

no_calls/=in_a_period_of_hours#normalised to calls per hour
call_minutes=no_calls*(AHT_in_secs/60)#minutes of call handling per hour
call_intensity=call_minutes/60#traffic intensity in erlangs
A=call_intensity
N=int(call_intensity)+1#smallest whole number of staff above A
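The same arithmetic can be wrapped in a small function to check the numbers by hand. The function name is my own, and the inputs are the default values from the function signature.

```python
def traffic_intensity(no_calls, period_hours, aht_secs):
    """Traffic intensity in erlangs: hours of call handling arriving per hour."""
    calls_per_hour = no_calls / period_hours
    return calls_per_hour * aht_secs / 3600


# 100 calls in half an hour, 180 seconds each
A = traffic_intensity(100, 0.5, 180)
print(A)  # 10.0
```

So the default inputs produce 10 erlangs of work, and the loop starts its search at 11 staff.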

This means that we start at N staff, and with any fewer staff the work cannot be kept up with at all. So we create some lists and fill them with zeroes for the area of the graphs where performance would be worse than this baseline.

SLA_per_person=[0]*int(N)
total_staff=[]
for i in range(int(N)):
    total_staff.append(i+1)

We will initialize variables that we will use later.

probable_wait=0
service_level=0
average_speed_of_answer=0

occupancy_over_time=[0]*int(N)
immediate_answer_over_time=[0]*int(N)
average_speed_of_answer_time=[0]*int(N)
probable_wait_over_time=[0]*int(N)
minimium_staff=[0]*int(N)

AHT_impact=[]
AHT_Service=[]

The code then runs in a while loop. It uses two helper functions that are described below. The loop is largely split into two parts: first it uses N and A to calculate the two parts of the formula, which I refer to as X and Y, and then it uses them to calculate the various estimates.

With X and Y we can calculate the probability that a random caller at a random time will have to wait. This assumes that calls arrive at random, following a Poisson process, and that handling times follow an exponential distribution, which is a special case of the Erlang distribution described on Wikipedia (Erlang distribution - Wikipedia). In short, with calls in general you get an average handling time that represents the central tendency but has some long outliers.

Ever been waiting for an answer for several hours? Well, the Erlang C formula assumes that happens, and that there are some horrendous wait times out there in the wild. This is why it models this type of task much better.

From this we use Euler's number to calculate the service level. We also calculate the average speed of answer, the immediate answer rate, the occupancy, and the staff needed with and without shrinkage (remember holiday time), and record the probable wait at each step.

import math#standard library, normally placed at the top of the file

while service_level < Required_service_level:
    #X and Y are the two parts of the Erlang C formula
    N_FACTORIAL=factorial(N)
    A_POWER_n=A**N
    X=((A_POWER_n)/N_FACTORIAL)
    X*=N/(N-A)
    Y=series(A,N)

    probable_wait=X/(Y+X)#probability that a call has to wait

    service_level=1-(probable_wait*math.exp(-((N-A)*(Target_Answer_Time/AHT_in_secs))))
    average_speed_of_answer=(probable_wait*AHT_in_secs)/(N-call_intensity)

    immediate_answer=(1-probable_wait)*100#percentage of calls answered immediately

    occupancy=(call_intensity/N)*100#percentage of staff time spent handling calls

    No_agents_required=N/(1-shrinkage)#staff needed once shrinkage is included

    Min_agents=A+1

    SLA_per_person.append(service_level*100)
    total_staff.append(No_agents_required)
    occupancy_over_time.append(occupancy)
    immediate_answer_over_time.append(immediate_answer)
    average_speed_of_answer_time.append(average_speed_of_answer)
    probable_wait_over_time.append(probable_wait)
    minimium_staff.append(N)

    N+=1#try one more member of staff on the next pass
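For anyone who wants to check their own version, here is a condensed, self-contained sketch of the same search. It is not the full code: the function names are my own, it uses math.factorial in place of the helpers, and it leaves out the lists used for graphing.

```python
import math


def erlang_c(A, N):
    """Probability that a call has to wait, for A erlangs and N staff (N > A)."""
    X = (A ** N / math.factorial(N)) * (N / (N - A))
    Y = sum(A ** i / math.factorial(i) for i in range(N))
    return X / (X + Y)


def service_level(A, N, target_secs, aht_secs):
    """Fraction of calls answered within target_secs."""
    return 1 - erlang_c(A, N) * math.exp(-(N - A) * target_secs / aht_secs)


def staff_needed(A, target_secs, aht_secs, required_sl):
    """Smallest whole number of staff that meets the required service level."""
    N = math.floor(A) + 1  # fewer staff than this cannot keep up with the work
    while service_level(A, N, target_secs, aht_secs) < required_sl:
        N += 1
    return N


# 10 erlangs of traffic, 80% of calls answered within 20 seconds, 180 second AHT
print(staff_needed(10.0, 20, 180, 0.80))  # 14
```

With the default inputs this lands on 14 staff before shrinkage, at a service level of a little under 89%.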

The first of the two helper functions is factorial, which calculates N! and the other factorials in the series.

def factorial(value):
    #multiplies 1*2*...*value, so factorial(0) returns 1
    a=1
    for i in range(int(value)):
        a=(i+1)*a
    return a

The second helper, series, calculates Y using A and N.

def series(A,N):
    #sum of A**i/i! for i from 0 to N-1, the Y part of the formula
    test=0
    for i in range(int(N)):
        FACT=factorial(i)
        POW=(A**i)
        test+=POW/FACT
    return test
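One thing to watch with the factorial and power approach is that A**N overflows a float for a large call center. A well-known alternative is to build the answer up through the Erlang B recurrence, which never forms the huge intermediate numbers. This is a swapped-in technique and not what my code does, and the function name is my own.

```python
def erlang_c_stable(A, N):
    """Erlang C wait probability via the Erlang B recurrence (requires N > A)."""
    B = 1.0  # Erlang B blocking probability with zero staff
    for n in range(1, N + 1):
        B = A * B / (n + A * B)  # blocking probability with n staff
    return N * B / (N - A * (1 - B))  # convert Erlang B to Erlang C


print(round(erlang_c_stable(10.0, 14), 3))  # 0.174

# 400 erlangs and 420 staff would overflow with A**N, but works here
print(0 < erlang_c_stable(400.0, 420) < 1)  # True
```

For small inputs the two approaches should agree, so the recurrence is only worth the swap when the numbers get big.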

Now, was that as scary as the original formula? If yes, we can make it even easier to understand with the graphs below. If no, well done, you can probably calculate Erlang C.

In Graphs

This code produces a set of lists that represent the relationships in the data, which we can then plot on graphs. To see the relationships, look below.

You can see that the relationship between the number of staff and SLA (the percentage of calls answered within the target time) is non-linear: each member of staff you add improves SLA slightly less than the one before.

The same goes for immediate answer. The inverse of immediate answer is the share of calls that have to wait, so you can estimate that too.

Conclusion

Erlang C is a useful method for call centers and similar tasks where analysts/planners need to plan the amount of staff/resources available to complete a task. The related Erlang B method can be used to estimate blocking, while Erlang C estimates probable delays.

Where the Erlang methods fall down is where a call center experiences sudden, even random, surges of callers. They assume that calls might come in bunched up, but not that they will all come in at once in huge bursts.

I find this quite interesting as a planning method, and hopefully, if you are a business analyst or are planning a project, you can steal an idea from telecoms and use it to plan your project.
