Computational Complexity of ML algorithms

Prashant · Analytics Vidhya · Feb 4, 2021

Time and space complexity play a very important role when selecting a machine learning algorithm.

Space complexity: the space complexity of an algorithm is the total amount of memory it needs to do its work, as a function of the input size. In simple words, the space it requires to complete the task.

Time complexity: the time complexity of an algorithm is the number of operations it performs to complete its task, as a function of the input size. In simple words, the time it requires to complete the task.
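
As a quick illustration (a hypothetical example, not from the original post), the function below sums n numbers: the work grows linearly with the input, O(n) time, while only one accumulator is kept in memory, O(1) extra space.

```python
def total(values):
    """Sum a list of n numbers.

    Time complexity:  O(n) -- one addition per element.
    Space complexity: O(1) -- a single accumulator, regardless of n.
    """
    acc = 0
    for v in values:        # n iterations
        acc += v
    return acc

print(total([3, 1, 4, 1, 5]))  # 14
```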

In this blog we will look at the train-time and run-time complexity of common ML algorithms.

k-nearest neighbors (KNN)

Assumption: the KNN algorithm assumes that similar things exist in close proximity. In other words, similar things are near to each other.

“BIRDS OF A FEATHER FLOCK TOGETHER”

KNN has no explicit training phase, so it does not learn a discriminative function during training; it simply stores the data (all the distance computation and sorting happens at query time). This is also why KNN is called a lazy learner. In the test stage, given a query point (xq), it computes the distance from xq to every point in the dataset, selects the k nearest neighbours, and assigns the class label by majority vote.

(Figure: KNN illustration)
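
Below is a minimal NumPy sketch of that test-time step (illustrative code with hypothetical names, Euclidean distance assumed): compute the distance from the query point to all n training points, keep the k nearest, and take a majority vote.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_q, k=3):
    """Brute-force k-NN query: O(n*d) distance computations plus neighbour selection."""
    dists = np.linalg.norm(X_train - x_q, axis=1)   # distance to all n points
    nearest = np.argsort(dists)[:k]                  # indices of the k nearest neighbours
    votes = Counter(y_train[nearest])                # majority vote among their labels
    return votes.most_common(1)[0][0]

# toy usage
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 9.0], [9.0, 8.5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> 0
```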

KNN has two approaches:

1] Brute-force KNN (regular KNN)

2] kd-tree KNN

Here,

n = number of data points in the dataset

d = number of dimensions

k = number of nearest neighbours

1] Brute-force KNN:

Train time complexity: O(1) (there is no real training step; the data is simply stored)

Train space complexity: O(nd)

Test/Run time complexity: O(knd)

Test/Run space complexity: O(nd)

2] kd-tree KNN:

Train time complexity: O(n d log n)

Train space complexity: O(nd)

Test/Run time complexity: O(k d log n)

Test/Run space complexity: O(nd)
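
As a usage sketch (assuming scikit-learn is available; the dataset here is synthetic), the same classifier can be switched between the two approaches via the algorithm parameter: the kd-tree pays the O(n d log n) build cost once at fit time so that each query no longer has to scan all n points.

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# brute force: fitting just stores the data, every query scans all n points
brute = KNeighborsClassifier(n_neighbors=5, algorithm="brute").fit(X, y)

# kd-tree: fitting builds the tree, queries descend it instead of scanning everything
kdtree = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree").fit(X, y)

print(brute.predict(X[:3]), kdtree.predict(X[:3]))  # same predictions, different cost profile
```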

Naive Bayes:

Assumption: the features are conditionally independent of each other, given the class.

Suppose we have c classes c1, c2, c3, … and n data points x1, x2, x3, …, xn.

In the training phase we compute all the feature likelihoods and the class prior probabilities.

In the test phase, given a query point xq, we assign xq to whichever class has the maximum posterior probability for it.

n = number of data points in the dataset

d = number of dimensions

c = number of classes

Train time complexity: O(n d c)

Train space complexity: O(d c)

Test/Run time complexity: O(d c)

Test/Run space complexity: O(d c)
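
A minimal Gaussian Naive Bayes sketch (an illustrative reconstruction, not the post's own code): training estimates one prior per class and one mean/variance per class per feature, which is exactly the O(d c) worth of parameters stored, and scoring a query point against every class costs O(d c).

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """One pass over the data: class priors plus per-class feature means/variances (O(d*c) parameters)."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means  = np.array([X[y == c].mean(axis=0) for c in classes])
    vars_  = np.array([X[y == c].var(axis=0) + 1e-9 for c in classes])  # epsilon for numerical stability
    return classes, priors, means, vars_

def predict_gaussian_nb(x_q, classes, priors, means, vars_):
    """Score x_q against every class (O(d*c)) and return the argmax."""
    log_post = (np.log(priors)
                - 0.5 * np.sum(np.log(2 * np.pi * vars_), axis=1)
                - 0.5 * np.sum((x_q - means) ** 2 / vars_, axis=1))
    return classes[np.argmax(log_post)]

X = np.array([[1.0, 2.1], [0.9, 1.8], [4.0, 4.2], [4.2, 3.9]])
y = np.array([0, 0, 1, 1])
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(np.array([1.0, 2.0]), *model))  # -> 0
```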

Logistic regression:

Assumption: the classes are almost or perfectly linearly separable.

In logistic regression we use the sigmoid function:

f(x) = 1 / (1 + e^(-x))

where f(x) is the sigmoid function. Its values lie between 0 and 1. We use the sigmoid function because it is not very prone to outliers (large inputs get squashed) and it has a very nice probabilistic interpretation.

Logistic regression learns a discriminative function in the training stage: we learn a weight associated with each feature. In the test stage, given a query point xq, the prediction is f(w·xq + b), i.e. a single dot product over the d features passed through the sigmoid.

Train time complexity: O(nd)

Train space complexity: O(nd)

Test/Run time complexity: O(d)

Test/Run space complexity: O(d)
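
A sketch of that test-time step (assuming the weights w and bias b have already been learned, e.g. by gradient descent; the values below are purely illustrative): one dot product over the d features gives the O(d) prediction cost, and only w needs to be kept around, hence O(d) space.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # squashes any real number into (0, 1)

def predict_proba(x_q, w, b):
    """P(y = 1 | x_q): one dot product over d features -> O(d) time, O(d) stored weights."""
    return sigmoid(np.dot(w, x_q) + b)

w = np.array([0.8, -1.2, 0.3])           # learned weights (illustrative values)
b = 0.1
x_q = np.array([1.0, 0.5, 2.0])
print(predict_proba(x_q, w, b))          # probability of the positive class
```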

Linear regression:

Assumption: there is a linear relationship between the input variables and the output variable.

Here too, the model learns a discriminative function in the training stage: we learn a weight associated with each feature, and the prediction for a query point is the dot product of the query point with those weights.

Train time complexity: O(nd)

Train space complexity: O(nd)

Test/Run time complexity: O(d)

Test/Run space complexity: O(d)
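
A sketch of where the O(nd) training cost comes from (illustrative code; the post does not say which solver is assumed, so this uses plain full-batch gradient descent): every pass over the data touches all n points and all d features, while prediction is again just an O(d) dot product.

```python
import numpy as np

def gd_epoch(X, y, w, b, lr=0.1):
    """One full-batch gradient step for least squares: O(n*d) per pass over the data."""
    y_hat = X @ w + b                     # n predictions, O(n*d)
    err = y_hat - y
    w -= lr * (X.T @ err) / len(y)        # gradient w.r.t. the d weights, O(n*d)
    b -= lr * err.mean()
    return w, b

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.3
w, b = np.zeros(3), 0.0
for _ in range(500):
    w, b = gd_epoch(X, y, w, b)
print(w, b)                               # approaches [2, -1, 0.5] and 0.3
```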

Support vector machine (SVM):

Assumption: the classes are almost or perfectly linearly separable.

Here we use the idea of a margin-maximizing hyperplane. We learn an alpha (α) value for each data point, and the data points with α > 0 become the support vectors. Only the support vectors matter for the final model, which is why it is called a support vector machine.

If the data points are not linearly separable we use the kernel trick, which acts like a feature transformation: the data points are implicitly mapped to a higher-dimensional space where they can be separated linearly.

k = number of support vectors

At training time we find the support vectors and their corresponding alpha values. At test time, given a query point xq, we evaluate the decision function using only the support vectors.

Train time complexity: O(n²d²)

Train space complexity: O(nd)

Test/Run time complexity: O(kd)

Test/Run space complexity: O(kd)
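
A sketch of that test-time computation (an RBF kernel is assumed here as an example of the kernel trick; all names and values are illustrative): the decision function is a weighted sum over the k support vectors only, each term costing one d-dimensional kernel evaluation, hence O(kd) per query.

```python
import numpy as np

def rbf_kernel(x1, x2, gamma=0.5):
    """RBF kernel: an implicit feature transformation, O(d) per evaluation."""
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

def svm_decision(x_q, support_vectors, alphas, labels, b=0.0):
    """f(xq) = sum_i alpha_i * y_i * K(sv_i, xq) + b  -> O(k*d) over the k support vectors."""
    s = sum(a * y * rbf_kernel(sv, x_q) for a, y, sv in zip(alphas, labels, support_vectors))
    return np.sign(s + b)

# illustrative support vectors (the points with alpha > 0 found during training)
support_vectors = np.array([[1.0, 1.0], [3.0, 3.0]])
alphas = np.array([0.7, 0.7])
labels = np.array([-1, 1])
print(svm_decision(np.array([2.8, 2.9]), support_vectors, alphas, labels))  # -> 1.0
```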

Decision tree:

For a decision tree, the training stage builds the tree: at each node we split on the feature with the highest information gain, i.e. the largest reduction in entropy (or Gini impurity).

Train time complexity: O(n log n · d)

Train space complexity: O(number of nodes)

Test/Run time complexity: O(log n)

Test/Run space complexity: O(number of nodes)

For a reasonably balanced binary tree with n leaves, depth of tree ≈ log n and number of nodes = 2n − 1. For example, n = 8 leaves gives depth 3 and 15 nodes, and n = 4 leaves gives depth 2 and 7 nodes.
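
A quick scikit-learn check of those quantities (illustrative usage on a synthetic dataset): after fitting, the tree exposes its depth and node count, which are what drive the roughly O(log n) prediction cost and the O(number of nodes) storage.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=256, n_features=5, random_state=0)
tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)

print("depth:", tree.get_depth())        # prediction walks one root-to-leaf path
print("nodes:", tree.tree_.node_count)   # what has to be kept in memory
```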

Ensemble models:

In machine learning, an ensemble means combining multiple models into a single, more powerful model.

1] Bagging: also called bootstrap aggregation

e.g. Random Forest

· Random forest:

In a random forest we want the individual decision trees to have low bias and high variance, which means we want the trees to overfit,

i.e. decision trees of full or high depth, which will have high variance.

k = number of trees

Train time complexity: O(k · n log n · d)

Train space complexity: O(#nodes · k)

Test/Run time complexity: O(k log n)

Test/Run space complexity: O(#nodes · k)
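
An illustrative scikit-learn usage (synthetic data, not from the original post): k deep, high-variance trees are grown on bootstrapped samples, and because each tree is built independently the training can be spread across cores.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# k = n_estimators fully grown (high-variance) trees; n_jobs=-1 builds them in parallel
forest = RandomForestClassifier(n_estimators=100, max_depth=None, n_jobs=-1,
                                random_state=0).fit(X, y)
print(forest.predict(X[:3]))   # each query descends all k trees: O(k * depth)
```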

2] Boosting:

e.g. Gradient Boosting, AdaBoost, XGBoost

· Gradient boosted decision tree (GBDT):

In GBDT we want the individual base models to have high bias and low variance, which means we want each decision tree to underfit,

i.e. decision trees of shallow depth.

M = number of trees

γm = output value of each leaf in the decision trees

Train time complexity: O(M · n log n · d)

Train space complexity: O(#nodes · M + γm)

Test/Run time complexity: O(M log n)

Test/Run space complexity: O(#nodes · M + γm)

In general GBDT takes more time to train than a random forest, because in GBDT each new tree is built on the error (residual) of the previous trees, so the tree building cannot be parallelised the way it can in a random forest.
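
An illustrative scikit-learn usage (synthetic data again): M shallow, high-bias trees are fitted one after another, each on the pseudo-residuals of the trees before it, which is why there is no per-tree parallelism at fit time here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# M = n_estimators shallow trees, fitted sequentially on the previous trees' residuals
gbdt = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                  learning_rate=0.1, random_state=0).fit(X, y)
print(gbdt.predict(X[:3]))     # prediction sums over all M trees: O(M * depth)
```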
