## Machine Learning Algorithms

**Introduction**

Advances in science and technology keep making daily life more comfortable. Machine learning, an integral part of artificial intelligence, has grown rapidly and now plays a notable role in users' lives.

For instance, virtual personal assistants that play a music track or set an alarm, face detection, and voice recognition applications are all examples of machine learning systems that we encounter every day.

### What is Machine Learning?

Machine learning, a subset of artificial intelligence, is the ability of a system to learn from data and to predict a user's needs or perform an expected task without human intervention. The inputs for these predictions come from the user's previously performed tasks or from related examples.

### Types of Machine Learning Algorithms

Machine learning algorithms can be broadly classified as:

- Supervised
- Unsupervised

A brief description of each type is given below:

#### Supervised Machine Learning Algorithms

In this method, a model is trained on an existing set of inputs and their known outputs, and then predicts the output for a new input. In other words, the system learns from previously given examples.

A data scientist trains the system by identifying the features and variables it should analyze. After training, the model compares its predictions with the known results and updates its parameters to improve its predictions.

For example, given a basket full of fruits and earlier specifications such as color, shape, and size, the model will be able to classify the fruits.

There are two techniques in supervised machine learning, and the choice between them depends on the type of data the model has to work on.

#### Techniques used in Supervised learning

##### Regression Technique

- This technique is used to predict a numeric or continuous value based on the relationship between variables in the dataset.

- One example is estimating the price of a house a year from now, based on its current price, total area, locality, and number of bedrooms.

- Another example is predicting the room temperature in the coming hours, based on the volume of the room and the current temperature.
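The regression idea can be sketched with ordinary least squares on a single feature. This is only an illustrative sketch: the function name `fit_line` and the area/price figures are made up for the example, and a real house-price model would use several features.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: area in square metres -> price in thousands.
areas = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

slope, intercept = fit_line(areas, prices)

# Predict a continuous value for an unseen input (a 90 square metre house).
predicted = slope * 90 + intercept
print(predicted)
```

Because the made-up prices lie exactly on a line, the prediction for 90 square metres comes out at about 270. With noisy real data the line would only approximate the points.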

##### Classification Technique

- This technique is used when the input data can be assigned to categories based on patterns or labels.

- Examples include email classification, such as recognizing spam, and face detection, both of which use patterns to predict the output.

In summary, use regression when the value to be predicted is a quantity, and classification when the value to be predicted is a label.

### Algorithms that use Supervised Learning

Some of the machine learning algorithms that use the supervised learning method are:

- Linear Regression
- Logistic Regression
- Random Forest
- Gradient Boosted Trees
- Support Vector Machines (SVM)
- Neural Networks
- Decision Trees
- Naive Bayes

#### Unsupervised Machine Learning Algorithms

This method does not involve training the model on labeled data, i.e. there is no “teacher” or “supervisor” to provide the model with previous examples.

The system is not given any set of inputs with corresponding outputs. Instead, the model learns structure in the data from its own observations.

For example, consider again a basket of fruits, this time without labels or specifications. The model can only learn to organize them by comparing color, size, and shape.

### Techniques used in unsupervised learning

The techniques used in unsupervised learning are as follows:

#### Clustering

- Clustering is the method of dividing or grouping the data in a dataset based on similarities.

- The data is explored to form groups or subsets based on meaningful separations.

- Clustering is used to determine the intrinsic grouping among unlabeled data.

- One example is digital image processing, where clustering divides an image into distinct regions and helps identify borders and objects.

#### Dimensionality Reduction

- In a given dataset, there can be multiple conditions on which the data has to be segmented or classified.

- These conditions are the features of each data element, and they may not be unique.

- If a dataset has too many such features, segregating the data becomes a complex process.

- In such cases, dimensionality reduction can be used. It aims to reduce the number of variables or features in the dataset without losing important information.

- This is done through feature selection or feature extraction.

- Email classification is a good example of where this technique is used.

#### Anomaly Detection

- Anomaly detection is also known as outlier detection.

- In a given dataset, it identifies suspicious data, i.e. data that differs from the majority of the input, or rare events or items among the available data.

- Example uses include identifying structural defects, errors in text, and medical problems.
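One simple way to flag data that "differs from the majority" is a z-score rule: mark any value that lies more than a chosen number of standard deviations from the mean. The sketch below assumes that rule; the function name `find_outliers`, the threshold of 2.0, and the readings are hypothetical, and many other anomaly detection methods exist.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings with one suspicious value.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = find_outliers(readings)
print(outliers)  # [50]
```

Note that a large outlier also inflates the mean and standard deviation, so this rule can miss anomalies in small samples. Median-based variants are a common alternative.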

**Neural Networks**

- A neural network is a framework in which many different machine learning algorithms work together to process complex data inputs.
- It can be thought of as a “complex function” that produces an output for a given input.
- A neural network consists of three parts that are needed to construct the model:
  - Units or neurons
  - Connections or parameters (weights)
  - Biases

Neural networks are used in a wide range of applications such as coastal engineering, hydrology, and medicine, where they help identify certain types of cancer.
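The three parts listed above (units, connections, and biases) can be seen in a tiny forward pass. This is a minimal sketch with hand-picked numbers, not a trained model. The ReLU activation and the names `neuron` and `layer` are assumptions made for illustration.

```python
def relu(z):
    """Activation function: pass positive values through, clamp negatives to 0."""
    return max(0.0, z)


def neuron(inputs, weights, bias):
    """One unit: weighted sum over its connections, plus a bias, then activation."""
    return relu(sum(x * w for x, w in zip(inputs, weights)) + bias)


def layer(inputs, weight_rows, biases):
    """A layer is several units that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Two inputs -> two hidden units -> one output unit (weights are made up).
hidden = layer([1.0, 2.0], [[0.5, 0.5], [-1.0, 0.0]], [0.0, 0.5])
output = neuron(hidden, [2.0, 1.0], -1.0)
print(hidden, output)  # [1.5, 0.0] 2.0
```

Training a network means adjusting the weights and biases so that the outputs match known examples. That step is omitted here.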

### Algorithms that use unsupervised learning

Some of the most common algorithms in unsupervised learning are:

- Hierarchical clustering
- k-means
- Mixture models
- DBSCAN
- OPTICS algorithm
- Autoencoders
- Deep Belief Nets
- Hebbian Learning
- Generative Adversarial Networks
- Self-organizing map

### Machine Learning Algorithms – Explained

So far we have covered the basics of machine learning algorithms. This section discusses several of them in more detail, with examples:

#### Decision Trees

- This algorithm is an example of supervised learning.
- A decision tree is a graphical representation that depicts every possible outcome of a decision.
- Its elements are nodes, branches, and leaves: a node represents an attribute, a branch represents a decision, and a leaf represents the outcome after applying that decision.
- A decision tree is an analogy for how a human reaches a decision through yes/no questions.
- For example, consider a decision tree for a school admission rule. Age is checked first: if the age is < 5, admission is not given. For children who are eligible for admission, the annual income of the parents is checked: if it is < 3 L p.a., the student is further eligible for a concession on the fees.
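The school admission rule can be written as nested yes/no checks, which is all a decision tree is once it has been built. In this sketch the function name and the returned labels are hypothetical, and income is assumed to be expressed in lakh per year.

```python
def admission_decision(age, annual_income_lakh):
    """Walk the admission tree from the root to a leaf."""
    # Root node: check the 'age' attribute first.
    if age < 5:
        return "no admission"  # leaf
    # Second node: check the parents' annual income (in lakh per year).
    if annual_income_lakh < 3:
        return "admission with fee concession"  # leaf
    return "admission"  # leaf


print(admission_decision(4, 2.0))  # no admission
print(admission_decision(6, 2.5))  # admission with fee concession
print(admission_decision(6, 8.0))  # admission
```

A learning algorithm would derive such thresholds from training data rather than having them written by hand.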

### Naive Bayes Classification

- This supervised machine learning algorithm is a powerful and fast classifier that uses Bayes' rule to determine conditional probabilities and predict results.
- Its popular uses include face recognition, spam filtering, predicting user input in chat from previously communicated text, and labeling news articles as sports, politics, etc.
- Bayes' rule: Bayes' theorem defines a rule for determining the probability that an “Event” has occurred when information about “Tests” is provided.
- For example, the “Event” can be a patient having heart disease, while the “Tests” are the positive test results associated with that event.
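Bayes' rule states that P(Event | Test) = P(Test | Event) × P(Event) / P(Test). The sketch below applies it to the heart disease example. All the probabilities are made-up numbers chosen for illustration, and the function and parameter names are hypothetical.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(event | positive test) via Bayes' rule.

    prior               -- P(event), e.g. share of patients with the disease
    sensitivity         -- P(positive test | event)
    false_positive_rate -- P(positive test | no event)
    """
    # Total probability of a positive test, with or without the event.
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Hypothetical figures: 1% prevalence, 90% sensitivity, 5% false positives.
p = posterior(prior=0.01, sensitivity=0.9, false_positive_rate=0.05)
print(round(p, 3))  # 0.154
```

Even with a fairly accurate test, the probability of disease after a positive result stays low here because the event itself is rare. A Naive Bayes classifier applies the same rule to many features at once, treating them as independent.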

### The Autoencoder

- The autoencoder falls under unsupervised learning and uses neural networks.
- An autoencoder is intended to learn an encoding (representation) of a given dataset.
- This involves dimensionality reduction, which trains the network to ignore “noise” in the signal.
- Alongside the reduction, it also learns reconstruction: the model tries to rebuild, from the reduced encoding, a representation that is as close as possible to the original input.
- In this way, an autoencoder ignores unnecessary noise and rebuilds the output without losing the important information in the input.
- A common use of autoencoders is converting black-and-white images to color, where coloring is based on the content and objects in the image (such as grass, water, sky, face, or dress).

### Self-Organizing Map

- The self-organizing map (SOM) falls under unsupervised learning.
- It is a data visualization technique that operates on high-dimensional data.
- It reduces the dimensions of the data to a map and groups similar data together, so that similarities among the data become visible.
- SOM clusters data without knowing the class memberships of the input, with several units competing for the current object.
- The self-organizing map is a two-dimensional array of neurons: M = {m1, m2, ..., mn}
- In short, SOM reduces complex, nonlinear statistical relationships to a low-dimensional visualization.
- SOMs have had a strong impact on the modeling of topological maps.

### Hierarchical clustering

- Hierarchical clustering constructs a tree from a given dataset. The generated tree is called a “dendrogram”.
- A dendrogram is a tree depicting a hierarchy of clusters, built from similarity matrices of the input data points.

The methods used to construct hierarchical clusters are:

- Agglomerative clustering
- Divisive clustering

#### Agglomerative Clustering:

This method uses the bottom-up technique.

- Here, each data point begins in its own cluster, and a similarity matrix is constructed between the clusters.
- The most similar clusters are identified from this matrix and merged using a greedy method, which forms a hierarchical tree.
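The bottom-up procedure can be sketched in a few lines for one-dimensional points. This sketch assumes single linkage (the distance between two clusters is the distance between their closest members) and stops at a requested number of clusters instead of building the full tree. The function name and sample points are hypothetical.

```python
def agglomerative(points, target_clusters):
    """Greedy bottom-up clustering of 1-D points with single linkage."""
    # Every point starts in its own cluster.
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > target_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair (the greedy step).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


groups = agglomerative([1.0, 1.5, 5.0, 5.2, 9.0], 3)
print(groups)  # [[1.0, 1.5], [5.0, 5.2], [9.0]]
```

Recording the order and distance of each merge, rather than stopping early, would give the information needed to draw the tree of merges.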

#### Divisive clustering:

- In contrast to agglomerative clustering, this method uses the top-down technique.
- Here, all the data points in the input begin in a single cluster.
- Next, a clustering algorithm such as DBSCAN, k-means, or Gaussian mixture models is used to divide the cluster into two clusters.
- This division is repeated iteratively until the desired number of clusters is reached.
- Both of these approaches rely on constructing a similarity matrix between all of the data points.

### OPTICS Algorithm

- OPTICS stands for “ordering points to identify the clustering structure”.
- In principle, OPTICS works like an extended DBSCAN algorithm for an infinite number of distance parameters that are smaller than a generating distance.
- Rather than a single clustering, OPTICS outputs a linear ordering of all objects under analysis that represents their density-based clustering structure across a wide range of parameter settings.

Today many tasks are performed automatically, and the definition of manual work is gradually changing. Machine learning, a subdivision of artificial intelligence, is the process that helps computers perform their designated functions efficiently. It can help computers play chess, assist in surgeries, and adapt to your personal requirements. If you are new to machine learning, it is important to understand the basic algorithms that data scientists use.

### Top 10 Machine Learning Algorithms

- Linear Regression
- Logistic Regression
- Decision tree
- Support Vector Machine
- Naive Bayes
- KNN (K-Nearest neighbor)
- K-Means
- Random Forest
- Dimensionality reduction algorithms
- Gradient boosting and AdaBoost

#### Algorithm 1: Linear Regression

Linear Regression – This algorithm models the relationship between input variables and a continuous output by fitting a straight line to the data. An intuitive picture is arranging items in increasing order of weight by eye: you estimate each weight from what you can see, such as size, and reach a conclusion from that relationship.

#### Algorithm 2: Logistic Regression

Logistic Regression – In this method, a discrete value (such as 0/1 or yes/no) is estimated from a set of predictor variables. It predicts the probability of an occurrence by fitting the data to a logistic function, the inverse of the logit. Ways to improve a logistic regression model include:

- Regularization
- Using a non-linear model
- Adding interaction terms
- Eliminating features
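The prediction step of logistic regression can be sketched as follows: a weighted sum of the features is passed through the logistic function to give a probability, which is then thresholded into a discrete label. The weights and bias here are hand-picked rather than learned, and the function names are hypothetical.

```python
import math


def predict_probability(features, weights, bias):
    """Probability of the positive class for one example."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Logistic function: maps any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(features, weights, bias, threshold=0.5):
    """Turn the probability into a discrete 0/1 prediction."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0


# Hand-picked parameters for a single feature (not learned from data).
weights, bias = [1.0], -1.0
print(predict_label([3.0], weights, bias))  # 1
print(predict_label([0.0], weights, bias))  # 0
```

Fitting the model means choosing the weights and bias that make the predicted probabilities agree with the training labels, typically by maximizing the likelihood.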

#### Algorithm 3: Decision Tree

Decision Tree – This algorithm is popular among data scientists for identifying and differentiating between diverse issues. As a decision-support tool, it uses a tree-like graph of decisions and their possible consequences, including event outcomes, resource costs, and utilities. The population is split into two or more homogeneous sets based on the independent variables.

#### Algorithm 4: Support Vector Machine

Support Vector Machine – This method classifies raw data by plotting it as points in an n-dimensional space, where n is the number of features. The value of each feature is tied to a particular coordinate, which makes it easier to separate the different sets of data.

#### Algorithm 5: Naive Bayes

Naive Bayes – This algorithm differs from the others in that it assumes each feature is independent of the others. Even if two features are in fact related, the classifier treats each one as contributing individually to the outcome.

#### Algorithm 6: KNN (K-Nearest neighbor)

KNN (K-Nearest neighbor) – This algorithm can be applied to both classification and regression problems, although in data science it is more often used for classification. It is a simple algorithm that stores all available cases and classifies a new case by a majority vote of its k nearest neighbours.
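The majority vote can be sketched directly: sort the stored cases by distance to the new case, keep the k closest, and return the most common label among them. Euclidean distance and the fruit data below are assumptions made for the example, and `knn_classify` is a hypothetical name.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training cases.

    `training` is a list of (feature_tuple, label) pairs.
    """
    # Sort all stored cases by Euclidean distance to the query.
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest cases and return the winner.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up fruit measurements: (weight score, colour score) -> label.
fruits = [
    ((1.0, 1.0), "apple"),
    ((1.2, 0.8), "apple"),
    ((0.9, 1.1), "apple"),
    ((5.0, 5.0), "orange"),
    ((5.5, 4.5), "orange"),
]

print(knn_classify(fruits, (1.1, 1.0)))  # apple
print(knn_classify(fruits, (5.2, 4.9)))  # orange
```

There is no training step beyond storing the cases, so the cost of the method is paid at prediction time.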

#### Algorithm 7: K-Means

K-Means – This algorithm is best suited to clustering problems. The data set is partitioned into clusters such that the data points within a cluster are homogeneous, and heterogeneous with respect to the data in other clusters.
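A minimal sketch of the k-means loop for one-dimensional data follows: assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat. The starting centroids are passed in explicitly to keep the example deterministic, whereas real implementations usually pick them at random. The function name and data are hypothetical.

```python
def k_means_1d(points, centroids, iterations=10):
    """Run a fixed number of k-means iterations on 1-D points."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


centers, groups = k_means_1d([1, 2, 3, 10, 11, 12], centroids=[0, 5])
print(centers)  # [2.0, 11.0]
print(groups)   # [[1, 2, 3], [10, 11, 12]]
```

Points within each resulting group are close to one another and far from the other group, which is the homogeneity property described above.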

#### Algorithm 8: Random Forest

Random Forest – This is a popular machine learning algorithm that classifies a new object based on its attributes. A collection of decision trees is called a random forest. Each individual tree produces a classification, i.e. the tree "votes" for that class, and the forest chooses the classification with the most votes.

#### Algorithm 9: Dimensionality Reduction Algorithms

Dimensionality reduction algorithms – Organizations such as government bodies and research institutions store and analyze large amounts of data. Dimensionality reduction algorithms such as factor analysis, decision tree, missing value ratio, and random forest help identify the important and relevant details in such data.

#### Algorithm 10: Gradient Boosting and AdaBoost

Gradient boosting and AdaBoost – These algorithms are used when large amounts of data have to be handled and predictions must be highly accurate. Boosting is an ensemble technique that combines the predictive power of several base estimators. In other words, it combines multiple weak or average predictors to build a strong one.

These are just the basics of machine learning algorithms. If you want to build a career in one of the diverse areas of machine learning, such as data mining or data labeling, it is important to dig deeper and gain a better understanding of the machine learning tools.

### Where do we stand in Machine Learning?

Machine learning and artificial intelligence have become intertwined with human life. As technologies evolve, AI and ML have made their presence felt in many areas. The following are a few examples:

- Financial Services – In identifying financial fraud and good options for investment
- Marketing and Sales – In making personalized product recommendations
- Healthcare – In monitoring health conditions, heartbeat, and blood pressure, and in identifying certain types of cancer
- Software Applications – Face detection, voice recognition, and personal virtual assistants
- Oil and Gas – In analyzing underground minerals

#### Future of Machine Learning

- ML has already made a lasting mark on our lives. With further advances in various fields, ML will be an integral part of all AI systems.
- ML algorithms will increasingly learn continuously from information that is updated day to day.
- Digital marketing, education, healthcare and support, industry, science and research, earth exploration, and consumer appliance design: machine learning has immense potential to prove its mettle in all of these fields in the near future.