Machine learning (ML) is the study of computer algorithms that can improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is difficult or unfeasible to develop conventional algorithms to perform the needed tasks.
A subset of machine learning is closely related to computational statistics, which focuses on making predictions using computers; but not all machine learning is statistical learning. The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning. Data mining is a related field of study, focusing on exploratory data analysis through unsupervised learning. Some implementations of machine learning use data and neural networks in a way that mimics the working of a biological brain. In its application across business problems, machine learning is also referred to as predictive analytics.
Self-learning as a machine learning paradigm was introduced in 1982 along with a neural network capable of self-learning named crossbar adaptive array (CAA).
It is learning with no external rewards and no external teacher advice. The CAA self-learning algorithm computes, in a crossbar fashion, both decisions about actions and emotions (feelings) about consequence situations. The system is driven by the interaction between cognition and emotion.
The self-learning algorithm updates a memory matrix W = ||w(a,s)|| such that in each iteration it executes the following machine learning routine: in situation s perform action a; receive the consequence situation s'; compute the emotion of being in the consequence situation, v(s'); and update the crossbar memory, w'(a,s) = w(a,s) + v(s').
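A minimal sketch of this routine in Python, assuming a small tabular environment; the step transition function, the state and action counts, and the genome vector of initial emotions are illustrative stand-ins rather than details from the source:

```python
import numpy as np

# Illustrative sketch of a CAA-style crossbar update (assumptions noted above).
n_actions, n_situations = 4, 10
rng = np.random.default_rng(0)

W = np.zeros((n_actions, n_situations))         # crossbar memory w(a, s)
genome = rng.choice([-1.0, 1.0], n_situations)  # initial emotions toward situations

def emotion(s_next):
    # v(s'): feeling toward the consequence situation (fixed genome value here).
    return genome[s_next]

def step(s, a):
    # Hypothetical deterministic environment transition; replace with a real one.
    return (s + a + 1) % n_situations

s = 0
for _ in range(100):
    a = int(np.argmax(W[:, s]))   # in situation s, decide action a from column s
    s_next = step(s, a)           # receive consequence situation s'
    v = emotion(s_next)           # compute emotion v(s')
    W[a, s] += v                  # update crossbar memory w(a, s)
    s = s_next
```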
As a scientific endeavor, machine learning grew out of the quest for artificial intelligence. In the early days of AI as an academic discipline, some researchers were
interested in having machines learn from data. They attempted to approach the problem with various symbolic methods, as well as what was then termed "neural networks"; these were mostly perceptrons and other models that were later found to be reinventions of the generalized linear models of statistics.
Probabilistic reasoning was also employed, especially in automated medical diagnosis.
However, an increasing emphasis on the logical, knowledge-based approach caused a rift between AI and machine learning. Probabilistic systems were plagued by theoretical and practical problems of data acquisition and representation. By 1980, expert systems had come to dominate AI, and statistics was out of favor.
Work on symbolic/knowledge-based learning did continue within AI, leading to inductive logic programming, but the more statistical line of research was now outside the field of AI proper, in pattern recognition and information retrieval.
Neural networks research had been abandoned by AI and computer science around the same time. This line, too, was continued outside the AI/CS field, as "connectionism", by researchers from other disciplines including Hopfield, Rumelhart and Hinton. Their main success came in the mid-1980s with the reinvention of backpropagation.
Machine learning (ML), reorganized as a separate field, started to flourish in the 1990s. The field changed its goal from achieving artificial intelligence to tackling solvable problems of a practical nature. It shifted focus away from the symbolic approaches it had inherited from AI, and toward methods and models borrowed from statistics and probability theory.
The difference between ML and AI is frequently misunderstood. ML learns and predicts based on passive observations, whereas AI implies an agent interacting with the environment to learn and take actions that maximize its chance of successfully achieving its goals.
As of 2020, many sources continue to assert that ML remains a subfield of AI. Others have the view that not all ML is part of AI, but only an 'intelligent subset' of ML should be considered AI.
Machine learning and data mining often employ the same methods and overlap significantly, but while machine learning focuses on prediction, based on known properties learned from the training data, data mining focuses on the discovery of
(previously) unknown properties in the data (this is the analysis step of knowledge discovery in databases). Data mining uses many machine learning methods, but with different goals; on the other hand, machine learning also employs data mining methods as "unsupervised learning" or as a preprocessing step to improve learner
accuracy. Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning,
performance is usually evaluated with respect to the ability to reproduce known knowledge, while in knowledge discovery and data mining (KDD) the key task is the discovery of previously unknown knowledge.
Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by other supervised methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data.
Machine learning also has intimate ties to optimization: many learning problems are formulated as minimization of some loss function on a training set of examples. Loss functions express the discrepancy between the predictions of the model being trained
and the actual problem instances (for example, in classification, one wants to assign a label to instances, and models are trained to correctly predict the pre-assigned labels of a set of examples).
The difference between optimization and machine learning arises from the goal of generalization: while optimization algorithms can minimize the loss on a training set,
machine learning is concerned with minimizing the loss on unseen samples. Characterizing the generalization of various learning algorithms is an active topic of current research, especially for deep learning algorithms.
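As an illustration of this distinction, the following sketch (with synthetic data and illustrative parameter choices, not drawn from the source) minimizes a squared-error loss on a training set by gradient descent and then evaluates the same loss on held-out samples:

```python
import numpy as np

# Optimization minimizes the training loss; machine learning cares about
# how well the fitted model does on samples it has not seen.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

def loss(w, X, y):
    return np.mean((X @ w - y) ** 2)  # mean squared error

w = np.zeros(3)
lr = 0.1
for _ in range(500):
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= lr * grad                    # optimization step: reduce training loss

print("training loss:", loss(w, X_train, y_train))
print("held-out loss:", loss(w, X_test, y_test))  # generalization estimate
```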
Machine learning and statistics are closely related fields in terms of methods, but distinct in their principal goal: statistics draws population inferences from a sample, while machine learning finds generalizable predictive patterns.
According to Michael I. Jordan, the ideas of machine learning, from methodological principles to theoretical tools, have had a long pre-history in statistics.
He also suggested the term data science as a placeholder to call the overall field.
Leo Breiman distinguished two statistical modeling paradigms: the data model and the algorithmic model,
where "algorithmic model" refers, more or less, to machine learning algorithms such as random forests.
Some statisticians have adopted methods from machine learning, leading to a combined field that they call statistical learning.
It is a system with only one input, situation s, and only one output, action (or behavior) a. There is neither a separate reinforcement input nor an advice input from the environment. The backpropagated value (secondary reinforcement) is the emotion toward the consequence situation. The CAA exists in two environments, one is the behavioral environment where it behaves, and the other is the genetic environment, wherefrom it initially and only once receives initial emotions about situations to be encountered in the behavioral environment. After receiving the genome (species) vector from the genetic environment, the CAA learns a goal-seeking behavior, in an environment that contains both desirable and undesirable situations.
Several learning algorithms aim at discovering better representations of the inputs provided during training. Classic examples include principal components analysis and cluster analysis. Feature learning algorithms, also called representation learning algorithms, often attempt to preserve the information in their input but also transform it in a way that makes it useful, often as a pre-processing step before performing classification or predictions. This technique allows reconstruction of the inputs coming from the unknown data-generating distribution, while not being necessarily faithful to configurations that are implausible under that distribution. This replaces manual feature engineering, and allows a machine to both learn the features and use them to perform a specific task.
Feature learning can be either supervised or unsupervised. In supervised feature learning, features are learned using labeled input data. Examples include artificial neural networks, multilayer perceptrons, and supervised dictionary learning. In unsupervised feature learning, features are learned with unlabeled input data. Examples include dictionary learning, independent component analysis, autoencoders, matrix factorization and various forms of clustering.
Manifold learning algorithms attempt to do so under the constraint that the learned representation is low-dimensional. Sparse coding algorithms attempt to do so under the constraint that the learned representation is sparse, meaning that the mathematical model has many zeros. Multilinear subspace learning algorithms aim to learn low-dimensional representations directly from tensor representations for multidimensional data, without reshaping them into higher-dimensional vectors. Deep learning algorithms discover multiple levels of representation, or a hierarchy of features, with higher-level, more abstract features defined in terms of (or generating) lower-level features. It has been argued that an intelligent machine is one that learns a representation that disentangles the underlying factors of variation that explain the observed data.
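As a brief example of unsupervised feature learning, the following sketch uses principal components analysis from scikit-learn to learn a low-dimensional representation of unlabeled data; the synthetic data, its dimensions, and the component count are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

# Learn a low-dimensional representation of unlabeled data, which can then
# be reused as input features for a downstream classifier or predictor.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))               # 500 unlabeled samples, 20 raw features

pca = PCA(n_components=5)
Z = pca.fit_transform(X)                      # learned 5-dimensional representation
print(Z.shape)                                # (500, 5)
print(pca.explained_variance_ratio_.sum())    # fraction of variance the features keep
```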
Feature learning is motivated by the fact that machine learning tasks such as classification often require input that is mathematically and computationally convenient to process. However, real-world data such as images, video, and sensory data has not yielded to attempts to algorithmically define specific features. An alternative is to discover such features or representations through examination, without relying on explicit algorithms.
Sparse dictionary learning is a feature learning method where a training example is represented as a linear combination of basis functions, with the combination coefficients assumed to be sparse. The method is strongly NP-hard and difficult to solve approximately. A popular heuristic method for sparse dictionary learning is the K-SVD algorithm. Sparse dictionary learning has been applied in several contexts. In classification, the problem is to determine the class to which a previously unseen example belongs. For a dictionary where each class has already been built, a new example is associated with the class that is best sparsely represented by the corresponding dictionary. Sparse dictionary learning has also been applied in image de-noising. The key idea is that a clean image patch can be sparsely represented by an image dictionary, but the noise cannot.
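The following sketch illustrates the idea using scikit-learn's DictionaryLearning (a generic solver rather than the K-SVD algorithm mentioned above); the synthetic "patches", atom count, and sparsity level are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Represent each sample as a sparse combination of learned dictionary atoms.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))                # e.g. 100 flattened 4x4 image patches

dico = DictionaryLearning(
    n_components=32,                          # number of dictionary atoms (overcomplete)
    transform_algorithm="omp",                # sparse coding via orthogonal matching pursuit
    transform_n_nonzero_coefs=3,              # at most 3 atoms per sample
    random_state=0,
)
codes = dico.fit_transform(X)                 # sparse codes, shape (100, 32)
print((codes != 0).sum(axis=1).max())         # each row uses at most 3 atoms
X_hat = codes @ dico.components_              # reconstruction from the sparse codes
```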
In data mining, anomaly detection, also known as outlier detection, is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data. Typically, the anomalous items represent an issue such as bank fraud, a structural defect, medical problems or errors in a text. Anomalies are referred to as outliers, novelties, noise, deviations and exceptions.
In particular, in the context of abuse and network intrusion detection, the interesting objects are often not rare objects, but unexpected bursts in activity. This pattern does not adhere to the common statistical definition of an outlier as a rare object, and many outlier detection methods (in particular, unsupervised algorithms) will fail on such data unless it has been aggregated appropriately. Instead, a cluster analysis algorithm may be able to detect the micro-clusters formed by these patterns.
Three broad categories of anomaly detection techniques exist. Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set under the assumption that the majority of the instances in the data set are normal, by looking for instances that seem to fit least to the remainder of the data set. Supervised anomaly detection techniques require a data set that has been labeled as "normal" and "abnormal" and involves training a classifier (the key difference to many other statistical classification problems is the inherently unbalanced nature of outlier detection). Semi-supervised anomaly detection techniques construct a model representing normal behavior from a given normal training data set and then test the likelihood of a test instance to be generated by the model.
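A short sketch of the unsupervised case, using an isolation forest from scikit-learn on synthetic data; the data and the contamination setting are illustrative assumptions, and other detectors could be substituted:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Unsupervised anomaly detection: assume most points are normal and flag the
# instances that fit the rest of the data least well.
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
outliers = rng.uniform(low=-6, high=6, size=(10, 2))
X = np.vstack([normal, outliers])

detector = IsolationForest(contamination=0.05, random_state=0)
labels = detector.fit_predict(X)     # +1 for inliers, -1 for flagged anomalies
print((labels == -1).sum(), "points flagged as anomalous")
```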
Robot learning is inspired by a multitude of machine learning methods, starting from supervised learning, reinforcement learning, and finally meta-learning (e.g. MAML).
The first three habits surround moving from dependence to independence
(self-mastery):
Proactivity is about taking responsibility for one's reactions to one's own experiences, and taking the initiative to respond positively and improve the situation. Covey discusses recognizing one's circle of influence and circle of concern, and focusing one's responses on the center of one's influence.
Covey discusses envisioning what one wants in the future (a personal mission statement) so one can work and plan towards it, and understanding how people make important life decisions. To be effective one needs to act based on principles and constantly review one's mission statements, says Covey. Covey asks: Are you—right now—who you want to be? What do you have to say about yourself? How do you want to be remembered? If habit 1 advises changing one's life to act and be proactive, habit 2 advises that "you are the programmer". Grow and stay humble, Covey says.
Covey says that all things are created twice: Before one acts, one should act in one's mind first. Before creating something, measure twice. Do not just act; think first: Is this how I want it to go, and are these the correct consequences?
Covey talks about what is important versus what is urgent. Priority should be given in the following order (in brackets are the corresponding actions from the Eisenhower matrix, which Dwight D. Eisenhower attributed to a former college president):
Quadrant I: urgent and important (do)
Quadrant II: not urgent but important (plan)
Quadrant III: urgent but not important (delegate)
Quadrant IV: not urgent and not important (eliminate)
The order is important, says Covey: after completing items in quadrant I, people should spend the majority of their time on II, but many people spend too much time in III and IV. The calls to delegate and eliminate are reminders of their relative priority.
If habit 2 advises that "you are the programmer", habit 3 advises: "write the program, become a leader". Keep personal integrity by minimizing the difference between what you say versus what you do, says Covey.
The next three habits talk about interdependence
(e.g., working with others):
Seek mutually beneficial win–win solutions or agreements in your relationships, says Covey. Valuing and respecting people by seeking a "win" for all is ultimately a better long-term resolution than if only one person in the situation had gotten their way. Thinking win–win isn't about being nice, nor is it a quick-fix technique; it is a character-based code for human interaction and collaboration, says Covey.
Use empathetic listening to genuinely understand a person, which compels them to reciprocate the listening and take an open mind to be influenced. This creates an atmosphere of caring, and positive problem-solving.
Habit 5 is expressed in the ancient Greek philosophy of three modes of persuasion: ethos (one's personal credibility), pathos (empathy with the other's feelings), and logos (the logic of one's presentation).
The order of the concepts indicates their relative importance, says Covey.
Combine the strengths of people through positive teamwork, so as to achieve goals that no one could have done alone, Covey exhorts.
The final habit is that of continuous improvement in both the personal and interpersonal spheres of influence.
Covey says that one should balance and renew one's resources, energy, and health to create a sustainable, long-term, effective lifestyle. He primarily emphasizes exercise for physical renewal, good prayer (meditation, yoga, etc.), and good reading for mental renewal. He also mentions service to society for spiritual renewal.
Covey explains the "upward spiral" model. Through conscience, along with meaningful and consistent progress, an upward spiral will result in growth, change, and constant improvement. In essence, one is always attempting to integrate and master the principles outlined in The 7 Habits at progressively higher levels at each iteration. Subsequent development on any habit will render a different experience and one will learn the principles with a deeper understanding. The upward spiral model consists of three parts: learn, commit, do. According to Covey, one must be increasingly educating the conscience in order to grow and develop on the upward spiral. The idea of renewal by education will propel one along the path of personal freedom, security, wisdom, and power, says Covey.