
Deep Learning (Goodfellow et al.) - Chapter 3 review

After a long gap, resuming my reviews with the third chapter!

This chapter jots down the concepts of probability and information theory needed for the rest of the book. Probability theory is introduced as a mathematical framework for representing uncertain statements, while information theory is introduced as the means of quantifying that uncertainty. For a deeper treatment, additional resources such as Jaynes (2003) are worth consulting.

Much of computer science gets by without studying uncertainty, until machines start to learn from data: both the data and the learning process are themselves uncertain. The authors identify three possible sources of uncertainty:

- Inherent stochasticity in the system (e.g., a genuinely random card shuffle).

- Incomplete observability: a deterministic system can appear stochastic when we cannot observe all the variables that drive it.

- Incomplete modelling: information a model discards through simplification resurfaces as uncertainty in its predictions.


Uncertainty is ever-present in decision making, and a simple but uncertain rule is often more practical than a complex but certain one (the book's example: "most birds fly" versus an exhaustive rule listing every exception).

Probability theory was initially developed to analyse the frequencies of events. For cases like drawing a card, if an outcome has probability p, then repeating the experiment infinitely many times would yield that outcome in a proportion p of the repetitions. In the case of a doctor diagnosing a patient, a probability of p instead expresses a degree of belief that the patient has the disease. The former is known as frequentist probability, while the latter is termed Bayesian probability. The need for probability theory is nicely summarised in one line: "Probability theory provides a set of formal rules for determining the likelihood of a proposition being true given the likelihood of other propositions."
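As a quick illustration of the frequentist view, here is a minimal sketch of my own (not from the book; the card-drawing setup and sample sizes are just illustrative assumptions) showing the empirical frequency approaching p as the number of repetitions grows:

```python
import random

# Frequentist reading: the empirical frequency of an event approaches
# its probability p as repetitions grow. Event: drawing a heart from a
# standard 52-card deck, so p = 13/52 = 0.25.
p = 13 / 52

for n in (100, 10_000, 1_000_000):
    hits = sum(random.random() < p for _ in range(n))
    print(f"n={n:>9}: empirical frequency = {hits / n:.4f} (p = {p})")
```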

A host of probability concepts is then introduced: what random variables are, and the categories of their probability distributions. The conditions for a probability mass function and a probability density function are given. Marginal probability and conditional probability are defined, and the sum rule and product rule (the chain rule of conditional probability) are listed.
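To make the sum and product rules concrete, here is a small sketch of my own (the joint-table values are invented for illustration) that recovers marginals and conditionals from a toy joint distribution:

```python
import numpy as np

# Toy joint distribution P(x, y) over two binary random variables,
# stored as a 2x2 table (rows: x, columns: y). Values are illustrative.
joint = np.array([[0.10, 0.30],
                  [0.25, 0.35]])
assert np.isclose(joint.sum(), 1.0)  # PMF condition: entries sum to 1

# Sum rule (marginalisation): P(x) = sum_y P(x, y)
p_x = joint.sum(axis=1)

# Conditional probability: P(y | x) = P(x, y) / P(x)
p_y_given_x = joint / p_x[:, None]

# Product rule (chain rule): P(x, y) = P(y | x) P(x)
assert np.allclose(p_y_given_x * p_x[:, None], joint)

print("P(x)       =", p_x)
print("P(y | x=0) =", p_y_given_x[0])
```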

Random variables are independent if their joint distribution can be represented as the product of their individual distributions. Expectation, variance and covariance are defined. Common probability distributions, and mixtures of them, are described, along with the common functions of probability theory, their properties and use cases. Bayes' rule is introduced.
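As a hedged illustration of Bayes' rule, here is a sketch of my own for the doctor-diagnosis setting mentioned earlier (all the numbers below are made up, not from the book):

```python
# Bayes' rule: P(disease | positive) =
#     P(positive | disease) * P(disease) / P(positive),
# where P(positive) comes from the sum rule over both cases.
p_disease = 0.01            # prior: 1% of patients have the disease
p_pos_given_disease = 0.95  # test sensitivity
p_pos_given_healthy = 0.05  # false-positive rate

p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")  # ~0.161
```

Note how the posterior (~16%) stays far below the test's 95% sensitivity because the prior is so small.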

Information theory is introduced as a branch of applied mathematics concerned with quantifying the information present in a signal. Learning that an unlikely event has occurred is more informative than learning that a likely event has occurred. The book lays down three properties that any quantification of information should satisfy:

- Likely events should have low information content.

- Less likely events should have higher information content.

- Independent events should have additive information content.

Self-information is defined as the metric measuring the amount of information gained from observing a single event. The uncertainty in a whole distribution is measured by the Shannon entropy, and the dissimilarity between two distributions over the same variable is measured by the Kullback-Leibler (KL) divergence.
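A small sketch of my own tying these three quantities together (using natural logarithms, i.e. nats, as the book does; the distributions are illustrative and assumed strictly positive so the logs are defined):

```python
import numpy as np

def self_information(p):
    """I(x) = -log P(x), in nats."""
    return -np.log(p)

def entropy(p):
    """Shannon entropy H(P) = E[I(x)] = -sum_x P(x) log P(x)."""
    p = np.asarray(p)
    return -np.sum(p * np.log(p))

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) log(P(x) / Q(x)); asymmetric, >= 0."""
    p, q = np.asarray(p), np.asarray(q)
    return np.sum(p * np.log(p / q))

# An unlikely event carries more information than a likely one:
print(self_information(0.01), ">", self_information(0.99))

# A uniform distribution has higher entropy than a peaked one:
print(entropy([0.25] * 4), ">", entropy([0.7, 0.1, 0.1, 0.1]))

# KL divergence is zero iff the two distributions are identical:
p = [0.4, 0.6]
print(kl_divergence(p, p), kl_divergence(p, [0.5, 0.5]))
```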

The last pages of the chapter give a good overview of structured probabilistic models, which will come in handy later, since ML algorithms involve probability distributions over very large numbers of random variables. Using a single function to represent the whole joint distribution would make it unwieldy; instead it can be represented as a product of many factors. A schematic (a graph) representing such a factorization is referred to as a structured probabilistic model, or graphical model.
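A minimal sketch of my own of such a factorization for a three-variable chain a → b → c (all table values are invented; the point is that the factors are much smaller than the full joint):

```python
import numpy as np

# Chain-structured factorisation p(a, b, c) = p(a) p(b|a) p(c|b)
# over three binary variables. A full joint over n binary variables
# needs 2**n entries; the factored form needs far fewer tables.
p_a = np.array([0.6, 0.4])            # p(a)
p_b_given_a = np.array([[0.7, 0.3],   # p(b | a=0)
                        [0.2, 0.8]])  # p(b | a=1)
p_c_given_b = np.array([[0.9, 0.1],   # p(c | b=0)
                        [0.4, 0.6]])  # p(c | b=1)

# Reassemble the full joint from the factors via the product rule:
joint = (p_a[:, None, None]
         * p_b_given_a[:, :, None]
         * p_c_given_b[None, :, :])
assert np.isclose(joint.sum(), 1.0)
print("p(a=1, b=0, c=1) =", joint[1, 0, 1])
```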

And this wraps up the review of the chapter. It covered a good amount of probability theory, a small yet focused text on information theory, and an introduction to structured probabilistic models. The next review, on the 4th chapter, "Numerical Computation", will be out next week.
