
Deep Learning (Goodfellow et al.) - Chapter 3 review

After a long gap, I am back with my review of the third chapter!

This chapter covers the concepts of probability and information theory needed for the rest of the book. Probability theory is introduced as a mathematical framework for representing uncertain statements, while information theory quantifies the amount of uncertainty in those statements. For a deeper understanding, additional resources such as Jaynes (2003) are worth consulting.

Much of computer science does not require the study of uncertainty, but it becomes essential once machines start to learn from data, since both the data and the learning process are uncertain. The authors describe three possible sources of uncertainty:

- Inherent stochasticity in the system.

- Incomplete observability.

- Incomplete modeling.


There is always scope for uncertainty in decision making. A simple but uncertain rule is often more practical than a complex but certain one.

Probability theory was initially developed to analyse the frequencies of events. For cases like drawing a card, if an outcome has probability p, then repeating the experiment infinitely many times would produce that outcome in a proportion p of the repetitions. In the case of a doctor diagnosing a patient, a probability of p would instead be the degree of belief that the patient has the disease. The former is introduced as frequentist probability, while the latter is termed Bayesian probability. The need for probability is neatly summarised in a line: "Probability theory provides a set of formal rules for determining the likelihood of a proposition being true given the likelihood of other propositions."
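To make the frequentist reading concrete, here is a minimal sketch (my own, not from the book) that simulates drawing a card many times and checks that the observed proportion approaches p:

```python
import random

# Frequentist view: repeat an experiment many times and the observed
# proportion of an outcome approaches its probability p.
# Drawing an ace from a standard 52-card deck has p = 4/52.
p = 4 / 52
trials = 100_000
hits = sum(random.randrange(52) < 4 for _ in range(trials))

print(f"theoretical p = {p:.4f}, observed proportion = {hits / trials:.4f}")
```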

A host of concepts from probability theory is then introduced: what random variables are, and the categories of their probability distributions. The conditions for a probability mass function and a probability density function are stated. Marginal probability and conditional probability are defined, and the sum rule and the product rule (the chain rule of conditional probability) are listed.
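As a toy illustration (my own example, not the book's), both the sum rule and a conditional distribution can be read off a small joint distribution table:

```python
import numpy as np

# Toy joint distribution P(x, y) over two binary variables (rows: x, cols: y).
joint = np.array([[0.1, 0.3],
                  [0.2, 0.4]])

# Sum rule: marginalize out y to get P(x).
p_x = joint.sum(axis=1)

# Product rule rearranged: P(y | x) = P(x, y) / P(x).
p_y_given_x = joint / p_x[:, None]

print("P(x)       =", p_x)             # [0.4, 0.6]
print("P(y | x=0) =", p_y_given_x[0])  # [0.25, 0.75]
```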

Random variables are independent if their joint distribution can be expressed as the product of their individual distributions. Expectation, variance and covariance are defined. Common probability distributions, and mixtures of those distributions, are described. Common functions associated with probability theory (such as the logistic sigmoid and the softplus), their properties and their use cases are discussed. Bayes' rule is introduced.
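Bayes' rule, P(x | y) = P(y | x) P(x) / P(y), can be tried on a made-up diagnostic scenario; this is my own sketch, with all numbers invented for illustration:

```python
# Bayes' rule: P(disease | positive) = P(positive | disease) * P(disease) / P(positive).
# All numbers below are made up for illustration.
p_disease = 0.01            # prior P(disease)
p_pos_given_disease = 0.95  # test sensitivity
p_pos_given_healthy = 0.05  # false positive rate

# P(positive) via the sum rule over the two cases.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive) = {p_disease_given_pos:.3f}")  # ~0.161
```

Even with a fairly accurate test, the posterior stays low because the prior is so small, which is exactly the kind of reasoning Bayes' rule formalizes.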

Information theory is introduced as a branch of applied mathematics concerned with quantifying the information present in a signal. Learning that an unlikely event has occurred is more informative than learning that a likely event has occurred. The book lists three properties that any quantification of information should satisfy (see the sketch after this list):

- Likely events should have low information content.

- Less likely events should have higher information content.

- Independent events should have additive information content.
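The self-information I(x) = -log P(x), introduced next in the chapter, satisfies all three properties; here is a minimal numerical check (my own sketch):

```python
import math

def self_information(p):
    """Self-information in nats: I(x) = -ln P(x)."""
    return -math.log(p)

# Likely events carry little information, unlikely events carry more.
print(self_information(0.99))  # ~0.01 nats
print(self_information(0.01))  # ~4.61 nats

# Independent events: P(x, y) = P(x) P(y), so I(x, y) = I(x) + I(y).
px, py = 0.2, 0.5
print(math.isclose(self_information(px * py),
                   self_information(px) + self_information(py)))  # True
```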

Self-information is defined as the metric for the amount of information carried by a single event. The uncertainty of an entire distribution can be measured using the Shannon entropy, and the difference between two probability distributions can be measured using the Kullback-Leibler (KL) divergence.
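A minimal sketch of both quantities for discrete distributions (the example values are my own):

```python
import math

def entropy(p):
    """Shannon entropy H(P) = -sum_x P(x) ln P(x), in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """KL divergence D_KL(P || Q) = sum_x P(x) ln(P(x) / Q(x))."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
skewed  = [0.70, 0.10, 0.10, 0.10]

print(entropy(uniform))                # ln 4 ~ 1.386, the maximum for 4 outcomes
print(entropy(skewed))                 # lower: the distribution is less uncertain
print(kl_divergence(skewed, uniform))  # > 0; it is 0 only when P == Q
```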

The last pages of the chapter give a good overview of structured probabilistic models, which will come in handy later, as ML algorithms involve probability distributions over very large numbers of random variables. Representing the joint probability distribution with a single function would make that function unwieldy; instead, it can be decomposed into a product of many factors. A schematic representing this factorization is referred to as a structured probabilistic model.
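For instance, with a chain-structured model over three variables (the standard textbook case), the joint distribution factorizes as

```latex
p(a, b, c) = p(a)\, p(b \mid a)\, p(c \mid b)
```

If each variable takes k values, the full joint table needs on the order of k^3 entries, while the three factors need only about k + 2k^2, which is the saving that motivates the factorized representation.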

And this wraps up the review for the chapter. The chapter covered a good amount of probability theory, a small yet focussed text on information theory, and an introduction to structured probabilistic models. The next review, on the fourth chapter, "Numerical Computation", will be out next week.
