
Deep Learning (Goodfellow et al.) - Chapter 3 review

After a long gap, resuming my reviews with the third chapter!

This chapter jots down the concepts of probability and information theory needed for the rest of the book. Probability theory is introduced as a mathematical framework for representing uncertain statements, while information theory is introduced as the means of quantifying that uncertainty. For a deeper treatment, additional resources such as Jaynes (2003) are worth consulting.

Much of computer science gets by without studying uncertainty, until machines start to learn from data: both the data and the learning process are themselves uncertain. The authors identify three possible sources of uncertainty:

- Inherent stochasticity in the system (e.g., a genuinely random card shuffle).

- Incomplete observability: a deterministic system can appear stochastic when we cannot observe all the variables that drive it.

- Incomplete modelling: information a model discards through simplification resurfaces as uncertainty in its predictions.


Uncertainty is ever-present in decision making, and a simple but uncertain rule is often more practical than a complex but certain one (the book's example: "most birds fly" versus an exhaustive rule listing every exception).

Probability theory was initially developed to analyse the frequencies of events. For cases like drawing a card, if an outcome has probability p, then repeating the experiment infinitely many times would yield that outcome in a proportion p of the repetitions. In the case of a doctor diagnosing a patient, a probability of p instead expresses a degree of belief that the patient has the disease. The former is known as frequentist probability, while the latter is termed Bayesian probability. The need for probability theory is nicely summarised in one line: "Probability theory provides a set of formal rules for determining the likelihood of a proposition being true given the likelihood of other propositions."
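As a quick illustration of the frequentist view, here is a minimal sketch of my own (not from the book; the card-drawing setup and sample sizes are just illustrative assumptions) showing the empirical frequency approaching p as the number of repetitions grows:

```python
import random

# Frequentist reading: the empirical frequency of an event approaches
# its probability p as repetitions grow. Event: drawing a heart from a
# standard 52-card deck, so p = 13/52 = 0.25.
p = 13 / 52

for n in (100, 10_000, 1_000_000):
    hits = sum(random.random() < p for _ in range(n))
    print(f"n={n:>9}: empirical frequency = {hits / n:.4f} (p = {p})")
```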

A host of probability concepts is then introduced: what random variables are, and the categories of their probability distributions. The conditions for a probability mass function and a probability density function are given. Marginal probability and conditional probability are defined, and the sum rule and product rule (the chain rule of conditional probability) are listed.
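To make the sum and product rules concrete, here is a small sketch of my own (the joint-table values are invented for illustration) that recovers marginals and conditionals from a toy joint distribution:

```python
import numpy as np

# Toy joint distribution P(x, y) over two binary random variables,
# stored as a 2x2 table (rows: x, columns: y). Values are illustrative.
joint = np.array([[0.10, 0.30],
                  [0.25, 0.35]])
assert np.isclose(joint.sum(), 1.0)  # PMF condition: entries sum to 1

# Sum rule (marginalisation): P(x) = sum_y P(x, y)
p_x = joint.sum(axis=1)

# Conditional probability: P(y | x) = P(x, y) / P(x)
p_y_given_x = joint / p_x[:, None]

# Product rule (chain rule): P(x, y) = P(y | x) P(x)
assert np.allclose(p_y_given_x * p_x[:, None], joint)

print("P(x)       =", p_x)
print("P(y | x=0) =", p_y_given_x[0])
```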

Random variables are independent if their joint distribution can be represented as the product of their individual distributions. Expectation, variance and covariance are defined. Common probability distributions, and mixtures of them, are described, along with the common functions of probability theory, their properties and use cases. Bayes' rule is introduced.
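As a hedged illustration of Bayes' rule, here is a sketch of my own for the doctor-diagnosis setting mentioned earlier (all the numbers below are made up, not from the book):

```python
# Bayes' rule: P(disease | positive) =
#     P(positive | disease) * P(disease) / P(positive),
# where P(positive) comes from the sum rule over both cases.
p_disease = 0.01            # prior: 1% of patients have the disease
p_pos_given_disease = 0.95  # test sensitivity
p_pos_given_healthy = 0.05  # false-positive rate

p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")  # ~0.161
```

Note how the posterior (~16%) stays far below the test's 95% sensitivity because the prior is so small.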

Information theory is introduced as a branch of applied mathematics concerned with quantifying the information present in a signal. Learning that an unlikely event has occurred is more informative than learning that a likely event has occurred. The book lays down three properties that any quantification of information should satisfy:

- Likely events should have low information content.

- Less likely events should have higher information content.

- Independent events should have additive information content.

Self-information is defined as the metric measuring the amount of information gained from observing a single event. The uncertainty in a whole distribution is measured by the Shannon entropy, and the dissimilarity between two distributions over the same variable is measured by the Kullback-Leibler (KL) divergence.
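A small sketch of my own tying these three quantities together (using natural logarithms, i.e. nats, as the book does; the distributions are illustrative and assumed strictly positive so the logs are defined):

```python
import numpy as np

def self_information(p):
    """I(x) = -log P(x), in nats."""
    return -np.log(p)

def entropy(p):
    """Shannon entropy H(P) = E[I(x)] = -sum_x P(x) log P(x)."""
    p = np.asarray(p)
    return -np.sum(p * np.log(p))

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) log(P(x) / Q(x)); asymmetric, >= 0."""
    p, q = np.asarray(p), np.asarray(q)
    return np.sum(p * np.log(p / q))

# An unlikely event carries more information than a likely one:
print(self_information(0.01), ">", self_information(0.99))

# A uniform distribution has higher entropy than a peaked one:
print(entropy([0.25] * 4), ">", entropy([0.7, 0.1, 0.1, 0.1]))

# KL divergence is zero iff the two distributions are identical:
p = [0.4, 0.6]
print(kl_divergence(p, p), kl_divergence(p, [0.5, 0.5]))
```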

The last pages of the chapter give a good overview of structured probabilistic models, which will come in handy later, since ML algorithms involve probability distributions over very large numbers of random variables. Using a single function to represent the whole joint distribution would make it unwieldy; instead it can be represented as a product of many factors. A schematic (a graph) representing such a factorization is referred to as a structured probabilistic model, or graphical model.
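A minimal sketch of my own of such a factorization for a three-variable chain a → b → c (all table values are invented; the point is that the factors are much smaller than the full joint):

```python
import numpy as np

# Chain-structured factorisation p(a, b, c) = p(a) p(b|a) p(c|b)
# over three binary variables. A full joint over n binary variables
# needs 2**n entries; the factored form needs far fewer tables.
p_a = np.array([0.6, 0.4])            # p(a)
p_b_given_a = np.array([[0.7, 0.3],   # p(b | a=0)
                        [0.2, 0.8]])  # p(b | a=1)
p_c_given_b = np.array([[0.9, 0.1],   # p(c | b=0)
                        [0.4, 0.6]])  # p(c | b=1)

# Reassemble the full joint from the factors via the product rule:
joint = (p_a[:, None, None]
         * p_b_given_a[:, :, None]
         * p_c_given_b[None, :, :])
assert np.isclose(joint.sum(), 1.0)
print("p(a=1, b=0, c=1) =", joint[1, 0, 1])
```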

And this wraps up the review of the chapter. It covered a good amount of probability theory, a small yet focused text on information theory, and an introduction to structured probabilistic models. The next review, on the 4th chapter, "Numerical Computation", will be out next week.
