Deep Learning (Goodfellow et al) - Chapter 1 review

Chapter 1 - Introduction
 
    This is the introductory chapter of the book. Here the authors introduce deep learning, explain what the book is actually about, and finally trace the path that deep learning has traveled to reach its current state.

Introduction to Deep Learning...

    The authors make it clear that this is a book for intermediate readers. The motivation for data, the hype around artificial intelligence and its societal impact, which are typical of the opening chapter of an introductory book, are kept to a minimum here. The book says that artificial intelligence quickly solved many problems that are intellectually difficult for humans but straightforward for computers, and that the real challenge for AI is solving the intuitive problems that humans handle in their day-to-day life. The book concentrates on solving these kinds of problems.

    Deep learning is introduced as a hierarchy of concepts, where complicated concepts are built on simpler ones. Computation is easy for a computer and common sense is easy for a human, but if the tasks are swapped, both become difficult. Human common sense comes from the knowledge base people have built up over time. If this knowledge base is hard-coded into a computer to help it solve problems, the system cannot deal accurately with unseen conditions. Cyc, an inference engine paired with a hand-written database of statements, faced this problem. So the knowledge of an AI system should not rest only on hand-coded inference rules.

    Systems must be able to build their own knowledge from data. This is termed machine learning (ML). The book stresses how much the representation of the data given to the system matters. Since the system's knowledge is built from its input, the representation of that input influences the output. The features have to be relevant for the system to make clear decisions, and choosing them may require considerable expertise. Deciding which features to extract is itself a problem, and it can be addressed by representation learning.
    Learned representations have often proved more effective than hand-designed features, and they save a great deal of time and effort. The authors then explain the role of the autoencoder in representation learning. The features have to be chosen so that they separate the factors of variation behind the observed data. These factors need not be quantities that are directly observed in the physical world. They can be combinations of features, or concepts. Developing such a concept is not an easy task. For example, a pixel in an image of a red car may look almost black at night. Many such concepts are easy for a human to understand but not for an AI system.
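
    To make the point about representation concrete, here is a minimal sketch of my own (not code from the book). It assumes hypothetical toy data: two classes of points lying on circles of radius 1 and 3. The helper names make_points, separable_by_threshold and to_radius are made up for this illustration. With the raw x or y coordinate as the feature, no single threshold separates the classes, but with the radius as the feature one threshold is enough.

```python
import math

# Hypothetical toy data: class 0 lies on a circle of radius 1,
# class 1 on a circle of radius 3 (both centered at the origin).
def make_points(radius, n=8):
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

inner = make_points(1.0)  # class 0
outer = make_points(3.0)  # class 1

def separable_by_threshold(values_a, values_b):
    """True if a single threshold on one feature splits the two classes."""
    return max(values_a) < min(values_b) or max(values_b) < min(values_a)

# Cartesian representation: neither x alone nor y alone separates the classes.
x_ok = separable_by_threshold([p[0] for p in inner], [p[0] for p in outer])
y_ok = separable_by_threshold([p[1] for p in inner], [p[1] for p in outer])

# Polar representation: the radius feature separates them.
def to_radius(p):
    return math.hypot(p[0], p[1])

r_ok = separable_by_threshold([to_radius(p) for p in inner],
                              [to_radius(p) for p in outer])

print(x_ok, y_ok, r_ok)  # False False True
```

    Here the useful feature (the radius) was picked by hand. Representation learning aims to let the system discover such a feature from the data itself.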

    Having raised this difficulty in representation learning, the authors propose deep learning (DL) as the solution. It builds complex concepts on top of simpler ones. They then dive into deep learning and explain the multilayer perceptron (MLP). An MLP is a mathematical function that maps an input to an output. The function is a complex one composed of simpler functions. Each of these simpler functions can be thought of as providing a new representation of the input that makes the problem easier to solve. They also briefly discuss the importance of sequential instructions, how an instruction can refer back to the results of earlier instructions, the hidden layers, state information, and so on.
    There are two ways of measuring the depth of a DL model. It can be the number of operations needed to compute the output target from an input feature, or it can be the number of concepts that must be passed through to reach the target. The authors give a good set of arguments for both. At the end of this section, a Venn diagram shows deep learning as a subset of machine learning, which in turn is a subset of AI.
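
    The idea of an MLP as a complex function composed of simpler ones can be sketched in a few lines of plain Python. This is only my own illustration, not code from the book: the weights are arbitrary hypothetical numbers, and the helpers linear, relu and compose are names I made up.

```python
def relu(v):
    # Element-wise non-linearity: negative values become 0.
    return [max(0.0, x) for x in v]

def linear(weights, bias):
    # Returns a function computing weights * v + bias, one output per weight row.
    def layer(v):
        return [sum(w * x for w, x in zip(row, v)) + b
                for row, b in zip(weights, bias)]
    return layer

def compose(*layers):
    # Chains simple functions into one more complex function.
    def model(v):
        for layer in layers:
            v = layer(v)
        return v
    return model

# Hypothetical weights, chosen only to make the arithmetic easy to follow.
hidden = linear([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0])  # 2 inputs -> 2 hidden units
output = linear([[2.0, 1.0]], [0.5])                     # 2 hidden units -> 1 output

mlp = compose(hidden, relu, output)
print(mlp([3.0, 1.0]))  # [5.5]
```

    The sketch also shows why depth can be counted in more than one way. Counting every function in the chain gives a depth of 3 (hidden, relu, output), while treating hidden plus relu as one layer gives a depth of 2.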

The structure of the book...

    According to the authors, the book is for students and software engineers. The book is organized into three parts:
  • Mathematical Tools and Machine Learning Concepts
  • Established Deep Learning Algorithms
  • Deep Learning Research
   The book has a schematic showing its organization, with advice on when to read what. Readers who are already comfortable with the math and basic ML concepts are advised to skip the first part. Though I am comfortable with both, I would like to review the first part. Meanwhile I will build up my skill in deep learning algorithms in parallel, so that I can review the second part in the future.

   In this section they mention some prerequisites for the book, which again shows that it is aimed at intermediate readers. Readers are assumed to have a computer science background (programming, computational performance and basic calculus).

Trends in Deep Learning...
  
  The history of deep learning is explained in the book as three waves. The first wave, 'cybernetics', was heavily influenced by neuroscience and flourished during the 1940s-1960s. The aim was to replicate the human brain, which gave deep learning models the name artificial neural network (ANN). These networks are loosely modeled on biological neural networks. The book takes the view that although deep learning has strong roots in neuroscience, it need not be viewed from the point of view of neuroscience.
    Linear models were prevalent during this period, and an early form of the stochastic gradient descent algorithm was devised then. The inability of these models to learn non-linear functions (like XOR) was criticized. Also, the brain and natural neural networks are very complex, and replicating them would take far more time. These reasons led to the decline of the first wave. According to the authors, there are now separate fields for the study of how the brain works (computational neuroscience) and for the simpler artificial neural networks (deep learning).
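
    The XOR limitation is easy to check with a small sketch of my own (not code from the book). It assumes a single linear threshold unit with hypothetical weights w1, w2 and bias b, and tries every combination on a coarse grid. The grid search is only an illustration and not a proof, although it is a known result that no linear unit can represent XOR. The two-layer network that follows uses hand-picked weights and one hidden non-linearity.

```python
from itertools import product

# Truth table of XOR.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def linear_unit(w1, w2, b):
    # A single linear threshold unit: outputs 1 when w1*x1 + w2*x2 + b > 0.
    return lambda x: 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0

def fits_xor(model):
    return all(model(x) == y for x, y in XOR.items())

# Try every weight combination on a coarse grid from -2 to 2 in steps of 0.5.
grid = [k / 2 for k in range(-4, 5)]
linear_solutions = [(w1, w2, b) for w1, w2, b in product(grid, repeat=3)
                    if fits_xor(linear_unit(w1, w2, b))]

def two_layer(x):
    # Hand-picked weights: two hidden units with a ReLU, then a linear output.
    h1 = max(0, x[0] + x[1])
    h2 = max(0, x[0] + x[1] - 1)
    return h1 - 2 * h2

print(len(linear_solutions), fits_xor(two_layer))  # 0 True
```

    None of the 729 linear units on the grid reproduces XOR, while a single hidden layer is enough to do it.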

   The second wave was called connectionism, or parallel distributed processing, and it was led by cognitive science. Distributed representation (representing each input by many features), the back-propagation algorithm and LSTM (long short-term memory) are the major milestones of this era. But the wave declined as other ML approaches, namely kernel-based algorithms and graphical models, advanced.

    The third wave emerged around 2006 and continues to date. According to the authors, improved communication and dedicated research programs like CIFAR-NCAP were what propelled this wave.
   The growth in dataset sizes over time helped DL networks achieve better results, as they are usually data hungry. The authors call the present the age of big data and use a schematic to compare the sizes of some classic datasets over the last century. The increase in model size also helped in learning difficult tasks, and it was supported by the increase in processor capability during this period. The advent of GPUs, faster networks and improvements in distributed computing helped accelerate the growth. Deploying networks in real-world applications created an urge to increase their accuracy. They cite an example from the ImageNet challenge, where a convolutional network (CNN) achieved very good accuracy in 2012. Other examples such as speech recognition, pedestrian detection, machine translation and applications based on reinforcement learning are also cited.
     The adoption of neural networks by tech companies and the growing number of deep learning software packages are also responsible for keeping up the momentum of the third wave. Finally, the authors conclude the chapter with the hope of a challenge-filled future for the field.

Endnote
This is the review of chapter 1. Feedback is most welcome!
The review of the next chapter, "Applied Math and Machine Learning Basics", will be published on Feb 6th, 2020. Hope to see you then...
