

Showing posts from 2020

366DaysofDataScience Catalogue

No | Day | Date | Topic | link_category | Link | lag
1 | 2 | 10/11/19 | some errors in SQL 2/6 | github | https://github.com/viswanathanc/SQL-for-Data-Science | 1
2 | 14 | 10/23/19 | Matplotlib – Visualization | kaggle | https://www.kaggle.com/viswanathanc/beginner-to-intermediate-matplotlib-visualizing | 12
3 | 22 | 10/31/19 | Role of EDA in Model Building | kaggle | https://www.kaggle.com/viswanathanc/role-of-eda-in-model-building | 19
4 | 27 | 11/05/19 | Beginner to intermediate matplotlib visualization | kaggle | https://www.kaggle.com/viswanathanc/beginner-to-intermediate-matplotlib-visualizing | 23
5 | 32 | 11/10/19 | bunch to dictionary | github | https://github.com/viswanathanc/basic_python | 27
6 | 33 | 11/11/19 | Stratified sampling | kaggle | https://www.kaggle.com/viswanathanc/stratifiedshufflesplit-working-with-less-data?scriptVersionId=23291002 | 27
7 | 33 | 11/11/19 |

Deep Learning (Goodfellow et al) - Chapter 2 review

    This chapter is the first chapter of Part 1, which prepares you for deep learning; here Linear Algebra is covered. The author advises you to skip the chapter if you are already familiar with the concepts. Also, it is not that you can become Gilbert Strang after reading this chapter! The chapter is so concise that only what is relevant to our future acquaintance with the book is presented here. Only at the end of the chapter does the author speak about applications; the focus is on the mathematical concepts of Linear Algebra.

    Though science and engineering use Linear Algebra extensively, computer scientists have less experience with it. This is the motivation for the chapter, and if you do have that experience, the author provides you with a reference (The Matrix Cookbook - Petersen and Pedersen, 2006) and waves you goodbye to meet in the next chapter.

    Definitions of scalar, vector, matrix and tensor are listed one by one.
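To make those four definitions concrete, here is a minimal NumPy sketch (my own illustration, not from the book): a scalar is a single number, a vector a 1-D array, a matrix a 2-D array, and a tensor an array with more than two axes.

import numpy as np

scalar = 3.14                        # a single number
vector = np.array([1.0, 2.0, 3.0])   # 1-D array, shape (3,)
matrix = np.array([[1, 2],
                   [3, 4]])          # 2-D array, shape (2, 2)
tensor = np.zeros((2, 3, 4))         # array with more than two axes, shape (2, 3, 4)

print(vector.ndim, matrix.ndim, tensor.ndim)  # 1 2 3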

Some Terminologies - Module, Package, Framework, API,..

    The following article is a list of some terminologies and explanations in software development, with a focus on Data Science.

Module:
    What can we do to reuse a function or a class that we have already written? Suppose we write a function to find the Euclidean norm and the least square error while writing code for linear regression. We may need the same functions while writing code for K-nearest neighbors. To reuse such code, we can write the sharable functions in a single file. Let's save it as 'lin_alg.py' and import it while we are writing code for linear regression and KNN (a sketch of such a module is given after this excerpt):

import lin_alg
lin_alg.least_sq_er(y, y_pred)

Package:
    We can have a module to reuse code. But how much will it contain? There may be 100-200 functions or more in use. We have two problems here: the first is the space required to store every function. If we want to just calculate the Euclidean distance and we import all the mathematical functions, this may consume huge
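Below is a minimal sketch of what such a 'lin_alg.py' module could look like. The name least_sq_er comes from the excerpt above; euclidean_norm and both implementations are my own assumptions for illustration.

# lin_alg.py - a small reusable module of shared math helpers (illustrative sketch)
import numpy as np

def euclidean_norm(x):
    # Euclidean (L2) norm of a vector.
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.sum(x ** 2))

def least_sq_er(y, y_pred):
    # Sum of squared errors between targets and predictions.
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sum((y - y_pred) ** 2)

Any script saved in the same directory can then reuse it with "import lin_alg" and "lin_alg.least_sq_er(y, y_pred)", exactly as in the excerpt.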

CodeSignal - Almost Strictly Increasing Sequence

I came across an interesting question on CodeSignal: "Given a sequence of integers as an array, determine whether it is possible to obtain a strictly increasing sequence by removing no more than one element from the array." A sequence is strictly increasing if every element in the array is less than its successor.

To check whether a sequence is strictly increasing, we can compare each element with the next one. If some element is greater than or equal to its successor, we can conclude that the sequence is not strictly increasing; if every element is less than its successor, we know it is strictly increasing. In the worst case (the sequence is strictly increasing), the complexity of this check is O(n).

If we use a similar approach for the question, we have to remove each element in turn and pass the remaining sequence to the above check (see the sketch below). So for n elements, we use the function n times, which makes the naive approach O(n²).
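A minimal Python sketch of the naive approach described above (my own illustration; the function names are not from CodeSignal):

def is_strictly_increasing(seq):
    # O(n) check: every element must be less than its successor.
    return all(a < b for a, b in zip(seq, seq[1:]))

def almost_increasing_naive(seq):
    # O(n^2) brute force: try removing each element once and re-check.
    return any(is_strictly_increasing(seq[:i] + seq[i + 1:])
               for i in range(len(seq)))

print(almost_increasing_naive([1, 3, 2, 1]))  # False
print(almost_increasing_naive([1, 3, 2]))     # True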

Deep Learning (Goodfellow et al) - Chapter 1 review

Chapter 1 - Introduction

    This is the introductory chapter of the book. Here the authors introduce deep learning, explain what the book is actually about, and finally trace the path deep learning has traveled to reach its current state.

Introduction to Deep Learning...
    The authors clearly give the picture that this is a book for intermediate readers. The motivation for data, the hype around Artificial Intelligence and its societal impacts, which are typical of the introductory chapter of an introductory book, are cut down to a very minimal level here. The book says that Artificial Intelligence has solved many problems that are beyond human ability, but the actual challenge of AI is solving the intuitive problems that humans handle in their day-to-day life, and the concentration of the book will be on solving these kinds of problems.
    Deep learning is introduced as a hierarchy of concepts, where complicated concepts are built upon simpler ones. Computation

Deep Learning (Goodfellow et al) Book Review

    I bought a Deep Learning textbook this week! Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville. I bought it after very thorough research on its contents and previous reviews, and it has actually turned out to be a good choice. This book has developed into one of the fundamental textbooks of Deep Learning and is the latest of them.

    I was excited about the book when I started reading it. But reading feels very lonely; I want to say what is actually exciting, and I want to help someone who wants to know what the book is about. So I thought of writing a chapter-wise review of the book. I plan to release the review of each chapter on Fridays, starting today (31/01/2020). I will keep updating this post with links to the reviews of the chapters. This is my first experience in technical book review and I am hoping to do my best.

Chapter No | Name | Link | Published/Scheduled Date
1 | Introduction | https://viswa10.blogspot.com/2020/01/deep-learning-goodfellow-et-

Why use CNN instead of normal Feed forward Network?

    While we could use a simple Feed Forward Network, we use a Convolutional Neural Network... What is the need for such a network?

    I took a picture of myself with my mobile camera, which has a 12 MP sensor (Poco F1), and imported it as a numpy array. The number of pixels in that image was 2756160 - nearly 3 million pixels! If I use a Feed Forward Network, that means I have nearly 3 million features! I would require much more data; some 10 million photos would satisfy the need. This would also require a large amount of memory and a long training time.

    Can we reduce the dimension? PCA? PCA on a 3 million x 10 million matrix would still be a great problem. Moreover, we cannot get 10 million photos; we have to train well with a small amount of data.

    How else can we reduce the dimension? Feature extraction is a good option: we can apply multiple filters to an image and use the filter outputs as the features. This significantly reduces our data (a rough sketch of the parameter counts is given below). 128 filters (of size 3x3) applied to my image will reduce it
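A back-of-the-envelope comparison of the two options (my own illustrative sketch; the hidden-layer width and channel count are assumptions, only the ~2.7 million pixel count comes from the post):

# Rough parameter-count comparison: fully connected layer vs. 3x3 conv filters.
n_pixels = 2_756_160      # pixel count reported for the photo
n_hidden = 1_000          # assumed width of the first fully connected layer

dense_weights = n_pixels * n_hidden             # one weight per pixel per hidden unit
print(f"Fully connected layer weights: {dense_weights:,}")   # 2,756,160,000

n_filters = 128           # 128 filters of size 3x3, as in the excerpt
in_channels = 3           # assumed RGB input
conv_weights = n_filters * 3 * 3 * in_channels  # weights are shared across positions
print(f"Conv layer weights:            {conv_weights:,}")    # 3,456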

Enhancing the Python codes...

    Once I came across a problem on HackerRank and solved it. But when I had to revisit the problem, I was not able to read my own code. The way the code was written made it very unpleasant to read and understand. That gave me a spark to enhance my code, and when I came across PEP 8 (Python Enhancement Proposal), I found it to be a better enhancement, available and widely followed.

    This article will be a discussion of PEP 8, and it is a summary of the documentation available at https://www.python.org/dev/peps/pep-0008/

Indentation:
    - Use 4 spaces per indentation level.
    - In case a long line has to be continued on the next line, the continuation line should align wrapped elements either vertically, using Python's implicit line joining inside parentheses, or with a hanging indent (both styles are sketched below).
    - With a hanging indent, there should be no argument on the first line, and further indentation has to be used to distinguish it from the next line.
    - We have two options alig
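A short sketch of the two continuation styles described above, following PEP 8 (the function and values here are placeholders of my own):

def some_function(a, b, c, d):
    # Placeholder function used only to demonstrate line wrapping.
    return a + b + c + d

# Option 1: align wrapped arguments vertically with the opening delimiter,
# relying on Python's implicit line joining inside parentheses.
result = some_function(1, 2,
                       3, 4)

# Option 2: hanging indent - no argument on the first line, and the
# continuation is indented further to distinguish it from the code that follows.
result = some_function(
    1, 2,
    3, 4)

print(result)  # 10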