The following article lists some common software development terms and explains them, with a focus on data science.
Module:
How can we reuse a function or a class that we have already written? Suppose we write functions to compute the Euclidean norm and the least square error while writing code for linear regression. We may need the same functions again while writing code for K-nearest neighbors.
To reuse this code, we can put such shareable functions in a single file, which is called a module. Let's save it as 'lin_alg.py' and import it while we are writing code for linear regression and KNN.
import lin_alg
lin_alg.least_sq_er(y,y_pred)
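As a minimal sketch, 'lin_alg.py' might contain something like the following. Only the name least_sq_er comes from the call above; the name euclidean_norm, the plain-Python implementation, and the reading of "least square error" as the sum of squared residuals are assumptions made for illustration.

```python
"""lin_alg.py: shared linear-algebra helpers (illustrative sketch)."""
import math


def euclidean_norm(v):
    """Return the Euclidean (L2) norm of a sequence of numbers."""
    return math.sqrt(sum(x * x for x in v))


def least_sq_er(y, y_pred):
    """Return the sum of squared differences between y and y_pred.

    Assumption: "least square error" is taken to mean the sum of
    squared residuals; the article does not define it.
    """
    if len(y) != len(y_pred):
        raise ValueError("y and y_pred must have the same length")
    return sum((a - b) ** 2 for a, b in zip(y, y_pred))
```

Any script that sits next to this file can then call lin_alg.least_sq_er(y, y_pred) after importing lin_alg.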
Package:
We can use a module to reuse code. But how much should it contain? There may be 100-200 functions or more. This raises two problems. The first is memory: importing a module runs the whole file, so if we only want to calculate the Euclidean distance but import a module holding every mathematical function, all of them are loaded into RAM. The second is finding things: 200 functions or classes in a single file is like a district library whose books have no catalogue. It would be difficult to locate any one function or class.
A good solution to this problem is a package. A package is a directory that contains modules and, optionally, other sub-packages.
A single module can be imported by
import mod1
A module from a package can be imported by
from pkg import mod1
or
import pkg.mod1
By saving and importing a package of modules like this, we can reduce memory use, because we only load the module we want. We also avoid having too many objects/functions in a single module, and we can easily locate a required function through the module that contains it.
We can store distance metrics in a file 'dist.py' and error metrics in a file 'error.py'. We mark the directory as a package by placing an '__init__.py' file (which may be empty) in it.
lin_alg/
    __init__.py
    dist.py
    error.py
Using the least square error:
import lin_alg.error
lin_alg.error.least_sq_er(y,y_pred)
or
from lin_alg import error
error.least_sq_er(y,y_pred)
or
from lin_alg.error import least_sq_er
least_sq_er(y,y_pred)
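The sketch below recreates the 'lin_alg' layout in a temporary directory and imports only the error module, to show that 'dist.py' is not loaded along with it. The function bodies and the name euclidean are assumptions for illustration; only the file names and least_sq_er come from the text above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build the package layout in a temporary directory.
root = Path(tempfile.mkdtemp())
pkg = root / "lin_alg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")  # an empty file marks the package
(pkg / "dist.py").write_text(
    "def euclidean(a, b):\n"
    "    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5\n"
)
(pkg / "error.py").write_text(
    "def least_sq_er(y, y_pred):\n"
    "    return sum((a - b) ** 2 for a, b in zip(y, y_pred))\n"
)

# Make the temporary directory importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Import one function from one module of the package.
from lin_alg.error import least_sq_er

result = least_sq_er([1, 2, 3], [1, 2, 5])
error_loaded = "lin_alg.error" in sys.modules
dist_loaded = "lin_alg.dist" in sys.modules
print(result, error_loaded, dist_loaded)
```

Checking sys.modules is one simple way to see which modules Python has actually loaded: here 'lin_alg.error' is present and 'lin_alg.dist' is not.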
Library:
Library is a generic term for a package, or a collection of packages, that provides a particular functionality. For example, pandas is a library for data analysis and manipulation.
Framework:
A framework is a package of packages that helps you build something. scikit-learn can be seen as a framework for creating machine learning models. It has the necessary tools for building a machine learning model, such as preprocessing, feature_selection, metrics, etc.
API:
An API (Application Programming Interface) is a set of functions, protocols and conventions for working and interacting with a software application from code. When you want to wrangle data, you cannot go to the interface and enter your query manually every time. Suppose you want to join two datasets using a website. You may have to input the data into the website's interface and extract the output. What if we have a huge amount of data to join? If we have a key that lets us access and use the feature from our own code, the problem is solved. This is what an API is for. Google Maps APIs, for example, are used by cab-rental services, real estate sites, etc.
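As a rough sketch of what "using a key in our code" can look like, the snippet below builds the URL for a request to a web API. The endpoint, the parameter names and the key are all hypothetical; a real service documents its own.

```python
from urllib.parse import urlencode


def build_request_url(base_url, api_key, **params):
    """Return base_url with the parameters and API key as a query string."""
    query = dict(params)
    query["key"] = api_key  # many web APIs identify the caller by a key
    return base_url + "?" + urlencode(query)


# Hypothetical endpoint and key, for illustration only.
url = build_request_url(
    "https://api.example.com/v1/distance",
    api_key="DEMO_KEY",
    origin="Chennai",
    destination="Bengaluru",
)
print(url)
```

Sending the request (for example with urllib.request.urlopen) and parsing the response are left out, since both depend on the service being called.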
Web crawler and scraper:
Suppose a website or third party does not provide an API, but you still want to automate the process of extracting data. Then we can use a web scraper, which pulls the data out of the pages themselves. A crawler follows links from page to page and builds an index of what it finds. We use a crawler in our day-to-day life: the Google search engine.
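A minimal sketch of the scraping step, using only Python's standard library, could look like this. The HTML is a hard-coded sample rather than a downloaded page, and the class name LinkExtractor is made up for this example. A crawler would repeat this step on each link it finds.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hard-coded sample page; a real scraper would download this text.
html = """
<html><body>
  <a href="/page1">First</a>
  <p>No link here</p>
  <a href="https://example.com/page2">Second</a>
</body></html>
"""

parser = LinkExtractor()
parser.feed(html)
links = parser.links
print(links)
```

Before scraping a real site, it is worth checking its terms of use and robots.txt.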
Platform:
A platform can be described as a combination of hardware and software on which programs run. Knowing the platform helps when creating a framework or functions that depend on the OS. TensorFlow describes itself as a platform and can be used with different hardware and software combinations.
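To illustrate why knowing the platform matters for OS-related functions, here is a small sketch that uses Python's standard platform module to inspect the current machine. The helper names describe_platform and line_separator are made up for this example.

```python
import platform


def describe_platform():
    """Return basic facts about the OS and hardware running this code."""
    return {
        "system": platform.system(),    # e.g. 'Linux', 'Windows', 'Darwin'
        "machine": platform.machine(),  # e.g. 'x86_64', 'arm64'
        "python": platform.python_version(),
    }


def line_separator(system):
    """Pick the conventional text line ending for the given OS name."""
    return "\r\n" if system == "Windows" else "\n"


info = describe_platform()
print(info, repr(line_separator(info["system"])))
```

The printed values depend on the machine the code runs on, which is exactly the point: the same code may need to behave differently on different platforms.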
PS:
If we tried to analyze every term, it would never end. One can read more about SDKs, containers and more, but that can wait until you need them. This article is just an overview of some terms, and often a project cannot be placed into a single category. It may offer a high-level API, a low-level API and also packages.
Some frameworks can be used as a library. We can use sklearn just for its train_test_split.