
Some Terminologies - Module, Package, Framework, API, ...

    The following article lists some terms used in software development and explains them, with a focus on data science.

Module: 

    How can we reuse a function or a class that we have already written? Suppose we write functions for the Euclidean norm and the least square error while writing code for linear regression. We may need the same functions while writing code for K-nearest neighbors (KNN).
    To reuse the code, we can put such shareable functions in a single file. Let's save it as 'lin_alg.py' and import it wherever we write code for linear regression and KNN.

import lin_alg
lin_alg.least_sq_er(y,y_pred) 
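
For illustration, 'lin_alg.py' might look something like the sketch below. Only the name least_sq_er comes from the example above. The name euclidean_norm and the reading of "least square error" as the sum of squared residuals are assumptions made for this sketch.

```python
# lin_alg.py: a hypothetical sketch of the shared module
import math


def euclidean_norm(v):
    """Return the Euclidean (L2) norm of a sequence of numbers."""
    return math.sqrt(sum(x * x for x in v))


def least_sq_er(y, y_pred):
    """Return the sum of squared differences between y and y_pred.

    Assumption: "least square error" means the sum of squared residuals.
    """
    if len(y) != len(y_pred):
        raise ValueError("y and y_pred must have the same length")
    return sum((a - b) ** 2 for a, b in zip(y, y_pred))
```

Both the linear regression code and the KNN code can then call these functions after 'import lin_alg', instead of each keeping its own copy.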
Package: 

    A module lets us reuse code, but how much should it contain? There may be 100-200 functions or more. This creates two problems. The first is memory: importing a module executes the whole file, so if we only want to calculate the Euclidean distance but import a module that holds every mathematical function, all of them are loaded into RAM. The second problem is locating code. A single file with 200 functions or classes is like a district library without a catalogue: finding one particular function or class is difficult.
    A good solution to these problems is a package. A package is a folder that contains modules and, optionally, other sub-packages.

A single Module can be imported by 

import mod1

A module from a package can be imported in either of these ways

from pkg import mod1
import pkg.mod1

    By organizing modules into a package and importing them like this, we can reduce memory use, because we load only the module we need. We no longer have to put a large number of objects or functions into a single module. It also becomes easy to locate a function, since we know which module it lives in.

    We can store the distance metrics in a file 'dist.py' and the error metrics in a file 'error.py'. Placing both in a folder together with an '__init__.py' file makes that folder a package.

lin_alg
    __init__.py
    dist.py
    error.py

Using Least square error

import lin_alg.error
lin_alg.error.least_sq_er(y, y_pred)

or 

from lin_alg import error
error.least_sq_er(y, y_pred)


or 

from lin_alg.error import least_sq_er
least_sq_er(y, y_pred)
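
To see the package behave as described, the following sketch builds the 'lin_alg' layout in a temporary folder and imports only the error module. The function bodies and the euclidean helper in 'dist.py' are assumptions made for illustration. Only the file names and least_sq_er come from the text above.

```python
import importlib
import os
import sys
import tempfile

# Build the hypothetical package layout in a temporary folder.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "lin_alg")
os.mkdir(pkg_dir)

files = {
    "__init__.py": "",
    "dist.py": (
        "def euclidean(a, b):\n"
        "    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5\n"
    ),
    "error.py": (
        "def least_sq_er(y, y_pred):\n"
        "    return sum((a - b) ** 2 for a, b in zip(y, y_pred))\n"
    ),
}
for name, source in files.items():
    with open(os.path.join(pkg_dir, name), "w") as f:
        f.write(source)

sys.path.insert(0, root)  # make the temporary folder importable
importlib.invalidate_caches()

from lin_alg import error  # runs lin_alg/__init__.py and lin_alg/error.py

result = error.least_sq_er([1, 2, 3], [1, 2, 5])
dist_loaded = "lin_alg.dist" in sys.modules  # dist.py was never imported
print(result, dist_loaded)
```

Here dist_loaded stays False because 'dist.py' is only executed once something imports it, which is the memory saving the package layout is meant to give.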

Library:

    Library is a generic term for a package, or a collection of packages, that provides a particular functionality. For example, pandas is a library for data analysis and manipulation.

Framework:

    A framework is a package of packages that helps you build something. scikit-learn is a framework for creating machine learning models. It has the necessary tools for building a machine learning model, such as preprocessing, feature_selection, metrics, etc.

API:

    An API (Application Programming Interface) is a set of functions, protocols and conventions for interacting with a software application from code. When you wrangle data, you cannot go to a user interface and enter your query manually every time. Suppose you want to join two datasets using a website. You would have to enter the data into the website's interface and extract the output by hand. What if there is a huge amount of data to be joined? If the service gives us a key to access and use the feature from our own code, the problem is solved. This is what an API is for. For example, the Google Maps APIs are used by cab-rental services, real estate sites, etc.
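
As a rough sketch of the idea, the code below prepares requests for an imaginary "join" service using only the standard library. The endpoint, the key, the dataset names and the parameter names are all hypothetical. A real API documents its own URL scheme and authentication. Nothing is sent over the network here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values: neither the endpoint nor the key is real.
BASE_URL = "https://api.example.com/v1/join"
API_KEY = "YOUR_API_KEY"


def build_join_request(left_id, right_id, on):
    """Build (but do not send) a GET request asking the service to join two datasets."""
    query = urlencode({"left": left_id, "right": right_id, "on": on})
    return Request(
        f"{BASE_URL}?{query}",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )


# One request per pair of datasets, created in a loop instead of by hand.
pairs = [("sales_2020", "customers"), ("sales_2021", "customers")]
requests_to_send = [
    build_join_request(left, right, on="customer_id") for left, right in pairs
]
print(requests_to_send[0].full_url)
```

Against a real service, each request would then be passed to urllib.request.urlopen and the response parsed. The point is that the loop replaces the manual work in the web interface.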

Web crawler and scraper:

    Suppose a website or third party does not provide an API, but you still want to automate the process of extracting its data. Then we can use a web scraper, which downloads pages and pulls the required data out of the HTML. A crawler, in turn, follows links from page to page to discover and index content. We rely on a crawler every day: the Google search engine uses one to build its index.
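
The extraction step of a scraper can be sketched with the standard library's html.parser. The page below is a made-up stand-in for a downloaded document, and LinkExtractor is a name chosen for this example. A real scraper would first fetch the HTML over HTTP and should respect the site's terms of use.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Stand-in for a downloaded page (hypothetical content).
page = """
<html><body>
  <h1>Listings</h1>
  <a href="/flat/1">Flat 1</a>
  <a href="/flat/2">Flat 2</a>
  <a name="top">no link here</a>
</body></html>
"""

parser = LinkExtractor()
parser.feed(page)
print(parser.links)
```

A crawler would go one step further and feed each collected link back in as the next page to download.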


Platform:

    A platform can be described as a combination of hardware and software on which programs run. Knowing the target platform helps when creating a framework or writing functions that depend on the OS. TensorFlow is a platform that can be used with different combinations of hardware and software.
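
As a small illustration of OS-dependent code, the sketch below asks the standard library's platform module where it is running and picks a value accordingly. The helper shared_library_suffix and its fallback are assumptions made for this example.

```python
import platform


def shared_library_suffix(system):
    """Return the usual file suffix for shared libraries on the given OS name."""
    suffixes = {"Windows": ".dll", "Darwin": ".dylib", "Linux": ".so"}
    return suffixes.get(system, ".so")  # assumption: fall back to the Unix convention


system = platform.system()    # e.g. "Linux", "Darwin" or "Windows"
machine = platform.machine()  # e.g. "x86_64" or "arm64"
print(system, machine, shared_library_suffix(system))
```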
      

PS:
   If we tried to analyze every term, it would never end. One can read more about SDKs, containers and so on, but that can wait until there is a need. This article is just an overview of some terms, and projects often do not fit into a single category. They may offer a high-level API, a low-level API and also packages.
Some frameworks can be used as a library. We can use sklearn for just its train_test_split.
 
Ref: 

https://stackoverflow.com/questions/19198166/whats-the-difference-between-a-module-and-a-library-in-python

https://www.quora.com/What-is-the-difference-between-Python-modules-packages-libraries-and-frameworks

https://towardsdatascience.com/what-is-api-and-how-to-use-youtube-api-65525744f520

https://stackoverflow.com/questions/25028243/what-is-the-difference-between-a-framework-and-a-platform
