husniakbarblog.blogspot.com: Data Science - Tools Used In Data Science

Data Science

Tools Used In Data Science

Overview of NumPy

Data scientists use many different tools to analyse and synthesise data.

Python is increasingly being used as a scientific language.

Matrix and vector manipulations are extremely important for scientific computations.

Both NumPy and Pandas have emerged as essential libraries for almost any scientific computation in Python.

NumPy stands for 'Numerical Python' or 'Numeric Python'.

It is an open-source Python library that provides fast mathematical computation on arrays and matrices.

NumPy can be imported into a notebook using:

>>> import numpy as np

NumPy's main object is the homogeneous multidimensional array.

It is a table of elements that all share the same type (homogeneous), for example integers, floats or strings; most often the elements are numbers.

In NumPy, dimensions are called axes.

The number of axes is called the rank.

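There are several ways to create an array in NumPy, such as np.array, np.zeros and np.ones.

Each suits a different case: np.array builds an array from an existing Python sequence such as a list, while np.zeros and np.ones create an array of a given shape filled with 0s or 1s.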
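A minimal sketch of the three creation functions named above, assuming NumPy is installed and imported as np; the values are arbitrary examples.

```python
import numpy as np

# np.array: build an array from an existing Python sequence (here a nested list)
a = np.array([[1, 2, 3], [4, 5, 6]])

# np.zeros / np.ones: pass a shape and get an array filled with 0.0 or 1.0
# (the element type is float64 by default)
z = np.zeros((2, 3))
o = np.ones((2, 3))

print(a)
print(z)
print(o)
```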

Some of the important attributes of a NumPy array are:

- ndim: the number of dimensions (axes) of the array

- shape: a tuple of integers giving the size of the array along each dimension

- size: the total number of elements in the array

- dtype: the type of the elements in the array, e.g. int64, float64 or a string type

- itemsize: the size in bytes of each element

- reshape(): strictly a method rather than an attribute; it returns the array's data arranged in a new shape
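A short sketch showing each attribute on a small example array. The values are arbitrary, and dtype=np.int64 is set explicitly so that dtype and itemsize come out the same on every platform.

```python
import numpy as np

# A 2 x 3 array of 64-bit integers
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(a.ndim)      # 2       -> number of axes
print(a.shape)     # (2, 3)  -> size along each axis
print(a.size)      # 6       -> total number of elements
print(a.dtype)     # int64   -> element type
print(a.itemsize)  # 8       -> bytes per element (64 bits = 8 bytes)

# reshape is a method: it returns the same 6 elements laid out as 3 x 2
b = a.reshape(3, 2)
print(b.shape)     # (3, 2)
print(b)
```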

NumPy has many functions useful in statistical analysis, such as:

Mean (np.mean)
Median (np.median)
Standard deviation (np.std)
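A minimal sketch of these three functions on a small made-up data set chosen so the results are easy to check by hand. Note that np.std computes the population standard deviation by default (ddof=0); passing ddof=1 gives the sample standard deviation.

```python
import numpy as np

# Made-up data: sum is 40, sorted middle values are 4 and 5
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean = np.mean(data)               # 40 / 8 = 5.0
median = np.median(data)           # average of the two middle values: (4 + 5) / 2 = 4.5
std = np.std(data)                 # population std: sqrt(32 / 8) = 2.0
sample_std = np.std(data, ddof=1)  # sample std: sqrt(32 / 7), about 2.14

print(mean, median, std, sample_std)
```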

So we have seen how NumPy helps with data analysis.

It's really fun once you begin using it.

Let's check your progress.

NumPy stands for:

Select the correct answer

A. number pie

B. numerical pie

C. number python

D. numeric python

Answer: D.

Which statement imports NumPy into the notebook?

Select the correct answer

A. >>> import numpy as np

B. >>> import numpy as nb

C. >>> import numpy as nv

D. >>> import numpy as npp

Answer: A.

Overview of Pandas

Like NumPy, Pandas is one of the most widely used Python libraries in data science.

It provides high-performance, easy-to-use data structures and data analysis tools.

Unlike NumPy, which provides objects for multi-dimensional arrays, Pandas provides an in-memory 2D table object called a DataFrame.

It is like a spreadsheet with column names and row labels.

Because it works with labelled 2D tables, Pandas offers many additional features, such as creating pivot tables, computing columns from other columns, and plotting graphs.
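The sketch below illustrates two of these features, a column computed from other columns and a pivot table, on a small hypothetical sales table (the column names and values are invented for the example). Plotting is left out because it also needs matplotlib installed.

```python
import pandas as pd

# Hypothetical sales data, for illustration only
df = pd.DataFrame({
    "region":  ["North", "North", "South", "South"],
    "product": ["A", "B", "A", "B"],
    "units":   [10, 5, 8, 12],
    "price":   [2.0, 3.0, 2.0, 3.0],
})

# Computed column: element-wise product of two existing columns
df["revenue"] = df["units"] * df["price"]

# Pivot table: one row per region, one column per product, cells hold total revenue
pivot = df.pivot_table(index="region", columns="product",
                       values="revenue", aggfunc="sum")
print(pivot)
```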

Pandas can be imported into Python using

>>> import pandas as pd

Some commonly used data structures in pandas are:

🔹 Series objects: 1D array, similar to a column in a spreadsheet

🔹 DataFrame objects: 2D table, similar to a spreadsheet

🔹 Panel objects: a dictionary of DataFrames, similar to a workbook of several sheets in MS Excel (Panel was deprecated in pandas 0.20 and removed in pandas 0.25)

Each row is given an index label, which by default is an integer starting from 0.
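A minimal sketch of a Series and a DataFrame, showing the default integer row index. The data is made up for illustration.

```python
import pandas as pd

# Series: a 1D labelled array, like a single spreadsheet column
s = pd.Series([10, 20, 30])
print(s.index.tolist())       # [0, 1, 2] -> default index starts at 0

# DataFrame: a 2D table; each dictionary key becomes a column name
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [28, 35]})
print(df.index.tolist())      # [0, 1]
print(df.columns.tolist())    # ['name', 'age']
```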

Like NumPy, Pandas provides basic mathematical operations such as addition and subtraction, as well as conditional operations and broadcasting.
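A short sketch of these operations on a small example DataFrame with arbitrary values. The last step shows broadcasting of a Series: df.mean() returns one value per column, and subtracting it from df is applied column by column.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Element-wise arithmetic between two columns
total = df["a"] + df["b"]   # 11, 22, 33
diff = df["b"] - df["a"]    # 9, 18, 27

# Broadcasting a scalar: every cell is doubled
scaled = df * 2

# Conditional operation: a boolean DataFrame, True where the value is > 2
mask = df > 2

# Broadcasting a Series: subtract each column's mean from that column
centered = df - df.mean()

print(total.tolist(), diff.tolist())
print(scaled)
print(mask)
print(centered)
```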

A Pandas DataFrame object represents a spreadsheet with cell values, column names, and row index labels.

A DataFrame can be thought of as a dictionary of Series, one Series per column, all sharing the same row index.

DataFrame rows and columns are simple and intuitive to access.
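A sketch of the dictionary-of-Series view and the common ways to access columns, rows and single cells. The labels and values are made up; loc selects by label and iloc by integer position.

```python
import pandas as pd

# Each Series becomes one column; rows are aligned on the shared index labels
ages = pd.Series([28, 35, 42], index=["ann", "bob", "cy"])
cities = pd.Series(["Paris", "Oslo", "Rome"], index=["ann", "bob", "cy"])
df = pd.DataFrame({"age": ages, "city": cities})

col = df["age"]               # one column, returned as a Series
row = df.loc["bob"]           # one row, selected by index label
first = df.iloc[0]            # one row, selected by integer position
cell = df.loc["cy", "city"]   # a single cell: 'Rome'

print(df)
print(row)
```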

Pandas also provides SQL-like functionality to filter rows based on conditions.
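A minimal sketch of filtering with a boolean mask and with DataFrame.query(). The SQL in the comment is only an analogy, and the data is invented.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob", "Cy", "Di"], "age": [28, 35, 42, 19]})

# Roughly: SELECT * FROM df WHERE age > 25
adults = df[df["age"] > 25]

# Combine conditions with & (and) or | (or); each condition needs parentheses
mid = df[(df["age"] > 20) & (df["age"] < 40)]

# query() expresses the same filter as a string
same = df.query("age > 25")

print(adults)
print(mid)
```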

New columns and rows can easily be added to a DataFrame.
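A minimal sketch of adding a column and a row, with invented data. For the row it uses pd.concat, since the older DataFrame.append method was removed in pandas 2.0.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [28, 35]})

# New column: assign a list (or Series) with one value per row
df["city"] = ["Paris", "Oslo"]

# New row: concatenate a one-row DataFrame; ignore_index renumbers rows 0..n-1
new_row = pd.DataFrame({"name": ["Cy"], "age": [42], "city": ["Rome"]})
df = pd.concat([df, new_row], ignore_index=True)

print(df)
```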

In addition, a DataFrame can be sorted by one or more columns.
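A short sketch of sort_values(), which returns a new sorted DataFrame and keeps the original index labels. The data is invented.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Cy", "Ann", "Bob"], "age": [42, 28, 35]})

by_age = df.sort_values("age")                        # ascending by default
by_age_desc = df.sort_values("age", ascending=False)  # largest first
by_name = df.sort_values(by=["name"])                 # by= also accepts a list of columns

print(by_age)
print(by_age_desc)
```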

DataFrames can also be easily imported from and exported to CSV, Excel, JSON, HTML and SQL databases.
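A minimal CSV round trip, done in memory with io.StringIO so no file is created. The other formats follow the same read_*/to_* naming (read_excel/to_excel, read_json/to_json, read_html/to_html, read_sql/to_sql), although some need extra packages (such as openpyxl for Excel) or a database connection.

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [28, 35]})

# Export: with no path argument, to_csv returns the CSV text as a string
csv_text = df.to_csv(index=False)

# Import: read_csv accepts any file-like object, so wrap the string in StringIO
df2 = pd.read_csv(io.StringIO(csv_text))

# Writing a real file works the same way, e.g. df.to_csv("people.csv", index=False)
print(csv_text)
print(df2)
```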

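Some other essential DataFrame methods are:

1. head(): returns the first 5 rows of the DataFrame by default (head(n) returns the first n)

2. tail(): returns the last 5 rows of the DataFrame by default (tail(n) returns the last n)

3. info(): prints a concise summary of the DataFrame: the index, the column names, non-null counts, dtypes and memory usage

4. describe(): generates summary statistics (count, mean, standard deviation, minimum, quartiles and maximum) for each numeric column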
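A sketch of the four methods on a ten-row example DataFrame. info() prints directly by default; here its buf parameter is used to capture the text instead.

```python
import io

import pandas as pd

# Ten rows of example data: x = 0..9, y = 2 * x
df = pd.DataFrame({"x": range(10), "y": [v * 2 for v in range(10)]})

print(df.head())    # rows 0 to 4
print(df.tail(3))   # the last 3 rows: 7, 8, 9

# info() writes its summary to stdout unless buf= is given
buf = io.StringIO()
df.info(buf=buf)
summary = buf.getvalue()
print(summary)

# Per-column count, mean, std, min, 25%, 50%, 75%, max
stats = df.describe()
print(stats)
```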
