Pandas Made Simple: Analyze Real-World Data Like a Pro
Pandas for Data Analysis
Table of Contents
Why Pandas?
Timeline
Pandas
Installation
Dependencies
---
config:
  sankey:
    showValues: false
---
sankey-beta
%% source,target,value
Numpy, required, 30
python-dateutil, required, 30
pytz, required, 30
tzdata, required, 30
Visualization, matplotlib, 10
Visualization, Jinja2, 10
Visualization, tabulate, 10
optional, Visualization, 30
Performance, numexpr, 10
Performance, bottleneck, 10
Performance, numba, 10
optional, Performance, 30
Computation, Scipy, 10
Computation, xarray, 10
optional, Computation, 20
Excel Files, xlrd, 10
Excel Files, xlsxwriter, 10
Excel Files, openpyxl, 10
Excel Files, pyxlsb, 10
Excel Files, python-calamine, 10
optional, Excel Files , 50
BeautifulSoup4, HTML Files, 10
html5lib, HTML Files, 10
lxml, HTML Files, 10
HTML Files, optional, 30
lxml, XML, 10
XML, optional, 10
SQLAlchemy, PostgreSQL, 10
psycopg2, PostgreSQL, 10
adbc-driver-postgresql, PostgreSQL, 10
PostgreSQL, DataBase, 30
SQLAlchemy,MySQL,10
pymysql,MySQL,10
MySQL, DataBase, 20
SQLAlchemy, SQLite,10
adbc-driver-sqlite, SQLite, 10
SQLite, DataBase, 20
DataBase, optional, 70
required, pandas , 120
pandas, optional, 120
Required Dependencies
The following packages are installed automatically along with pandas, because it needs them to operate.
kanban
  required[Required Dependency]
    numpy[NumPy]@{ticket: Version, assigned: '1.22.4' }
    dateutil[python-dateutil]@{ticket: Version, assigned: '2.8.2' }
    pytz[pytz]@{ticket: Version, assigned: '2020.1' }
    tzdata[tzdata]@{ticket: Version, assigned: '2022.7' }
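To see which versions of these required packages are present in the current environment, a minimal sketch using only the standard library might look like the following. The helper name `installed_versions` is hypothetical, and the package names are taken from the list above.

```python
from importlib import metadata

# Distribution names of the required dependencies listed above.
REQUIRED = ("numpy", "python-dateutil", "pytz", "tzdata")


def installed_versions(packages=REQUIRED):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Comparing the printed versions against the minimums above is left to the reader; this sketch only reports what is installed.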
Optional Dependencies
pandas has many optional dependencies that improve performance, enable visualization, or are needed only for particular APIs or methods.
Performance
When working with large files, it is advisable to install the performance dependencies for pandas using the pip extra [performance].
kanban
  [Performance]
    [numexpr]
    [bottleneck]
    [numba]
Visualization : Plotting & Formatting
To plot graphs through the pandas API, the optional dependency matplotlib can be installed along with pandas using the pip extra [plot]. For Markdown output and DataFrame styling, use the pip extra [output-formatting].
kanban
  [Plot]
    [matplotlib]
  [Output-formatting]
    [Jinja2]
    [tabulate]
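Pip extras are requested by listing them in square brackets after the package name. As a rough illustration, the sketch below builds the install command for a chosen set of extras. The helper `pip_install_command` is hypothetical, and the extra names `performance`, `plot`, and `output-formatting` are assumed to match the ones pandas publishes.

```python
# Extras discussed so far in this section; pandas defines others as well.
KNOWN_EXTRAS = {"performance", "plot", "output-formatting"}


def pip_install_command(extras):
    """Return a pip command that installs pandas with the given extras."""
    unknown = set(extras) - KNOWN_EXTRAS
    if unknown:
        raise ValueError(f"unknown extras: {sorted(unknown)}")
    requirement = "pandas"
    if extras:
        # Sort so that the same set of extras always yields the same command.
        requirement += "[" + ",".join(sorted(set(extras))) + "]"
    # Quote the requirement so the shell does not interpret the brackets.
    return f'python -m pip install "{requirement}"'


print(pip_install_command(["plot", "output-formatting"]))
```

The quotes around the requirement matter in shells such as zsh, where unquoted square brackets are treated as a glob pattern.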
Computation
SciPy provides statistical functions, and xarray provides a pandas-like API for N-dimensional data.
kanban
  [Computation]
    [SciPy]
    [xarray]
Excel files
To work with Excel files, it is necessary to install optional dependencies along with pandas.
kanban
  [Excel Files]
    [xlrd]@{ticket: reading Excel}
    [xlsxwriter]@{ticket: writing Excel}
    [openpyxl]@{ticket: read/write xlsx files}
    [pyxlsb]@{ticket: reading xlsb files}
    [python-calamine]@{ticket: reading xls/xlsx/xlsb/ods files}
  [HTML Files]
    [BeautifulSoup4]@{ticket: requires lxml or html5lib}
    [html5lib]
    [lxml]
  [XML]
    [lxml]
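Each Excel reader above corresponds to a value of the `engine` argument of `pandas.read_excel`. The sketch below shows one way to choose an engine from the file extension. The helper `pick_excel_engine` and its `prefer_calamine` flag are hypothetical, and the mapping follows the roles listed above rather than an exhaustive rule.

```python
from pathlib import Path

# File suffix -> engine name accepted by pandas.read_excel(engine=...).
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",       # legacy Excel format
    ".xlsx": "openpyxl",  # modern Excel format
    ".xlsm": "openpyxl",
    ".xlsb": "pyxlsb",    # binary Excel format
}

# Formats that python-calamine can read, per the list above.
CALAMINE_SUFFIXES = {".xls", ".xlsx", ".xlsb", ".ods"}


def pick_excel_engine(path, prefer_calamine=False):
    """Return a reader engine name for the given file path."""
    suffix = Path(path).suffix.lower()
    if prefer_calamine and suffix in CALAMINE_SUFFIXES:
        return "calamine"
    if suffix not in ENGINE_BY_SUFFIX:
        raise ValueError(f"no reader engine mapped for {suffix!r}")
    return ENGINE_BY_SUFFIX[suffix]


print(pick_excel_engine("sales_2023.xlsx"))
```

With pandas and the matching package installed, the result could be passed along as `pd.read_excel(path, engine=pick_excel_engine(path))`. Leaving `engine` unset lets pandas pick a default on its own.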
SQL databases
kanban
  [PostgreSQL]
    [SQLAlchemy]@{assigned: 'postgresql'}
    [psycopg2]@{assigned: 'postgresql'}
    [adbc-driver-postgresql]@{assigned: 'postgresql'}
  [MySQL]
    [SQLAlchemy]@{assigned: 'mysql'}
    [pymysql]@{assigned: 'mysql'}
  [SQLite]
    [SQLAlchemy]@{assigned: 'sql-other'}
    [adbc-driver-sqlite]@{assigned: 'sql-other'}
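When SQLAlchemy is used with one of the drivers above, the database and driver are selected through a connection URL. The sketch below assembles such URLs with the standard library only. The helpers `server_url` and `sqlite_url` are hypothetical, and the credentials and database names in the example are placeholders.

```python
from urllib.parse import quote

# Backend name -> SQLAlchemy "dialect+driver" scheme for the drivers above.
SCHEMES = {
    "postgresql": "postgresql+psycopg2",
    "mysql": "mysql+pymysql",
}


def server_url(backend, user, password, host, port, database):
    """Build a SQLAlchemy-style URL for a server-based database."""
    scheme = SCHEMES[backend]
    # Percent-encode credentials so characters such as '@' or '/' stay unambiguous.
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"{scheme}://{user_enc}:{password_enc}@{host}:{port}/{database}"


def sqlite_url(path):
    """Build a SQLAlchemy-style URL for a SQLite database file."""
    return f"sqlite:///{path}"


print(server_url("postgresql", "analyst", "p@ss word", "localhost", 5432, "sales"))
print(sqlite_url("data/sales.db"))
```

A URL built this way would typically be handed to `sqlalchemy.create_engine`, and the resulting engine passed as the `con` argument of `pandas.read_sql`.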