Pandas Made Simple: Analyze Real-World Data Like a Pro

Pandas for Data Analysis

Table of Contents

Why Pandas?

Timeline

Pandas

Installation

$ rye init --virtual
$ rye add pandas
$ uv init .
$ uv pip install pandas
$ pip install pandas
$ conda install pandas
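
After any one of the commands above has run, a quick check from Python can confirm that the package is visible to the interpreter. This is a minimal sketch that uses only the standard library; the helper name `installed_version` is illustrative and not part of pandas.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pandas")
    print(f"pandas {version}" if version else "pandas is not installed")
```

Run it with the same interpreter or virtual environment that the install command targeted; otherwise it may report a different environment's state.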

Dependencies

---
config:
  sankey:
    showValues: false
---

sankey-beta

%% source,target,value
NumPy, required, 30
python-dateutil, required, 30
pytz, required, 30
tzdata, required, 30



Visualization, matplotlib,  10
Visualization, Jinja2,  10
Visualization, tabulate,  10
optional, Visualization, 30

Performance, numexpr,  10
Performance, bottleneck,  10
Performance, numba,  10
optional, Performance, 30

Computation, Scipy,  10
Computation, xarray,  10
optional, Computation, 20

Excel Files, xlrd, 10
Excel Files, xlsxwriter, 10
Excel Files, openpyxl, 10
Excel Files, pyxlsb, 10
Excel Files, python-calamine, 10
optional, Excel Files , 50

BeautifulSoup4, HTML Files, 10
html5lib, HTML Files, 10
lxml, HTML Files, 10
HTML Files, optional, 30

lxml, XML, 10
XML, optional, 10

SQLAlchemy, PostgreSQL, 10
psycopg2, PostgreSQL, 10
adbc-driver-postgresql, PostgreSQL, 10
PostgreSQL, DataBase, 30

SQLAlchemy,MySQL,10
pymysql,MySQL,10
MySQL, DataBase, 20

SQLAlchemy, SQLite,10
adbc-driver-sqlite, SQLite, 10
SQLite, DataBase, 20

DataBase, optional, 70

required, pandas , 120
pandas, optional, 120

Required Dependencies

The following packages are installed automatically alongside pandas because it needs them to run.

kanban
required[Required Dependency]
  numpy[NumPy]@{ticket: Version, assigned: '1.22.4' }
  dateutil[python-dateutil]@{ticket: Version, assigned: '2.8.2' }
  pytz[pytz]@{ticket: Version, assigned: '2020.1' }
  tzdata[tzdata]@{ticket: Version, assigned: '2022.7' }
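
One way to see whether an environment satisfies these minimums is to compare the installed versions against them. The sketch below hard-codes the minimums from the list above, reading the NumPy entry as 1.22.4, and uses a deliberately simple numeric comparison (an assumption; it ignores pre-release suffixes). The names `as_tuple` and `meets_minimum` are illustrative helpers, not pandas APIs.

```python
from importlib import metadata
from itertools import takewhile

# Minimum versions taken from the required-dependency list (assumed values).
MINIMUM_VERSIONS = {
    "numpy": "1.22.4",
    "python-dateutil": "2.8.2",
    "pytz": "2020.1",
    "tzdata": "2022.7",
}


def as_tuple(version: str) -> tuple:
    """Turn '1.22.4' into (1, 22, 4), stopping at the first non-numeric part."""
    numbers = []
    for piece in version.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two dotted version strings numerically."""
    return as_tuple(installed) >= as_tuple(minimum)


if __name__ == "__main__":
    for name, minimum in MINIMUM_VERSIONS.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed (needs >= {minimum})")
            continue
        status = "ok" if meets_minimum(found, minimum) else "too old"
        print(f"{name}: {found} ({status}, needs >= {minimum})")
```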

Optional Dependencies

pandas also has many optional dependencies that improve performance, enable visualization, or unlock particular APIs and methods.

Performance

When working with large files, it is advisable to install the performance dependencies together with pandas:

kanban
  [Performance]
    [numexpr]
    [bottleneck]
    [numba]
$ rye add "pandas[performance]"
$ uv pip install "pandas[performance]"
$ pip install "pandas[performance]"
$ conda install pandas numexpr bottleneck numba
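
To see which of the performance packages are already present, their importability can be probed without importing them. This is a small sketch using the standard library only; `missing_modules` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Import names of the packages in the performance group.
PERFORMANCE_PACKAGES = ("numexpr", "bottleneck", "numba")


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(PERFORMANCE_PACKAGES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All performance packages are available")
```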

Visualization : Plotting & Formatting

To plot graphs through the pandas API, install the optional dependency matplotlib along with pandas using the pip extra [plot]. For Markdown output and DataFrame styling, use the extra [output-formatting].

kanban
  [Plot]
    [matplotlib]

  [Output-formatting]
    [Jinja2]
    [tabulate]
$ rye add "pandas[plot,output-formatting]"
$ uv pip install "pandas[plot,output-formatting]"
$ pip install "pandas[plot,output-formatting]"
$ conda install pandas matplotlib jinja2 tabulate
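
Several extras can be requested at once as a comma-separated list inside the brackets, and the whole requirement is safest passed to the shell as a single quoted argument. The sketch below builds such a string; both function names are illustrative and are not part of pip or pandas.

```python
import shlex


def requirement_with_extras(package, extras):
    """Build a requirement such as 'pandas[plot,output-formatting]'."""
    cleaned = [extra.strip() for extra in extras if extra.strip()]
    if not cleaned:
        return package
    return f"{package}[{','.join(cleaned)}]"


def pip_command(package, extras):
    """Return a pip command line with the requirement quoted for the shell."""
    return "pip install " + shlex.quote(requirement_with_extras(package, extras))


if __name__ == "__main__":
    print(pip_command("pandas", ["plot", "output-formatting"]))
```

Quoting matters because some shells treat square brackets as glob patterns, and a space inside the brackets would split the requirement into two arguments.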

Computation

For N-dimensional data and statistical functions, install the computation extra, which brings in SciPy and xarray.

kanban
  [Computation]
    [SciPy]
    [xarray]
$ rye add "pandas[computation]"
$ uv pip install "pandas[computation]"
$ pip install "pandas[computation]"
$ conda install pandas scipy xarray

Excel files

To work with Excel files, it is necessary to install optional dependencies along with pandas.

kanban
  [Excel Files]
    [xlrd]@{ticket: Reading xls files}
    [xlsxwriter]@{ticket: Writing xlsx files}
    [openpyxl]@{ticket: Reading/writing xlsx files}
    [pyxlsb]@{ticket: Reading xlsb files}
    [python-calamine]@{ticket: Reading xls/xlsx/xlsb/ods files}

  [HTML Files]
    [BeautifulSoup4]@{ticket: requires lxml or html5lib}
    [html5lib]
    [lxml]

  [XML]
    [lxml]
$ rye add "pandas[excel]"
$ uv pip install "pandas[excel]"
$ pip install "pandas[excel]"
$ conda install -c conda-forge pandas xlrd xlsxwriter openpyxl pyxlsb python-calamine
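
Because each Excel package covers different file types, it can help to pick the reader engine from the file extension. The mapping below is a sketch based on the package descriptions above; `pick_engine` is a hypothetical helper, and the engine strings are the names that `pandas.read_excel` accepts for its `engine` parameter.

```python
from pathlib import Path

# File extension -> read engine (an illustrative choice, not the only valid one).
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",
    ".xlsx": "openpyxl",
    ".xlsb": "pyxlsb",
    ".ods": "calamine",
}


def pick_engine(path):
    """Return an engine name for `path`, or raise ValueError if unsupported."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no Excel engine known for {suffix!r}") from None


if __name__ == "__main__":
    for name in ("sales.xlsx", "legacy.xls", "binary.xlsb", "sheet.ods"):
        print(name, "->", pick_engine(name))
```

With pandas and the chosen package installed, the result could be passed along as `pd.read_excel(path, engine=pick_engine(path))`. Leaving `engine` unset lets pandas choose a default on its own.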

HTML files

$ rye add "pandas[html]"
$ uv pip install "pandas[html]"
$ pip install "pandas[html]"
$ conda install pandas beautifulsoup4 html5lib lxml

kanban
  [PostgreSQL]
    [SQLAlchemy]@{assigned: 'postgresql'}
    [psycopg2]@{assigned: 'postgresql'}
    [adbc-driver-postgresql]@{assigned: 'postgresql'}

  [MySQL]
    [SQLAlchemy]@{assigned: 'mysql'}
    [pymysql]@{assigned: 'mysql'}

  [SQLite]
    [SQLAlchemy]@{assigned: 'sql-other'}
    [adbc-driver-sqlite]@{assigned: 'sql-other'}
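
With SQLAlchemy, the driver packages listed above are selected through the connection URL. The sketch below assembles such URLs as plain strings; the helper names and the example credentials are hypothetical, and the `dialect+driver` prefixes follow SQLAlchemy's URL convention.

```python
from urllib.parse import quote_plus


def server_url(dialect_driver, user, password, host, port, database):
    """Build a URL such as postgresql+psycopg2://user:secret@host:5432/db."""
    # Percent-encode the password so characters like '@' do not break the URL.
    return (
        f"{dialect_driver}://{user}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def sqlite_url(path):
    """Build a SQLite URL for a relative file path, e.g. sqlite:///data.db."""
    return f"sqlite:///{path}"


if __name__ == "__main__":
    # Placeholder credentials for illustration only.
    print(server_url("postgresql+psycopg2", "analyst", "p@ss", "localhost", 5432, "sales"))
    print(server_url("mysql+pymysql", "analyst", "p@ss", "localhost", 3306, "sales"))
    print(sqlite_url("data.db"))
```

A URL built this way would typically be handed to `sqlalchemy.create_engine`, and the resulting engine passed to `pandas.read_sql` as its connection argument.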