SQL databases and pandas

Structured information is typically stored as tables within relational databases. Structured Query Language (SQL) is then used to interact with these databases—users can write SQL queries to execute a variety of CRUD (Create, Read, Update, Delete) functions. In particular, the Read function is used to retrieve specific information from these tables.

Although there are different types of relational databases (each with its API), the pandas.io.sql module offers SQL query wrappers to facilitate data retrieval from tabular formats. The pandas.io.sql module also reduces dependency on database-specific API.

Drivers for connecting to SQL

Besides using SQLAlchemy, a Python tool kit that gives us SQL’s full power and flexibility, we also need an SQL driver to serve as a database adaptor for accessing databases of different SQL dialect types. Depending on the database type, different driver libraries are needed. Some examples include:

  • psycopg2 for PostgreSQL

  • PyMySQL for MySQL

  • python-oracledb for Oracle SQL

  • pyodbc for Microsoft SQL Server

For this lesson, we’ll work with MySQL because it’s one of the most commonly used databases. This means that the driver library that serves as our MySQL engine is PyMySQL.

Get hands-on with 1300+ tech skills courses.