Extraction From MySQL’s Binary Log

Most popular relational database management system (RDBMS) solutions keep a log file of all changes to the database, including changes due to data manipulation language (DML) statements such as INSERT, UPDATE, and DELETE, as well as data definition language (DDL) statements such as CREATE and DROP.

Each database calls this log file by a different name, but all serve the same purpose.

  • In Microsoft SQL Server, the log file is called the transaction log.

  • In Oracle, the log file is called the redolog.

  • In PostgreSQL, the log file is called the write-ahead log (WAL).

  • In MySQL, the log file is called the binary log or binlog. We’ll explore this log file in detail in this lesson.

Why do we need a log file?

These log files are useful in case of database crashes and help keep the database up to date. This allows in-memory changes that have not yet materialized on disk to be reconstructed from the log file in case of a crash.

The log file is also used for syncing clusters of distributed databases. Without going into too much depth, a common architecture choice for distributed databases is leader-follower architecture, where one database node is the leader and receives all the write requests from users. The rest of the nodes are read-only databases.

When the leader receives a write request and changes a record in its version of the database, it sends its log file to the other nodes so that they can perform the same actions as the leader. This way, the whole distributed database cluster is synced again.

Another usage of this log file is that it can be a great data source in ETL pipelines for extracting log activity data about the company.

Example: Extracting data from MySQL’s binary log

One way to extract log activity data from MySQL’s binlog is using Python’s BinLogStreamReader. For example, we’ll connect to MySQL’s binlog and extract the log activity data about the table customers.

The table had several insert and update queries applied to it as shown below.

Get hands-on with 1400+ tech skills courses.