Preparing and Analyzing Time Series Data with LSTM Models
Discover how to prepare and analyze time series data with LSTMs, from importing libraries to data temporalization and scaling.
In this lesson, we explore time series analysis using Long Short-Term Memory (LSTM) models. We begin by importing necessary libraries, including TensorFlow for LSTM functionalities and user-defined libraries for data preprocessing and visualization. Our primary focus is on temporalizing the data, a crucial step for LSTM modeling. We’ll explore the significance of temporalization and proceed to split and scale the data, setting the foundation for building robust time series models.
Imports and data
We get started by importing the libraries. The LSTM-related classes come from the TensorFlow library. We also import the user-defined libraries, e.g., `datapreprocessing`, `performancemetrics`, and `simpleplots`.
```python
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras import optimizers
from tensorflow.keras.models import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Bidirectional
from tensorflow.python.keras import backend as K

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

import matplotlib.pyplot as plt
import seaborn as sns

# user-defined libraries
import datapreprocessing as dp
import performancemetrics as pm
import simpleplots as sp

from numpy.random import seed
seed(1)

# Constants used throughout the lesson:
# SEED seeds the random number generators for reproducibility, and
# DATA_SPLIT_PCT is the fraction of the data held out for testing.

SEED = 123  # used to help randomly select the data points
DATA_SPLIT_PCT = 0.2

from pylab import rcParams
rcParams['figure.figsize'] = 8, 6
plt.rcParams.update({'font.size': 22})

print("Data split percent: ", DATA_SPLIT_PCT)
print("Random generator seeds: ", SEED)
print("Size of figures to be plotted later: ", rcParams['figure.figsize'])
```
- **Lines 33–34:** We set two constants: `SEED` to `123`, which seeds the random number generators, and `DATA_SPLIT_PCT` to `0.2`, indicating a 20% split for train-test separation.
- **Lines 36–38:** We import `rcParams` from Matplotlib's `pylab` module, set the default figure size for plots to 8 by 6 inches, and increase the font size for Matplotlib plots.
- **Lines 40–42:** We print the data split percentage, the random generator seed, and the figure size that will be used for plots later, so the settings in effect are visible in the script's output.
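As a hypothetical illustration of how these constants are typically used later in a pipeline like this one, `DATA_SPLIT_PCT` and `SEED` can be passed to scikit-learn's `train_test_split` (the toy array below is not the lesson's dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

SEED = 123
DATA_SPLIT_PCT = 0.2

X = np.arange(100).reshape(50, 2)  # 50 toy samples with 2 features each

# test_size takes the held-out fraction; random_state makes the split reproducible
X_train, X_test = train_test_split(X, test_size=DATA_SPLIT_PCT, random_state=SEED)

print(X_train.shape)  # (40, 2)
print(X_test.shape)   # (10, 2)
```

Fixing `random_state` guarantees the same train-test partition on every run, which matters when comparing models trained in separate sessions.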
Next, the data is read, and the basic preprocessing steps are performed.
```python
df = pd.read_csv("processminer-sheet-break-rare-event-dataset.csv")
df.head(n=5)  # visualize the data

# Convert categorical columns to dummy (one-hot) encoding
hotencoding1 = pd.get_dummies(df['Grade&Bwt'])
hotencoding1 = hotencoding1.add_prefix('grade_')
hotencoding2 = pd.get_dummies(df['EventPress'])
hotencoding2 = hotencoding2.add_prefix('eventpress_')
df = df.drop(['Grade&Bwt', 'EventPress'], axis=1)
df = pd.concat([df, hotencoding1, hotencoding2], axis=1)

# Rename response column name for ease of understanding
df = df.rename(columns={'SheetBreak': 'y'})

# Shift the response for training the model for early prediction
df = dp.curve_shift(df, shift_by=-2)

# Sort by time and drop the time column
df['DateTime'] = pd.to_datetime(df.DateTime)
df = df.sort_values(by='DateTime')
df = df.drop(['DateTime'], axis=1)

print(df)
```
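The `curve_shift` call is the key preprocessing step: shifting the response by `-2` relabels the rows *before* each sheet break as positive, so the model learns to warn ahead of the event. The helper below is a hypothetical re-implementation sketched for illustration only; the lesson's own `datapreprocessing.curve_shift` may differ in details:

```python
import pandas as pd

def curve_shift_sketch(df, shift_by=-2):
    """Illustrative sketch of label shifting for early prediction.

    For shift_by=-2, the two rows preceding each y==1 row are relabeled
    as 1, and the original event rows are dropped (the process is not in
    a normal state while the event is occurring).
    """
    y = df['y']
    new_y = y.copy()
    for k in range(1, abs(shift_by) + 1):
        # propagate each positive label k rows in the shift direction
        new_y = new_y | y.shift(-k if shift_by < 0 else k, fill_value=0)
    out = df.copy()
    out['y'] = new_y.astype(int)
    # drop the original event rows, keeping only pre-event warnings
    return out[y == 0].reset_index(drop=True)

toy = pd.DataFrame({'x': range(6), 'y': [0, 0, 0, 1, 0, 0]})
shifted = curve_shift_sketch(toy, shift_by=-2)
print(list(shifted['y']))  # [0, 1, 1, 0, 0]
```

In the toy frame, the event at row 3 is removed, and rows 1 and 2 become positive, which is exactly the "predict two time steps early" framing the lesson uses.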
Temporalizing the data
From an LSTM modeling standpoint, a usual two-dimensional input, also referred to as planar data, doesn’t ...
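The idea behind temporalization can be sketched as follows: planar `(samples, features)` data is sliced into overlapping windows, producing the 3D `(samples, lookback, features)` tensor that LSTM layers expect. This is an illustrative sketch with an assumed `lookback` window; the lesson's own helper may differ in details such as label alignment:

```python
import numpy as np

def temporalize_sketch(X, y, lookback):
    """Convert planar (samples, features) data into the 3D
    (samples, lookback, features) shape expected by LSTM layers."""
    X_out, y_out = [], []
    for i in range(lookback, len(X)):
        X_out.append(X[i - lookback:i])  # window of the previous `lookback` rows
        y_out.append(y[i])               # label aligned with the window's end
    return np.array(X_out), np.array(y_out)

X = np.arange(20).reshape(10, 2)  # 10 time steps, 2 features
y = np.arange(10)
X3d, y3d = temporalize_sketch(X, y, lookback=4)
print(X3d.shape)  # (6, 4, 2)
```

Each sample in `X3d` carries the four preceding time steps, which is what lets the LSTM learn temporal patterns that a planar input cannot express.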