
Preparing and Analyzing Time Series Data with LSTM Models

Discover how to prepare and analyze time series data with LSTMs, from importing libraries to data temporalization and scaling.

In this lesson, we explore time series analysis using Long Short-Term Memory (LSTM) models. We begin by importing necessary libraries, including TensorFlow for LSTM functionalities and user-defined libraries for data preprocessing and visualization. Our primary focus is on temporalizing the data, a crucial step for LSTM modeling. We’ll explore the significance of temporalization and proceed to split and scale the data, setting the foundation for building robust time series models.


Imports and data

We get started by importing the libraries. The LSTM-related classes come from the TensorFlow library. We also import the user-defined libraries, e.g., datapreprocessing, performancemetrics, and simpleplots.

import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras import optimizers
from tensorflow.keras.models import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Bidirectional
from tensorflow.keras import backend as K
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import seaborn as sns
# user-defined libraries
import datapreprocessing as dp
import performancemetrics as pm
import simpleplots as sp
from numpy.random import seed
seed(1)
SEED = 123 # used to help randomly select the data points
DATA_SPLIT_PCT = 0.2
from pylab import rcParams
rcParams['figure.figsize'] = 8, 6
plt.rcParams.update({'font.size': 22})
print( " Data split percent: ", DATA_SPLIT_PCT )
print( " Random generator seeds: ", SEED )
print( " Size of figures to be plotted later: ", rcParams['figure.figsize'] )
  • Lines 24–25: We set the values of two constants. We set SEED to 123, which will be used as the seed for random number generation, and DATA_SPLIT_PCT to 0.2, indicating a 20% test split for the train-test separation.
  • Lines 26–28: We import rcParams from Matplotlib’s pylab module, set the default figure size for plots to 8 by 6 inches, and increase the font size for Matplotlib plots.
  • Lines 29–31: We print informative messages about the data split percentage, the random generator seed, and the size of the figures to be plotted later. This documents the settings being used in the script.

Next, the data is read, and the basic preprocessing steps are performed.

df = pd.read_csv("processminer-sheet-break-rare-event-dataset.csv")
df.head(n=5) # visualize the data.
# Convert the categorical columns to dummy (one-hot) variables
hotencoding1 = pd.get_dummies(df['Grade&Bwt'])
hotencoding1 = hotencoding1.add_prefix('grade_')
hotencoding2 = pd.get_dummies(df['EventPress'])
hotencoding2 = hotencoding2.add_prefix('eventpress_')
df = df.drop(['Grade&Bwt', 'EventPress'], axis=1)
df = pd.concat([df, hotencoding1, hotencoding2], axis=1)
# Rename response column name for ease of understanding
df = df.rename(columns={'SheetBreak': 'y'})
# Shift the response earlier in time so the model learns to predict a break early.
df = dp.curve_shift(df, shift_by=-2)
# Sort by time and drop the time column.
df['DateTime'] = pd.to_datetime(df.DateTime)
df = df.sort_values(by='DateTime')
df = df.drop(['DateTime'], axis=1)
print(df)
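The curve_shift helper comes from the user-defined datapreprocessing module, so its body isn't shown here. A minimal sketch of its assumed behavior follows: it relabels the rows just before each sheet break as positive and drops the break rows themselves, so the model is trained to flag a break ahead of time. The implementation details below are an assumption, not the course's code.

```python
import pandas as pd

def curve_shift(df, shift_by):
    """Hypothetical sketch of dp.curve_shift's assumed behavior.

    Labels the |shift_by| rows before each sheet break as positive
    (shift_by < 0 shifts labels earlier in time) and drops the original
    break rows, so the model predicts a break before it occurs.
    """
    df = df.copy()
    step = 1 if shift_by >= 0 else -1
    vector = df['y'].copy()
    for _ in range(abs(shift_by)):
        # Spread each positive label one more step in the shift direction.
        vector += vector.shift(step).fillna(0)
    df['y_shifted'] = (vector > 0).astype(int)
    # Drop the rows where the break actually occurred; keep the new labels.
    df = df[df['y'] == 0].drop(columns=['y'])
    return df.rename(columns={'y_shifted': 'y'})
```

With shift_by=-2, the two rows preceding each break become positive examples, matching the call in the preprocessing code above.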

Temporalizing the data

From an LSTM modeling standpoint, a usual two-dimensional input, also referred to as planar data, doesn’t ...
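In concrete terms, temporalization turns a planar 2-D array of shape (observations, features) into a 3-D array of shape (samples, timesteps, features), which is the input shape an LSTM layer expects. A minimal sketch, assuming a simple sliding lookback window; the helper name temporalize is illustrative and not necessarily the course's implementation:

```python
import numpy as np

def temporalize(X, y, timesteps):
    # Slide a window of length `timesteps` over the planar 2-D array X
    # and collect one 3-D sample (timesteps x features) per position.
    # Each window takes the label of its last row.
    output_X, output_y = [], []
    for i in range(len(X) - timesteps + 1):
        output_X.append(X[i:i + timesteps])
        output_y.append(y[i + timesteps - 1])
    return np.array(output_X), np.array(output_y)

X = np.arange(20).reshape(10, 2)   # 10 observations, 2 features
y = np.arange(10)
X3d, y3d = temporalize(X, y, timesteps=3)
print(X3d.shape)   # (8, 3, 2): samples x timesteps x features
```

With a lookback of 3, ten observations yield eight overlapping windows; each window stacks three consecutive rows of the original data.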