Preparing and Analyzing Time Series Data with LSTM Models
Discover how to prepare and analyze time series data with LSTMs, from importing libraries to data temporalization and scaling.
In this lesson, we explore time series analysis using Long Short-Term Memory (LSTM) models. We begin by importing necessary libraries, including TensorFlow for LSTM functionalities and user-defined libraries for data preprocessing and visualization. Our primary focus is on temporalizing the data, a crucial step for LSTM modeling. We’ll explore the significance of temporalization and proceed to split and scale the data, setting the foundation for building robust time series models.
Imports and data
We get started by importing the libraries. The LSTM-related classes come from the TensorFlow library. We also import the user-defined libraries, e.g., `datapreprocessing`, `performancemetrics`, and `simpleplots`.
```python
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras import optimizers
from tensorflow.keras.models import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Bidirectional
from tensorflow.python.keras import backend as K

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

import matplotlib.pyplot as plt
import seaborn as sns

# user-defined libraries
import datapreprocessing as dp
import performancemetrics as pm
import simpleplots as sp

from numpy.random import seed
seed(1)

# Constants used throughout the lesson:
# SEED seeds the random number generators for reproducibility, and
# DATA_SPLIT_PCT is the fraction of the data held out for testing.

SEED = 123  # used to help randomly select the data points
DATA_SPLIT_PCT = 0.2

from pylab import rcParams
rcParams['figure.figsize'] = 8, 6
plt.rcParams.update({'font.size': 22})

print("Data split percent: ", DATA_SPLIT_PCT)
print("Random generator seeds: ", SEED)
print("Size of figures to be plotted later: ", rcParams['figure.figsize'])
```
- **Lines 33–34:** We set two constants: `SEED` to `123`, which seeds the random number generators, and `DATA_SPLIT_PCT` to `0.2`, indicating a 20% split for train-test separation.
- **Lines 36–38:** We import `rcParams` from Matplotlib's `pylab` module, set the default figure size for plots to 8 by 6 inches, and increase the font size for Matplotlib plots.
- **Lines 40–42:** We print the data split percentage, the random generator seed, and the figure size that will be used for plots later, so the settings in effect are visible in the script's output.
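As a hypothetical illustration of how these constants are typically used later in a pipeline like this one, `DATA_SPLIT_PCT` and `SEED` can be passed to scikit-learn's `train_test_split` (the toy array below is not the lesson's dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

SEED = 123
DATA_SPLIT_PCT = 0.2

X = np.arange(100).reshape(50, 2)  # 50 toy samples with 2 features each

# test_size takes the held-out fraction; random_state makes the split reproducible
X_train, X_test = train_test_split(X, test_size=DATA_SPLIT_PCT, random_state=SEED)

print(X_train.shape)  # (40, 2)
print(X_test.shape)   # (10, 2)
```

Fixing `random_state` guarantees the same train-test partition on every run, which matters when comparing models trained in separate sessions.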
Next, the data is read, and the basic preprocessing steps are performed.
```python
df = pd.read_csv("processminer-sheet-break-rare-event-dataset.csv")
df.head(n=5)  # visualize the data

# Convert categorical columns to dummy (one-hot) encoding
hotencoding1 = pd.get_dummies(df['Grade&Bwt'])
hotencoding1 = hotencoding1.add_prefix('grade_')
hotencoding2 = pd.get_dummies(df['EventPress'])
hotencoding2 = hotencoding2.add_prefix('eventpress_')
df = df.drop(['Grade&Bwt', 'EventPress'], axis=1)
df = pd.concat([df, hotencoding1, hotencoding2], axis=1)

# Rename response column name for ease of understanding
df = df.rename(columns={'SheetBreak': 'y'})

# Shift the response for training the model for early prediction
df = dp.curve_shift(df, shift_by=-2)

# Sort by time and drop the time column
df['DateTime'] = pd.to_datetime(df.DateTime)
df = df.sort_values(by='DateTime')
df = df.drop(['DateTime'], axis=1)

print(df)
```
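The `curve_shift` call is the key preprocessing step: shifting the response by `-2` relabels the rows *before* each sheet break as positive, so the model learns to warn ahead of the event. The helper below is a hypothetical re-implementation sketched for illustration only; the lesson's own `datapreprocessing.curve_shift` may differ in details:

```python
import pandas as pd

def curve_shift_sketch(df, shift_by=-2):
    """Illustrative sketch of label shifting for early prediction.

    For shift_by=-2, the two rows preceding each y==1 row are relabeled
    as 1, and the original event rows are dropped (the process is not in
    a normal state while the event is occurring).
    """
    y = df['y']
    new_y = y.copy()
    for k in range(1, abs(shift_by) + 1):
        # propagate each positive label k rows in the shift direction
        new_y = new_y | y.shift(-k if shift_by < 0 else k, fill_value=0)
    out = df.copy()
    out['y'] = new_y.astype(int)
    # drop the original event rows, keeping only pre-event warnings
    return out[y == 0].reset_index(drop=True)

toy = pd.DataFrame({'x': range(6), 'y': [0, 0, 0, 1, 0, 0]})
shifted = curve_shift_sketch(toy, shift_by=-2)
print(list(shifted['y']))  # [0, 1, 1, 0, 0]
```

In the toy frame, the event at row 3 is removed, and rows 1 and 2 become positive, which is exactly the "predict two time steps early" framing the lesson uses.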
Temporalizing the data
From an LSTM modeling standpoint, a usual two-dimensional input, also referred to as planar data, doesn’t ...
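The idea behind temporalization can be sketched as follows: planar `(samples, features)` data is sliced into overlapping windows, producing the 3D `(samples, lookback, features)` tensor that LSTM layers expect. This is an illustrative sketch with an assumed `lookback` window; the lesson's own helper may differ in details such as label alignment:

```python
import numpy as np

def temporalize_sketch(X, y, lookback):
    """Convert planar (samples, features) data into the 3D
    (samples, lookback, features) shape expected by LSTM layers."""
    X_out, y_out = [], []
    for i in range(lookback, len(X)):
        X_out.append(X[i - lookback:i])  # window of the previous `lookback` rows
        y_out.append(y[i])               # label aligned with the window's end
    return np.array(X_out), np.array(y_out)

X = np.arange(20).reshape(10, 2)  # 10 time steps, 2 features
y = np.arange(10)
X3d, y3d = temporalize_sketch(X, y, lookback=4)
print(X3d.shape)  # (6, 4, 2)
```

Each sample in `X3d` carries the four preceding time steps, which is what lets the LSTM learn temporal patterns that a planar input cannot express.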