Analyzing smartwatch data with Python takes several steps: loading the dataset, cleaning it, exploring it, visualizing it, and drawing useful insights from it.
Libraries to use: For this data analysis task, we’ll mainly use pandas and Plotly Graph Objects. The import step also loads NumPy, seaborn, matplotlib, and Plotly Express.
Dataset: The dataset that we will be using in this analysis task can be downloaded from
Let’s start the analysis task by importing the required libraries or modules.
To import the libraries, follow the code given below:
# import required libraries
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
To import the dataset, follow the code given below:
df = pd.read_csv("/content/dailyActivity_merged.csv")
Then, print the records:
df.head()
Count the null values in each column to see whether any need to be dropped:
Columns_with_null = df.isnull().sum()
print(Columns_with_null)
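The check above only reports the counts. As a minimal sketch of the follow-up step, the example below drops incomplete rows with `dropna()`. The small `toy` frame and its values are made up for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the activity data; the values are made up.
toy = pd.DataFrame({
    "TotalSteps": [1000, np.nan, 3000],
    "Calories": [1800, 2000, np.nan],
})

# Count the missing values in each column.
null_counts = toy.isnull().sum()
print(null_counts)

# Drop every row that has at least one missing value.
toy_clean = toy.dropna().reset_index(drop=True)
print(toy_clean)
```

If the counts are all zero, the `dropna()` call leaves the frame unchanged.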
Change the data type of the TotalDistance column to integer (this drops the fractional part of each distance):
df["TotalDistance"] = df["TotalDistance"].astype('int64')
print(df.info())
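Casting a float column with `astype('int64')` truncates toward zero rather than rounding. The sketch below shows the difference on a few made-up distances. Rounding first is one possible alternative, not something the original analysis does.

```python
import pandas as pd

# Made-up distances, used only to illustrate the conversion.
distance = pd.Series([2.9, 5.6, 8.49], name="TotalDistance")

# astype drops the fractional part: 2.9 becomes 2.
truncated = distance.astype("int64")

# Rounding before the cast keeps the nearest whole number: 2.9 becomes 3.
rounded = distance.round().astype("int64")

print(truncated.tolist())
print(rounded.tolist())
```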
Sum all of the minutes into a new column, Total_Minutes, and convert the minutes into hours:
df["Total_Minutes"] = df["VeryActiveMinutes"] + df['FairlyActiveMinutes'] + df["LightlyActiveMinutes"] + df["SedentaryMinutes"]
print(df.info())
df["Total_Hours"] = df["Total_Minutes"]/60
print(df.head())
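A quick sanity check on the summed column is that no record should exceed 1,440 minutes (24 hours). The sketch below computes the same totals on a made-up two-row frame and flags whether each row fits within a day. The `within_day` name is introduced here for illustration.

```python
import pandas as pd

# Made-up minute values for two records.
toy = pd.DataFrame({
    "VeryActiveMinutes": [25, 0],
    "FairlyActiveMinutes": [13, 0],
    "LightlyActiveMinutes": [328, 0],
    "SedentaryMinutes": [728, 1440],
})

minute_cols = [
    "VeryActiveMinutes",
    "FairlyActiveMinutes",
    "LightlyActiveMinutes",
    "SedentaryMinutes",
]

# Row-wise sum of the four minute columns, then conversion to hours.
toy["Total_Minutes"] = toy[minute_cols].sum(axis=1)
toy["Total_Hours"] = toy["Total_Minutes"] / 60

# True where the tracked time fits within a single day.
within_day = toy["Total_Minutes"].le(24 * 60)
print(toy[["Total_Minutes", "Total_Hours"]])
print(within_day.all())
```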
Change the ActivityDate column from object to datetime:
df['ActivityDate'] = pd.to_datetime(df['ActivityDate'] , format='%m/%d/%Y')
Create a pie chart to compare the active and inactive minute categories. Note that the code uses the maximum value recorded in each column:
labels = ['Very Active Minutes', 'Fairly Active Minutes', 'Lightly Active Minutes', 'Inactive Minutes']
counts = df[['VeryActiveMinutes', 'FairlyActiveMinutes', 'LightlyActiveMinutes', 'SedentaryMinutes']].max()
colors = ['red', 'green', "pink", "blue"]
fig = go.Figure(data=[go.Pie(labels=labels, values=counts)])
fig.update_layout(width=500, height=400, paper_bgcolor="white", autosize=False, showlegend=True)
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=15,
                  marker=dict(colors=colors, line=dict(color='black', width=1)))
fig.show()
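The chart above takes the maximum of each column. If the goal is the share of a typical day spent in each category, the column mean may be a closer fit. The sketch below compares the two on made-up values. Swapping in the mean is a suggestion, not part of the original analysis.

```python
import pandas as pd

# Made-up minute values for two records.
toy = pd.DataFrame({
    "VeryActiveMinutes": [10, 30],
    "FairlyActiveMinutes": [20, 0],
    "LightlyActiveMinutes": [200, 100],
    "SedentaryMinutes": [700, 900],
})

# Largest value ever recorded in each category.
max_minutes = toy.max()

# Average minutes per record in each category.
mean_minutes = toy.mean()

# Percentage of the average record spent in each category.
share = (mean_minutes / mean_minutes.sum() * 100).round(1)

print(max_minutes)
print(mean_minutes)
print(share)
```

Either series can be passed as `values` to `go.Pie` in place of `counts`.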
Add the day of the week to the dataset:
df["Day"] = df["ActivityDate"].dt.day_name()
print(df)
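`dt.day_name()` returns plain strings, which sort alphabetically rather than in calendar order. One way to keep Monday-to-Sunday order in tables and charts is an ordered categorical, sketched below on three made-up dates. The `week_order` list is introduced here for illustration.

```python
import pandas as pd

# Three made-up dates in the same month/day/year format as the dataset.
toy = pd.DataFrame({"ActivityDate": ["4/17/2016", "4/12/2016", "4/13/2016"]})
toy["ActivityDate"] = pd.to_datetime(toy["ActivityDate"], format="%m/%d/%Y")
toy["Day"] = toy["ActivityDate"].dt.day_name()

# Calendar order for the weekday names.
week_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday"]

# An ordered categorical sorts by position in week_order, not alphabetically.
toy["Day"] = pd.Categorical(toy["Day"], categories=week_order, ordered=True)
sorted_days = toy.sort_values("Day")["Day"].tolist()
print(sorted_days)
```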
Compare the very active and fairly active minutes across the days of the week:
fig = go.Figure()
fig.add_trace(go.Bar(x=df['Day'], y=df['VeryActiveMinutes'], name='Very Active', marker_color='red'))
fig.add_trace(go.Bar(x=df['Day'], y=df['FairlyActiveMinutes'], name='Fairly Active', marker_color='blue'))
fig.update_layout(barmode='group', xaxis_tickangle=-45)
fig.show()
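The bar chart above passes one value per record, so each day of the week receives many bars. If a single bar per day is preferred, the rows can be aggregated first. The sketch below averages made-up values per day with `groupby`. Using the mean rather than the sum is an assumption made here.

```python
import pandas as pd

# Made-up records: two Mondays and one Tuesday.
toy = pd.DataFrame({
    "Day": ["Monday", "Monday", "Tuesday"],
    "VeryActiveMinutes": [10, 30, 20],
    "FairlyActiveMinutes": [5, 15, 40],
})

# One row per day, holding the average of each minutes column.
per_day = toy.groupby("Day")[["VeryActiveMinutes", "FairlyActiveMinutes"]].mean()
print(per_day)
```

`per_day.index` and the two columns can then serve as the `x` and `y` arguments of the `go.Bar` traces.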
Total the number of steps taken on each day of the week:
# Sum the steps per weekday so that each label has exactly one value
steps_per_day = df.groupby("Day")["TotalSteps"].sum()
label = steps_per_day.index
counts = steps_per_day.values
colors = ['gold', 'lightgreen', "pink", "blue", "skyblue", "cyan", "orange"]
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(width=500, height=400, paper_bgcolor="white", autosize=False, showlegend=True)
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=15,
                  marker=dict(colors=colors, line=dict(color='black', width=1)))
fig.show()
Find how many calories were burned on each day of the week:
# Sum the calories per weekday (assumes the dataset has a Calories column);
# value_counts() would only count the records for each day
calorie_count = df.groupby("Day")["Calories"].sum()
label = calorie_count.index
counts = calorie_count.values
colors = ['blue', 'green', 'pink', 'purple', 'skyblue', 'orange', 'brown']
fig = go.Figure(data=[go.Bar(x=label, y=counts, marker_color=colors)])
fig.update_layout(width=500, height=400, paper_bgcolor="white", autosize=False, showlegend=True,
                  title="Calorie count per day", xaxis_title='Day', yaxis_title='Calories')
fig.show()
Create a pie chart to see the total distance covered on each day of the week, using the integer TotalDistance values:
# Sum the distance per weekday so that each label has exactly one value
distance_covered = df.groupby("Day")["TotalDistance"].sum()
labels = distance_covered.index
counts = distance_covered.values
color = ['blue', 'green', 'pink', 'purple', 'skyblue', 'orange', 'brown']
fig = go.Figure(data=[go.Pie(labels=labels, values=counts, marker_colors=color)])
fig.update_layout(width=500, height=400, paper_bgcolor="white", autosize=False, showlegend=True,
                  title='Distance covered each day')
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=15,
                  marker=dict(line=dict(color='black', width=1)))
fig.show()
A review of the dataset shows Tuesday to be a particularly active day, with the highest calorie burn of any day of the week. However, despite this heightened activity level, the total distance covered on Tuesdays appears comparatively lower.
This incongruity might be attributed to potential inaccuracies in the smartwatch’s positioning data. Supplementary data on the precision of smartwatch recordings could provide additional insights.
Among the days analyzed, Sunday emerges as the least active for individuals, evidenced by the lowest calorie burn and minimal step count. Interestingly, while Sunday registers low activity levels, it doesn’t consistently demonstrate the lowest total distance covered. This observation underscores the importance of scrutinizing the accuracy of smartwatch data recording mechanisms.
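When comparing days, keep in mind that per-day totals depend on how many records each day has, so the day with the highest total is not necessarily the day with the highest average. The sketch below illustrates this on made-up values. The `Calories` column name is an assumption about the dataset, and the numbers are not real results.

```python
import pandas as pd

# Made-up records: three Tuesdays and one Sunday.
toy = pd.DataFrame({
    "Day": ["Tuesday", "Tuesday", "Tuesday", "Sunday"],
    "Calories": [2000, 2000, 2000, 3000],  # assumed column name
})

# Record count, total, and average calories for each day.
summary = toy.groupby("Day")["Calories"].agg(["count", "sum", "mean"])
print(summary)

# The day with the largest total differs from the day with the largest average.
highest_total_day = summary["sum"].idxmax()
highest_mean_day = summary["mean"].idxmax()
print(highest_total_day, highest_mean_day)
```

Checking the `count` column alongside the totals helps separate genuinely more active days from days that simply have more records.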