#Time is passing! Climate change will have and already has profound effects on our lives if we don’t take a step. It will seriously impact the diversity in the world. We need to act now!

#I’ll be investigating historical temperature data in this project to better understand the trends and patterns of climate change in the UK. This project aims at conducting a data analysis on climate change by creating data visualisations.

#DATASET #The dataset we’ll be working with includes monthly average temperatures from 1884 through 2022, as well as seasonal and annual averages. #We will obtain insights into how temperatures have changed throughout time and find any notable trends that may be indicative of climate change by studying this data. #We’ll start by loading the dataset into R and doing some exploratory data analysis.

#Cleaning and formatting the data, as well as computing summary statistics to better understand the temperature distribution across time, are all part of this process.

#We will also create graphics to help detect any patterns or trends in the data. #Then, we’ll dig deeper into the data by looking at annual average temperature trends. #We’ll use techniques like linear regression and moving averages to uncover long-term trends in temperature rises.

#We’ll look at seasonal temperature patterns in addition to annual trends. #We may investigate how climate change may be altering weather patterns throughout the year by splitting down the data by season. #This data is critical for comprehending the possible impacts of climate change on agriculture, water resources, and ecosystems.

#Lastly, we’ll bring the research to a close by presenting our findings in a clear and visually appealing manner. #We’ll produce well-designed plots and charts to effectively communicate our findings and help potential recruiters understand the important insights from our investigation.

#LOADING PACKAGES

library(tidyverse)   # for data wrangling
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)   # for working with dates
library(ggplot2)     # for plotting
library(ggthemes)    # for more plotting options
library(gridExtra)   # for arranging plots
## 
## Attaching package: 'gridExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine
library(readxl)
library(dplyr)

#LOADING DATASET

# Load the dataset
temperature <- read_xlsx("mean_temp_UK.xlsx")

#DATA EXPLORATION

# Examine the first few rows of the dataset
head(temperature)
## # A tibble: 6 × 18
##    year   jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov   dec
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  1884   5     4.2   5.1   6.2   9.7  12.5  14.5  15.3  13     8.3   4.6   3.3
## 2  1885   2     4.2   3.5   6.6   7.6  12.3  14.5  12.4  10.9   6.2   4.8   3  
## 3  1886   1.1   0.8   2.9   6.3   8.8  11.9  14.2  14.1  12     9.9   5.7   1.2
## 4  1887   2.1   3.5   3.3   5.4   8.7  14.3  15.6  14    10.6   6.4   3.9   2.1
## 5  1888   2.8   1.1   2.1   5.3   9.2  11.8  12.3  12.8  11.2   7.6   6.5   4.3
## 6  1889   3     2.2   3.8   6    11.4  14    13.6  13.4  11.5   7.6   6.1   3.1
## # ℹ 5 more variables: win <dbl>, spr <dbl>, sum <dbl>, aut <dbl>, ann <dbl>
# Check the dimensions of the dataset
dim(temperature) #The dataset contains 139 years of temperature data
## [1] 139  18
# Get the summary statistics for each variable
summary(temperature)
##       year           jan              feb              mar       
##  Min.   :1884   Min.   :-1.900   Min.   :-2.300   Min.   :1.800  
##  1st Qu.:1918   1st Qu.: 2.400   1st Qu.: 2.400   1st Qu.:3.900  
##  Median :1953   Median : 3.400   Median : 3.700   Median :5.100  
##  Mean   :1953   Mean   : 3.189   Mean   : 3.324   Mean   :4.848  
##  3rd Qu.:1988   3rd Qu.: 4.200   3rd Qu.: 4.450   3rd Qu.:5.800  
##  Max.   :2022   Max.   : 6.300   Max.   : 6.800   Max.   :8.000  
##       apr              may              jun             jul       
##  Min.   : 4.300   Min.   : 7.500   Min.   :10.50   Min.   :12.30  
##  1st Qu.: 6.300   1st Qu.: 9.300   1st Qu.:12.30   1st Qu.:13.90  
##  Median : 7.000   Median : 9.900   Median :12.80   Median :14.50  
##  Mean   : 7.022   Mean   : 9.991   Mean   :12.85   Mean   :14.62  
##  3rd Qu.: 7.900   3rd Qu.:10.750   3rd Qu.:13.40   3rd Qu.:15.30  
##  Max.   :10.700   Max.   :12.100   Max.   :14.90   Max.   :17.80  
##       aug             sep             oct              nov       
##  Min.   :11.70   Min.   : 9.90   Min.   : 5.900   Min.   :2.300  
##  1st Qu.:13.70   1st Qu.:11.70   1st Qu.: 8.300   1st Qu.:5.000  
##  Median :14.30   Median :12.40   Median : 9.200   Median :5.700  
##  Mean   :14.39   Mean   :12.33   Mean   : 9.119   Mean   :5.747  
##  3rd Qu.:15.10   3rd Qu.:13.00   3rd Qu.: 9.800   3rd Qu.:6.600  
##  Max.   :17.30   Max.   :15.20   Max.   :12.200   Max.   :8.800  
##       dec             win              spr             sum       
##  Min.   :-0.90   Min.   :-0.290   Min.   :5.420   Min.   :12.23  
##  1st Qu.: 3.00   1st Qu.: 2.755   1st Qu.:6.700   1st Qu.:13.39  
##  Median : 3.90   Median : 3.520   Median :7.160   Median :13.99  
##  Mean   : 3.84   Mean   : 3.459   Mean   :7.289   Mean   :13.97  
##  3rd Qu.: 4.80   3rd Qu.: 4.260   3rd Qu.:7.870   3rd Qu.:14.42  
##  Max.   : 7.90   Max.   : 5.790   Max.   :9.120   Max.   :15.76  
##       aut              ann        
##  Min.   : 6.970   Min.   : 7.020  
##  1st Qu.: 8.595   1st Qu.: 8.075  
##  Median : 9.100   Median : 8.350  
##  Mean   : 9.067   Mean   : 8.468  
##  3rd Qu.: 9.670   3rd Qu.: 8.855  
##  Max.   :11.350   Max.   :10.030
# Check for missing values
colSums(is.na(temperature))
## year  jan  feb  mar  apr  may  jun  jul  aug  sep  oct  nov  dec  win  spr  sum 
##    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 
##  aut  ann 
##    0    0
# Calculate the seasonal averages
temperature_v2 <- temperature %>%
  mutate(winter_avg = round(apply(select(., jan, feb, dec), 1, mean, na.rm = TRUE), 2),
         spring_avg = round(apply(select(., mar, apr, may), 1, mean, na.rm = TRUE), 2),
         summer_avg = round(apply(select(., jun, jul, aug), 1, mean, na.rm = TRUE), 2),
         autumn_avg = round(apply(select(., sep, oct, nov), 1, mean, na.rm = TRUE), 2))
# Compute the average temperature for each year
yearly_avg <- temperature_v2 %>%
  select(year, ann)

# Create a line plot to visualize the trend of average temperature over time
ggplot(yearly_avg, aes(x = year, y = ann)) +
  geom_line(color = "black", size = 1) +
  geom_point(color = "blue", size = 2) +
  theme_minimal() +
  labs(title = "Yearly Average Temperature Trend",
       x = "Year",
       y = "Average Temperature (°C)")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

# Create a line plot to visualize the trend of average temperature over time, with a trend line
ggplot(yearly_avg, aes(x = year, y = ann)) +
  geom_line(color = "black", size = 1) +
  geom_point(color = "blue", size = 2) +
  geom_smooth(method = "lm", se = FALSE, color = "red", linetype = "dashed") +
  theme_minimal() +
  labs(title = "Yearly Average Temperature Trend",
       x = "Year",
       y = "Average Temperature (°C)")
## `geom_smooth()` using formula = 'y ~ x'

#The line plot illustrates the increasing level of global warming.

# Filter first 20 and last 20 summer average values
first_30_summer <- temperature_v2 %>%
  head(30) %>%
  select(year, summer_avg)

last_30_summer <- temperature_v2 %>%
  tail(30) %>%
  select(year, summer_avg)

# Combine the first 20 and last 20 summer average values
summer_comparison <- rbind(first_30_summer %>%
                             mutate(period = "First 30 Years"),
                           last_30_summer %>%
                             mutate(period = "Last 30 Years"))

# Create a bar plot to visualize the comparison and add a linear regression line
ggplot(summer_comparison, aes(x = year, y = summer_avg, fill = period)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_smooth(method = "lm", se = FALSE, color = "blue", linetype = "solid") +
  theme_minimal() +
  labs(title = "Comparison of First 30 and Last 30 Summer Average Temperatures",
       x = "Year",
       y = "Summer Average Temperature",
       fill = "Period")
## `geom_smooth()` using formula = 'y ~ x'

#The regression line indicates a rising trend in the last 30 years, in contrast to the first 30-year period, which displayed a more linear progression. Furthermore, a significant increase in temperature levels is also evident.

# Reshape the data to a long format with only monthly temperature columns
monthly_data_long <- temperature_v2 %>%
  select(year, jan:dec) %>%
  gather(month, temperature, -year)

# Calculate the overall average temperature for each month
monthly_averages <- monthly_data_long %>%
  group_by(month) %>%
  summarise(average_temp = mean(temperature, na.rm = TRUE))

# Identify the warmest and coldest months
warmest_month <- monthly_averages[which.max(monthly_averages$average_temp), "month"]
coldest_month <- monthly_averages[which.min(monthly_averages$average_temp), "month"]

# Filter the data to include only the warmest and coldest months
warmest_coldest_data <- monthly_data_long %>%
  filter(month %in% c(warmest_month, coldest_month))

# Create box plots for the warmest and coldest months
ggplot(warmest_coldest_data, aes(x = month, y = temperature, fill = month)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Temperature Distribution for Warmest and Coldest Months",
       x = "Month",
       y = "Temperature",
       fill = "Month") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_manual(values = c("darkorange", "dodgerblue"))

# Calculate monthly averages
monthly_avg <- colMeans(temperature_v2[, c("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec")], na.rm = TRUE)

# Create a data frame for plotting
monthly_avg_df <- data.frame(month = names(monthly_avg), average = monthly_avg)

# Bar plot for monthly averages
ggplot(monthly_avg_df, aes(x = month, y = average)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  theme_minimal() +
  labs(title = "Monthly Average Temperatures", x = "Month", y = "Average Temperature") +
  theme(plot.title = element_text(hjust = 0.5))

# Bar plot for monthly averages with values
ggplot(monthly_avg_df, aes(x = month, y = average)) +
  geom_bar(stat = "identity", fill = "blue") +
  geom_text(aes(label = round(average, 2)), vjust = -0.5, size = 3.5) +
  theme_minimal() +
  labs(title = "Monthly Average Temperatures", x = "Month", y = "Average Temperature") +
  theme(plot.title = element_text(hjust = 0.5))

# Plot the annual average temperatures
ggplot(temperature, aes(x = year, y = ann)) +
  geom_line() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Annual Average Temperatures (1884-2021)",
       x = "Year",
       y = "Temperature (°C)") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

#CALCULATING SEASONAL AVERAGES

# Select the specified columns from the temperature dataset
seasonal_data <- temperature %>%
  select(year, win, spr, sum, aut, ann)

# Rename the columns
seasonal_data <- seasonal_data %>%
  rename(winter = win,
         spring = spr,
         summer = sum,
         autumn = aut,
         annual = ann)

# Merge the seasonal data with the original dataset
temperature_v2 <- left_join(temperature_v2, seasonal_data, by = "year")
# Create a dataset with average temperature for each month
monthly_avg <- temperature_v2 %>%
  summarise_at(vars(jan:dec), mean, na.rm = TRUE) %>%
  gather(key = "month", value = "temperature") %>%
  mutate(month = factor(month, levels = c("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec")))

# Create a season variable
monthly_avg$season <- case_when(
  monthly_avg$month %in% c("dec", "jan", "feb") ~ "winter",
  monthly_avg$month %in% c("mar", "apr", "may") ~ "spring",
  monthly_avg$month %in% c("jun", "jul", "aug") ~ "summer",
  monthly_avg$month %in% c("sep", "oct", "nov") ~ "autumn"
)

# Preview the dataset
head(monthly_avg)
## # A tibble: 6 × 3
##   month temperature season
##   <fct>       <dbl> <chr> 
## 1 jan          3.19 winter
## 2 feb          3.32 winter
## 3 mar          4.85 spring
## 4 apr          7.02 spring
## 5 may          9.99 spring
## 6 jun         12.9  summer
# Calculate seasonal averages for each year
seasonal_avg_by_year <- temperature_v2 %>%
  gather(key = "season", value = "temperature", c("winter_avg", "spring_avg", "summer_avg", "autumn_avg")) %>%
  group_by(year, season) %>%
  summarise(avg_temperature = mean(temperature, na.rm = TRUE))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
# Create a line plot to visualize seasonal averages by year
ggplot(seasonal_avg_by_year, aes(x = year, y = avg_temperature, color = season, group = season)) +
  geom_line(size = 1.1) +
  theme_minimal() +
  labs(title = "Seasonal Average Temperature by Year",
       x = "Year",
       y = "Average Temperature",
       color = "Season")

#For each season, an upward trend in the seasonal average can be observed.

#Moving Average

# Load the zoo package
#install.packages("zoo")
library(zoo)
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
# Calculate moving averages
temperature_v2$ma_5_year <- rollmean(temperature_v2$annual, k = 5, fill = NA, align = "right")
temperature_v2$ma_10_year <- rollmean(temperature_v2$annual, k = 10, fill = NA, align = "right")

# Reshape the data to a long format
moving_avg_long <- temperature_v2 %>%
  select(year, ma_5_year, ma_10_year) %>%
  gather(key = "moving_avg", value = "temperature", c("ma_5_year", "ma_10_year"))

# Create a line plot to visualize moving averages
ggplot(moving_avg_long, aes(x = year, y = temperature, color = moving_avg, group = moving_avg)) +
  geom_line(size = 1) +
  theme_minimal() +
  labs(title = "Moving Averages of Annual Average Temperatures",
       x = "Year",
       y = "Average Temperature",
       color = "Moving Average") +
  scale_color_discrete(labels = c("5-Year", "10-Year"))
## Warning: Removed 13 rows containing missing values (`geom_line()`).

#Moving averages indicate a general upward trend, demonstrating that average temperatures have risen over time. This finding is compatible with existing understanding of global warming and climate change. Compared to the 5-year moving average, the 10-year moving average offers a clearer picture of long-term temperature patterns because it collects more data points and is less subject to short-term variations. Briefly, our moving average plot supports the idea of rising temperatures over time, especially over the last 50 years.

#HIGHEST AND LOWEST AVERAGE TEMPERATURES

# Identify the years with the highest and lowest annual average temperatures
max_year <- temperature_v2[which.max(temperature_v2$ann), c("year", "ann")]
min_year <- temperature_v2[which.min(temperature_v2$ann), c("year", "ann")]

# Create a bar plot to visualize the highest and lowest annual average temperatures
ggplot(rbind(max_year, min_year), aes(x = factor(year), y = ann, fill = factor(year))) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  labs(title = "Years with the Highest and Lowest Annual Average Temperatures",
       x = "Year",
       y = "Annual Average Temperature",
       fill = "Year") +
  scale_fill_discrete(name = "Year") +
  geom_text(aes(label = paste(year, ann, sep = ": ")), 
            position = position_stack(vjust = 0.5), 
            color = "white", size = 3)

#CONCLUSION #In this exploratory data analysis, I analyzed annual and seasonal temperature data collected from the UK Met Office. By applying various statistical analyses and visualizations, I identified trends and patterns in the temperature data. The analysis showed a noticeable increase in the annual average temperature over the years, as well as the years with the highest and lowest annual average temperatures in 1892 and 2022, respectively. Additionally, I explored the temperature distribution by season. With further investigation, we can gain more insights and knowledge about the UK’s climatic patterns. In conclusion, it is crucial to address the negative impacts of climate change and take steps towards mitigating them.